
Introduction to the Detours (API hook) library

盛琪
2023-12-01
++++++++++++++++++++++++
In-line function hooking (see especially the Detours library from Hunt/Brubacher)
++++++++++++++++++++++++

Detours library (using it raw or unmodified): 

 - user-level hooking only

   why? because what detours lib does is overwrite the first 5 bytes of
   the hooked function ... and if the hooked function lives in kernel
   space, then we'll get a fatal mem error when we try to overwrite 
   these bytes

   --> confirmed : this is true. also the addys of kernel syscalls are not
       accessible in same way that addys of win32 api are. so can't use
       existing framework to trivially obtain kernel function addys,
       i.e. those found in the SSDT. Even forgetting for a moment that
       operating as a userland program we wouldn't have sufficient
       permissions to change the permissions on such memory regions in the
       way that is done here for detours.

   ==> in summary, not trivial to adapt detours to work for kernel syscall
       hooking, is not clear that it would be better to adapt detours vs. to
       start from scratch for such a goal.
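
   For reference, the 5-byte overwrite mentioned above is (as far as I
   can tell) an x86 near jump: opcode 0xE9 followed by a 32-bit
   displacement measured from the end of the instruction. A minimal
   sketch of that encoding, where the function name and the addresses
   are illustrative and not taken from the Detours source:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "JMP rel32" as it would sit at address `from`, landing on `to`.
 * The displacement is relative to the end of the 5-byte instruction. */
void encode_jmp_rel32(uint8_t out[5], uint32_t from, uint32_t to)
{
    assert(out != NULL);
    uint32_t rel = to - (from + 5);   /* unsigned wrap-around gives backward jumps */
    out[0] = 0xE9;                    /* JMP rel32 opcode */
    out[1] = (uint8_t)(rel);          /* displacement, little-endian */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}
```

   One opcode byte plus four displacement bytes is where the "first 5
   bytes" figure comes from.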

 - also check on whether detours is an NT/XP/2k/2003 only soln or whether
   also works on '9x ===> seems not.

   - works for all x86 versions of NT, Win2K, XP

   "However, under Windows 95, Windows 98, and Windows ME, the DetourFunction*
    APIs do not work unless the program is running under a debugger (the
    process was created with the DEBUG_PROCESS flag on the call to the
    CreateProcess* APIs)."

    --> limitation derives from two sources: 

	(1) non-support of Win '9x (including Win ME) for 
	    CreateRemoteThread(...) is why injdll doesn't work for Win '9x

	(2) the fact that shared virtual memory (which goes from 2 GB to 3 GB 
	in Win '9x) is not copy-on-write. So VirtualProtect, VirtualProtectEx 
	don't work on memory regions w/in that range. 

	System DLLs are mapped into that area. So we can't (trivially) change 
	the permission on memory regions containing win32 api functions... 
	which means we can't overwrite the first five bytes of such functions 
	to cause control to transfer to our detour function.

	==> note that if the process is created with the DEBUG_PROCESS
	    flag, DLLs *are* mapped with copy-on-write protection...
	   
        ==> so does VirtualProtect{Ex} still fail in such a case?

	    if so, then even if we modify "withdll.exe" to create the
	    process with the DEBUG_PROCESS flag, we still can't use
	    detours in its present form to hook Win '9x

            if not, then GetProcAddress(...) appears to operate
            differently for Win '9x apps run in Debug mode, i.e. it
            returns a debug thunk address, not the actual address

	    - so, would take some work to adapt detours to follow the
	      debug thunk to the actual code and overwrite that there?

       Appears that if we ONLY use "withdll" with our target executables
       and use -d:traceapi and change <detours/src/creatwith.cpp> function
       DetourCreateProcessWithDll{A,W} to create the process with the
       DEBUG_PROCESS creation flag, that hooking will work on Win '9x.

       ==> Maybe there's still a hangup which maybe has to do with 
	   the fact that FlushInstructionCache(...) isn't supported
	   for Win '9x or maybe it works correctly transparently.

    ======================================================================
    So I think the deal here is that the problem comes in overwriting the
    first 5 bytes of target functions

    Target functions by and large live in DLLs

    DLLs live in shared virtual memory

    For Win '9x, shared virtual memory is from 2 GB to 3 GB

    And the function which changes the permissions on memory SO THAT 5
    bytes of it can be overwritten DOESNT work for Win '9x for shared
    virtual memory

     - VirtualProtect, VirtualProtectEx for Win '9x (and ME) don't work
       for shared virtual mem (2GB to 3GB)

     - system DLLs are loaded into shared virtual mem

     - so if we want to change the code in one such DLL's function (which
       presumably also lives in that 2 GB to 3 GB space) then we'd need
       to first change the mem permissions on that area to READ/WRITE/EXEC

     - for Win '9x we cannot make that change via VirtualProtect nor
       VirtualProtectEx for any mem region w/in 2 GB to 3 GB

    I do believe that the "withdll" functionality still works... to inject
    a DLL into a user process, however "injdll" -- which injects a dll 
    into an *existing* user process -- would NOT work since that fxnality
    relies on CreateRemoteThread which isn't supported for Win '9x

    "While Windows NT, Windows 2000 and Windows XP always map DLLs into 
     processes with copy-on-write mapping (which Detours needs in order 
     to patch the binary image), Windows 95, Windows 98, and Windows ME 
     only map DLLs with copy-on-write if the process was started with 
     the DEBUG_PROCESS flag on the call to CreateProcess." [README]

    "Windows 95 doesn't implement copy-on-write in the operating system. 
     With copy-on-write, the operating system will share a common code 
     page in memory, but when a process writes to that memory, the memory 
     is copied so that the individual process gets its own copy that will 
     not interfere with any other process. In the Windows 95 architecture, 
     any memory that is above the 2GB line is shared among all processes. 
     If one process were to write a breakpoint to this shared memory area 
     without the copy-on-write, the breakpoint would apply to all processes, 
     not just the one being debugged."
    ======================================================================

    CreateRemoteThread(...) not supported for Win '9x

     - only supported for 2k, nt, xp

     - a means to injecting a DLL into a targeted process

       -- why is this necessary?

       -- absent this injection, you can't force a process to call your
          functions, nor could it resolve those functions if it tried

     - we want to force a process to call LoadLibrary(...) with our DLL as the arg

     HANDLE CreateRemoteThread( HANDLE hProcess, // IN
				LPSECURITY_ATTRIBUTES lpThreadAttributes, // IN
				SIZE_T dwStackSize, // IN
				LPTHREAD_START_ROUTINE lpStartAddress, // IN
				LPVOID lpParameter, // IN
				DWORD dwCreationFlags, // IN
				LPDWORD lpThreadId // OUT
			      );
   
    // hProcess : a handle to the process in which this thread is to be created
       --> that handle must have the following access rights :
           (1) PROCESS_CREATE_THREAD
	   (2) PROCESS_QUERY_INFORMATION
	   (3) PROCESS_VM_OPERATION, PROCESS_VM_READ, PROCESS_VM_WRITE

    // lpThreadAttributes : pointer to security attributes of new thread
       --> specifies security descriptor for new thread
       --> if NULL, thread gets default security descriptor and the
	   returned thread handle canNOT be inherited

    // dwStackSize : initial size of the new thread's stack in bytes
       --> if 0, uses default size for the executable

    // lpStartAddress : pointer to application-defined function to be
    // executed by the thread; represents starting address of the thread
    // in the remote process

       --> ThreadProc function : is an application-defined function
           - serves as starting address for a thread

       DWORD WINAPI ThreadProc( LPVOID lpParameter );

       --> lpParameter : thread data passed to the function

    // lpParameter : pointer to var to be passed to thread function

    // dwCreationFlags : flags that control creation of the thread
       --> if 0, thread runs immediately after created

    // lpThreadId : pointer to a var that receives the thread ID
       --> if NULL, thread ID is not returned

 - So how do we use this?

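   /* ------------------------------------------------------------
    * The start routine and its argument must both be valid in the
    * *remote* process, so a ThreadProc or string that exists only
    * in our own address space won't do. LoadLibraryA already has
    * the shape of a ThreadProc (one pointer arg, DWORD-sized
    * return on x86), so we pass its address directly; this relies
    * on kernel32.dll being mapped at the same address in both
    * processes.
    * ------------------------------------------------------------ */
   #include <windows.h>
   #include <string.h>

   // hProcessForHooking : handle to the target process with the access
   // rights listed above (how it is obtained is left open here)
   HANDLE InjectDll( HANDLE hProcessForHooking, const char *dllPath ) {

      SIZE_T cb = strlen( dllPath ) + 1;

      // copy the DLL path, e.g. "C:\\HookTool.dll", into the target
      LPVOID remoteArg = VirtualAllocEx( hProcessForHooking, NULL, cb,
					 MEM_COMMIT | MEM_RESERVE,
					 PAGE_READWRITE );
      if( remoteArg == NULL ||
	  !WriteProcessMemory( hProcessForHooking, remoteArg,
			       dllPath, cb, NULL ) )
	 return NULL;

      LPTHREAD_START_ROUTINE pfnLoadLibrary = (LPTHREAD_START_ROUTINE)
	 GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "LoadLibraryA" );

      return CreateRemoteThread( hProcessForHooking,
				 NULL,   // thread attrs
				 0,      // stack size
				 pfnLoadLibrary, // pointer to fxn to execute
				 remoteArg, // argument to that fxn
				 0,      // creation flags
				 NULL ); // thread ID wont be returned
   }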

 - so, the deal with detours is that there are a couple different
   functionalities on which other functionalities build

   +++++++++++++++++++++++++++++++++++++++++++++++++++++
   withdll : is defined in <detours/samples/withdll.cpp>
   +++++++++++++++++++++++++++++++++++++++++++++++++++++
   e.g. usage : withdll -d:traceapi.dll myexe.exe

    this will :

     (1) create a process with the specified app name and (optional) args

	 - this process is created with the suspend flag so that it is
	   initially suspended

         - this is done via : <detours/src/creatwith.cpp> function :

	   P = DetourCreateProcessWithDll{A,W}

    (2) then <detours/src/creatwith.cpp> function : 

	InjectLibrary( P.hProcess,
		       P.hThread,
		       GetProcAddress(
		         GetModuleHandle{A,W}(kernel32.dll),
			 LoadLibrary{A,W} ),
		       traceapi.dll,
		       strlen(traceapi.dll) + 1 );

        which :

	 (a) suspends the thread

	 (b) gets the contents of the control registers (ESP, EIP, EBP, ...)

	   -- which includes current stack pointer (ESP)
	   -- sets nCodeBase = ESP - { space for our assembly code + 
				       space for our args }

	      ==> we're going to write some assembly code and this is the
		  address (within the addy space of the given process) where
		  that code will begin (and so execution should begin)

           -- will create a buffer with assembly code instructions which will :

	      (1) PUSH "your_dll_name" onto the stack
	      (2) CALL LoadLibrary (where (1) is arg to that call)
	      (3) restore the EAX, EBX, ..., ESI, EDI, EBP, ESP, ...
		  values to what they are in (b)
              (4) JMP <to original code start, EIP from (b)>

	   -- then makes stack pointer point 4 below

	   -- and instruction pointer point to where your code will be
              written to (nCodeBase)

	 - changes permissions to read/write nCodeBase

	 - writes starting at nCodeBase with above assembly code (which will
	   cause app to LoadLibrary( yourdll ) then restore the registers
	   to their current contents (before that call) then return to
	   the code they were originally going to execute)

         - then calls FlushInstructionCache(...) so that the CPU does not
	   execute stale cached instructions in place of the new code
	   just written at nCodeBase

	 - then sets the thread context so that the new ESP and EIP will take hold

	 - then resumes the thread's execution

	 ==> basically inserts a LoadLibrary(...) call for an arbitrary
	     DLL (specified by you via the command line) into the process's
	     code so that this LoadLibrary is done before the process begins
	     executing then the process returns to exec'ing its normal /
	     original code.
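
    The stub described above can be sketched as a byte buffer. This is a
    simplification and not the Detours code: the notes say registers are
    restored to the values captured in (b), whereas this sketch swaps in
    pushfd/pushad ... popad/popfd to save and restore them. All names and
    addresses are hypothetical, and the sketch assumes ESP has already
    been moved below the stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 19

/* Store a 32-bit value in little-endian byte order. */
void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill `stub` with x86 code meant to run at `code_base` in the target:
 *   pushfd / pushad        save flags and general registers
 *   push  name_addr        argument for LoadLibrary
 *   call  load_library     stdcall, so the callee pops the argument
 *   popad / popfd          restore registers and flags
 *   jmp   orig_eip         continue where the thread was suspended */
size_t build_loader_stub(uint8_t stub[STUB_SIZE], uint32_t code_base,
                         uint32_t name_addr, uint32_t load_library,
                         uint32_t orig_eip)
{
    size_t i = 0;
    stub[i++] = 0x9C;                                   /* pushfd */
    stub[i++] = 0x60;                                   /* pushad */
    stub[i++] = 0x68;                                   /* push imm32 */
    put_u32(stub + i, name_addr);
    i += 4;
    stub[i++] = 0xE8;                                   /* call rel32 */
    put_u32(stub + i, load_library - (code_base + (uint32_t)i + 4));
    i += 4;
    stub[i++] = 0x61;                                   /* popad */
    stub[i++] = 0x9D;                                   /* popfd */
    stub[i++] = 0xE9;                                   /* jmp rel32 */
    put_u32(stub + i, orig_eip - (code_base + (uint32_t)i + 4));
    i += 4;
    assert(i == STUB_SIZE);
    return i;
}
```

    Both relative displacements are computed from the address of the
    *next* instruction, which is why the stub has to know the address
    (nCodeBase) it will eventually be written to.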

    --> so the 64k question is : is this fxnality supported on Win '9x?

	- well, FlushInstructionCache in Win '9x has no effect

	- for Win '9x, VirtualProtectEx cannot be used on any mem region
	  in shared virtual address space (0x80000000 - 0xbfffffff)
	  -- which is from 2 GB to 3 GB in the virtual addy space
	  -- as noted, this region is shared between processes
	  -- system DLLs are loaded here, also memory mapped files are
	     mapped here

        ==> So in this case the memory whose protection bits we want to
	    change lives on the user stack, which is probably somewhere in
	    the user virtual addy space (from 4 MB to 2 GB)

            - so we should be able to call VirtualProtectEx on that area

	    - and we should be allowed to execute code that lives on the stack
	      (in that location that we've just written w/our assembly code)

            - maybe the inability to call FlushInstructionCache to an
              effect is a deal breaker but, if not, seems that this
              functionality should hold for the win '9x model

   +++++++++++++++++++++++++++++++++++++++++++++++++++
   injdll : is defined in <detours/samples/injdll.cpp>
   +++++++++++++++++++++++++++++++++++++++++++++++++++
   e.g. usage : injdll -p:<pid> -d:traceapi.dll

   - this injects a DLL into an already-executing process (the PID of
     which is specified above as a command-line arg to this program)

   - this opens the specified process

   - then calls DetourContinueProcessWithDll{A,W} which is defined in 

     <detours/src/creatwith.cpp>

     - and which does :

       calls InjectLibraryOld (also in <detours/src/creatwith.cpp>)

       which in turn calls CreateRemoteThread to inject the provided dll
       (from the command line, e.g. "traceapi.dll" from above)

    ==> This (injdll) is the functionality that requires support for
	CreateRemoteThread and so is NOT supported for Win '9x
	except if the original process (which we are attempting to inject
	a DLL into) was created in DEBUG mode, which is highly unlikely

    - basically what this does is :

      (1) open specified process

      (2) allocates memory in that process with read/write permissions

      (3) writes ThreadFunc function and argument to it in that memory

      (4) then calls CreateRemoteThread -- passing function address and
	  arg address where just wrote ThreadFunc to that (remote)
	  process's address space as well as the name of the DLL to inject

      (5) then waits for that created thread to complete executing then
	  closes the handle to it then returns

    ==> won't work for Win '9x since CreateRemoteThread(...) not supported
	on that (those) platforms

--------
I. Intro
--------
Detours : library for intercepting arbitrary win32 binary functions (read:
	  win32 api functions) on x86 machines

 - interception code applied dynamically at run-time

 - replaces first few instructions of target function (which we'll call OVERWRITTEN)
   with an unconditional jump to the user-provided detour function

 - then the trampoline function consists of: OVERWRITTEN then an
   unconditional jump to the remainder of the target function

 - the detour function can then invoke the target function as a subroutine
   via invoking the trampoline function

 - detours are inserted at execution time 

   -- code of target function modified in memory

 - detours guaranteed to work "regardless of the method used by the app or
   system code to locate the target function"

   -- think they really mean : "regardless of whether the function is (in a
      library) that is statically linked, dynamically linked or delay loaded..."

Detours also provides functions : 

 - to edit the IAT of any binary
 - to inject a DLL into a new or an existing process
   -- then the injected DLL can detour any win32 function "whether in
      the application or system libraries"
 - to attach arbitrary data segments to existing binaries

------------------
II. Implementation
------------------
===================================
A. Interception of binary functions
===================================
 - at runtime, detours replaces first few instructions of target function
   with an unconditional jump to user-provided detour function

 - when execution reaches the target function, control jumps directly to
   the user-supplied detour function

 - detour function does whatever

 - then detour function may return control to the source function (the
   original caller) OR may invoke the trampoline function - which invokes
   the target function without interception

 - when the target function completes, it returns control to the detour function

 - the detour function does whatever then returns control to the source function
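
The control flow above (source -> detour -> trampoline -> target ->
detour -> source) can be mimicked with plain function pointers. This is
only a model of the call order: no code is patched here, and every name
in it is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Call trace, so the order of control transfers can be inspected. */
char trace[8];
size_t trace_len = 0;

void mark(char c)
{
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

typedef int (*fn_t)(int);

/* The target's real work. */
int target_body(int x) { mark('T'); return x + 1; }

/* `entry` stands in for the target's patched first bytes; `trampoline`
 * always reaches the original body without interception. */
fn_t entry = target_body;
fn_t trampoline = target_body;

int detour(int x)
{
    mark('D');                 /* detour does whatever ...            */
    int r = trampoline(x);     /* ... invokes the target as a subroutine */
    mark('d');                 /* ... then does whatever again        */
    return r * 10;
}

void attach(void) { entry = detour; }
void detach(void) { entry = target_body; }

/* The original caller: it only ever calls through `entry`. */
int source(int x) { mark('S'); return entry(x); }
```

After attach(), a call to source(1) passes through the detour on the way
in and on the way out; after detach() it reaches target_body directly.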

++++
How?
++++
The detours library intercepts target functions by rewriting their
in-process binary image

 - rewrites target function
 - rewrites matching trampoline function

The tramp function can be allocated dynamically or statically.

 - if statically, the trampoline always invokes the target function w/o
   the detour

 - before insert a detour, static trampoline contains single jump to target fxn

 - after insert detour, trampoline contains OVERWRITTEN and jmp to remainder
   of target function

-----------------------
To detour a function...
-----------------------
 - alloc mem for dynamic tramp fxn (if no static tramp provided)

 - enable write access to both the target and the tramp

 - copies instructions from target to tramp until at least 5 bytes have
   been copied

 - then adds a jmp instruction at end of tramp to the first non-copied
   instruction of target fxn

 - restore original page permissions on both target and tramp

 - flushes CPU instruction cache
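
The copy-and-patch steps above can be simulated on ordinary byte
buffers. A rough sketch under stated assumptions: instruction lengths
are handed in by the caller (a real implementation would need a
disassembler), copied instructions are assumed to be position
independent, and the page-permission and cache-flush steps are left out.
Function names and addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_SIZE 5

/* Write "JMP rel32" into `at`, which models memory at address `at_addr`. */
void write_jmp(uint8_t *at, uint32_t at_addr, uint32_t dest_addr)
{
    uint32_t rel = dest_addr - (at_addr + JMP_SIZE);
    at[0] = 0xE9;
    at[1] = (uint8_t)rel;
    at[2] = (uint8_t)(rel >> 8);
    at[3] = (uint8_t)(rel >> 16);
    at[4] = (uint8_t)(rel >> 24);
}

/* Returns the number of target bytes moved into the trampoline,
 * or 0 if the detour could not be installed. */
size_t install_detour(uint8_t *target, uint32_t target_addr,
                      const uint8_t *insn_len, size_t n_insn,
                      uint8_t *tramp, size_t tramp_cap, uint32_t tramp_addr,
                      uint32_t detour_addr)
{
    assert(target != NULL && insn_len != NULL && tramp != NULL);

    size_t copied = 0, i = 0;
    while (copied < JMP_SIZE && i < n_insn)       /* whole instructions only */
        copied += insn_len[i++];
    if (copied < JMP_SIZE || copied + JMP_SIZE > tramp_cap)
        return 0;

    memcpy(tramp, target, copied);                /* OVERWRITTEN */
    write_jmp(tramp + copied, tramp_addr + (uint32_t)copied,
              target_addr + (uint32_t)copied);    /* jmp to remainder of target */
    write_jmp(target, target_addr, detour_addr);  /* target now enters the detour */
    return copied;
}
```

With a typical prologue of lengths 1, 2, 3 the loop stops at 6 bytes,
which is why "at least 5" and not "exactly 5" bytes get copied.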

==================================
B. Payloads and DLL Import Editing
==================================
Attach arbitrary data segments to a win32 binary ("payloads")
Edit DLL import address tables

Detours creates new section : .detours

 - between export table (the RVA of which is specified in the 0'th entry
   of the DataDirectory which itself is in the Optional Headers which are
   part of the IMAGE_NT_HEADERS of a PE file) and the debug symbols

 - debug symbols MUST reside last in a win32 binary

 - the .detours section contains a detours header record and a copy of the
   original PE header (PE header == IMAGE_NT_HEADERS)

 - if modifying the IAT, detours creates new IAT, appends it to the copied
   PE header, then makes the original PE header point to the new IAT

   -- "makes the original PE header point to the new IAT" == change the
      RVA stored in the second entry in the Data Directory (index 1,
      IMAGE_DIRECTORY_ENTRY_IMPORT, which holds the RVA of the import
      table) to contain the RVA of the new table in our .detours section

 - any data segments to be added are then written at the end of the
   .detours section then the debug symbols are appended

 - reversal easy : restore original PE header from .detours section then
   remove .detours section
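
The "second entry in the Data Directory" step can be sketched on a raw
file buffer. This is an illustration of the PE32 layout only, not how
Detours does it: it handles 32-bit PE32 images, ignores the Size field
and NumberOfRvaAndSizes, and the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load and store. */
uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* File offset of DataDirectory[index].VirtualAddress in a PE32 image,
 * or 0 if the buffer does not look like one. */
size_t data_dir_offset(const uint8_t *img, size_t len, unsigned index)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z' || index >= 16)
        return 0;
    size_t nt = rd32(img + 0x3C);            /* e_lfanew: IMAGE_NT_HEADERS */
    if (nt > len)
        return 0;
    size_t opt = nt + 4 + 20;                /* skip signature + file header */
    size_t off = opt + 96 + 8u * index;      /* PE32: directories start at +96 */
    if (off + 8 > len)
        return 0;
    if (rd32(img + nt) != 0x00004550u)       /* "PE\0\0" */
        return 0;
    if ((img[opt] | img[opt + 1] << 8) != 0x10B)   /* PE32 magic */
        return 0;
    return off;
}

/* Point the import directory (index 1) at a new table; returns the old
 * RVA, which is what would have to be kept for the reversal step. */
uint32_t retarget_imports(uint8_t *img, size_t len, uint32_t new_rva)
{
    size_t off = data_dir_offset(img, len, 1);
    assert(off != 0);
    uint32_t old = rd32(img + off);
    wr32(img + off, new_rva);
    return old;
}
```

Keeping the old RVA (Detours keeps a whole copy of the original header)
is what makes the reversal a simple restore.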

---------------------
Why create a new IAT?
---------------------
 - preserves original IAT

 - new IAT can contain renamed import DLLs and functions or entirely new
   DLLs and functions

   --> can make YOUR DLL be the first one loaded when an app runs

 - question : so this is done at run-time? when, precisely? after the app
   has been loaded? or before? (before makes more sense but how modify
   app's image in mem before that app has been loaded?)

Detours also provides routines for enumerating the binary files mapped
into an address space; can also locate payloads w/in those mapped binaries

 - each payload identified by a 128-bit globally unique identifier (guid)

OK. I think this actually modifies the binaries on disk ... not just in mem

 - which makes more sense

----------------------------------------------
Injecting a DLL into a new or existing process
----------------------------------------------
 - inject : detours writes LoadLibrary(...) call into the target process
   with VirtualAllocEx and WriteProcessMemory then invokes call with
   CreateRemoteThread 

   ==> believe this is NOT supported on Win '9x : CONFIRM

   ==> and figure : can detours itself be used on a Win '9x machine?

------------------
III. Using Detours
------------------
User code must include the detours.h header file and link with detours.lib

(1) to intercept a function with a static trampoline

 - create the trampoline with the DETOUR_TRAMPOLINE macro

 DETOUR_TRAMPOLINE( trampoline_prototype, target_name )

 e.g. DETOUR_TRAMPOLINE( VOID WINAPI SleepTrampoline( DWORD ),
			 Sleep );

 fyi, Actual Sleep function signature (from windows.h, is in kernel32.dll):

   VOID Sleep( DWORD dwMilliseconds );

 "Note that for proper interception: the prototype, target, trampoline,
  and detour functions must all have exactly the same call signature
  including number of arguments and calling convention."

 intercepting the target function : invoke DetourFunctionWithTrampoline
 with two args : (1) trampoline, (2) pointer to the detour function

     note that the target function is already encoded in the trampoline
     and so is not needed as an arg

 e.g. DetourFunctionWithTrampoline( (PBYTE)SleepTrampoline,
				    (PBYTE)SleepDetour );

 where

 VOID WINAPI SleepDetour( DWORD dw ) {

    return SleepTrampoline( dw );
 }

(2) to intercept a function with a dynamic trampoline

 - call DetourFunction with two arguments : (1) a pointer to the target
   function and (2) a pointer to the detour function

 - e.g. 
   
   #include <windows.h>
   #include <detours.h>

   VOID (*DynamicTrampoline)(VOID) = NULL;

   VOID DynamicDetour( VOID ) {

	return DynamicTrampoline();
   }

   void main( void ) {

      VOID (*DynamicTarget)(VOID) = TargetFunction;

      DynamicTrampoline = (FUNCPTR)DetourFunction( (PBYTE)DynamicTarget,
						   (PBYTE)DynamicDetour );
      ...

      // below function can be used w/either static or dynamic tramps
      DetourRemoveTrampoline( DynamicTrampoline );
   }

 - DetourFunction : allocates a new trampoline and inserts the
   appropriate interception code into the target function

Static tramps very easy to use when target function is available as a link
symbol; when it is not, use a dynamic tramp.

DetourFindFunction : can find the pointer to a function when that function
is exported from a known DLL or if debugging symbols are available for the
target function's binary.

 - takes two args : the name of the binary and the name of the function

 - first tries via LoadLibrary(...) and GetProcAddress(...)

 - then uses ImageHlp library to search available debugging symbols

 - the fxn pointer returned by DetourFindFunction can be given to
   DetourFunction to create a dynamic trampoline

Programmer's responsibility to make sure that no other threads are
exec'ing in addy space while a detour is inserted or removed

 - one approach : call functions in the Detours library from a DLL main routine...

--------------
IV. Evaluation
--------------

Other approaches :

 (1) call replacement in app source code

  - calls to target fxn in app replaced with calls to detour fxn

  - requires access to source code

 (2) call replacement in app binary

  - modify app binary to replace calls to target fxn w/calls to detour fxn

  - requires being able to identify all applicable call sites

    -- requires symbolic info which may not be present in general binaries

    -- also would miss dynamically-linked calls to the target fxn
       (i.e. which work by loading dll then GetProcAddress(...)) as well
       as calls which use late-demand binding?

 (3) DLL redirection

  - modify DLL import entries in binary to point to a detour DLL

  - fails to intercept DLL internal calls and calls on pointers obtained
    from GetProcAddress(...)

 (4) Breakpoint trapping

  - insert debugging breakpoint into the target function

  - have debugging exception handler invoke the detour function

  - but debugging exceptions suspend all application threads

  - requires second OS process to catch the debug exception

    ==> heavy performance penalty

============
References :
============
 http://research.microsoft.com/sn/detours/
 http://research.microsoft.com/~galenh/Publications/HuntUsenixNt99.pdf
 http://www.sisecure.com/pdf/cs-2003-01.pdf

+++++++++++++++++++
Detours Usage Notes
+++++++++++++++++++

(1) withdll

(2) setdll

(3) 

 