++++++++++++++++++++++++
In-line function hooking (see especially the Detours library from Hunt/Brubacher)
++++++++++++++++++++++++

Detours library (using it raw or unmodified):

- user-level hooking only
  why? because what the detours lib does is overwrite the first 5 bytes of
  the hooked function ... and if the hooked function lives in kernel space,
  then we'll get a fatal memory access error when we try to overwrite these
  bytes
  --> confirmed : this is true.
  also, the addys of kernel syscalls are not accessible in the same way that
  the addys of win32 api functions are. so we can't use the existing framework
  to trivially obtain kernel function addys, i.e. those found in the SSDT.
  And that's even forgetting for a moment that, operating as a userland
  program, we wouldn't have sufficient privileges to change the permissions on
  such memory regions in the way that is done here for detours.
  ==> in summary, it is not trivial to adapt detours to work for kernel
      syscall hooking, and it is not clear that adapting detours would be
      better than starting from scratch for such a goal.

- also check on whether detours is an NT/XP/2k/2003-only soln or whether it
  also works on '9x
  ===> seems not.
  - works for all x86 versions of NT, Win2K, XP
    "However, under Windows 95, Windows 98, and Windows ME, the
    DetourFunction* APIs do not work unless the program is running under a
    debugger (the process was created with the DEBUG_PROCESS flag on the call
    to the CreateProcess* APIs)."
  --> limitation derives from two sources:
      (1) non-support of Win '9x (including Win ME) for
          CreateRemoteThread(...) is why injdll doesn't work for Win '9x
      (2) the fact that shared virtual memory (which goes from 2 GB to 3 GB
          in Win '9x) is not copy-on-write. So VirtualProtect and
          VirtualProtectEx don't work on memory regions w/in that range.
          System DLLs are mapped into that area. So we can't (trivially)
          change the permission on memory regions containing win32 api
          functions...
          which means we can't overwrite the first five bytes of such
          functions to cause control to transfer to our detour function.
  ==> note that if the process is created with the DEBUG_PROCESS flag, DLLs
      *are* mapped with copy-on-write protection...
  ==> so does VirtualProtect{Ex} still fail in such a case?
      - if so, then even if we modify "withdll.exe" to create the process
        with the DEBUG_PROCESS flag, we still can't use detours in its
        present form to hook on Win '9x
      - if not, then GetProcAddress(...) appears to operate differently for
        Win '9x apps run in debug mode, i.e. it returns a debug thunk
        address, not the actual address
        - so, it would take some work to adapt detours to follow the debug
          thunk to the actual code and overwrite the code there?

  It appears that if we ONLY use "withdll" with our target executables, use
  -d:traceapi, and change the <detours/src/creatwith.cpp> function
  DetourCreateProcessWithDll{A,W} to create the process with the
  DEBUG_PROCESS creation flag, then hooking will work on Win '9x.
  ==> Maybe there's still a hangup, which maybe has to do with the fact that
      FlushInstructionCache(...) isn't supported for Win '9x, or maybe it
      works correctly and transparently.
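
To make the "overwrite the first five bytes" step concrete: on x86 such a patch is typically a near jump, opcode 0xE9 followed by a 32-bit displacement counted from the end of the 5-byte instruction. The sketch below only builds and decodes those bytes in an ordinary buffer; it never touches code pages or page protections. The helper names (encode_jmp_rel32, decode_jmp_rel32) and any addresses used with them are made up for illustration and are not taken from the Detours source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Detours API): build the 5-byte x86 "jmp rel32"
 * that would sit at address `from` and transfer control to address `to`.
 * Addresses are modeled as plain 32-bit numbers. */
static void encode_jmp_rel32(uint8_t out[5], uint32_t from, uint32_t to)
{
    /* the displacement is relative to the instruction FOLLOWING the jmp */
    uint32_t rel = to - (from + 5u);

    assert(out != NULL);
    out[0] = 0xE9;                     /* opcode: jmp rel32 */
    out[1] = (uint8_t)(rel);           /* displacement, little-endian */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Inverse of the above, handy for checking a patch: where does the
 * jmp stored at address `from` land? */
static uint32_t decode_jmp_rel32(const uint8_t in[5], uint32_t from)
{
    uint32_t rel;

    assert(in != NULL && in[0] == 0xE9);
    rel = (uint32_t)in[1] | ((uint32_t)in[2] << 8) |
          ((uint32_t)in[3] << 16) | ((uint32_t)in[4] << 24);
    return from + 5u + rel;            /* wraps modulo 2^32 for backward jumps */
}
```

On a real system these five bytes could only be written after the page holding the target function has been made writable, which is exactly the VirtualProtect step that the notes above say fails for the shared 2 GB to 3 GB region on Win '9x.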
======================================================================

So I think the deal here is that the problem comes in overwriting the first
5 bytes of target functions
    Target functions by and large live in DLLs
    DLLs live in shared virtual memory
    For Win '9x, shared virtual memory is from 2 GB to 3 GB
    And the function which changes the permissions on memory SO THAT 5 bytes
    of it can be overwritten DOESN'T work for Win '9x for shared virtual
    memory

- VirtualProtect, VirtualProtectEx for Win '9x (and ME) don't work for
  shared virtual mem (2 GB to 3 GB)
- system DLLs are loaded into shared virtual mem
- so if we want to change the code in one such DLL's function (which
  presumably also lives in that 2 GB to 3 GB space) then we'd need to first
  change the mem permissions on that area to READ/WRITE/EXEC
- for Win '9x we cannot make that change via VirtualProtect nor
  VirtualProtectEx for any mem region w/in 2 GB to 3 GB

I do believe that the "withdll" functionality still works... to inject a DLL
into a user process; however "injdll" -- which injects a dll into an
*existing* user process -- would NOT work since that fxnality relies on
CreateRemoteThread, which isn't supported for Win '9x

"While Windows NT, Windows 2000 and Windows XP always map DLLs into
processes with copy-on-write mapping (which Detours needs in order to patch
the binary image), Windows 95, Windows 98, and Windows ME only map DLLs with
copy-on-write if the process was started with the DEBUG_PROCESS flag on the
call to CreateProcess." [README]

"Windows 95 doesn't implement copy-on-write in the operating system. With
copy-on-write, the operating system will share a common code page in memory,
but when a process writes to that memory, the memory is copied so that the
individual process gets its own copy that will not interfere with any other
process. In the Windows 95 architecture, any memory that is above the 2GB
line is shared among all processes.
If one process were to write a breakpoint to this shared memory area without
the copy-on-write, the breakpoint would apply to all processes, not just the
one being debugged."

======================================================================

CreateRemoteThread(...) not supported for Win '9x
- only supported for 2k, nt, xp
- a means of injecting a DLL into a targeted process
  -- why is this necessary?
  -- absent this injection, you can't force a process to call your functions
     or, if it does, to be able to resolve those functions
- we want to force a process to call LoadLibrary(...) with our DLL as the arg

HANDLE CreateRemoteThread(
    HANDLE                 hProcess,            // IN
    LPSECURITY_ATTRIBUTES  lpThreadAttributes,  // IN
    SIZE_T                 dwStackSize,         // IN
    LPTHREAD_START_ROUTINE lpStartAddress,      // IN
    LPVOID                 lpParameter,         // IN
    DWORD                  dwCreationFlags,     // IN
    LPDWORD                lpThreadId           // OUT
);

// hProcess : a handle to the process in which this thread is to be created
//   --> that handle must have the following access rights :
//       (1) PROCESS_CREATE_THREAD
//       (2) PROCESS_QUERY_INFORMATION
//       (3) PROCESS_VM_OPERATION, PROCESS_VM_READ, PROCESS_VM_WRITE
// lpThreadAttributes : pointer to security attributes of new thread
//   --> specifies security descriptor for new thread
//   --> if NULL, thread gets default security descriptor and the returned
//       thread handle canNOT be inherited
// dwStackSize : initial size of the new thread's stack in bytes
//   --> if 0, uses default size for the executable
// lpStartAddress : pointer to application-defined function to be
//   executed by the thread; represents starting address of the thread
//   in the remote process
//   --> ThreadProc function : is an application-defined function
//       - serves as starting address for a thread
//         DWORD WINAPI ThreadProc( LPVOID lpParameter );
//       --> lpParameter : thread data passed to the function
// lpParameter : pointer to var to be passed to thread function
// dwCreationFlags : flags that control creation of the thread
//   --> if 0, thread runs immediately after it is created
// lpThreadId : pointer to a var that receives the thread ID
//   --> if NULL, thread ID is not returned

- So how do we use this?

/* ------------------------------------------------------------
 * So we write our own version of ThreadProc which essentially
 * calls LoadLibrary on the string provided
 * ------------------------------------------------------------ */
DWORD WINAPI ThreadProc( LPVOID lpParameter )
{
    HMODULE targLib = LoadLibraryA( (LPCSTR)lpParameter );
    return (DWORD)(DWORD_PTR)targLib;   // module handle doubles as exit code
}

int main( void )
{
    HANDLE hProcessForHooking = ;   // target process handle (left unspecified)
    HANDLE hThread;

    // NOTE: lpStartAddress and lpParameter are interpreted in the REMOTE
    // process's address space, so ThreadProc and the DLL name must already
    // exist there (injdll, described further down, writes them into the
    // target process before making this call).
    hThread = CreateRemoteThread( hProcessForHooking,
                                  NULL,                // thread attrs
                                  0,                   // stack size
                                  ThreadProc,          // pointer to fxn to execute
                                  "C:\\HookTool.dll",  // argument to that fxn
                                  0,                   // creation flags
                                  NULL );              // thread ID won't be returned
    return 0;
}

- so, the deal with detours is that there are a couple of different
  functionalities on which other functionalities build

+++++++++++++++++++++++++++++++++++++++++++++++++++++
withdll : is defined in <detours/samples/withdll.cpp>
+++++++++++++++++++++++++++++++++++++++++++++++++++++

e.g. usage : withdll -d:traceapi.dll myexe.exe

this will :
(1) create a process with the specified app name and (optional) args
    - this process is created with the suspend flag so that it is initially
      suspended
    - this is done via : <detours/src/creatwith.cpp>
      function : P = DetourCreateProcessWithDll{A,W}
(2) then <detours/src/creatwith.cpp>
    function : InjectLibrary( P.hProcess,
                              P.hThread,
                              GetProcAddress( GetModuleHandle{A,W}(kernel32.dll),
                                              LoadLibrary{A,W} ),
                              traceapi.dll,
                              strlen(traceapi.dll) + 1 );
    which :
    (a) suspends the thread
    (b) gets the contents of the thread's registers (ESP, EIP, EBP, ...)
        -- which includes current stack pointer (ESP)
        -- sets nCodeBase = ESP - { space for our assembly code + space for
           our args }
           ==> we're going to write some assembly code and this is the
               address (within the addy space of the given process) where
               that code will begin (and so where execution should begin)
        -- will create a buffer with assembly code instructions which will :
           (1) PUSH the address of "your_dll_name" onto the stack
           (2) CALL LoadLibrary (where (1) is the arg to that call)
           (3) restore the EAX, EBX, ..., ESI, EDI, EBP, ESP, ... values to
               what they are in (b)
           (4) JMP <to original code start, EIP from (b)>
        -- then makes stack pointer point 4 bytes below
        -- and instruction pointer point to where your code will be written
           to (nCodeBase)
    - changes permissions to read/write at nCodeBase
    - writes the above assembly code starting at nCodeBase (which will cause
      the app to LoadLibrary( yourdll ), then restore the registers to their
      current contents (before that call), then return to the code it was
      originally going to execute)
    - then calls FlushInstructionCache(...) to make sure that the CPU
      executes this newly written code (starting at nCodeBase) rather than
      any stale cached instructions for this process
    - then sets the thread context so that the new ESP and EIP will take hold
    - then resumes the thread's execution
    ==> basically inserts a LoadLibrary(...) call for an arbitrary DLL
        (specified by you via the command line) into the process's code so
        that this LoadLibrary is done before the process begins executing;
        then the process returns to exec'ing its normal / original code.

    --> so the 64k question is : is this fxnality supported on Win '9x?
    - well, FlushInstructionCache in Win '9x has no effect
    - for Win '9x, VirtualProtectEx cannot be used on any mem region in the
      shared virtual address space (0x80000000 - 0xbfffffff)
      -- which is from 2 GB to 3 GB in the virtual addy space
      -- as noted, this region is shared between processes
      -- system DLLs are loaded here; memory-mapped files are also mapped
         here
    ==> So in this case the memory whose protection bits we want to change
        lives on the user stack, which is probably somewhere in the user
        virtual addy space (from 4 MB to 2 GB)
        - so we should be able to call VirtualProtectEx on that area
        - and we should be allowed to execute code that lives on the stack
          (in the location that we've just written w/our assembly code)
        - maybe the fact that FlushInstructionCache has no effect is a deal
          breaker but, if not, it seems that this functionality should hold
          for the win '9x model

+++++++++++++++++++++++++++++++++++++++++++++++++++
injdll : is defined in <detours/samples/injdll.cpp>
+++++++++++++++++++++++++++++++++++++++++++++++++++

e.g. usage : injdll -p:<pid> -d:traceapi.dll

- this injects a DLL into an already-executing process (the PID of which is
  specified above as a command-line arg to this program)
- this opens the specified process
- then calls DetourContinueProcessWithDll{A,W}, which is defined in
  <detours/src/creatwith.cpp>
  - and which does :
    calls InjectLibraryOld (also in <detours/src/creatwith.cpp>) which in
    turn calls CreateRemoteThread to inject the provided dll (from the
    command line, e.g.
"traceapi.dll" from above) ==> This (injdll) is the functionality that requires support for CreateRemoteThread and so is NOT supported for Win '9x except if the original process (which we are attempting to inject a DLL into was created in DEBUG mode, which is highly unlikely) - basically what this does is : (1) open specified process (2) allocates memory in that process with read/write permissions (3) writes ThreadFunc function and argument to it in that memory (4) then calls CreateRemoteThread -- passing function address and arg address where just wrote ThreadFunc to that (remote) process's address space as well as the name of the DLL to inject (5) then waits for that created thread to complete executing then closes the handle to it then returns ==> won't work for Win '9x since CreateRemoteThread(...) not supported on that (those) platforms -------- I. Intro -------- Detours : library for intercepting arbitrary win32 binary functions (read: win32 api functions) on x86 machines - interception code applied dynamically at run-time - replaces first few instructions of target function (which we'll call OVERWRITTEN) with an unconditional jump to the user-provided detour function - then the trampoline function consists of: OVERWRITTEN then an unconditional jump to the remainder of the target function - the detour function can then invoke the target function as a subroutine via invoking the trampoline function - detours are inserted at execution time -- code of target function modified in memory - detours guaranteed to work "regardless of the mehod used by the app or system code to locate the target function" -- think they really mean : "regardless of whether the function is (in a library) that is statically linked, dynamically linked or delay loaded..." 
Detours also provides functions :
- to edit the IAT of any binary
- to inject a DLL into a new or an existing process
  -- then the injected DLL can detour any win32 function "whether in the
     application or system libraries"
- to attach arbitrary data segments to existing binaries

------------------
II. Implementation
------------------

===================================
A. Interception of binary functions
===================================
- at runtime, detours replaces first few instructions of target function
  with an unconditional jump to user-provided detour function
- when execution reaches the target function, control jumps directly to the
  user-supplied detour function
- detour function does whatever
- then detour function may return control to the source function (the
  original caller) OR may invoke the trampoline function
  - which invokes the target function without interception
  - when the target function completes, it returns control to the detour
    function
  - the detour function does whatever, then returns control to the source
    function

++++
How?
++++
The detours library intercepts target functions by rewriting their
in-process binary image
- rewrites target function
- rewrites matching trampoline function

The tramp function can be allocated dynamically or statically.
- if statically, the trampoline always invokes the target function w/o the
  detour
  - before a detour is inserted, static trampoline contains a single jump to
    the target fxn
  - after a detour is inserted, trampoline contains OVERWRITTEN and a jmp to
    the remainder of the target function

-----------------------
To detour a function...
-----------------------
- alloc mem for dynamic tramp fxn (if no static tramp provided)
- enable write access to both the target and the tramp
- copy instructions from target to tramp until at least 5 bytes have been
  copied
- then add a jmp instruction at end of tramp to the first non-copied
  instruction of target fxn
- write the unconditional jmp to the detour fxn over the start of the target
  fxn (the replacement described above)
- restore original page permissions on both target and tramp
- flush CPU instruction cache

==================================
B. Payloads and DLL Import Editing
==================================
Attach arbitrary data segments to a win32 binary ("payloads")
Edit DLL import address tables

Detours creates new section : .detours
- between export table (the RVA of which is specified in the 0'th entry of
  the DataDirectory, which itself is in the Optional Headers, which are part
  of the IMAGE_NT_HEADERS of a PE file) and the debug symbols
  - debug symbols MUST reside last in a win32 binary
- the .detours section contains a detours header record and a copy of the
  original PE header (PE header == IMAGE_NT_HEADERS)
- if modifying the IAT, detours creates new IAT, appends it to the copied PE
  header, then makes the original PE header point to the new IAT
  -- "makes the original PE header point to the new IAT" == change the RVA
     stored in the second entry in the Data Directory (index 1,
     IMAGE_DIRECTORY_ENTRY_IMPORT, i.e. the import table that these notes
     call the IAT) to contain the RVA of the new IAT in our .detours section
- any data segments to be added are then written at the end of the .detours
  section, then the debug symbols are appended
- reversal easy : restore original PE header from .detours section, then
  remove .detours section

---------------------
Why create a new IAT?
---------------------
- preserves original IAT
- new IAT can contain renamed import DLLs and functions or entirely new DLLs
  and functions
  --> can make YOUR DLL be the first one loaded when an app runs
- question : so this is done at run-time? when, precisely? after the app has
  been loaded? or before?
  (before makes more sense, but how to modify the app's image in mem before
  that app has been loaded?)

Detours also provides routines for enumerating the binary files mapped into
an address space; can also locate payloads w/in those mapped binaries
- each payload identified by a 128-bit globally unique identifier (guid)

OK. I think this actually modifies the binaries on disk ... not just in mem
- which makes more sense

----------------------------------------------
Injecting a DLL into a new or existing process
----------------------------------------------
- inject : detours writes LoadLibrary(...) call into the target process with
  VirtualAllocEx and WriteProcessMemory, then invokes the call with
  CreateRemoteThread
  ==> believe this is NOT supported on Win '9x : CONFIRM
  ==> and figure out : can detours itself be used on a Win '9x machine?

------------------
III. Using Detours
------------------
User code must include the detours.h header file and link with detours.lib

(1) to intercept a function with a static trampoline
    - create the trampoline with the DETOUR_TRAMPOLINE macro
        DETOUR_TRAMPOLINE( trampoline_prototype, target_name )
      e.g.
        DETOUR_TRAMPOLINE( VOID WINAPI SleepTrampoline( DWORD ), Sleep );
      fyi, actual Sleep function signature (from windows.h, is in
      kernel32.dll):
        VOID Sleep( DWORD dwMilliseconds );

      "Note that for proper interception: the prototype, target, trampoline,
      and detour functions must all have exactly the same call signature
      including number of arguments and calling convention."

    - intercepting the target function :
      invoke DetourFunctionWithTrampoline with two args :
      (1) trampoline, (2) pointer to the detour function
      note that the target function is already encoded in the trampoline and
      so is not needed as an arg
      e.g.
        DetourFunctionWithTrampoline( (PBYTE)SleepTrampoline,
                                      (PBYTE)SleepDetour );
      where
        VOID WINAPI SleepDetour( DWORD dw )
        {
            return SleepTrampoline( dw );
        }

(2) to intercept a function with a dynamic trampoline
    - call DetourFunction with two arguments : (1) a pointer to the target
      function and (2) a pointer to the detour function
    - e.g.
        #include <windows.h>
        #include <detours.h>

        VOID (*DynamicTrampoline)(VOID) = NULL;

        VOID DynamicDetour( VOID )
        {
            return DynamicTrampoline();
        }

        void main( void )
        {
            VOID (*DynamicTarget)(VOID) = TargetFunction;
            DynamicTrampoline = (FUNCPTR)DetourFunction( (PBYTE)DynamicTarget,
                                                         (PBYTE)DynamicDetour );
            ...
            // below function can be used w/either static or dynamic tramps
            DetourRemoveTrampoline( DynamicTrampoline );
        }
    - DetourFunction : allocates a new trampoline and inserts the
      appropriate interception code into the target function

Static tramps are very easy to use when the target function is available as
a link symbol.

DetourFindFunction : can find the pointer to a function when that function
is exported from a known DLL or if debugging symbols are available for the
target function's binary.
- takes two args : the name of the binary and the name of the function
- first tries via LoadLibrary(...) and GetProcAddress(...)
- then uses ImageHlp library to search available debugging symbols
- the fxn pointer returned by DetourFindFunction can be given to
  DetourFunction to create a dynamic trampoline

Programmer's responsibility to make sure that no other threads are exec'ing
in the addy space while a detour is inserted or removed
- one approach : call functions in the Detours library from a DLL main
  routine...

--------------
IV. Evaluation
--------------
Other approaches :

(1) call replacement in app source code
    - calls to target fxn in app replaced with calls to detour fxn
    - requires access to source code

(2) call replacement in app binary
    - modify app binary to replace calls to target fxn w/calls to detour fxn
    - requires being able to identify all applicable call sites
      -- requires symbolic info which may not be present in general binaries
      -- also would miss dynamically-linked calls to the target fxn (i.e.
         those which work by loading the dll then calling
         GetProcAddress(...)) as well as calls which use late-demand binding?

(3) DLL redirection
    - modify DLL import entries in binary to point to a detour DLL
    - fails to intercept DLL-internal calls and calls on pointers obtained
      from GetProcAddress(...)

(4) Breakpoint trapping
    - insert debugging breakpoint into the target function
    - have debugging exception handler invoke the detour function
    - but debugging exceptions suspend all application threads
    - requires second OS process to catch the debug exception
    ==> heavy performance penalty

============
References :
============
http://research.microsoft.com/sn/detours/
http://research.microsoft.com/~galenh/Publications/HuntUsenixNt99.pdf
http://www.sisecure.com/pdf/cs-2003-01.pdf

+++++++++++++++++++
Detours Usage Notes
+++++++++++++++++++
(1) withdll
(2) setdll
(3)