I say most because GetProcAddress() is itself in Kernel32.dll, so how do we get the address of GetProcAddress() if we don't have access to GetProcAddress() yet? It's a real chicken-and-egg problem, but I solved it after many hours of research into semi-documented Windows internals and much trial and error.
In this post I want to go over getting access to Kernel32.dll, parsing the PE format to find the export address of a symbol (e.g., GetProcAddress), and using X Macros to define and load functions we need with concise code. The end result of this project was an exe that displays a message centered in a window. The size of the exe was 3,584 bytes, with every byte of machine language coming from this original source code.
This is for x64 only, but it could be extended to 32-bit platforms.
Getting Access to Kernel32.dll
Since we'll be compiling without any libraries, we need to specify the entry point to the executable. I used MainStartup, which you'll see below looks fairly normal (except for the first function call):
```cpp
void MainStartup(void)
{
    DynamicLink();

    HINSTANCE Instance = GetModuleHandleA(0);

    WNDCLASSA windowClass = {};
    windowClass.style = CS_HREDRAW | CS_VREDRAW;
    windowClass.lpfnWndProc = WndProc;
    windowClass.hInstance = Instance;
    windowClass.lpszClassName = "NoLibrariesWindowClass";
    windowClass.hbrBackground = (HBRUSH)GetStockObject(WHITE_BRUSH);

    if (RegisterClassA(&windowClass))
    {
        HWND WindowHandle = CreateWindowExA(
            0, "NoLibrariesWindowClass", "Greetings",
            WS_OVERLAPPEDWINDOW | WS_VISIBLE,
            CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
            0, 0, Instance, 0
        );

        if (WindowHandle)
        {
            MSG Message;
            while (GetMessageA(&Message, NULL, 0, 0))
            {
                TranslateMessage(&Message);
                DispatchMessageA(&Message);
            }
        }
    }

    return;
}
```
DynamicLink() is where the magic kicks off. Here's the first half:
```cpp
void DynamicLink(void)
{
    HMODULE Kernel32 = GetKernel32Module();
    GetProcAddress = GetGetProcAddress(Kernel32);
    LoadLibraryA = (LoadLibraryA_t*)GetProcAddress(Kernel32, "LoadLibraryA");

    HMODULE User32 = LoadLibraryA("user32.dll");
    HMODULE Gdi32 = LoadLibraryA("gdi32.dll");
```
Luckily, Windows loads Kernel32.dll into every application, so we don't need to call LoadLibrary() on it, but we still need to determine where it's located in memory. Every thread in our process has a Thread Environment Block (TEB), which has a pointer to the Process Environment Block (PEB), which has a pointer to loader data, which has a pointer to the head of a linked list of modules (i.e., the exe and dlls) currently loaded into our process.
Access to the TEB is obtained by reading a value through the GS CPU register. Microsoft's x64 compiler doesn't allow inline assembly, so we have to use the intrinsic __readgsqword(). From there I defined some simple structs with padding offsets to get to just the values I needed. I originally used casting and pointer arithmetic, but this method ended up looking cleaner. Here's an example struct I used for the TEB:
```cpp
struct win32_teb
{
    uint8 Padding1[0x60];
    win32_peb *PEB;
};
```
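To make the padding trick concrete, here is a minimal sketch that checks the two offsets at compile time and shows the equivalent cast-and-add read. The names mock_teb, mock_peb, and ReadPebPointer are hypothetical, and the pointers are stored as uint64_t (an assumption made so the layout is the same on any host compiler, not just x64 Windows).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirrors of the padded structs; pointers are held as uint64_t
// so the layout does not depend on the host's pointer size.
struct mock_peb
{
    uint8_t  Padding1[0x18];
    uint64_t LoaderData;   // loader data pointer at PEB + 0x18
};

struct mock_teb
{
    uint8_t  Padding1[0x60];
    uint64_t PEB;          // PEB pointer at TEB + 0x60
};

// If a field is ever added in the wrong place, the build fails here.
static_assert(offsetof(mock_teb, PEB) == 0x60, "PEB pointer must sit at TEB+0x60");
static_assert(offsetof(mock_peb, LoaderData) == 0x18, "loader data must sit at PEB+0x18");

// The same read written with casting and pointer arithmetic, for comparison.
inline uint64_t ReadPebPointer(const void *Teb)
{
    assert(Teb != nullptr);
    uint64_t Value;
    std::memcpy(&Value, static_cast<const uint8_t*>(Teb) + 0x60, sizeof(Value));
    return Value;
}
```

Both forms read the same eight bytes; the struct version just moves the magic number into a declaration where a static_assert can guard it.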
In the code below, I follow the linked list until I find the kernel32.dll module by name, using a simple CompareMemory() function I wrote (the module name is stored as a wide-character Unicode string).
```cpp
HMODULE GetKernel32Module()
{
    wchar_t Kernel32Name[] = L"kernel32.dll";
    win32_teb* TEB = (win32_teb*)__readgsqword(0x30);
    win32_ldr_data_entry* LoaderDataEntry = TEB->PEB->LoaderData->LoaderDataEntry;

    while (LoaderDataEntry->DllBase)
    {
        if (CompareMemory(LoaderDataEntry->DllNameBuffer, Kernel32Name,
            min(LoaderDataEntry->DllNameLength, sizeof(Kernel32Name))) == 0)
        {
            return (HMODULE)LoaderDataEntry->BaseAddress;
        }
        LoaderDataEntry = (win32_ldr_data_entry*)(LoaderDataEntry->LinkedList.Flink);
    }

    return NULL;
}
```
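The list walk can be tried out away from Windows with a mock list. The sketch below is not the code above: it uses hypothetical types (mock_list_entry, mock_module), stops when the walk returns to the list head rather than testing a field for zero, and compares names case-insensitively. That last choice rests on an assumption worth verifying on your target systems, namely that the loader may report the name in a different case (for example KERNEL32.DLL) than the literal you search for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for LIST_ENTRY.
struct mock_list_entry
{
    mock_list_entry *Flink;
    mock_list_entry *Blink;
};

// Hypothetical stand-in for a loader entry; Links must stay the first member
// so a list pointer can be cast back to the entry.
struct mock_module
{
    mock_list_entry Links;
    void           *Base;
    const char16_t *Name;       // not necessarily null terminated
    uint16_t        NameBytes;  // length in bytes, as the loader stores it
};

static char16_t LowerAscii(char16_t c)
{
    return (c >= u'A' && c <= u'Z') ? char16_t(c + 32) : c;
}

static bool NameEquals(const char16_t *a, size_t aChars, const char16_t *b, size_t bChars)
{
    if (aChars != bChars) return false;
    for (size_t i = 0; i < aChars; i++)
    {
        if (LowerAscii(a[i]) != LowerAscii(b[i])) return false;
    }
    return true;
}

// Walk the circular list; the head is a sentinel, not a module.
void *FindModule(mock_list_entry *Head, const char16_t *Wanted, size_t WantedChars)
{
    assert(Head != nullptr);
    for (mock_list_entry *It = Head->Flink; It != Head; It = It->Flink)
    {
        mock_module *Module = reinterpret_cast<mock_module*>(It);
        if (NameEquals(Module->Name, Module->NameBytes / sizeof(char16_t), Wanted, WantedChars))
        {
            return Module->Base;
        }
    }
    return nullptr;
}
```

Comparing lengths first also avoids a prefix match, which a min()-bounded memory compare would accept.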
Parsing the Kernel32 Memory Layout (PE Image)
With the address of Kernel32.dll in hand, it's now time to find the location of the GetProcAddress() symbol. To do this, I referred to the Portable Executable (PE) specification to get the layout of the various tables and fields. There are several levels of indirection, with most fields holding addresses relative to the base of the image (i.e., the address we obtained above).
In a nutshell, the MS-DOS header provides the offset to the PE header, which provides the offset to the export table, which provides the offset to the name pointer table. I defined the following macro to make this easier to work with:
```cpp
#define PE_GET_OFFSET(module, offset) ((uint8*)(module) + offset)
```
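The first hop in that chain, from the MS-DOS header to the PE header, can be sanity checked before trusting anything further in. Here is a minimal sketch that works on a plain byte buffer instead of a loaded module; FindPEHeaderOffset and MakeMockImage are hypothetical helpers, and the field is read in host byte order, which I'm assuming is little-endian as on x86/x64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a 32-bit field without assuming the buffer is aligned.
static uint32_t ReadU32(const uint8_t *Base, size_t Offset)
{
    uint32_t Value;
    std::memcpy(&Value, Base + Offset, sizeof(Value));
    return Value;
}

// Returns the offset of the PE header, or 0 if the buffer doesn't look like a PE image.
uint32_t FindPEHeaderOffset(const uint8_t *Image, size_t ImageSize)
{
    assert(Image != nullptr);
    if (ImageSize < 0x40) return 0;                      // too small for the MS-DOS header
    if (Image[0] != 'M' || Image[1] != 'Z') return 0;    // MS-DOS signature
    uint32_t PEOffset = ReadU32(Image, 0x3C);            // same field as PEOffset in win32_msdos
    if (PEOffset > ImageSize - 4) return 0;              // signature would fall outside the buffer
    if (std::memcmp(Image + PEOffset, "PE\0\0", 4) != 0) return 0;
    return PEOffset;
}

// Build a tiny fake image that holds just the two signatures and the offset field.
std::vector<uint8_t> MakeMockImage(uint32_t PEOffset)
{
    assert(PEOffset >= 0x40);
    std::vector<uint8_t> Image(PEOffset + 4, 0);
    Image[0] = 'M';
    Image[1] = 'Z';
    std::memcpy(&Image[0x3C], &PEOffset, sizeof(PEOffset));
    std::memcpy(&Image[PEOffset], "PE\0\0", 4);
    return Image;
}
```

A module mapped by the loader has already passed these checks, so the post's code can skip them; they matter more when parsing a file from disk.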
The name pointer table is an array of offsets to strings, one for each symbol exported by name. This is where we can look for "GetProcAddress". The strings are in lexical order, so you can use a binary search to quickly find its index in the table. Once we have the index of this string, we use the same index in the ordinal table, which in turn gives us the index into the export address table. The value obtained from this table is again an offset, so we add this value to the base address of Kernel32.dll and finally, we have the address of GetProcAddress()!
```cpp
GetProcAddress_t* GetGetProcAddress(HMODULE Kernel32)
{
    // Module is now in the EXE Format
    win32_msdos *MSDOSHeader = (win32_msdos*)PE_GET_OFFSET(Kernel32, 0);
    win32_pe *PEHeader = (win32_pe*)PE_GET_OFFSET(Kernel32, MSDOSHeader->PEOffset);
    win32_pe_export_table *ExportTable = (win32_pe_export_table*)PE_GET_OFFSET(Kernel32, PEHeader->ExportTable.VirtualAddress);
    uint32* NamePointerTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->NamePointerRVA);

    // binary search for GetProcAddress
    int Low = 0;
    int High = ExportTable->NumberofNamePointers - 1;
    int Index;
    char *ProcName;
    int CompareResult = 0;

    do
    {
        if (CompareResult > 0)
        {
            Low = Index;
        }
        else if (CompareResult < 0)
        {
            High = Index;
        }
        Index = (High + Low) / 2;
        ProcName = (char*)PE_GET_OFFSET(Kernel32, NamePointerTable[Index]);
    } while ((CompareResult = CompareStrings("GetProcAddress", ProcName)) != 0);

    // the same Index is used for the ordinal value
    uint16* OrdinalTable = (uint16*)PE_GET_OFFSET(Kernel32, ExportTable->OrdinalTableRVA);
    uint16 GetProcAddressOrdinal = OrdinalTable[Index];

    uint32* ExportAddressTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->ExportAddressTableRVA);
    uint32 GetProcAddressRVA = ExportAddressTable[GetProcAddressOrdinal];

    return (GetProcAddress_t*)PE_GET_OFFSET(Kernel32, GetProcAddressRVA);
}
```
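The name, ordinal, and address tables are easier to reason about on toy data. The following sketch runs the same three-step lookup over a hypothetical mock_exports struct whose tables are ordinary arrays rather than RVAs into an image. It is a variation, not the function above: it uses a half-open binary search that returns 0 when the name isn't exported, whereas the do/while loop above assumes the name is always present. It also leans on strcmp from the standard library for brevity, which the no-libraries program can't do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical flattened view of the three export tables.
struct mock_exports
{
    const char *const *Names;      // sorted ascending by byte value
    const uint16_t    *Ordinals;   // parallel to Names: index into Addresses
    const uint32_t    *Addresses;  // offsets relative to the image base
    uint32_t           NameCount;
};

// Returns the offset stored for Name, or 0 when Name isn't in the name table.
uint32_t LookupExportRVA(const mock_exports &Exports, const char *Name)
{
    assert(Name != nullptr);
    uint32_t Low = 0;
    uint32_t High = Exports.NameCount;   // search the half-open range [Low, High)

    while (Low < High)
    {
        uint32_t Mid = Low + (High - Low) / 2;
        int Compare = std::strcmp(Name, Exports.Names[Mid]);
        if (Compare == 0)
        {
            // Same index into the ordinal table, then the ordinal indexes the address table.
            return Exports.Addresses[Exports.Ordinals[Mid]];
        }
        if (Compare > 0)
        {
            Low = Mid + 1;    // target sorts after Mid
        }
        else
        {
            High = Mid;       // target sorts before Mid
        }
    }
    return 0;
}
```

Moving Low past Mid on every miss is what guarantees the range shrinks, so the loop ends even when the name is absent or happens to be the last entry.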
Loading Functions with X Macros
Now that we have GetProcAddress(), we can use that along with the Kernel32.dll address to obtain the address of LoadLibraryA(). From there we can load anything we need. To make the loading code less verbose, I used the concept of X Macros to define a macro variable (WIN32_DYNAMIC_PROCS) that contains all of the functions I want to load, each wrapped in a macro function, WPROC, that is not yet defined at that point. Here's a snippet:
```cpp
#define WIN32_DYNAMIC_PROCS \
    WPROC(Kernel32, "GetModuleHandleA", HMODULE WINAPI, GetModuleHandleA_, (LPCSTR lpModuleName)) \
    WPROC(User32, "MessageBoxA", int WINAPI, MessageBoxA_, (HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)) \
    WPROC(User32, "RegisterClassA", ATOM WINAPI, RegisterClassA_, (const WNDCLASSA *lpWndClass)) \
    WPROC(User32, "DefWindowProcA", LRESULT WINAPI, DefWindowProcA_, (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)) \
    WPROC(User32, "CreateWindowExA", HWND WINAPI, CreateWindowExA_, (DWORD dwExStyle, LPCSTR lpClassName, LPCSTR lpWindowName, DWORD dwStyle, int x, int y, int nWidth, int nHeight, HWND hWndParent, HMENU hMenu, HINSTANCE hInstance, LPVOID lpParam)) \
    WPROC(User32, "GetMessageA", BOOL WINAPI, GetMessageA_, (LPMSG lpMsg, HWND hWnd, UINT wMsgFilterMin, UINT wMsgFilterMax)) \
    WPROC(User32, "TranslateMessage", BOOL WINAPI, TranslateMessage_, (const MSG *lpMsg)) \
    WPROC(User32, "DispatchMessageA", LRESULT WINAPI, DispatchMessageA_, (const MSG *lpmsg))
```
Later on, I defined the WPROC macro to set up the typedef and global static variable:
```cpp
#define WPROC(Module, DllName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;

WIN32_DYNAMIC_PROCS

#undef WPROC
```
Finally, I redefine WPROC in the second half of my DynamicLink() function to get the address of the proc in question:
```cpp
#define WPROC(Module, DllName, ReturnType, Name, Params) \
    Name = (Name##t*)GetProcAddress(Module, DllName);

    WIN32_DYNAMIC_PROCS

#undef WPROC
```
I appended underscores to all of the global function pointers so that they don't conflict with the declarations in the Windows header, and then used simple #define macros to allow the use of their original names. One could get rid of this by declaring only what is needed from Windows.h and not including it at all.
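Since the two-pass X Macro pattern is the heart of the loading code, here is a minimal sketch of it that runs without Windows. Everything in it is hypothetical: AddOne and Twice stand in for DLL exports, MockGetProc stands in for GetProcAddress(), and DEMO_PROCS/DPROC stand in for WIN32_DYNAMIC_PROCS/WPROC. It also checks each pointer after loading, which the program in this post does not do.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical functions standing in for DLL exports.
static int AddOne(int x) { return x + 1; }
static int Twice(int x)  { return x * 2; }

// Generic function pointer, playing the role FARPROC plays on Windows.
typedef void (*generic_proc)(void);

// Hypothetical resolver standing in for GetProcAddress().
static generic_proc MockGetProc(const char *Name)
{
    assert(Name != nullptr);
    if (std::strcmp(Name, "AddOne") == 0) return reinterpret_cast<generic_proc>(&AddOne);
    if (std::strcmp(Name, "Twice") == 0)  return reinterpret_cast<generic_proc>(&Twice);
    return nullptr;
}

// The single list of everything to load; DPROC is deliberately undefined here.
#define DEMO_PROCS \
    DPROC("AddOne", int, AddOne_, (int x)) \
    DPROC("Twice",  int, Twice_,  (int x))

// Pass 1: expand each entry into a function typedef and a global pointer.
#define DPROC(ExportName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;
DEMO_PROCS
#undef DPROC

// Pass 2: expand each entry into an assignment from the resolver.
bool LoadDemoProcs()
{
    bool Ok = true;
#define DPROC(ExportName, ReturnType, Name, Params) \
    Name = reinterpret_cast<Name##t*>(MockGetProc(ExportName)); \
    Ok = Ok && (Name != nullptr);
    DEMO_PROCS
#undef DPROC
    return Ok;
}
```

After LoadDemoProcs() succeeds, AddOne_(41) calls through the loaded pointer. Adding a function means adding one line to DEMO_PROCS, and both passes pick it up.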
Here's the complete program:
```cpp
// example executable without import libraries
// compile with: cl /O2 /GS- main.cpp /link /NOLOGO /NODEFAULTLIB /SUBSYSTEM:WINDOWS /MACHINE:X64 /ENTRY:"MainStartup"

#include <Windows.h>
#include <stdint.h>
#include <intrin.h>

typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;

#define GetProcAddress GetProcAddress_
#define LoadLibraryA LoadLibraryA_
#define MessageBoxA MessageBoxA_
#define RegisterClassA RegisterClassA_
#define GetModuleHandleA GetModuleHandleA_
#define DefWindowProcA DefWindowProcA_
#define CreateWindowExA CreateWindowExA_
#define GetMessageA GetMessageA_
#define TranslateMessage TranslateMessage_
#define DispatchMessageA DispatchMessageA_
#define PostQuitMessage PostQuitMessage_
#define BeginPaint BeginPaint_
#define EndPaint EndPaint_
#define GetStockObject GetStockObject_
#define SelectObject SelectObject_
#define TextOutA TextOutA_
#define GetTextMetricsA GetTextMetricsA_
#define GetClientRect GetClientRect_

#define PE_GET_OFFSET(module, offset) ((uint8*)(module) + offset)

struct win32_ldr_data_entry
{
    LIST_ENTRY LinkedList;
    LIST_ENTRY UnusedList;
    PVOID BaseAddress;
    PVOID Reserved2[1];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    USHORT DllNameLength;
    USHORT DllNameMaximumLength;
    PWSTR DllNameBuffer;
};

struct win32_ldr_data
{
    uint8 Padding1[0x20];
    win32_ldr_data_entry *LoaderDataEntry;
};

struct win32_peb
{
    uint8 Padding1[0x18];
    win32_ldr_data *LoaderData;
};

struct win32_teb
{
    uint8 Padding1[0x60];
    win32_peb *PEB;
};

struct win32_msdos
{
    uint8 Padding1[0x3C];
    uint32 PEOffset;
};

struct win32_pe_image_data
{
    uint32 VirtualAddress;
    uint32 Size;
};

struct win32_pe
{
    // COFF
    uint8 Signature[4];
    uint16 Machine;
    uint16 NumberOfSections;
    uint32 TimeDateStamp;
    uint32 PointerToSymbolTable;
    uint32 NumberOfSymbols;
    uint16 SizeOfOptionalHeader;
    uint16 Characteristics;

    // Assuming PE32+ Optional Header since this is 64bit only
    // standard fields
    uint16 Magic;
    uint8 MajorLinkerVersion;
    uint8 MinorLinkerVersion;
    uint32 SizeOfCode;
    uint32 SizeOfInitializedData;
    uint32 SizeOfUninitializedData;
    uint32 AddressOfEntryPoint;
    uint32 BaseOfCode;

    // windows specific fields
    uint64 ImageBase;
    uint32 SectionAlignment;
    uint32 FileAlignment;
    uint16 MajorOperatingSystemVersion;
    uint16 MinorOperatingSystemVersion;
    uint16 MajorImageVersion;
    uint16 MinorImageVersion;
    uint16 MajorSubsystemVersion;
    uint16 MinorSubsystemVersion;
    uint32 Win32VersionValue;
    uint32 SizeOfImage;
    uint32 SizeOfHeaders;
    uint32 CheckSum;
    uint16 Subsystem;
    uint16 DllCharacteristics;
    uint64 SizeOfStackReserve;
    uint64 SizeOfStackCommit;
    uint64 SizeOfHeapReserve;
    uint64 SizeOfHeapCommit;
    uint32 LoaderFlags;
    uint32 NumberOfRvaAndSizes;

    // data directories
    win32_pe_image_data ExportTable;
    win32_pe_image_data ImportTable;
    win32_pe_image_data ResourceTable;
    win32_pe_image_data ExceptionTable;
    win32_pe_image_data CertificateTable;
    win32_pe_image_data BaseRelocationTable;
    win32_pe_image_data Debug;
    win32_pe_image_data Architecture;
    win32_pe_image_data GlobalPtr;
    win32_pe_image_data TLSTable;
    win32_pe_image_data LoadConfigTable;
    win32_pe_image_data BoundImport;
    win32_pe_image_data IAT;
    win32_pe_image_data DelayImportDescriptor;
    win32_pe_image_data CLRRuntimeHeader;
    win32_pe_image_data ReservedTable;
};

struct win32_pe_export_table
{
    uint32 ExportFlags;
    uint32 TimeDateStamp;
    uint16 MajorVersion;
    uint16 MinorVersion;
    uint32 NameRVA;
    uint32 OrdinalBase;
    uint32 AddressTableEntries;
    uint32 NumberofNamePointers;
    uint32 ExportAddressTableRVA;
    uint32 NamePointerRVA;
    uint32 OrdinalTableRVA;
};

int CompareMemory(void *p1, void *p2, size_t n)
{
    uint8* b1 = (uint8*)p1;
    uint8* b2 = (uint8*)p2;

    while (n && *b1 == *b2)
    {
        n--;
        b1++;
        b2++;
    }

    return n ? *b1 - *b2 : 0;
}

int CompareStrings(char *s1, char *s2)
{
    while (*s1 && *s2 && *s1 == *s2)
    {
        s1++;
        s2++;
    }

    return (*s1 == *s2) ? 0 : *s1 - *s2;
}

HMODULE GetKernel32Module()
{
    wchar_t Kernel32Name[] = L"kernel32.dll";
    win32_teb* TEB = (win32_teb*)__readgsqword(0x30);
    win32_ldr_data_entry* LoaderDataEntry = TEB->PEB->LoaderData->LoaderDataEntry;

    while (LoaderDataEntry->DllBase)
    {
        if (CompareMemory(LoaderDataEntry->DllNameBuffer, Kernel32Name,
            min(LoaderDataEntry->DllNameLength, sizeof(Kernel32Name))) == 0)
        {
            return (HMODULE)LoaderDataEntry->BaseAddress;
        }
        LoaderDataEntry = (win32_ldr_data_entry*)(LoaderDataEntry->LinkedList.Flink);
    }

    return NULL;
}

typedef FARPROC WINAPI GetProcAddress_t(HMODULE Module, LPCSTR ProcName);
static GetProcAddress_t *GetProcAddress_;

typedef HMODULE WINAPI LoadLibraryA_t(LPCSTR FileName);
static LoadLibraryA_t *LoadLibraryA_;

GetProcAddress_t* GetGetProcAddress(HMODULE Kernel32)
{
    // Module is now in the EXE Format
    win32_msdos *MSDOSHeader = (win32_msdos*)PE_GET_OFFSET(Kernel32, 0);
    win32_pe *PEHeader = (win32_pe*)PE_GET_OFFSET(Kernel32, MSDOSHeader->PEOffset);
    win32_pe_export_table *ExportTable = (win32_pe_export_table*)PE_GET_OFFSET(Kernel32, PEHeader->ExportTable.VirtualAddress);
    uint32* NamePointerTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->NamePointerRVA);

    // binary search for GetProcAddress
    int Low = 0;
    int High = ExportTable->NumberofNamePointers - 1;
    int Index;
    char *ProcName;
    int CompareResult = 0;

    do
    {
        if (CompareResult > 0)
        {
            Low = Index;
        }
        else if (CompareResult < 0)
        {
            High = Index;
        }
        Index = (High + Low) / 2;
        ProcName = (char*)PE_GET_OFFSET(Kernel32, NamePointerTable[Index]);
    } while ((CompareResult = CompareStrings("GetProcAddress", ProcName)) != 0);

    // the same Index is used for the ordinal value
    uint16* OrdinalTable = (uint16*)PE_GET_OFFSET(Kernel32, ExportTable->OrdinalTableRVA);
    uint16 GetProcAddressOrdinal = OrdinalTable[Index];

    uint32* ExportAddressTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->ExportAddressTableRVA);

    // The PE Documentation explicitly says you must subtract the OrdinalBase from the Ordinal to get the true
    // index into the address table, however, I found through testing that this is not the case.
    // This appears to confirm a problem with the documentation: http://stackoverflow.com/questions/5653316/pe-export-directory-tables-ordinalbase-field-ignored
    uint32 GetProcAddressRVA = ExportAddressTable[GetProcAddressOrdinal];

    return (GetProcAddress_t*)PE_GET_OFFSET(Kernel32, GetProcAddressRVA);
}

#define WIN32_DYNAMIC_PROCS \
    WPROC(Kernel32, "GetModuleHandleA", HMODULE WINAPI, GetModuleHandleA_, (LPCSTR lpModuleName)) \
    WPROC(User32, "MessageBoxA", int WINAPI, MessageBoxA_, (HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)) \
    WPROC(User32, "RegisterClassA", ATOM WINAPI, RegisterClassA_, (const WNDCLASSA *lpWndClass)) \
    WPROC(User32, "DefWindowProcA", LRESULT WINAPI, DefWindowProcA_, (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)) \
    WPROC(User32, "CreateWindowExA", HWND WINAPI, CreateWindowExA_, (DWORD dwExStyle, LPCSTR lpClassName, LPCSTR lpWindowName, DWORD dwStyle, int x, int y, int nWidth, int nHeight, HWND hWndParent, HMENU hMenu, HINSTANCE hInstance, LPVOID lpParam)) \
    WPROC(User32, "GetMessageA", BOOL WINAPI, GetMessageA_, (LPMSG lpMsg, HWND hWnd, UINT wMsgFilterMin, UINT wMsgFilterMax)) \
    WPROC(User32, "TranslateMessage", BOOL WINAPI, TranslateMessage_, (const MSG *lpMsg)) \
    WPROC(User32, "DispatchMessageA", LRESULT WINAPI, DispatchMessageA_, (const MSG *lpmsg)) \
    WPROC(User32, "PostQuitMessage", VOID WINAPI, PostQuitMessage_, (int nExitCode)) \
    WPROC(User32, "BeginPaint", HDC, BeginPaint_, (HWND hwnd, LPPAINTSTRUCT lpPaint)) \
    WPROC(User32, "EndPaint", BOOL, EndPaint_, (HWND hWnd, const PAINTSTRUCT *lpPaint)) \
    WPROC(User32, "GetClientRect", BOOL WINAPI, GetClientRect_, (HWND hWnd, LPRECT lpRect)) \
    WPROC(Gdi32, "GetStockObject", HGDIOBJ, GetStockObject_, (int fnObject)) \
    WPROC(Gdi32, "SelectObject", HGDIOBJ, SelectObject_, (HDC hdc, HGDIOBJ hgdiobj)) \
    WPROC(Gdi32, "TextOutA", BOOL, TextOutA_, (HDC hdc, int nXStart, int nYStart, LPCSTR lpString, int cchString)) \
    WPROC(Gdi32, "GetTextMetricsA", BOOL, GetTextMetricsA_, (HDC hdc, LPTEXTMETRIC lptm))

#define WPROC(Module, DllName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;

WIN32_DYNAMIC_PROCS

#undef WPROC

void DynamicLink(void)
{
    HMODULE Kernel32 = GetKernel32Module();
    GetProcAddress = GetGetProcAddress(Kernel32);
    LoadLibraryA = (LoadLibraryA_t*)GetProcAddress(Kernel32, "LoadLibraryA");

    HMODULE User32 = LoadLibraryA("user32.dll");
    HMODULE Gdi32 = LoadLibraryA("gdi32.dll");

#define WPROC(Module, DllName, ReturnType, Name, Params) \
    Name = (Name##t*)GetProcAddress(Module, DllName);

    WIN32_DYNAMIC_PROCS

#undef WPROC
}

LRESULT CALLBACK WndProc(HWND Window, UINT Message, WPARAM WParam, LPARAM LParam)
{
    LRESULT result = 0;

    switch (Message)
    {
        case WM_PAINT:
        {
            char Text[] = "No Libraries!";
            PAINTSTRUCT Paint;
            TEXTMETRIC Metrics;
            RECT Rect;

            HDC DeviceContext = BeginPaint(Window, &Paint);
            SelectObject(DeviceContext, GetStockObject(ANSI_FIXED_FONT));
            GetTextMetricsA(DeviceContext, &Metrics);
            GetClientRect(Window, &Rect);

            // Center text
            int TextLen = sizeof(Text) - 1;
            int TextWidth = TextLen * Metrics.tmAveCharWidth;
            int TextHeight = Metrics.tmHeight;
            int X = max((Rect.right - Rect.left - TextWidth) / 2, 0);
            int Y = max((Rect.bottom - Rect.top - TextHeight) / 2, 0);

            TextOutA(DeviceContext, X, Y, Text, TextLen);
            EndPaint(Window, &Paint);
        } break;

        case WM_CLOSE:
            PostQuitMessage(0);
            break;

        default:
            result = DefWindowProcA(Window, Message, WParam, LParam);
            break;
    }

    return result;
}

void MainStartup(void)
{
    DynamicLink();

    HINSTANCE Instance = GetModuleHandleA(0);

    WNDCLASSA windowClass = {};
    windowClass.style = CS_HREDRAW | CS_VREDRAW;
    windowClass.lpfnWndProc = WndProc;
    windowClass.hInstance = Instance;
    windowClass.lpszClassName = "NoLibrariesWindowClass";
    windowClass.hbrBackground = (HBRUSH)GetStockObject(WHITE_BRUSH);

    if (RegisterClassA(&windowClass))
    {
        HWND WindowHandle = CreateWindowExA(
            0, "NoLibrariesWindowClass", "Greetings",
            WS_OVERLAPPEDWINDOW | WS_VISIBLE,
            CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
            0, 0, Instance, 0
        );

        if (WindowHandle)
        {
            MSG Message;
            while (GetMessageA(&Message, NULL, 0, 0))
            {
                TranslateMessage(&Message);
                DispatchMessageA(&Message);
            }
        }
    }

    return;
}
```