2 posts
HOWTO - Building without Import Libraries
Over the weekend I worked on a project to build a Windows executable without any libraries whatsoever, esp. the import libraries. Now we already learned how to avoid import libraries when we were first introduced to Direct Input and logically we could extend the same method to cover the functions in User32.dll and most of Kernel32.dll.

I say most because GetProcAddress() is in Kernel32.dll, so how do we get the address of GetProcAddress() if we don't have access to GetProcAddress() yet? It becomes a real chicken and egg problem, but I solved it after many hours of research into semi-documented Windows internals and much trial and error.

In this post I want to go over getting access to Kernel32.dll, parsing the PE format to find the export address of a symbol (e.g., GetProcAddress), and using X Macros to define and load functions we need with concise code. The end result of this project was an exe that displays a message centered in a window. The size of the exe was 3,584 bytes, with every byte of machine language coming from this original source code.

This is for X64 only, but could be extended to 32 bit platforms.

Since we'll be compiling without any libraries, we need to specify the entry point to the executable. I used MainStartup, which you'll see below looks fairly normal (except for the first function call):

    void MainStartup(void)
    {
        DynamicLink();

        HINSTANCE Instance = GetModuleHandleA(0);

        WNDCLASSA windowClass = {};
        windowClass.style = CS_HREDRAW | CS_VREDRAW ;
        windowClass.lpfnWndProc = WndProc;
        windowClass.hInstance = Instance;
        windowClass.lpszClassName = "NoLibrariesWindowClass";
        windowClass.hbrBackground = (HBRUSH)GetStockObject(WHITE_BRUSH);

        if (RegisterClassA(&windowClass))
        {
            HWND WindowHandle = CreateWindowExA(
                0,
                "NoLibrariesWindowClass",
                "Greetings",
                WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                CW_USEDEFAULT, CW_USEDEFAULT,
                CW_USEDEFAULT, CW_USEDEFAULT,
                0, 0, Instance, 0
            );
            if (WindowHandle)
            {
                MSG Message;
                while (GetMessageA(&Message, NULL, 0, 0))
                {
                    TranslateMessage(&Message);
                    DispatchMessageA(&Message);
                }
            }
        }
        return;
    }

DynamicLink() is where the magic kicks off. Here's the first half:

    void DynamicLink(void)
    {
        HMODULE Kernel32 = GetKernel32Module();
        GetProcAddress = GetGetProcAddress(Kernel32);
        LoadLibraryA = (LoadLibraryA_t*)GetProcAddress(Kernel32, "LoadLibraryA");

        HMODULE User32 = LoadLibraryA("user32.dll");
        HMODULE Gdi32 = LoadLibraryA("gdi32.dll");

Luckily, Windows loads Kernel32.dll into every application, so we don't need to call LoadLibrary() on it, but we still need to determine where it's located in memory. Each thread in our process has a Thread Environment Block (TEB), which has a pointer to the Process Environment Block (PEB), which has a pointer to the loader data, which has a pointer to the head of a linked list of modules (i.e., the exe and the dlls) currently loaded into our process.

Access to the TEB is obtained by reading a value through the GS CPU register. The X64 compiler doesn't allow inline assembly, so we have to use the intrinsic __readgsqword(). From there I defined some simple structs with padding offsets to get to just the values I needed. I originally used casting and pointer arithmetic, but this method ended up looking cleaner. Here's an example struct I used for the TEB:

    struct win32_teb
    {
        uint8 Padding1[0x60];
        win32_peb *PEB;
    };

In the code below, I follow the linked list until I find the kernel32.dll module by name, using a simple CompareMemory() function I wrote (the module name is stored as a wide-character Unicode string).

    HMODULE GetKernel32Module()
    {
        wchar_t Kernel32Name[] = L"kernel32.dll";
        win32_teb* TEB = (win32_teb*)__readgsqword(0x30);
        win32_ldr_data_entry* LoaderDataEntry = TEB->PEB->LoaderData->LoaderDataEntry;

        while (LoaderDataEntry->DllBase)
        {
            if (CompareMemory(LoaderDataEntry->DllNameBuffer, Kernel32Name,
                min(LoaderDataEntry->DllNameLength, sizeof(Kernel32Name))) == 0)
            {
                return (HMODULE)LoaderDataEntry->BaseAddress;
            }
            LoaderDataEntry = (win32_ldr_data_entry*)(LoaderDataEntry->LinkedList.Flink);
        }
        return NULL;
    }
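
The module list is circular, with the links embedded in each record. As a rough, portable sketch of that kind of traversal (this uses mock structures, not the real loader data; `link`, `mock_module`, and `FindModule` are hypothetical names made up for illustration), one could stop the walk when it arrives back at the list head instead of testing a field for zero:

```cpp
#include <cstring>

// Mock of an intrusive list link, shaped like the Win32 LIST_ENTRY.
struct link {
    link *Flink;
    link *Blink;
};

// Mock module record (hypothetical); the real loader entry has more fields.
struct mock_module {
    link        Links;  // first member, so a cast from link* recovers the record
    void       *Base;
    const char *Name;
};

// Walks the circular list starting after Head and returns the base of the
// module called Name, or nullptr once the walk arrives back at Head.
void *FindModule(link *Head, const char *Name) {
    for (link *It = Head->Flink; It != Head; It = It->Flink) {
        mock_module *Module = reinterpret_cast<mock_module *>(It);
        if (std::strcmp(Module->Name, Name) == 0) {
            return Module->Base;
        }
    }
    return nullptr;
}
```

In this sketch a missing module simply yields nullptr after one full lap, so the caller can decide how to fail.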

Parsing the Kernel32 Memory Layout (PE Image)

With the address of the Kernel32.dll in hand, it's now time to find the location of the GetProcAddress() symbol. To do this, I referred to the Portable Executable (PE) specification to get the layout of the various tables and fields. There are several levels of indirection, with most fields returning addresses relative to the base of the image (i.e., the address we obtained from above).

In a nutshell, the MS-DOS header provides the offset to the PE header, which provides the offset to the export table, which provides the offset to the name pointer table. I defined the following macro to make this easier to work with:

    #define PE_GET_OFFSET(module, offset) ((uint8*)(module) + offset)

The name pointer table is an array of offsets to strings that correspond to the exported symbols. This is where we can look for "GetProcAddress". The strings are in lexical order, so you can use a binary search to quickly find its index in the table. Once we have the index of this string, we use the same index in the ordinal table, which in turn gives us the index into the export address table. The value obtained from this table is again an offset, so we add this value to the base address of Kernel32.dll and finally, we have the address of GetProcAddress()!

    GetProcAddress_t* GetGetProcAddress(HMODULE Kernel32)
    {
        // Module is now in the EXE Format
        win32_msdos *MSDOSHeader = (win32_msdos*)PE_GET_OFFSET(Kernel32, 0);
        win32_pe *PEHeader = (win32_pe*)PE_GET_OFFSET(Kernel32, MSDOSHeader->PEOffset);
        win32_pe_export_table *ExportTable = (win32_pe_export_table*)PE_GET_OFFSET(Kernel32, PEHeader->ExportTable.VirtualAddress);
        uint32* NamePointerTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->NamePointerRVA);

        // binary search for GetProcAddress
        int Low = 0;
        int High = ExportTable->NumberofNamePointers - 1;
        int Index;
        char *ProcName;
        int CompareResult = 0;
        do
        {
            if (CompareResult > 0)
            {
                Low = Index;
            }
            else if (CompareResult < 0)
            {
                High = Index;
            }
            Index = (High + Low) / 2;
            ProcName = (char*)PE_GET_OFFSET(Kernel32, NamePointerTable[Index]);
        } while ((CompareResult = CompareStrings("GetProcAddress", ProcName)) != 0);

        // the same Index is used for the ordinal value
        uint16* OrdinalTable = (uint16*)PE_GET_OFFSET(Kernel32, ExportTable->OrdinalTableRVA);
        uint16 GetProcAddressOrdinal = OrdinalTable[Index];

        uint32* ExportAddressTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->ExportAddressTableRVA);
        uint32 GetProcAddressRVA = ExportAddressTable[GetProcAddressOrdinal];
        return (GetProcAddress_t*)PE_GET_OFFSET(Kernel32, GetProcAddressRVA);
    }
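
The search above assumes the name is always present, which holds for GetProcAddress in Kernel32.dll. For a general-purpose lookup, a bounded variant might be safer. Here is a minimal portable sketch of the name -> ordinal -> address chain; it works on a simplified stand-in for the three tables rather than a real PE image, and `mock_exports` and `FindExportRVA` are hypothetical names:

```cpp
#include <cstdint>
#include <cstring>

// Simplified stand-in for the three export tables. In a real image the name
// table holds RVAs to strings; plain pointers are used here to stay portable.
struct mock_exports {
    const char *const *Names;      // sorted by byte value, like the name pointer table
    const uint16_t    *Ordinals;   // parallel to Names; each value indexes Addresses
    const uint32_t    *Addresses;  // RVAs of the exported functions
    uint32_t           NameCount;
};

// Returns the RVA for Name, or 0 if Name is not in the table.
uint32_t FindExportRVA(const mock_exports &Exports, const char *Name) {
    uint32_t Low = 0;
    uint32_t High = Exports.NameCount;  // half-open range [Low, High)
    while (Low < High) {
        uint32_t Mid = Low + (High - Low) / 2;
        int Compare = std::strcmp(Name, Exports.Names[Mid]);
        if (Compare == 0) {
            return Exports.Addresses[Exports.Ordinals[Mid]];
        }
        if (Compare > 0) {
            Low = Mid + 1;   // Name sorts after Names[Mid]
        } else {
            High = Mid;      // Name sorts before Names[Mid]
        }
    }
    return 0;
}
```

Because the range shrinks on every iteration, the loop ends even when the name is missing, and the caller gets 0 back instead of a hang.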

Now that we have GetProcAddress(), we can use that along with the Kernel32.dll address to obtain the address of LoadLibraryA(). From there we can load anything we need. To make the loading code less verbose, I used the concept of X Macros to define a macro variable (WIN32_DYNAMIC_PROCS) that contains all of the functions I want to load, each wrapped in a not-yet-defined function-like macro, WPROC. Here's a snippet:

    #define WIN32_DYNAMIC_PROCS \
        WPROC(Kernel32, "GetModuleHandleA", HMODULE WINAPI, GetModuleHandleA_, (LPCSTR lpModuleName)) \
        WPROC(User32, "MessageBoxA", int WINAPI, MessageBoxA_, (HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)) \
        WPROC(User32, "RegisterClassA", ATOM WINAPI, RegisterClassA_, (const WNDCLASSA *lpWndClass)) \
        WPROC(User32, "DefWindowProcA", LRESULT WINAPI, DefWindowProcA_, (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)) \
        WPROC(User32, "CreateWindowExA", HWND WINAPI, CreateWindowExA_, (DWORD dwExStyle, LPCSTR lpClassName, LPCSTR lpWindowName, DWORD dwStyle, int x, int y, int nWidth, int nHeight, HWND hWndParent, HMENU hMenu, HINSTANCE hInstance, LPVOID lpParam)) \
        WPROC(User32, "GetMessageA", BOOL WINAPI, GetMessageA_, (LPMSG lpMsg, HWND hWnd, UINT wMsgFilterMin, UINT wMsgFilterMax)) \
        WPROC(User32, "TranslateMessage", BOOL WINAPI, TranslateMessage_, (const MSG *lpMsg)) \
        WPROC(User32, "DispatchMessageA", LRESULT WINAPI, DispatchMessageA_, (const MSG *lpmsg))

Later on, I defined the WPROC macro to set up the typedef and global static variable:

    #define WPROC(Module, DllName, ReturnType, Name, Params) \
        typedef ReturnType Name##t Params; \
        static Name##t *Name;

    WIN32_DYNAMIC_PROCS

    #undef WPROC

Finally, I redefine WPROC in the second half of my DynamicLink() function to get the address of the proc in question:

    #define WPROC(Module, DllName, ReturnType, Name, Params) \
        Name = (Name##t*)GetProcAddress(Module, DllName);

    WIN32_DYNAMIC_PROCS

    #undef WPROC

I appended underscores to all of the global function pointers so that they don't conflict with the declarations in the Windows header, and then used simple #define macros to allow the use of their original names. One could get rid of this by defining only what is needed from Windows.h and not including it.
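
For readers new to X Macros, here is a small self-contained sketch of the same pattern that builds anywhere. Two ordinary functions stand in for DLL exports and `Resolve` plays the role of GetProcAddress; all of the names here (`DEMO_PROCS`, `DPROC`, `Resolve`, `LinkAll`) are hypothetical and not part of the program in this post:

```cpp
#include <cstring>

// Two ordinary functions stand in for DLL exports.
static int Add(int A, int B) { return A + B; }
static int Negate(int A) { return -A; }

// Hypothetical resolver playing the role of GetProcAddress.
static void *Resolve(const char *Name) {
    if (std::strcmp(Name, "Add") == 0) return reinterpret_cast<void *>(&Add);
    if (std::strcmp(Name, "Negate") == 0) return reinterpret_cast<void *>(&Negate);
    return nullptr;
}

// The single list of functions: export name, return type, pointer name, parameters.
#define DEMO_PROCS \
    DPROC("Add",    int, Add_,    (int A, int B)) \
    DPROC("Negate", int, Negate_, (int A))

// First expansion: one typedef and one global pointer per entry.
#define DPROC(ExportName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;
DEMO_PROCS
#undef DPROC

// Second expansion: one assignment per entry. Returns false if any lookup fails.
static bool LinkAll() {
    bool Ok = true;
#define DPROC(ExportName, ReturnType, Name, Params) \
    Name = reinterpret_cast<Name##t *>(Resolve(ExportName)); \
    Ok = Ok && (Name != nullptr);
    DEMO_PROCS
#undef DPROC
    return Ok;
}
```

After `LinkAll()` succeeds, `Add_(2, 3)` calls through the loaded pointer. Adding a function means adding one line to the list; the typedef, the pointer, and the lookup all follow from it. Checking for null pointers is an addition of this sketch, not something the original program does.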

Here's the complete program:

    // example executable without import libraries
    // compile with: cl /O2 /GS- main.cpp /link /NOLOGO /NODEFAULTLIB /SUBSYSTEM:WINDOWS /MACHINE:X64 /ENTRY:"MainStartup"

    #include <windows.h>
    #include <stdint.h>
    #include <intrin.h>

    typedef uint8_t uint8;
    typedef uint16_t uint16;
    typedef uint32_t uint32;
    typedef uint64_t uint64;

    #define GetProcAddress GetProcAddress_
    #define LoadLibraryA LoadLibraryA_
    #define MessageBoxA MessageBoxA_
    #define RegisterClassA RegisterClassA_
    #define GetModuleHandleA GetModuleHandleA_
    #define DefWindowProcA DefWindowProcA_
    #define CreateWindowExA CreateWindowExA_
    #define GetMessageA GetMessageA_
    #define TranslateMessage TranslateMessage_
    #define DispatchMessageA DispatchMessageA_
    #define PostQuitMessage PostQuitMessage_
    #define BeginPaint BeginPaint_
    #define EndPaint EndPaint_
    #define GetStockObject GetStockObject_
    #define SelectObject SelectObject_
    #define TextOutA TextOutA_
    #define GetTextMetricsA GetTextMetricsA_
    #define GetClientRect GetClientRect_

    #define PE_GET_OFFSET(module, offset) ((uint8*)(module) + offset)

    struct win32_ldr_data_entry
    {
        LIST_ENTRY LinkedList;
        LIST_ENTRY UnusedList;
        PVOID BaseAddress;
        PVOID Reserved2[1];
        PVOID DllBase;
        PVOID EntryPoint;
        PVOID Reserved3;
        USHORT DllNameLength;
        USHORT DllNameMaximumLength;
        PWSTR DllNameBuffer;
    };

    struct win32_ldr_data
    {
        uint8 Padding1[0x20];
        win32_ldr_data_entry *LoaderDataEntry;
    };

    struct win32_peb
    {
        uint8 Padding1[0x18];
        win32_ldr_data *LoaderData;
    };

    struct win32_teb
    {
        uint8 Padding1[0x60];
        win32_peb *PEB;
    };

    struct win32_msdos
    {
        uint8 Padding1[0x3C];
        uint32 PEOffset;
    };

    struct win32_pe_image_data
    {
        uint32 VirtualAddress;
        uint32 Size;
    };

    struct win32_pe
    {
        // COFF
        uint8 Signature[4];
        uint16 Machine;
        uint16 NumberOfSections;
        uint32 TimeDateStamp;
        uint32 PointerToSymbolTable;
        uint32 NumberOfSymbols;
        uint16 SizeOfOptionalHeader;
        uint16 Characteristics;

        // Assuming PE32+ Optional Header since this is 64bit only
        // standard fields
        uint16 Magic;
        uint8 MajorLinkerVersion;
        uint8 MinorLinkerVersion;
        uint32 SizeOfCode;
        uint32 SizeOfInitializedData;
        uint32 SizeOfUninitializedData;
        uint32 AddressOfEntryPoint;
        uint32 BaseOfCode;

        // windows specific fields
        uint64 ImageBase;
        uint32 SectionAlignment;
        uint32 FileAlignment;
        uint16 MajorOperatingSystemVersion;
        uint16 MinorOperatingSystemVersion;
        uint16 MajorImageVersion;
        uint16 MinorImageVersion;
        uint16 MajorSubsystemVersion;
        uint16 MinorSubsystemVersion;
        uint32 Win32VersionValue;
        uint32 SizeOfImage;
        uint32 SizeOfHeaders;
        uint32 CheckSum;
        uint16 Subsystem;
        uint16 DllCharacteristics;
        uint64 SizeOfStackReserve;
        uint64 SizeOfStackCommit;
        uint64 SizeOfHeapReserve;
        uint64 SizeOfHeapCommit;
        uint32 LoaderFlags;
        uint32 NumberOfRvaAndSizes;

        // data directories
        win32_pe_image_data ExportTable;
        win32_pe_image_data ImportTable;
        win32_pe_image_data ResourceTable;
        win32_pe_image_data ExceptionTable;
        win32_pe_image_data CertificateTable;
        win32_pe_image_data BaseRelocationTable;
        win32_pe_image_data Debug;
        win32_pe_image_data Architecture;
        win32_pe_image_data GlobalPtr;
        win32_pe_image_data TLSTable;
        win32_pe_image_data LoadConfigTable;
        win32_pe_image_data BoundImport;
        win32_pe_image_data IAT;
        win32_pe_image_data DelayImportDescriptor;
        win32_pe_image_data CLRRuntimeHeader;
        win32_pe_image_data ReservedTable;
    };

    struct win32_pe_export_table
    {
        uint32 ExportFlags;
        uint32 TimeDateStamp;
        uint16 MajorVersion;
        uint16 MinorVersion;
        uint32 NameRVA;
        uint32 OrdinalBase;
        uint32 AddressTableEntries;
        uint32 NumberofNamePointers;
        uint32 ExportAddressTableRVA;
        uint32 NamePointerRVA;
        uint32 OrdinalTableRVA;
    };

    int CompareMemory(void *p1, void *p2, size_t n)
    {
        uint8* b1 = (uint8*)p1;
        uint8* b2 = (uint8*)p2;
        while (n && *b1 == *b2)
        {
            n--;
            b1++;
            b2++;
        }
        return n ? *b1 - *b2 : 0;
    }

    int CompareStrings(char *s1, char *s2)
    {
        while (*s1 && *s2 && *s1 == *s2)
        {
            s1++;
            s2++;
        }
        return (*s1 == *s2) ? 0 : *s1 - *s2;
    }

    HMODULE GetKernel32Module()
    {
        wchar_t Kernel32Name[] = L"kernel32.dll";
        win32_teb* TEB = (win32_teb*)__readgsqword(0x30);
        win32_ldr_data_entry* LoaderDataEntry = TEB->PEB->LoaderData->LoaderDataEntry;

        while (LoaderDataEntry->DllBase)
        {
            if (CompareMemory(LoaderDataEntry->DllNameBuffer, Kernel32Name,
                min(LoaderDataEntry->DllNameLength, sizeof(Kernel32Name))) == 0)
            {
                return (HMODULE)LoaderDataEntry->BaseAddress;
            }
            LoaderDataEntry = (win32_ldr_data_entry*)(LoaderDataEntry->LinkedList.Flink);
        }
        return NULL;
    }

    typedef FARPROC WINAPI GetProcAddress_t(HMODULE Module, LPCSTR ProcName);
    static GetProcAddress_t *GetProcAddress_;

    typedef HMODULE WINAPI LoadLibraryA_t(LPCSTR FileName);
    static LoadLibraryA_t *LoadLibraryA_;

    GetProcAddress_t* GetGetProcAddress(HMODULE Kernel32)
    {
        // Module is now in the EXE Format
        win32_msdos *MSDOSHeader = (win32_msdos*)PE_GET_OFFSET(Kernel32, 0);
        win32_pe *PEHeader = (win32_pe*)PE_GET_OFFSET(Kernel32, MSDOSHeader->PEOffset);
        win32_pe_export_table *ExportTable = (win32_pe_export_table*)PE_GET_OFFSET(Kernel32, PEHeader->ExportTable.VirtualAddress);
        uint32* NamePointerTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->NamePointerRVA);

        // binary search for GetProcAddress
        int Low = 0;
        int High = ExportTable->NumberofNamePointers - 1;
        int Index;
        char *ProcName;
        int CompareResult = 0;
        do
        {
            if (CompareResult > 0)
            {
                Low = Index;
            }
            else if (CompareResult < 0)
            {
                High = Index;
            }
            Index = (High + Low) / 2;
            ProcName = (char*)PE_GET_OFFSET(Kernel32, NamePointerTable[Index]);
        } while ((CompareResult = CompareStrings("GetProcAddress", ProcName)) != 0);

        // the same Index is used for the ordinal value
        uint16* OrdinalTable = (uint16*)PE_GET_OFFSET(Kernel32, ExportTable->OrdinalTableRVA);
        uint16 GetProcAddressOrdinal = OrdinalTable[Index];

        uint32* ExportAddressTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->ExportAddressTableRVA);
        // The PE Documentation explicitly says you must subtract the OrdinalBase from the Ordinal to get the true
        // index into the address table, however, I found through testing that this is not the case.
        // This appears to confirm a problem with the documentation: http://stackoverflow.com/questions/5653316/pe-export-directory-tables-ordinalbase-field-ignored
        uint32 GetProcAddressRVA = ExportAddressTable[GetProcAddressOrdinal];
        return (GetProcAddress_t*)PE_GET_OFFSET(Kernel32, GetProcAddressRVA);
    }

    #define WIN32_DYNAMIC_PROCS \
        WPROC(Kernel32, "GetModuleHandleA", HMODULE WINAPI, GetModuleHandleA_, (LPCSTR lpModuleName)) \
        WPROC(User32, "MessageBoxA", int WINAPI, MessageBoxA_, (HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)) \
        WPROC(User32, "RegisterClassA", ATOM WINAPI, RegisterClassA_, (const WNDCLASSA *lpWndClass)) \
        WPROC(User32, "DefWindowProcA", LRESULT WINAPI, DefWindowProcA_, (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)) \
        WPROC(User32, "CreateWindowExA", HWND WINAPI, CreateWindowExA_, (DWORD dwExStyle, LPCSTR lpClassName, LPCSTR lpWindowName, DWORD dwStyle, int x, int y, int nWidth, int nHeight, HWND hWndParent, HMENU hMenu, HINSTANCE hInstance, LPVOID lpParam)) \
        WPROC(User32, "GetMessageA", BOOL WINAPI, GetMessageA_, (LPMSG lpMsg, HWND hWnd, UINT wMsgFilterMin, UINT wMsgFilterMax)) \
        WPROC(User32, "TranslateMessage", BOOL WINAPI, TranslateMessage_, (const MSG *lpMsg)) \
        WPROC(User32, "DispatchMessageA", LRESULT WINAPI, DispatchMessageA_, (const MSG *lpmsg)) \
        WPROC(User32, "PostQuitMessage", VOID WINAPI, PostQuitMessage_, (int nExitCode)) \
        WPROC(User32, "BeginPaint", HDC, BeginPaint_, (HWND hwnd, LPPAINTSTRUCT lpPaint)) \
        WPROC(User32, "EndPaint", BOOL, EndPaint_, (HWND hWnd, const PAINTSTRUCT *lpPaint)) \
        WPROC(User32, "GetClientRect", BOOL WINAPI, GetClientRect_, (HWND hWnd, LPRECT lpRect)) \
        WPROC(Gdi32, "GetStockObject", HGDIOBJ, GetStockObject_, (int fnObject)) \
        WPROC(Gdi32, "SelectObject", HGDIOBJ, SelectObject_, (HDC hdc, HGDIOBJ hgdiobj)) \
        WPROC(Gdi32, "TextOutA", BOOL, TextOutA_, (HDC hdc, int nXStart, int nYStart, LPCSTR lpString, int cchString)) \
        WPROC(Gdi32, "GetTextMetricsA", BOOL, GetTextMetricsA_, (HDC hdc, LPTEXTMETRIC lptm))

    #define WPROC(Module, DllName, ReturnType, Name, Params) \
        typedef ReturnType Name##t Params; \
        static Name##t *Name;

    WIN32_DYNAMIC_PROCS

    #undef WPROC

    void DynamicLink(void)
    {
        HMODULE Kernel32 = GetKernel32Module();
        GetProcAddress = GetGetProcAddress(Kernel32);
        LoadLibraryA = (LoadLibraryA_t*)GetProcAddress(Kernel32, "LoadLibraryA");

        HMODULE User32 = LoadLibraryA("user32.dll");
        HMODULE Gdi32 = LoadLibraryA("gdi32.dll");

    #define WPROC(Module, DllName, ReturnType, Name, Params) \
        Name = (Name##t*)GetProcAddress(Module, DllName);

        WIN32_DYNAMIC_PROCS

    #undef WPROC
    }

    LRESULT CALLBACK WndProc(HWND Window, UINT Message, WPARAM WParam, LPARAM LParam)
    {
        LRESULT result = 0;

        switch (Message)
        {
            case WM_PAINT:
            {
                char Text[] = "No Libraries!";
                PAINTSTRUCT Paint;
                TEXTMETRIC Metrics;
                RECT Rect;

                HDC DeviceContext = BeginPaint(Window, &Paint);
                SelectObject(DeviceContext, GetStockObject(ANSI_FIXED_FONT));
                GetTextMetricsA(DeviceContext, &Metrics);
                GetClientRect(Window, &Rect);

                // Center text
                int TextLen = sizeof(Text) - 1;
                int TextWidth = TextLen * Metrics.tmAveCharWidth;
                int TextHeight = Metrics.tmHeight;
                int X = max((Rect.right - Rect.left - TextWidth) / 2, 0);
                int Y = max((Rect.bottom - Rect.top - TextHeight) / 2, 0);

                TextOutA(DeviceContext, X, Y, Text, TextLen);
                EndPaint(Window, &Paint);
            } break;

            case WM_CLOSE:
                PostQuitMessage(0);
                break;

            default:
                result = DefWindowProcA(Window, Message, WParam, LParam);
                break;
        }
        return result;
    }

    void MainStartup(void)
    {
        DynamicLink();

        HINSTANCE Instance = GetModuleHandleA(0);

        WNDCLASSA windowClass = {};
        windowClass.style = CS_HREDRAW | CS_VREDRAW ;
        windowClass.lpfnWndProc = WndProc;
        windowClass.hInstance = Instance;
        windowClass.lpszClassName = "NoLibrariesWindowClass";
        windowClass.hbrBackground = (HBRUSH)GetStockObject(WHITE_BRUSH);

        if (RegisterClassA(&windowClass))
        {
            HWND WindowHandle = CreateWindowExA(
                0,
                "NoLibrariesWindowClass",
                "Greetings",
                WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                CW_USEDEFAULT, CW_USEDEFAULT,
                CW_USEDEFAULT, CW_USEDEFAULT,
                0, 0, Instance, 0
            );
            if (WindowHandle)
            {
                MSG Message;
                while (GetMessageA(&Message, NULL, 0, 0))
                {
                    TranslateMessage(&Message);
                    DispatchMessageA(&Message);
                }
            }
        }
        return;
    }
2 posts
Edited by a_null on
Haha great post! I've only used the following with 32 bit code. Never tested it with 64 bit code.

If you aren't using the CRT at all, you can do some fun tricks. When the Windows loader calls the entry point, eax holds a pointer into kernel32.dll. (There's also a pointer into ntdll.dll on the stack if you want to use the undocumented API.) From what I've found, this pointer is always a "page" away from the base of kernel32.dll on all versions of Windows XP and later. By "page" I mean a 64 KB block (0x10000 bytes), since modules are loaded on 64 KB boundaries.

    void Entry()
    {
        DWORD base;
        _asm {
            xor ax, ax        // align eax pointer to page boundaries by clearing lower word
            sub eax, 0x10000  // roll back a page to the base address of k32
            mov base, eax
        }
        // do something with base...
    }

If you're uncomfortable with the single-page away thing on all versions of Windows, to future proof yourself you can always just roll back pages until you get the base instead of just doing it once:

    void Entry()
    {
        DWORD base;
        _asm mov base, eax
        for(base &= 0xFFFF0000; *(short *)base != 'ZM'; base -= 0x10000)
            if (base <= 0) return;

        auto pe = (PIMAGE_NT_HEADERS)(base + ((PIMAGE_DOS_HEADER)base)->e_lfanew);
        auto exportTable = (PIMAGE_EXPORT_DIRECTORY)(base + pe->OptionalHeader.DataDirectory[0].VirtualAddress);
        auto exportNames = (PSTR *)(base + exportTable->AddressOfNames);
        //auto exportFuncs = (DWORD *)(base + exportTable->AddressOfFunctions);

        for(DWORD i = 0; i < exportTable->NumberOfNames; ++i)
            MessageBoxA(0, (PSTR)(base + exportNames[i]), "Export:", MB_OK);
    }
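
The roll-back loop can be tried outside of a real process by treating a byte buffer as "memory" and offsets as "addresses". The following is only a portable sketch under that assumption; `FindImageBase` is a hypothetical helper, and the granularity is a parameter so the test data can stay small:

```cpp
#include <cstddef>
#include <cstdint>

// Scans backward from Offset in Granularity-sized steps until the two bytes
// 'M','Z' are found. Returns the offset of the header, or SIZE_MAX if the
// start of the buffer is reached without a match. The caller must make sure
// Offset lies inside a buffer that is a multiple of Granularity in size.
size_t FindImageBase(const uint8_t *Memory, size_t Offset, size_t Granularity) {
    size_t Candidate = Offset - (Offset % Granularity);  // round down to a boundary
    for (;;) {
        if (Memory[Candidate] == 'M' && Memory[Candidate + 1] == 'Z') {
            return Candidate;
        }
        if (Candidate < Granularity) {
            return SIZE_MAX;  // nothing below this block; stop instead of wrapping
        }
        Candidate -= Granularity;
    }
}
```

In the sketch the bounds check happens before the step, so the scan never reads below the start of the buffer.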

The above is just a little test snippet. You'd want to scan for the LoadLibrary and GetProcAddress exports. If you're sizecoding and have the base of any general dll, a lot of the time this is done by comparing hashes of the strings or through other methods. This sort of loop is also quite useful if you want to find the base address of any module that you have a pointer into.
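
To illustrate the hash-comparison idea, here is a minimal sketch. The choice of 32-bit FNV-1a is my own assumption for the example, not something taken from this thread, and `HashName` and `FindByHash` are hypothetical names. It needs C++14 or later for the loop inside a constexpr function:

```cpp
#include <cstdint>

// 32-bit FNV-1a, chosen only for illustration; any cheap hash would do.
constexpr uint32_t HashName(const char *Name) {
    uint32_t Hash = 0x811C9DC5u;                // FNV offset basis
    while (*Name) {
        Hash ^= static_cast<uint8_t>(*Name++);
        Hash *= 0x01000193u;                    // FNV prime
    }
    return Hash;
}

// Returns the index of the first name whose hash equals Wanted, or -1.
int FindByHash(const char *const *Names, int Count, uint32_t Wanted) {
    for (int Index = 0; Index < Count; ++Index) {
        if (HashName(Names[Index]) == Wanted) {
            return Index;
        }
    }
    return -1;
}
```

When the wanted hash is computed at compile time, for example `constexpr uint32_t Wanted = HashName("GetProcAddress");`, the executable only has to carry four bytes per function instead of the full name. The trade-off is that two different names could in principle share a hash.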

Another trick is that the windows loader puts the address of the PEB in ebx. Kinda useful if you wanna save bytes instead of getting it through the TEB.

    #include <windows.h>
    #include <winternl.h>

    typedef int (WINAPI *MsgBoxWPtr)(HWND, PWSTR, PWSTR, UINT);

    void Entry()
    {
        PPEB peb;
        _asm mov peb, ebx
        ((MsgBoxWPtr)GetProcAddress(LoadLibrary("user32.dll"), "MessageBoxW"))
            (0, peb->ProcessParameters->CommandLine.Buffer, L"cmdline", MB_OK);
    }

EDIT: Now I'm really off topic, but another cool thing that I haven't actually tested, but have wanted to for a while, is using the OpenGL driver addresses in the TEB. It's undocumented, but in the TEB (reached at fs:[18] in 32 bit code) there's a GlDispatchTable[0x118] containing function pointers to all the OpenGL functions that you could call. This could actually circumvent going through opengl32.dll, but it would just cause more trouble for yourself. It'd probably be most useful if you wanted to hook one of them from a dll.
Casey Muratori
AWESOME!!!

- Casey
Mārtiņš Možeiko
This was not working for me on Windows 8.1 x64.

I needed to change the CompareMemory function to be case insensitive, because win32_ldr_data_entry had the kernel32.dll module name in uppercase: "KERNEL32.DLL". ntdll.dll is lowercase though.

    int CompareMemory(void *p1, void *p2, size_t n)
    {
        uint8* b1 = (uint8*)p1;
        uint8* b2 = (uint8*)p2;
        while (n)
        {
            uint8 c1 = *b1;
            uint8 c2 = *b2;
            c1 = c1 >= 'A' && c1 <= 'Z' ? c1 - 'A' + 'a' : c1;
            c2 = c2 >= 'A' && c2 <= 'Z' ? c2 - 'A' + 'a' : c2;
            if (c1 != c2) return c1 - c2;
            n--;
            b1++;
            b2++;
        }
        return n ? *b1 - *b2 : 0;
    }

But pretty cool that this works. Although I could imagine that some anti-virus or anti-malware software would be suspicious of such executables.
Ameen Sayegh
So what you did was avoid LINKING to kernel32.dll, right?

But programs on Windows will always have kernel32.dll loaded.

What are the benefits of avoiding linking to kernel32.dll, since it is there on every machine? Are there any?

The other thing: shouldn't you call ExitProcess() rather than return from the main function?

Your work is VERY VERY NICE.
Thanks
2 posts
aameen951
So what you did was avoid LINKING to kernel32.dll, right?

But programs on Windows will always have kernel32.dll loaded.

What are the benefits of avoiding linking to kernel32.dll, since it is there on every machine? Are there any?

The other thing: shouldn't you call ExitProcess() rather than return from the main function?

Your work is VERY VERY NICE.
Thanks

Thank you. kernel32.dll is loaded into every Windows program as far as I am aware. What I did was avoid linking the program with the import library (kernel32.lib). Here is a fairly good explanation of the difference between using an import library versus the LoadLibrary()/GetProcAddress() combo:

http://stackoverflow.com/question...port-library-work-details#3573527
Ameen Sayegh
OK, now I understand what you mean.
Now you don't need kernel32.lib, or any other .lib file, to compile the program.
But according to the link, the GCC linker doesn't need these .lib files; it can get the information it needs from the DLL directly.
That seems possible, so I don't know why the MSVC linker doesn't do it.

Anyway, the code you showed looks reliable, and it is boilerplate: you drop it in and off you go, which is good.

You didn't say whether you need to call ExitProcess at the end of MainStartup instead of returning. The other post said "you should not return from WinMainCRTStartup function ever, so I am calling ExitProcess function at end of it".
Mārtiņš Možeiko
The MSVC linker cannot use a dll file directly. But you can easily generate an import library (.lib file) from a dll file.

First generate a .def file. It's a simple text file that lists the exports of the dll file. You can write it either by hand or by using the pexports utility: http://sourceforge.net/projects/m.../pexports-0.46-mingw32-bin.tar.xz
Then use lib.exe (from the MSVC command line) to create the .lib file from the generated .def file.

Here's an example for OpenGL32.dll:
    pexports.exe C:\Windows\System32\OpenGL32.dll >OpenGL32.def
    lib.exe /def:OpenGL32.def /OUT:OpenGL32.lib
27 posts
Edited by mojobojo on
I am making an attempt to get this loader working in x64 assembly. I have it almost working however LoadLibraryA is crashing for some odd reason. Maybe someone here knows why this could happen. My program compiles with NASM and I have a sneaking suspicion it may have something to do with me overlapping DOS and PE headers to save space.

EDIT: mmozeiko found my issue, thanks to him.

Working code can be found here
https://gist.github.com/mojobojo/921a5af897e86bb940a2
Mārtiņš Možeiko
Verify your suspicion: don't overlap the headers. Does it still crash then? If so, use the [strike]force[/strike] debugger! Where does the crash happen? Which place in the code is address 0000000140010253? Is it the call to GetProcAddress? Is it the call to MessageBoxA? Are the values in the registers you are passing correct, both the arguments and the actual pointer you are jumping to? Compare with the real import address. Is your stack 16-byte aligned when calling a function (not counting the pushed return address)? Are you accounting for "home space", the 8*4=32 bytes of stack that can be used by the functions you are calling (don't store values that are important to you there)?
27 posts
Edited by mojobojo on
mmozeiko
Is your stack 16-bytes aligned when calling function (not counting pushed return address)? Are you accounting for "home space" - 8*4=32 bytes of stack that can be used by functions you are calling (don't store there values important to you).

You got it, it was my stack alignment. I had NO IDEA that the stack needed to be 16-byte aligned. Thank you, I can't believe that's what it was. We need a rep system. +1 for you my friend.

So check this out. Windows doesn't give me a properly aligned stack pointer from the start. I just had a look at an application and it subtracts like this to align it...

    sub rsp, 28h
Mārtiņš Možeiko
Yeah, it's documented here: https://msdn.microsoft.com/en-US/library/ew5tede7.aspx

rsp % 16 is always 8 at the entry of a function, and the functions you are calling rely on that. Most likely they subtract some number N, where N % 16 = 8, to make rsp 16-byte aligned again, and then store SSE registers with aligned mov operations. But if you call them with rsp % 16 equal to 0, they end up with a misaligned rsp, and the aligned store or load instruction for an SSE register raises an exception.
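
The arithmetic behind `sub rsp, 28h` can be written down as a tiny helper. This is just a sketch of the rule described above, assuming rsp % 16 == 8 on entry and 32 bytes of home space; `FrameSize` is a hypothetical name:

```cpp
#include <cstdint>

// Bytes to subtract from rsp in a function prologue on Windows x64, assuming
// rsp % 16 == 8 on entry (the return address was just pushed).
// LocalBytes is the extra space the function wants for its own locals.
constexpr uint32_t FrameSize(uint32_t LocalBytes) {
    // 32 bytes of home space for the callee, plus locals. Adding the 8 bytes
    // of return address, rounding up to a multiple of 16, and removing the 8
    // again gives the smallest size that leaves rsp 16-byte aligned.
    return ((32u + LocalBytes + 8u + 15u) & ~15u) - 8u;
}
```

With no locals this gives 40 bytes, which is the 28h from the disassembly: 32 bytes of home space plus 8 bytes to restore the alignment.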
27 posts
Well for anyone interested in an assembly example, here it is. I will take some time tomorrow to clean up the code a bit and comment it to make it a little more legible. Thank you very much to mmozeiko for helping me find the problem.

https://gist.github.com/mojobojo/921a5af897e86bb940a2
19 posts
What changes do I have to make in order to run this masterpiece on a 32 bit platform?
Mārtiņš Možeiko
You'll need to change the win32_pe structure to support the 32-bit header, and change the line of code that gets the TEB structure; on 32-bit it's done differently. Otherwise the logic stays the same.
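
As a rough sketch of what those changes might look like: on 32-bit Windows the TEB is reached through FS instead of GS (for example with the MSVC intrinsic `__readfsdword(0x18)`), and the optional header uses the PE32 layout rather than PE32+. The struct names below are hypothetical, and the TEB/PEB/loader offsets are the commonly published ones for these undocumented structures, so treat them as assumptions to verify. Pointers are modeled as uint32_t so the layout can be checked on any host:

```cpp
#include <cstddef>
#include <cstdint>

// 32-bit pointers are modeled as uint32_t so the layout is host independent.
struct teb32      { uint8_t Padding[0x30]; uint32_t PEB; };
struct peb32      { uint8_t Padding[0x0C]; uint32_t LoaderData; };
struct ldr_data32 { uint8_t Padding[0x14]; uint32_t FirstEntry; };  // in-memory-order list

// PE32 header: signature, COFF header, then the 32-bit optional header.
struct pe32_header {
    uint8_t  Signature[4];
    uint8_t  CoffHeader[20];
    uint16_t Magic;                // 0x10B for PE32, 0x20B for PE32+
    uint8_t  StandardFields[22];   // linker version through BaseOfCode
    uint32_t BaseOfData;           // present in PE32 only
    uint32_t ImageBase;            // 4 bytes here, 8 bytes in PE32+
    uint8_t  WindowsFields[40];    // SectionAlignment through DllCharacteristics
    uint32_t SizeOfStackReserve;   // these four are 4 bytes each in PE32
    uint32_t SizeOfStackCommit;
    uint32_t SizeOfHeapReserve;
    uint32_t SizeOfHeapCommit;
    uint32_t LoaderFlags;
    uint32_t NumberOfRvaAndSizes;
    uint32_t ExportTableRVA;       // first data directory entry
    uint32_t ExportTableSize;
};

// The export directory entry sits 0x78 bytes after the PE signature in PE32
// (0x88 in the PE32+ layout used by the 64-bit program).
static_assert(offsetof(pe32_header, ExportTableRVA) == 0x78, "unexpected PE32 layout");
```

The export table itself has the same layout in both formats, so the name, ordinal, and address lookups would not need to change.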