HOWTO - Building without Import Libraries

Over the weekend I worked on a project to build a Windows executable without linking any libraries whatsoever, especially the import libraries. We already learned how to avoid import libraries when we were first introduced to DirectInput, and logically we could extend the same method to cover the functions in User32.dll and most of Kernel32.dll.

I say most because GetProcAddress() is itself in Kernel32.dll, so how do we get the address of GetProcAddress() if we don't have access to GetProcAddress() yet? It's a real chicken-and-egg problem, but I solved it after many hours of research into semi-documented Windows internals and much trial and error.

In this post I want to go over getting access to Kernel32.dll, parsing the PE format to find the export address of a symbol (e.g., GetProcAddress), and using X Macros to define and load the functions we need with concise code. The end result of this project was an exe that displays a message centered in a window. The exe is 3,584 bytes, with every byte of machine code coming from the source code in this post.

This is for x64 only, but could be extended to 32-bit platforms.

Getting Access to Kernel32.dll

Since we'll be compiling without any libraries, we need to specify the entry point to the executable. I used MainStartup, which you'll see below looks fairly normal (except for the first function call):

void 
MainStartup(void)
{
    DynamicLink();

    HINSTANCE Instance = GetModuleHandleA(0);
    WNDCLASSA windowClass = {};
    windowClass.style = CS_HREDRAW | CS_VREDRAW ;
    windowClass.lpfnWndProc = WndProc;
    windowClass.hInstance = Instance;
    windowClass.lpszClassName = "NoLibrariesWindowClass";
    windowClass.hbrBackground = (HBRUSH)GetStockObject(WHITE_BRUSH);

    if (RegisterClassA(&windowClass))
    {
        HWND WindowHandle = CreateWindowExA(
            0, "NoLibrariesWindowClass", "Greetings",
            WS_OVERLAPPEDWINDOW | WS_VISIBLE,
            CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
            0, 0, Instance,	0
            );

        if (WindowHandle)
        {
            MSG Message;
            while (GetMessageA(&Message, NULL, 0, 0))
            {
                TranslateMessage(&Message);
                DispatchMessageA(&Message);
            }
        }
    }

    return;
}


DynamicLink() is where the magic kicks off. Here's the first half:

void
DynamicLink(void)
{
    HMODULE Kernel32 = GetKernel32Module();
    GetProcAddress = GetGetProcAddress(Kernel32);
    LoadLibraryA = (LoadLibraryA_t*)GetProcAddress(Kernel32, "LoadLibraryA");
    HMODULE User32 = LoadLibraryA("user32.dll");
    HMODULE Gdi32 = LoadLibraryA("gdi32.dll");


Luckily, Windows loads Kernel32.dll into every process, so we don't need to call LoadLibrary() on it, but we still need to determine where it's located in memory. Each thread has a Thread Environment Block (TEB), which has a pointer to the Process Environment Block (PEB), which has a pointer to loader data, which has a pointer to the head of a linked list of modules (i.e., the exe and dlls) currently loaded into our process.

Access to the TEB is obtained by reading a value through the GS segment register. The x64 MSVC compiler doesn't allow inline assembly, so we have to use the intrinsic __readgsqword(). From there I defined some simple structs with padding offsets to get to just the values I needed. I originally used casting and pointer arithmetic, but this method ended up looking cleaner. Here's an example struct I used for the TEB:

struct win32_teb
{
    uint8 Padding1[0x60];
    win32_peb *PEB;
};
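
As a rough sketch of why the padding approach is equivalent to pointer arithmetic, the following checks that a padded struct places its field at the intended offset. The `_sketch` types and `ReadPointerAt()` are hypothetical stand-ins for illustration, not part of the program in this post; only the 0x60 and 0x18 offsets come from the structs shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the post's structs; only the offsets matter.
struct ldr_sketch;

struct peb_sketch
{
    uint8_t Padding1[0x18];    // skip everything before the loader data pointer
    ldr_sketch *LoaderData;    // x64: PEB + 0x18
};

struct teb_sketch
{
    uint8_t Padding1[0x60];    // skip everything before the PEB pointer
    peb_sketch *PEB;           // x64: TEB + 0x60
};

// The compiler verifies the layout, so a typo in a padding size fails the build.
static_assert(offsetof(teb_sketch, PEB) == 0x60, "PEB pointer must sit at 0x60");
static_assert(offsetof(peb_sketch, LoaderData) == 0x18, "loader data must sit at 0x18");

// The pointer-arithmetic alternative that the struct replaces: read a
// pointer-sized value stored Offset bytes past Base.
static void *ReadPointerAt(const void *Base, size_t Offset)
{
    void *Result;
    memcpy(&Result, (const uint8_t*)Base + Offset, sizeof(Result));
    return Result;
}
```

Reading `Teb->PEB` and calling `ReadPointerAt(Teb, 0x60)` fetch the same bytes; the struct form just lets the compiler do the arithmetic and check it.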


In the code below, I follow the linked list until I find the kernel32.dll module by name, using a simple CompareMemory() function I wrote (the module name is stored as a wide-character (UTF-16) string).

HMODULE
GetKernel32Module()
{
    wchar_t Kernel32Name[] = L"kernel32.dll";
    win32_teb* TEB = (win32_teb*)__readgsqword(0x30);
    win32_ldr_data_entry* LoaderDataEntry = TEB->PEB->LoaderData->LoaderDataEntry;
    
    while (LoaderDataEntry->DllBase) {
        if (CompareMemory(LoaderDataEntry->DllNameBuffer, Kernel32Name, min(LoaderDataEntry->DllNameLength, sizeof(Kernel32Name))) == 0)
        {
            return (HMODULE)LoaderDataEntry->BaseAddress;
        }
        LoaderDataEntry = (win32_ldr_data_entry*)(LoaderDataEntry->LinkedList.Flink);
    }

    return NULL;
}
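
For comparison, here is a portable sketch of walking a circular doubly linked list and stopping when the walk returns to the list head, instead of stopping on a zero field as the loop above does. Everything in it (`list_entry_sketch`, `module_sketch`, `FindModuleBase()` and the helpers) is a hypothetical stand-in so the idea can be tried without the real loader structures, and the exact-name match is my assumption.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the Windows LIST_ENTRY, redefined so the sketch builds anywhere.
struct list_entry_sketch
{
    list_entry_sketch *Flink;
    list_entry_sketch *Blink;
};

// Hypothetical and much smaller than the real loader entry.
struct module_sketch
{
    list_entry_sketch Links;   // must stay the first member (see the cast below)
    void *Base;
    const char16_t *Name;
};

static void InitList(list_entry_sketch *Head)
{
    Head->Flink = Head;
    Head->Blink = Head;
}

static void AppendModule(list_entry_sketch *Head, module_sketch *Module)
{
    assert(Head && Module);
    Module->Links.Flink = Head;
    Module->Links.Blink = Head->Blink;
    Head->Blink->Flink = &Module->Links;
    Head->Blink = &Module->Links;
}

static bool SameName(const char16_t *A, const char16_t *B)
{
    while (*A && *A == *B) { A++; B++; }
    return *A == *B;
}

// The head is a sentinel, not a module, so the walk ends when it comes back
// around to it. Returns the module base, or nullptr when nothing matches.
static void *FindModuleBase(list_entry_sketch *Head, const char16_t *Wanted)
{
    for (list_entry_sketch *Link = Head->Flink; Link != Head; Link = Link->Flink)
    {
        module_sketch *Module = (module_sketch*)Link;   // Links is at offset 0
        if (SameName(Module->Name, Wanted)) return Module->Base;
    }
    return nullptr;
}
```

Comparing against the head guarantees termination even if no entry matches, at the cost of having to keep the head pointer around.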


Parsing the Kernel32 Memory Layout (PE Image)

With the base address of Kernel32.dll in hand, it's now time to find the location of the GetProcAddress() symbol. To do this, I referred to the Portable Executable (PE) specification to get the layout of the various tables and fields. There are several levels of indirection, with most fields holding addresses relative to the base of the image (i.e., the address we obtained above).

In a nutshell, the MS-DOS header provides the offset to the PE header, which provides the offset to the export table, which provides the offset to the name pointer table. I defined the following macro to make this easier to work with:

#define PE_GET_OFFSET(module, offset) ((uint8*)(module) + (offset))
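
To make the first hop concrete, here is a small sketch that follows the offset at 0x3C from the MS-DOS header to the PE signature inside an arbitrary byte buffer. `ReadU32()`, `ResolveRVA()` and `FindPEHeader()` are hypothetical helpers, and the bounds checks are an addition of mine; the program in this post trusts the loaded image and skips them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a uint32 at Base + Offset without alignment assumptions.
// Assumes a little-endian host, which holds for x86 and x64.
static uint32_t ReadU32(const uint8_t *Base, size_t Offset)
{
    uint32_t Value;
    memcpy(&Value, Base + Offset, sizeof(Value));
    return Value;
}

// Same job as PE_GET_OFFSET: turn an image-relative offset into a pointer.
static const uint8_t *ResolveRVA(const uint8_t *Image, uint32_t RVA)
{
    return Image + RVA;
}

// Returns a pointer to the "PE\0\0" signature, or nullptr if the buffer does
// not look like a PE image.
static const uint8_t *FindPEHeader(const uint8_t *Image, size_t ImageSize)
{
    if (ImageSize < 0x40 || Image[0] != 'M' || Image[1] != 'Z') return nullptr;

    uint32_t PEOffset = ReadU32(Image, 0x3C);      // offset of the PE header
    if (PEOffset > ImageSize - 4) return nullptr;  // signature must fit

    const uint8_t *PE = ResolveRVA(Image, PEOffset);
    if (memcmp(PE, "PE\0\0", 4) != 0) return nullptr;
    return PE;
}
```

The same add-the-base step repeats for every later table, which is why a single macro covers all of them.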


The name pointer table is an array of offsets to strings that name the exported symbols. This is where we can look for "GetProcAddress". The strings are in lexical order, so you can use a binary search to quickly find its index in the table. Once we have the index of this string, we use the same index in the ordinal table, which in turn gives us the index into the export address table. The value obtained from this table is again an offset, so we add this value to the base address of Kernel32.dll and finally, we have the address of GetProcAddress()!

GetProcAddress_t*
GetGetProcAddress(HMODULE Kernel32)
{
    // Module is now in the EXE Format
    win32_msdos *MSDOSHeader = (win32_msdos*)PE_GET_OFFSET(Kernel32, 0);
    win32_pe *PEHeader = (win32_pe*)PE_GET_OFFSET(Kernel32, MSDOSHeader->PEOffset);
    win32_pe_export_table *ExportTable = (win32_pe_export_table*)PE_GET_OFFSET(Kernel32, PEHeader->ExportTable.VirtualAddress);
    uint32* NamePointerTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->NamePointerRVA);

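    // binary search for GetProcAddress over the sorted name table
    int Low = 0;
    int High = (int)ExportTable->NumberofNamePointers - 1;
    int Index = -1;
    while (Low <= High)
    {
        int Middle = Low + (High - Low) / 2;
        char *ProcName = (char*)PE_GET_OFFSET(Kernel32, NamePointerTable[Middle]);
        int CompareResult = CompareStrings("GetProcAddress", ProcName);
        if (CompareResult == 0)
        {
            Index = Middle;
            break;
        }
        else if (CompareResult > 0)
        {
            Low = Middle + 1;   // target sorts after ProcName
        }
        else
        {
            High = Middle - 1;  // target sorts before ProcName
        }
    }
    if (Index < 0)
    {
        return 0; // not found; the loop now terminates instead of spinning forever
    }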

    // the same Index is used for the ordinal value
    uint16* OrdinalTable = (uint16*)PE_GET_OFFSET(Kernel32, ExportTable->OrdinalTableRVA);
    uint16 GetProcAddressOrdinal = OrdinalTable[Index];

    uint32* ExportAddressTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->ExportAddressTableRVA);
    uint32 GetProcAddressRVA = ExportAddressTable[GetProcAddressOrdinal];
    return (GetProcAddress_t*)PE_GET_OFFSET(Kernel32, GetProcAddressRVA);
}
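
The name, ordinal and address lookup can also be tried away from a real DLL. The sketch below runs the same three-table walk over small hand-made tables; `export_tables_sketch`, `FindExportRVA()` and the demo data are all hypothetical, and the RVA values are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, already-resolved view of the three export tables.
struct export_tables_sketch
{
    const char *const *Names;    // sorted ascending, like the name pointer table
    const uint16_t *Ordinals;    // parallel to Names
    const uint32_t *Addresses;   // indexed by the value read from Ordinals
    uint32_t NameCount;
    uint32_t AddressCount;
};

// Returns the RVA stored for Wanted, or 0 when the name is not exported.
static uint32_t FindExportRVA(const export_tables_sketch *Tables, const char *Wanted)
{
    assert(Tables && Wanted);
    uint32_t Low = 0;
    uint32_t High = Tables->NameCount;   // half-open range [Low, High)
    while (Low < High)
    {
        uint32_t Middle = Low + (High - Low) / 2;
        int Compare = strcmp(Wanted, Tables->Names[Middle]);
        if (Compare == 0)
        {
            // The name index selects an ordinal-table slot, and the value in
            // that slot selects the address-table slot.
            uint16_t AddressIndex = Tables->Ordinals[Middle];
            if (AddressIndex >= Tables->AddressCount) return 0;
            return Tables->Addresses[AddressIndex];
        }
        if (Compare > 0) Low = Middle + 1;
        else High = Middle;
    }
    return 0;
}

// Made-up demo data: three names in sorted order with shuffled ordinals.
static const char *const DemoNames[] = { "GetModuleHandleA", "GetProcAddress", "LoadLibraryA" };
static const uint16_t DemoOrdinals[] = { 2, 0, 1 };
static const uint32_t DemoAddresses[] = { 0x1100, 0x1200, 0x1000 };
static const export_tables_sketch DemoTables = { DemoNames, DemoOrdinals, DemoAddresses, 3, 3 };
```

With this data, `FindExportRVA(&DemoTables, "GetProcAddress")` lands on name index 1, reads ordinal-table value 0, and returns `DemoAddresses[0]`. The point of the shuffled ordinals is that the name index and the address index are generally not the same number.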


Loading Functions with X Macros

Now that we have GetProcAddress(), we can use that along with the Kernel32.dll address to obtain the address of LoadLibraryA(). From there we can load anything we need. To make the loading code less verbose, I used the concept of X Macros to define a macro variable (WIN32_DYNAMIC_PROCS) that contains all of the functions I want to load, each wrapped in a not-yet-defined function-like macro, WPROC. Here's a snippet:

#define WIN32_DYNAMIC_PROCS \
    WPROC(Kernel32, "GetModuleHandleA", HMODULE WINAPI, GetModuleHandleA_, (LPCSTR lpModuleName)) \
    WPROC(User32, "MessageBoxA", int WINAPI, MessageBoxA_, (HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)) \
    WPROC(User32, "RegisterClassA", ATOM WINAPI, RegisterClassA_, (const WNDCLASSA *lpWndClass)) \
    WPROC(User32, "DefWindowProcA", LRESULT WINAPI, DefWindowProcA_, (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)) \
    WPROC(User32, "CreateWindowExA", HWND WINAPI, CreateWindowExA_, (DWORD dwExStyle, LPCSTR lpClassName, LPCSTR lpWindowName, DWORD dwStyle, int x, int y, int nWidth, int nHeight, HWND hWndParent, HMENU hMenu, HINSTANCE hInstance, LPVOID lpParam)) \
    WPROC(User32, "GetMessageA", BOOL WINAPI, GetMessageA_, (LPMSG lpMsg, HWND hWnd, UINT wMsgFilterMin, UINT wMsgFilterMax)) \
    WPROC(User32, "TranslateMessage", BOOL WINAPI, TranslateMessage_, (const MSG *lpMsg)) \
    WPROC(User32, "DispatchMessageA", LRESULT WINAPI, DispatchMessageA_, (const MSG *lpmsg))


Later on, I defined the WPROC macro to set up the typedef and global static variable:

#define WPROC(Module, DllName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;

WIN32_DYNAMIC_PROCS

#undef WPROC


Finally, I redefined WPROC in the second half of my DynamicLink() function to get the address of each proc:

#define WPROC(Module, DllName, ReturnType, Name, Params) \
    Name = (Name##t*)GetProcAddress(Module, DllName);

    WIN32_DYNAMIC_PROCS

#undef WPROC
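
The two-expansion pattern can be tried without Windows at all. In this sketch `MockGetProcAddress()` stands in for GetProcAddress(), and `AddOne`/`Twice` stand in for DLL exports; all of these names, and the `DPROC` macro, are hypothetical and exist only for the demonstration.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for functions a DLL would export.
static int AddOne(int Value) { return Value + 1; }
static int Twice(int Value) { return Value * 2; }

typedef void generic_proc(void);

// Stand-in for GetProcAddress: maps an exported name to a function pointer.
static generic_proc *MockGetProcAddress(const char *Name)
{
    if (strcmp(Name, "AddOne") == 0) return (generic_proc*)AddOne;
    if (strcmp(Name, "Twice") == 0) return (generic_proc*)Twice;
    return nullptr;
}

// One row per function: exported name, return type, pointer name, parameters.
#define DEMO_PROCS \
    DPROC("AddOne", int, AddOne_, (int Value)) \
    DPROC("Twice",  int, Twice_,  (int Value))

// Expansion 1: a function typedef and a global pointer for every row.
#define DPROC(ExportName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;
DEMO_PROCS
#undef DPROC

// Expansion 2: one assignment per row. Returns how many names failed to load.
static int LoadDemoProcs(void)
{
    int Missing = 0;
#define DPROC(ExportName, ReturnType, Name, Params) \
    Name = (Name##t*)MockGetProcAddress(ExportName); \
    if (!Name) Missing++;
    DEMO_PROCS
#undef DPROC
    return Missing;
}
```

After `LoadDemoProcs()` returns 0, `AddOne_(41)` calls through the loaded pointer. Adding a function means adding one row to `DEMO_PROCS`; both expansions pick it up automatically. Counting failures is an extra the program in this post does not do.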


I appended underscores to all of the global function pointers so that they don't conflict with the declarations in the Windows header, and then used simple #define macros so the code can still call them by their original names. One could get rid of this by declaring only what is needed from Windows.h and not including it.

Here's the complete program:

// example executable without import libraries
// compile with: cl /O2 /GS- main.cpp /link /NOLOGO /NODEFAULTLIB /SUBSYSTEM:WINDOWS /MACHINE:X64 /ENTRY:"MainStartup"

#include <Windows.h>
#include <stdint.h>
#include <intrin.h>

typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef uint64_t uint64;

#define GetProcAddress GetProcAddress_
#define LoadLibraryA LoadLibraryA_
#define MessageBoxA MessageBoxA_
#define RegisterClassA RegisterClassA_
#define GetModuleHandleA GetModuleHandleA_
#define DefWindowProcA DefWindowProcA_
#define CreateWindowExA CreateWindowExA_
#define GetMessageA GetMessageA_
#define TranslateMessage TranslateMessage_
#define DispatchMessageA DispatchMessageA_
#define PostQuitMessage PostQuitMessage_
#define BeginPaint BeginPaint_
#define EndPaint EndPaint_
#define GetStockObject GetStockObject_
#define SelectObject SelectObject_
#define TextOutA TextOutA_
#define GetTextMetricsA GetTextMetricsA_
#define GetClientRect GetClientRect_

#define PE_GET_OFFSET(module, offset) ((uint8*)(module) + (offset))

struct win32_ldr_data_entry
{
    LIST_ENTRY LinkedList;
    LIST_ENTRY UnusedList;
    PVOID BaseAddress;
    PVOID Reserved2[1];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    USHORT DllNameLength;
    USHORT DllNameMaximumLength;
    PWSTR  DllNameBuffer;
};

struct win32_ldr_data
{
    uint8 Padding1[0x20];
    win32_ldr_data_entry *LoaderDataEntry;
};

struct win32_peb
{
    uint8 Padding1[0x18];
    win32_ldr_data *LoaderData;
};

struct win32_teb
{
    uint8 Padding1[0x60];
    win32_peb *PEB;
};

struct win32_msdos
{
    uint8 Padding1[0x3C];
    uint32 PEOffset;
};

struct win32_pe_image_data
{
    uint32 VirtualAddress;
    uint32 Size;
};

struct win32_pe
{
    // COFF
    uint8 Signature[4];
    uint16 Machine;
    uint16 NumberOfSections;
    uint32 TimeDateStamp;
    uint32 PointerToSymbolTable;
    uint32 NumberOfSymbols;
    uint16 SizeOfOptionalHeader;
    uint16 Characteristics;

    // Assuming PE32+ Optional Header since this is 64bit only
    // standard fields
    uint16 Magic; 
    uint8 MajorLinkerVersion;
    uint8 MinorLinkerVersion;
    uint32 SizeOfCode;
    uint32 SizeOfInitializedData;
    uint32 SizeOfUninitializedData;
    uint32 AddressOfEntryPoint;
    uint32 BaseOfCode;

    // windows specific fields
    uint64 ImageBase;
    uint32 SectionAlignment;
    uint32 FileAlignment;
    uint16 MajorOperatingSystemVersion;
    uint16 MinorOperatingSystemVersion;
    uint16 MajorImageVersion;
    uint16 MinorImageVersion;
    uint16 MajorSubsystemVersion;
    uint16 MinorSubsystemVersion;
    uint32 Win32VersionValue;
    uint32 SizeOfImage;
    uint32 SizeOfHeaders;
    uint32 CheckSum;
    uint16 Subsystem;
    uint16 DllCharacteristics;
    uint64 SizeOfStackReserve;
    uint64 SizeOfStackCommit;
    uint64 SizeOfHeapReserve;
    uint64 SizeOfHeapCommit;
    uint32 LoaderFlags;
    uint32 NumberOfRvaAndSizes;

    // data directories
    win32_pe_image_data ExportTable;
    win32_pe_image_data ImportTable;
    win32_pe_image_data ResourceTable;
    win32_pe_image_data ExceptionTable;
    win32_pe_image_data CertificateTable;
    win32_pe_image_data BaseRelocationTable;
    win32_pe_image_data Debug;
    win32_pe_image_data Architecture;
    win32_pe_image_data GlobalPtr;
    win32_pe_image_data TLSTable;
    win32_pe_image_data LoadConfigTable;
    win32_pe_image_data BoundImport;
    win32_pe_image_data IAT;
    win32_pe_image_data DelayImportDescriptor;
    win32_pe_image_data CLRRuntimeHeader;
    win32_pe_image_data ReservedTable;
};

struct win32_pe_export_table
{
    uint32 ExportFlags;
    uint32 TimeDateStamp;
    uint16 MajorVersion;
    uint16 MinorVersion;
    uint32 NameRVA;
    uint32 OrdinalBase;
    uint32 AddressTableEntries;
    uint32 NumberofNamePointers;
    uint32 ExportAddressTableRVA;
    uint32 NamePointerRVA;
    uint32 OrdinalTableRVA;
};

int 
CompareMemory(void *p1, void *p2, size_t n)
{
    uint8* b1 = (uint8*)p1;
    uint8* b2 = (uint8*)p2;
    while (n && *b1 == *b2) {
        n--;
        b1++;
        b2++;
    }
    return n ? *b1 - *b2 : 0;
}

int
CompareStrings(char *s1, char *s2)
{
    while (*s1 && *s2 && *s1 == *s2)
    {
        s1++;
        s2++;
    }
    return (*s1 == *s2) ? 0 : *s1 - *s2;
}

HMODULE
GetKernel32Module()
{
    wchar_t Kernel32Name[] = L"kernel32.dll";
    win32_teb* TEB = (win32_teb*)__readgsqword(0x30);
    win32_ldr_data_entry* LoaderDataEntry = TEB->PEB->LoaderData->LoaderDataEntry;
    
    while (LoaderDataEntry->DllBase) {
        if (CompareMemory(LoaderDataEntry->DllNameBuffer, Kernel32Name, min(LoaderDataEntry->DllNameLength, sizeof(Kernel32Name))) == 0)
        {
            return (HMODULE)LoaderDataEntry->BaseAddress;
        }
        LoaderDataEntry = (win32_ldr_data_entry*)(LoaderDataEntry->LinkedList.Flink);
    }

    return NULL;
}

typedef FARPROC WINAPI GetProcAddress_t(HMODULE Module, LPCSTR ProcName);
static GetProcAddress_t *GetProcAddress_;

typedef HMODULE WINAPI LoadLibraryA_t(LPCSTR FileName);
static LoadLibraryA_t *LoadLibraryA_;

GetProcAddress_t*
GetGetProcAddress(HMODULE Kernel32)
{
    // Module is now in the EXE Format
    win32_msdos *MSDOSHeader = (win32_msdos*)PE_GET_OFFSET(Kernel32, 0);
    win32_pe *PEHeader = (win32_pe*)PE_GET_OFFSET(Kernel32, MSDOSHeader->PEOffset);
    win32_pe_export_table *ExportTable = (win32_pe_export_table*)PE_GET_OFFSET(Kernel32, PEHeader->ExportTable.VirtualAddress);
    uint32* NamePointerTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->NamePointerRVA);

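    // binary search for GetProcAddress over the sorted name table
    int Low = 0;
    int High = (int)ExportTable->NumberofNamePointers - 1;
    int Index = -1;
    while (Low <= High)
    {
        int Middle = Low + (High - Low) / 2;
        char *ProcName = (char*)PE_GET_OFFSET(Kernel32, NamePointerTable[Middle]);
        int CompareResult = CompareStrings("GetProcAddress", ProcName);
        if (CompareResult == 0)
        {
            Index = Middle;
            break;
        }
        else if (CompareResult > 0)
        {
            Low = Middle + 1;   // target sorts after ProcName
        }
        else
        {
            High = Middle - 1;  // target sorts before ProcName
        }
    }
    if (Index < 0)
    {
        return 0; // not found; the loop now terminates instead of spinning forever
    }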

    // the same Index is used for the ordinal value
    uint16* OrdinalTable = (uint16*)PE_GET_OFFSET(Kernel32, ExportTable->OrdinalTableRVA);
    uint16 GetProcAddressOrdinal = OrdinalTable[Index];

    uint32* ExportAddressTable = (uint32*)PE_GET_OFFSET(Kernel32, ExportTable->ExportAddressTableRVA);
    // The PE Documentation explicitly says you must subtract the OrdinalBase from the Ordinal to get the true
    // index into the address table, however, I found through testing that this is not the case.
    // This appears to confirm a problem with the documentation: http://stackoverflow.com/questions/5653316/pe-export-directory-tables-ordinalbase-field-ignored
    uint32 GetProcAddressRVA = ExportAddressTable[GetProcAddressOrdinal];
    return (GetProcAddress_t*)PE_GET_OFFSET(Kernel32, GetProcAddressRVA);
}

#define WIN32_DYNAMIC_PROCS \
    WPROC(Kernel32, "GetModuleHandleA", HMODULE WINAPI, GetModuleHandleA_, (LPCSTR lpModuleName)) \
    WPROC(User32, "MessageBoxA", int WINAPI, MessageBoxA_, (HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType)) \
    WPROC(User32, "RegisterClassA", ATOM WINAPI, RegisterClassA_, (const WNDCLASSA *lpWndClass)) \
    WPROC(User32, "DefWindowProcA", LRESULT WINAPI, DefWindowProcA_, (HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)) \
    WPROC(User32, "CreateWindowExA", HWND WINAPI, CreateWindowExA_, (DWORD dwExStyle, LPCSTR lpClassName, LPCSTR lpWindowName, DWORD dwStyle, int x, int y, int nWidth, int nHeight, HWND hWndParent, HMENU hMenu, HINSTANCE hInstance, LPVOID lpParam)) \
    WPROC(User32, "GetMessageA", BOOL WINAPI, GetMessageA_, (LPMSG lpMsg, HWND hWnd, UINT wMsgFilterMin, UINT wMsgFilterMax)) \
    WPROC(User32, "TranslateMessage", BOOL WINAPI, TranslateMessage_, (const MSG *lpMsg)) \
    WPROC(User32, "DispatchMessageA", LRESULT WINAPI, DispatchMessageA_, (const MSG *lpmsg)) \
    WPROC(User32, "PostQuitMessage", VOID WINAPI, PostQuitMessage_, (int nExitCode)) \
    WPROC(User32, "BeginPaint", HDC, BeginPaint_, (HWND hwnd, LPPAINTSTRUCT lpPaint)) \
    WPROC(User32, "EndPaint", BOOL, EndPaint_, (HWND hWnd, const PAINTSTRUCT *lpPaint)) \
    WPROC(User32, "GetClientRect", BOOL WINAPI, GetClientRect_, (HWND hWnd, LPRECT lpRect)) \
    WPROC(Gdi32, "GetStockObject", HGDIOBJ, GetStockObject_, (int fnObject)) \
    WPROC(Gdi32, "SelectObject", HGDIOBJ, SelectObject_, (HDC hdc, HGDIOBJ hgdiobj)) \
    WPROC(Gdi32, "TextOutA", BOOL, TextOutA_, (HDC hdc, int nXStart, int nYStart, LPCSTR lpString, int cchString)) \
    WPROC(Gdi32, "GetTextMetricsA", BOOL, GetTextMetricsA_, (HDC hdc, LPTEXTMETRIC lptm))

#define WPROC(Module, DllName, ReturnType, Name, Params) \
    typedef ReturnType Name##t Params; \
    static Name##t *Name;

WIN32_DYNAMIC_PROCS

#undef WPROC

void
DynamicLink(void)
{
    HMODULE Kernel32 = GetKernel32Module();
    GetProcAddress = GetGetProcAddress(Kernel32);
    LoadLibraryA = (LoadLibraryA_t*)GetProcAddress(Kernel32, "LoadLibraryA");
    HMODULE User32 = LoadLibraryA("user32.dll");
    HMODULE Gdi32 = LoadLibraryA("gdi32.dll");

#define WPROC(Module, DllName, ReturnType, Name, Params) \
    Name = (Name##t*)GetProcAddress(Module, DllName);

    WIN32_DYNAMIC_PROCS

#undef WPROC
}

LRESULT CALLBACK
WndProc(HWND Window, UINT Message, WPARAM WParam, LPARAM LParam)
{
    LRESULT result = 0;

    switch (Message)
    {
    case WM_PAINT:
    {
        char Text[] = "No Libraries!";
        PAINTSTRUCT Paint;
        TEXTMETRIC Metrics;
        RECT Rect;
        HDC DeviceContext = BeginPaint(Window, &Paint);
        SelectObject(DeviceContext, GetStockObject(ANSI_FIXED_FONT));
        GetTextMetricsA(DeviceContext, &Metrics);
        GetClientRect(Window, &Rect);

        // Center text
        int TextLen = sizeof(Text) - 1;
        int TextWidth = TextLen * Metrics.tmAveCharWidth;
        int TextHeight = Metrics.tmHeight;
        int X = max((Rect.right - Rect.left - TextWidth) / 2, 0);
        int Y = max((Rect.bottom - Rect.top - TextHeight) / 2, 0);
        TextOutA(DeviceContext, X, Y, Text, TextLen);

        EndPaint(Window, &Paint);
    }
    break;

    case WM_CLOSE:
        PostQuitMessage(0);
        break;

    default:
        result = DefWindowProcA(Window, Message, WParam, LParam);
        break;
    }

    return result;
}

void 
MainStartup(void)
{
    DynamicLink();

    HINSTANCE Instance = GetModuleHandleA(0);
    WNDCLASSA windowClass = {};
    windowClass.style = CS_HREDRAW | CS_VREDRAW ;
    windowClass.lpfnWndProc = WndProc;
    windowClass.hInstance = Instance;
    windowClass.lpszClassName = "NoLibrariesWindowClass";
    windowClass.hbrBackground = (HBRUSH)GetStockObject(WHITE_BRUSH);

    if (RegisterClassA(&windowClass))
    {
        HWND WindowHandle = CreateWindowExA(
            0, "NoLibrariesWindowClass", "Greetings",
            WS_OVERLAPPEDWINDOW | WS_VISIBLE,
            CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
            0, 0, Instance,	0
            );

        if (WindowHandle)
        {
            MSG Message;
            while (GetMessageA(&Message, NULL, 0, 0))
            {
                TranslateMessage(&Message);
                DispatchMessageA(&Message);
            }
        }
    }

    return;
}
Haha great post! I've only used the following with 32 bit code. Never tested it with 64 bit code.

If you aren't using the CRT at all then you can do some fun tricks. When the Windows loader calls the entry point, eax is loaded with a pointer into kernel32.dll (and on the stack there's a pointer into ntdll.dll if you wanted to use the undocumented API). Coincidentally, from what I've found, this pointer is always one 64 KB "page" (0x10000 bytes) away from the base of kernel32.dll (all modules are loaded on 64 KB boundaries) on all versions of Windows XP and later.

void Entry() {
	DWORD base;
	_asm {
		xor ax, ax		// align eax pointer to page boundaries by clearing lower word
		sub eax, 0x10000		// roll back a page to the base address of k32
		mov base, eax
	}
	// do something with base...
}

If you're uncomfortable with the single-page away thing on all versions of Windows, to future proof yourself you can always just roll back pages until you get the base instead of just doing it once:
void Entry() {
	DWORD base;
	_asm mov base, eax
	for(base &= 0xFFFF0000; *(short *)base != 'ZM'; base -= 0x10000)
		if (base <= 0)
			return;

	auto pe = (PIMAGE_NT_HEADERS)(base + ((PIMAGE_DOS_HEADER)base)->e_lfanew);
	auto exportTable = (PIMAGE_EXPORT_DIRECTORY)(base + pe->OptionalHeader.DataDirectory[0].VirtualAddress);
	auto exportNames = (PSTR *)(base + exportTable->AddressOfNames);
	//auto exportFuncs = (DWORD *)(base + exportTable->AddressOfFunctions);

	for(DWORD i = 0; i < exportTable->NumberOfNames; ++i)
		MessageBoxA(0, (PSTR)(base + exportNames[i]), "Export:", MB_OK);
}

The above is just a little test snippet. You'd want to scan for the LoadLibrary and GetProcAddress exports. If you're sizecoding and have the base of any general dll, this is often done by comparing hashes of the export names rather than the strings themselves, or through other methods. This sort of loop is also quite useful if you want to find the base address of any module that you have a pointer into.
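
A minimal sketch of the hash-comparison idea follows. The rotate-right-by-13-and-add hash is only one common choice and is an assumption here, since no particular hash is named above; `HashName()`, `FindNameByHash()` and the demo names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate the running hash right by 13 bits, then add the next character.
static uint32_t HashName(const char *Name)
{
    uint32_t Hash = 0;
    while (*Name)
    {
        Hash = (Hash >> 13) | (Hash << 19);
        Hash += (uint8_t)*Name++;
    }
    return Hash;
}

// Linear scan over export names; returns the index of the first name whose
// hash equals Wanted, or -1 if none does.
static int FindNameByHash(const char *const *Names, int Count, uint32_t Wanted)
{
    assert(Names);
    for (int i = 0; i < Count; i++)
    {
        if (HashName(Names[i]) == Wanted) return i;
    }
    return -1;
}

// Made-up stand-in for a DLL's export name table.
static const char *const DemoExportNames[] = { "GetModuleHandleA", "GetProcAddress", "LoadLibraryA" };
```

In a size-coded program the wanted hash would be a precomputed constant, so the name string never has to be stored in the executable; the sketch computes it at run time only to stay self-contained.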

Another trick is that the Windows loader puts the address of the PEB in ebx. Kinda useful if you wanna save bytes instead of getting it through the TEB.
#include <windows.h>
#include <winternl.h>

typedef int (WINAPI *MsgBoxWPtr)(HWND, PWSTR, PWSTR, UINT);

void Entry() {
	PPEB peb;
	_asm mov peb, ebx
	((MsgBoxWPtr)GetProcAddress(LoadLibrary("user32.dll"), "MessageBoxW"))
		(0, peb->ProcessParameters->CommandLine.Buffer, L"cmdline", MB_OK);
}



EDIT: Now I'm really off topic, but another cool thing that I haven't actually tested but have wanted to for a while is using the OpenGL driver addresses in the TEB. It's undocumented, but at (using 32 bit code) fs:[18] there's a GlDispatchTable[0x118] containing function pointers to all the OpenGL functions that you could call. This could actually circumvent going through opengl32.dll, but it would just cause more trouble for yourself. It'd probably be most useful if you wanted to hook one of them from a dll.

Edited by a_null on
AWESOME!!!

- Casey
This was not working for me on Windows 8.1 x64.

I needed to change the CompareMemory function to be case insensitive, because win32_ldr_data_entry had the kernel32.dll module name in uppercase: "KERNEL32.DLL". ntdll.dll is lowercase though.

int 
CompareMemory(void *p1, void *p2, size_t n)
{
    uint8* b1 = (uint8*)p1;
    uint8* b2 = (uint8*)p2;
    while (n)
    {
        uint8 c1 = *b1;
        uint8 c2 = *b2;
        c1 = c1 >= 'A' && c1 <= 'Z' ? c1 - 'A' + 'a' : c1;
        c2 = c2 >= 'A' && c2 <= 'Z' ? c2 - 'A' + 'a' : c2;
        if (c1 != c2) return c1 - c2;
        n--;
        b1++;
        b2++;
    }
    return 0; // n is always 0 here: every byte matched
}


But pretty cool that this works. Although I could imagine that some anti-virus or anti-malware software would be suspicious of such executables.
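
Since the module names are UTF-16, a variation on the fix above is to fold case per 16-bit code unit rather than per byte, and to require equal lengths so that a longer name sharing the same prefix cannot match. This is only a sketch under those assumptions; `FoldAsciiCase()`, `CompareNamesNoCase()` and `ModuleNameMatches()` are hypothetical names, and only ASCII letters are folded.

```cpp
#include <cassert>
#include <cstddef>

// Lowercases A-Z and leaves every other UTF-16 code unit untouched.
static char16_t FoldAsciiCase(char16_t Unit)
{
    return (Unit >= u'A' && Unit <= u'Z') ? (char16_t)(Unit - u'A' + u'a') : Unit;
}

// Compares Count code units, ignoring ASCII case.
static int CompareNamesNoCase(const char16_t *A, const char16_t *B, size_t Count)
{
    for (size_t i = 0; i < Count; i++)
    {
        char16_t UnitA = FoldAsciiCase(A[i]);
        char16_t UnitB = FoldAsciiCase(B[i]);
        if (UnitA != UnitB) return (int)UnitA - (int)UnitB;
    }
    return 0;
}

// NameBytes is a length in bytes, like the DllNameLength field in the post;
// WantedUnits is a length in code units without the terminator.
static bool ModuleNameMatches(const char16_t *Name, size_t NameBytes,
                              const char16_t *Wanted, size_t WantedUnits)
{
    return NameBytes == WantedUnits * sizeof(char16_t) &&
           CompareNamesNoCase(Name, Wanted, WantedUnits) == 0;
}
```

Folding whole code units avoids touching the high byte of a non-ASCII character whose low byte happens to fall in the 'A' to 'Z' range.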
what you did is avoid LINKING to kernel32.dll Right?

but programs on windows will always have kernel32.dll loaded

But what are the benefits of avoiding linking to kernel32.dll since it is there on every machine? Are there any?

The other thing is, shouldn't you call ExitProcess() rather than return from the main function?

Your work is VERY VERY NICE.
Thanks
aameen951
what you did is avoid LINKING to kernel32.dll Right?

but programs on windows will always have kernel32.dll loaded

But what are the benefits of avoiding linking to kernel32.dll since it is there on every machine? Are there any?

The other thing is, shouldn't you call ExitProcess() rather than return from the main function?

Your work is VERY VERY NICE.
Thanks


Thank you. kernel32.dll is loaded into every Windows program as far as I am aware. What I did was avoid linking the program with the import library (kernel32.lib). Here is a fairly good explanation of the difference between using an import library versus the LoadLibrary()/GetProcAddress() combo:

http://stackoverflow.com/question...port-library-work-details#3573527
OK, now I understand what you mean.
Now you don't need kernel32.lib, or any other .lib file, to compile the program.
But according to the link, the GCC linker doesn't need these .lib files; it can get the information it needs from the DLL directly.
It seems possible, so I don't know why the MSVC linker doesn't do it.

Anyway, the code you showed looks reliable, and it is boilerplate: you put it in and off you go, and that's good.

You didn't say whether you need to call ExitProcess at the end of MainStartup instead of returning, because the other post said "you should not return from WinMainCRTStartup function ever, so I am calling ExitProcess function at end of it".

Edited by Ameen Sayegh on
The MSVC linker cannot use a dll file directly. But you can easily generate an import library (.lib file) from a dll file.

First generate a .def file. It's a simple text file that lists the exports of the dll file. You can write it either by hand or by using the pexports utility: http://sourceforge.net/projects/m.../pexports-0.46-mingw32-bin.tar.xz
Then use lib.exe (from the MSVC command line) to create the .lib file from the generated .def file.

Here's an example for OpenGL32.dll:
pexports.exe C:\Windows\System32\OpenGL32.dll >OpenGL32.def
lib.exe /def:OpenGL32.def /OUT:OpenGL32.lib

Edited by Mārtiņš Možeiko on
I am making an attempt to get this loader working in x64 assembly. I have it almost working; however, LoadLibraryA is crashing for some odd reason. Maybe someone here knows why this could happen. My program is assembled with NASM, and I have a sneaking suspicion it may have something to do with me overlapping the DOS and PE headers to save space.

EDIT: mmozeiko found my issue, thanks to him.

Working code can be found here
https://gist.github.com/mojobojo/921a5af897e86bb940a2

Edited by mojobojo on
Verify your suspicion: don't overlap the headers. Does it still crash then? If so, use the [strike]force[/strike] debugger! Where does the crash happen? Which place in the code is address 0000000140010253? Is it the call to GetProcAddress? Is it the call to MessageBoxA? Are the values in the registers you are passing correct, both the arguments and the actual pointer you are jumping to? Compare with the real import address. Is your stack 16-byte aligned when calling a function (not counting the pushed return address)? Are you accounting for the "home space", the 8*4=32 bytes of stack that can be used by the functions you are calling (don't store values that are important to you there)?
mmozeiko
Is your stack 16-byte aligned when calling a function (not counting the pushed return address)? Are you accounting for the "home space", the 8*4=32 bytes of stack that can be used by the functions you are calling (don't store values that are important to you there)?


You got it, it was my stack alignment. I had NO IDEA that the stack needed to be 16-byte aligned. Thank you, I can't believe that's what it was. We need a rep system. +1 for you my friend.

So check this out. Windows doesn't give me a properly aligned stack pointer from the start. I just had a look at an application and it subtracts like this to align it...

sub     rsp, 28h

Edited by mojobojo on
Yeah, it's documented here: https://msdn.microsoft.com/en-US/library/ew5tede7.aspx

rsp % 16 is always 8 at the entry of a function. The functions you are calling rely on that. Most likely they subtract some number N, where N % 16 = 8, to get rsp 16-byte aligned again, and then store SSE registers with aligned mov instructions. But if you call them with an rsp for which rsp % 16 is 0 at entry, they will end up with a misaligned rsp, and the aligned store or load instruction for the SSE register will raise an exception.
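
The arithmetic can be checked with a few lines of ordinary code. This sketch computes how many bytes a function would subtract from rsp so that rsp is 16-byte aligned at its own calls, given the 32-byte home space; `FrameSize()` and `IsCallAligned()` are hypothetical helpers and the sample rsp value is made up.

```cpp
#include <cassert>
#include <cstdint>

// Smallest frame that holds the 32-byte home space plus LocalBytes of locals
// and is congruent to 8 modulo 16, which cancels the 8-byte return address
// already on the stack at function entry.
static uint64_t FrameSize(uint64_t LocalBytes)
{
    const uint64_t HomeSpace = 32;
    uint64_t Needed = HomeSpace + LocalBytes;
    return ((Needed + 7) & ~(uint64_t)15) + 8;
}

// True when rsp is 16-byte aligned after subtracting Frame from its value at
// function entry.
static bool IsCallAligned(uint64_t RspAtEntry, uint64_t Frame)
{
    return ((RspAtEntry - Frame) % 16) == 0;
}
```

With no locals, `FrameSize(0)` gives 0x28, which matches the `sub rsp, 28h` seen above: 32 bytes of home space plus 8 bytes to restore alignment.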
Well for anyone interested in an assembly example, here it is. I will take some time tomorrow to clean up the code a bit and comment it to make it a little more legible. Thank you very much to mmozeiko for helping me find the problem.

https://gist.github.com/mojobojo/921a5af897e86bb940a2
What changes do I have to make in order to run this masterpiece on a 32 bit platform?
You'll need to change the win32_pe structure to support the 32-bit header, and change the line of code that gets the TEB structure; on 32-bit it's done differently. Otherwise the logic stays the same.
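
As a rough guide to what changes, the sketch below tabulates the offsets involved. The 64-bit values match the structs in the original program; the 32-bit values are my understanding of the usual layouts and should be verified against a real 32-bit process before relying on them. `loader_offsets_sketch` and `OffsetsFor()` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct loader_offsets_sketch
{
    uint32_t TebSelf;                // read via fs (32-bit) or gs (64-bit)
    uint32_t TebToPeb;               // TEB -> PEB pointer
    uint32_t PebToLoaderData;        // PEB -> loader data pointer
    uint32_t LoaderDataToList;       // loader data -> in-memory-order list head
    uint32_t OptionalHeaderMagic;    // 0x10B for PE32, 0x20B for PE32+
    uint32_t ExportDirectoryFromPE;  // "PE\0\0" signature -> export table entry
};

static loader_offsets_sketch OffsetsFor(bool Is64Bit)
{
    if (Is64Bit)
    {
        // Matches the padding sizes and win32_pe layout in the program above.
        return { 0x30, 0x60, 0x18, 0x20, 0x20B, 0x88 };
    }
    // Assumed 32-bit layout: pointers shrink to 4 bytes, and the PE32 optional
    // header is 16 bytes shorter than PE32+ before the data directories.
    return { 0x18, 0x30, 0x0C, 0x14, 0x10B, 0x78 };
}
```

On 32-bit MSVC the TEB read would use `__readfsdword()` in place of `__readgsqword()`, and the win32_pe struct would need a 32-bit ImageBase, the extra BaseOfData field, and 32-bit stack and heap sizes.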