Guide - How to avoid C/C++ runtime on Windows

Right, that is true. However, if you are using a Handmade-style build where everything is a single translation unit, then /GL doesn't really matter. There is nothing to optimize across translation units because there is only one, so you won't gain anything from link-time optimization. Everything can be optimized during compilation with /O2, or /Ox /Ot.
Greetings everyone. I know this is quite an old thread, but I noticed it just a few days ago, and since then I've been trying as hard as I can to implement the casting functions myself with inline assembly. I think I'm pretty much done with the _ftoui3 (float to unsigned int cast) function. I haven't done any exception checking and I don't know how efficient it is, but it seems to work quite nicely. You can judge for yourself:

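__declspec(naked) void _ftoui3() {
    __asm {
        push ebp
        mov ebp, esp

        ; cvttss2si eax, xmm0 ; This instruction also truncates but stores the value as a signed integer, just like fisttp :(

        sub esp, 4
        movd dword ptr [ebp-4], xmm0  ; spill the float argument to the stack
        fld dword ptr [ebp-4]         ; load it onto the x87 stack
        fabs                          ; drop the sign

        sub esp, 4
        fisttp dword ptr [ebp-8]      ; truncate and store as a signed 32-bit integer
        mov eax, [ebp-8]              ; return value goes in eax

        add esp, 8

        pop ebp
        ret
    }
}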


However, I got stuck on the _dtoui3 (double to unsigned int cast) function. The problem is that although there are instructions that truncate floating point numbers, they all share an unfortunate drawback: they store the truncated value as a signed integer, so the result is wrong whenever the value stored in the double exceeds the limit of a signed integer. I've searched all over the Internet for different instructions or methods to tackle this problem, but to no avail.
If any of you could share your wisdom and shed some light on this, I would appreciate it very much. I'm certain there is a way, since the CRT manages to implement it somehow, but the disassembly of this function looks very intimidating and I can't figure anything out from it.
I found this instruction, but it seems to be AVX512. https://www.felixcloutier.com/x86/VCVTTSD2USI.html

Oh man, that's so close!!!
Unfortunately, not many people (myself included) have a processor that supports this instruction set yet :(
Using AVX512 instruction when you are compiling for 32-bit code seems very wrong... :)

Typically you implement _ftoui3 like this:

const float next_after_max_signed_int = 2147483648.0f; // 2^31, represented exactly in float

if (float_value >= next_after_max_signed_int)
{
  // too big for a signed int: shift down by 2^31, convert, then set the top bit again
  int result = (int)(float_value - next_after_max_signed_int);
  return (unsigned int)result ^ 0x80000000u;
}
else
{
  // fits in a signed int: convert directly
  int result = (int)float_value;
  return (unsigned int)result;
}


The if statement does not need to be a real "if"; it can be a conditional move.
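As a rough illustration of selecting the result without an if/else, here is a minimal C sketch. It is not the conditional move itself: it folds the comparison into a 0/1 flag and uses arithmetic instead, and the compiler is free to emit whatever it likes for it. The function name float_to_u32_select is made up for this example, and the sketch assumes the input is in the range [0, 2^32).

```c
#include <stdint.h>

/* Hypothetical helper name, for illustration only.
   Assumes 0 <= value < 2^32; other inputs are not handled. */
uint32_t float_to_u32_select(float value)
{
    const float two_pow_31 = 2147483648.0f;  /* exactly representable in float */

    /* 1 when the value does not fit in a signed 32-bit int, else 0 */
    uint32_t is_large = (uint32_t)(value >= two_pow_31);

    /* Subtract 2^31 only for large values; the subtraction is exact there */
    float adjusted = value - (float)is_large * two_pow_31;

    /* adjusted is now always within signed 32-bit range */
    int32_t low = (int32_t)adjusted;

    /* Put the top bit back for large values */
    return (uint32_t)low + (is_large << 31);
}
```

Unlike the SSE version, this never performs an out-of-range float to int conversion, which matters in C because such a conversion is undefined behavior.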

This is how it would look with SSE2:
// assumes float input is in xmm0 register
movss xmm1, [value_of_2147483648_constant]
movss xmm2, xmm0
subss xmm2, xmm1
cvttss2si eax, xmm2
cvttss2si edx, xmm0
xor eax, 0x80000000
ucomiss xmm1, xmm0
cmova  eax, edx
// unsigned int value now is in eax register


I'm not sure whether this handles Infinity or NaN floats exactly the same way as the regular C runtime cast, but otherwise it should behave the same.

A similar approach works for doubles in 32-bit code.

If you want to avoid SSE instructions, you can probably use x87 FPU instructions in a similar way. Alternatively, you can extract the mantissa and exponent bits and use them to calculate the value yourself. Check the code in the LLVM compiler-rt library that does this: https://github.com/llvm-mirror/co.../builtins/fp_fixuint_impl.inc#L17
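To make the mantissa and exponent idea concrete, here is a minimal C sketch for 32-bit floats. It is written in the spirit of the compiler-rt code but is not a copy of it; the function name float_to_u32_bits is made up, and the choice to return 0 for negative inputs and saturate for too-large inputs (including Infinity and NaN) is an assumption of this sketch that may not match what the CRT cast does.

```c
#include <stdint.h>

/* Hypothetical helper name, for illustration only. */
uint32_t float_to_u32_bits(float value)
{
    /* Reinterpret the float as raw bits through a union (no library call) */
    union { float f; uint32_t u; } pun;
    pun.f = value;
    uint32_t bits = pun.u;

    /* IEEE-754 single: 1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits */
    int32_t  exponent = (int32_t)((bits >> 23) & 0xFF) - 127;
    uint32_t mantissa = (bits & 0x007FFFFFu) | 0x00800000u; /* add implicit leading 1 */

    if ((bits >> 31) != 0 || exponent < 0)
        return 0;               /* negative, zero, denormal, or below 1.0 */
    if (exponent > 31)
        return 0xFFFFFFFFu;     /* does not fit (also Inf/NaN): saturate */

    /* The mantissa holds the value scaled by 2^23; shift it into place */
    if (exponent < 23)
        return mantissa >> (23 - exponent);   /* drops the fractional bits */
    return mantissa << (exponent - 23);
}
```

For example, float_to_u32_bits(3000000000.0f) yields 3000000000 with no floating point instructions involved.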

Edited by Mārtiņš Možeiko on
Awesome!!!
Thank you very much my good man, I'll use this information wisely!
mmozeiko

A similar approach works for doubles in 32-bit code.


How exactly is it similar? The truncation instruction returns a signed value, so you get the wrong result if your number exceeds the signed integer limit, which can happen when you work with doubles.
You need to find another instruction that truncates and returns an unsigned value, or implement the truncation yourself somehow, don't you?
It will work exactly the same:
const double next_after_max_signed_int = 2147483648.0; // 2^31

if (double_value >= next_after_max_signed_int)
{
  // too big for a signed int: shift down by 2^31, convert, then set the top bit again
  int result = (int)(double_value - next_after_max_signed_int);
  return (unsigned int)result ^ 0x80000000u;
}
else
{
  // fits in a signed int: convert directly
  int result = (int)double_value;
  return (unsigned int)result;
}


Or in asm:
// assumes double input is in xmm0 register
// the constant must be stored as a double (8 bytes)
movsd xmm1, [value_of_2147483648_constant]
movsd xmm2, xmm0
subsd xmm2, xmm1
cvttsd2si eax, xmm2
cvttsd2si edx, xmm0
xor eax, 0x80000000
ucomisd xmm1, xmm0
cmova  eax, edx
// unsigned int value now is in eax register


(NOTE: I have not actually tested this code, so verify that it works yourself)

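The main difference is the conversion instruction: cvttss2si vs cvttsd2si (the other scalar instructions also need their double-precision forms, i.e. movsd, subsd and ucomisd). Both return a 32-bit signed int, which you fix up if it is "negative".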

If you are running in 64-bit code, then you can use just one instruction: "cvttsd2si rax, xmm0".
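In C terms, the 64-bit shortcut amounts to converting through a signed 64-bit integer and keeping the low 32 bits. The sketch below shows that idea; the function name double_to_u32_via_i64 is made up, and whether a given compiler actually emits a single cvttsd2si for it is an assumption you should check in the disassembly.

```c
#include <stdint.h>

/* Hypothetical helper name, for illustration only.
   Assumes 0 <= value < 2^32. */
uint32_t double_to_u32_via_i64(double value)
{
    /* Every double in [0, 2^32) fits in a signed 64-bit integer,
       so the signed conversion is well defined for that range. */
    int64_t wide = (int64_t)value;

    /* Keep only the low 32 bits. */
    return (uint32_t)wide;
}
```

No fixup of the top bit is needed here because the signed 64-bit range comfortably covers every unsigned 32-bit value.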

Edited by Mārtiņš Možeiko on
Very late to the discussion, I have waited an episode to go through all this.

Thank you so much mmozeiko, fantastic work!!!
:) :)

Edited by itzjac on
Hi!

Firstly, there are some great posts here and I've learned a lot about dropping the CRT.

But I'm having a problem implementing a DLL with pure virtual base classes - I always get this error

error LNK2001: unresolved external symbol "const type_info::`vftable'" (??_7type_info@@6B@)

I need to call into the class via the base object (from an exe) but I can't get past this problem :-(

Can anyone help resolve this?



Incidentally, I can't link to ANY .libs. I need to be able to compile a DLL on users' machines that don't have the Windows SDK installed (it's for a scripting system). Everything works apart from the base class thing!
Hi,

I found this post really interesting, but there is one point that needs more explanation for me. Why is calling ExitProcess necessary?

Isn't the OS (Windows here) able to close all threads and handles held by the process when it dies?
I know that some short-lived applications like compilers don't bother releasing memory or closing all handles before exiting, so I suppose it gets done somehow even if those applications are written in C or C++.

Does the C/C++ runtime catch all exceptions in order to call ExitProcess?

What issues will there be if ExitProcess isn't called before the real entry point returns?
Closing threads/handles and releasing allocated memory is done by the OS regardless of how the process terminates: with ExitProcess, because of an unhandled exception, or in any other way (for example, a debugger kills the process). ExitProcess by itself has nothing to do with releasing allocated resources.

I believe I put ExitProcess there because in older versions of Windows the process crashed if it returned from the "startup" function: you basically got an unhandled exception. At least that's what I remember. I may be wrong about this because that was a very long time ago, and since then I have simply put ExitProcess there. I checked now, and it seems it can return just fine; the int value you return from this function is used as the process exit code (same as with "main"). So there's no reason to call ExitProcess: you can change the return type of WinMainCRTStartup to int, and it'll work fine.

I'll update the initial post with this.

Edited by Mārtiņš Možeiko on
The CRT normally calls ExitProcess for you after returning from main.

If you don't call it yourself then your process may not exit if you have any other threads running that haven't yet exited.
I checked in MSVC 2019 CRT - it does not call ExitProcess.

WinMainCRTStartup is in "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\crt\src\vcruntime\exe_winmain.cpp" file - that simply calls __scrt_common_main() and returns whatever this function returns.

__scrt_common_main is in "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\crt\src\vcruntime\exe_common.inl" file - and that simply calls invoke_main in same file (which calls your main) and uses its return value to return back to caller.

So CRT does not call ExitProcess. Maybe exe loader (one that calls WinMainCRTStartup) calls ExitProcess for you - but that is not part of CRT.

Edited by Mārtiņš Možeiko on
exe_common.inl:
        //
        // Initialization is complete; invoke main...
        //

        int const main_result = invoke_main();

        //
        // main has returned; exit somehow...
        //

        if (!__scrt_is_managed_app())
            exit(main_result);


Isn't that it right there, the CRT exit function?

"[...]the C runtime library automatically calls ExitProcess when you exit the main thread, regardless of whether there are any worker threads still active." - https://devblogs.microsoft.com/oldnewthing/20100827-00/?p=13023

I stepped through the code to look. After returning from WinMain it checks whether it's a managed app (__scrt_is_managed_app). It's not, so it calls the exit function, which in my case is either in the same executable (-MT, app.exit) or in ucrtbase.dll (-MD, ucrtbase.exit).

Inside the exit function it eventually called ExitProcess. It looks like it could also call TerminateProcess, depending on the "process end policy".