Handmade Hero » Forums » Code » Guide - How to avoid C/C++ runtime on Windows
mmozeiko
Mārtiņš Možeiko
1831 posts / 1 project
#12406 Guide - How to avoid C/C++ runtime on Windows
1 year, 5 months ago

Right, that is true. Although if you are using a Handmade-style build, where everything is a single translation unit, then /GL doesn't really matter. There are no optimizations across translation units because there is only one, so you won't gain anything from link-time optimization. Everything can be optimized during compilation with /O2 or /Ox /Ot.
Joystick
11 posts
#16438 Guide - How to avoid C/C++ runtime on Windows
2 months, 2 weeks ago

Greetings everyone. I know this is quite an old thread, but I noticed it just a few days ago, and since then I've been trying as hard as I can to implement the casting functions myself with inline assembly. I think I'm pretty much done with the _ftoui3 (float to unsigned int cast) function. I haven't done any exception checking and I don't know how efficient it is, but it seems to work quite nicely. You can judge for yourself:

__declspec(naked) void _ftoui3() {
    __asm {
        push ebp
        mov ebp, esp

        ; cvttss2si eax, xmm0 ; This instruction also truncates but stores the value as a signed integer, just like fisttp :(

        sub esp, 4
        movd [ebp-4], xmm0          ; spill the float argument to the stack
        fld dword ptr [ebp-4]       ; load it onto the x87 stack
        fabs

        sub esp, 4
        fisttp dword ptr [ebp-8]    ; truncate and store as a signed 32-bit integer
        mov eax, [ebp-8]

        add esp, 8

        pop ebp
        ret
    }
}


However, I got stuck on the _dtoui3 (double to unsigned int cast) function. The problem is that although there are instructions that truncate floating-point numbers, they all share an unfortunate drawback: they store the truncated value as a signed integer, which loses data when the value stored in the double exceeds the limit of a signed integer. I've searched all over the Internet for different instructions or different methods to tackle this problem, but to no avail.
If any of you could share your wisdom and shed some light on this minor topic, I would appreciate it very much. I'm certain there is a way, since the CRT manages to implement it somehow, but the disassembly of that function looks very intimidating and I can't make anything out of it.
SedatedSnail
5 posts
#16439 Guide - How to avoid C/C++ runtime on Windows
2 months, 1 week ago

I found this instruction, but it seems to be AVX512. https://www.felixcloutier.com/x86/VCVTTSD2USI.html
Joystick
11 posts
#16440 Guide - How to avoid C/C++ runtime on Windows
2 months, 1 week ago

SedatedSnail

I found this instruction, but it seems to be AVX512. https://www.felixcloutier.com/x86/VCVTTSD2USI.html


Oh man, that's so close!!!
Unfortunately, not many people (myself included) have a processor that supports this instruction set yet :(
mmozeiko
Mārtiņš Možeiko
1831 posts / 1 project
#16442 Guide - How to avoid C/C++ runtime on Windows
2 months, 1 week ago Edited by Mārtiņš Možeiko on Sept. 29, 2018, 3:18 a.m.

Using an AVX512 instruction when you are compiling 32-bit code seems very wrong... :)

Typically you implement _ftoui3 like this:

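// illustrative wrapper name; the CRT helper itself is _ftoui3
unsigned int float_to_uint(float float_value)
{
  const float next_after_max_signed_int = 2147483648.0f; // represented exactly in float

  if (float_value >= next_after_max_signed_int)
  {
    // too big for a signed int: shift into range, convert, then restore the top bit
    int result = (int)(float_value - next_after_max_signed_int);
    return (unsigned int)result ^ 0x80000000u;
  }
  else
  {
    int result = (int)float_value;
    return (unsigned int)result;
  }
}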


The if statement does not need to be a real "if"; it can be a conditional move.
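As a rough sketch of that idea in C, the selection can be written without an explicit if. The name float_to_uint_branchless is made up for illustration, and the input is assumed to be in the range [0, 2^32). Whether the compiler actually emits a conditional move or some other branch-free sequence is up to the compiler.

```c
/* Hypothetical helper name; not a CRT symbol.
   Assumes 0 <= value < 2^32 (anything else is outside this sketch). */
static unsigned int float_to_uint_branchless(float value)
{
    const float two_pow_31 = 2147483648.0f; /* exactly representable in float */

    /* 1 when the value does not fit in a signed int, 0 otherwise */
    unsigned int is_large = (unsigned int)(value >= two_pow_31);

    /* subtract 2^31 only for large inputs, so the signed conversion stays in range */
    float adjusted = value - two_pow_31 * (float)is_large;

    /* convert as signed, then add the 2^31 back as the top bit of the result */
    return (unsigned int)(int)adjusted + (is_large << 31);
}
```

Calling float_to_uint_branchless(3000000000.0f) takes the "large" path: 3000000000 - 2147483648 = 852516352 is converted as a signed int, and the top bit is then added back.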

This is how it would look with SSE2:
// assumes float input is in xmm0 register
movss xmm1, [value_of_2147483648_constant]
movss xmm2, xmm0
subss xmm2, xmm1
cvttss2si eax, xmm2
cvttss2si edx, xmm0
xor eax, 0x80000000
ucomiss xmm1, xmm0
cmova  eax, edx
// unsigned int value now is in eax register


Not sure if this will handle Infinity or NaN floats exactly the same way as your regular C runtime cast, but otherwise it should work exactly the same.

A similar approach works for doubles in 32-bit code.

If you want to avoid SSE instructions, you can probably use x87 FPU instructions in a similar way, or alternatively you can extract the mantissa and exponent bits and use them to calculate the actual value yourself. Check the code in the llvm compiler-rt library that does this: https://github.com/llvm-mirror/co.../builtins/fp_fixuint_impl.inc#L17
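The bit-extraction route could look roughly like the sketch below. This is not the compiler-rt code, just a minimal illustration that assumes IEEE-754 single-precision floats and a 32-bit unsigned int. The name float_to_uint_bits is hypothetical, and returning 0 or 0xFFFFFFFF for out-of-range inputs is a choice made for this sketch.

```c
/* Hypothetical helper; assumes IEEE-754 binary32 floats and 32-bit unsigned int. */
static unsigned int float_to_uint_bits(float value)
{
    union { float f; unsigned int u; } pun; /* type punning through a union is valid in C */
    unsigned int sign, mantissa;
    int exponent;

    pun.f = value;
    sign     = pun.u >> 31;                              /* top bit */
    exponent = (int)((pun.u >> 23) & 0xFF) - 127;        /* remove the exponent bias */
    mantissa = (pun.u & 0x007FFFFFu) | 0x00800000u;      /* restore the implicit leading 1 */

    if (sign || exponent < 0)
        return 0;            /* negative or below 1.0: this sketch returns 0 */
    if (exponent > 31)
        return 0xFFFFFFFFu;  /* too large for 32 bits (Inf/NaN with a clear sign bit land here too) */

    /* mantissa holds value * 2^(23 - exponent); shift to undo that scaling */
    if (exponent > 23)
        return mantissa << (exponent - 23);
    return mantissa >> (23 - exponent);
}
```

The right shift drops the fractional bits, which gives the same truncation toward zero as a C cast for in-range values, and no floating-point instructions are needed at all.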
Joystick
11 posts
#16446 Guide - How to avoid C/C++ runtime on Windows
2 months, 1 week ago

Awesome!!!
Thank you very much my good man, I'll use this information wisely!
Bits Please
8 posts
#16487 Guide - How to avoid C/C++ runtime on Windows
2 months ago

mmozeiko

A similar approach works for doubles in 32-bit code.


How exactly is it similar if the truncation instruction returns a signed value and, as a result, you lose precision when your number exceeds the signed integer limit, which can happen if you work with doubles?
Don't you need to find another instruction that truncates and returns an unsigned value, or implement the truncation yourself somehow?
mmozeiko
Mārtiņš Možeiko
1831 posts / 1 project
#16489 Guide - How to avoid C/C++ runtime on Windows
2 months ago Edited by Mārtiņš Možeiko on Oct. 6, 2018, 8:19 a.m.

It will work exactly the same:
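// illustrative wrapper name; the CRT helper itself is _dtoui3
unsigned int double_to_uint(double double_value)
{
  const double next_after_max_signed_int = 2147483648.0;

  if (double_value >= next_after_max_signed_int)
  {
    // too big for a signed int: shift into range, convert, then restore the top bit
    int result = (int)(double_value - next_after_max_signed_int);
    return (unsigned int)result ^ 0x80000000u;
  }
  else
  {
    int result = (int)double_value;
    return (unsigned int)result;
  }
}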


Or in asm:
// assumes double input is in xmm0 register
movsd xmm1, [value_of_2147483648_constant] // constant stored as a double
movsd xmm2, xmm0
subsd xmm2, xmm1
cvttsd2si eax, xmm2
cvttsd2si edx, xmm0
xor eax, 0x80000000
ucomisd xmm1, xmm0
cmova eax, edx
// unsigned int value now is in eax register


(NOTE: I have not actually tested this code, so verify that it works yourself)

The main difference is the conversion operation, cvttss2si vs cvttsd2si; the move, subtract and compare also switch to their double-precision forms (movsd, subsd, ucomisd). Both conversions return a 32-bit signed int, which you fix up when the input is too large for a signed int.

If you are running 64-bit code, then you can use just one instruction, "cvttsd2si rax, xmm0", and take the low 32 bits of the result (eax).
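In C, the 64-bit variant can be sketched as a cast through a 64-bit signed integer. The name double_to_uint_via_int64 is made up for illustration, and the input is assumed to be in [0, 2^32). On x64 a compiler will typically turn the first cast into a single cvttsd2si with a 64-bit destination, but that is an expectation rather than a guarantee. In 32-bit code the long long cast would likely need a runtime helper of its own, so this only helps on 64-bit targets.

```c
/* Hypothetical helper; intended for 64-bit targets.
   Assumes 0 <= value < 2^32. */
static unsigned int double_to_uint_via_int64(double value)
{
    /* every value in [0, 2^32) fits in a signed 64-bit integer,
       so the signed truncating conversion cannot overflow here */
    long long wide = (long long)value;

    /* keep the low 32 bits */
    return (unsigned int)wide;
}
```

For example, double_to_uint_via_int64(4000000000.0) converts to the 64-bit value 4000000000 first, which is then narrowed without any fix-up step.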
itzjac
19 posts
#16838 Guide - How to avoid C/C++ runtime on Windows
2 weeks, 4 days ago Edited by itzjac on Nov. 23, 2018, 3:45 a.m.

Very late to the discussion, I have waited an episode to go through all this.

Thank you so much mmozeiko, fantastic work!!!
:) :)
markds
1 posts
#16882 Guide - How to avoid C/C++ runtime on Windows
1 week, 3 days ago

Hi!

Firstly, there are some great posts here and I've learned a lot about dropping the CRT.

But I'm having a problem implementing a DLL with pure virtual base classes - I always get this error

error LNK2001: unresolved external symbol "const type_info::`vftable'" (??_7type_info@@6B@)

I need to call into the class via the base object (from an exe) but I can't get past this problem :-(

Can anyone help resolve this?



Incidentally, I can't link to ANY .libs. I need to be able to compile a DLL on users' machines that don't have the Windows SDK installed (it's for a scripting system). Everything works apart from the base class thing!