Guide - How to avoid C/C++ runtime on Windows

Nicolas Léveillé

#6164

March 16, 2016

In terms of either x86 or x64, it’s my understanding that a long and an int are basically the same thing. I mostly use Windows, but I’m guessing there must be other OSs where they are different.

Even if you're only focusing on Intel CPUs, the assumption that int and long are one and the same fails as soon as you port to Linux and Mac OSX.

On 64-bit Linux and OSX an int is 32 bits and a long is 64 bits (the LP64 model), while 64-bit Windows keeps both int and long at 32 bits (the LLP64 model).

You can see below at this Wikipedia link the table of the various data models chosen for various platforms:
https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
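
The difference is easy to check from code. Here is a minimal sketch, assuming a C++11 compiler; DataModelName is a made-up helper for illustration and only looks at the sizes of int, long and pointers:

```cpp
#include <cstdint>

// Hypothetical helper: names the data model the current compiler targets,
// judged only by the sizes of int, long and pointers.
const char* DataModelName()
{
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void*) == 8) return "LP64";   // 64-bit Linux, OSX
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void*) == 8) return "LLP64";  // 64-bit Windows
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void*) == 4) return "ILP32";  // 32-bit x86
    return "other";
}

// Fixed-width types sidestep the question when an exact size matters.
static_assert(sizeof(std::int32_t) == 4, "int32_t is always 32 bits");
static_assert(sizeof(std::int64_t) == 8, "int64_t is always 64 bits");
```

If code has to build on both Windows and Linux, using the <cstdint> types wherever the width matters avoids depending on what long happens to be.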

Fred Harris

#6165

March 16, 2016

Even if you're only focusing on Intel CPUs, the assumption that int and long are one and the same fails as soon as you port to Linux and Mac OSX.

On Linux and OSX an int will be 32-bit only and a long will be 64-bit. (LP64 model) while Windows chose to use 32-bit int and long (LLP64 model)

I assumed that was likely the case uucidl. Thanks for the clarification. It always seemed to me Microsoft was wasting an intrinsic C based data type by defining ints and longs as basically the same thing.

And I don't want you to think I'm ignoring your valuable suggestions mmozeiko. I truly appreciate the help you've given. It's just that I've been trying to implement your suggestions and studying things in general. I'll get back with a more detailed reply in a bit. Yep, still working on it!

Fred Harris

#6168

March 17, 2016

By #including <emmintrin.h> I was able to get this idea/suggestion of yours working Mmozeiko…

unsigned int DoubleToU32(double x)
{
 return (unsigned int)_mm_cvtsd_si32(_mm_set_sd(x));
}

…in that, my program including my FltToCh() function would now compile, link, and run, whereas before it wouldn’t. That’s the good news. The bad news is that it broke my algorithm which worked perfectly in 64 bit. I attempted to fix the worst of it and had some measure of success. But what stopped me was that my algorithm lost precision to the point where I was only getting a few digits of accuracy. At that point something snapped in my brain and I decided I’d had it with this.

I’ve often heard it said that a good general picks his battles and refrains from fighting battles he can’t win. My situation here is something like that, but not exactly. For you see, this is a battle I already won about six weeks ago. I see no reason to re-fight it. I already have a solution to the x86 issue of converting floating point values to character strings. That would be my use of Raymond Filiatreault’s fpu.lib written in masm…

http://www.website.masmforum.com/tutorials/fptute/

I had hoped to code a solution that would work in both x86 and x64, but I’ve given up attempting to achieve it for the reasons which you’ve pointed out to me. If I really absolutely need to have my code work in both x86 and x64 I can use my already working x86 fpu.lib solution for x86 and my FltToCh() function I previously posted in this thread for x64. In examining my original priorities, goals, and objectives for this project I never really considered it essential that the code be the same for x86 versus x64. In other words, do what needs done for each with the bottom line being simply that it works.

But my priorities have always been 64 bit with wide character support. ANSI was always somewhat less important to me, as was x86. And I have succeeded beyond my wildest expectations. To illustrate, let’s take this C++ program to parse a CSV string such as this…

"Zero, One, Two, Three, Four, Five, Six";

Not too hard. Here is a short program to do it with the output afterwards. I’m using VC 19 from Visual Studio 2015…

// cl StdLibParse.cpp /O1 /Os /MT /EHsc
// 200,192 Bytes
#include <iostream>
#include <sstream>
#include <string>

int main()
{
 std::string input = "Zero, One, Two, Three, Four, Five, Six";
 std::istringstream ss(input);
 std::string token;
 while(std::getline(ss, token, ',')) 
 {
    std::cout << token << '\n';
 }
  
 return 0;
} 


#if 0

Output:
=======
Zero
 One
 Two
 Three
 Four
 Five
 Six

 #endif

As the command line compilation string above shows, this is a release build optimized for size and linked with /MT as a standalone executable, and we still end up with a bloated 200,192 byte binary. Now let me show you my definition of success. We’ll start here with my TCLib.mak file which can be run with nmake.exe…

#            TCLib.mak
PROJ       = TCLib

OBJS       = crt_con_a.obj crt_con_w.obj crt_win_a.obj crt_win_w.obj memset.obj newdel.obj printf.obj \
             sprintf.obj _strnicmp.obj strncpy.obj strncmp.obj _strrev.obj strcat.obj strcmp.obj \
             strcpy.obj strlen.obj getchar.obj alloc.obj alloc2.obj allocsup.obj FltToCh.obj atol.obj \
             _atoi64.obj abs.obj
        
CC         = CL
CC_OPTIONS = /D "_CRT_SECURE_NO_WARNINGS" /O1 /Os /GS- /c /W3 /DWIN32_LEAN_AND_MEAN

$(PROJ).LIB: $(OBJS)
    LIB /NODEFAULTLIB /machine:x64 /OUT:$(PROJ).LIB $(OBJS)

.CPP.OBJ:
    $(CC) $(CC_OPTIONS) $<

All the *.cpp and *.h files are in the attached zip. You can recreate this if you care to. It needs to be run through the x64 compiler. It should build TCLib.lib (Tiny C Library). Having done that, here is my version of the above C++ program that parses that string. Note I’ve put an abbreviated version of my String Class inline above main(). It has just enough members to do the job. My full String Class is closer to 900 lines long. Here’s the code with command line compilation string at top. It’s named Parse.cpp…

// cl Parse.cpp /O1 /Os /GS- /Zc:sizedDealloc- /link TCLib.lib kernel32.lib
// 3,584 Bytes
#include <windows.h>
#include "stdlib.h"
#include "stdio.h"

class String
{
 public:
 String()   // Uninitialized Constructor
 {
  this->lpBuffer    = new char[16];
  this->lpBuffer[0] = 0;
  this->iLen        = 0;
  this->iCapacity   = 15;
 }
  
 String(const char* pStr)  //Constructor: Initializes with char*
 {
  this->iLen=strlen(pStr);
  int iNewSize=(this->iLen/16+1)*16;
  this->lpBuffer=new char[iNewSize];
  this->iCapacity=iNewSize-1;
  strcpy(lpBuffer,pStr);
 }

 String& operator=(const char* pStr)  // Assign char* To String
 {
  size_t iNewLen=strlen(pStr);
  if(iNewLen>this->iCapacity)
  {
     delete [] this->lpBuffer;
     int iNewSize=(iNewLen*2/16+1)*16;
     this->lpBuffer=new char[iNewSize];
     this->iCapacity=iNewSize-1;
  }
  strcpy(this->lpBuffer,pStr);
  this->iLen=iNewLen;
    
  return *this;
 }
  
 int ParseCount(const char c)      //returns one more than # of
 {                                 //delimiters so it accurately
  int iCtr=0;                      //reflects # of strings delimited
  char* p;                         //by delimiter.

  p=this->lpBuffer;
  while(*p)
  {
    if(*p==c)
       iCtr++;
    p++;
  }

  return ++iCtr;
 }
 
 void Parse(String* pStr, char delimiter, size_t iParseCount)
 {
  char* pBuffer=new char[this->iLen+1];  
  if(pBuffer)
  {
     char* p=pBuffer;
     char* c=this->lpBuffer;
     while(*c)
     {
        if(*c==delimiter)
           *p=0;
        else
           *p=*c;
        p++, c++;
     }
     *p=0, p=pBuffer;
     for(size_t i=0; i<iParseCount; i++)
     {
         pStr[i]=p;
         p=p+pStr[i].iLen+1;
     }
     delete [] pBuffer;
  }
 }
   
 char* lpStr()
 {
  return this->lpBuffer;
 } 
 
 void Print(bool blnCrLf)
 {
  printf("%s",lpBuffer);
  if(blnCrLf)
     printf("\n");
 }
   
 ~String()
 {
  delete [] this->lpBuffer;
 } 
  
 private:
 char*  lpBuffer;
 size_t iLen;
 size_t iCapacity; 
};


int main()
{
 String s1       = "Zero, One, Two, Three, Four, Five, Six";  // Assign CSV String To Be Parsed
 int iParseCount = s1.ParseCount(',');                        // Determine Number Of CSVs
 String* pStrs   = new String[iParseCount];                   // Allocate Array To Hold Above Determined Number of CSVs
 s1.Parse(pStrs, ',', iParseCount);                           // Parse The String
 for(int i=0; i<iParseCount; i++)                             // Output The CSVs
     pStrs[i].Print(true);                                    // ....
 delete [] pStrs;                                             // De-Allocate Dynamic Memory
 
 return 0;
}

So there you have my definition of success, i.e., an x64 C++ program containing a String Class which compiles as a standalone executable to 3,584 bytes, which, if you do the division (200,192 / 3,584), comes in at about 55.9 times smaller than the 200,192 byte Standard Library based program first shown! Here are the results of the command line compilation with program run output…

C:\Code\VStudio\VC15\LibCTiny\x64\Test14>cl Parse.cpp /O1 /Os /GS- /Zc:sizedDealloc- /link TCLib.lib kernel32.lib
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23506 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

Parse.cpp
Microsoft (R) Incremental Linker Version 14.00.23506.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:Parse.exe
TCLib.lib
kernel32.lib
Parse.obj

C:\Code\VStudio\VC15\LibCTiny\x64\Test14>Parse
Zero
 One
 Two
 Three
 Four
 Five
 Six

C:\Code\VStudio\VC15\LibCTiny\x64\Test14>

Exactly the same output to the last space as the C++ Standard Bloatware Library version. The logic and structure of my solution above, as well as my String Class, is based upon a PowerBASIC version of this program, which looks like this…

#Compile Exe                                          'Create Stand Alone Exe as opposed to Dll or Lib
#Dim All                                              'Require All Variables To Be Declared

Function PBMain() As Long
  Local iParseCount As Long                           'To Contain Number Of CSVs
  Local strLine As Wstring                            'A OLE String Engine BSTR To Hold CSVs
  Local pStrs() As Wstring                            'A Dynamic Array Of COM BSTRs
  Register i As Long                                  'Use Register For Iterator

  strLine="Zero, One, Two, Three, Four, Five, Six"    'BSTR Containing CSVs
  iParseCount=ParseCount(strLine,",")                 'ParseCount() Returns # of CSVs
  Redim pStrs(iParseCount) As Wstring                 'Dynamically Allocate iParseCount # of BSTRs
  Parse strLine, pStrs(), ","                         'Parse CSVs Based On ','; Put CSVs Into pStrs()'s Array
  For i=0 To UBound(pStrs, 1) -1                      'Iterate Through Array pStrs()
    Console.Print LTrim$(pStrs(i))                    'Output Trimmed Strings To Console
  Next i
  Erase pStrs()                                       'Release Dynamically Acquired Memory
  Waitkey$

  PBMain=0
End Function

#if 0
Zero
One
Two
Three
Four
Five
Six
#EndIf

The above 32 bit PowerBASIC program compiles to 15,360 bytes. If you compare it to my C++ program above you’ll immediately see the resemblance. However, it’s hard to compare these programs in terms of size because of the radically different nature of the C++ language and the PowerBASIC language. The PowerBASIC program above has already initialized the COM subsystem of Windows and has an OLE based String variable type built right into the language – unlike C/C++. There are a lot of other things going on too that I won’t get into here. But the reason I’m doing this work here in attempting to develop a C++ application development framework which eliminates the C and C++ Standard Libraries has a lot to do with PowerBASIC. For you see, the world class programmer who developed the PowerBASIC programming language, Robert Zale, passed away several years ago, while development of a 64 bit compiler was under way. He never completely finished it. The company still exists in skeleton form, but it’s doubtful whether it will be able to finish Bob’s 64 bit compiler. So I needed to move on.

For the past 15 years I’ve used C and C++ for my embedded development work with Windows CE, as I develop our handheld data recorder programs. For mission critical desktop applications where I work in the forestry sector I’ve used PowerBASIC because of the extremely high performance (it’ll match C tick for tick as it’s really an extension of MASM using the exact same variable types, free use of inline assembler, etc) and more highly developed dynamic multi-dimensional array handling capabilities, much better string handling than C++, etc., etc. But, as I said, I fear there will never be any 64 bit version, nor any further development of the PowerBASIC language. So that leaves me with C and C++, which will live on after Dennis Ritchie and Bjarne Stroustrup are gone, the former of whom has already departed us. C’s good and I’m reasonably good at it, but it’s very slow to develop with. I personally feel I need the enhanced capabilities of C++. It’s just that I can’t live with the C++ bloat on desktop Windows (that doesn’t exist in Windows CE – that’s very lean), and the present propensities of the C++ anointed and acolytes to abstract everything, make classes out of everything, write 100 or 1000 lines of code when something could be done elegantly with ten lines of code (or even better yet, add a whole library to add 2 + 2 together) drive me nuts….

http://lispian.net/2011/11/01/lasagna-code/

Lasagna Code
November 1, 2011
By lispian
Anyone who claims to be even remotely versed in computer science knows what “spaghetti code” is. That type of code still sadly exists. But today we also have, for lack of a better term — and sticking to the pasta metaphor — “lasagna code”.
Lasagna Code is layer upon layer of abstractions, objects and other meaningless misdirections that result in bloated, hard to maintain code all in the name of “clarity”. It drives me nuts to see how badly some code today is. And then you come across how small Turbo Pascal v3 was, and after comprehending it was a full-blown Pascal compiler, one wonders why applications and compilers today are all so massive.
Turbo Pascal v3 was less than 40k. That’s right, 40 thousand bytes. Try to get anything useful today in that small a footprint. Most people can’t even compile “Hello World” in less than a few megabytes courtesy of our object-oriented obsessed programming styles which seem to demand “lines of code” over clarity and “abstractions and objects” over simplicity and elegance.
Back when I was starting out in computer science I thought by today we’d be writing a few lines of code to accomplish much. Instead, we write hundreds of thousands of lines of code to accomplish little. It’s so sad it’s enough to make one cry, or just throw your hands in the air in disgust and walk away.
There are bright spots. There are people out there that code small and beautifully. But they’re becoming rarer, especially when someone who seemed to have thrived on writing elegant, small, beautiful code recently passed away. Dennis Ritchie understood you could write small programs that did a lot. He comprehended that the algorithm is at the core of what you’re trying to accomplish. Create something beautiful and well thought out and people will examine it forever, such as Thompson’s version of Regular Expressions!
Maybe it’s just my age and curmudgeonly nature shining through, but it pains me to write code for many systems. It’s just so ugly, so poorly thought out. There are bright spots, but they’re rarer by the year. No wonder so many kids decide not to go into computer science. Where it was once applied mathematics with all its intrinsic beauty it’s now been reduced to slapping at the keyboard, entering thousands of lines hoping the compiler will allow your code to compile. Where’s the elegance that was Lisp or Smalltalk or APL? Hell, even Fortran was more elegant than a lot of the crap programming languages being touted today. Why hasn’t someone gone back to Algol and pushed that forward.
As I mentioned to my kids the other day, it’s sad when one of the best programming languages remains C. Sure, there are some beautiful small languages out there that do niche work, but mainstream? Nothing. It’s just a catastrophe. Something like Python may have been great if they’d not embedded an object model into its guts. Sigh.

And I’ve never liked anything in the C++ Standard Library. And I’m independent minded enough to write my own library code which believe it or not largely works. So in a nutshell that’s basically where I’m coming from with this code I’ve developed and posted here. I decided to post it here because you’ve really helped me big time Mmozeiko. I was stuck on that _fltused thing for days trying to get to the bottom of what was going on with floating points when the C Runtime was eliminated. I should have done an internet search on it sooner rather than wasting days trying to figure it out myself. For when I did I found this site and your post here within about ten minutes. And I saw a lot of folks really appreciated the information you provided, and it amazed me to see that there were folks other than me who were bothered by the ridiculous bloat produced by Microsoft’s C/C++ compilers. It’s been my experience that almost nobody cares about this. I’ve been told a million times that hard drives and ram are now virtually infinite in size, with processors being almost infinitely fast, so efficiency and conciseness in coding no longer have any merit. But nothing seems to run any faster, as Gates’s Law effectively nullifies Moore’s Law: while processor speeds double every 18 months, software speeds halve themselves in that same time span….

http://catb.org/jargon/html/G/Gatess-Law.html

Gates's Law

“The speed of software halves every 18 months.” This oft-cited law is an ironic comment on the tendency of software bloat to outpace the every-18-month doubling in hardware capacity per dollar predicted by Moore's Law. The reference is to Bill Gates; Microsoft is widely considered among the worst if not the worst of the perpetrators of bloat.

…no doubt related to the very issues under discussion here, i.e., software application framework bloat.

By the way, by eliminating the MSVC Runtime, an x64 GUI program created through RegisterClassEx() and CreateWindowEx() also comes in at an amazing 3 k!!! Can you imagine that? Here would be that…

// cl Form1.cpp /O1 /Os /GS- /link TCLib.lib kernel32.lib user32.lib
// cl Form1.cpp /O1 /Os /GS- /link kernel32.lib user32.lib gdi32.lib
#define UNICODE        //  3,072 Bytes x64 UNICODE or ASCII With LibCTiny.lib
#define _UNICODE       // 84,992 Bytes With C Standard Library Loaded (LIBCMT.LIB)
#include <windows.h>
#include "tchar.h"

LRESULT CALLBACK fnWndProc(HWND hwnd, unsigned int msg, WPARAM wParam, LPARAM lParam)
{
 if(msg==WM_DESTROY)
 {
    PostQuitMessage(0);
    return 0;
 }

 return (DefWindowProc(hwnd, msg, wParam, lParam));
}

int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevIns, LPTSTR lpszArgument, int iShow)
{
 WNDCLASSEX wc={0};
 MSG messages;
 HWND hWnd;

 wc.lpszClassName = _T("Form1");
 wc.lpfnWndProc   = fnWndProc;
 wc.cbSize        = sizeof(WNDCLASSEX);
 wc.hInstance     = hInstance;
 wc.hbrBackground = (HBRUSH)COLOR_BTNSHADOW;
 RegisterClassEx(&wc);
 hWnd=CreateWindowEx(0,_T("Form1"),_T("Form1"),WS_OVERLAPPEDWINDOW|WS_VISIBLE,200,100,325,300,HWND_DESKTOP,0,hInstance,0);
 while(GetMessage(&messages,NULL,0,0))
 {
    TranslateMessage(&messages);
    DispatchMessage(&messages);
 }

 return messages.wParam;
}

You can see the stats above at top. 3,072 bytes with my TCLib.lib, and 84,992 bytes using the standard MS VC19 build as an /O1 /Os /MT Release build. As perhaps an interesting aside, here is what the above GUI looks like in PowerBASIC. It compiles to 6,656 bytes with PowerBASIC…

'PowerBASIC Version Form1  Disk Image 6656 bytes;  size on disk 8192; Windows Explorer 7K
#Compile Exe
#Dim     All
#Include Once "Win32Api.inc"


Function fnWndProc(ByVal hWnd As Long, ByVal msg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long
  If msg=%WM_DESTROY Then
     Call PostQuitMessage(0)
     fnWndProc=0 : Exit Function
  End If
  fnWndProc=DefWindowProc(hWnd, msg, wParam, lParam)
End Function


Function WinMain(ByVal hInstance As Long, ByVal hPrevIns As Long, ByVal lpszArgument As Asciiz Ptr, ByVal iShow As Long) As Long
  Local szClassName As Asciiz*8
  Local wc As WNDCLASSEX
  Local Msg As tagMsg
  Local hWnd As Dword

  szClassName             = "Form1"
  wc.lpszClassName        = VarPtr(szClassName)
  wc.lpfnWndProc          = CodePtr(fnWndProc)
  wc.cbSize               = SizeOf(wc)
  wc.hbrBackground        = %COLOR_BTNSHADOW
  wc.hInstance            = hInstance
  RegisterClassEx(wc)
  hWnd=CreateWindowEx(0,szClassName,szClassName,%WS_OVERLAPPEDWINDOW,200,175,320,300,%HWND_DESKTOP,0,hInstance,ByVal 0)
  ShowWindow(hWnd,iShow)
  While GetMessage(Msg,%NULL,0,0)
    TranslateMessage(Msg)
    DispatchMessage(Msg)
  Wend

  WinMain=msg.wParam
End Function

So I hope you don’t mind my posting some of my code here. Delete it if you think it’s no good or out of place. I’d welcome any comments on it.

In closing I have to say that I don’t know what to make of your criticisms of the wchar_t variable type and Microsoft’s use of it. That you are a more knowledgeable coder than I is without question so it would seem to behoove me to delve deeper into the matter with an eye to implementing your suggestions. But on the other hand I have to say that I’ve been using the wchar_t data type in all its bizarre manifestations (TCHARs, OLECHARs, etc., etc.) for 15 years and have encountered nothing of what you have described. And world renowned programmer/author Charles Petzold – writer of the famous “Programming Windows” series of books – has bestowed his full blessing upon it. So I can’t really reconcile these facts. I will look deeper into the issue when I get time. Thank you for the ‘heads up’ about it. I will not ignore your recommendation.

Edited by Fred Harris on March 17, 2016, 3:26am Reason: fix formatting

Fred Harris

#6169

March 17, 2016

And here's a zip file with all the lib code if the site will take it....

Mārtiņš Možeiko

#6170

March 17, 2016

Oops, I gave you the wrong instruction. The _mm_cvtsd_si32 intrinsic rounds to the nearest integer (under the default rounding mode) instead of truncating like a C cast does. You should use _mm_cvttsd_si32, which truncates like a C cast. That will probably fix the precision issues if that was the problem.
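
To make the difference concrete, here is a small sketch that puts the two intrinsics side by side. The function names are made up for illustration, and it assumes an x86/x64 compiler with SSE2 and the default MXCSR rounding mode:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Rounds using the current MXCSR rounding mode
// (round-to-nearest-even unless the program changed it).
int DoubleToI32Rounded(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

// Truncates toward zero, which is what (int)x does in C.
int DoubleToI32Truncated(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

With 2.7 as input the first returns 3 and the second returns 2. Note that both are signed 32-bit conversions, so casting the result to unsigned int only gives the right answer for inputs below 2^31.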

Anyway, if the SSE instructions don't do what you want, just write pure C code for the cast first. If you don't know how, take a look at the compiler-rt source file from the llvm project I gave a link to. It should be pretty obvious how that works. Step through with a debugger to see the values.
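
A rough sketch of what such a pure C/C++ cast can look like follows. It is not copied from compiler-rt, just written in the same spirit: pull the exponent and mantissa out of the IEEE 754 bit pattern and shift. The name DoubleToU32Portable and the choice to saturate on overflow are assumptions of this sketch. memcpy is used only to read the bits and compilers normally inline it; also be aware that on 32-bit MSVC a variable 64-bit shift may itself call a runtime helper, so check the generated code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: converts a double to an unsigned 32-bit integer,
// truncating toward zero, using only integer operations.
std::uint32_t DoubleToU32Portable(double x)
{
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);                  // read the raw IEEE 754 bits

    const bool negative = (bits >> 63) != 0;
    const int exponent  = static_cast<int>((bits >> 52) & 0x7FF) - 1023;  // unbiased
    const std::uint64_t mantissa =
        (bits & 0x000FFFFFFFFFFFFFull) | (1ull << 52);    // restore the implicit leading 1

    if (negative || exponent < 0) return 0;               // negative, zero, or |x| < 1
    if (exponent > 31) return 0xFFFFFFFFu;                // too large (also inf/NaN): saturate
    return static_cast<std::uint32_t>(mantissa >> (52 - exponent));
}
```

For example, DoubleToU32Portable(2.7) gives 2 and DoubleToU32Portable(3000000000.0) gives 3000000000, which the signed SSE2 conversion cannot represent.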

As for wchar_t - if you are using just the Windows API and want to run just on Windows, then using wchar_t is maybe OK. But once you want to go cross platform, using wchar_t is just very wrong in my opinion. utf-8 is used by default in most, if not all, APIs on Linux and OSX. So the only weird OS here is Windows. That's one of the reasons why I'm saying that using utf-8 makes sense (and there are more, of course). 1 OS vs 2 OSes ;)

But a more serious reason against wchar_t is that people assume each Unicode character is exactly one wchar_t element. That's why they say you should prefer wchar_t over utf8. Sure, if you deal only with English and other European languages then that is correct. But universally it is not. Once your code needs to deal with arbitrary Unicode (for example rarer Chinese characters that live outside the Basic Multilingual Plane), it will break if it assumes that 1 char = 1 wchar_t element. Even your String class has wrong code in many places for this reason (Left, Right, Mid members). With full UTF-16 support a Unicode character can take up to 2 wchar_t elements, but your code will cut such characters in half, producing an invalid UTF-16 string. If you repeat cutting and concatenating operations many times, you will get garbage in your string. This can lead to rendering garbage, crashes or security issues. For example:
https://www.cvedetails.com/cve/CVE-2015-5380
https://www.cvedetails.com/cve/CVE-2012-2135
Then why use wchar_t and support multi-wchar_t characters, if you can use utf-8 from the start? Using utf-8 will allow your code to be exactly the same for ansi and utf-8 strings.
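
To illustrate the cutting problem, here is a small sketch of a Left-style function that refuses to split a surrogate pair. It uses char16_t and std::u16string so it behaves the same on every platform; SafeLeft and IsHighSurrogate are made-up names, not members of the String class discussed above:

```cpp
#include <cstddef>
#include <string>

// A high (leading) surrogate is the first half of a two-unit UTF-16 character.
inline bool IsHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// Returns at most maxUnits leading code units, backing off by one unit
// if the cut would land between the two halves of a surrogate pair.
std::u16string SafeLeft(const std::u16string& s, std::size_t maxUnits)
{
    if (maxUnits >= s.size()) return s;
    std::size_t n = maxUnits;
    if (n > 0 && IsHighSurrogate(s[n - 1])) --n;   // do not keep half a character
    return s.substr(0, n);
}
```

For the 4-unit string u"a\U0001F600b", SafeLeft(s, 2) returns just u"a", where a naive Left would return 'a' followed by a dangling high surrogate.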

Edited by Mārtiņš Možeiko on March 17, 2016, 5:13am

Fred Harris

#6171

March 17, 2016

Oops, I gave you wrong instruction. _mm_cvtsd_si32 instruction rounds to nearest integer instead of truncation like C casting does. You should use _mm_cvttsd_si32 function that will truncate like C casting does. That will probably fix precision issues if that was the problem.

That fixed it! I guess ‘close’ does only count in horseshoes and hand grenades! :) Maybe now you can cut me some slack for dropping the u in _dtoui3?

I believe with that change it might be good enough. Seems to be working identically to my 64 bit code now, with the limitations imposed by the former’s smaller integer register size of course. I need to test a bit yet to be sure, but I think that’s it. I can’t thank you enough Mmozeiko. You’ve really helped. First with that _fltused thing – now this.

And you’ve made good points with the encoding thing. My code is only used in Pennsylvania where I live and work. But I do post on C++ forums such as here occasionally, and I’d like to think my code is workable anywhere it is run (China included). So I need to study up further on character encodings with an eye to making some changes.

Now that I’m nearly done with this project of eliminating the C Runtime Library, I’m wondering what limitations it might impose on the things I typically do. As I experiment with it I’ll surely find out! Top on my list though are ODBC database access and Microsoft’s COM (Component Object Model). I’m big into COM. It’s the object model I prefer over the typical way C++ looks at OOP. If I had to hazard a guess I’m thinking COM might work. Part of the reason I suspect that is that I recall in Microsoft’s ATL (Active Template Library) they typically eliminated the C Std. Lib. At least I think I have that right. I never really used ATL much or liked it very much. I preferred to do COM in the raw without all that weird science.

In terms of ODBC I’m less sure it will work loading the odbc32.lib. I’m guessing it might have dependencies on the C Std Lib. Any thoughts on this Mmozeiko?

Funny, in my brief excursions into Linux where I experimented some with Xlib, Motif (lesstif), and GTK, I seem to recall dealing with four byte characters in one or more of those above mentioned technologies. My understanding was that two byte characters were designed to accommodate all the languages on Earth. I figured the extra two bytes were to accommodate languages such as Klingon and Romulan when we eventually encounter them when we have a star ship Enterprise. :)

Edited by Fred Harris on March 17, 2016, 6:57pm

Mārtiņš Možeiko

#6172

March 17, 2016

Cool, I'm glad you got it working!

Using COM will work fine. COM objects are implemented in different DLLs. You don't control what they use. They might use the C runtime, or they might not use any runtime. That is all fine. And you don't need the C runtime to access COM objects. Simply speaking, a COM object is just a vtable you get a pointer to. The calling side doesn't care about the implementation; you are just calling function pointers into whatever implementation they have. No C/C++ runtime is involved here.
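
Here is a stripped-down sketch of that idea in plain C++, with no Windows headers. Everything in it (ICounter, CreateCounter, BumpTwice) is hypothetical, and real COM adds IUnknown (QueryInterface/AddRef/Release), HRESULT return values and the __stdcall calling convention on top of this layout:

```cpp
// Hypothetical interface, laid out the way a C caller sees a COM object:
// the first member of the object is a pointer to a table of function pointers.
struct ICounter;
struct ICounterVtbl {
    int (*Increment)(ICounter* self);
    int (*Get)(ICounter* self);
};
struct ICounter { const ICounterVtbl* vtbl; };

// "Server" side: could live in any DLL, built with or without a C runtime.
struct CounterImpl { ICounter iface; int value; };

static int Counter_Increment(ICounter* self) { return ++reinterpret_cast<CounterImpl*>(self)->value; }
static int Counter_Get(ICounter* self)       { return reinterpret_cast<CounterImpl*>(self)->value; }
static const ICounterVtbl g_counterVtbl = { Counter_Increment, Counter_Get };

ICounter* CreateCounter()
{
    static CounterImpl instance = { { &g_counterVtbl }, 0 };  // single shared instance
    return &instance.iface;
}

// Client side: sees only the interface pointer and calls through the table.
int BumpTwice(ICounter* c)
{
    c->vtbl->Increment(c);
    return c->vtbl->Increment(c);
}
```

The client code compiles against nothing but the struct layout, which is why it does not matter what runtime the implementing side was built with.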

Edited by Mārtiņš Možeiko on March 17, 2016, 7:16pm

Fred Harris

#6173

March 17, 2016

Thanks. Good info. Sounds like it will depend on whether the COM dll makes calls on the C runtime. That raises some interesting questions. One of the most challenging projects I ever undertook was to create an ActiveX Grid Control for use in my projects. I used PowerBASIC for that, and coded it first with Windows Custom Control architecture. After having gotten that to work I morphed it into a COM object which supported my custom IGrid Interface as well as IConnectionPointContainer and IConnectionPoint. When I finally finished it the executable size was about 49k, and with UPX it compacted down to 22k. I'm guessing it's likely the smallest grid control anywhere.

A couple of years later I redid it in C++. Mostly I wanted a 64 bit version, and like I previously mentioned, PowerBASIC just does 32 bit. I agonized a bit over whether to code it in C or C++, or rather compile always as C++ but use C idioms. I finally decided to use C++ because I couldn't live without my String Class, particularly my Parse function, which you have. Only later did it occur to me that I could take that Parse code out of the class and save some binary size. Anyway, it ended up about 85 k or something like that, and UPX'ed down to about 43 k.

With this functionality of removing the C Runtime it would be interesting to see if I could reduce the code size even further. I'm excited about this!

Edited by Fred Harris on March 17, 2016, 7:49pm

x13pixels

#6175

March 18, 2016

Fantastic information mmozeiko. Thanks much.

Fred Harris

#6176

March 19, 2016

OK Martins, so I've been trying to beef up my knowledge of character encodings with regard to your comments about the use I've made of the wchar_t type. You are recommending I just use UTF-8, and if not that then the 32 bit character type. Well, let's start with UTF-8. None of the articles I've scanned tell exactly how to actually USE UTF-8. As far as I know, in Windows using C or C++ this is an exhaustive list of the variable types I have available for my use in string handling if I want library support...

char
wchar_t

Of course, there are piles of type redefinitions of the above, and even wchar_t boils down to a typedef of unsigned short int. I can find no mention of an actual UTF-8 data type. Or does one simply just use the char data type?

Getting down to even more specifics, how would I translate Dennis Ritchie's famous Hello, World! program to UTF-8, or is it already by just using the char data type...

#include <stdio.h>

int main(void)
{
 char szBuffer[]="Hello, World!";
 printf("%s\n",szBuffer);
 return 0;
}

If my surmises above are correct, i.e., just use the char data type and make no use of wchar_t, then I'm guessing you are against the whole tchar.h macro setup Microsoft has, for example, where _tcslen gets transmogrified into strlen through the mysterious alchemy of tchar.h if _UNICODE isn't defined and wcslen if it is, etc.???

Am I reading you correctly on this?
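
For anyone following along, the mechanism in question works roughly like this. This is a simplified stand-in, not the real tchar.h; MY_UNICODE, my_tchar, MY_T and my_tcslen are made-up names that play the roles of _UNICODE, TCHAR, _T and _tcslen:

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// One switch picks the character type, the literal prefix and the
// string function, all at preprocessing time.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(s)   L##s
#define my_tcslen std::wcslen
#else
typedef char my_tchar;
#define MY_T(s)   s
#define my_tcslen std::strlen
#endif

std::size_t GreetingLength()
{
    const my_tchar* greeting = MY_T("Hello, World!");
    return my_tcslen(greeting);   // 13 in either build
}
```

The same source compiles to a char build or a wchar_t build depending only on whether MY_UNICODE is defined before this code is seen.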

Edited by Fred Harris on March 19, 2016, 3:09pm

Mārtiņš Možeiko

#6178

March 19, 2016

Yes, the Windows API doesn't support UTF-8. You'll need to convert it to wide char for every call. It's not that big of a deal.

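const char* utf8string = ...;
wchar_t wide[256]; // whatever max size you want, or get size first and allocate on heap/pool
// Pass -1 as the length so the terminating null is converted too;
// with strlen(utf8string) the output would not be null-terminated.
MultiByteToWideChar(CP_UTF8, 0, utf8string, -1, wide, ArrayCount(wide)); // ArrayCount(a) = sizeof(a)/sizeof((a)[0])
WindowsApiFunction(wide, ...);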

Of course you can write your own utf8-to-utf16 string converter, it's pretty trivial.
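
As a rough idea of what such a converter involves, here is a portable sketch. It uses std::string and std::u16string to stay short, which a runtime-free build would replace with its own buffers. Utf8ToUtf16 is a made-up name, malformed input is replaced with U+FFFD, and overlong encodings are not rejected, so treat it as a starting point rather than a hardened decoder:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes UTF-8 bytes into code points and re-encodes them as UTF-16.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;     // code point being assembled
        std::size_t extra;    // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }   // stray or invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() || (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(0xFFFD);                       // truncated or out-of-range sequence
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));    // fits in one UTF-16 unit
        } else {
            cp -= 0x10000;                               // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Plain ASCII passes through one unit per byte, while a 4-byte sequence such as "\xF0\x9F\x98\x80" (U+1F600) becomes the two units 0xD83D 0xDE00.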

As for your hello world program - any string whose bytes are all in the 0-127 (ASCII) range is a valid utf8 string. Your example string contains only printable ascii characters (32-126), so it is a utf8 string. That's the beauty of utf8. Your program will properly use and output a utf8 string.

Because utf-8 is a multi-byte encoding, you simply use any byte-sized type for storing the bytes in an array. char is fine. unsigned char is fine. char is better, because it lets you use ascii string literals directly.

All the tchar stuff is total nonsense. Why would you want to switch to ansi encoding? All modern versions of Windows since Windows NT use Unicode internally, so by using the A functions you are making it perform conversions to UTF-16 anyway. So there is no reason to use the A functions if you want to use unicode. Just use the W functions and drop the ansi stuff. Use unicode with UTF-8 encoding everywhere.

Of course you'll need to write a couple of string functions like strlen, but some of the standard functions can be used as is (like strcat, strcmp, strcpy) because they don't care about encoding - they simply copy or compare bytes, which is fine for utf-8. Here's an example of how to write an optimized strlen function for utf8: http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
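
The basic idea, before any optimization, is to count every byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx). A plain, unoptimized sketch, with Utf8Length as a made-up name and the input assumed to be valid, null-terminated utf8:

```cpp
#include <cstddef>

// Counts code points in a null-terminated UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
std::size_t Utf8Length(const char* s)
{
    std::size_t count = 0;
    for (; *s; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For pure ascii input this gives the same answer as strlen; for "\xC3\xA9" (é, two bytes) it returns 1.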

Fred Harris

#6403

April 21, 2016

Hello Martins!

It's me again. Stuck on another floating point issue in x86 32 bit with eliminating the C Runtime. And yes, I'm still working on it. Been three months now.

Do you know anything about _ftol2 and _ftol2_sse? I'm getting unresolved externals on those in compiling my ActiveX Grid Control dll. I'm assuming they are abbreviations for float to long. Odd thing is, nowhere in my code are there any four byte floats declared or used.

Here's the deal. My ActiveX Grid Control compiles/links/runs perfectly in x64. But in x86 it's giving me those linker errors. The code is heavy with #ifdef Debug conditionals where I output debugging information to a log file, so I know exactly, right down to the very statement, what's causing the problem and errors. There really isn't much floating point math in the grid. Originally when I wrote the code there wasn't any. But a couple of years after that I got turned on to the necessity of writing "High DPI Aware" code, so that if the user changed Display or Display Resolution settings in Control Panel, my screens wouldn't look like s***. So to code that I needed about a dozen more lines of code and I have a situation where a double gets multiplied by an int and the result stored in a Windows DWORD. The double is a DPI scaling factor which commonly takes on values of 1.0, 1.25, or 1.5. I suppose other values are possible, but I believe those are the only values I’ve ever seen in playing with those settings in Control Panel on my specific laptop. The thought occurred to me that perhaps because the values are so simple and easily expressible in a four byte float that might be why the compiler is using _ftol2 instead of _dtoui3 which we’ve previously dealt with. This is actually the block of code from the grid dll where the doubles are declared and initialized…

// DPI Handling
double dpiX, dpiY;
double rxRatio, ryRatio;
hDC = GetDC(NULL);
dpiX=GetDeviceCaps(hDC, LOGPIXELSX);
dpiY=GetDeviceCaps(hDC, LOGPIXELSY);
rxRatio=(dpiX/96);
ryRatio=(dpiY/96);

It's those rxRatio/ryRatio variables that take on such values as 1.0, 1.25, 1.5, etc. The way I use them is that they need to be multiplied against everything in an app that specifies the sizes of anything, such as the x, y, cx, and cy variables in CreateWindowEx() calls to create and position objects. For example, say you wanted a top level window at 75, 75 on the desktop that was 320 pixels wide and 300 pixels high…

hWnd=CreateWindowEx(0, szClassName, szClassName, WS_OVERLAPPEDWINDOW, 75, 75, 320, 300, HWND_DESKTOP, 0, hIns, 0);

What you would do after obtaining the above DPI values would be to multiply all those numbers by rxRatio or ryRatio, as the case may be. However, I came up with a better solution that just uses a macro to do that as follows…

#define SizX(x)      ((x) * rxRatio)
#define SizY(y)      ((y) * ryRatio)

…so the above CreateWindowEx() call becomes even simpler…

hWnd=CreateWindowEx(0, szClassName, szClassName, WS_OVERLAPPEDWINDOW, SizX(75), SizY(75), SizX(320), SizY(300), HWND_DESKTOP, 0, hIns, 0);

That’s what’s failing in the 32 bit builds and generating the linker errors. I have been trying to solve it using the techniques you showed me about a month or so ago when I was fighting with that _dtoui3 thingie. If you recall, to solve that problem I created this function…

#ifdef _M_IX86

unsigned int __cdecl DoubleToU32(double x)
{
 return (unsigned int)_mm_cvttsd_si32(_mm_set_sd(x));
}

#endif

…and its use in 32 bit builds was as follows in peeling off digits of a double to convert to a character string…

while(k<=17)
{
  if(k == i)
     k++;
  *(p1+k)=48+(char)n;
  x=x*10;
  #ifdef _M_IX86
     n=DoubleToU32(x);
  #else   
     n = (size_t)x;
  #endif   
  x = x-n;
  k++;
}

So first thing I did was to try to use code like that in ftol2 and ftol2_sse implementations wrapped in extern “C”’s to see if that would link….

extern "C" long __cdecl _ftol2_sse(double x)
{
 return _mm_cvtsd_si32(_mm_set_sd(x));
}


extern "C" long __cdecl _ftol2(double x)
{
 return _mm_cvtsd_si32(_mm_set_sd(x));
}

Note I used the versions that round instead of truncate. That solved the unresolved externals and the code built. But it doesn’t work. Doesn’t crash or anything; just doesn’t work. The result of the multiplication ends up being zero. I looked around the long list of compiler intrinsics to see if I could find anything better, and I did, so I tried this….

extern "C" long _ftol2(float x)
{
 return _mm_cvtss_si32(_mm_set_ss(x));
} 

extern "C" long _ftol2_sse(float x)
{
 return _mm_cvtss_si32(_mm_set_ss(x));
}

That didn’t work either. I’m some confused about what’s going on because the dll code that won’t work seems similar to exe code that does work. When I saw what was happening I decided to make a small exe test program to see if, in 32 bit, one could declare a double and assign a number to it, multiply that by an int, and assign the result to a DWORD…

// cl StrTst5.cpp /O1 /Os /GS- /link TCLib.lib kernel32.lib 
#include <windows.h>
#include "stdlib.h"
#include "stdio.h"

extern "C" int   _fltused=1;
#define SizX(x)  ((x) * rxRatio)

int main()
{
 double rxRatio    = 1.25;
 int    iColWidths = 110;
 DWORD  pColWidths;

 pColWidths = SizX(iColWidths);
 printf("pColWidths = %u\n",pColWidths); 
 getchar();
 
 return 0;
}

...and that works perfectly. The result is 137...

C:\Code\VStudio\Grids\x86>cl StrTst5.cpp /O1 /Os /GS- /link TCLib.lib kernel32.lib
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

StrTst5.cpp
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:StrTst5.exe
TCLib.lib
kernel32.lib
StrTst5.obj

C:\Code\VStudio\Grids\x86>StrTst5
pColWidths = 137

Now admittedly, the code in the dll is a lot more complicated, but it's unclear to me how that would make any difference, as the fundamental operation taking place is no different than above. What I mean by 'complicated' in the dll is that it has to handle 'instance data' and multiple instantiations of grids correctly, so everything is done through dynamic memory allocations and pointers. The actual statement from the grid code that's failing in 32 bit is this...

pGridData2->pColWidths[i] = SizX(strFieldData[0].iVal()); // <<< Line Causing Problems!!!

The variable pGridData is a pointer to a GridData object, which is dynamically allocated for each grid, and it's where I hang pointers to all the grid's private 'instance' data. One of the members is a pointer to another memory block where I store all the column widths specified by the user when the grid is instantiated. These are modifiable at run time by the user through dragging the column dividers with the mouse. That's the GridData->pColWidths[] member, which is typed as DWORDs. The SizX() I've already described. The strFieldData[0].iVal() term is a member function call on my String Class where I'm extracting the column widths from the grid setup string passed in by the user/client and converting them to ints. So yes, that's complicated, but fundamentally no different than a multiplication of a double by an int with the rounded result going to a DWORD. And that appears to be where _ftol2 and _ftol2_sse enter the picture somehow. What do you think Martins? Any ideas how I might solve this???

By the way, by removing the High DPI Aware code, which only amounts to little more than I've shown above, the grid builds and runs fine - just like its x64 counterpart.

Mārtiņš Možeiko

#6405

April 21, 2016

The _ftol2 and _ftol2_sse functions have their own non-standard calling convention: the argument is passed in ST(0) and the return value comes back in EDX:EAX. You cannot implement them with just C code. You need to write assembly. I wrote how to implement these two functions in the first post of this topic. Read also the warning below the code to understand the limitation of that implementation. You most likely will want more correct code with either an SSE2 cvttps2dq (or similar) instruction or more x87 FPU instructions. Check what the SDL library does: https://hg.libsdl.org/SDL/file/80...1b90/src/stdlib/SDL_stdlib.c#l320

It's much better to avoid C/C++ style or implicit casts in code, so you don't generate implicit dependencies like this. Simply create your own casting functions (DoubleToU32, FloatToU32 and others) and use them.
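As a rough sketch of that advice applied to the DPI scaling case discussed above: DoubleToU32 follows the truncating version already shown in this thread, while ScaleSize is a made-up name standing in for the SizX/SizY macros. The intent is that the only double-to-integer conversion is the explicit SSE2 intrinsic, so the compiler should have no implicit cast to lower into a helper call. This assumes an SSE2-capable x86 target and values that fit in a signed 32-bit int, which pixel sizes do.

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Truncating double -> unsigned conversion done explicitly with SSE2.
// Only valid for values in the signed 32-bit range (assumption for this sketch).
static inline unsigned int DoubleToU32(double x)
{
    return (unsigned int)_mm_cvttsd_si32(_mm_set_sd(x));
}

// Hypothetical function replacing the SizX/SizY macros: the ratio is passed in
// explicitly and the result is converted through DoubleToU32, so no implicit
// double -> integer cast appears at the call site.
static inline unsigned int ScaleSize(int size, double ratio)
{
    return DoubleToU32(size * ratio);
}
```

With this, the problem line would read something like pColWidths[i] = ScaleSize(width, rxRatio). Note that _mm_cvttsd_si32 truncates like a C cast, so 110 * 1.25 = 137.5 becomes 137; the rounding intrinsic _mm_cvtsd_si32 would give 138 under the default rounding mode.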

Edited by Mārtiņš Možeiko on April 21, 2016, 10:12pm

Fred Harris

#6406

April 22, 2016

Oh! Sorry Martins! I feel dumb. You covered this early on and I forgot. My answers are right there. At the time I read it several months ago I was having other problems, which you covered, and I forgot about your coverage of the _ftol2 issue.

TM

#6910

May 9, 2016

Are there any public domain math libraries that one could use? Or at least have reasonable licences for gamedev?
The one problem I see with avoiding the c runtime is replacing the math library functions.
Especially these ones: sin, cos, atan2, sqrt.

Also do you write your own optimized memcpy, memcmp etc when you don't use the c runtime, or do you just make the compiler emit intrinsics for them?