Day 178: Compiling on Linux

In my quest to follow along on Linux, I have observed the following issues when compiling Handmade Hero with GCC. I am hoping that someone could bring one or more of these to Casey's attention during the stream.

- test_asset_builder defaults to USE_FONTS_FROM_WINDOWS; the stb path is outdated and uses Windows-specific font paths.

- In handmade_debug.h, GCC complains about the forward declaration of DebugRecordArray (the extern keyword is missing); the same goes for DebugRecords_Optimized in handmade.cpp. Also, __FILE__ and __FUNCTION__ are const char *.

- AtomicExchangeU64 and AtomicAddU64 need to be implemented for LLVM/GCC.

- _snprintf_s is not supported (it is used in OutputDebugRecords).
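
To illustrate the extern point from the list above, here is a minimal sketch. The struct is a simplified, hypothetical stand-in for the real debug_record, and GetDebugRecord is a helper made up for this example. Without extern, GCC treats the unsized array line as a definition of an incomplete type and rejects it; with extern it is only a declaration, and the real definition can follow later.

```cpp
// Simplified, hypothetical stand-in for the real debug_record struct.
struct debug_record
{
    const char *FileName;
    const char *BlockName;
    int LineNumber;
};

// Declaration only: "extern" tells the compiler the array is defined elsewhere,
// so its size may be left out here.
extern debug_record DebugRecordArray[];

// Hypothetical helper: code between the declaration and the definition
// can already index into the array.
inline debug_record *GetDebugRecord(int Index)
{
    return DebugRecordArray + Index;
}

// Definition, with the size known (4 is an arbitrary value for this sketch).
debug_record DebugRecordArray[4] = {};
```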

It would be great if there was a way to ask a question on the stream without having to stay up all night :)

Thanks,
Kim

Edited by Kim Jørgensen on
Hi all

Would somebody bring one or more of these issues to Casey's attention during the stream tonight? I would really appreciate the help :cheer:

/Kim
I'm not entirely sure if this is the right way to do it, but from my research I guess it is:

inline u64 AtomicExchangeU64(u64 volatile *Value, u64 New)
{
    u64 Result = __sync_lock_test_and_set(Value, New);

    return(Result);
}

inline u64 AtomicAddU64(u64 volatile *Value, u64 Addend)
{
    // NOTE(casey): Returns the original value _prior_ to adding
    u64 Result = __sync_fetch_and_add(Value, Addend);

    return(Result);
}
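
For comparison, here is a sketch of the same two functions using the newer __atomic builtins, assuming GCC 4.7+ or a recent Clang and assuming u64 is uint64_t. As far as I know, __sync_lock_test_and_set is documented as an acquire barrier only, so __atomic_exchange_n with __ATOMIC_SEQ_CST is the variant to reach for if a full barrier is wanted. This is not the code from the stream, just one way to write it.

```cpp
#include <cstdint>

typedef uint64_t u64; // assumption: matches the project's u64 typedef

inline u64 AtomicExchangeU64(u64 volatile *Value, u64 New)
{
    // Stores New and returns the value that was there before.
    u64 Result = __atomic_exchange_n(Value, New, __ATOMIC_SEQ_CST);

    return(Result);
}

inline u64 AtomicAddU64(u64 volatile *Value, u64 Addend)
{
    // Returns the original value _prior_ to adding.
    u64 Result = __atomic_fetch_add(Value, Addend, __ATOMIC_SEQ_CST);

    return(Result);
}
```

Both functions return the old value, which is what the engine code expects from the Windows intrinsics.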


Can anyone validate it and, if it works, ask Casey to add it to the HMH code?

Thanks

Edited by Nuno on
That's exactly how I chose to implement these, and it works fine.

GetThreadID (day 182) is a bit more tricky but you can do something like this if you don't care too much about performance:

#include <pthread.h>

inline u32 GetThreadID(void)
{
    u32 ThreadID = pthread_self();

    return(ThreadID);
}


I guess this ought to be a u64, at least on my Linux distro.
I also think that this will work:

#include <sys/syscall.h>
#include <unistd.h>
inline u32 GetThreadID(void)
{
    u32 ThreadID = syscall(SYS_gettid);

    return(ThreadID);
}


However, I'm not totally sure about it, as I didn't test it properly. I'm still figuring out how to compile the SDL repo on OS X :)
In the TIMED_FUNCTION macro you need to cast __FUNCTION__ to char *:

#define TIMED_FUNCTION(...) TIMED_BLOCK_((char *)__FUNCTION__, __LINE__, ## __VA_ARGS__)


Please note that on day 184 Casey left the code compiling, but the game crashes.
pthread_self is probably what you want to use if you are OK including pthread.h header in game code.

You don't want to use the syscall function: it is very expensive compared to thread-local storage (which is most likely what pthread_self uses). A syscall means a context switch into the kernel to perform its work.
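
If someone still wants the kernel thread ID on Linux, one way to avoid paying for the syscall on every call is to cache the result in thread-local storage. This is only a sketch: it is Linux-specific, it assumes a C++11 compiler (for thread_local), and it assumes u32 is uint32_t. The variable name CachedThreadID is made up for the example.

```cpp
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>

typedef uint32_t u32; // assumption: matches the project's u32 typedef

inline u32 GetThreadID(void)
{
    // One copy of this variable per thread; 0 means "not fetched yet".
    // (A real kernel thread ID is never 0 for a user-space thread.)
    static thread_local u32 CachedThreadID = 0;
    if(CachedThreadID == 0)
    {
        // The expensive syscall happens only on the first call in each thread.
        CachedThreadID = (u32)syscall(SYS_gettid);
    }

    return(CachedThreadID);
}
```

After the first call, each lookup is just a thread-local read, which is roughly the same kind of cost as pthread_self.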

Edited by Mārtiņš Možeiko on
Thanks for the clear explanation :)
On OS X, pthread_self returns a pthread_t struct, which is useless in this context. I "googled" a bit and found this:

#include <pthread.h>

inline u32 GetThreadID(void)
{
    u64 ThreadID;
    pthread_threadid_np(NULL, &ThreadID);

    return((u32) ThreadID);
}


The cast at the return is really messy and I don't like it, but it compiles and runs OK. I just wanted to give it a quick go.

Also, OS X doesn't support _snprintf_s, so I replaced it with snprintf and, once again, it seems to be working fine.
In handmade_debug, this:
_snprintf_s(TextBuffer, sizeof(TextBuffer),
                                    "%s: %10ucy [%s(%d)]",
                                     Record->BlockName,
                                     Region->CycleCount,
                                     Record->FileName,
                                     Record->LineNumber);


needs to be changed to the following (notice the %llu instead of %u for Region->CycleCount):

snprintf(TextBuffer, sizeof(TextBuffer),
                                     "%s: %10llu cy [%s(%d)]",
                                     Record->BlockName,
                                     Region->CycleCount,
                                     Record->FileName,
                                     Record->LineNumber);


With SDL Handmade, that's all we need to make it run on OS X.

Hope this helps :)

Edited by Nuno on
Nice B)
Did you try this GetThreadId implementation?
[strike]pthread_threadid_np is expensive call. It is a syscall to SYS_thread_selfid. That involves context switch.[/strike]
(EDIT: I'm wrong. It's not a syscall. It is a wrapper around pthread_self.)

You really want to use pthread_self. I don't think it returns a struct; it returns a pointer to a struct (pthread_t is a typedef for a pointer), and this pointer is different for each thread. So in our case it is OK to cast the pointer value to an integer and pretend that it is our thread ID.
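
A minimal sketch of that cast, assuming pthread_t is either a pointer (as on OS X) or an unsigned integer (as on Linux with glibc). POSIX treats pthread_t as opaque, so this is not guaranteed to be portable, and it assumes u32 is uint32_t.

```cpp
#include <pthread.h>
#include <cstdint>

typedef uint32_t u32; // assumption: matches the project's u32 typedef

inline u32 GetThreadID(void)
{
    // C-style cast on purpose: it works whether pthread_t is a pointer
    // or an integer type. uintptr_t is wide enough to hold either.
    uintptr_t Handle = (uintptr_t)pthread_self();

    // Truncating to 32 bits keeps only the low bits, so two threads could
    // in theory end up with the same ID.
    return((u32)Handle);
}
```

Going through uintptr_t first avoids the "cast loses precision" error that GCC gives when a 64-bit pointer is cast straight to a 32-bit integer.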

You can see how pthread_self is implemented here: https://opensource.apple.com/sour...-583/pthreads/pthread_internals.h
inline static pthread_t __attribute__((__pure__))
_pthread_self_direct(void)
{
       pthread_t ret;
#if defined(__i386__) || defined(__x86_64__)
       asm("mov %%gs:%P1, %0" : "=r" (ret) : "i" (offsetof(struct _pthread, tsd[0])));
#elif defined(__ppc64__)
	register const pthread_t __pthread_self asm ("r13");
	ret = __pthread_self;
#elif defined(__arm__) && defined(_ARM_ARCH_6)
	__asm__ ("mrc p15, 0, %0, c13, c0, 3" : "=r"(ret));
#elif defined(__arm__) && !defined(_ARM_ARCH_6)
	register const pthread_t __pthread_self asm ("r9");
	ret = __pthread_self;
#endif
       return ret;
}

So on Intel desktop CPUs it is as simple as one mov operation: "mov %%gs:%P1, %0".

Edited by Mārtiņš Možeiko on
Kim wrote:
Nice B)
Did you try this GetThreadId implementation?


I tried and it crashes on startup.

@mmozeiko: Once again, thanks for the explanation. I cast the result of pthread_self to unsigned long int and it seems to work just fine. This is the current implementation I have for GetThreadID:
#include <pthread.h>

inline u32 GetThreadID(void)
{
    return (unsigned long int) pthread_self();
}


I also tried the ASM you suggested, but it complained about %P1. As I'm not at all fluent in asm, I didn't even try to understand it :)
%P1 is an integer that is passed as an argument to the asm construct.

You can get this value by creating a small test program: include "pthread_internals.h" and print out the value of offsetof(struct _pthread, tsd[0]). It may differ between 32-bit and 64-bit code, but most likely nobody cares about 32-bit code on OS X anymore.

The resulting code should look like:
void *tmp;
asm("mov %%gs::0xAA, %0" : "=r"(tmp))
return (u32)tmp;

Where 0xAA is that value in hex that you get from offsetof.
This way there won't be any need for platform-specific includes (pthread).

Although if you look at a different version of the pthread source, this value seems to be 0. I'm not sure which value is correct. Stepping into the pthread_self assembly in a debugger would be the best way to find out.

Here is pthread_self - https://opensource.apple.com/sour...ibpthread-105.40.1/src/internal.h
// Internal references to pthread_self() use TSD slot 0 directly.
inline static pthread_t __attribute__((__pure__))
_pthread_self_direct(void)
{
	return _pthread_getspecific_direct(_PTHREAD_TSD_SLOT_PTHREAD_SELF);
}
#define pthread_self() _pthread_self_direct()


Here is _pthread_getspecific_direct - https://opensource.apple.com/sour...ad-105.40.1/private/tsd_private.h
__header_always_inline void *
_pthread_getspecific_direct(unsigned long slot)
{
	return _os_tsd_get_direct(slot);
}


Here is _PTHREAD_TSD_SLOT_PTHREAD_SELF - https://opensource.apple.com/sour...ad-105.40.1/private/tsd_private.h
#define _PTHREAD_TSD_SLOT_PTHREAD_SELF __TSD_THREAD_SELF


Here is __TSD_THREAD_SELF and _os_tsd_get_direct - https://opensource.apple.com/sour...xnu-2422.1.72/libsyscall/os/tsd.h
_os_tsd_get_direct(unsigned long slot)
{
	void *ret;
#if defined(__i386__) || defined(__x86_64__)
	__asm__("mov %%gs:%1, %0" : "=r" (ret) : "m" (*(void **)(slot * sizeof(void *))));
#endif

	return ret;
}


So if I understand it correctly this should work:
void *tmp;
asm("mov %%gs::0x0, %0" : "=r"(tmp))
return (u32)tmp;
Hi mmozeiko

Thanks a lot for your research.

It actually works with a small change:

inline u32 GetThreadID(void)
{
   void *tmp;
   asm("mov %%gs:0x0, %0" : "=r"(tmp));
   return (unsigned long int ) tmp;
}


(one less : and casting to unsigned long int instead of u32)

and by "it works" I mean, it compiles and the game runs.

I'm curious to know if this will work on Linux as well.

Edited by Nuno on
Slash wrote:

I tried and it crashes on startup.


I guess that qualifies as "does not work" :-)

Regarding the assembly, wouldn't it be simpler to do this?

inline u32 GetThreadID(void)
{
   u32 ThreadID;
   asm("mov %%gs:0x0,%0" : "=r"(ThreadID));
   return ThreadID;
}