Linux/OS X Support

Hi Handmade Hero friends

SDL Handmade does run on Linux and OS X but requires some small changes to Casey's code. We are looking for volunteers to bring these GCC/LLVM issues to Casey's attention during the stream.

Currently the following functions need to be implemented in handmade_platform.h:
inline u64 AtomicExchangeU64(u64 volatile *Value, u64 New)
{
    u64 Result = __sync_lock_test_and_set(Value, New);
    return(Result);
}
inline u64 AtomicAddU64(u64 volatile *Value, u64 Addend)
{
    u64 Result = __sync_fetch_and_add(Value, Addend);
    return(Result);
}
inline u32 GetThreadID(void)
{
    u32 ThreadID;
#if defined(__APPLE__)
    asm("mov %%gs:0x00,%0" : "=r"(ThreadID));
#elif defined(__i386__)
    asm("mov %%gs:0x08,%0" : "=r"(ThreadID));
#elif defined(__x86_64__)
    asm("mov %%fs:0x10,%0" : "=r" (ThreadID));
#else
#error Unsupported architecture
#endif
    return ThreadID;
}
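
For anyone wiring these up, here is a minimal sketch of how the two atomics behave. Both GCC builtins hand back the value that was stored before the operation. The u64 typedef and the AtomicDemo function are my own assumptions for illustration, not part of Casey's code.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t u64; // assumed to match the u64 typedef in handmade_platform.h

inline u64 AtomicExchangeU64(u64 volatile *Value, u64 New)
{
    // Stores New and returns the previous contents (GCC/Clang builtin).
    u64 Result = __sync_lock_test_and_set(Value, New);
    return(Result);
}
inline u64 AtomicAddU64(u64 volatile *Value, u64 Addend)
{
    // Adds Addend and returns the value from before the addition.
    u64 Result = __sync_fetch_and_add(Value, Addend);
    return(Result);
}

// Hypothetical demo, not part of Handmade Hero.
inline void AtomicDemo(void)
{
    u64 volatile Counter = 5;
    u64 Before = AtomicAddU64(&Counter, 3);
    assert(Before == 5);  // old value comes back
    assert(Counter == 8); // memory holds the sum
    u64 Old = AtomicExchangeU64(&Counter, 1);
    assert(Old == 8);
    assert(Counter == 1);
}
```

One thing to keep in mind: the GCC documentation describes __sync_lock_test_and_set as an acquire barrier rather than a full barrier, so it is not an exact match for a fully fenced exchange.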


Also in handmade_platform.h a cast is needed as __FUNCTION__ is a const char *
#define TIMED_FUNCTION(...) TIMED_BLOCK_((char *)__FUNCTION__, __LINE__, ## __VA_ARG
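
To show why the cast matters, here is a small sketch. On GCC and Clang, __FUNCTION__ behaves like a const char array, so C++ refuses to pass it to a parameter of type char * without a cast. The struct, function and macro names below are made up for the example and only stand in for the real debug code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a debug record that stores a plain char *.
struct debug_record_sketch
{
    char *BlockName;
    int LineNumber;
};

inline debug_record_sketch RecordSketch(char *BlockName, int LineNumber)
{
    debug_record_sketch Result = {BlockName, LineNumber};
    return Result;
}

// Without the (char *) cast, GCC and Clang reject this in C++,
// because const char * does not convert to char * implicitly.
#define RECORD_HERE_SKETCH() RecordSketch((char *)__FUNCTION__, __LINE__)

inline debug_record_sketch ExampleFunction(void)
{
    return RECORD_HERE_SKETCH();
}

// Hypothetical demo: the record carries the name of the enclosing function.
inline void FunctionNameDemo(void)
{
    debug_record_sketch Record = ExampleFunction();
    assert(strcmp(Record.BlockName, "ExampleFunction") == 0);
}
```

The cast is only safe as long as nothing writes through BlockName. Changing the parameter to const char * would be the cleaner fix, but that touches more of the code.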


In handmade_debug.cpp, _snprintf_s should be replaced by snprintf, but this will be solved once the dependency on stdio is removed.

/Kim
"#if defined(__APPLE__)" needs to be replaced with "#if defined(__APPLE__) && defined(__x86_64__)", in case somebody wants to port HH to iOS.

MSVC2013 has a _snprintf function that is almost like snprintf (it lacks just a few exotic formatters, and it does not null-terminate the buffer when the output is truncated). So instead of using the ugly _snprintf_s function, we could use this:
#if COMPILER_MSVC
#define snprintf _snprintf
#endif

And then use snprintf in all places. Then nobody will have issues on OSX, Linux or even MinGW on Windows.
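
Here is a rough sketch of how a call site could look with that define in place. I check _MSC_VER directly because COMPILER_MSVC comes from handmade_platform.h, and the FormatFrameLabelSketch helper is made up for the example. It writes the terminator by hand, since the older _snprintf leaves the buffer unterminated when the output does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Assumption: compilers older than MSVC2015 (_MSC_VER < 1900) only have _snprintf.
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

// Hypothetical helper: formats "Frame N" into Dest and always terminates it.
inline void FormatFrameLabelSketch(char *Dest, size_t DestSize, int FrameIndex)
{
    if(DestSize == 0)
    {
        return;
    }
    snprintf(Dest, DestSize, "Frame %d", FrameIndex);
    // Harmless with a real snprintf, required with _snprintf on truncation.
    Dest[DestSize - 1] = 0;
}

// Hypothetical demo of a fitting and a truncated label.
inline void FormatDemo(void)
{
    char Wide[32];
    FormatFrameLabelSketch(Wide, sizeof(Wide), 42);
    assert(strcmp(Wide, "Frame 42") == 0);

    char Narrow[8];
    FormatFrameLabelSketch(Narrow, sizeof(Narrow), 1234567);
    assert(strlen(Narrow) == 7); // truncated but still terminated
}
```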

Also add the "-D_CRT_SECURE_NO_DEPRECATE" argument to cl.exe so it does not emit deprecation warnings. It's perfectly fine to use those deprecated functions (they are deprecated only by Microsoft). MSVC2015 has a real snprintf function.
Kim
inline u32 GetThreadID(void)
{
    u32 ThreadID;
#if defined(__APPLE__)
    asm("mov %%gs:0x00,%0" : "=r"(ThreadID));
#elif defined(__i386__)
    asm("mov %%gs:0x08,%0" : "=r"(ThreadID));
#elif defined(__x86_64__)
    asm("mov %%fs:0x10,%0" : "=r" (ThreadID));
#else
#error Unsupported architecture
#endif
    return ThreadID;
}


Question for very advanced programmers: Why was the selector changed from GS to FS for the x86-64 port of Linux?

(Yes, there's a reason.)
Sorry, I don't know the reason for the change from gs to fs segment, but
when I compile with gcc on linux, I get the following error:
handmade_platform.h:561:137: error: expected primary-expression before ‘)’ token
 Block_##Number(__COUNTER__, __FILE__, __LINE__, BlockName, ## __VA_ARGS__)


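It seems to be because passing an empty argument to a variadic macro is not
standard behaviour, according to this website:
http://binglongx.com/2013/07/11/t...empty-argument-to-variadic-macro/
The handmade code does put a "##" before the "__VA_ARGS__", but swallowing the
comma that way is a GNU extension rather than standard C++.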

I can get it to compile and run by changing this line:
-#define TIMED_BLOCK__(BlockName, Number, ...) timed_block TimedBlock_##Number(__COUNTER__, __FILE__, __LINE__, BlockName, ## __VA_ARGS__)
+#define TIMED_BLOCK__(BlockName, Number, ...) timed_block TimedBlock_##Number(__COUNTER__, __FILE__, __LINE__, BlockName) //, ## __VA_ARGS__)


Well, in the process of writing this post I got it to work :)
I was using "-std=c++11", then I tried changing to "-std=gnu++11" and it
seems to work. I thought I'd still post this in case it helps anyone.
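
In case somebody has to stay on strict "-std=c++11", here is a sketch of one way to avoid the ", ##" extension altogether: pass the block name through the "..." part so the variadic arguments are never empty. Everything here (timed_block_sketch, TIMED_BLOCK_SKETCH, the default hit count of 1) is my own simplified stand-in, not the real handmade_platform.h code.

```cpp
#include <cassert>
#include <cstring>

// Simplified, hypothetical stand-in for timed_block.
struct timed_block_sketch
{
    const char *File;
    int Line;
    const char *BlockName;
    int HitCount;

    timed_block_sketch(const char *FileInit, int LineInit,
                       const char *BlockNameInit, int HitCountInit = 1)
        : File(FileInit), Line(LineInit),
          BlockName(BlockNameInit), HitCount(HitCountInit)
    {
    }
};

// The block name travels inside "...", so __VA_ARGS__ always has at least
// one argument and no comma needs to be swallowed.
#define TIMED_BLOCK_SKETCH(Variable, ...) \
    timed_block_sketch Variable(__FILE__, __LINE__, __VA_ARGS__)

// Hypothetical demo: one block with the default hit count, one with 4.
inline void TimedBlockDemo(void)
{
    TIMED_BLOCK_SKETCH(Update, "Update");
    TIMED_BLOCK_SKETCH(Render, "Render", 4);
    assert(Update.HitCount == 1);
    assert(Render.HitCount == 4);
    assert(strcmp(Render.BlockName, "Render") == 0);
}
```

The trade-off is that the macro can no longer refer to the block name on its own, which the real macros may need, so switching to "-std=gnu++11" is probably still the simpler route.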
@longboolean and @effect0r: Thank you for bringing these issues up during the stream and of course to Casey for implementing these. It works fine :)

Now we only need to get rid of the _snprintf_s.


Pseudonym73
Question for very advanced programmers: Why was the selector changed from GS to FS for the x86-64 port of Linux?

(Yes, there's a reason.)


You are killing us. Please tell us why they switched to using the FS register.

/Kim
To be compatible with Windows? :D
It also uses gs for 64-bit and fs for 32-bit.

Or maybe (I'm guessing) because there is some special CPU support in 64-bit mode for GS register but not for FS - SWAPGS instruction: http://www.felixcloutier.com/x86/SWAPGS.html
Maybe something can be done more efficiently with SWAPGS than without it.

Edited by Mārtiņš Možeiko on
mmozeiko
To be compatible with Windows?

Unsurprisingly, Windows does it for the same reason.

mmozeiko
Or maybe (I'm guessing) because there is some special CPU support in 64-bit mode for GS register but not for FS - SWAPGS instruction: http://www.felixcloutier.com/x86/SWAPGS.html
Maybe something can be done more efficiently with SWAPGS than without it.


You got it! The thing that can be done more efficiently with SWAPGS is system calls.

The SYSCALL instruction is used to perform a fast system call. It transfers control to a specific location, elevates the privilege level to ring 0 (i.e. "kernel mode"), saves the return address into rcx and the CPU flags into r11, and (with the flag mask a kernel normally sets up) disables interrupts. It doesn't change any other general-purpose registers, including the stack pointer. This is much, much faster than the way things used to be done (e.g. software interrupts), which involved saving registers to a stack at a known location, switching stacks, and so on.

So when you enter the kernel, almost all of the CPU registers are set to what they were in user space. The first thing the kernel has to do is save that user state. But where should it save the state to?

Note that it can't compute a location, because any computation would use registers. Using static space would work on a single core, but not on a multicore machine.

The answer is to use SWAPGS. In kernel space, GS points to the kernel's per-CPU information, and in user space, it points to user-space per-thread information. SWAPGS swaps the GS base address with the value held in an MSR, which can be set per-CPU at boot time.

Here's the relevant code from Homebrew OS:

	.align 16
	.globl __syscall_entry
__syscall_entry:
        // Calling convention on HBOS is that %rax contains
        // the system call number when you enter a system call and
        // contains errno when you exit. This is a fast path to make sure
        // that the number makes sense. Note that this doesn't change
        // the values in any registers or use any stack.
	cmpq $SYSCALL_COUNT,%rax
	jb __syscall_ok
	movq $ENOTSUP,%rax
	sysretq
__syscall_ok:
        // Swap the user-space thread-local storage with the kernel's
        // per-cpu state for this CPU.
	swapgs

        // %gs:0 now points to the kernel's per-CPU state.
        // We can now freely use this without interfering with
        // anything that other CPUs are doing.

        // Swap stacks...
	movq %rsp,%gs:CPUSTATE_SAVED_STACK
	movq %gs:CPUSTATE_SYSCALL_STACK,%rsp

        // Now we are on a known good stack, so we
        // have a place to save a register so we can do
        // some computation.
	push %rax
	movq %gs:CPUSTATE_CUR_THREAD,%rax

        // %rax now points to the kernel's thread control block.

        // Remember that we entered the kernel via a syscall as
        // opposed to, say, an interrupt. Eventually we will probably
        // want to return control to the thread, so we have to make
        // sure we exit in the opposite way that we entered.
	movb $THREADRET_SYSCALL,THREAD_THREADRET(%rax)

        // Now that we have the TCB, save registers into it...
	movq %r11,REGOFF_RFLAGS(%rax)
	lea THREAD_CPUREGS(%rax),%rax
	movq %rbx,REGOFF_RBX(%rax)
	movq %rcx,REGOFF_RCX(%rax)
	movq %rdx,REGOFF_RDX(%rax)
	movq %rsi,REGOFF_RSI(%rax)
	movq %rdi,REGOFF_RDI(%rax)
	movq %rbp,REGOFF_RBP(%rax)
	movq %r8,REGOFF_R8(%rax)
	movq %r9,REGOFF_R9(%rax)
	movq %r10,REGOFF_R10(%rax)
	movq %r11,REGOFF_R11(%rax)
	movq %r12,REGOFF_R12(%rax)
	movq %r13,REGOFF_R13(%rax)
	movq %r14,REGOFF_R14(%rax)
	movq %r15,REGOFF_R15(%rax)

	// We temporarily stored the user-space stack
        // pointer in CPUSTATE_SAVED_STACK...
	movq %gs:CPUSTATE_SAVED_STACK,%rcx
	movq %rcx,REGOFF_RSP(%rax)

        // ...and we temporarily stored %rax on the stack.
	pop %rcx
	movq %rcx,REGOFF_RAX(%rax)

        // All registers are now saved, and we're on a known
        // good kernel-space stack. We can now call into C
        // code and everything will work.
	mov %rcx,%rax
	movq %r10,%rcx
	call *syscall_table(,%rax,8)


The sysret code does essentially the same thing, only in reverse. The kernel changes the saved copy of the user-space GS base when it does a context switch, so when the system call exit code (sysret) does a final SWAPGS, the correct thread-local storage pointer is swapped back in.
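
If the swapping is hard to picture, here is a toy model in plain C++. It only tracks two numbers per core: the active GS base and the hidden MSR value that SWAPGS exchanges it with. The struct, function names and addresses are invented for illustration, and nothing here touches real hardware.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one core. On real hardware KernelGSBase is an MSR.
struct cpu_model_sketch
{
    uint64_t GSBase;       // what %gs-relative accesses currently use
    uint64_t KernelGSBase; // the value SWAPGS exchanges GSBase with
};

// Models SWAPGS: exchange the active base with the hidden one.
inline void SwapGS(cpu_model_sketch *Cpu)
{
    uint64_t Temp = Cpu->GSBase;
    Cpu->GSBase = Cpu->KernelGSBase;
    Cpu->KernelGSBase = Temp;
}

// Hypothetical demo: syscall entry, a context switch, then syscall exit.
inline void SwapGSDemo(void)
{
    const uint64_t PerCpuState = 0x9000; // made-up kernel per-CPU address
    const uint64_t ThreadA_TLS = 0x1000; // made-up user TLS pointers
    const uint64_t ThreadB_TLS = 0x2000;

    // Thread A is running in user space.
    cpu_model_sketch Cpu = {ThreadA_TLS, PerCpuState};

    SwapGS(&Cpu); // syscall entry
    assert(Cpu.GSBase == PerCpuState);
    assert(Cpu.KernelGSBase == ThreadA_TLS);

    // Context switch inside the kernel: replace the saved user value.
    Cpu.KernelGSBase = ThreadB_TLS;

    SwapGS(&Cpu); // syscall exit
    assert(Cpu.GSBase == ThreadB_TLS);
    assert(Cpu.KernelGSBase == PerCpuState);
}
```

The point of the model is that the kernel never needs a free register to find its per-CPU state: one SWAPGS on the way in and one on the way out is enough.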