mmozeiko
To be compatible with Windows?
Unsurprisingly, Windows does it for the same reason.
mmozeiko
Or maybe (I'm guessing) because there is some special CPU support in 64-bit mode for GS register but not for FS - SWAPGS instruction: http://www.felixcloutier.com/x86/SWAPGS.html
Maybe something can be done more efficiently with SWAPGS than without it.
You got it! The thing that can be done more efficiently with SWAPGS is system calls.
The SYSCALL instruction performs a fast system call. It transfers control to a fixed entry point (the address held in the LSTAR MSR), raises the privilege level to ring 0 (i.e. "kernel mode"), saves the return address in rcx and the CPU flags in r11, and clears whichever flag bits the FMASK MSR selects, which a kernel normally uses to disable interrupts. It doesn't change any other general-purpose registers, including the stack pointer. This is much, much faster than the way things used to be done (e.g. software interrupts), which involved saving registers to a stack at a known location, switching stacks, and so on.
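To make the register effects concrete, here is a toy C model of what SYSCALL does in 64-bit mode. It is only a sketch: `struct cpu_model` and `model_syscall` are names I made up for illustration, and it models just the registers discussed here (rcx gets the return address, r11 gets the flags, the FMASK MSR selects which flag bits are cleared, and rsp is left alone).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the state SYSCALL touches. Hypothetical, not kernel code. */
struct cpu_model {
    uint64_t rip, rsp, rcx, r11, rflags;
    uint64_t lstar;  /* IA32_LSTAR MSR: kernel entry point */
    uint64_t fmask;  /* IA32_FMASK MSR: flag bits to clear on entry */
    int cpl;         /* privilege level: 3 = user, 0 = kernel */
};

#define RFLAGS_IF (1ULL << 9)  /* interrupt-enable flag */

/* Apply the effect of SYSCALL. next_rip is the address of the
 * instruction following SYSCALL in user code. */
void model_syscall(struct cpu_model *c, uint64_t next_rip)
{
    c->rcx = next_rip;       /* return address is saved in rcx */
    c->r11 = c->rflags;      /* user flags are saved in r11 */
    c->rflags &= ~c->fmask;  /* e.g. clears IF if the kernel set that bit */
    c->rip = c->lstar;       /* jump to the kernel entry point */
    c->cpl = 0;              /* now in ring 0 */
    /* rsp is deliberately untouched: still the user stack */
}
```

The point to notice is the last comment: after the "instruction" runs, the model is in ring 0 but `rsp` still holds the user's stack pointer, which is exactly the problem the entry code has to solve.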
So when you enter the kernel, almost all of the CPU registers are set to what they were in user space. The first thing the kernel has to do is save that user state. But where should it save the state to?
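Note that it can't compute a location, because any computation would need a scratch register, and every register still holds user state. Using static space would work on a single core, but not on a multicore machine, where several CPUs can enter the kernel at the same time.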
The answer is to use SWAPGS. In kernel space, GS points to the kernel's per-CPU information, and in user space, it points to user-space per-thread information. SWAPGS exchanges the GS base with the value in an MSR (IA32_KERNEL_GS_BASE), which the kernel sets on each CPU at boot time.
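If it helps, the exchange can be modelled in a few lines of C. This is a simulation rather than anything you would put in a kernel, and every name in it (`struct percpu`, `model_enter_user`, `model_swapgs`, `model_percpu`) is hypothetical. It assumes the state just before a system call: the GS base holds the user thread's pointer and the MSR holds that CPU's per-CPU block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU block; a real kernel keeps much more here. */
struct percpu {
    int cpu_id;
};

/* The two values SWAPGS exchanges, one pair per CPU. */
struct cpu_model {
    uint64_t gs_base;         /* base used by %gs-relative accesses */
    uint64_t kernel_gs_base;  /* IA32_KERNEL_GS_BASE MSR */
};

/* State while user code runs: GS is the user's thread pointer and
 * the MSR holds this CPU's own per-CPU block. */
void model_enter_user(struct cpu_model *c, struct percpu *block,
                      uint64_t user_tls)
{
    c->gs_base = user_tls;
    c->kernel_gs_base = (uint64_t)(uintptr_t)block;
}

/* SWAPGS: exchange the two values. Needs no scratch register. */
void model_swapgs(struct cpu_model *c)
{
    uint64_t tmp = c->gs_base;
    c->gs_base = c->kernel_gs_base;
    c->kernel_gs_base = tmp;
}

/* What a %gs:0-style access resolves to once in kernel mode. */
struct percpu *model_percpu(const struct cpu_model *c)
{
    return (struct percpu *)(uintptr_t)c->gs_base;
}
```

Because each CPU has its own MSR, two CPUs executing the same entry code end up looking at two different blocks, which is what makes the per-CPU save area safe on a multicore machine.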
Here's the relevant code from Homebrew OS:
.align 16
.globl __syscall_entry
__syscall_entry:
// Calling convention on HBOS is that %rax contains
// the system call number when you enter a system call and
// contains errno when you exit. This is a fast path to make sure
// that the number makes sense. Note that this doesn't change
// the values in any registers or use any stack.
cmpq $SYSCALL_COUNT,%rax
jb __syscall_ok
movq $ENOTSUP,%rax
sysretq
__syscall_ok:
// Swap the user-space thread-local storage with the kernel's
// per-cpu state for this CPU.
swapgs
// %gs:0 now points to the kernel's per-CPU state.
// We can now freely use this without interfering with
// anything that other CPUs are doing.
// Swap stacks...
movq %rsp,%gs:CPUSTATE_SAVED_STACK
movq %gs:CPUSTATE_SYSCALL_STACK,%rsp
// Now we are on a known good stack, so we
// have a place to save a register so we can do
// some computation.
push %rax
movq %gs:CPUSTATE_CUR_THREAD,%rax
// %rax now points to the kernel's thread control block.
// Remember that we entered the kernel via a syscall as
// opposed to, say, an interrupt. Eventually we will probably
// want to return control to the thread, so we have to make
// sure we exit in the opposite way that we entered.
movb $THREADRET_SYSCALL,THREAD_THREADRET(%rax)
// Now that we have the TCB, save registers into it...
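// Point %rax at the register save area inside the TCB first,
// since the REGOFF_* offsets are relative to that area.
lea THREAD_CPUREGS(%rax),%rax
// SYSCALL left the user's RFLAGS in %r11.
movq %r11,REGOFF_RFLAGS(%rax)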
movq %rbx,REGOFF_RBX(%rax)
movq %rcx,REGOFF_RCX(%rax)
movq %rdx,REGOFF_RDX(%rax)
movq %rsi,REGOFF_RSI(%rax)
movq %rdi,REGOFF_RDI(%rax)
movq %rbp,REGOFF_RBP(%rax)
movq %r8,REGOFF_R8(%rax)
movq %r9,REGOFF_R9(%rax)
movq %r10,REGOFF_R10(%rax)
movq %r11,REGOFF_R11(%rax)
movq %r12,REGOFF_R12(%rax)
movq %r13,REGOFF_R13(%rax)
movq %r14,REGOFF_R14(%rax)
movq %r15,REGOFF_R15(%rax)
// We temporarily stored the user-space stack
// pointer in CPUSTATE_SAVED_STACK...
movq %gs:CPUSTATE_SAVED_STACK,%rcx
movq %rcx,REGOFF_RSP(%rax)
// ...and we temporarily stored %rax on the stack.
pop %rcx
movq %rcx,REGOFF_RAX(%rax)
// All registers are now saved, and we're on a known
// good kernel-space stack. We can now call into C
// code and everything will work.
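// %rcx holds the system call number we just popped; put it
// back in %rax to index the table. SYSCALL overwrote the
// user's %rcx, so the fourth argument arrives in %r10 and is
// moved to %rcx, where the C calling convention expects it.
mov %rcx,%rax
movq %r10,%rcx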
call *syscall_table(,%rax,8)
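For anyone who reads C more easily than assembly, the range check at the top and the indirect call at the bottom of that listing amount to roughly the following. This is my own sketch, not HBOS source: the handler names, the six-argument signature, and the table contents are invented for illustration. Only `syscall_table`, `SYSCALL_COUNT`, `ENOTSUP`, and the "return value is an errno" convention come from the listing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical handler signature: up to six arguments, returns an errno. */
typedef int (*syscall_fn)(uint64_t, uint64_t, uint64_t,
                          uint64_t, uint64_t, uint64_t);

/* Two made-up handlers, just to populate the table. */
static int sys_nop(uint64_t a, uint64_t b, uint64_t c,
                   uint64_t d, uint64_t e, uint64_t f)
{
    (void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
    return 0;
}

static int sys_check_even(uint64_t a, uint64_t b, uint64_t c,
                          uint64_t d, uint64_t e, uint64_t f)
{
    (void)b; (void)c; (void)d; (void)e; (void)f;
    return (a % 2 == 0) ? 0 : EINVAL;
}

static const syscall_fn syscall_table[] = { sys_nop, sys_check_even };
#define SYSCALL_COUNT (sizeof syscall_table / sizeof syscall_table[0])

/* C equivalent of the cmpq/jb fast path plus the indirect call. */
int dispatch(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
             uint64_t a3, uint64_t a4, uint64_t a5)
{
    if (nr >= SYSCALL_COUNT)  /* unsigned compare, like jb */
        return ENOTSUP;
    return syscall_table[nr](a0, a1, a2, a3, a4, a5);
}
```

The unsigned comparison matters: a "negative" system call number is just a very large unsigned value, so it fails the check instead of indexing before the start of the table.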
The sysret code does essentially the same thing, only in reverse. The kernel changes the saved copy of the user-space GS base when it does a context switch, so when the system call exit code (sysret) does its final SWAPGS, the correct thread-local storage pointer is swapped back in.
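The context-switch part can be sketched the same way. Again this is a model with made-up names (`struct thread`, `model_context_switch`), and it assumes the switch happens in kernel mode, after the entry SWAPGS, so the MSR is currently holding the outgoing thread's user GS base.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thread control block: only the field we need. */
struct thread {
    uint64_t user_gs_base;  /* this thread's user-space TLS pointer */
};

struct cpu_model {
    uint64_t gs_base;         /* active GS base */
    uint64_t kernel_gs_base;  /* IA32_KERNEL_GS_BASE MSR */
    struct thread *cur;       /* thread currently on this CPU */
};

void model_swapgs(struct cpu_model *c)
{
    uint64_t tmp = c->gs_base;
    c->gs_base = c->kernel_gs_base;
    c->kernel_gs_base = tmp;
}

/* Runs in kernel mode: the MSR holds the outgoing thread's user GS
 * base, so save it and install the incoming thread's value. */
void model_context_switch(struct cpu_model *c, struct thread *next)
{
    c->cur->user_gs_base = c->kernel_gs_base;
    c->kernel_gs_base = next->user_gs_base;
    c->cur = next;
}
```

Walking through it: thread A enters the kernel (SWAPGS), the kernel switches to thread B, and the exit path's SWAPGS then installs B's pointer while the per-CPU pointer goes back into the MSR, ready for the next entry on that CPU.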