Compiling without libc on Linux

I started looking into compiling on Linux without libc. I have found this blog (https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free) which got me started, but I have no clue (and can't seem to find anything with google) how to do file io and write to the console without the help of libc.
Does anyone know how to do file io and output to (what would be) stdout without libc?
Thanks.
You really don't want to avoid libc. It's OK to avoid the C runtime stuff (fopen, fread, fprintf, etc.), but a lot of the functions exported from libc are not C runtime. Linux is a mostly POSIX-compatible OS. POSIX is a set of standards that defines the low-level API through which an application interacts with the OS (file I/O, signals, memory, etc.). The C runtime is built on top of these APIs.

For opening a file, reading from it, and closing it you use these functions:
open - http://pubs.opengroup.org/onlinepubs/007908799/xsh/open.html
read - http://pubs.opengroup.org/onlinepubs/007908799/xsh/read.html
close - http://pubs.opengroup.org/onlinepubs/007908799/xsh/close.html

These functions do pretty much the same job as CreateFile, ReadFile and CloseHandle on Windows.

To write to the console you simply write to the file with file handle 1. The first three handles are predefined:
0 - stdin, use with read to get keyboard input
1 - stdout, use with write
2 - stderr, use with write

Be aware that these functions can be interrupted by a signal. This is indicated by returning -1 and setting errno to EINTR. This doesn't necessarily mean there is an error; you should repeat the call.

For example, opening a file looks like this:
int fd; // file handle

while ((fd = open("file.txt", O_RDONLY)) == -1 && errno == EINTR)
{
    // repeat while fd is -1 and errno is EINTR
}

if (fd == -1) // if fd is still -1 that's an error
{
    // report error
}
// else fd is valid file descriptor
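
Putting open, read, write and close together, a minimal sketch of reading a file and echoing it to stdout might look like the following. The helper names read_some and write_all are made up for this example and are not part of POSIX. The loops retry on EINTR as described above, and write_all also covers the case where write accepts fewer bytes than requested.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: write all 'size' bytes to fd, retrying on EINTR
// and continuing after short writes. Returns 0 on success, -1 on error.
static int write_all(int fd, const void *buf, size_t size)
{
    const char *p = buf;
    while (size > 0)
    {
        ssize_t n = write(fd, p, size);
        if (n == -1)
        {
            if (errno == EINTR) continue; // interrupted, try again
            return -1;
        }
        p += n;
        size -= (size_t)n;
    }
    return 0;
}

// Hypothetical helper: read up to 'cap' bytes from the start of 'path'.
// Returns the number of bytes read, or -1 on error.
static ssize_t read_some(const char *path, void *buf, size_t cap)
{
    int fd;
    while ((fd = open(path, O_RDONLY)) == -1 && errno == EINTR) {}
    if (fd == -1) return -1;

    size_t total = 0;
    while (total < cap)
    {
        ssize_t n = read(fd, (char *)buf + total, cap - total);
        if (n == -1)
        {
            if (errno == EINTR) continue; // interrupted, try again
            close(fd);
            return -1;
        }
        if (n == 0) break; // end of file
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```

To echo a file to the console you would call read_some("file.txt", buf, sizeof(buf)) and, if the result n is positive, write_all(1, buf, n).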


If you are compiling with the GNU C Library (most likely you are), it provides a convenient macro, TEMP_FAILURE_RETRY:
int fd;
TEMP_FAILURE_RETRY(fd = open("file.txt", O_RDONLY));
if (fd < 0)
{
   // error
}
// else success

You can use this macro if you define _GNU_SOURCE to 1 before including the unistd.h header.
Well, if you really want to avoid linking to libc, then what you need to use are syscalls.

Syscalls are the mechanism by which user code jumps into kernel code to perform a specific OS task (file I/O, memory management, etc.). Most POSIX functionality is implemented as syscalls.

Windows also uses syscalls, but those are largely undocumented and change from version to version. On Linux, syscall numbers and arguments are defined by POSIX (I think), and their call mechanism is defined by the ABI. For x86_64 that is here: http://www.x86-64.org/documentation/abi.pdf (Appendix A.2.1).

For example, on the x86_64 architecture the write and exit calls would look like this:
#include <stddef.h>
#include <syscall.h>

// returns negative value for error (for example, if error is EINVAL, then -EINVAL is returned)
static int my_write(int fd, const void *buf, size_t size)
{
    long result;
    __asm__ __volatile__(
        "syscall"
        : "=a"(result)
        : "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
        : "cc", "rcx", "r11", "memory");
    return result;
}

static void my_exit(int code)
{
    __asm__ __volatile__(
        "syscall"
        :
        : "a"(__NR_exit), "D"(code) // exit status goes in rdi, otherwise the kernel sees whatever rdi last held
        : "cc", "rcx", "r11", "memory");
    __builtin_unreachable(); // syscall above never returns
}

void _start()
{
    char text[] = "Hello, world!\n";

    // for this example let's ignore result of write
    // but you should really handle it
    // 1 is stdout file handle
    my_write(1, text, sizeof(text) - 1);

    my_exit(0);
}


If you compile the code, run it, and disassemble it, you see this:
$ gcc -s -Os -nostdlib -ffreestanding a.c

$ ./a.out
Hello, world!

$ objdump -t a.out

a.out:     file format elf64-x86-64

SYMBOL TABLE:
no symbols


$ objdump -S a.out

a.out:     file format elf64-x86-64


Disassembly of section .text:

0000000000400144 <.text>:
  400144:       b8 01 00 00 00          mov    $0x1,%eax
  400149:       48 8d 7c 24 f1          lea    -0xf(%rsp),%rdi
  40014e:       be 6f 01 40 00          mov    $0x40016f,%esi
  400153:       b9 0f 00 00 00          mov    $0xf,%ecx
  400158:       ba 0f 00 00 00          mov    $0xf,%edx
  40015d:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
  40015f:       48 8d 74 24 f1          lea    -0xf(%rsp),%rsi
  400164:       89 c7                   mov    %eax,%edi
  400166:       0f 05                   syscall
  400168:       b8 3c 00 00 00          mov    $0x3c,%eax
  40016d:       0f 05                   syscall


No dynamic symbols, and no calls to functions in disassembly.
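
The same inline-assembly pattern extends to file I/O, which was the original question. Below is a rough sketch for x86_64 only. The names my_syscall3, my_open, my_read, my_close and MY_O_RDONLY are invented for this example. Like my_write above, each wrapper returns a negative errno value on failure instead of setting errno.

```c
#include <stddef.h>
#include <sys/syscall.h> // __NR_open, __NR_read, __NR_close

#define MY_O_RDONLY 0 // assumption: same value as O_RDONLY on Linux

// x86_64 only: number in rax, arguments in rdi, rsi, rdx.
// The kernel clobbers rcx and r11 and returns the result in rax.
static long my_syscall3(long nr, long a1, long a2, long a3)
{
    long result;
    __asm__ __volatile__(
        "syscall"
        : "=a"(result)
        : "0"(nr), "D"(a1), "S"(a2), "d"(a3)
        : "cc", "rcx", "r11", "memory");
    return result;
}

// returns a file descriptor, or -errno on failure
static long my_open(const char *path, int flags)
{
    // the third argument (mode) only matters when creating files
    return my_syscall3(__NR_open, (long)path, flags, 0);
}

// returns the number of bytes read (0 at end of file), or -errno
static long my_read(int fd, void *buf, size_t size)
{
    return my_syscall3(__NR_read, fd, (long)buf, (long)size);
}

// returns 0, or -errno
static long my_close(int fd)
{
    return my_syscall3(__NR_close, fd, 0, 0);
}
```

Reading a file is then my_open, a loop around my_read until it returns 0, and my_close. A result of -EINTR from my_read plays the same role as errno == EINTR with the libc functions.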

More info:
https://en.wikibooks.org/wiki/X86_Assembly/Interfacing_with_Linux
http://blog.rchapman.org/post/368...inux-system-call-table-for-x86-64

Be aware that each architecture (i386, arm, etc.) has a different way of performing the syscall operation, so you'll need to write these wrappers manually for each one.
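
To give an idea of how the per-architecture part differs, here is a sketch of one wrapper with two bodies selected by the preprocessor. The name raw_syscall3 is made up. The aarch64 variant assumes the usual Linux convention there (syscall number in x8, arguments in x0 to x2, svc #0, result in x0). Other architectures would need their own branch.

```c
#include <sys/syscall.h> // __NR_* numbers, which also differ per architecture

#if defined(__x86_64__)
// x86_64: number in rax, arguments in rdi/rsi/rdx, "syscall" instruction
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ __volatile__(
        "syscall"
        : "=a"(ret)
        : "0"(nr), "D"(a1), "S"(a2), "d"(a3)
        : "cc", "rcx", "r11", "memory");
    return ret;
}
#elif defined(__aarch64__)
// aarch64: number in x8, arguments in x0/x1/x2, "svc #0" instruction
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    register long x8 __asm__("x8") = nr;
    register long x0 __asm__("x0") = a1;
    register long x1 __asm__("x1") = a2;
    register long x2 __asm__("x2") = a3;
    __asm__ __volatile__(
        "svc #0"
        : "+r"(x0)
        : "r"(x8), "r"(x1), "r"(x2)
        : "cc", "memory");
    return x0;
}
#else
#error "no syscall stub written for this architecture"
#endif
```

Everything built on top of raw_syscall3 can then stay the same across architectures, as long as it sticks to syscalls that exist on all of them (for example, aarch64 has openat but no plain open).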

But make sure you understand that there are disadvantages to not linking to libc. I'm not sure how it works nowadays on modern Linuxes, but you might run into issues when dynamically loading shared libraries (.so files, which are like DLL files on Windows). And when debugging you might have some issues with threads. I repeat: I'm not sure. It might work, but it might not.

I would suggest linking to libc. libc is not like the C runtime on Windows, where Microsoft changes it all the time and provides a bazillion DLL files for it. On a normal Linux distribution, every component normally uses the same libc file. And, for example, if you want to use the XCB library to create windows and process events, you will need libc, specifically the free function: http://xcb.freedesktop.org/tutori...eivingevents:writingtheeventsloop The result of xcb_wait_for_event/xcb_poll_for_event needs to be released with free. Stupid, but there's no easy way around it (unless you want to implement the X protocol manually).

Edited by Mārtiņš Možeiko on
Thanks for the amazing details! I appreciate it!
My initial thoughts were to avoid libc as different distributions and their versions ship with different versions of libc, but I am currently (for my day job) in need of creating a pre compiled executable that will work on every distribution of Linux (and their different versions of libc);
Statically linking against libc either does not work properly or is not legally allowed.
Not using libc will have the same problems as statically linking to libc, so that reason doesn't really matter.

libc is just the name of a library. It has many different implementations.
glibc - LGPL license. This is what most desktop distributions ship with.
newlib - collection of different licenses, mostly BSD and public domain.
musl - MIT
...and more

You could build musl or newlib and link to it statically. Then you could use pretty much the regular functions for almost everything.

Here are some comparisons:
http://www.etalabs.net/compare_libcs.html
http://wiki.osdev.org/C_Library
http://www.linux.org/threads/a-variety-of-c-standard-libraries.7876/

Edited by Mārtiņš Možeiko on
Sounds like a plan!
I'll try that; thanks!

Btw, when Casey talks about (eventually) not even linking against the C library, does he mean only on Windows because Microsoft changes it a lot?
Or also for Linux?
I guess the Raspberry Pi is yet another story...
Well, on Windows he doesn't want to link to the C library because that's a library. Casey doesn't like (bad) libraries very much :) He uses the functions that the OS provides. There is nothing the C runtime provides that Handmade Hero needs (except for some temporary stuff for debugging, like snprintf).

On Linux, the POSIX functions from libc are the ones the OS provides. While you can use syscalls directly, I think it's just more work, and using POSIX functions is perfectly fine for Handmade Hero's needs. Using syscalls directly won't change the fact that you are still using POSIX functions (open, read, close); you will just need additional code to call them (what I wrote above). Not sure how useful that is.

As far as I understand, on the Raspberry Pi there won't be any OS at all. Handmade Hero will boot by itself and have all the freedom it wants; you could say that Handmade Hero will be the OS itself.
Thanks :-)
mmozeiko

...
For opening a file, reading from it, and closing it you use these functions:
open - http://pubs.opengroup.org/onlinepubs/007908799/xsh/open.html
read - http://pubs.opengroup.org/onlinepubs/007908799/xsh/read.html
close - http://pubs.opengroup.org/onlinepubs/007908799/xsh/close.html
...


Also, for the sort of atomic position-and-read we do in Handmade Hero, you can resort to pread, which is also POSIX.
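
As a rough illustration of pread, the sketch below reads a range of a file at a given offset without touching the file position, so there is no separate lseek step for another thread to race with. The helper names read_at and read_file_at are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L // for pread when compiling in strict ISO C mode
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read up to 'size' bytes starting at 'offset'.
// Retries on EINTR and continues after short reads.
// Returns the number of bytes read (less than 'size' at end of file), or -1.
static ssize_t read_at(int fd, void *buf, size_t size, off_t offset)
{
    size_t total = 0;
    while (total < size)
    {
        ssize_t n = pread(fd, (char *)buf + total, size - total,
                          offset + (off_t)total);
        if (n == -1)
        {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break; // end of file
        total += (size_t)n;
    }
    return (ssize_t)total;
}

// Hypothetical convenience wrapper: open, read a range, close.
static ssize_t read_file_at(const char *path, void *buf, size_t size, off_t offset)
{
    int fd;
    while ((fd = open(path, O_RDONLY)) == -1 && errno == EINTR) {}
    if (fd == -1) return -1;
    ssize_t n = read_at(fd, buf, size, offset);
    close(fd);
    return n;
}
```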
mmozeiko
On Linux, syscall numbers and arguments are defined by POSIX (I think), and their call mechanism is defined by the ABI.


That's kind of true and kind of not true. POSIX is a family of standards. As far as the API goes, it mandates a C API, but doesn't mandate how they're implemented. It definitely doesn't mandate syscall numbers!

On Linux, some POSIX calls are system calls (e.g. open, read), some are libraries built on top of OS calls (e.g. pthreads), and some bits of POSIX aren't implemented at all (e.g. STREAMS).

mmozeiko
I'm not sure how it works nowadays on modern Linuxes, but you might run into issues when dynamically loading shared libraries (.so files, which are like DLL files on Windows).

Unlike Windows, dynamic shared objects are not built into the operating system. Conceptually, the way that you use one on a modern Linux is to memory map the file into your address space, then trawl through the ELF headers to understand what's in it.

You could implement all that yourself; however, the way it's actually done on Linux is that every dynamically linked program is treated as something like bytecode which is "interpreted" by a special interpreter called ld-linux.so. Other platforms have something similar; on FreeBSD it's called ld-elf.so, and on OS X (which uses Mach-O rather than ELF) it's called dyld.
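
To make "map the file and trawl through the ELF headers" a bit more concrete, here is a small sketch that maps an executable and looks for its PT_INTERP program header, which is where the path of that interpreter is stored. The function name elf_interp is made up. The sketch assumes a 64-bit ELF file in the host's byte order and uses libc for brevity.

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns 1 and copies the interpreter path into 'out' if the file has a
// PT_INTERP header, 0 if it is a valid ELF64 file without one (for example
// a static executable), and -1 on any error.
static int elf_interp(const char *path, char *out, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(Elf64_Ehdr))
    {
        close(fd);
        return -1;
    }

    size_t size = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return -1;

    int result = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_phentsize == sizeof(Elf64_Phdr) &&
        eh->e_phoff <= size &&
        (size - eh->e_phoff) / sizeof(Elf64_Phdr) >= eh->e_phnum)
    {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + eh->e_phoff);
        result = 0; // valid ELF64, no interpreter found yet
        for (unsigned i = 0; i < eh->e_phnum; i++)
        {
            if (ph[i].p_type != PT_INTERP) continue;
            if (ph[i].p_filesz == 0 || ph[i].p_filesz > cap ||
                ph[i].p_offset > size || ph[i].p_filesz > size - ph[i].p_offset)
            {
                result = -1; // malformed header or buffer too small
                break;
            }
            memcpy(out, base + ph[i].p_offset, ph[i].p_filesz);
            out[ph[i].p_filesz - 1] = '\0'; // path is stored NUL-terminated
            result = 1;
            break;
        }
    }
    munmap(base, size);
    return result;
}
```

A real dynamic loader goes much further (mapping PT_LOAD segments, processing relocations, resolving symbols), but it starts from the same headers.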

There's a good description of how all this stuff works in John Levine's book, Linkers and Loaders. Levine put a copy of the manuscript free online, but some of the diagrams are missing. Another option is to look at the dynamic loader from the Flux OSKit, called rtld. It's a bit old, but it'll give you a good idea of what a minimal dynamic loader looks like in an ELF/POSIXy environment.

mmozeiko
I would suggest linking to libc.

I would too. It may not be as Handmade as you'd like, but the simple fact is that Unix is a C virtual machine, and so doesn't really work well without libc.
My current precompiled application links against glibc, but it won't run on a Linux distribution with a different version of glibc; because of the LGPL I can't really statically link against glibc.
I have briefly looked into musl and it seems that if I compile my application statically against musl on my machine it should run on any Linux distribution.
Would that be about right? What do you guys think?
At my previous company I successfully shipped a product on Linux as a binary executable linked to the system libc. What we did was compile on an older distribution version (if I remember correctly, Ubuntu 10.04). The executable then ran just fine on other distributions such as newer Ubuntu, SUSE, Fedora, CentOS, and Arch Linux. I don't remember the exact list we tested on. This works because glibc keeps backward compatibility through versioned symbols, so a binary built against an older glibc generally runs on newer ones.