Mac OS X vm_allocate vs. Windows VirtualAlloc

Does anyone know how to specify a base address in vm_allocate, and specifically why a particular address should or shouldn't work? Mach works quite differently from Windows, and this information is really hard to find. Or should this be approached in some other way?

I can get it to run using:
BaseAddress = Gigabytes(5); // random!

kern_return_t result = vm_allocate((vm_map_t)mach_task_self(),
                                   (vm_address_t*)&BaseAddress,
                                   TotalSize,
                                   VM_FLAGS_FIXED);

OSXStateGameMemeoryBlock = (void*)BaseAddress;

Edited by Filip on
If you don't like vm_allocate, you can use mmap to allocate memory (mmap also works on Linux). Somebody here says that mmap is faster than vm_allocate: https://bugzilla.mozilla.org/show_bug.cgi?id=691731

Edited by Mārtiņš Možeiko on
Filip,
I switched my Mac platform layer over to mmap. I'm still cleaning up some code from the past few days so it's not in github yet, but here's an excerpt:

    char* RequestedAddress = (char*)Gigabytes(8); // Make this somewhere above 4GB
    _gameMemory.PermanentStorage = mmap(RequestedAddress, totalSize,
                                        PROT_READ|PROT_WRITE,
                                        MAP_PRIVATE|MAP_FIXED|MAP_ANON,
                                        -1, 0);
    if (_gameMemory.PermanentStorage == MAP_FAILED)
    {
        printf("mmap error: %d  %s", errno, strerror(errno));
    }
A couple of things here…

I timed the attached test program of that bug report on my 10.10 system (the report was filed against OS X 10.7) and the gap between vm and mmap calls is now much smaller.

And more importantly, the way they're using the allocations, at least in the test program, is per 256KB block, and the test measures 51200 allocations and deallocations, which take a grand total of ~50ms all together.

In the case of HandmadeHero, the app makes about 3 calls to reserve memory from the OS and all at app startup time, so the sub-microsecond time difference between these two calls is irrelevant.

As for vm_allocate, I tried the following little test program:

#include <mach/mach_init.h>
#include <mach/vm_map.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    // vm_address_t and vm_offset_t are just uint64_t on LP64 machines
    
    vm_address_t address = 2ul * 1024 * 1024 * 1024 * 1024; // 2TB
    vm_offset_t byteSize = 1ul * 1024 * 1024 * 1024;        // 1GB 

    kern_return_t result = vm_allocate((vm_map_t)mach_task_self(),
                                        &address,
                                        byteSize,
                                        VM_FLAGS_FIXED);

    if (result == KERN_SUCCESS) {
        printf("got mem at %lu\n", address);
    }
    else {
        printf("failed to get memory, error: %d\n", result);
    }
}


And that works fine.

Jeff, you posted as I was writing this up. Are there other reasons to use mmap over vm_allocate?

--

Update: when you start reading/writing to the memory allocated by vm_allocate, the pages are zero filled, which is a requirement for the HH app.
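
For comparison, anonymous mmap mappings are also zero filled on first access. Here is a minimal sketch of how one might check that, assuming a POSIX system; the helper names reserve_zeroed and is_all_zero are made up for illustration and are not part of any platform layer in this thread.

```c
#define _DEFAULT_SOURCE // exposes MAP_ANON on glibc under strict -std= modes
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

// Hypothetical helper: reserve `size` bytes of anonymous memory.
// Returns NULL on failure.
static void *reserve_zeroed(size_t size)
{
    void *block = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
    return (block == MAP_FAILED) ? NULL : block;
}

// Hypothetical check: returns 1 if every byte in the block reads as zero.
static int is_all_zero(const void *block, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)block;
    for (size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) return 0;
    }
    return 1;
}
```

Note that is_all_zero touches every page, so running it over a multi-gigabyte block makes all of those pages resident. It is only meant as a one-off sanity check on a small block.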

Edited by zenmumbler on
Great to have so much competence here! Better answers than the whole rest of the internets :)
Btw, I just got state replay to work. (Currently the code is filled with litter and stray code, but I pushed it to the repo anyway.) I used vm_allocate with a random address above 5 GB (why?).
I have a 15" retina MacBook Pro with crazy fast PCI flash, but the writes (using fwrite) take way too long. I'm thinking of saving the replay state to memory and then writing it to disk with a separate command.
I switched over to mmap() because I was thinking of doing some clever debugging/testing tools that would let me substitute a file descriptor into the same mmap code to load saved memory images. Otherwise, vm_allocate() is fine.

As far as picking the requested address goes, I'm not sure there is any magic value. It's just that when you ask for a large contiguous block, you need to find a big empty space in virtual memory, and lots of stuff ends up living in the 0-4GB range. If you run the vmmap command line utility on your running application (get the pid from top, ps, or Activity Monitor), you can get an idea of your application's memory layout.

Here's a little excerpt of my HandMadeHero app's vmmap output. You can see my requested large mmap'd memory in the two VM_ALLOCATE lines, starting at the 8GB address (0000000200000000):

MALLOC_LARGE (freed) 000000010ed00000-000000010eeaf000 [ 1724K] rw-/rwx SM=ZER
MALLOC_SMALL (freed) 000000010f000000-000000010f800000 [ 8192K] rw-/rwx SM=PRV ...cHelperZone_0x100082000
MALLOC_SMALL 000000010f800000-0000000110000000 [ 8192K] rw-/rwx SM=PRV ...cHelperZone_0x100082000
VM_ALLOCATE 0000000200000000-0000000208000000 [128.0M] rw-/rwx SM=PRV
VM_ALLOCATE (reserved) 0000000208000000-0000000304000000 [ 3.9G] rw-/rwx SM=NUL ...ess space (unallocated)
__DATA 000012348029b000-00001234802f3000 [ 352K] rw-/rwx SM=COW .../AMDRadeonX3000GLDriver
__DATA 00001234802f3000-00001234802f4000 [ 4K] rw-/rwx SM=PRV .../AMDRadeonX3000GLDriver
Ok, I switched to mmap, mostly since Casey started to map files on the Windows side. It works fine and the delay when starting recording is ok now.
I do it like this:
1. For GameMemoryBlock I use mmap for an anonymous allocation, like Jeff does above
2. For the ReplayBuffers I use mmap like:
ReplayBuffer->FileDescriptor = open(ReplayBuffer->FileName,
                                    O_RDWR | O_CREAT | O_NONBLOCK | O_TRUNC, 0666);
ReplayBuffer->MemoryBlock = mmap(0, TotalSize,
                                 PROT_READ | PROT_WRITE, MAP_PRIVATE,
                                 ReplayBuffer->FileDescriptor, 0);

3. When I start recording, I ftruncate the file descriptor to "TotalSize"

This leads to:
A. slight delay when the files are truncated the first time (then instantaneous)
B. delay when quitting (writing?)

I would like to:
I) circumvent the delay for A (like some kind of nonblocking truncate)
II) circumvent the delay for B

Does any of you have any idea?
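
For reference, the three steps above (open, ftruncate, mmap) could be sketched in one place roughly as follows. This is only a sketch with a hypothetical helper name, map_replay_file. It uses MAP_SHARED, which is an assumption about the intent: with MAP_PRIVATE, as in the excerpt above, stores to the mapping are not written back to the file. It also sizes the file before mapping rather than when recording starts, and drops O_NONBLOCK.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: create (or truncate) `path`, size it to `size` bytes,
// and map it read/write. Returns NULL on failure; on success *fd_out
// receives the open descriptor.
static void *map_replay_file(const char *path, size_t size, int *fd_out)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd == -1) return NULL;

    // The file must be at least `size` bytes long before the mapping is
    // touched; accessing pages past end-of-file raises SIGBUS.
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }

    // MAP_SHARED so that stores reach the file (an assumption about intent).
    void *block = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (block == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    *fd_out = fd;
    return block;
}
```

This does nothing about the delays themselves. It only shows the order of the calls and where each one can fail.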
Another approach is to seek to the intended end of the file and then write a single byte to realize the file. This works fine too, but it behaves differently.
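off_t ret = lseek(ReplayBuffer->FileDescriptor, State->TotalSize - 1, SEEK_SET);
if (ret != -1) // lseek returns -1 on failure, the new offset otherwise
{
    // Writing one byte at the last offset extends the file to TotalSize.
    ssize_t written = write(ReplayBuffer->FileDescriptor, "", 1);
}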

1. Condition B is fixed
2. Delay for A is more or less the same, maybe worse

Is there some secret sauce to add?
I think on Linux you could use the fallocate function to allocate space in a file.
On OS X fallocate doesn't exist, but you can use fcntl with F_PREALLOCATE. See how to use it in this function from the Firefox source: https://hg.mozilla.org/mozilla-ce...a907/xpcom/glue/FileUtils.cpp#l59
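
A minimal sketch of that preallocation idea, under some assumptions: the helper name preallocate_file is made up, the macOS branch follows the usual fcntl(F_PREALLOCATE) pattern (try a contiguous extent, then fall back to any layout), and on other systems it uses posix_fallocate rather than the Linux-specific fallocate.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: try to reserve `length` bytes of disk space for `fd`
// and set the file size to `length`. Returns 0 on success, -1 on failure.
static int preallocate_file(int fd, off_t length)
{
#if defined(__APPLE__)
    // Ask for one contiguous extent first, then fall back to any layout.
    fstore_t store = {F_ALLOCATECONTIG, F_PEOFPOSMODE, 0, length, 0};
    if (fcntl(fd, F_PREALLOCATE, &store) == -1) {
        store.fst_flags = F_ALLOCATEALL;
        if (fcntl(fd, F_PREALLOCATE, &store) == -1) return -1;
    }
    // F_PREALLOCATE reserves space but does not change the file size.
    return ftruncate(fd, length);
#else
    // posix_fallocate returns an error number rather than setting errno.
    return (posix_fallocate(fd, 0, length) == 0) ? 0 : -1;
#endif
}
```

It would be called once on the replay file's descriptor, before the file is mapped.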
Thanks, I actually tried that but it doesn't make any difference as far as I can tell. Might be worth testing under more controlled circumstances.
For now, on a practical level, it works "good enough". It's easy and fast to tweak and test after the first initial write.
I came across this thread in search of a way to do the equivalent of Windows' VirtualAlloc on Mac. I'm just leaving a note here for anyone who might come across this in the future. MAP_FIXED is discouraged (see: https://developer.apple.com/libra...anPages_iPhoneOS/man2/mmap.2.html) and I kept getting "Cannot allocate memory" errors (from perror()) until I removed that flag. As the documentation specifies, when using MAP_FIXED the "requestedAddress" (first parameter of mmap) must be a multiple of the page size. So unless you know what the page size is and can ensure this to be the case, be cautious with using MAP_FIXED.
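
A minimal sketch of both points, assuming a POSIX system: sysconf(_SC_PAGESIZE) reports the page size, and without MAP_FIXED the address passed to mmap is only a hint. The helper names align_down_to_page and reserve_near are made up for illustration.

```c
#define _DEFAULT_SOURCE // exposes MAP_ANON on glibc under strict -std= modes
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

// Hypothetical helper: round `address` down to a multiple of the page size.
static uintptr_t align_down_to_page(uintptr_t address)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    return address - (address % page_size);
}

// Hypothetical helper: ask for memory near `wanted` without MAP_FIXED.
// The kernel treats the address as a hint, so the caller must compare the
// result with the request if the exact address matters.
// Returns NULL on failure.
static void *reserve_near(uintptr_t wanted, size_t size)
{
    void *hint = (void *)align_down_to_page(wanted);
    void *block = mmap(hint, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
    return (block == MAP_FAILED) ? NULL : block;
}
```

If the block has to land at exactly the same address on every run (as for state replay), a hint is not enough on its own: either check that the returned pointer equals the aligned request, or use MAP_FIXED with a page-aligned address.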
I'm pretty sure they say it is discouraged because it requires passing the arguments correctly and handling errors if anything goes wrong. If you don't do that, your application won't work. It will also be potentially less portable, especially on older 32-bit architectures, where finding an unused address range can be tricky.

But as long as your code is correct, there is nothing wrong with asking for a fixed address with mmap. They probably wrote that document a very long time ago and did not bother updating it. Here's how Linux removed "discouraged" from the mmap documentation: https://lore.kernel.org/patchwork/patch/857794/