Using a function call to prevent compiler memory reordering. Yay/nay?

Greetings everyone. I am very late to the party (around 220 days after Casey's first HH stream), so I apologize if this is a duplicate.

NOTE: I have not followed the series from start to finish, but rather have been browsing the YouTube playlists that revolve around specific functionality (thank you so much for doing that, Casey). As a result my own code base does not mirror HH's (a virtual OS-like platform abstraction instead of Casey's implementation, Vulkan-based rendering (eventually) rather than winapi/OpenGL-based rendering, etc.), and I might not be quite caught up with changes Casey made after the videos listed in a specific playlist.

On day 124, Casey implemented memory barriers and explained how they prevent memory reordering done by the compiler. _Read/WriteBarrier are VC++ only, similar to how asm volatile is GCC only. Plus, MSDN suggests using the std::atomic types, which means using the C++ std lib (which, like Casey, I want to avoid as much as possible).

I do not like using deprecated functions/types/APIs/etc. at the risk of future versions no longer supporting them and forcing the code to be rewritten. Plus, the _Read/WriteBarrier intrinsics are not even available to me because I am using MinGW-w64 rather than VC++. GCC's asm volatile might work, but regardless of GCC or VC++, a lot of code/#defines/refactoring will be needed if I ever want to switch compilers to maximize portability (GCC is deprecated in the Android NDK, for example, in favor of Clang).

Doing deeper research led me to a third solution: just use a function call. But that seems too dirty to work without some form of major drawback.

typedef struct Foo { int bar; int bar2; } Foo;

void sendValue(int value);    // external function, body not visible here

void doSomeStuff(Foo* foo)
{
    foo->bar = 5;
    sendValue(123);       // prevents reordering of neighboring assignments
    foo->bar2 = foo->bar;
}


Rather than all these explicit compiler barriers/C++ std atomics, could I just use a function call?

Link to the forum post I found discussing using a function call as a barrier.

You can only really count on memory ordering around a function call if the two are in separate compilation units. If the compiler can see both at the same time, all bets are off. (Remember: inline is a *suggestion*, the compiler is free to inline whatever the hell it wants inside a single compilation unit.) Since HH uses a unity build, almost everything is in a single compilation unit and ergo you can't count on it. (A function call can also be pretty high overhead, particularly in the context of inserting profiling hooks.)

A macro is still probably the best option. Define it as needed for each compiler you need to support. If you consistently use your macro rather than the intrinsic / inline asm / whatever itself, that's only a single thing you need to change in the case it breaks / is removed.
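Just as a sketch (the COMPILER_BARRIER name and the set of compilers here are my own illustration, not something from HH), the macro could look roughly like this:

/* One possible per-compiler barrier macro; adjust for the compilers you actually target. */
#if defined(_MSC_VER)
    #include <intrin.h>
    #define COMPILER_BARRIER() _ReadWriteBarrier()
#elif defined(__GNUC__) || defined(__clang__)
    #define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
    #error "No COMPILER_BARRIER defined for this compiler"
#endif

Then the platform code only ever writes COMPILER_BARRIER(), and only these few lines change if one of the underlying intrinsics ever disappears.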
Thanks for responding.

Thing is, one of the areas where my codebase does not mirror HH is that mine actually does have multiple compilation units, so based on what you said a function call would be more viable for me than it would be in HH, but that overhead likely is not worth the simplicity.

I'll just rely on a macro like you said, set procedurally based on the detected compiler.
Feed
plus the _Read/WriteBarrier intrinsics are not even available to me because I am using MinGW-w64

Not sure which MinGW you use, but my MinGW supports _Read/WriteBarrier just fine:

C:\test>type test.c
#include <intrin.h>

void f()
{
  _ReadBarrier();
  _WriteBarrier();
}

C:\test>x86_64-w64-mingw32-gcc.exe -Wall -Wextra -c test.c

C:\test>x86_64-w64-mingw32-gcc.exe -v
Using built-in specs.
COLLECT_GCC=C:\msys2\mingw64\bin\x86_64-w64-mingw32-gcc.exe
COLLECT_LTO_WRAPPER=C:/msys2/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/5.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-5.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --with-gxx-include-dir=/mingw64/include/c++/5.2.0 --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-version-specific-runtime-libs --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev4, Built by MSYS2 project' --with-bugurl=http://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
gcc version 5.2.0 (Rev4, Built by MSYS2 project)


Afaik MinGW tries to be as compatible with MSVC as GCC can be. This means extra patches to headers and the compiler. It allows MinGW to target the C runtime from MS (msvcrt.dll) rather than glibc or newlib or something else. Additionally it supports a few MSVC-specific features - like _Read/WriteBarrier.

Wow, guess I need to look more into my code then. Thanks for sharing.

Only issue I see is that I eventually plan to migrate away from msvcrt using the guide you actually posted (epic thanks for that, by the way). Wouldn't that make them not work?

Also, just as an overall question/observation:

_Read/WriteBarrier is deprecated.
Certain explicit GCC builtins such as __sync_synchronize are also deprecated (I know Casey said x86/x64 does not require HW barriers, but other architectures such as ARM do).
For every search result that tells me asm volatile("" ::: "memory") is deprecated, I find another that says it is not.
Everything I find says to use std::atomic rather than any of these, which means the C++ std lib.

Are programmers who just use straight C (or who are trying to avoid the C++ std lib) simply out of luck? Or will these 'deprecated' methods always be supported? As I mentioned in the initial post, my worst fear is making everything work with one of these deprecated methods and then discovering that future compilers no longer support them.

P.S. I'm assuming that since _Read/WriteBarrier relies on msvcrt, asm volatile/__sync_synchronize must rely on some CRT library too. Is this true?

P.P.S. Here is the macro from my Windows platform definitions. I'm pretty sure I might be doing something wrong; I have very little experience with lockless multithreading.
#define RELEASE_FENCE() asm volatile("" ::: "memory") __sync_synchronize();
Barriers will still work if you are not using the C runtime. Barriers are a compile-time feature - they influence what kind of instructions are generated. They won't call any external function. Similar to how you can use SSE intrinsics (_mm_sqrt_ps) without the C runtime.

asm volatile is just inline asm. Simply explained - GCC takes your asm code (in this case the empty string "") and puts it in the .S intermediate file, which then gets assembled into the object file. There's nothing to call in the C runtime. Pure copy & paste (with input/output operand assignments).

Microsoft also deprecated fopen and strcpy. But that doesn't mean you can't use them. They are simply pushing people toward their new stuff (like Windows 10). Also, there are a lot more specific atomics than just a compiler memory read/write barrier. For GCC these atomics are accessible through the compiler's __atomic builtins: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html For older GCC versions they were available as __sync builtins: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html
But better to use the __atomic builtins; they offer more flexibility for atomic operations. They allow you to specify the exact memory order you need for your operation. On some architectures that can generate significantly faster code. Here's a good article on this: http://preshing.com/20140709/the-...of-memory_order_consume-in-cpp11/ It uses C++11 atomics, but those can easily be rewritten with the GCC __atomic builtins.
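Roughly, a release/acquire pairing with the __atomic builtins could look like this (the payload/ready names and the publish/consume functions are made up purely for illustration):

#include <stdint.h>

static int32_t payload;   // data written by the producer
static int32_t ready;     // flag the consumer checks

void publish(void)
{
    payload = 42;
    /* release store: earlier writes cannot be reordered past this point */
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

int32_t consume(void)
{
    /* acquire load: later reads cannot be reordered before this point */
    if (__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return payload;
    return -1;   /* nothing published yet */
}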

There's no point in using asm volatile("" ::: "memory") together with __sync_synchronize. The second one already implies a compiler barrier.
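So if you want to keep a single macro, either one of these alone would be enough (just a sketch reusing your RELEASE_FENCE name):

/* Either form already acts as a compiler barrier by itself. */
#define RELEASE_FENCE() __atomic_thread_fence(__ATOMIC_RELEASE)
/* or, with the older builtins (full barrier): */
/* #define RELEASE_FENCE() __sync_synchronize() */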

Wow, I was actually surprised at how much you gave me.

Thanks a million! These resources are amazing!
Also, if you have whole-program optimization enabled, then the linker can inline the function call and move the memory ops around again.

Unless you explicitly mark to the compiler that you want a memory barrier, the linker and compiler are free to reorder as much as they want.