volatile - it just ain't what it used to be

[note - I went a little overboard citing sources but I had fun researching this so I left them in for the curious or the bored]

A lot of people think that volatile has something to do with multi-threading and, until today, I've always thought they were wrong. It turns out the truth is somewhere in-between.

For a C++11 ISO compliant compiler (i.e. one which has to conform to a standard that actually defines a memory model and talks about threads), they are wrong - volatile has nothing to do with multi-threading or memory models.

I can cite lots of sources for this:


Do not use volatile except in low-level code that deals directly with hardware.

Do not assume that volatile has special meaning in the memory model. It does not. It is not - as in some later languages - a synchronization mechanism. To get synchronization, use an atomic, a mutex, or a condition_variable

-- The C++ Programming Language, Fourth Edition by Stroustrup


Item 40: Use std::atomic for concurrency, volatile for special memory

Poor volatile. So misunderstood. It shouldn't even be in this chapter, because it has nothing to do with concurrent programming.

-- Effective Modern C++ by Scott Meyers

Volatile: Almost Useless for Multi-Threaded Programming by the Intel architect of their Threading Building Blocks library

Nine ways to break your systems code using volatile by John Regehr (CS Prof and well regarded in the embedded space)

volatile vs. volatile by Herb Sutter (on the ISO C++11 concurrency working group)


It turns out, however, that pre-C++11, volatile did have a role to play in (non-portable) C/C++ code compiled with Visual Studio (and perhaps others?). In the past Visual Studio imparted acquire/release semantics to volatile variables (see: MSDN's Synchronization and Multiprocessor Issues). As of Visual Studio 2013, volatile is still treated this way by default, but they now warn that this is non-portable (see MSDN's volatile (C++)).
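To make that concrete, here is a minimal sketch (hypothetical names, not HMH code) of the flag-and-payload pattern those extended semantics make safe. Under the Microsoft behavior (the /volatile:ms compiler switch) the volatile store acts as a release and the volatile load as an acquire; under strict ISO semantics (/volatile:iso, or any other conforming compiler) volatile alone promises no such ordering, and std::atomic would be needed instead.

int Payload;
volatile int Ready;

void Producer(void)
{
    Payload = 42;
    Ready = 1;            // release store under /volatile:ms
}

int Consumer(void)
{
    while(!Ready) {}      // acquire load under /volatile:ms
    return Payload;       // guaranteed to be 42 only with the MS-extended semantics
}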

All of that is a long-winded way of saying that, although I believe the HMH code that uses volatile is correct when compiled by Visual Studio, it might be helpful to use std::atomic to act as a guide to people porting to other platforms.
Volatile appears to still have the same role I've always thought it had: to tell the compiler "don't assume this variable can't change".

i.e,

int running = 1;

int threadproc(void * foo)
{
   while (running)
   {
      do_work();
   }  
   return 0;
}


If "running" isn't volatile, it's entirely possible the threadfunc doesn't re-read it on every loop and thus never stops.

Offhand I can't think of any other uses for volatile. I may be wrong (in both cases) =)

The truth is not actually somewhere in between. The people who think "volatile" has something to do with multi-threading are completely correct.

Volatile is essential for multithreading; it's just not about atomicity. It's about preventing compiler optimizations that would break memory visibility between threads. It's a lot like a compiler fence, in that respect: it is not about telling the CPU what to do, it's about telling the compiler what to do.
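(For comparison, the classic compiler-only fence is just a construct the compiler must not move or cache memory accesses across, and it emits no CPU instruction at all - a sketch of the usual spellings:)

#if defined(_MSC_VER)
    #include <intrin.h>
    #define COMPILER_BARRIER() _ReadWriteBarrier()                       // MSVC: compiler barrier only
#else
    #define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")     // GCC/clang idiom
#endif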

My explanation, and use, of volatile on the stream was correct as written and also has nothing to do with atomic operations, much as I explained on the stream. The atomic operations are handled by InterlockedCompareExchange in the source code. That is, in fact, the only place where anything atomic is happening in the codebase.

The volatile keyword is not there for anything related to atomicity, which is a CPU-side thing. Volatile is a compiler-side thing. It is there to prevent the compiler from thinking that it can use a previously read version of the memory location, which it will happily do when things are not marked volatile if it thinks that there are no aliased pointers.

Again, this has absolutely nothing to do with atomicity and I very clearly explained this on the stream where we covered it. It has to do with preventing compiler optimizations that would cause the use of old data after another thread has already modified a value in a way that was visible to the current thread.
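For concreteness, a hedged sketch (illustrative names only, not the actual Handmade Hero queue) of that division of labor - InterlockedCompareExchange supplies the atomicity of the claim, while volatile only keeps the compiler from caching reads of the shared index:

#include <windows.h>

struct work_queue
{
    volatile LONG NextEntryToRead;   // volatile: always re-read, never cache
    LONG EntryCount;
};

static int TryClaimEntry(work_queue *Queue)
{
    LONG OriginalNext = Queue->NextEntryToRead;   // fresh read because of volatile
    if(OriginalNext < Queue->EntryCount)
    {
        // The compare-exchange is what makes the claim atomic: only one thread
        // can move NextEntryToRead from OriginalNext to OriginalNext + 1.
        if(InterlockedCompareExchange(&Queue->NextEntryToRead,
                                      OriginalNext + 1,
                                      OriginalNext) == OriginalNext)
        {
            return(1);   // this thread owns entry OriginalNext
        }
    }
    return(0);           // lost the race (or the queue is empty) - caller retries
}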

As for std::atomic, there is somewhere between a negative infinity and a zero percent chance that we will ever use anything from the C++ standard library in Handmade Hero. In this particular case it is also superfluous and unnecessary, since the threading code is in the platform-specific portion of the code and can trivially call InterlockedCompareExchange directly.

- Casey
Hi Casey,

It is true that pre-C++11 on the Visual Studio compiler, volatile provided both atomicity and memory visibility (acquire/release) guarantees. These weren't non-standard at the time, because the standard had nothing to say about threads or memory models, but they were (at least technically) non-portable. Visual Studio continues to offer these semantics, but I doubt, for example, that clang for Windows would (though I really don't know), because they are now non-standard.

I think what you are saying (please correct me if I am wrong) is that, because the volatile keyword forces the compiler to generate code that always reads/writes to memory (which could be just its local cache) for the current value of a volatile variable, it is guaranteed to see a value written by another thread. That is a statement about memory visibility (and hence the memory model) and is not true in a portable sense (it is true, by default, with Visual Studio but they warn that the behavior is not portable), even if it may be true on specific CPUs.

As I said, I believe the code you have there is correct because of Visual Studio's non-standard volatile semantics. Even without that it is probably correct, but I am worried about whether the writes to CompletionGoal and CompletionCount are guaranteed to be visible when compiled with, say, clang for Windows (when that is feasible):

internal void
Win32CompleteAllWork(platform_work_queue *Queue)
{
    while(Queue->CompletionGoal != Queue->CompletionCount)
    {
        Win32DoNextWorkQueueEntry(Queue);
    }

    Queue->CompletionGoal = 0;    // guaranteed to be visible to other threads?
    Queue->CompletionCount = 0;   // guaranteed to be visible to other threads?
}


It is probably fine, but it is not obvious to me by inspection. I don't claim to be a lock-free programming guru - far from it. My multi-threading code tends to be very conservative. I am only going by my interpretation of the sources I have found by people I feel I should be able to trust (Stroustrup, Meyers and Sutter). If you don't want to use std::atomic that's fine, but since this is a learning stream I think it is helpful to be very clear about what volatile really means when compiled with a modern C++ compiler.
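For what it's worth, the std::atomic spelling I have in mind is only a small change - a sketch (not a patch against the real code) of how the two counters could be declared so that the C++11 memory model, rather than compiler-specific volatile behavior, guarantees their cross-thread visibility:

#include <atomic>
#include <cstdint>

struct platform_work_queue   // sketch only - other members omitted
{
    std::atomic<uint32_t> CompletionGoal;
    std::atomic<uint32_t> CompletionCount;
};

// The plain assignments in Win32CompleteAllWork would then compile to
// sequentially consistent atomic stores:
//     Queue->CompletionGoal = 0;
//     Queue->CompletionCount = 0;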

If you feel my interpretation of what volatile guarantees in portable code is incorrect, I would really appreciate being corrected and understanding where I am in error. I am, after all, following HMH to learn and improve.

Edited by Patrick Lahey on
Here are the relevant sections of the C++11 standard, page 145:

[ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. — end note ]

And section 1.9 (page 8) states that:

The least requirements on a conforming implementation are:
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

And some of your references don't even question HMH's usage of volatile, I think. The Intel guide you link lists three portable uses of volatile and one of them is:

memory that is modified by an external agent

The second page of the Dr. Dobb's article says:

To safely communicate with special hardware or other memory that has unusual semantics, use unoptimizable variables: ISO C/C++ volatile.

rathersleepy wrote:
It is true that pre-C++11 on the Visual Studio compiler, volatile provided both atomicity and memory visibility (acquire/release) guarantees.

Not sure how to say this any more forcefully: we have never talked about, or used, any meaning of volatile that is not the traditional one guaranteed by the C specification.

I think what you are saying (please correct me if I am wrong) is that, because the volatile keyword forces the compiler to generate code that always reads/writes to memory (which could be just its local cache) for the current value of a volatile variable, it is guaranteed to see a value written by another thread.

Correct. All x64 processors ensure that if a write is issued to a memory location, a corresponding read from that memory location on another CPU core will see the result of the write. So the only requirement for memory visibility on x64 is that the compiler must not optimize out the loads and stores to that memory, hence the need for volatile.

Note, again, that this has nothing to do with atomicity. InterlockedCompareExchange is what guarantees the atomicity. volatile is only about preventing optimization.

That is a statement about memory visibility (and hence the memory model) and is not true in a portable sense

Well, it may not be true in a porting scenario that we are not interested in - i.e., in a scenario where you used a machine where the cache was completely manual, and you had to issue instructions to write it to memory or invalidate it. But if we were to port to a platform like that, we would have to do a lot of other work as well that has nothing to do with std::atomic. For example, how would the thread that does the blit see all the rendering work that all the other threads did? None of the memory accesses performed by the other threads would be seen by the blit thread, so technically we would have to go through and manually invalidate every single tile's memory in order to run correctly on this platform.

So, yes - volatile doesn't help that scenario, but neither does std::atomic :) This is why the queue code is very specifically in the platform-specific portion of our code: because you really do want to know what platform you're on when writing multithreading code, because you need to take different steps to ensure correctness on different platforms, and performance is a concern so you cannot just mark up your entire framebuffer with std::atomic or you'd run pathetically slow for obvious reasons.

I am worried about whether the writes to CompletionGoal and CompletionCount are guaranteed to be visible when compiled with, say, clang for Windows (when that is feasible)

There is nothing specific to the compiler that we are relying on here. clang for Windows, or Linux for that matter, will work just fine. You have to change the atomic intrinsic, of course, because clang uses GCC syntax (you change InterlockedCompareExchange to __sync_lock_test_and_set). But other than that the code is the same.
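For illustration, one way such a shim might look (hypothetical names; __sync_val_compare_and_swap is the GCC-style builtin with the same return-the-old-value contract as InterlockedCompareExchange):

#include <stdint.h>

#if defined(_MSC_VER)
    #include <windows.h>
    static inline uint32_t AtomicCompareExchangeU32(uint32_t volatile *Value,
                                                    uint32_t New, uint32_t Expected)
    {
        // Returns whatever was in *Value before the operation.
        return((uint32_t)InterlockedCompareExchange((LONG volatile *)Value,
                                                    (LONG)New, (LONG)Expected));
    }
#else
    static inline uint32_t AtomicCompareExchangeU32(uint32_t volatile *Value,
                                                    uint32_t New, uint32_t Expected)
    {
        return(__sync_val_compare_and_swap(Value, Expected, New));
    }
#endif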

I am only going by the sources I have found by people I feel I should be able to trust (Stroustrup, Meyers and Sutter).

I wouldn't trust that trio to program my VCR, let alone my queue code. But I also suspect that they are not actually contradicting what I'm saying, if you read them carefully (I'm not going to, that's going to have to be up to you :P ), but rather just trying to say that since people often think volatile does something better than just preventing load/store optimizations, they would prefer it if they started using std::atomic instead since that does atomicity.

If you feel my interpretation of what volatile guarantees in portable code is incorrect, I would really appreciate a source where I can go to update my knowledge. I am, after all, following HMH to learn and improve.

Well, as long as you understand what volatile actually does (prevent the compiler from optimizing out loads and stores), then I think you don't really need any more sources, you just need to think about what you are doing a little more carefully! It sounds like you understand what volatile does, you are just coming to some odd conclusions about portability, and I'm not sure exactly where that confusion is coming from (perhaps those three fine fellows listed above, who often tend to write things obtusely).

We could try to talk about this on the stream some more if it would help.

- Casey
Thanks to everyone who contributed to this thread - particularly Casey for his long detailed responses. I must admit I'm confuzzled so I think I need to "Grinch" this one:


And he puzzled and puzzled 'till his puzzler was sore. Then the Grinch thought of something he hadn't before.

-- How The Grinch Stole Christmas by Dr. Seuss

Edited by Patrick Lahey on