Yes, important point. But I mean the first read, after the write. There are ways to know if this is the case, like for instance WaitForMultipleObjects in some cases, or locking in other cases.
Or is even this wrong?
Again, compiler fences don't offer any guarantee about when the updated value will be read; they only guarantee that, by the time you read the value, the memory operations before it have completed (e.g. after you read the "shared counter" you know that the previous writes to the data structure it protects have finished).
On x86(_64), what guarantees that you will read the value after it has been written is the strong memory model enforced by the cache-coherency subsystem. When a core wants to read a value, it checks its cache, which will either hold the line in the Invalid state (since the core that wrote the value holds that same line in the Modified state, or Exclusive if it has already written it back to a higher-level cache or main RAM) or not hold the line at all, and will therefore have to fetch the up-to-date copy.
The distinction matters because different CPU architectures have different memory models: while compiler fences are enough on x86(_64), they might not be on other architectures (e.g. ARM), which may need CPU fences to properly synchronize data between CPU cores.
I'm not sure what you mean by
There are ways to know if this is the case, like for instance WaitForMultipleObjects in some cases, or locking in other cases.
but if you are referring to semaphores, mutexes and the like, those are higher-level synchronization primitives implemented in terms of fences and/or atomics at the low level (plus OS/kernel-space data structures to keep track of which waiters to notify, etc.).