I'm using C (more exactly: C11 with gcc) for developing some low-latency software for x64 (more exactly: Intel CPUs only). I don't care about portability or any other architecture in general.
I know that volatile is generally not the first choice for data synchronization. However, these three facts seem to be true:
- volatile forces data to be written to memory and read back from memory (so the value may not be "cached" in a register, which also means that the compiler cannot apply some optimizations)
- volatile accesses must not be reordered by the compiler
- 4-byte (or even 8-byte) values are always written atomically on x64 (the same is true for reading)
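To make the third assumption checkable, here is a minimal sketch of compile-time checks. As far as I understand, the hardware guarantee covers naturally aligned 4- and 8-byte accesses, so alignment is part of what is being assumed. The helper name is_naturally_aligned_int is hypothetical and only for illustration.

```c
#include <stdalign.h>   /* alignof */
#include <stdatomic.h>  /* ATOMIC_INT_LOCK_FREE */
#include <stdint.h>     /* uintptr_t */

/* Compile-time checks for what the third point assumes on this target. */
_Static_assert(sizeof(int) == 4, "int is expected to be 4 bytes");
_Static_assert(alignof(int) == 4, "int is expected to be naturally aligned");
_Static_assert(ATOMIC_INT_LOCK_FREE == 2,
               "atomic int is expected to be always lock-free");

/* Hypothetical helper: returns 1 if p is suitably aligned for an int.
 * Useful for objects placed manually (packed structs, raw buffers). */
static int is_naturally_aligned_int(const volatile void *p)
{
    return ((uintptr_t)p % alignof(int)) == 0;
}
```

These checks only confirm size, alignment and lock-freedom; they say nothing about the ordering of accesses.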
Now I have this code:
    typedef struct {
        double some_data;
        double more_data;
        char even_more_data[123];
    } Data;

    static volatile Data data;
    static volatile int data_ready = 0;

    void thread1()
    {
        while (true) {
            while (data_ready) ;
            const Data x = f(...); // prepare some data
            data = x;              // write it
            data_ready = 1;        // signal that the data is ready
        }
    }

    void thread2()
    {
        while (true) {
            while (!data_ready) ;
            const Data x = data;   // copy data
            data_ready = 0;        // signal that data is copied
            g(x);                  // process data
        }
    }
thread1 is a producer of Data and thread2 is a consumer of Data. Note that I rely on these facts:
- data is written before data_ready. So when thread2 reads data_ready and it is 1, we know that data is also available (guaranteed by the ordering of volatile accesses)
- thread2 first reads and stores data and then sets data_ready to 0, so thread1 can again produce some data and store it
- data_ready cannot have a weird state, because reading and writing an int (with 4 bytes) is automatically atomic on x64
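For comparison, the ordering these points rely on can be spelled out with C11 atomics. This is only a sketch under my assumptions (exactly one producer and one consumer), and not necessarily the atomics variant I measured. The functions try_publish and try_consume are hypothetical names, written as non-blocking steps so that they can be exercised from a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    double some_data;
    double more_data;
    char even_more_data[123];
} Data;

static Data data;                  /* plain object; ordering comes from the flag */
static atomic_int data_ready = 0;  /* 0: slot is free, 1: data is available */

/* Producer step: copy *x into the slot if it is free. Returns true on success. */
static bool try_publish(const Data *x)
{
    /* acquire pairs with the consumer's release store of 0 */
    if (atomic_load_explicit(&data_ready, memory_order_acquire) != 0)
        return false;
    data = *x;                                                    /* payload first */
    atomic_store_explicit(&data_ready, 1, memory_order_release); /* then the flag */
    return true;
}

/* Consumer step: copy the slot into *out if data is available. */
static bool try_consume(Data *out)
{
    /* acquire pairs with the producer's release store of 1 */
    if (atomic_load_explicit(&data_ready, memory_order_acquire) == 0)
        return false;
    *out = data;                                                  /* copy first */
    atomic_store_explicit(&data_ready, 0, memory_order_release); /* then free the slot */
    return true;
}
```

The busy-polling loops would then become while (!try_publish(&x)) ; and while (!try_consume(&x)) ;. As far as I know, gcc compiles acquire loads and release stores of an int to plain mov instructions on x64, while the default sequentially consistent store needs a locked instruction or a fence, which may matter when comparing timings.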
This was the fastest option I ended up with. Note that both threads are pinned to isolated cores. They busy-poll on data_ready, because it's important for me to process the data as fast as possible.
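The busy-polling loop can be sketched as a small helper. This version adds the x86 PAUSE hint via _mm_pause(), which my code above does not use; whether it helps or hurts latency on isolated cores is something to measure, not something I claim. The names cpu_relax and spin_until are hypothetical.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>              /* _mm_pause */
#define cpu_relax() _mm_pause()     /* spin-wait hint to the CPU */
#else
#define cpu_relax() ((void)0)       /* fallback so the sketch compiles elsewhere */
#endif

/* Spin until *flag equals expected. Returns the number of polls that
 * observed a different value, which is handy for rough measurements. */
static unsigned long spin_until(const volatile int *flag, int expected)
{
    unsigned long spins = 0;
    while (*flag != expected) {     /* volatile: re-read on every iteration */
        cpu_relax();
        ++spins;
    }
    return spins;
}
```

With this helper, the wait in thread2 would read spin_until(&data_ready, 1); and the wait in thread1 would read spin_until(&data_ready, 0);.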
Atomics and mutexes were slower, so I used this implementation.
Finally, my question: is it possible that this does not behave as I expect? I cannot find anything wrong in the logic shown, but I know that volatile is a tricky beast.
Thanks a lot