Channel: Lockless deque with non atomic sized items - Stack Overflow

Lockless deque with non atomic sized items


I'm using a worker deque as described in "Correct and Efficient Work-Stealing for Weak Memory Models". I want queue items to be 16 bytes in size, and I only care about Intel/AMD Windows x64 and VS 2019.

I understand that aligned 16-byte loads and stores (say, of an __m128) are typically atomic on modern processors, but this is not guaranteed by the specification.

The types for the deque are:

typedef struct {
    atomic_size_t size;
    atomic_int buffer[];
} Array;

typedef struct {
    atomic_size_t top, bottom;
    Atomic(Array *) array;
} Deque;

Importantly, the array buffer items have an atomic type. If I compile this with VS2019 using a 16-byte item type, I can see that it bloats each buffer item with a spinlock, which I don't want. Is it possible to prevent this, given that I only care about x64, which comes with certain guarantees?
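To make the size concern concrete, here is a minimal sketch of the layout being asked for, with plain 16-byte slots and compile-time checks that nothing has been added to them. It assumes a C11 toolchain with <stdatomic.h>, which may not match what VS2019 provides; the Item fields and the array_bytes helper are hypothetical names used only for illustration.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdalign.h>   /* alignas, alignof */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical 16-byte payload; the field names are placeholders. */
typedef struct {
    alignas(16) uint64_t lo;
    uint64_t hi;
} Item;

/* Same shape as Array above, but the slots are plain (non-atomic) Items. */
typedef struct {
    atomic_size_t size;
    Item buffer[];      /* flexible array member */
} Array;

/* Fail the build if a slot is anything other than 16 aligned bytes. */
static_assert(sizeof(Item) == 16, "Item must be exactly 16 bytes");
static_assert(alignof(Item) == 16, "Item must be 16-byte aligned");
static_assert(offsetof(Array, buffer) % 16 == 0, "slots must start aligned");

/* Bytes to allocate for an Array holding `capacity` slots. */
size_t array_bytes(size_t capacity) {
    return sizeof(Array) + capacity * sizeof(Item);
}
```

The static assertions only check the layout. They say nothing about whether concurrent access to the slots is safe, which is the actual question here.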

The actions on the deque are given by the functions:

int take(Deque* q) {
    size_t b = load_explicit(&q->bottom, relaxed) - 1;
    Array* a = load_explicit(&q->array, relaxed);
    store_explicit(&q->bottom, b, relaxed);
    thread_fence(seq_cst);
    size_t t = load_explicit(&q->top, relaxed);
    int x;
    if( t <= b ) {
        /* Non-empty queue. */
        x = load_explicit(&a->buffer[b % a->size], relaxed);
        if( t == b ) {
            /* Single last element in queue. */
            if( !compare_exchange_strong_explicit(&q->top, &t, t + 1, seq_cst, relaxed) )
                /* Failed race. */
                x = EMPTY;
            store_explicit(&q->bottom, b + 1, relaxed);
        }
    } else { /* Empty queue. */
        x = EMPTY;
        store_explicit(&q->bottom, b + 1, relaxed);
    }
    return x;
}

void push(Deque* q, int x) {
    size_t b = load_explicit(&q->bottom, relaxed);
    size_t t = load_explicit(&q->top, acquire);
    Array* a = load_explicit(&q->array, relaxed);
    if( b - t > a->size - 1 ) { /* Full queue. */
        resize(q);
        a = load_explicit(&q->array, relaxed);
    }
    store_explicit(&a->buffer[b % a->size], x, relaxed);
    thread_fence(release);
    store_explicit(&q->bottom, b + 1, relaxed);
}

int steal(Deque* q) {
    size_t t = load_explicit(&q->top, acquire);
    thread_fence(seq_cst);
    size_t b = load_explicit(&q->bottom, acquire);
    int x = EMPTY;
    if( t < b ) {
        /* Non-empty queue. */
        Array* a = load_explicit(&q->array, consume);
        x = load_explicit(&a->buffer[t % a->size], relaxed);
        if( !compare_exchange_strong_explicit(&q->top, &t, t + 1, seq_cst, relaxed) )
            /* Failed race. */
            return ABORT;
    }
    return x;
}

Much of that is redundant and should optimise away on x64. In fact, the paper specifies that the only memory fence needed is the one in take(), at the thread_fence(seq_cst) line. However, I am not sure whether that still holds if the queue item type is 16 bytes in size.

Since take() and push() must run on the same thread, there is no issue between them. The danger is therefore a thread calling steal() and reading a partially written 16-byte item. But since push() issues its fence only after all 16 bytes are written, and updates bottom only after that, it seems this isn't a concern on x64?
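The publication order described above can be sketched as follows. This is a simplified illustration, not the paper's code: FixedDeque, CAP, push_item, and steal_item are hypothetical names, the capacity is fixed so there is no resize(), and the slots are plain 16-byte structs. Note that the slot read in steal_item is a plain read that may overlap a later overwrite by the owner. In this sketch such a value is only used if the compare-exchange on top succeeds, but under the C11 memory model a racing non-atomic access is still formally undefined behaviour, so this leans on x64 behaviour rather than on the language standard.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP 8  /* hypothetical fixed capacity; this sketch has no resize() */

typedef struct { alignas(16) uint64_t lo; uint64_t hi; } Item;

typedef struct {
    atomic_size_t top, bottom;
    Item buffer[CAP];            /* non-atomic 16-byte slots */
} FixedDeque;

/* Owner thread only. Returns false when the deque is full. */
bool push_item(FixedDeque *q, Item x) {
    assert(q != NULL);
    size_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t > CAP - 1)
        return false;
    q->buffer[b % CAP] = x;                      /* plain 16-byte store */
    atomic_thread_fence(memory_order_release);   /* slot before bottom */
    atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    return true;
}

/* Any thread. Returns true and fills *out only if the steal succeeded. */
bool steal_item(FixedDeque *q, Item *out) {
    assert(q != NULL && out != NULL);
    size_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    atomic_thread_fence(memory_order_seq_cst);
    size_t b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    if (t >= b)
        return false;                            /* empty */
    Item x = q->buffer[t % CAP];                 /* plain read, may race */
    if (!atomic_compare_exchange_strong_explicit(
            &q->top, &t, t + 1,
            memory_order_seq_cst, memory_order_relaxed))
        return false;                            /* lost the race: drop x */
    *out = x;
    return true;
}
```

The point of the design is that a thief never acts on the copied item until the compare-exchange confirms that top was still t, so a slot that the owner has since reused is discarded rather than returned.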

I have done an experiment where I removed the atomic qualifier on the buffer items and used plain assignments to and from the buffer via a volatile pointer. And it appeared to work fine, but obviously that is no certainty!

If this is not possible, then perhaps using cmpxchg16b to load/store the 16 bytes is a better option in my specific case? Or I could complicate it all by making the queue items indices, and locklessly allocating the 16-byte slots that they index.
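The index-based alternative might look roughly like the sketch below. Everything here is hypothetical: SlotPool, SLOT_COUNT, NO_SLOT, pool_put, and pool_get are illustrative names, and the allocator is a simple atomic bump counter that never recycles slots, which a real implementation would have to address. The deque itself would then carry only the 32-bit index, which fits in an ordinary lock-free atomic.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SLOT_COUNT 1024u        /* hypothetical pool size */
#define NO_SLOT    UINT32_MAX   /* returned when the pool is exhausted */

typedef struct { uint64_t lo, hi; } Item;   /* 16-byte payload */

typedef struct {
    atomic_uint next;           /* index of the next unused slot */
    Item slots[SLOT_COUNT];     /* plain, non-atomic storage */
} SlotPool;

/* Reserve a slot with one atomic add, copy the item in, return its index.
 * Slots are never freed in this sketch, and `next` keeps growing after
 * exhaustion, so it assumes far fewer than UINT_MAX calls in total. */
uint32_t pool_put(SlotPool *p, Item x) {
    unsigned i = atomic_fetch_add_explicit(&p->next, 1u, memory_order_relaxed);
    if (i >= SLOT_COUNT)
        return NO_SLOT;
    p->slots[i] = x;            /* only the reserving thread writes slot i */
    return (uint32_t)i;
}

/* Read a slot by index; the caller must have received the index through
 * a properly ordered channel, such as push() followed by steal(). */
Item pool_get(const SlotPool *p, uint32_t i) {
    return p->slots[i];
}
```

With this layout the slot is written before its index is pushed, so the release fence in push() would order the slot contents before the index becomes visible, and the slot is not rewritten while the index is in flight.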

So the simplified version of my question is: on x64, can I simply change the Array buffer type to a volatile pointer to an array of non-atomic 16-byte struct items, and change the loads and stores of those items in the above functions to plain non-atomic assignments?

