Skip to content

Latest commit

 

History

History
74 lines (41 loc) · 4.25 KB

README.md

File metadata and controls

74 lines (41 loc) · 4.25 KB

zib

High performance Unbounded Multi-Producer Single-Consumer Queues for C++20.

The implementations are lock-less, wait-free and (yet to be proven formally) linearizable.

See Performance for benchmark results.

Contact: donald.rupin@pm.me

Repository

Queues are in self contained headers. Include one or both into your project.

Using mpsc_zib

Requires gcc 11+

Tests

Tests can be built:

mkdir build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
ninja
ctest

Design

These queues are built as a variation of multilist mpsc queue by Andreia Coorreia and Pedro Remalhete. It includes some of the possible improvements listed in the paper, and some additional improvements for cache coherency and reduced system calls to allocate.

The first addition is that the queue operates on a linked list of Node arrays, rather then individual Nodes. This means that for each producer, the enqueue operation is similar to that of a ring buffer. Unlike a ring buffer, when the array reaches the end, a new one is allocated and linked.

The second addition is that each writer also implements a Single-Producer Single-Consumer queue to "recycle" the Node arrays back to each respective producer. Once a consumer depletes a Node array, the memory is reset and pushed onto the SPSC queue. When a producer needs to allocate a new ring buffer, it checks to see if it can pull an array from the SPSC first.

The result of the design is that if the reader can somewhat keep up with the producers, a type of linked ring buffer mode can be achieved, where no new memory is allocated. In the worst case where the producers pull ahead, they can simply allocate more Node arrays. The SPSC is a bounded ring buffer, and if it is full, the Node arrays are de-allocated to prevent memory build up.

Variations:

spin_mpsc_queue

The spin queue will never block but can return a nullopt if queue is empty.

wait_mpsc_queue

The wait queue will block when the queue is empty, and resume when the next element is enqueued. This requires an additional atomic variable and a couple of extra atomic operations. The block is achieved with the new std::atomic::wait. Although benchmarks indicate this queue is more performant for some reason.

overflow_mpsc_queue

A wait_mpsc_queue queue but with the property that the number of threads is not bounded. Thread id's over the allocated amount are allowed, but the elements added by the extra threads are by themselves are not linearizable. The overflow is implemented similar to Dmitry's mpsc queue.

Performance

The repository contains a benchmark and some reference implementations to compare zib queues with others. The benchmark times the amount of time required to concurrently enqueue 1,000,000 elements per thread onto the queue, whilst the consumer attempts to dequeue all elements. The total time is the time it takes the consumer to successfully dequeue number_of_threads * 1,000,000 elements.

All zib queue variations where tested against:

The overflow_mpsc_queue queue was ran twice:

  • [normal] The queue was built with the number of producer threads.
  • [overflow] The queue was built with 2/3 of the number of producer threads. The remaining threads contest for the overflow write spot.

The following results were produced with Intel i7-10875H (16 core) Dell XPS.

Results.

The wait_mpsc_queueis the fastest for all threads benchmark. I would of guessed that the spin_mpsc_queue would beat it - I am not sure why wait_mpsc_queue with the extra atomic operations and blocking is quicker. The overflow[normal] times indicate that the overhead for maintaining the extra overflow write spot is negligible. Although, the overflow[overflow] times show that if this write spot is heavily contested, throughput is reduced significantly. An application can achieve nearly peak performance with overflow_mpsc_queue so long as it can guarantee that overflow writes are rare.