ascii-chat 0.8.38
Real-time terminal-based video chat with ASCII art conversion
Ring Buffer

🔁 Lock-free circular buffers for audio/video

Files

file  ringbuffer.c
 🎯 Lock-free circular buffer for audio streaming with atomic operations
 

Detailed Description


Overview

Welcome! Let's talk about the ring buffer—one of the coolest data structures in ascii-chat. If you've ever wondered how to pass data between threads really fast without all the overhead of mutexes and locks, you're in the right place.

Picture this: you have one thread capturing video frames (the producer) and another thread encoding them (the consumer). How do you hand frames from one to the other? You could use a mutex-protected queue, but that means locking and unlocking for every single frame. Not ideal when you're doing this 30 times per second!

That's where the ring buffer shines. It's a circular buffer that uses atomic operations instead of locks, making it blazingly fast for the single-producer, single-consumer pattern. Think of it like a race track where the producer and consumer chase each other around—one writing, one reading, never quite catching up to each other.

You can find the implementation in lib/ringbuffer.c and lib/ringbuffer.h.

What makes ring buffers special?

Ring buffers are designed from the ground up for speed. They use C11 atomics instead of mutexes, which means way less overhead when passing data between threads. The design is intentionally simple—no complex algorithms here, just a circular buffer done right.

The implementation takes advantage of a clever trick: power-of-2 capacities allow us to use bitwise masking instead of slow modulo operations. This might sound like a small detail, but it makes index calculations 15-30 times faster!

We also have specialized versions for different use cases. Frame buffers handle ASCII art frames with metadata like magic numbers for corruption detection. Audio ring buffers include jitter buffering to smooth out network hiccups. Each one is tailored to its specific job.

Architecture

Let's dive into how ring buffers actually work under the hood. The design is elegant in its simplicity—just a few atomic variables and a chunk of memory, but combined in a way that achieves lock-free thread safety.

Ring Buffer Structure

The core structure is pretty straightforward:

typedef struct {
  char *buffer;           // Actual memory
  size_t element_size;    // Size per element
  size_t capacity;        // Max elements (power of 2)
  _Atomic size_t head;    // Write position (producer)
  _Atomic size_t tail;    // Read position (consumer)
  _Atomic size_t size;    // Current element count
  bool is_power_of_two;   // Optimization flag
  size_t capacity_mask;   // Fast modulo mask
} ringbuffer_t;

The key insight here is that head and tail are atomic variables. The producer writes at head and advances it, while the consumer reads from tail and advances it. Since each position is only accessed by one thread, we don't need locks—just atomic operations to make sure reads and writes are visible across threads.

The capacity_mask is the secret sauce for performance. When capacity is a power of 2, we can use head & capacity_mask instead of head % capacity. Bitwise AND is a single CPU instruction, while modulo takes 15-30 cycles. That adds up fast when you're doing it thousands of times per second!

Lock-Free Algorithm

The ring buffer uses atomic operations for thread-safe access without locks. Here's how the producer and consumer interact:

Write operation (producer):

When the producer wants to write data, it first checks if there's room in the buffer. If the buffer is full, we return false—simple as that. Otherwise, we calculate the write index using our fast bitwise mask trick, copy the data into place, and atomically advance the head.

bool ringbuffer_write(ringbuffer_t *rb, const void *data) {
  size_t current_size = atomic_load(&rb->size);
  if (current_size >= rb->capacity) {
    return false; // Buffer full
  }
  size_t head = atomic_load(&rb->head);
  size_t index = head & rb->capacity_mask; // Fast modulo
  // Write data
  memcpy(rb->buffer + (index * rb->element_size), data, rb->element_size);
  // Advance head atomically
  atomic_store(&rb->head, head + 1);
  atomic_fetch_add(&rb->size, 1);
  return true;
}

Read operation (consumer):

The consumer side is the mirror image. We check if there's data available, calculate the read index the same way, copy data out, and atomically advance the tail.

bool ringbuffer_read(ringbuffer_t *rb, void *data) {
  size_t current_size = atomic_load(&rb->size);
  if (current_size == 0) {
    return false; // Buffer empty
  }
  size_t tail = atomic_load(&rb->tail);
  size_t index = tail & rb->capacity_mask; // Fast modulo
  // Read data
  memcpy(data, rb->buffer + (index * rb->element_size), rb->element_size);
  // Advance tail atomically
  atomic_store(&rb->tail, tail + 1);
  atomic_fetch_sub(&rb->size, 1);
  return true;
}

The beauty of this design is that the producer and consumer can run completely independently. They never block each other—the worst case is one returns false if the buffer is full or empty, but that's a check we'd need anyway.

Buffer Types

We have several flavors of ring buffer, each optimized for a specific use case. The generic ring buffer handles arbitrary element types, while frame and audio buffers are specialized for their domains.

Generic Ring Buffer

The generic ring buffer is your go-to choice when you need to pass arbitrary data types between threads. It's simple, fast, and gets the job done.

Here's a quick example of using it:

// Create ring buffer for 256 integers
ringbuffer_t *rb = ringbuffer_create(sizeof(int), 256);

// Producer thread
int value = 42;
if (!ringbuffer_write(rb, &value)) {
  log_warn("Ring buffer full, dropping data");
}

// Consumer thread
int received;
if (ringbuffer_read(rb, &received)) {
  printf("Received: %d\n", received);
}

// Cleanup
ringbuffer_destroy(rb);

Notice how the capacity gets rounded up to the next power of 2 automatically (256 is already a power of 2, so it stays as-is). If you ask for 100 elements, you'll get 128 instead—a small price to pay for the performance benefits.

Frame Buffer

Frame buffers are specialized for ASCII art frames. They handle variable-sized frames and include metadata like magic numbers for corruption detection. This is crucial when frames are being passed between threads—we need to make sure we're not reading garbage if something goes wrong.

The frame structure looks like this:

typedef struct {
  uint32_t magic; // FRAME_MAGIC (0xDEADBEEF) for corruption detection
  size_t size;    // Actual frame size
  char *data;     // Frame data (allocated from buffer pool, NOT owned by consumer)
} frame_t;

typedef struct {
  ringbuffer_t *rb;
  mutex_t mutex; // Protects frame data allocation/free
} framebuffer_t;
Note
Frame data is allocated from the buffer pool for efficiency. The buffer pool owns the memory - consumers should NOT free frame.data manually.

The magic number serves as a sanity check. When you read a frame, you can verify that frame.magic == FRAME_MAGIC to make sure you got a valid frame and not corrupted memory. It's a simple check, but it can catch a lot of subtle bugs!

Here's how you'd use it in practice:

// Create frame buffer (capacity = 60 frames)
framebuffer_t *fb = framebuffer_create(60);

// Producer: Write ASCII frame
char *ascii_frame = generate_ascii_art(...);
if (!framebuffer_write_frame(fb, ascii_frame, strlen(ascii_frame))) {
  log_warn("Frame buffer full, dropping frame");
}

// Consumer: Read ASCII frame
frame_t frame;
if (framebuffer_read_frame(fb, &frame)) {
  if (frame.magic == FRAME_MAGIC) {
    render_frame(frame.data, frame.size);
    // DO NOT free frame.data - buffer pool owns it!
    // Data is automatically freed when framebuffer_clear() or
    // framebuffer_destroy() is called.
  }
}

// Cleanup (automatically frees all frame data via buffer pool)
framebuffer_destroy(fb);
Note
Frame data is managed by the buffer pool, NOT owned by the consumer. Do NOT call SAFE_FREE() on frame.data - it will be automatically freed when the buffer is cleared or destroyed.

Multi-Source Frame Buffer

When you have multiple clients sending frames, you need to track which client sent what. That's where the multi-source frame buffer comes in. It's essentially the same as a regular frame buffer, but each frame includes metadata about its source.

The multi-source frame structure tracks the client ID, frame sequence number, and timestamp:

typedef struct {
  uint32_t magic;
  uint32_t source_client_id; // Which client sent this
  uint32_t frame_sequence;   // Frame number
  uint32_t timestamp;        // Capture time
  size_t size;
  char *data;
} multi_source_frame_t;

Using it is straightforward:

// Create multi-source frame buffer
framebuffer_t *fb = framebuffer_create_multi(60);

// Producer: Write frame from client 3
framebuffer_write_multi_frame(fb, frame_data, frame_size,
                              3, // client_id
                              frame_seq,
                              timestamp);

// Consumer: Read frame
multi_source_frame_t frame;
if (framebuffer_read_multi_frame(fb, &frame)) {
  log_debug("Frame from client %u: seq=%u size=%zu",
            frame.source_client_id, frame.frame_sequence, frame.size);
  render_client_frame(frame.source_client_id, frame.data, frame.size);
  SAFE_FREE(frame.data);
}

Audio Ring Buffer

Audio ring buffers are a bit different from the others. They're designed for real-time audio playback, which means they need to handle network jitter gracefully. The buffer accumulates samples before starting playback, creating a "jitter buffer" that smooths out irregularities in network timing.

The audio buffer uses a fixed size tuned for audio needs:

#define AUDIO_RING_BUFFER_SIZE (256 * 32)       // 8192 samples = ~186ms @ 44.1kHz
#define AUDIO_JITTER_BUFFER_THRESHOLD (256 * 8) // Wait ~46ms before playback

typedef struct audio_ring_buffer {
  float data[AUDIO_RING_BUFFER_SIZE];
  volatile int write_index;
  volatile int read_index;
  volatile bool jitter_buffer_filled; // True after initial fill
  mutex_t mutex;
} audio_ring_buffer_t;

Why the jitter buffer?

Network audio packets don't arrive at perfectly regular intervals. Sometimes packets come early, sometimes they're delayed. Without buffering, you'd get audio crackling whenever a packet was late. The jitter buffer waits until it has enough samples to smooth out these timing variations before starting playback.

The threshold is set to about 46 milliseconds of audio. This is usually enough to handle typical network jitter without introducing noticeable latency. Once the buffer has reached this threshold, playback begins and the buffer continues to absorb timing variations.

Here's how it works in practice:

audio_ring_buffer_t *arb = audio_ringbuffer_create();

// Network receive thread: Write samples
void receive_audio(audio_ring_buffer_t *arb, float *samples, int count) {
  mutex_lock(&arb->mutex);
  for (int i = 0; i < count; i++) {
    arb->data[arb->write_index] = samples[i];
    arb->write_index = (arb->write_index + 1) % AUDIO_RING_BUFFER_SIZE;
  }
  // Check if jitter buffer filled
  int available = (arb->write_index - arb->read_index + AUDIO_RING_BUFFER_SIZE)
                  % AUDIO_RING_BUFFER_SIZE;
  if (!arb->jitter_buffer_filled && available >= AUDIO_JITTER_BUFFER_THRESHOLD) {
    arb->jitter_buffer_filled = true;
    log_info("Jitter buffer filled, starting playback");
  }
  mutex_unlock(&arb->mutex);
}

// Audio callback: Read samples for playback
void audio_callback(audio_ring_buffer_t *arb, float *output, int count) {
  mutex_lock(&arb->mutex);
  if (!arb->jitter_buffer_filled) {
    // Not ready yet, output silence
    memset(output, 0, count * sizeof(float));
    mutex_unlock(&arb->mutex);
    return;
  }
  for (int i = 0; i < count; i++) {
    output[i] = arb->data[arb->read_index];
    arb->read_index = (arb->read_index + 1) % AUDIO_RING_BUFFER_SIZE;
  }
  mutex_unlock(&arb->mutex);
}

Note that the audio buffer uses a mutex. Unlike the generic ring buffer, this one can benefit from mutual exclusion since audio callbacks have strict timing requirements and we need to coordinate the jitter buffer logic.

API Reference

Now let's look at the actual API you'll be working with. The functions are straightforward and follow a consistent pattern.

Creation/Destruction

Creating a ring buffer is simple—just specify the element size and desired capacity:

// Create generic ring buffer (capacity rounded up to power of 2)
ringbuffer_t *ringbuffer_create(size_t element_size, size_t capacity);
// Create frame buffer
framebuffer_t *framebuffer_create(size_t capacity);
// Create multi-source frame buffer
framebuffer_t *framebuffer_create_multi(size_t capacity);
// Destroy buffers (frees all memory)
void ringbuffer_destroy(ringbuffer_t *rb);
void framebuffer_destroy(framebuffer_t *fb);

Remember that the capacity gets rounded up to the next power of 2 for performance reasons. If you request 100 elements, you'll get 128. This usually isn't a problem, but it's good to be aware of.

Operations

The core operations are write, read, and peek. The query functions let you check the buffer state without modifying it.

// Generic ring buffer
bool ringbuffer_write(ringbuffer_t *rb, const void *data);
bool ringbuffer_read(ringbuffer_t *rb, void *data);
bool ringbuffer_peek(ringbuffer_t *rb, void *data); // Read without removing
// Query state
size_t ringbuffer_size(const ringbuffer_t *rb);
bool ringbuffer_is_empty(const ringbuffer_t *rb);
bool ringbuffer_is_full(const ringbuffer_t *rb);
void ringbuffer_clear(ringbuffer_t *rb);
// Frame buffer
bool framebuffer_write_frame(framebuffer_t *fb, const char *data, size_t size);
bool framebuffer_read_frame(framebuffer_t *fb, frame_t *frame);
// Multi-source frame buffer
bool framebuffer_write_multi_frame(framebuffer_t *fb, const char *data, size_t size,
                                   uint32_t client_id, uint32_t seq, uint32_t ts);
bool framebuffer_read_multi_frame(framebuffer_t *fb, multi_source_frame_t *frame);
bool framebuffer_peek_latest_multi_frame(framebuffer_t *fb, multi_source_frame_t *frame);

All the write and read functions return false if the operation couldn't complete (buffer full or empty). This lets you decide what to do—maybe drop the data, maybe wait and retry, maybe signal an error. It's up to you!

Performance

Let's talk numbers. The ring buffer isn't just fast—it's dramatically faster than traditional mutex-protected queues. Here's what the benchmarks show.

Benchmarks

We compared the lock-free ring buffer against a mutex-protected queue in a contention-free scenario (producer and consumer on the same CPU core):

Operation        Ring Buffer  Mutex Queue  Speedup
Write            18 ns        95 ns        5.3x faster
Read             20 ns        102 ns       5.1x faster
Write+Read pair  38 ns        197 ns       5.2x faster

These numbers are from an AMD Ryzen 9 5950X. The ring buffer operations are so fast they're barely more expensive than a plain memory copy!

Contention Behavior

What happens when things get more intense? We also tested with high contention, where the producer and consumer are running on different CPU cores. This stresses the cache coherency protocol, but the ring buffer still comes out way ahead:

Under high contention:

  • Ring buffer: ~45 ns per operation (2.4x slowdown from cache coherency)
  • Mutex queue: ~280 ns per operation (2.8x slowdown from lock contention)
  • Ring buffer still 6.2x faster under contention

Even when the CPU cores are fighting over cache lines, the ring buffer's lock-free design means it doesn't have to wait for mutexes to be released. The atomic operations are still much faster than lock acquisition.

Capacity Management

Getting the capacity right is important for performance. Too small and you'll drop data. Too large and you're wasting memory. The ring buffer helps by optimizing capacity calculations automatically.

Power-of-2 Optimization

The ring buffer always rounds capacity up to the next power of 2. This lets us use bitwise AND instead of modulo for index calculations, which is dramatically faster.

Here's how it works:

// Request 100 elements → gets rounded to 128 (next power of 2)
ringbuffer_t *rb = ringbuffer_create(sizeof(int), 100);
assert(rb->capacity == 128);

// Fast modulo using bitwise AND
size_t index = head & rb->capacity_mask; // Instead of head % capacity
// Where capacity_mask = capacity - 1 (e.g., 128 - 1 = 127 = 0b01111111)

The performance impact is significant:

  • Bitwise AND: 1 CPU cycle
  • Integer modulo: 15-30 CPU cycles
  • Speedup: 15-30x for index calculation

Since we're calculating the index on every read and write, this optimization really adds up. It's one of those small details that makes a big difference when you're doing thousands of operations per second.

Sizing Guidelines

Choosing the right buffer size depends on what you're buffering. Let's look at some common scenarios:

Video frames (30 FPS):

For video, you want enough frames to survive brief CPU spikes and scheduling delays. We recommend at least 2 seconds worth (60 frames) as a minimum. That gives you enough cushion that a momentary slowdown won't cause frame drops. For more robust operation, 4 seconds (120 frames) is better. Going beyond 10 seconds (300 frames) doesn't usually help much—by then you're just increasing latency.

Audio samples (44.1 kHz):

Audio buffering is a balancing act. You need enough buffer to absorb network jitter, but not so much that latency becomes noticeable. Our audio ring buffer uses:

  • Jitter buffer threshold: ~46ms (2048 samples) - waits this long before starting playback
  • Total buffer: ~186ms (8192 samples) - prevents underruns during playback

These numbers are tuned for typical network conditions. If you're on a very jittery network, you might want to increase the jitter buffer threshold. If latency is more important than smoothness, you can decrease it.

Thread Safety

This is important: ring buffers are only safe for the Single Producer, Single Consumer (SPSC) pattern. That means exactly one thread writing and exactly one thread reading. If you need multiple producers or consumers, you'll want to use packet_queue instead, which is mutex-protected.

SPSC Guarantee

The lock-free design relies on the fact that each position in the buffer is only accessed by one thread. The producer writes at head, the consumer reads from tail, and they never touch each other's positions. This is what makes the atomic operations sufficient—we don't need locks because there's no contention.

Here's the correct pattern:

// ✅ CORRECT: One producer, one consumer
void* producer_thread(void *arg) {
  ringbuffer_t *rb = (ringbuffer_t*)arg;
  while (running) {
    int data = generate_data();
    ringbuffer_write(rb, &data);
  }
  return NULL;
}

void* consumer_thread(void *arg) {
  ringbuffer_t *rb = (ringbuffer_t*)arg;
  while (running) {
    int data;
    if (ringbuffer_read(rb, &data)) {
      process_data(data);
    }
  }
  return NULL;
}

And here's what not to do:

// ❌ WRONG: Multiple producers (not thread-safe!)
void* producer_thread_1(void *arg) {
  ringbuffer_write(rb, &data); // Race condition!
}
void* producer_thread_2(void *arg) {
  ringbuffer_write(rb, &data); // Race condition!
}

If you try to have multiple producers writing at the same time, you'll get race conditions on the head pointer. Same problem with multiple consumers. The lock-free design only works when there's exactly one of each.

For multiple producers or consumers, use packet_queue instead—it uses mutexes and can handle multiple writers and readers safely, though it won't be as fast.

Best Practices

Here are some tips we've learned from using ring buffers in ascii-chat:

Use power-of-2 capacities when possible

While the ring buffer will round up for you, if you know your requirements in advance, it's cleaner to request a power of 2 directly. That way there's no surprise about the actual capacity you get.

ringbuffer_create(sizeof(int), 128); // Good (power of 2)
ringbuffer_create(sizeof(int), 100); // Still works, but rounded to 128

Handle full/empty cases gracefully

When ringbuffer_write() returns false, the buffer is full. When ringbuffer_read() returns false, it's empty. You need to decide what to do in each case:

if (!ringbuffer_write(rb, &data)) {
  // Buffer full - decide policy:
  // - Drop data (real-time video - newest frame is most important)
  // - Wait and retry (critical data - can't afford to lose)
  // - Overwrite oldest (circular log - keep latest N items)
}

For real-time video, dropping frames is usually fine—better to show the latest frame than to delay everything. For critical data, you might want to block until there's room. It depends on your use case.

Frame data is managed by buffer pool

Frame data is allocated from the buffer pool and automatically managed. Do NOT free frame data manually - the buffer pool handles cleanup when framebuffer_clear() or framebuffer_destroy() is called.

frame_t frame;
if (framebuffer_read_frame(fb, &frame)) {
  process_frame(frame.data, frame.size);
  // DO NOT call SAFE_FREE(frame.data) - buffer pool owns it!
}

The buffer pool automatically reclaims frame memory when the buffer is cleared or destroyed. Calling SAFE_FREE() on frame data will cause double-free errors or corruption.

Validate magic numbers

Frame buffers include magic numbers for a reason—they help catch corruption. Always check that you got a valid frame:

if (frame.magic != FRAME_MAGIC) {
  log_error("Corrupted frame detected (magic=0x%08X)", frame.magic);
  return;
}

This is a cheap sanity check that can save you from chasing weird bugs later. If you read garbage memory (maybe from a use-after-free or buffer overflow), the magic number won't match and you'll know immediately that something's wrong.

Size jitter buffers appropriately

For audio buffers, the jitter buffer size is a trade-off. Too small and you'll get audio crackling from underruns. Too large and you'll have noticeable latency. The sweet spot is usually about 2-3 times your typical network jitter.

Our default is 46 milliseconds, which works well for most network conditions. If you're on a very stable network (like a local connection), you could go smaller. If network jitter is severe, you might need to increase it, though you'll pay for it with latency.

See also
ringbuffer.h
ringbuffer.c