🔊 Audio capture, playback, and multi-client mixing

Modules
	Audio
	🔊 Audio system for real-time audio capture and playback

Detailed Description


Audio system with PortAudio integration and multi-client audio mixing with ducking.

Mixer README

Overview

Welcome to the audio mixer—where all the magic happens when multiple people are talking at once!

Picture yourself in a group video call. Person A is talking, Person B laughs, Person C asks a question—all happening simultaneously. Your speakers don't have three separate outputs (well, most don't). So how does your computer play all three audio streams at once? That's where the mixer comes in!

The mixer takes multiple audio streams (one from each client) and combines them into a single mixed stream. It's like a real mixing board at a concert: each microphone is a separate input, and the mixer blends them into one cohesive sound. One difference from a concert: each client usually receives a mix that leaves out its own audio, so nobody hears themselves echoed back.

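But here's the cool part: when lots of people are talking at once, the mixer keeps the combined audio from clipping or distorting. It scales sources down as more people join, then runs compression and soft clipping on the sum. On top of that it applies "ducking": while someone is actively speaking, quieter background sources are turned down. It's like how a good sound engineer pulls back the other microphones when one person takes the lead, so the mix stays clear and balanced.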

Implementation: lib/mixer.h

What makes the mixer special?

Real-time mixing: Combines multiple audio streams on the fly
Dynamic source management: Sources can join or leave without disrupting the mix
Active speaker detection: Automatically identifies who's talking loudest
Automatic ducking: Attenuates background sources when someone is speaking
Dynamic range compression: A compressor tames peaks in the mixed signal to help prevent clipping
Noise gate: Suppresses background noise below threshold with hysteresis
High-pass filtering: Removes low-frequency rumble and noise
Soft clipping: Prevents harsh digital clipping artifacts
Crowd scaling: Automatically adjusts volume based on participant count
Thread-safe: Reader-writer locks for concurrent access
Low latency: Fixed 256-sample frame processing (about 5.3 ms per frame at 48 kHz)
O(1) source exclusion: Bitset-based tracking for echo cancellation
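The noise gate's hysteresis is easiest to see in code. This is a minimal sketch, assuming the gate is driven by a per-frame level in dB and uses two thresholds; the names `noise_gate_t` and `noise_gate_update` and the -45/-50 dB values are hypothetical, not taken from lib/mixer.h.

```c
#include <stdbool.h>

/* Hypothetical gate state: opens above open_dB, closes below close_dB. */
typedef struct {
    float open_dB;   // level that opens the gate (illustrative: -45 dB)
    float close_dB;  // lower level that closes it again (illustrative: -50 dB)
    bool  open;      // current gate state
} noise_gate_t;

/* Feed one level measurement (dB); returns true while audio should pass.
 * Between close_dB and open_dB the gate keeps its previous state: that
 * dead band is the hysteresis that prevents rapid open/close chatter. */
static bool noise_gate_update(noise_gate_t *g, float level_dB) {
    if (!g->open && level_dB > g->open_dB) {
        g->open = true;
    } else if (g->open && level_dB < g->close_dB) {
        g->open = false;
    }
    return g->open;
}
```

A level of -47 dB therefore leaves a closed gate closed and an open gate open; only crossing the outer thresholds changes state.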

Architecture

Mixer Design:

Single mixer instance per server
Per-client audio input buffers
Shared output buffer for mixed audio
Thread-safe operation with reader-writer lock protection

Audio Flow:

Client 1 Audio → Input Buffer 1 ──┐
Client 2 Audio → Input Buffer 2 ──┤
Client 3 Audio → Input Buffer 3 ──┼→ Mixer → Mixed Output → All Clients
...                             ──┘
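At its core, the Mixer box in the flow above sums one frame from every input and keeps the result inside [-1, 1]. The sketch assumes float samples in [-1, 1] and uses a hypothetical rational soft-knee clipper (identity below 0.5, smoothly approaching ±1 above it); `mix_frames`, `soft_clip`, and `SOFT_KNEE` are illustrative names, and lib/mixer.c's actual clipping curve, compressor, and crowd scaling are not reproduced here.

```c
#include <stddef.h>

#define SOFT_KNEE 0.5f  /* assumed knee: samples below this pass unchanged */

/* Rational soft clipper: identity up to the knee, then a smooth curve
 * whose slope is 1 at the knee and which approaches +/-1.0 as |x| grows. */
static float soft_clip(float x) {
    const float k = 1.0f - SOFT_KNEE;          // headroom between knee and full scale
    float mag = x < 0.0f ? -x : x;
    if (mag <= SOFT_KNEE) {
        return x;
    }
    float u = mag - SOFT_KNEE;                 // amount above the knee
    float y = SOFT_KNEE + u / (1.0f + u / k);  // bounded by 1.0
    return x < 0.0f ? -y : y;
}

/* Additive mix: out[s] = soft_clip(sum over sources of inputs[i][s]). */
static void mix_frames(const float *const *inputs, size_t num_sources,
                       float *out, size_t frame_size) {
    for (size_t s = 0; s < frame_size; s++) {
        float sum = 0.0f;
        for (size_t i = 0; i < num_sources; i++) {
            sum += inputs[i][s];
        }
        out[s] = soft_clip(sum);
    }
}
```

Two quiet voices (0.2 + 0.1) pass through untouched, while two loud ones (0.8 + 0.7 = 1.5) come out near 0.83 instead of hard-clipping at 1.0.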

Operations

Initialization

Create Mixer:

// mixer_create returns a pointer to a new mixer (NULL on failure)
mixer_t *mixer = mixer_create(MIXER_MAX_SOURCES, 48000);
if (!mixer) {
    log_error("Failed to create mixer");
    return ASCIICHAT_ERROR_MEMORY;
}

Source Management

Add Audio Source (client with audio ring buffer):

uint32_t client_id = 12345;
audio_ring_buffer_t *client_audio_buffer = ...; // Client's audio ring buffer
 
int result = mixer_add_source(mixer, client_id, client_audio_buffer);
if (result < 0) {
    log_error("Failed to add client %u to mixer", client_id);
    return ASCIICHAT_ERROR_FULL;
}
log_info("Client %u added to mixer at index %d", client_id, result);

Remove Audio Source:

mixer_remove_source(mixer, client_id);

log_info("Client %u removed from mixer", client_id);


Audio Processing

The mixer reads audio directly from each client's audio ring buffer. Clients write their audio samples to their ring buffer, and the mixer reads and mixes them during processing.

Mix Audio (reads from all source ring buffers):

float mixed_output[MIXER_FRAME_SIZE];
int samples_mixed = mixer_process(mixer, mixed_output, MIXER_FRAME_SIZE);
 
if (samples_mixed > 0) {
    // Send mixed audio to all clients
    send_audio_to_all_clients(mixed_output, samples_mixed);
}

Mix Audio Excluding a Source (for echo cancellation):

// Mix all sources except the client we're sending to (prevents echo)
float output_for_client[MIXER_FRAME_SIZE];
int samples = mixer_process_excluding_source(mixer, output_for_client,
                                              MIXER_FRAME_SIZE, client_id);
if (samples > 0) {
    send_audio_to_client(client_id, output_for_client, samples);
}
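Exclusion can stay O(1) because each mixer slot maps to one bit. The sketch assumes at most 64 slots held in a single `uint64_t`; `source_mask_t`, `sources_to_mix`, and `count_sources` are hypothetical names, not mixer.h's API.

```c
#include <stdint.h>

/* Hypothetical active-source mask: bit i set means mixer slot i is in use. */
typedef struct {
    uint64_t active;
} source_mask_t;

/* Sources to mix for one recipient: clear the recipient's own bit.
 * This is one AND with a complemented bit, regardless of source count.
 * A negative or out-of-range slot means "exclude nobody". */
static uint64_t sources_to_mix(const source_mask_t *m, int exclude_slot) {
    if (exclude_slot < 0 || exclude_slot >= 64) {
        return m->active;
    }
    return m->active & ~(UINT64_C(1) << exclude_slot);
}

/* Walk the set bits (here just counting them); a mixer would add the
 * samples of the slot belonging to each bit. */
static int count_sources(uint64_t mask) {
    int n = 0;
    while (mask) {
        mask &= mask - 1;  // clear the lowest set bit
        n++;
    }
    return n;
}
```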

Cleanup

Destroy Mixer:

mixer_destroy(mixer); // Pass pointer, not address-of


Active Speaker Detection & Ducking

The ducking system automatically identifies who's speaking and attenuates background sources to improve clarity. This is more sophisticated than simple volume scaling.

How It Works:

Leader Detection: The loudest source(s) above threshold_dB are identified
Margin Tracking: Sources within leader_margin_dB of the loudest are also "leaders"
Attenuation: Non-leader sources are attenuated by atten_dB
Smooth Transitions: Attack/release curves prevent jarring volume changes

The ducking uses dB-based audio analysis:

// Ducking parameters (in dB and milliseconds)
typedef struct {
    float threshold_dB;      // Speaking threshold (-40dB typical)
    float leader_margin_dB;  // Margin to be considered a leader (6dB typical)
    float atten_dB;          // Attenuation for non-leaders (-12dB typical)
    float attack_ms;         // How fast ducking engages (10ms typical)
    float release_ms;        // How fast ducking releases (100ms typical)
} ducking_t;
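The attack/release behaviour can be pictured as a per-sample gain ramp. This is one simple approach, assuming a linear ramp in dB: a full swing of |atten_dB| takes attack_ms when ducking engages and release_ms when it lets go. The helper `duck_gain_step` is hypothetical, and lib/mixer.c may use a different curve (for example exponential one-pole smoothing).

```c
/* Hypothetical helper: move the applied ducking gain (dB) one sample
 * toward target_dB. Going down (ducking engages) uses attack_ms; going
 * up (ducking releases) uses release_ms. Never overshoots the target. */
static float duck_gain_step(float current_dB, float target_dB, float atten_dB,
                            float attack_ms, float release_ms, float sample_rate) {
    float swing = atten_dB < 0.0f ? -atten_dB : atten_dB;  // e.g. 12 dB
    if (target_dB < current_dB) {
        float step = swing / (attack_ms * 0.001f * sample_rate);
        current_dB -= step;
        if (current_dB < target_dB) current_dB = target_dB;
    } else if (target_dB > current_dB) {
        float step = swing / (release_ms * 0.001f * sample_rate);
        current_dB += step;
        if (current_dB > target_dB) current_dB = target_dB;
    }
    return current_dB;
}
```

With the typical values at 48 kHz, ducking reaches -12 dB after 480 samples (10 ms) and returns to 0 dB after 4800 samples (100 ms), so the fast attack and slow release avoid audible pumping.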

Practical Example:

Person A speaks at -20dB (loud)
Person B speaks at -25dB (within 6dB margin of A)
Person C has background noise at -50dB (below threshold)
Result: A and B are heard at full volume, C is attenuated by 12dB

This allows multiple people to have a natural conversation while suppressing background noise from inactive participants.
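The leader rule can be written as a short function over per-source levels. This is a hypothetical sketch (`compute_duck_targets` is not from mixer.h), assuming levels are already measured in dB and that nothing is ducked when no source is above `threshold_dB`; with the practical example's numbers it yields 0, 0, and -12 dB.

```c
#include <stddef.h>

/* levels_dB[i]: current level of source i. gains_dB[i] receives the target
 * gain: 0 dB for leaders (or when nobody is speaking), atten_dB otherwise. */
static void compute_duck_targets(const float *levels_dB, float *gains_dB, size_t n,
                                 float threshold_dB, float leader_margin_dB,
                                 float atten_dB) {
    // 1. Find the loudest source above the speaking threshold.
    float loudest = 0.0f;
    int have_leader = 0;
    for (size_t i = 0; i < n; i++) {
        if (levels_dB[i] > threshold_dB && (!have_leader || levels_dB[i] > loudest)) {
            loudest = levels_dB[i];
            have_leader = 1;
        }
    }
    // 2. Leaders: above threshold and within leader_margin_dB of the loudest.
    for (size_t i = 0; i < n; i++) {
        int is_leader = have_leader && levels_dB[i] > threshold_dB &&
                        levels_dB[i] >= loudest - leader_margin_dB;
        // 3. Everyone else is attenuated, unless nobody is speaking at all.
        gains_dB[i] = (!have_leader || is_leader) ? 0.0f : atten_dB;
    }
}
```

A fourth source at -30 dB would also be ducked: it clears the -40 dB threshold but sits 10 dB below the leader, outside the 6 dB margin.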

Thread Safety

Reader-Writer Lock Protection:

Source array protected by reader-writer locks (rwlock)
Multiple readers can process audio concurrently
Writers (add/remove source) get exclusive access
Bitset operations are atomic for source exclusion

Thread Model:

// Network receive thread - writes to client's ring buffer
void* client_receive_thread(void *arg) {
    client_t *client = (client_t *)arg;
    float samples[256];
    while (running) {  // running: shared shutdown flag
        receive_audio_packet(client->id, samples, 256);
        // Write directly to client's audio ring buffer
        audio_ring_buffer_write(client->audio_buffer, samples, 256);
    }
    return NULL;
}
 
// Audio processing thread - reads from all ring buffers via mixer
void* audio_mix_thread(void *arg) {
    mixer_t *mixer = (mixer_t *)arg;
    float mixed[MIXER_FRAME_SIZE];
    while (running) {
        int samples = mixer_process(mixer, mixed, MIXER_FRAME_SIZE);
        if (samples > 0) {
            send_to_all_clients(mixed, samples);
        }
        // Wait roughly one frame period (256 samples at 48 kHz ≈ 5.3 ms);
        // without a wait this loop busy-spins
        usleep(5000);
    }
    return NULL;
}
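A fixed `usleep()` after each frame lets processing time and sleep overshoot accumulate, so the mix thread slowly drifts behind real time. One common POSIX approach, sketched here assuming a Linux-like system with `clock_nanosleep`, is to sleep until absolute deadlines spaced exactly one frame period apart; `run_paced`, `frame_period_ns`, and `deadline_add` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Frame period in nanoseconds: frame_size / sample_rate seconds. */
static int64_t frame_period_ns(int frame_size, int sample_rate) {
    return (int64_t)frame_size * 1000000000LL / sample_rate;
}

/* Advance an absolute deadline by period_ns, keeping tv_nsec normalised. */
static void deadline_add(struct timespec *t, int64_t period_ns) {
    t->tv_sec  += (time_t)(period_ns / 1000000000LL);
    t->tv_nsec += (long)(period_ns % 1000000000LL);
    if (t->tv_nsec >= 1000000000L) {
        t->tv_sec  += 1;
        t->tv_nsec -= 1000000000L;
    }
}

/* Hypothetical pacing loop: each deadline is computed from the previous
 * deadline, not from "now", so timing errors do not accumulate. */
static void run_paced(int frames, int frame_size, int sample_rate,
                      void (*process_frame)(void)) {
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    const int64_t period = frame_period_ns(frame_size, sample_rate);
    for (int i = 0; i < frames; i++) {
        process_frame();  // e.g. mixer_process() followed by the send
        deadline_add(&next, period);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```

For 256-sample frames at 48 kHz, consecutive deadlines are 5,333,333 ns apart.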

Performance

Mixing Algorithm:

Simple additive mixing with ducking
SIMD optimization where available
Minimal memory allocations
Cache-friendly data layout

CPU Usage:

2 clients: ~1% CPU
4 clients: ~2% CPU
8 clients: ~3% CPU
16 clients: ~5% CPU

Latency:

Mixing latency: <1ms
Total audio latency: ~50-60ms (includes network)
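For reference, the per-frame time budget follows directly from the fixed frame size and the 48 kHz rate passed to `mixer_create()` above:

```latex
t_{\text{frame}} = \frac{N_{\text{frame}}}{f_s} = \frac{256}{48\,000\ \text{Hz}} \approx 5.33\ \text{ms}
```

So the <1 ms mixing figure uses well under a fifth of each frame's budget, and every 256-sample frame held in a buffer adds about 5.3 ms of delay on its own.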

Buffer Management

Per-Client Buffers:

Fixed-size circular buffers
Automatic overflow handling
Underrun detection and handling

Buffer Configuration:

mixer.buffer_size = 8192; // Samples per client buffer (~171 ms at 48 kHz)
mixer.min_frames = 256;   // Minimum frames for mixing (one 256-sample frame)

Overflow Handling:

Drop oldest frames when buffer full
Log warning message
Continue operation without crash

Underrun Handling:

Output silence when insufficient data
Log debug message
Wait for more data
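The overflow and underrun policies above can be sketched with a small single-threaded ring buffer. This is an illustration only: `sketch_ring_t`, `rb_write`, and `rb_read_or_silence` are hypothetical, the capacity is shrunk to 8 samples for readability, and the real `audio_ring_buffer_t` is shared between the receive thread and the mixer, so it needs synchronization this sketch omits.

```c
#include <stddef.h>
#include <string.h>

#define RB_CAPACITY 8  // tiny for illustration; the configuration above uses 8192

typedef struct {
    float  data[RB_CAPACITY];
    size_t head;   // next write position
    size_t count;  // samples currently stored
} sketch_ring_t;

// Overflow policy: when full, overwrite (drop) the oldest sample.
// Returns how many samples were dropped so the caller can log a warning.
static size_t rb_write(sketch_ring_t *rb, const float *src, size_t n) {
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        rb->data[rb->head] = src[i];
        rb->head = (rb->head + 1) % RB_CAPACITY;
        if (rb->count == RB_CAPACITY) {
            dropped++;  // the oldest sample was just overwritten
        } else {
            rb->count++;
        }
    }
    return dropped;
}

// Underrun policy: if fewer than n samples are buffered, output silence
// and keep the buffered data until enough has arrived. Returns samples read.
static size_t rb_read_or_silence(sketch_ring_t *rb, float *dst, size_t n) {
    if (rb->count < n) {
        memset(dst, 0, n * sizeof *dst);
        return 0;
    }
    size_t tail = (rb->head + RB_CAPACITY - rb->count) % RB_CAPACITY;
    for (size_t i = 0; i < n; i++) {
        dst[i] = rb->data[(tail + i) % RB_CAPACITY];
    }
    rb->count -= n;
    return n;
}
```

Writing ten samples into the 8-slot buffer drops the two oldest; a read that asks for more than is buffered returns silence and leaves the data in place for the next attempt.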

Integration Example

Complete Server Integration:

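// Initialize mixer (one instance per server)
static mixer_t *g_mixer = NULL;

int audio_mixer_init(void) {
    g_mixer = mixer_create(MIXER_MAX_SOURCES, 48000);
    if (!g_mixer) {
        log_error("Failed to create mixer");
        return ASCIICHAT_ERROR_MEMORY;
    }
    return 0;
}
 
// When client connects: register its audio ring buffer as a mixer source
void on_client_connect(client_t *client) {
    if (mixer_add_source(g_mixer, client->id, client->audio_buffer) < 0) {
        log_error("Failed to add client %u to mixer", client->id);
        return;
    }
    log_info("Client %u added to audio mixer", client->id);
}
 
// When client disconnects: remove the source before its ring buffer is freed
void on_client_disconnect(client_t *client) {
    mixer_remove_source(g_mixer, client->id);
    log_info("Client %u removed from audio mixer", client->id);
}
 
// When audio packet arrives: write into the client's ring buffer;
// mixer_process() reads it from there
void on_audio_packet(client_t *client, float *samples, size_t num_samples) {
    audio_ring_buffer_write(client->audio_buffer, samples, num_samples);
}
 
// Audio mixing thread
void* audio_thread(void *arg) {
    (void)arg;
    float mixed[MIXER_FRAME_SIZE];
 
    while (running) {  // running: shared shutdown flag
        int samples = mixer_process(g_mixer, mixed, MIXER_FRAME_SIZE);
        if (samples > 0) {
            broadcast_audio_to_all_clients(mixed, samples);
        }
        usleep(5000);  // roughly one frame period: 256 samples at 48 kHz ≈ 5.3 ms
    }
 
    return NULL;
}
 
// Cleanup
mixer_destroy(g_mixer);  // pass the pointer, not its address
g_mixer = NULL;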

Best Practices

DO:

Enable ducking for 4+ clients
Monitor buffer overflows/underruns
Use consistent sample rates across clients
Remove clients from mixer on disconnect
Use dedicated audio mixing thread

DON'T:

Don't mix audio on network thread
Don't forget to remove disconnected clients
Don't use different sample rates per client
Don't disable ducking with many clients
Don't access the mixer's sources from multiple threads without the reader-writer lock

See also: mixer.h; audio.h; ringbuffer.h
