Audio system with PortAudio integration and multi-client audio mixing with ducking.
Overview
Welcome to the audio mixer—where all the magic happens when multiple people are talking at once!
Picture yourself in a group video call. Person A is talking, Person B laughs, Person C asks a question—all happening simultaneously. Your speakers don't have three separate outputs (well, most don't). So how does your computer play all three audio streams at once? That's where the mixer comes in!
The mixer takes multiple audio streams (one from each client) and combines them into a single output stream that gets sent to everyone. It's like a real mixing board at a concert—each microphone is a separate input, and the mixer blends them into one cohesive sound that goes to the speakers.
But here's the cool part: when lots of people are talking at once, the mixer automatically applies "ducking": it turns down background sources so the active speakers stay clear, while compression and soft clipping keep the combined audio from clipping or distorting. It's like how a good sound engineer knows to turn down the microphones nobody is singing into, so the mix stays clear and balanced.
Implementation: lib/mixer.h
What makes the mixer special?
- Real-time mixing: Combines multiple audio streams on the fly
- Dynamic source management: Sources can join or leave without disrupting the mix
- Active speaker detection: Automatically identifies who's talking loudest
- Automatic ducking: Attenuates background sources when someone is speaking
- Dynamic range compression: Prevents clipping with a compressor
- Noise gate: Suppresses background noise below a threshold, with hysteresis to avoid chattering
- High-pass filtering: Removes low-frequency rumble and noise
- Soft clipping: Prevents harsh digital clipping artifacts
- Crowd scaling: Automatically adjusts volume based on participant count
- Thread-safe: Reader-writer locks for concurrent access
- Low latency: Fixed 256-sample frame processing
- O(1) source exclusion: Bitset-based tracking for echo cancellation
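The noise gate item above mentions hysteresis. A minimal sketch of that idea follows, assuming a per-frame peak measurement and two thresholds. The type `gate_sketch_t`, its field names, and the threshold values are hypothetical and are not taken from lib/mixer.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate state; not the structure used in lib/mixer.h. */
typedef struct {
  float open_threshold;  /* linear peak level that opens the gate */
  float close_threshold; /* lower level that closes it again */
  bool is_open;
} gate_sketch_t;

/* Peak absolute sample value in one frame. */
float frame_peak(const float *samples, size_t count) {
  float peak = 0.0f;
  for (size_t i = 0; i < count; i++) {
    float v = samples[i] < 0.0f ? -samples[i] : samples[i];
    if (v > peak) peak = v;
  }
  return peak;
}

/* Update the gate for one frame and mute the frame while the gate is closed.
 * Returns true if the frame was passed through unchanged. */
bool gate_process(gate_sketch_t *gate, float *samples, size_t count) {
  assert(gate != NULL && samples != NULL);
  float peak = frame_peak(samples, count);
  if (gate->is_open) {
    /* Once open, only a drop below the lower threshold closes the gate. */
    if (peak < gate->close_threshold) gate->is_open = false;
  } else {
    /* Once closed, only a rise above the upper threshold opens it. */
    if (peak > gate->open_threshold) gate->is_open = true;
  }
  if (!gate->is_open) {
    for (size_t i = 0; i < count; i++) samples[i] = 0.0f;
  }
  return gate->is_open;
}
```

Because the two thresholds differ, a signal hovering between them keeps whatever state the gate was already in, which is what stops the gate from flickering open and shut.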
Architecture
Mixer Design:
- Single mixer instance per server
- Per-client audio input buffers
- Shared output buffer for mixed audio
- Thread-safe operation with reader-writer lock protection
Audio Flow:
Client 1 Audio → Input Buffer 1 ──┐
Client 2 Audio → Input Buffer 2 ──┤
Client 3 Audio → Input Buffer 3 ──┼→ Mixer → Mixed Output → All Clients
...                             ──┘
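The flow above boils down to summing one frame from every input and keeping the result in range. A minimal sketch, assuming all inputs hold the same number of samples; `mix_frames` and `soft_clip` are illustrative names, and the cubic curve is one common soft clipper rather than the curve the mixer necessarily uses.

```c
#include <assert.h>
#include <stddef.h>

/* Cubic soft clipper: close to linear near zero and flattening smoothly
 * to +/-1.0 at |x| = 1.5. Illustrative only; the real clipper may differ. */
float soft_clip(float x) {
  if (x >= 1.5f) return 1.0f;
  if (x <= -1.5f) return -1.0f;
  return x - (4.0f / 27.0f) * x * x * x;
}

/* Sum `num_inputs` frames of `count` samples each into `output`. */
void mix_frames(const float *const *inputs, size_t num_inputs,
                float *output, size_t count) {
  assert(inputs != NULL && output != NULL);
  for (size_t i = 0; i < count; i++) {
    float sum = 0.0f;
    for (size_t s = 0; s < num_inputs; s++) {
      sum += inputs[s][i];
    }
    output[i] = soft_clip(sum); /* keep the sum inside [-1, 1] */
  }
}
```

With three inputs that each peak at 0.9, the raw sum of 2.7 would clip hard; here it saturates at 1.0 instead.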
Operations
Initialization
Create Mixer:
mixer_t *mixer = mixer_create(max_sources, sample_rate);
if (!mixer) {
  log_error("Failed to create mixer");
  return ASCIICHAT_ERROR_MEMORY;
}
mixer_t *mixer_create(int max_sources, int sample_rate)
Source Management
Add Audio Source (client with audio ring buffer):
uint32_t client_id = 12345;
audio_ring_buffer_t *client_audio_buffer = ...;
/* Returns the source index on success, or a negative value on failure. */
int result = mixer_add_source(mixer, client_id, client_audio_buffer);
if (result < 0) {
  log_error("Failed to add client %u to mixer", client_id);
  return ASCIICHAT_ERROR_FULL;
}
log_info("Client %u added to mixer at index %d", client_id, result);
int mixer_add_source(mixer_t *mixer, uint32_t client_id, audio_ring_buffer_t *buffer)
Remove Audio Source:
mixer_remove_source(mixer, client_id);
log_info("Client %u removed from mixer", client_id);
void mixer_remove_source(mixer_t *mixer, uint32_t client_id)
Audio Processing
The mixer reads audio directly from each client's audio ring buffer. Clients write their audio samples to their ring buffer, and the mixer reads and mixes them during processing.
Mix Audio (reads from all source ring buffers):
float mixed_output[MIXER_FRAME_SIZE];
int samples_mixed = mixer_process(mixer, mixed_output, MIXER_FRAME_SIZE);
if (samples_mixed > 0) {
  send_audio_to_all_clients(mixed_output, samples_mixed);
}
int mixer_process(mixer_t *mixer, float *output, int num_samples)
Mix Audio Excluding a Source (for echo cancellation):
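float output_for_client[MIXER_FRAME_SIZE];
/* Mix every source except this client's own, so it does not hear itself. */
int samples = mixer_process_excluding_source(mixer, output_for_client,
                                             MIXER_FRAME_SIZE, client_id);
send_audio_to_client(client_id, output_for_client, samples);
int mixer_process_excluding_source(mixer_t *mixer, float *output, int num_samples, uint32_t exclude_client_id)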
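The feature list describes source exclusion as O(1) thanks to bitset tracking. One way that could look is sketched below, assuming at most 64 sources and some existing mapping from client ID to slot index. `source_set_t` and the helper names are hypothetical and not part of the mixer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SOURCES 64 /* assumption: one bit per source slot */

/* Bit i is set while source slot i is active. */
typedef struct {
  uint64_t active_mask;
} source_set_t;

void source_set_add(source_set_t *set, int slot) {
  assert(slot >= 0 && slot < SKETCH_MAX_SOURCES);
  set->active_mask |= (UINT64_C(1) << slot);
}

void source_set_remove(source_set_t *set, int slot) {
  assert(slot >= 0 && slot < SKETCH_MAX_SOURCES);
  set->active_mask &= ~(UINT64_C(1) << slot);
}

/* Mask of sources to mix for the listener in `exclude_slot`.
 * Clearing one bit is constant time regardless of how many sources exist. */
uint64_t source_set_excluding(const source_set_t *set, int exclude_slot) {
  assert(exclude_slot >= 0 && exclude_slot < SKETCH_MAX_SOURCES);
  return set->active_mask & ~(UINT64_C(1) << exclude_slot);
}

/* True if `slot` is present in `mask`. */
bool mask_contains(uint64_t mask, int slot) {
  return ((mask >> slot) & 1u) != 0;
}
```

The mixing loop would then skip any slot whose bit is clear in the returned mask, so no per-source ID comparison is needed while mixing.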
Cleanup
Destroy Mixer:
mixer_destroy(mixer);
void mixer_destroy(mixer_t *mixer)
Active Speaker Detection & Ducking
The ducking system automatically identifies who's speaking and attenuates background sources to improve clarity. This is more sophisticated than simple volume scaling.
How It Works:
- Leader Detection: The loudest source(s) above threshold_dB are identified
- Margin Tracking: Sources within leader_margin_dB of the loudest are also "leaders"
- Attenuation: Non-leader sources are attenuated by atten_dB
- Smooth Transitions: Attack/release curves prevent jarring volume changes
The ducking uses dB-based audio analysis:
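typedef struct {
  float threshold_dB;     /* level a source must exceed to count as speaking */
  float leader_margin_dB; /* sources within this margin of the loudest are also leaders */
  float atten_dB;         /* attenuation applied to non-leader sources */
  float attack_ms;        /* attack time of the gain transition, in milliseconds */
  float release_ms;       /* release time of the gain transition, in milliseconds */
} ducking_t;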
Practical Example:
- Person A speaks at -20 dB (loud)
- Person B speaks at -25 dB (within the 6 dB leader margin of A)
- Person C has background noise at -50 dB (below the threshold)
- Result: with leader_margin_dB = 6 and atten_dB = 12, A and B are heard at full volume and C is attenuated by 12 dB
This allows multiple people to have a natural conversation while suppressing background noise from inactive participants.
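The example can be traced with a small sketch of the leader rule. It assumes a threshold of -40 dB (the text only says C is below it), treats the attenuation as a positive magnitude, and ignores attack/release smoothing. `duck_gains_db` is a hypothetical helper, not a mixer function.

```c
#include <assert.h>
#include <stddef.h>

/* Fill `gain_db` with 0 dB for leaders and -atten_db for every other source.
 * A leader is above `threshold_db` and within `leader_margin_db` of the
 * loudest above-threshold source. If nobody is above the threshold, this
 * sketch attenuates everyone; the real mixer may behave differently. */
void duck_gains_db(const float *level_db, size_t count, float threshold_db,
                   float leader_margin_db, float atten_db, float *gain_db) {
  assert(level_db != NULL && gain_db != NULL);

  /* Pass 1: find the loudest source above the threshold. */
  int have_leader = 0;
  float loudest = 0.0f;
  for (size_t i = 0; i < count; i++) {
    if (level_db[i] > threshold_db &&
        (!have_leader || level_db[i] > loudest)) {
      loudest = level_db[i];
      have_leader = 1;
    }
  }

  /* Pass 2: leaders keep full volume, everyone else is ducked. */
  for (size_t i = 0; i < count; i++) {
    int is_leader = have_leader && level_db[i] > threshold_db &&
                    level_db[i] >= loudest - leader_margin_db;
    gain_db[i] = is_leader ? 0.0f : -atten_db;
  }
}
```

Feeding in levels of -20, -25 and -50 dB with a 6 dB margin and 12 dB attenuation yields gains of 0, 0 and -12 dB, matching the A/B/C result above.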
Thread Safety
Reader-Writer Lock Protection:
- Source array protected by reader-writer locks (rwlock)
- Multiple readers can process audio concurrently
- Writers (add/remove source) get exclusive access
- Bitset operations are atomic for source exclusion
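The reader/writer split can be sketched with POSIX rwlocks. This is only an illustration of the locking pattern: `source_table_t` and its functions are hypothetical, and the mixer's real source array and lock wrapper may look different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_MAX_SOURCES 8

/* Hypothetical source table guarded by a reader-writer lock. */
typedef struct {
  pthread_rwlock_t lock;
  uint32_t client_ids[SKETCH_MAX_SOURCES]; /* 0 marks a free slot */
} source_table_t;

int source_table_init(source_table_t *t) {
  for (int i = 0; i < SKETCH_MAX_SOURCES; i++) t->client_ids[i] = 0;
  return pthread_rwlock_init(&t->lock, NULL);
}

/* Writer path: exclusive access while the table changes.
 * Returns the slot index, or -1 if the table is full. */
int source_table_add(source_table_t *t, uint32_t client_id) {
  assert(client_id != 0);
  int slot = -1;
  pthread_rwlock_wrlock(&t->lock);
  for (int i = 0; i < SKETCH_MAX_SOURCES; i++) {
    if (t->client_ids[i] == 0) {
      t->client_ids[i] = client_id;
      slot = i;
      break;
    }
  }
  pthread_rwlock_unlock(&t->lock);
  return slot;
}

/* Reader path: shared access, so several threads can scan concurrently. */
int source_table_count(source_table_t *t) {
  int count = 0;
  pthread_rwlock_rdlock(&t->lock);
  for (int i = 0; i < SKETCH_MAX_SOURCES; i++) {
    if (t->client_ids[i] != 0) count++;
  }
  pthread_rwlock_unlock(&t->lock);
  return count;
}
```

Mixing would take the read lock in the same way as `source_table_count`, which is why adding or removing a client only briefly blocks audio processing.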
Thread Model:
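/* One receive thread per client: network packets go into that client's ring buffer. */
void *client_receive_thread(void *arg) {
  client_t *client = (client_t *)arg;
  while (running) {
    float samples[256];
    receive_audio_packet(client->id, samples, 256);
    /* audio_buffer is an assumed field name for the client's ring buffer. */
    audio_ring_buffer_write(client->audio_buffer, samples, 256);
  }
  return NULL;
}
/* One mixing thread: reads every ring buffer and sends out the mix. */
void *audio_mix_thread(void *arg) {
  mixer_t *mixer = (mixer_t *)arg;
  while (running) {
    float mixed[MIXER_FRAME_SIZE];
    int samples = mixer_process(mixer, mixed, MIXER_FRAME_SIZE);
    if (samples > 0) {
      send_to_all_clients(mixed, samples);
    }
  }
  return NULL;
}
asciichat_error_t audio_ring_buffer_write(audio_ring_buffer_t *rb, const float *data, int samples)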
Performance
Mixing Algorithm:
- Simple additive mixing with ducking
- SIMD optimization where available
- Minimal memory allocations
- Cache-friendly data layout
CPU Usage:
- 2 clients: ~1% CPU
- 4 clients: ~2% CPU
- 8 clients: ~3% CPU
- 16 clients: ~5% CPU
Latency:
- Mixing latency: <1ms
- Total audio latency: ~50-60ms (includes network)
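The sub-millisecond figure is processing time. The fixed 256-sample frame adds its own buffering delay, which depends on the sample rate. The text does not state the rate, so the arithmetic below assumes the common 48 kHz and 44.1 kHz values; `frame_duration_ms` is an illustrative helper.

```c
#include <assert.h>

/* Duration of one audio frame in milliseconds. */
double frame_duration_ms(int frame_size, int sample_rate) {
  assert(frame_size > 0 && sample_rate > 0);
  return 1000.0 * (double)frame_size / (double)sample_rate;
}
```

For a 256-sample frame this gives roughly 5.3 ms at 48 kHz and roughly 5.8 ms at 44.1 kHz, so a handful of buffered frames plus network transit is consistent with a total in the tens of milliseconds.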
Buffer Management
Per-Client Buffers:
- Fixed-size circular buffers
- Automatic overflow handling
- Underrun detection and handling
Buffer Configuration:
mixer.buffer_size = 8192;
mixer.min_frames = 256;
Overflow Handling:
- Drop oldest frames when buffer full
- Log warning message
- Continue operation without crash
Underrun Handling:
- Output silence when insufficient data
- Log debug message
- Wait for more data
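The overflow and underrun policies above can be sketched with a tiny single-threaded ring buffer. This is not the audio_ring_buffer_t implementation: the type, the function names, and the 8-sample capacity are chosen only to keep the example short, and logging is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CAPACITY 8 /* tiny on purpose; the configuration above uses 8192 */

typedef struct {
  float data[SKETCH_CAPACITY];
  size_t read_pos; /* index of the oldest stored sample */
  size_t count;    /* number of samples currently stored */
} ring_sketch_t;

/* Write `n` samples. When the buffer is full the oldest sample is
 * overwritten. Returns how many old samples were dropped so the caller
 * can log a warning. */
size_t ring_write(ring_sketch_t *rb, const float *src, size_t n) {
  assert(rb != NULL && src != NULL);
  size_t dropped = 0;
  for (size_t i = 0; i < n; i++) {
    size_t write_pos = (rb->read_pos + rb->count) % SKETCH_CAPACITY;
    if (rb->count == SKETCH_CAPACITY) {
      rb->read_pos = (rb->read_pos + 1) % SKETCH_CAPACITY; /* drop oldest */
      dropped++;
    } else {
      rb->count++;
    }
    rb->data[write_pos] = src[i];
  }
  return dropped;
}

/* Read `n` samples. On underrun the remainder of `dst` is filled with
 * silence. Returns the number of real samples delivered. */
size_t ring_read(ring_sketch_t *rb, float *dst, size_t n) {
  assert(rb != NULL && dst != NULL);
  size_t available = rb->count < n ? rb->count : n;
  for (size_t i = 0; i < available; i++) {
    dst[i] = rb->data[rb->read_pos];
    rb->read_pos = (rb->read_pos + 1) % SKETCH_CAPACITY;
    rb->count--;
  }
  for (size_t i = available; i < n; i++) {
    dst[i] = 0.0f; /* silence */
  }
  return available;
}
```

Writing ten samples into the eight-slot buffer drops the two oldest, and asking for ten back returns the eight newest followed by two samples of silence.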
Integration Example
Complete Server Integration:
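/* One mixer per server, created at startup with mixer_create(). */
static mixer_t *mixer;
/* buffer is the client's audio ring buffer; passing it to these callbacks
   is an assumption of this example. */
void on_client_connect(uint32_t client_id, audio_ring_buffer_t *buffer) {
  if (mixer_add_source(mixer, client_id, buffer) < 0) {
    log_error("Failed to add client %u to audio mixer", client_id);
    return;
  }
  log_info("Client %u added to audio mixer", client_id);
}
void on_client_disconnect(uint32_t client_id) {
  mixer_remove_source(mixer, client_id);
  log_info("Client %u removed from audio mixer", client_id);
}
void on_audio_packet(audio_ring_buffer_t *buffer, const float *samples, int num_samples) {
  /* The mixer reads from the ring buffer, so incoming audio is written there. */
  audio_ring_buffer_write(buffer, samples, num_samples);
}
void *audio_thread(void *arg) {
  (void)arg;
  float mixed[MIXER_FRAME_SIZE];
  while (running) {
    int frames = mixer_process(mixer, mixed, MIXER_FRAME_SIZE);
    if (frames > 0) {
      broadcast_audio_to_all_clients(mixed, frames);
    }
    usleep(5000);
  }
  return NULL;
}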
Best Practices
DO:
- Enable ducking for 4+ clients
- Monitor buffer overflows/underruns
- Use consistent sample rates across clients
- Remove clients from mixer on disconnect
- Use dedicated audio mixing thread
DON'T:
- Don't mix audio on network thread
- Don't forget to remove disconnected clients
- Don't use different sample rates per client
- Don't disable ducking with many clients
- Don't access mixer sources without the reader-writer lock protection
See also
- mixer.h
- audio.h
- ringbuffer.h