ascii-chat 0.8.38
Real-time terminal-based video chat with ASCII art conversion
Audio Processing

🔊 Audio capture, playback, and multi-client mixing

Modules

 Audio
 🔊 Audio system for real-time audio capture and playback
 

Detailed Description


Audio system with PortAudio integration and multi-client audio mixing with ducking.

Mixer README

Overview

Welcome to the audio mixer—where all the magic happens when multiple people are talking at once!

Picture yourself in a group video call. Person A is talking, Person B laughs, Person C asks a question—all happening simultaneously. Your speakers don't have three separate outputs (well, most don't). So how does your computer play all three audio streams at once? That's where the mixer comes in!

The mixer takes multiple audio streams (one from each client) and combines them into a single output stream that gets sent to everyone. It's like a real mixing board at a concert—each microphone is a separate input, and the mixer blends them into one cohesive sound that goes to the speakers.

But here's the cool part: when lots of people are talking at once, the mixer automatically applies "ducking" (volume reduction) so the combined audio doesn't clip or distort. It's like how a good sound engineer knows to turn down each microphone a bit when everyone's singing together—the mix stays clear and balanced.

Implementation: lib/mixer.h

What makes the mixer special?

  • Real-time mixing: Combines multiple audio streams on the fly
  • Dynamic source management: Sources can join or leave without disrupting the mix
  • Active speaker detection: Automatically identifies who's talking loudest
  • Automatic ducking: Attenuates background sources when someone is speaking
  • Dynamic range compression: Prevents clipping with a dynamic range compressor
  • Noise gate: Suppresses background noise below threshold with hysteresis
  • High-pass filtering: Removes low-frequency rumble and noise
  • Soft clipping: Prevents harsh digital clipping artifacts
  • Crowd scaling: Automatically adjusts volume based on participant count
  • Thread-safe: Reader-writer locks for concurrent access
  • Low latency: Fixed 256-sample frame processing
  • O(1) source exclusion: Bitset-based tracking for echo cancellation
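Two of the features above, soft clipping and a noise gate with hysteresis, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the code in mixer.c: the names `sketch_soft_clip`, `sketch_gate_t`, and `sketch_gate_update` are hypothetical, the clipper uses one common cubic curve, and the gate works on a per-frame peak level with linear (not dB) thresholds.

```c
/* Hypothetical soft clipper: a cubic curve that flattens smoothly
 * toward +/-1.0 instead of cutting off abruptly. */
static float sketch_soft_clip(float x) {
    if (x >= 1.0f) return 1.0f;
    if (x <= -1.0f) return -1.0f;
    return 1.5f * x - 0.5f * x * x * x;
}

/* Hypothetical noise gate state. The gate opens above open_threshold
 * and only closes again below the lower close_threshold. The gap
 * between the two thresholds is the hysteresis. */
typedef struct {
    float open_threshold;  /* linear peak level that opens the gate */
    float close_threshold; /* lower linear peak level that closes it */
    int is_open;           /* current gate state (0 = closed) */
} sketch_gate_t;

/* Update the gate from the peak level of one frame.
 * Returns 1 if the frame should pass, 0 if it should be muted. */
static int sketch_gate_update(sketch_gate_t *g, float frame_peak) {
    if (g->is_open) {
        if (frame_peak < g->close_threshold) g->is_open = 0;
    } else {
        if (frame_peak > g->open_threshold) g->is_open = 1;
    }
    return g->is_open;
}
```

Because the close threshold is lower than the open threshold, a signal hovering near a single threshold does not make the gate chatter open and closed. Note that this cubic curve boosts quiet samples by up to 1.5x, so a real mixer would typically compensate with gain staging.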

Architecture

Mixer Design:

  • Single mixer instance per server
  • Per-client audio input buffers
  • Shared output buffer for mixed audio
  • Thread-safe operation with reader-writer lock protection

Audio Flow:

Client 1 Audio → Input Buffer 1 ──┐
Client 2 Audio → Input Buffer 2 ──┤
Client 3 Audio → Input Buffer 3 ──┼→ Mixer → Mixed Output → All Clients
...                             ──┘
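
At its core, the flow above is a sample-by-sample sum of the per-client frames. The following is a minimal sketch of that step only, assuming a hypothetical helper `mix_frames` and constant `FRAME_SIZE` that are not part of the mixer API; the real mixer also applies ducking, compression, and clipping protection.

```c
#include <stddef.h>

#define FRAME_SIZE 256 /* assumed to match the mixer's 256-sample frames */

/* Hypothetical helper: sum num_sources input frames into out.
 * Each inputs[i] must point to FRAME_SIZE samples. */
static void mix_frames(const float *const *inputs, size_t num_sources,
                       float *out) {
    for (size_t s = 0; s < FRAME_SIZE; s++) {
        float sum = 0.0f;
        for (size_t i = 0; i < num_sources; i++) {
            sum += inputs[i][s];
        }
        out[s] = sum; /* may exceed [-1, 1]; later stages must handle that */
    }
}
```

The unbounded sum is exactly why the mixer needs the ducking, compression, and soft clipping stages described elsewhere in this document.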

Operations

Initialization

Create Mixer:

// mixer_create returns a pointer to a new mixer (NULL on failure)
mixer_t *mixer = mixer_create(MIXER_MAX_SOURCES, 48000);
if (!mixer) {
    log_error("Failed to create mixer");
    return ASCIICHAT_ERROR_MEMORY;
}
mixer_t * mixer_create(int max_sources, int sample_rate)
Definition mixer.c:218

Source Management

Add Audio Source (client with audio ring buffer):

uint32_t client_id = 12345;
audio_ring_buffer_t *client_audio_buffer = ...; // Client's audio ring buffer
int result = mixer_add_source(mixer, client_id, client_audio_buffer);
if (result < 0) {
    log_error("Failed to add client %u to mixer", client_id);
    return ASCIICHAT_ERROR_FULL;
}
log_info("Client %u added to mixer at index %d", client_id, result);
int mixer_add_source(mixer_t *mixer, uint32_t client_id, audio_ring_buffer_t *buffer)
Definition mixer.c:364

Remove Audio Source:

mixer_remove_source(mixer, client_id);
log_info("Client %u removed from mixer", client_id);
void mixer_remove_source(mixer_t *mixer, uint32_t client_id)
Definition mixer.c:401

Audio Processing

The mixer reads audio directly from each client's audio ring buffer. Clients write their audio samples to their ring buffer, and the mixer reads and mixes them during processing.

Mix Audio (reads from all source ring buffers):

float mixed_output[MIXER_FRAME_SIZE];
int samples_mixed = mixer_process(mixer, mixed_output, MIXER_FRAME_SIZE);
if (samples_mixed > 0) {
    // Send mixed audio to all clients
    send_audio_to_all_clients(mixed_output, samples_mixed);
}
int mixer_process(mixer_t *mixer, float *output, int num_samples)
Definition mixer.c:460

Mix Audio Excluding a Source (for echo cancellation):

// Mix all sources except the client we're sending to (prevents echo)
float output_for_client[MIXER_FRAME_SIZE];
int samples = mixer_process_excluding_source(mixer, output_for_client,
                                             MIXER_FRAME_SIZE, client_id);
send_audio_to_client(client_id, output_for_client, samples);
int mixer_process_excluding_source(mixer_t *mixer, float *output, int num_samples, uint32_t exclude_client_id)
Definition mixer.c:605
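
The idea behind bitset-based exclusion can be sketched as follows. This is an illustration only, assuming at most 64 source slots and hypothetical names (`mask_excluding`, `mix_sample_masked`); how mixer.c maps a client ID to a slot index and stores its bitset is not shown here.

```c
#include <stdint.h>

/* Hypothetical: one bit per source slot. Clearing a single bit removes
 * that source from the mix in constant time.
 * exclude_index must be less than 64. */
static uint64_t mask_excluding(uint64_t active_mask, unsigned exclude_index) {
    return active_mask & ~(UINT64_C(1) << exclude_index);
}

/* Sum one sample position across all sources whose bit is set.
 * samples[i] holds the current sample of source slot i; the array must
 * cover every slot whose bit is set in mask. */
static float mix_sample_masked(const float *samples, uint64_t mask) {
    float sum = 0.0f;
    for (unsigned i = 0; i < 64 && mask != 0; i++, mask >>= 1) {
        if (mask & 1u) {
            sum += samples[i];
        }
    }
    return sum;
}
```

With three active slots (mask 0b111), excluding slot 1 yields mask 0b101, so only slots 0 and 2 contribute to that client's output.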

Cleanup

Destroy Mixer:

mixer_destroy(mixer); // Pass pointer, not address-of
void mixer_destroy(mixer_t *mixer)
Definition mixer.c:347

Active Speaker Detection & Ducking

The ducking system automatically identifies who's speaking and attenuates background sources to improve clarity. This is more sophisticated than simple volume scaling.

How It Works:

  • Leader Detection: The loudest source(s) above threshold_dB are identified
  • Margin Tracking: Sources within leader_margin_dB of the loudest are also "leaders"
  • Attenuation: Non-leader sources are attenuated by atten_dB
  • Smooth Transitions: Attack/release curves prevent jarring volume changes

The ducking uses dB-based audio analysis:

// Ducking parameters (in dB and milliseconds)
typedef struct {
    float threshold_dB;     // Speaking threshold (-40dB typical)
    float leader_margin_dB; // Margin to be considered a leader (6dB typical)
    float atten_dB;         // Attenuation for non-leaders (-12dB typical)
    float attack_ms;        // How fast ducking engages (10ms typical)
    float release_ms;       // How fast ducking releases (100ms typical)
} ducking_t;

Practical Example:

  • Person A speaks at -20dB (loud)
  • Person B speaks at -25dB (within 6dB margin of A)
  • Person C has background noise at -50dB (below threshold)
  • Result: A and B are heard at full volume, C is attenuated by 12dB

This allows multiple people to have a natural conversation while suppressing background noise from inactive participants.
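
The practical example can be worked through in code. The sketch below is one plausible reading of the leader rule (a source is a leader if it is above the threshold and within the margin of the loudest source), not the logic in mixer.c. The struct `sketch_duck_params_t` and the function `sketch_duck_gains_dB` are hypothetical, and the attack/release smoothing is left out.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical subset of the ducking parameters (all values in dB). */
typedef struct {
    float threshold_dB;     /* speaking threshold */
    float leader_margin_dB; /* distance from the loudest to count as leader */
    float atten_dB;         /* gain applied to non-leaders (negative) */
} sketch_duck_params_t;

/* Compute a target gain in dB for each source from its level in dB.
 * Leaders get 0 dB (unchanged); all other sources get atten_dB. */
static void sketch_duck_gains_dB(const sketch_duck_params_t *p,
                                 const float *level_dB, size_t n,
                                 float *gain_dB_out) {
    float loudest = -FLT_MAX;
    for (size_t i = 0; i < n; i++) {
        if (level_dB[i] > loudest) loudest = level_dB[i];
    }
    for (size_t i = 0; i < n; i++) {
        int above_threshold = level_dB[i] >= p->threshold_dB;
        int near_loudest = level_dB[i] >= loudest - p->leader_margin_dB;
        gain_dB_out[i] = (above_threshold && near_loudest) ? 0.0f
                                                           : p->atten_dB;
    }
}
```

With levels of -20, -25, and -50 dB and the typical parameters (-40, 6, -12), the target gains are 0, 0, and -12 dB. A gain in dB converts to a linear multiplier as 10^(dB/20), so -12 dB is roughly 0.25.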

Thread Safety

Reader-Writer Lock Protection:

  • Source array protected by reader-writer locks (rwlock)
  • Multiple readers can process audio concurrently
  • Writers (add/remove source) get exclusive access
  • Bitset operations are atomic for source exclusion

Thread Model:

// Network receive thread - writes to client's ring buffer
void* client_receive_thread(void *arg) {
    client_t *client = (client_t *)arg;
    while (running) {
        float samples[256];
        receive_audio_packet(client->id, samples, 256);
        // Write directly to client's audio ring buffer
        audio_ring_buffer_write(client->audio_buffer, samples, 256);
    }
    return NULL;
}
// Audio processing thread - reads from all ring buffers via mixer
void* audio_mix_thread(void *arg) {
    mixer_t *mixer = (mixer_t *)arg;
    while (running) {
        float mixed[MIXER_FRAME_SIZE];
        int samples = mixer_process(mixer, mixed, MIXER_FRAME_SIZE);
        if (samples > 0) {
            send_to_all_clients(mixed, samples);
        }
    }
    return NULL;
}
asciichat_error_t audio_ring_buffer_write(audio_ring_buffer_t *rb, const float *data, int samples)
void * client_receive_thread(void *arg)

Performance

Mixing Algorithm:

  • Simple additive mixing with ducking
  • SIMD optimization where available
  • Minimal memory allocations
  • Cache-friendly data layout

CPU Usage:

  • 2 clients: ~1% CPU
  • 4 clients: ~2% CPU
  • 8 clients: ~3% CPU
  • 16 clients: ~5% CPU

Latency:

  • Mixing latency: <1ms
  • Total audio latency: ~50-60ms (includes network)
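
For context on these numbers, the frame size alone fixes how much audio each processing step covers, independent of how fast the mixing code runs. A small sketch of that arithmetic, using a hypothetical helper `frame_duration_ms`:

```c
/* Hypothetical helper: duration of one audio frame in milliseconds. */
static double frame_duration_ms(int frame_size, int sample_rate) {
    return 1000.0 * (double)frame_size / (double)sample_rate;
}
```

A 256-sample frame lasts about 5.33 ms at 48000 Hz and about 5.80 ms at 44100 Hz, so the mixing thread has roughly that long to produce each frame.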

Buffer Management

Per-Client Buffers:

  • Fixed-size circular buffers
  • Automatic overflow handling
  • Underrun detection and handling

Buffer Configuration:

mixer.buffer_size = 8192; // Samples per client buffer
mixer.min_frames = 256; // Minimum frames for mixing

Overflow Handling:

  • Drop oldest frames when buffer full
  • Log warning message
  • Continue operation without crash

Underrun Handling:

  • Output silence when insufficient data
  • Log debug message
  • Wait for more data
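
The overflow and underrun policies above can be sketched with a tiny circular buffer. This is not the project's `audio_ring_buffer_t`: the type `sketch_rb_t`, the functions `sketch_rb_write` and `sketch_rb_read`, and the capacity of 8 are hypothetical and chosen for readability. The sketch is single-threaded and omits logging.

```c
#include <stddef.h>

#define RB_CAPACITY 8 /* tiny for illustration only */

/* Hypothetical fixed-size circular buffer of float samples. */
typedef struct {
    float data[RB_CAPACITY];
    size_t read_pos; /* index of the oldest sample */
    size_t count;    /* number of samples currently stored */
} sketch_rb_t;

/* Write n samples. When the buffer is full, the oldest sample is
 * dropped to make room. Returns how many samples were dropped. */
static size_t sketch_rb_write(sketch_rb_t *rb, const float *src, size_t n) {
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (rb->count == RB_CAPACITY) {
            rb->read_pos = (rb->read_pos + 1) % RB_CAPACITY;
            rb->count--;
            dropped++;
        }
        rb->data[(rb->read_pos + rb->count) % RB_CAPACITY] = src[i];
        rb->count++;
    }
    return dropped;
}

/* Read n samples. On underrun the remainder is filled with silence.
 * Returns how many real (non-silence) samples were delivered. */
static size_t sketch_rb_read(sketch_rb_t *rb, float *dst, size_t n) {
    size_t got = 0;
    for (size_t i = 0; i < n; i++) {
        if (rb->count > 0) {
            dst[i] = rb->data[rb->read_pos];
            rb->read_pos = (rb->read_pos + 1) % RB_CAPACITY;
            rb->count--;
            got++;
        } else {
            dst[i] = 0.0f; /* silence */
        }
    }
    return got;
}
```

Returning the dropped and delivered counts lets the caller decide when to log a warning (overflow) or a debug message (underrun), matching the policies listed above.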

Integration Example

Complete Server Integration:

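// Initialize mixer (mixer_create returns NULL on failure)
mixer_t *mixer = mixer_create(MIXER_MAX_SOURCES, 48000);
// When client connects: register the client's audio ring buffer as a source
void on_client_connect(uint32_t client_id, audio_ring_buffer_t *client_audio_buffer) {
    if (mixer_add_source(mixer, client_id, client_audio_buffer) < 0) {
        log_error("Failed to add client %u to mixer", client_id);
        return;
    }
    log_info("Client %u added to audio mixer", client_id);
}
// When client disconnects
void on_client_disconnect(uint32_t client_id) {
    mixer_remove_source(mixer, client_id);
    log_info("Client %u removed from audio mixer", client_id);
}
// When audio packet arrives: write into the client's ring buffer.
// The mixer reads from it during mixer_process. This assumes the caller
// can look up the ring buffer that belongs to the sending client.
void on_audio_packet(audio_ring_buffer_t *client_audio_buffer, const float *samples, int num_samples) {
    audio_ring_buffer_write(client_audio_buffer, samples, num_samples);
}
// Audio mixing thread
void* audio_thread(void *arg) {
    float mixed[MIXER_FRAME_SIZE];
    while (running) {
        int samples = mixer_process(mixer, mixed, MIXER_FRAME_SIZE);
        if (samples > 0) {
            broadcast_audio_to_all_clients(mixed, samples);
        }
        usleep(5000); // ~5ms sleep; a 256-sample frame lasts ~5.3ms at 48kHz
    }
    return NULL;
}
// Cleanup
mixer_destroy(mixer); // Pass pointer, not address-of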

Best Practices

DO:

  • Enable ducking for 4+ clients
  • Monitor buffer overflows/underruns
  • Use consistent sample rates across clients
  • Remove clients from mixer on disconnect
  • Use dedicated audio mixing thread

DON'T:

  • Don't mix audio on network thread
  • Don't forget to remove disconnected clients
  • Don't use different sample rates per client
  • Don't disable ducking with many clients
  • Don't access the mixer's sources without lock protection
See also
mixer.h
audio.h
ringbuffer.h