ascii-chat 0.8.38
Real-time terminal-based video chat with ASCII art conversion

🔊 Audio system for real-time audio capture and playback

Files

file  analysis.c
 Audio Analysis Implementation.
 
file  audio.c
 🔊 Audio capture and playback using PortAudio with buffer management
 
file  mixer.c
 🎚️ Real-time audio mixer with ducking, gain control, and multi-stream blending
 
file  opus_codec.c
 Opus audio codec implementation.
 

Detailed Description

🔊 Audio system for real-time audio capture and playback

Audio README

Overview

Welcome to the Audio System! This is where all the audio magic happens—capturing your voice from the microphone, playing back audio from other participants, and making sure everything runs smoothly in real-time. We use PortAudio for cross-platform audio I/O, which means everything works the same way on Linux, macOS, and Windows.

What does the audio system do?

The Audio System provides real-time audio capture and playback functionality for ascii-chat video chat sessions. Here's what it gives you:

  • Real-time audio capture from microphone/input devices (so your voice is captured as you speak)
  • Real-time audio playback to speakers/output devices (so you hear others in real-time)
  • Thread-safe ring buffers for audio data (so capture and playback can run in parallel without conflicts)
  • Low-latency audio processing (so there's minimal delay between speaking and hearing)
  • Platform-specific real-time priority scheduling (so audio threads get priority and don't get interrupted by other tasks)
  • Configurable audio parameters (sample rate, buffer size) so you can tune for your needs
  • Automatic device enumeration and selection (so it just works with your audio devices)

Implementation: lib/audio.h

Architecture

The audio system is built around a few key concepts: PortAudio for cross-platform audio I/O, ring buffers for efficient data transfer, and thread-safe operations so everything can run in parallel. Let's walk through how everything fits together.

How does PortAudio work?

We use PortAudio for cross-platform audio I/O because it handles all the platform-specific details for us—we write the same code and it works on Linux, macOS, and Windows. PortAudio provides:

Audio Streams:

  • Separate input and output streams for full-duplex audio (you can capture and play back at the same time)
  • Independent capture and playback threads (so capture doesn't block playback and vice versa)
  • Automatic stream management and lifecycle (PortAudio handles starting, stopping, and cleaning up streams)

What about ring buffers?

Ring Buffers:

  • Efficient producer-consumer audio data transfer: Ring buffers let one thread (the producer) write data while another thread (the consumer) reads data, without blocking each other
  • Lock-free or mutex-protected buffers depending on platform (we use the most efficient approach for each platform)
  • Jitter buffering to smooth out network timing variations (network latency can vary, so we buffer a bit to smooth it out)
  • Configurable buffer sizes for latency/quality trade-offs (bigger buffers = smoother playback but higher latency, smaller buffers = lower latency but might stutter)
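
To make the producer-consumer idea concrete, here is a minimal sketch of a single-producer/single-consumer float ring buffer in C11. This is an illustration only: ascii-chat's real buffer is audio_ring_buffer_t (see ringbuffer.h), and its layout, names, and locking strategy will differ.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RB_CAPACITY 8192 /* must be a power of two for the index mask */

/* Illustrative SPSC ring buffer: one thread writes, one thread reads,
 * and neither ever blocks the other. Not the project's actual type. */
typedef struct {
  float data[RB_CAPACITY];
  atomic_size_t write_pos; /* advanced only by the producer */
  atomic_size_t read_pos;  /* advanced only by the consumer */
} float_ring_t;

/* Producer side: returns the number of samples actually written. */
static size_t rb_write(float_ring_t *rb, const float *src, size_t n) {
  size_t w = atomic_load_explicit(&rb->write_pos, memory_order_relaxed);
  size_t r = atomic_load_explicit(&rb->read_pos, memory_order_acquire);
  size_t free_space = RB_CAPACITY - (w - r);
  if (n > free_space)
    n = free_space; /* overflow policy: drop what doesn't fit */
  for (size_t i = 0; i < n; i++)
    rb->data[(w + i) & (RB_CAPACITY - 1)] = src[i];
  atomic_store_explicit(&rb->write_pos, w + n, memory_order_release);
  return n;
}

/* Consumer side: returns the number of samples actually read. */
static size_t rb_read(float_ring_t *rb, float *dst, size_t n) {
  size_t r = atomic_load_explicit(&rb->read_pos, memory_order_relaxed);
  size_t w = atomic_load_explicit(&rb->write_pos, memory_order_acquire);
  size_t avail = w - r;
  if (n > avail)
    n = avail; /* underflow policy: return what's available */
  for (size_t i = 0; i < n; i++)
    dst[i] = rb->data[(r + i) & (RB_CAPACITY - 1)];
  atomic_store_explicit(&rb->read_pos, r + n, memory_order_release);
  return n;
}
```

The acquire/release pairing is what makes this safe without a mutex: the consumer never sees a write position advance until the corresponding samples are visible, and vice versa.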

How do we handle threading?

Thread Safety:

  • Audio context state protected by mutex: When multiple threads access the audio context, they're protected by a mutex so there are no race conditions
  • Ring buffers provide thread-safe audio data transfer: The ring buffers themselves are thread-safe, so capture and playback can run in parallel
  • Real-time priority scheduling on supported platforms: Audio threads get real-time priority, so they don't get interrupted by other tasks (critical for smooth audio playback)
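
The first point looks roughly like the sketch below. The field and function names here are hypothetical, purely to illustrate the pattern; the real audio_context_t layout is in lib/audio.h.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sketch of mutex-guarded context state. */
typedef struct {
  pthread_mutex_t lock;
  bool capture_running;
} guarded_ctx_t;

static void guarded_set_capture(guarded_ctx_t *c, bool on) {
  pthread_mutex_lock(&c->lock); /* every state change happens under the lock */
  c->capture_running = on;
  pthread_mutex_unlock(&c->lock);
}

static bool guarded_get_capture(guarded_ctx_t *c) {
  pthread_mutex_lock(&c->lock); /* reads take the lock too, so no torn state */
  bool on = c->capture_running;
  pthread_mutex_unlock(&c->lock);
  return on;
}
```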

Audio Parameters

Audio parameters control the quality and latency of audio. The defaults are tuned for good quality with low latency, but you can adjust them for your needs.

What are the defaults?

Default Configuration:

  • Sample Rate: 44.1kHz (CD quality)—this gives you excellent audio quality while keeping bandwidth reasonable. If you need even higher quality, you can go up to 48kHz or even 192kHz
  • Channels: Mono (1 channel)—this keeps bandwidth low. If you want stereo (2 channels), you can enable it, but it doubles the bandwidth
  • Buffer Size: 256 frames per buffer (low latency)—this gives you low latency (~5.8ms at 44.1kHz). If you have audio stuttering, you might want to increase this
  • Format: 32-bit floating point samples—this gives you the best quality and is what PortAudio recommends
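
The ~5.8ms figure quoted above is just the buffer size divided by the sample rate; a quick sanity check:

```c
/* One buffer's worth of latency = frames per buffer / sample rate.
 * 256 / 44100 * 1000 ≈ 5.8 ms; doubling the buffer to 512 frames
 * doubles this to ~11.6 ms. */
static double buffer_latency_ms(double frames_per_buffer, double sample_rate_hz) {
  return frames_per_buffer / sample_rate_hz * 1000.0;
}
```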

What can I configure?

Configurable Options:

  • Custom sample rates: You can use any sample rate from 8kHz (for low bandwidth) up to 192kHz (for high quality), but 44.1kHz and 48kHz are the most common
  • Stereo support: You can enable stereo (2 channels) if you want spatial audio, but it doubles the bandwidth
  • Variable buffer sizes: You can adjust buffer sizes from 128 frames (ultra-low latency but might stutter) to 1024 frames (smooth but higher latency)
  • Device selection: You can choose which input/output device to use if you have multiple audio devices

Operations

Initialization

Create Audio Context:

audio_context_t audio_ctx;
asciichat_error_t err = audio_init(&audio_ctx);
if (err != ASCIICHAT_OK) {
  log_error("Failed to initialize audio: %d", err);
  return err;
}

Configure Audio Parameters:

// Set custom sample rate and buffer size
audio_ctx.sample_rate = 48000;
audio_ctx.buffer_size = 512;
audio_ctx.channels = 1; // Mono

Audio Capture

Start Capture:

err = audio_start_capture(&audio_ctx);
if (err != ASCIICHAT_OK) {
  log_error("Failed to start audio capture");
  return err;
}

Read Captured Samples:

float samples[256];
asciichat_error_t err = audio_read_samples(&audio_ctx, samples, 256);
if (err == ASCIICHAT_OK) {
  // Send samples to network
  send_audio_packet(sockfd, samples, 256);
}

Audio Playback

Start Playback:

err = audio_start_playback(&audio_ctx);
if (err != ASCIICHAT_OK) {
  log_error("Failed to start audio playback");
  return err;
}

Write Playback Samples:

float samples[256];
// Receive samples from network
receive_audio_packet(sockfd, samples, 256);
// Write to playback buffer
asciichat_error_t err = audio_write_samples(&audio_ctx, samples, 256);
if (err != ASCIICHAT_OK) {
  log_warn("Failed to write audio samples");
}

Cleanup

Stop Audio:

audio_stop_capture(&audio_ctx);
audio_stop_playback(&audio_ctx);

Destroy Audio Context:

audio_destroy(&audio_ctx);

Platform Support

Windows:

  • DirectSound backend (legacy)
  • WASAPI backend (modern, recommended)
  • ASIO backend (low-latency professional audio)

Linux:

  • ALSA backend (standard Linux audio)
  • JACK backend (professional audio, low latency)
  • PulseAudio support via ALSA

macOS:

  • CoreAudio backend (native macOS audio)
  • Automatic device selection
  • Low-latency support

Performance

Audio performance is all about balancing latency, CPU usage, and bandwidth. We've tuned the defaults for good performance across all three dimensions, but let's look at what you can expect:

How much latency do we have?

Latency:

  • Buffer size: 256 frames @ 44.1kHz = ~5.8ms latency (this is the latency from the audio buffer itself)
  • Network jitter buffering: +46ms (8 packets) (this is extra buffering to smooth out network timing variations)
  • Total end-to-end latency: ~50-60ms (this is the total time from when someone speaks to when you hear it)

50-60ms is quite good for networked audio: it's comparable to a phone call and competitive with dedicated video conferencing software. The jitter buffering smooths out network hiccups, so playback stays steady even when the network is a bit flaky.
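
The jitter figures above fall out of simple arithmetic, and the 2048-sample threshold matches the ring-buffer numbers later on this page:

```c
/* Samples-to-milliseconds conversion behind the latency figures. */
static double samples_to_ms(int samples, double sample_rate_hz) {
  return (double)samples / sample_rate_hz * 1000.0;
}

/* 2048-sample jitter threshold / 256-frame buffers = 8 packets buffered. */
static int packets_buffered(int threshold_samples, int frames_per_packet) {
  return threshold_samples / frames_per_packet;
}
```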

How much CPU does audio use?

CPU Usage:

  • Audio capture: ~1-2% CPU (single thread) (capturing audio is pretty lightweight)
  • Audio playback: ~1-2% CPU (single thread) (playback is also lightweight)
  • Total audio overhead: ~2-4% CPU (so audio adds about 2-4% CPU overhead total)

Audio is pretty lightweight—you probably won't even notice the CPU usage unless you're monitoring it closely.

How much bandwidth does audio use?

Bandwidth:

  • 44.1kHz mono: ~176 KB/s of raw 32-bit float samples
  • 48kHz mono: ~192 KB/s
  • 48kHz stereo: ~384 KB/s (double the mono rate, because stereo carries two channels)

Audio bandwidth is pretty reasonable—even stereo at 48kHz is only about 384 KB/s, which is much less than video. You could stream audio over a decent cellular connection without any problems.
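
These are raw PCM rates: sample rate times channels times 4 bytes per 32-bit float sample. A quick derivation:

```c
/* Raw PCM bandwidth in KB/s = sample rate * channels * 4 bytes per
 * 32-bit float sample / 1000.
 * 44100 * 1 * 4 / 1000 = 176.4 KB/s (44.1kHz mono)
 * 48000 * 2 * 4 / 1000 = 384.0 KB/s (48kHz stereo) */
static double pcm_bandwidth_kbs(double sample_rate_hz, int channels) {
  return sample_rate_hz * channels * 4.0 / 1000.0;
}
```

Note these are uncompressed figures; the Opus codec (opus_codec.c) compresses audio to a fraction of the raw rate before it goes over the wire.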

Ring Buffers

The audio system uses ring buffers for efficient producer-consumer audio transfer:

Capture Ring Buffer:

  • Producer: PortAudio capture callback
  • Consumer: Network send thread
  • Size: 8192 samples (~186ms @ 44.1kHz)
  • Thread-safe: Mutex-protected on all platforms

Playback Ring Buffer:

  • Producer: Network receive thread
  • Consumer: PortAudio playback callback
  • Size: 8192 samples (~186ms @ 44.1kHz)
  • Jitter buffer threshold: 2048 samples (~46ms)
  • Thread-safe: Mutex-protected on all platforms

Threading

Audio Threads:

  • Capture thread: PortAudio callback (real-time priority)
  • Playback thread: PortAudio callback (real-time priority)
  • Network threads: User threads (normal priority)

Priority Scheduling:

  • Windows: THREAD_PRIORITY_TIME_CRITICAL for audio threads
  • Linux: SCHED_FIFO with priority 50 (requires capabilities)
  • macOS: Real-time thread priority
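
On Linux, the SCHED_FIFO request boils down to a pthread_setschedparam call. A minimal sketch, assuming the real platform layer does more thorough error and capability handling:

```c
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Request SCHED_FIFO priority 50 for the calling thread, as described
 * above. Without CAP_SYS_NICE (or a matching rtprio rlimit) this fails
 * with EPERM and the thread simply keeps normal scheduling. */
static int request_realtime_priority(void) {
  struct sched_param param;
  memset(&param, 0, sizeof(param));
  param.sched_priority = 50;
  return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```

Returning the error code (rather than aborting) lets audio keep working at normal priority when the process lacks the capability.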

Integration

Network Integration:

  • Audio samples sent via PACKET_TYPE_AUDIO_BATCH
  • Compression enabled for large batches
  • Encryption via crypto context (if enabled)

Ring Buffer Integration:

  • Uses specialized audio ring buffer (audio_ring_buffer_t)
  • Jitter buffering for network latency compensation
  • Thread-safe operations with mutex protection

See also

audio.h
ringbuffer.h
network/av.h