ascii-chat 0.6.0
Real-time terminal-based video chat with ASCII art conversion
utf8.h File Reference

🔤 UTF-8 Encoding and Decoding Utilities


Functions

int utf8_decode (const uint8_t *s, uint32_t *codepoint)
 Decode a UTF-8 sequence to a Unicode codepoint.
 

Detailed Description

🔤 UTF-8 Encoding and Decoding Utilities

This header provides simple, efficient UTF-8 validation and decoding without external dependencies. The implementation handles multi-byte UTF-8 sequences and validates encoding correctness.

CORE FEATURES:

  • Multi-byte UTF-8 sequence decoding (1-4 bytes)
  • Unicode codepoint extraction
  • UTF-8 validation during decoding
  • No external dependencies (pure C implementation)
  • Safe handling of invalid sequences

UTF-8 ENCODING:

UTF-8 encodes Unicode codepoints using 1-4 bytes:

  • 1 byte: ASCII characters (0x00-0x7F)
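  • 2 bytes: Latin-1 Supplement, Greek, Cyrillic, Hebrew, Arabic, etc. (0x80-0x7FF)
  • 3 bytes: Rest of the Basic Multilingual Plane, including most CJK characters (0x800-0xFFFF, excluding the surrogate range 0xD800-0xDFFF)
  • 4 bytes: Supplementary planes, including emoji and rare characters (0x10000-0x10FFFF)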
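
The ranges above map directly onto a length lookup. The following is a minimal sketch only: sketch_utf8_length is a hypothetical helper introduced for illustration and is not declared in utf8.h. It returns 0 for values that UTF-8 cannot encode (UTF-16 surrogates and anything above 0x10FFFF).

```c
#include <stdint.h>

/* Hypothetical helper (not part of utf8.h): number of bytes UTF-8 needs
 * to encode a codepoint, or 0 if the value is not a valid scalar value. */
static int sketch_utf8_length(uint32_t cp)
{
    if (cp <= 0x7F)
        return 1;                       /* ASCII */
    if (cp <= 0x7FF)
        return 2;                       /* Latin-1 Supplement and similar */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* UTF-16 surrogates are not encodable */
    if (cp <= 0xFFFF)
        return 3;                       /* rest of the Basic Multilingual Plane */
    if (cp <= 0x10FFFF)
        return 4;                       /* supplementary planes */
    return 0;                           /* beyond the Unicode range */
}
```

For example, sketch_utf8_length(0x20AC) (the euro sign) falls in the 3-byte range, while sketch_utf8_length(0x1F600) (an emoji) falls in the 4-byte range.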

VALIDATION:

The decoder validates UTF-8 sequences during decoding:

  • Checks for valid byte sequences according to UTF-8 rules
  • Detects overlong encodings (security feature)
  • Detects invalid byte patterns
  • Returns error on invalid sequences
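
The checks listed above can be illustrated with a small stand-in decoder. This is a sketch under stated assumptions, not the actual implementation behind utf8.h: sketch_utf8_decode is a hypothetical name that mirrors the utf8_decode signature, it assumes the input is NUL-terminated, and it assumes a successful call returns the number of bytes consumed (the header only documents the -1 error value).

```c
#include <stdint.h>

/* Hypothetical stand-in with the same signature as utf8_decode.
 * Assumption: returns the number of bytes consumed (1-4), or -1 on error.
 * Assumption: s points to a NUL-terminated buffer, so a truncated
 * sequence is caught when the NUL fails the continuation-byte check. */
static int sketch_utf8_decode(const uint8_t *s, uint32_t *codepoint)
{
    /* Smallest codepoint that legitimately needs a sequence of each length. */
    static const uint32_t min_value[5] = {0, 0, 0x80, 0x800, 0x10000};
    uint32_t cp;
    int len;

    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else return -1;                 /* stray continuation or invalid lead byte */

    for (int i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* continuation bytes must look like 10xxxxxx */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_value[len])
        return -1;                  /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;                  /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return -1;                  /* outside the Unicode range */

    *codepoint = cp;
    return len;
}
```

Rejecting overlong forms matters for security because a sequence such as 0xC0 0x80 would otherwise decode to U+0000 and could slip past filters that only compare raw bytes.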
Note
This is a minimal implementation for basic UTF-8 handling.
For full Unicode support, consider using a library like ICU.
utf8_decode returns -1 when it encounters an invalid sequence.
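
Callers need a policy for the -1 result. One common choice is to skip a single byte and resynchronize, as in the sketch below. Everything here is illustrative: sketch_decode_fn, sketch_ascii_decode and sketch_count_codepoints are hypothetical names, and the loop assumes that a successful decode returns the number of bytes consumed, which this page does not state. The ASCII-only stand-in exists solely so that the example compiles without the real header; in real code utf8_decode would be passed instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Function-pointer type matching the utf8_decode signature. */
typedef int (*sketch_decode_fn)(const uint8_t *s, uint32_t *codepoint);

/* Hypothetical stand-in decoder: accepts ASCII only, -1 otherwise. */
static int sketch_ascii_decode(const uint8_t *s, uint32_t *codepoint)
{
    if (s[0] >= 0x80)
        return -1;
    *codepoint = s[0];
    return 1;
}

/* Count the codepoints in a NUL-terminated string. On a decode error,
 * record it and advance by one byte so the loop always makes progress. */
static size_t sketch_count_codepoints(const uint8_t *s, sketch_decode_fn decode,
                                      size_t *errors)
{
    size_t count = 0;
    *errors = 0;

    while (*s != 0) {
        uint32_t cp;
        int n = decode(s, &cp);
        if (n <= 0) {           /* -1 (or an unexpected 0): skip one byte */
            (*errors)++;
            s += 1;
            continue;
        }
        s += n;                 /* assumed: n is the number of bytes consumed */
        count++;
    }
    return count;
}
```

Skipping one byte at a time keeps the loop simple and guarantees termination; a renderer might instead substitute U+FFFD for each rejected byte.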
Author
Zachary Fogg me@zfo.gg
Date
October 2025

Definition in file utf8.h.