High-performance, configurable pixel format and color space conversion with Highway SIMD acceleration.
The CSC framework converts images between any pair of pixel descriptions (PixelFormat), handling differences in memory layout, component ordering, bit depth, chroma subsampling, color model, and transfer function. It is the implementation behind Image::convert().
The framework has three execution tiers:
"Path" configuration key. Useful for debugging and as a reference implementation (100–300 Mpix/s).The generic pipeline processes one scanline at a time through an ordered chain of stages. Stages that are not needed for a given conversion are eliminated at compile time.
The YCbCr matrix operates on gamma-encoded RGB values, not linear. This matches the ITU-R specifications (BT.709, BT.601, BT.2020) where the luma/chroma separation is defined on the encoded signal.
Fast paths are registered at static initialization via CSCRegistry::registerFastPath() and discovered automatically by CSCPipeline during compilation. Each fast path is a single function that converts one scanline from source to target format, bypassing all pipeline stages.
Registered fast paths (56 total):
| Standard | Bit Depth | Formats |
|---|---|---|
| BT.709 | 8-bit | YUYV, UYVY, NV12, NV21, NV16, Planar 422/420 ↔ RGBA8 |
| BT.709 | 10-bit LE | UYVY, Planar 422/420, NV12, v210 ↔ RGBA10 LE |
| BT.709 | 8-bit | v210 ↔ RGBA8 (inline 8↔10-bit step around v210 math) |
| BT.709 | 12-bit LE | UYVY, Planar 422/420, NV12 ↔ RGBA12 LE |
| BT.601 | 8-bit | YUYV, UYVY, NV12, Planar 420 ↔ RGBA8 |
| BT.2020 | 10-bit LE | UYVY, Planar 420 ↔ RGBA10 LE |
| BT.2020 | 12-bit LE | UYVY, Planar 420 ↔ RGBA12 LE |
| Format | 8-bit | BGRA↔RGBA, RGBA↔RGB, RGB↔BGRA |
Fast paths use fixed-point integer arithmetic with standard broadcast coefficients (e.g. BT.709: Y = 66R + 129G + 25B). They do not apply separate sRGB and Rec.709 transfer functions — the input sRGB signal is treated as Rec.709 directly. This is standard broadcast practice and produces results within ±1 LSB of the ITU integer reference for 8-bit.
Scalar pipeline vs Color::convert() — The scalar generic pipeline uses the same float-domain EOTF, 3×3 matrix, and OETF chain as Color::convert(), with transfer functions evaluated via pre-computed LUTs. Maximum deviation: ±2 LSB for 8-bit conversions.
Fast path vs scalar pipeline — Fast paths use integer BT.709/601/2020 arithmetic which intentionally approximates the sRGB↔Rec.709 transfer function difference as negligible. Maximum deviation from the scalar pipeline: ≤35 LSB near black where the sRGB and Rec.709 linear segments diverge. For mid-range and highlight values, the deviation is typically ≤2 LSB.
Fast path round-trip — RGB → YCbCr → RGB via fast paths incurs ≤3 LSB error for achromatic and saturated colors (integer quantization). Chroma subsampling (4:2:2, 4:2:0) introduces additional spatial error at sharp color transitions.
10-bit accuracy — 10-bit fast paths use the same fixed-point structure scaled to the 10-bit range (Y: 64–940, Cb/Cr: 64–960). When the source is upconverted from 8-bit, the combined tolerance is ±4 LSB.
12-bit accuracy — 12-bit fast paths use fixed-point arithmetic scaled to the 12-bit limited range (Y: 256–3760, Cb/Cr: 256–3840, out of 0–4095). Round-trip (RGBA12 → YCbCr12 → RGBA12) error is ≤3 LSB for achromatic and saturated colors. Chroma subsampling (4:2:2, 4:2:0) introduces additional spatial error at sharp color transitions.
CSCPipeline accepts a MediaConfig for runtime hints:
| Key | Type | Default | Description |
|---|---|---|---|
MediaConfig::CscPath | String | "optimized" | "optimized" enables SIMD and fast paths. "scalar" forces the generic float pipeline with no SIMD. |
CSCPipeline is immutable after construction and safe to execute concurrently from multiple threads. Each thread must provide its own CSCContext for scratch storage. Image::convert() creates a temporary context internally, so no manual context management is needed for the high-level API.
Pipeline caching — Callers that repeatedly convert between the same format pairs should use CSCPipeline::cached() instead of constructing a new CSCPipeline each call. The cache is a process-wide mutex-protected map keyed on (srcDesc, dstDesc, path). compile() runs exactly once per unique triple; subsequent calls return the same shared pipeline. The Image::convert() high-level API uses the cache automatically.
Register a fast path at static initialization:
The pipeline will automatically use the registered kernel when constructing a CSCPipeline for that source/target pair, unless the "Path" config is set to "scalar".
The CSC test suite validates conversions at multiple levels:
Color::convert() single-pixel validation (ground truth)Color::convert() (±2 LSB)VideoTestPattern through scalar pipeline vs Color referenceVideoTestPattern through fast path vs BT.709 integer reference (±1 LSB)See also: CSCPipeline, CSCContext, CSCRegistry, Image::convert(); Color Science for the underlying color model architecture.