---
library_name: onnxruntime
tags:
- snac
- onnx
- 24khz
- decoder
- browser
license: other
language:
- en
---

# SNAC 24 kHz: Decoder as ONNX (browser-ready)

This repo provides **ONNX decoders** for the SNAC 24 kHz codec so you can decode SNAC tokens **on-device**, including **in the browser** with `onnxruntime-web`.

**Why?** If your TTS front-end is a decoder-only Transformer (e.g. Orpheus-style) that streams out SNAC tokens quickly and cheaply, you can keep synthesis private and responsive by decoding the audio **in the user's browser**. Decoding runs on the CPU, or on the GPU via WebGPU when available.

> In a Colab CPU test, we measured roughly **2.1× real-time** decoding on a longer file with the ONNX model (inference time only, excluding model load). Results will vary with hardware and browser.

---

## Files

- **`snac24_int2wav_static.onnx`**: *int → wav* decoder
  - Inputs (`int64`):
    - `codes0`: `[1, 12]`
    - `codes1`: `[1, 24]`
    - `codes2`: `[1, 48]`
  - Output: `audio`: `float32 [1, 1, 24576]` (24 kHz)

  The shapes correspond to a **48-frame window**. Each frame is **512 samples**, so one window is 48 × 512 = **24576 samples** = **1.024 s** at 24 kHz. Token alignment: `L0*4 = L1*2 = L2*1 = shared_frames`, where `L0`, `L1` and `L2` are the lengths of `codes0`, `codes1` and `codes2` (here 12·4 = 24·2 = 48·1 = 48 frames).

- **`snac24_latent2wav_static.onnx`**: *latent → wav* decoder
  - Input: `z`: `float32 [1, 768, 48]`
  - Output: `audio`: `[1, 1, 24576]`

  Use this if you reconstruct the latent yourself (RVQ embeddings + 1×1 conv projections).

- **`snac24_codes.json`**: sample codes for testing.
- **`snac24_quantizers.json`**: RVQ metadata and weights (stride, embeddings, 1×1 projections) for reconstructing `z` if needed.

---

## Browser (WASM/WebGPU) quickstart

Serve these files from a local server. Multithreaded WASM requires the page to be cross-origin isolated (COOP/COEP headers). Without isolation, WASM will typically run **single-threaded**.

### Streaming note

SNAC is streamable in principle.
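For practical low-latency TTS, emit about 200 ms worth of tokens and decode them (roughly 100 ms of compute). Then start playback and keep decoding subsequent chunks while the audio plays. Cross-fade a few milliseconds at each boundary to hide seams. The static models above always decode one full 48-frame window (1.024 s), so a shorter chunk has to be padded to that size or decoded as part of an overlapping window.

### Threads / GPU

Multithreaded WASM requires cross-origin isolation (COOP/COEP headers). Without it, browsers typically run WASM single-threaded. WebGPU can accelerate decoding on desktop and mobile browsers that support it, provided the model's operators have WebGPU kernels. Where they do not, this model usually falls back to WASM.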
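The shape and alignment rules in the Files section (`L0*4 = L1*2 = L2*1`, with 12 / 24 / 48 tokens per 48-frame window) are easy to get wrong when slicing a long token stream. The following is a minimal sketch of one way to check alignment and cut aligned streams into full windows for `snac24_int2wav_static.onnx`. The helper names `sharedFrames` and `splitIntoWindows` are hypothetical, and the sketch assumes each level is a plain array of token ids.

```javascript
// Window geometry for snac24_int2wav_static.onnx (from the Files section).
const WINDOW_FRAMES = 48;       // shared_frames per window
const SAMPLES_PER_FRAME = 512;  // 24 kHz samples per frame
const STRIDES = [4, 2, 1];      // frames covered by one token of codes0, codes1, codes2

// Check L0*4 = L1*2 = L2*1 and return shared_frames (hypothetical helper).
function sharedFrames(codes0, codes1, codes2) {
  const frames = [codes0, codes1, codes2].map((c, i) => c.length * STRIDES[i]);
  if (frames[0] !== frames[1] || frames[1] !== frames[2]) {
    throw new Error(`misaligned levels: ${frames.join(" / ")} frames`);
  }
  return frames[0];
}

// Cut aligned streams into consecutive full windows of 12 / 24 / 48 tokens.
// Leftover tokens (less than one window) are returned as `remainder`; whether to
// pad them or wait for more tokens is left to the caller.
function splitIntoWindows(codes0, codes1, codes2) {
  const total = sharedFrames(codes0, codes1, codes2);
  const levels = [codes0, codes1, codes2];
  const windows = [];
  let start = 0;
  for (; start + WINDOW_FRAMES <= total; start += WINDOW_FRAMES) {
    // Frame offsets divide evenly by each stride because start is a multiple of 48.
    windows.push(levels.map((c, i) =>
      c.slice(start / STRIDES[i], (start + WINDOW_FRAMES) / STRIDES[i])));
  }
  const remainder = levels.map((c, i) => c.slice(start / STRIDES[i]));
  return { windows, remainder };
}
```

Each full window decodes to `WINDOW_FRAMES * SAMPLES_PER_FRAME` = 24576 samples.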
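For `snac24_latent2wav_static.onnx`, the Files section says `z` is rebuilt from RVQ embeddings plus 1×1 conv projections. Below is a sketch of that reconstruction, assuming the usual SNAC-style RVQ decode. For each level, it looks up the codeword for each token and applies the 1×1 conv (a matrix multiply plus bias) to reach 768 channels. It then repeats each result over `stride` frames and sums the levels. The `levels` layout (`stride`, `codebook`, `weight`, `bias`) is hypothetical; map it from `snac24_quantizers.json` yourself. The sketch also assumes the projection weights are already the effective (weight-normalized) values. Verify it against the Python decoder on the sample codes before relying on it.

```javascript
// Hypothetical per-level layout (map from snac24_quantizers.json):
//   stride:   frames covered by one token (4, 2, 1 for codes0..codes2)
//   codebook: codebook[token] is the codeword vector (length codeDim)
//   weight:   [outDim][codeDim] kernel of the 1x1 conv projection
//   bias:     [outDim]
function reconstructLatent(levels, codesPerLevel) {
  const outDim = levels[0].weight.length;
  const frames = codesPerLevel[0].length * levels[0].stride;
  const z = Array.from({ length: outDim }, () => new Float32Array(frames));
  levels.forEach((level, i) => {
    const codes = codesPerLevel[i];
    if (codes.length * level.stride !== frames) throw new Error(`level ${i} misaligned`);
    codes.forEach((token, t) => {
      const e = level.codebook[token];
      for (let o = 0; o < outDim; o++) {
        // 1x1 conv on a single time step = dot product + bias.
        let acc = level.bias[o];
        const w = level.weight[o];
        for (let k = 0; k < e.length; k++) acc += w[k] * e[k];
        // Repeat over the frames this token covers, then sum across levels.
        for (let r = 0; r < level.stride; r++) z[o][t * level.stride + r] += acc;
      }
    });
  });
  return z;
}

// Flatten z[channel][frame] row-major into the [1, 768, frames] layout of input `z`.
function flattenLatent(z) {
  const frames = z[0].length;
  const flat = new Float32Array(z.length * frames);
  z.forEach((row, c) => flat.set(row, c * frames));
  return flat;
}
```

With `onnxruntime-web`, the flattened array would become `new ort.Tensor("float32", flattenLatent(z), [1, 768, 48])` for one window.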
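The original quickstart example for the browser is not reproduced here. The following is a minimal sketch of running one 48-frame window through `snac24_int2wav_static.onnx` with `onnxruntime-web`. The `ort` module and the `InferenceSession` are passed in rather than imported, which keeps the helper independent of how the page loads `onnxruntime-web` (npm package or CDN script). The helper name `decodeWindow` is hypothetical. `ort.Tensor` and `session.run` are the standard `onnxruntime-web` calls, and the feed and output names come from the Files section.

```javascript
// Decode one window: codes0/codes1/codes2 are plain arrays of 12 / 24 / 48 token ids.
async function decodeWindow(ort, session, codes0, codes1, codes2) {
  // The model has static int64 inputs of shape [1, 12], [1, 24] and [1, 48].
  const toInt64 = (codes, len, name) => {
    if (codes.length !== len) {
      throw new Error(`${name}: expected ${len} tokens, got ${codes.length}`);
    }
    return new ort.Tensor("int64", BigInt64Array.from(codes, (v) => BigInt(v)), [1, len]);
  };
  const feeds = {
    codes0: toInt64(codes0, 12, "codes0"),
    codes1: toInt64(codes1, 24, "codes1"),
    codes2: toInt64(codes2, 48, "codes2"),
  };
  const results = await session.run(feeds);
  // "audio" is float32 [1, 1, 24576]; .data is the flat Float32Array of 24 kHz samples.
  return results.audio.data;
}

// Usage in a page (sketch):
// const session = await ort.InferenceSession.create("snac24_int2wav_static.onnx",
//   { executionProviders: ["wasm"] });
// const audio = await decodeWindow(ort, session, codes0, codes1, codes2);
```

Depending on your `onnxruntime-web` version, WebGPU may need a different bundle or entry point, so check its documentation before adding `"webgpu"` to `executionProviders`.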
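The streaming note suggests cross-fading a few milliseconds at chunk boundaries. A cross-fade only makes sense if the end of one decoded chunk and the start of the next cover the same stretch of audio. This sketch therefore assumes consecutive chunks were decoded with a small overlap of `fade` samples; back-to-back, non-overlapping windows leave nothing to blend. The helper name `crossfadeJoin` is hypothetical. The sketch uses a simple linear fade, and an equal-power curve is a common alternative.

```javascript
// Join two decoded Float32Array chunks whose last/first `fade` samples overlap.
function crossfadeJoin(prev, next, fade) {
  if (fade > prev.length || fade > next.length) {
    throw new RangeError("fade is longer than one of the chunks");
  }
  const out = new Float32Array(prev.length + next.length - fade);
  const base = prev.length - fade;
  out.set(prev.subarray(0, base), 0);
  for (let i = 0; i < fade; i++) {
    // Weight of `next` ramps linearly from near 0 to near 1 across the overlap.
    const w = (i + 0.5) / fade;
    out[base + i] = (1 - w) * prev[base + i] + w * next[i];
  }
  out.set(next.subarray(fade), prev.length);
  return out;
}

// "A few ms" at 24 kHz: 5 ms is 120 samples.
const FADE_SAMPLES = Math.round(0.005 * 24000);
```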
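The Threads / GPU notes boil down to two runtime choices: how many WASM threads to request and which execution providers to list. The following sketch makes those choices from values the page can read (`crossOriginIsolated`, `navigator.hardwareConcurrency`, `"gpu" in navigator`), passed in so the logic can be tested outside a browser. The helper name `pickRuntimeOptions` is hypothetical. The cap of 4 threads is an arbitrary example, not a tuned value.

```javascript
// Decide onnxruntime-web thread count and execution providers from page capabilities.
function pickRuntimeOptions({ isolated, hardwareConcurrency, hasWebGPU }) {
  // Without cross-origin isolation there is no SharedArrayBuffer, so WASM runs 1 thread.
  const numThreads = isolated ? Math.max(1, Math.min(4, hardwareConcurrency || 1)) : 1;
  // List WebGPU first when the browser exposes it; keep "wasm" as the fallback.
  const executionProviders = hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
  return { numThreads, executionProviders };
}

// In the page (sketch):
// const opts = pickRuntimeOptions({ isolated: crossOriginIsolated,
//   hardwareConcurrency: navigator.hardwareConcurrency, hasWebGPU: "gpu" in navigator });
// ort.env.wasm.numThreads = opts.numThreads;
// const session = await ort.InferenceSession.create("snac24_int2wav_static.onnx",
//   { executionProviders: opts.executionProviders });
```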