Speech, rendered on the metal.
An open-source, MLX-native speech library for Apple Silicon. Text-to-speech, voice cloning, dialogue, sound effects, and recognition, all running on the Apple GPU. Weights download on first use. Load any model by a short alias.
The laptop is the whole runtime.
Everything runs on the Apple GPU through MLX. No Python-side PyTorch, no server, and nothing leaves the machine. Every synthesis model turns text into a real waveform, and recognition turns audio back into text.
Pure MLX runtime
No PyTorch-backed inference under an MLX label. Weights ship as .safetensors with explicit remapping. The Apple GPU does the work.
Local & private
Converted weights download once, then run fully offline. Aliases and local checkpoint paths are interchangeable.
One clean interface
Call tts.load() or asr.load() in Python, or use the mlx-speech CLI. Per-family scripts expose each model family's full set of options.
Eleven models. One loader.
Synthesis, cloning, dialogue, editing, sound effects, and recognition. Each module links to its behavior guide and its converted weights on Hugging Face.
All weights live under appautomaton on Hugging Face. Load by alias or by full repo id: tts.load("fish-s2-pro") and tts.load("appautomaton/fishaudio-s2-pro-8bit-mlx") are equivalent.
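To make the alias rule concrete, here is a minimal sketch of how a loader could normalize the three kinds of model name mentioned above: a short alias, a full repo id, or a local checkpoint path. This is an illustration only, not the library's actual code. The `ALIASES` table and the `resolve` function are hypothetical names, and the only alias-to-repo pair taken from the text is the `fish-s2-pro` one.

```python
from pathlib import Path

# Hypothetical alias table. Only this one pair comes from the text above;
# the real library presumably keeps its own, larger registry.
ALIASES = {
    "fish-s2-pro": "appautomaton/fishaudio-s2-pro-8bit-mlx",
}


def resolve(name: str) -> str:
    """Map an alias, a full repo id, or a local checkpoint path to one canonical string."""
    local = Path(name).expanduser()
    # An existing local directory is used as-is, so local checkpoints work offline.
    if local.is_dir():
        return str(local.resolve())
    # A known short alias expands to its full repo id.
    if name in ALIASES:
        return ALIASES[name]
    # Anything shaped like "owner/model" is assumed to be a full repo id already.
    if name.count("/") == 1 and all(name.split("/")):
        return name
    raise ValueError(f"unknown model alias or path: {name!r}")
```

With a scheme like this, `resolve("fish-s2-pro")` and `resolve("appautomaton/fishaudio-s2-pro-8bit-mlx")` produce the same repo id, which is what makes the two `tts.load(...)` calls interchangeable.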
Install. Load. Generate.
Requires an Apple Silicon Mac (M1 or later) and Python 3.13+. Weights download on first use.
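Before installing, the stated requirements can be checked with the standard library alone. The sketch below is not part of the library, and the function names `meets_requirements` and `current_host_ok` are made up for this example. It assumes that "Apple Silicon" can be detected as macOS reporting an `arm64` machine type.

```python
import platform
import sys


def meets_requirements(system: str, machine: str, version: tuple[int, int]) -> bool:
    """Return True for macOS on arm64 (Apple Silicon) with Python 3.13 or newer."""
    return system == "Darwin" and machine == "arm64" and version >= (3, 13)


def current_host_ok() -> bool:
    """Apply the check to the interpreter that is running this script."""
    # A native Python on Apple Silicon reports "arm64"; an x86_64 build
    # running under Rosetta reports "x86_64" and is rejected here.
    return meets_requirements(
        platform.system(), platform.machine(), sys.version_info[:2]
    )


if __name__ == "__main__":
    print("ready" if current_host_ok() else "unsupported host")
```

Keeping the rule in a pure function makes it easy to test against hosts other than the current machine.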