mlx-speech
Pure MLX · Apple Silicon · No PyTorch

Speech, rendered on the metal.

An open-source, MLX-native speech library for Apple Silicon. Text-to-speech, voice cloning, dialogue, sound effects, and recognition, all running on the Apple GPU. Weights download on first use. Load any model by a short alias.

$ pip install mlx-speech

11 speech models · 2 tasks (TTS & ASR) · 48 kHz stereo output · 0 PyTorch at runtime

01 Why pure MLX

The laptop is the whole runtime.

Everything runs on the Apple GPU through MLX. No Python-side Torch, no server, nothing leaves the machine. Every text-to-speech model goes all the way from text to a real waveform.

Pure MLX runtime

No torch-backed inference under an MLX label. Weights ship as .safetensors with explicit remapping. The Apple GPU does the work.

Local & private

Converted weights download once, then run fully offline. Aliases and local checkpoint paths are interchangeable.

One clean interface

tts.load() / asr.load() in Python, or the mlx-speech CLI. Per-family scripts expose the full surface.

02 The catalog

Eleven models. One loader.

Synthesis, cloning, dialogue, editing, sound effects, and recognition. Each module links to its behavior guide and its converted weights on Hugging Face.

Text-to-speech: 8 modules
Speech-to-text: 3 modules

All weights live under appautomaton on Hugging Face. Load by alias or by full repo id: tts.load("fish-s2-pro") and tts.load("appautomaton/fishaudio-s2-pro-8bit-mlx") are equivalent.
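
To make the alias rule concrete, here is a minimal sketch of how a loader could treat an alias, a full repo id, and a local checkpoint path as interchangeable. It is not mlx-speech's actual code: the ALIASES table and the resolve_model helper are hypothetical names introduced for illustration, and the only entry is the alias pair quoted above.

```python
from pathlib import Path

# Hypothetical alias table, for illustration only. The single entry is the
# pair quoted on this page; the library's real table is not shown here.
ALIASES = {
    "fish-s2-pro": "appautomaton/fishaudio-s2-pro-8bit-mlx",
}


def resolve_model(name: str) -> str:
    """Map an alias, a full repo id, or a local path to one canonical string."""
    # A directory that exists on disk is treated as a local checkpoint.
    if Path(name).is_dir():
        return str(Path(name).resolve())
    # A known alias expands to its full Hugging Face repo id.
    if name in ALIASES:
        return ALIASES[name]
    # Anything shaped like "owner/repo" passes through unchanged.
    if name.count("/") == 1 and all(name.split("/")):
        return name
    raise ValueError(f"unknown model alias or repo id: {name!r}")


if __name__ == "__main__":
    print(resolve_model("fish-s2-pro"))
    print(resolve_model("appautomaton/fishaudio-s2-pro-8bit-mlx"))
```

Both calls resolve to the same repo id, which is what makes the two spellings equivalent. Checking for a local directory first is a design choice in this sketch, made so that a checkpoint folder on disk always wins over a name lookup.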

03 Quickstart

Install. Load. Generate.

Requires an Apple Silicon Mac (M1 or later) and Python 3.13+. Weights download on first use.

import mlx_speech

# Text-to-speech
model = mlx_speech.tts.load("fish-s2-pro")
result = model.generate("Hello from mlx-speech!")
# result.waveform: mx.array · result.sample_rate: int
 
# Voice cloning with emotion tags
result = model.generate(
  "[excited] This is amazing!",
  reference_audio="reference.wav",
  reference_text="Transcript of the reference.",
)
 
# Speech-to-text
asr = mlx_speech.asr.load("qwen3-asr-1.7b")
print(asr.generate("audio.wav").text)
$ pip install mlx-speech
 
$ mlx-speech tts --model fish-s2-pro --text "Hello!" -o out.wav
  ✓ out.wav
 
$ mlx-speech asr --model qwen3-asr-1.7b --audio speech.wav
 
$ mlx-speech tts --list-models
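
The Python quickstart above returns result.waveform and result.sample_rate, while the CLI writes a file directly. As a rough sketch of the step between the two, the snippet below writes a waveform to a WAV file using only the standard library. It makes several assumptions that this page does not state: samples are mono floats in [-1.0, 1.0], and the mx.array has already been converted to a flat Python sequence. The write_wav helper is a hypothetical name, and a generated sine tone stands in for real model output.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip each sample, then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono (assumption); stereo needs 2 and interleaved frames
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for result.waveform: half a second of a 440 Hz tone at 48 kHz.
sample_rate = 48_000
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate // 2)
]
write_wav("tone.wav", tone, sample_rate)
```

With a real result, the sample rate would come from result.sample_rate rather than a constant, and the channel count would need to match however the model lays out its output.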