◆ Pure MLX · Apple Silicon · No PyTorch

Speech, rendered on the metal.

An open-source, MLX-native speech library for Apple Silicon. Text-to-speech, voice cloning, dialogue, sound effects, and recognition, all running on the Apple GPU. Weights download on first use. Load any model by a short alias.

$ pip install mlx-speech

View on GitHub

11 speech models

2 tasks · TTS & ASR

48kHz stereo output

No PyTorch at runtime

01 Why pure MLX

The laptop is the whole runtime.

Everything runs on the Apple GPU through MLX: no Python-side Torch, no server, and nothing leaves the machine. Every synthesis model takes text and returns a real waveform.

Pure MLX runtime

No torch-backed inference under an MLX label. Weights ship as .safetensors with explicit remapping. The Apple GPU does the work.
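As a rough illustration of what "explicit remapping" can look like, the sketch below renames checkpoint keys with an ordered list of regex rules and refuses to overwrite a key silently. The rule table and the `remap_keys` name are hypothetical, not taken from mlx-speech; the library's actual per-model tables are not shown on this page.

```python
import re

# Hypothetical rename rules (regex pattern -> replacement). These are
# illustrative only; real per-model tables will differ.
RENAME_RULES = [
    (r"^model\.", ""),                  # drop a wrapper prefix
    (r"\.self_attn\.", ".attention."),  # rename the attention submodule
    (r"\.gamma$", ".weight"),           # norm scale naming
]


def remap_keys(weights, rules=RENAME_RULES):
    """Return a new dict with every key rewritten by each rule, in order."""
    remapped = {}
    for key, value in weights.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        # Two source keys landing on the same target is a conversion bug,
        # so fail loudly instead of dropping a tensor.
        if new_key in remapped:
            raise ValueError(f"remap collision: {key!r} -> {new_key!r}")
        remapped[new_key] = value
    return remapped
```

In practice the dict values would be arrays, for example those returned by `mx.load` on a .safetensors file. Plain integers work just as well for trying the function out.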

Local & private

Converted weights download once, then run fully offline. Aliases and local checkpoint paths are interchangeable.

One clean interface

Use tts.load() / asr.load() in Python, or the mlx-speech CLI. Per-family scripts expose each model family's full set of options.

02 The catalog

Eleven models. One loader.

Synthesis, cloning, dialogue, editing, sound effects, and recognition. Each module links to its behavior guide and its converted weights on Hugging Face.

Text-to-speech · 08 modules

Speech-to-text · 03 modules

All weights live under appautomaton on Hugging Face. Load by alias or full repo id. tts.load("fish-s2-pro") and tts.load("appautomaton/fishaudio-s2-pro-8bit-mlx") are equivalent.
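One way to picture how an alias, a full repo id, and a local checkpoint path can all go through the same loader is a small resolver like the one below. It is a minimal sketch under assumptions: the `ALIASES` table and `resolve_model` function are hypothetical, and only the single alias pair quoted above comes from this page.

```python
from pathlib import Path

# Only this entry comes from the text above; a real table would hold
# one entry per model. The table itself is a hypothetical structure.
ALIASES = {
    "fish-s2-pro": "appautomaton/fishaudio-s2-pro-8bit-mlx",
}


def resolve_model(name, aliases=ALIASES):
    """Classify `name` as a local path, a known alias, or a full repo id."""
    local = Path(name).expanduser()
    if local.exists():
        # An existing path wins, so local checkpoints never hit the network.
        return ("local", str(local))
    if name in aliases:
        return ("repo", aliases[name])
    if name.count("/") == 1:
        # Looks like "owner/repo": pass it through unchanged.
        return ("repo", name)
    raise ValueError(f"unknown model: {name!r}")
```

Under this sketch, `resolve_model("fish-s2-pro")` and `resolve_model("appautomaton/fishaudio-s2-pro-8bit-mlx")` return the same tuple, which is the equivalence described above.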

03 Quickstart

Install. Load. Generate.

Requires an Apple Silicon Mac (M1 or later) and Python 3.13+. Weights download on first use.

mlx-speech

import mlx_speech

# Text-to-speech
model = mlx_speech.tts.load("fish-s2-pro")
result = model.generate("Hello from mlx-speech!")
# result.waveform: mx.array · result.sample_rate: int

# Voice cloning with emotion tags
result = model.generate(
    "[excited] This is amazing!",
    reference_audio="reference.wav",
    reference_text="Transcript of the reference.",
)

# Speech-to-text
asr = mlx_speech.asr.load("qwen3-asr-1.7b")
print(asr.generate("audio.wav").text)
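The Python snippet stops at a waveform in memory. To write it to disk without extra dependencies, something like the following could work. It is a sketch, not part of mlx-speech: `save_wav` is a hypothetical helper, and it assumes a mono, 1-D sequence of floats in [-1.0, 1.0]. Stereo or batched output would need to be interleaved or flattened first.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip first so out-of-range values do not overflow int16.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # assumption: mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
```

With the result from above, a call might look like `save_wav("out.wav", result.waveform.tolist(), result.sample_rate)`, assuming the waveform is one-dimensional.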

$ pip install mlx-speech
$ mlx-speech tts --model fish-s2-pro --text "Hello!" -o out.wav
✓ out.wav
$ mlx-speech asr --model qwen3-asr-1.7b --audio speech.wav
$ mlx-speech tts --list-models

View on GitHub · mlx-speech on PyPI