TNT — terminal voice-to-text for Apple Silicon

The whole loop

Three taps from thought to text.

No menus, no modes to learn. The model is already loaded and waiting — recording starts the instant you press the key.

Tap to record

Press Space to start recording, or hold it down to record only while it's held. A live braille oscilloscope shows real mic levels as you talk.

Tap to transcribe

Press Space again to stop. Audio goes straight to the resident GPU model — a short take returns in a fraction of a second.

It's on your clipboard

The transcript appears in the log and is copied automatically, so you can paste it anywhere. Click any past entry to copy it again. To cancel a transcription that is still running, press Space before it finishes.

Built for the terminal

Fast where it counts. Honest about the rest.

Pure MLX inference on the Apple GPU, a microphone that can always be reclaimed, and a TUI that reshapes itself to your terminal.

⚡

In-process GPU inference

Pure MLX: no PyTorch, no CUDA, no subprocess for the model. Weights are int8 (~2.5 GB, about half the size of BF16) and decode faster. They load once during a background warmup, so every take runs on a warm model.

MLX · int8 · ~2.5 GB resident · Apple GPU

⠿

Live braille oscilloscope

Real audio levels render as a braille waveform while you record, so you always know the mic is hearing you.

🌐

Zero network at runtime

Everything runs on your machine. No cloud round-trips, no telemetry, nothing leaves the laptop.

🎙️

Mic that can't get stuck

Native AVFoundation capture in an isolated Swift helper process. A wedged audio stack? TNT kills the helper and macOS releases the mic.

语

English, Chinese & mixed

The language is auto-detected, or you can force it with an environment variable so that mixed zh/en speech is transcribed as spoken instead of being translated into one language.

▦

Clipboard-first, responsive TUI

New transcriptions auto-copy; click any past entry to copy it again. The layout uses a side-rail on wide terminals and stacks on narrow ones — it fits whatever window you've got.

Get going

Up and running in one command.

Apple Silicon, Python 3.13+, and uv. The bootstrap script pulls the int8 checkpoint and links it; first launch compiles the tiny Swift mic helper and caches it.

Requirements

▸Apple Silicon Mac — M1 or later
▸Python 3.13+
▸uv — the package manager
▸Xcode CLT — xcode-select --install
▸~2.5 GB for the int8 model weights
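If you want to check the requirements above before cloning, a small preflight script can do it. This is a hypothetical sketch, not something that ships with TNT: the `check`, `is_apple_silicon`, `has_python_313`, and `has_free_space` helpers are made up here, and it assumes `sysctl hw.optional.arm64`, `uv python find`, and `xcode-select -p` behave as they do on current macOS. A "missing" Python line is not fatal, because uv can normally download a matching interpreter during `uv sync`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for TNT's listed requirements (not part of TNT).

# Print "ok" or "missing" for a label, depending on whether the command succeeds.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf '%-9s%s\n' "ok" "$label"
  else
    printf '%-9s%s\n' "missing" "$label"
  fi
}

# hw.optional.arm64 is 1 on Apple Silicon, even inside a Rosetta shell.
is_apple_silicon() {
  [ "$(sysctl -n hw.optional.arm64 2>/dev/null)" = "1" ]
}

# Ask uv for an installed interpreter matching the version specifier.
has_python_313() {
  uv python find '>=3.13'
}

# About 2.5 GB for the int8 weights; df -Pk reports available 1024-byte blocks.
has_free_space() {
  local avail_kb
  avail_kb="$(df -Pk "$HOME" | awk 'NR == 2 { print $4 }')"
  [ "${avail_kb:-0}" -ge $((2500 * 1024)) ]
}

check "Apple Silicon (M1 or later)" is_apple_silicon
check "uv on PATH" command -v uv
check "Python 3.13+ (via uv)" has_python_313
check "Xcode Command Line Tools" xcode-select -p
check "~2.5 GB free disk" has_free_space
```

Each line of output starts with `ok` or `missing`, so anything flagged as missing is the item to install before running the bootstrap.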

git + uv

$ git clone https://github.com/appautomaton/tnt-asr.git
$ cd tnt-asr
$ uv sync
$ ./bootstrap-mlx-asr.sh      # downloads + links the int8 checkpoint (~2.5 GB)
$ uv run tnt

uv tool

$ uv tool install automaton-tnt
$ TNT_MLX_MODEL=/path/to/qwen3-asr-1.7b-int8-mlx tnt

# or symlink the checkpoint instead of setting the env var:
#   ~/.local/share/tnt/qwen3-asr-mlx

A ready-to-use int8 build is published at appautomaton/qwen3-asr-1.7b-int8-mlx. BF16 and mxfp8 builds also work: mlx-speech reads the quantization settings from the checkpoint config, so switching between builds only requires relinking the checkpoint.
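Switching between the int8, BF16, and mxfp8 builds comes down to repointing the checkpoint symlink. A minimal sketch, assuming the default link location from the install notes (`~/.local/share/tnt/qwen3-asr-mlx`). The `relink_checkpoint` function and its optional second argument are hypothetical, and the target directories stand in for wherever you downloaded each build.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TNT): repoint the checkpoint symlink.
# Usage: relink_checkpoint TARGET_DIR [LINK_PATH]
# LINK_PATH defaults to the symlink location from the install notes.

relink_checkpoint() {
  local target="$1"
  local link="${2:-$HOME/.local/share/tnt/qwen3-asr-mlx}"

  # Refuse to create a dangling link.
  if [ ! -d "$target" ]; then
    echo "no checkpoint directory at: $target" >&2
    return 1
  fi

  mkdir -p "$(dirname "$link")"
  # -s symbolic, -f replace an existing link, -n do not follow an existing
  # link to a directory (otherwise the new link lands inside the old checkpoint).
  ln -sfn "$target" "$link"
  echo "$link -> $(readlink "$link")"
}
```

For example, `relink_checkpoint /path/to/qwen3-asr-1.7b-int8-mlx` points TNT back at the int8 build. Since the weights are loaded once at startup, restart `tnt` after relinking.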

Yours alone

Your voice never leaves the laptop.

Inference runs in-process on your Apple GPU. There are no network calls at runtime — not for the model, not for analytics. What you say stays on your machine.

No cloud · No telemetry · No PyTorch / CUDA · MLX only · MIT licensed