Local · On-device · Apple Silicon

Speak.
It just lands.

Terminal voice-to-text. Tap Space, speak, tap Space — your words appear in the transcript and on the clipboard before you reach for the mouse.

Qwen3-ASR-1.7B runs in-process on the Apple GPU via MLX — int8, ~2.5 GB resident, warm on every take. Fully local: no cloud, no runtime network calls.

$ uv tool install automaton-tnt

The whole loop

Three taps from thought to text.

No menus, no modes to learn. The model is already loaded and waiting — recording starts the instant you press the key.

01

Tap to record

Press Space to start. Hold it instead to record only while held — a live braille oscilloscope shows real mic levels as you talk.
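Tap and hold can share one key if the decision is deferred until release. Below is a minimal sketch of that idea. It assumes the TUI receives separate press and release events, and both the `RecordKey` class and the 0.35 s threshold are hypothetical, not taken from TNT itself.

```python
HOLD_THRESHOLD_S = 0.35  # hypothetical: presses longer than this count as a hold


class RecordKey:
    """Tap-to-toggle / hold-to-record state machine for one key (illustrative)."""

    def __init__(self) -> None:
        self.recording = False
        self._pressed_at: float | None = None

    def key_down(self, now: float) -> None:
        if self.recording:
            # Second tap: stop and hand the take to the model.
            self.recording = False
            self._pressed_at = None
        else:
            # First press: start immediately so no audio is lost,
            # and classify tap vs hold when the key is released.
            self.recording = True
            self._pressed_at = now

    def key_up(self, now: float) -> None:
        if self._pressed_at is None:
            return  # release of the stopping tap; nothing to do
        held = now - self._pressed_at
        self._pressed_at = None
        if held >= HOLD_THRESHOLD_S:
            # Hold mode: recording lasts only while the key is down.
            self.recording = False
        # Tap mode: keep recording until the next key_down.
```

Starting on press rather than on release is what keeps the first syllable from being clipped.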

02

Tap to transcribe

Press Space again to stop. Audio goes straight to the resident GPU model — a short take returns in a fraction of a second.

03

It's on your clipboard

The transcript appears in the log and is copied to the clipboard automatically. Paste it anywhere, or click any past entry to copy it again. Press Space while a take is still transcribing to cancel it.

Built for the terminal

Fast where it counts. Honest about the rest.

Pure MLX inference on the Apple GPU, a microphone that can always be reclaimed, and a TUI that reshapes itself to your terminal.

In-process GPU inference

Pure MLX: no PyTorch, no CUDA, no subprocess for the model. The int8 weights (~2.5 GB, about half the size of BF16) also decode faster, and they are loaded once during a background warmup so every take runs against a warm model.

MLX · int8 · ~2.5 GB resident · Apple GPU

Live braille oscilloscope

Real audio levels render as a braille waveform while you record, so you always know the mic is hearing you.
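Braille cells make good terminal pixels: each character in the U+2800 block is a 2×4 dot grid, so one cell can show two samples at four levels each. The sketch below renders a single-row level strip. It is a simplified take on the idea, not TNT's renderer, and `braille_bars` is a hypothetical name.

```python
# Braille cells are 2 dots wide x 4 dots tall, starting at U+2800.
# Dot bits for each column, listed bottom row first.
LEFT_BITS = (0x40, 0x04, 0x02, 0x01)   # dots 7, 3, 2, 1
RIGHT_BITS = (0x80, 0x20, 0x10, 0x08)  # dots 8, 6, 5, 4


def _column_bits(level: float, bits: tuple[int, ...]) -> int:
    # Clamp to [0, 1], then light that many dots from the bottom up.
    rows = round(max(0.0, min(1.0, level)) * 4)
    mask = 0
    for b in bits[:rows]:
        mask |= b
    return mask


def braille_bars(levels: list[float]) -> str:
    """Render mic levels in 0..1 as braille bars, two samples per character."""
    if len(levels) % 2:
        levels = levels + [0.0]  # pad so every cell has a right column
    cells = []
    for left, right in zip(levels[0::2], levels[1::2]):
        code = 0x2800 | _column_bits(left, LEFT_BITS) | _column_bits(right, RIGHT_BITS)
        cells.append(chr(code))
    return "".join(cells)
```

A multi-row scope works the same way, stacking several such rows and splitting each level across them.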

🌐

Zero network at runtime

Everything runs on your machine. No cloud round-trips, no telemetry, nothing leaves the laptop.

🎙️

Mic that can't get stuck

Native AVFoundation capture in an isolated Swift helper process. A wedged audio stack? TNT kills the helper and macOS releases the mic.

English, Chinese & mixed

The language is auto-detected, or you can force it with an environment variable so mixed zh/en speech isn't translated away.

Clipboard-first, responsive TUI

New transcriptions auto-copy; click any past entry to copy it again. The layout uses a side-rail on wide terminals and stacks on narrow ones — it fits whatever window you've got.

Get going

Up and running in one command.

Apple Silicon, Python 3.13+, and uv. The bootstrap script pulls the int8 checkpoint and links it; first launch compiles the tiny Swift mic helper and caches it.

Requirements

  • Apple Silicon Mac — M1 or later
  • Python 3.13+
  • uv — the package manager
  • Xcode Command Line Tools: xcode-select --install
  • ~2.5 GB for the int8 model weights
git + uv
$ git clone https://github.com/appautomaton/tnt-asr.git
$ cd tnt-asr
$ uv sync
$ ./bootstrap-mlx-asr.sh      # downloads + links the int8 checkpoint (~2.5 GB)
$ uv run tnt

# or install it as a uv tool and point it at a checkpoint:
$ uv tool install automaton-tnt
$ TNT_MLX_MODEL=/path/to/qwen3-asr-1.7b-int8-mlx tnt

# or symlink the checkpoint instead of setting the env var:
#   ~/.local/share/tnt/qwen3-asr-mlx
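The two ways of pointing TNT at a checkpoint amount to a short lookup. A sketch, assuming `TNT_MLX_MODEL` takes precedence over the symlinked default (the precedence order is an assumption); `resolve_checkpoint` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_LINK = Path("~/.local/share/tnt/qwen3-asr-mlx")


def resolve_checkpoint(env: Mapping[str, str] = os.environ, must_exist: bool = True) -> Path:
    """Use TNT_MLX_MODEL if set, else the default symlink location."""
    override = env.get("TNT_MLX_MODEL")
    path = (Path(override) if override else DEFAULT_LINK).expanduser()
    # Path.exists() follows symlinks, so a dangling link also counts as missing.
    if must_exist and not path.exists():
        raise FileNotFoundError(
            f"no MLX checkpoint at {path}; run ./bootstrap-mlx-asr.sh or set TNT_MLX_MODEL"
        )
    return path
```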

A ready-to-use int8 build is published at appautomaton/qwen3-asr-1.7b-int8-mlx. BF16 and mxfp8 builds work too: mlx-speech reads the quantization settings from the checkpoint config, so switching between them is just a relink.
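"Just a relink" can be made safe against a half-updated link by swapping a temporary symlink into place. A sketch, assuming the default link location shown above and POSIX rename semantics; `relink` is a hypothetical helper, not part of the bootstrap script.

```python
import os
from pathlib import Path


def relink(link: Path, checkpoint: Path) -> None:
    """Point `link` at a different checkpoint directory (e.g. a BF16 or mxfp8 build)."""
    link.parent.mkdir(parents=True, exist_ok=True)
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clear a leftover from an interrupted relink
    tmp.symlink_to(checkpoint.resolve(), target_is_directory=True)
    # os.replace renames over the old link atomically on POSIX, so readers
    # see either the old checkpoint or the new one, never a missing link.
    os.replace(tmp, link)
```

Usage would be something like `relink(Path("~/.local/share/tnt/qwen3-asr-mlx").expanduser(), Path("/path/to/bf16-build"))`, followed by restarting TNT so the warmup loads the new weights.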

Yours alone

Your voice never leaves the laptop.

Inference runs in-process on your Apple GPU. There are no network calls at runtime — not for the model, not for analytics. What you say stays on your machine.

No cloud · No telemetry · No PyTorch / CUDA · MLX only · MIT licensed