Push-to-talk dictation · Linux · 100% on-device

Hold a key.
Speak. It's already typed.

AgentWhisper turns your voice into text on your own computer — typed straight into whatever you're writing, with a copy in your clipboard. No cloud, no account, no internet after setup.

Download for Debian/Ubuntu · View source
v0.3 · free & open source (MIT) · X11 · also installable from source


works fully offline · 0 accounts, API keys or telemetry · ~2s from release to typed text · 59 tests · MIT licensed

How it works

One key. No windows, no buttons, no browser tab.

The whole interaction fits between pressing and releasing a single key — by default F12, but any key you like.

STEP 1

Hold F12

While AgentWhisper runs, the key belongs to it exclusively — no other program sees it, so nothing fires by accident.

STEP 2

Speak

A translucent panel floats at the bottom of your screen, green bars dancing to your voice — you always know when the mic is live.

STEP 3

Release

Your words are typed into the window you were working in and land in your clipboard as a backup. A notification shows what was heard.

Private by architecture

Your voice never leaves this machine.

Speech recognition runs locally through faster-whisper — OpenAI's Whisper model on your own CPU. Nothing to sign up for, nothing phoning home, nothing to subscribe to.

One guided ~140 MB model download on first run, then AgentWhisper works entirely offline. Unplug the router; it won't notice.
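
For the curious, here is a minimal sketch of what local transcription through faster-whisper can look like. It is not taken from AgentWhisper's source: the model name `base.en`, the `int8` compute type and the helper names `join_segments` and `transcribe_file` are assumptions chosen for illustration.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join per-segment text into one line, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_file(path: str, model_name: str = "base.en") -> str:
    """Transcribe a recording on the CPU (hypothetical helper)."""
    # Imported lazily so the rest of the module loads without the package.
    from faster_whisper import WhisperModel

    # device="cpu" keeps inference on this machine; int8 is an assumed
    # speed/accuracy trade-off, not a documented AgentWhisper setting.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, language="en")
    # `segments` is a lazy generator; iterating it runs the actual decoding.
    return join_segments(segment.text for segment in segments)
```

Once the model files are cached on disk, a call like `transcribe_file("dictation.wav")` needs no network access.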

Your microphone: captured only while the key is held
Whisper, on your CPU: transcribed in seconds, models from tiny to medium
Your active window: typed for you + copied to the clipboard
Cloud speech APIs: NO CONNECTION
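
The last hop in that chain, text into the active window plus a clipboard copy, can be sketched with the two tools listed in the install instructions, `xdotool` and `xclip`. How AgentWhisper actually invokes them is an assumption here, and the function names are hypothetical.

```python
import subprocess


def type_command(text: str) -> list[str]:
    """Build an xdotool call that types `text` into the focused window."""
    # "--" ends option parsing so text starting with "-" is typed literally.
    return ["xdotool", "type", "--clearmodifiers", "--", text]


def clipboard_command() -> list[str]:
    """Build an xclip call that reads the new clipboard contents from stdin."""
    return ["xclip", "-selection", "clipboard"]


def deliver(text: str) -> None:
    """Copy first, so the clipboard backup exists even if typing fails."""
    subprocess.run(clipboard_command(), input=text.encode("utf-8"), check=True)
    subprocess.run(type_command(text), check=True)
```

Copying before typing is a design choice in this sketch: if the focused window rejects synthetic key events, the text can still be pasted by hand.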

Features

A small tool, finished properly.

Everything you need for daily dictation and nothing you don't — controlled from a tray icon, a config file, or your terminal.

Lives in your system tray

A microphone icon that turns red while recording. Every control is two clicks away — this is the entire menu:

Scriptable from the terminal

The same controls as the tray, for your keybindings and scripts.

Two recording modes

Hold-to-talk, or tap once to start and once to stop. Switch anytime from the tray.

hold to talk · press to toggle
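
The difference between the two modes comes down to how key-down and key-up events are interpreted. A minimal sketch, assuming hypothetical mode names `"hold"` and `"toggle"` and a hypothetical `Recorder` class:

```python
from dataclasses import dataclass


@dataclass
class Recorder:
    """Tracks whether the mic is live under either recording mode."""

    mode: str = "hold"       # "hold" or "toggle" (hypothetical names)
    recording: bool = False

    def key_down(self) -> None:
        if self.mode == "hold":
            self.recording = True                # live while the key is held
        else:
            self.recording = not self.recording  # each tap flips the state

    def key_up(self) -> None:
        if self.mode == "hold":
            self.recording = False               # release ends the dictation
        # In toggle mode a release changes nothing.
```

A real implementation would also need to ignore keyboard auto-repeat, which otherwise looks like a rapid series of taps.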

Any key you like

F1–F12, Scroll Lock, Pause… reserved exclusively while it runs.


Configured in one commented file

Pick the key, the model, auto-type behavior and recording limits in ~/.config/agentwhisper/config.toml.

A first run that holds your hand

The speech model downloads in the background with live progress in the tray. Dictations made meanwhile are queued, not lost.
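
"Queued, not lost" can be pictured as a small buffer that holds recordings until the model is ready and then drains them in order. This is a sketch of the idea only; the class and method names are hypothetical and the real implementation may differ.

```python
from collections import deque
from typing import Callable


class PendingDictations:
    """Holds recordings made before the model is ready, then drains them."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe
        self._waiting: deque[bytes] = deque()
        self._ready = False
        self.results: list[str] = []

    def submit(self, audio: bytes) -> None:
        if self._ready:
            self.results.append(self._transcribe(audio))
        else:
            self._waiting.append(audio)  # keep it, oldest first

    def model_ready(self) -> None:
        self._ready = True
        while self._waiting:             # drain in recording order
            self.results.append(self._transcribe(self._waiting.popleft()))
```

The `transcribe` callable is passed in, so the buffer can be exercised with a stand-in function before any model exists on disk.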

Install

Two minutes, in or out.

Built for Debian/Ubuntu with X11 and Python 3.11+. Everything lands in predictable places, and uninstall.sh removes it all again.

Full walkthrough and troubleshooting in the installation guide.
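
If you want to check the stated requirements before installing, a short script along these lines can help. It is an unofficial sketch, not part of AgentWhisper: the function name is hypothetical, and it assumes the login manager sets `XDG_SESSION_TYPE`, which is common but not guaranteed.

```python
import os
import shutil
import sys

# notify-send is the command provided by the libnotify-bin package.
REQUIRED_TOOLS = ("xclip", "xdotool", "notify-send")


def missing_prerequisites(env=os.environ, which=shutil.which,
                          version=sys.version_info) -> list[str]:
    """Return human-readable problems; an empty list means all clear."""
    problems = []
    if tuple(version[:2]) < (3, 11):
        problems.append("Python 3.11+ required")
    # Only flag a session that positively reports Wayland; unset is unknown.
    if env.get("XDG_SESSION_TYPE") == "wayland":
        problems.append("Wayland session detected; X11 required")
    problems += [f"missing tool: {t}" for t in REQUIRED_TOOLS if which(t) is None]
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```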

# system packages for tray, typing, clipboard (most preinstalled on XFCE)
sudo apt install python3-gi python3-gi-cairo \
     gir1.2-ayatanaappindicator3-0.1 xclip xdotool libnotify-bin

# grab the .deb from the latest release, then:
sudo apt install ./agentwhisper_0.3.7_all.deb

Then start AgentWhisper from your applications menu — the mic appears in your tray.

Roadmap

The name is the roadmap.

v0.3 does everything on the box today. Where it goes next:

shipped · v0.1 – v0.3

Core dictation

Exclusive hotkey, local transcription, auto-type, tray menu, voice visualizer, autostart, guided model download, .deb package.

later

More languages & Wayland

v0.3 ships English models; the engine underneath already speaks 90+ languages.

later · the namesake

Agent mode

Push-to-talk straight into your AI coding agent: speak to your computer, watch it work.