AgentWhisper — push-to-talk dictation for Linux

How it works

One key. No windows, no buttons, no browser tab.

The whole interaction fits between pressing and releasing a single key — by default F12, but any key you like.

STEP 1

Hold `F12`

While AgentWhisper runs, the key belongs to it exclusively — no other program sees it, so nothing fires by accident.

STEP 2

Speak

A translucent panel floats at the bottom of your screen, green bars dancing to your voice — you always know when the mic is live.

STEP 3

Release

Your words are typed into the window you were working in and land in your clipboard as a backup. A notification shows what was heard.
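The release step can be sketched with the two command-line tools the install section lists, xdotool and xclip. This is a minimal illustration under stated assumptions, not AgentWhisper's actual code: the function names are hypothetical, and the real program may deliver text differently.

```python
import shutil
import subprocess


def build_type_command(text: str) -> list[str]:
    """Command that asks xdotool to type `text` into the focused window."""
    # "--" ends option parsing, so dictated text starting with "-" is safe.
    return ["xdotool", "type", "--clearmodifiers", "--", text]


def build_clipboard_command() -> list[str]:
    """Command that makes xclip read stdin into the CLIPBOARD selection."""
    return ["xclip", "-selection", "clipboard"]


def deliver(text: str, auto_type: bool = True) -> None:
    """Copy the transcript to the clipboard, then optionally type it.

    Hypothetical helper: clipboard first, so the text survives even if
    typing into the window fails.
    """
    if shutil.which("xclip"):
        subprocess.run(build_clipboard_command(), input=text.encode(), check=False)
    if auto_type and shutil.which("xdotool"):
        subprocess.run(build_type_command(text), check=False)
```

Copying before typing is a design choice in this sketch: if the target window rejects synthetic key events, the transcript can still be pasted by hand.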

Private by architecture

Your voice never leaves this machine.

Speech recognition runs locally through faster-whisper, a CTranslate2-based reimplementation of OpenAI's Whisper model, on your own CPU. Nothing to sign up for, nothing phoning home, nothing to subscribe to.

One guided ~140 MB model download on first run, then AgentWhisper works entirely offline. Unplug the router; it won't notice.
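For the curious, local transcription with faster-whisper looks roughly like the sketch below. It uses the library's public WhisperModel class; the int8 compute type, the helper names and the file-based input are assumptions made for illustration, and AgentWhisper's own settings may differ.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join per-segment text into one line, dropping stray whitespace."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_file(path: str, model_name: str = "base.en") -> str:
    """Transcribe an audio file on the CPU with faster-whisper."""
    # Imported lazily so join_segments works without the package installed.
    from faster_whisper import WhisperModel

    # int8 quantisation keeps CPU inference light. This is an assumption,
    # not necessarily what AgentWhisper itself uses.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus audio info.
    segments, _info = model.transcribe(path)
    return join_segments(segment.text for segment in segments)
```

Once the model files are cached on disk, nothing in this path needs a network connection.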

Your microphone: captured only while the key is held

Whisper, on your CPU: transcribed in seconds, models from tiny to medium

Your active window: typed for you + copied to the clipboard

Cloud speech APIs: no connection

Features

A small tool, finished properly.

Everything you need for daily dictation and nothing you don't — controlled from a tray icon, a config file, or your terminal.

Lives in your system tray

A microphone icon that turns red while recording. Every control is two clicks away — this is the entire menu:

Ready — hold F12 and speak

✓ Enabled

✓ Auto-Type into active window

✓ Notifications

Start at login

Recording Mode: Hold to talk ▸

Quit AgentWhisper

Scriptable from the terminal

The same controls as the tray, for your keybindings and scripts.

$ agentwhisper status

daemon: running · engine: ready · mode: hold · key: f12

$ agentwhisper mode toggle # tap to start, tap to stop

$ agentwhisper autostart on # start with your session
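In a script, the status line is easy to turn into a dictionary. The sketch below assumes the output keeps the "name: value · name: value" shape shown above; that format is taken from the example only and may change between versions. Both function names are hypothetical.

```python
import subprocess


def parse_status(line: str) -> dict[str, str]:
    """Split 'daemon: running · engine: ready' into {'daemon': 'running', ...}."""
    fields: dict[str, str] = {}
    for part in line.split("·"):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that carry no "name: value" pair
            fields[key.strip()] = value.strip()
    return fields


def read_status() -> dict[str, str]:
    """Run `agentwhisper status` and parse its output."""
    result = subprocess.run(
        ["agentwhisper", "status"], capture_output=True, text=True, check=True
    )
    return parse_status(result.stdout.strip())
```

A keybinding script could then check, for instance, that read_status().get("engine") equals "ready" before switching modes.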

Two recording modes

Hold-to-talk, or tap once to start and once to stop. Switch anytime from the tray.

hold to talk · press to toggle
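The difference between the two modes comes down to how key events map to the recording state. A minimal sketch, with hypothetical class names, and assuming keyboard auto-repeat events have already been filtered out:

```python
from enum import Enum


class Mode(Enum):
    HOLD = "hold"
    TOGGLE = "toggle"


class Recorder:
    """Tracks whether the microphone should be live (illustrative only)."""

    def __init__(self, mode: Mode = Mode.HOLD) -> None:
        self.mode = mode
        self.recording = False

    def on_key_press(self) -> None:
        if self.mode is Mode.HOLD:
            self.recording = True  # live for as long as the key is down
        else:
            self.recording = not self.recording  # each tap flips the state

    def on_key_release(self) -> None:
        if self.mode is Mode.HOLD:
            self.recording = False  # releasing ends the dictation
        # In toggle mode a release changes nothing.
```

In hold mode the release event ends the recording; in toggle mode only the next press does.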

Any key you like

F1–F12, Scroll Lock, Pause… The key you choose is reserved exclusively while AgentWhisper runs.

Configured in one commented file

Pick the key, the model, auto-type behavior and recording limits in ~/.config/agentwhisper/config.toml.

# The push-to-talk key

key = "f12"

# tiny.en (fastest) … medium.en (most accurate)

model = "base.en"

auto_type = true

A first run that holds your hand

The speech model downloads in the background with live progress in the tray. Dictations made meanwhile are queued, not lost.

whisper base.en: 62% · 87 MB / 140 MB

Install

Two minutes, in or out.

Built for Debian/Ubuntu with X11 and Python 3.11+. Everything lands in predictable places, and uninstall.sh removes it all again.

Full walkthrough and troubleshooting in the installation guide.

# system packages for tray, typing, clipboard (most preinstalled on XFCE)
sudo apt install python3-gi python3-gi-cairo \
     gir1.2-ayatanaappindicator3-0.1 xclip xdotool libnotify-bin

# grab the .deb from the latest release, then:
sudo apt install ./agentwhisper_0.3.7_all.deb

# no root needed — installs into your home directory
git clone https://github.com/ChrisSchroedinger/agentwhisper.git
cd agentwhisper
./install.sh

Then start AgentWhisper from your applications menu — the mic appears in your tray.

Roadmap

The name is the roadmap.

v0.3 does everything on the box today. Where it goes next:

shipped · v0.1 – v0.3

Core dictation

Exclusive hotkey, local transcription, auto-type, tray menu, voice visualizer, autostart, guided model download, .deb package.

AppImage

One file that runs on any distro — no install script, no packaging dance.

later

More languages & Wayland

v0.3 ships English models; the engine underneath already speaks 90+ languages.

later · the namesake

Agent mode

Push-to-talk straight into your AI coding agent: speak to your computer, watch it work.