Dictation Software for Linux
I’ve heard people say that their typing can’t keep up with their thoughts. I’ve never really felt that way, so I’m not sure if that means I’m a fast typist or just a slow thinker (probably a little of column A, a little of column B). Either way, I’ve never given speech to text (STT) or dictation software much of a shot because I didn’t see much value in it for me personally.
I have been hearing a lot about speech to text lately, though, and how much better it’s gotten since the days of yore, when Dragon was really the only game in town. Intrigued, I set out to see what is available for Linux these days. I expected to find plenty of options, but unfortunately I ran into some snags.
What Didn’t Work
VOXD
VOXD is an open source transcriber that runs from the command line, and you can assign it a hotkey in Linux. Unfortunately, I wasn’t able to get it to work in Fedora 43, because it threw an error message during setup. I might tinker with it some more at a later date to get it working.
Handy
Handy is another open source contender that runs entirely locally. It offers five different transcription models, including Parakeet TDT, which can use NVIDIA GPUs for acceleration. I like the idea behind Handy quite a bit: it’s designed to be an extensible framework for vocal computing, and the developer encourages community involvement in its development and its plugin ecosystem. According to its website, “Handy isn’t trying to be the best speech-to-text app. It’s trying to be the most forkable one.” Unfortunately, when I tried to run it on Fedora, it failed to initialize the Wayland data control protocol clipboard, and I could never get it to work.
Hyprwhspr
hyprwhspr claims to run natively on Fedora/Wayland/GNOME, so I was excited to try it out. Like VOXD and Handy, it’s open source and free. It supports Parakeet as well as Whisper transcription models, and it can also connect over a REST API to OpenAI, Groq, or custom endpoints. hyprwhspr claims it can run on any Linux distribution with systemd. Unfortunately, hyprwhspr didn’t install correctly either, because I could never get the dependencies to install. When I ran the script to install them, it just told me it was going to install them and then did nothing. By this point I was starting to feel pretty hopeless about the state of open source transcription software. I also ran across this thread on the Linux Mint forums from last year, which seemed to indicate that there isn’t much available right now.
What Finally Worked
Whispering (Epicenter)
Whispering is open source dictation software which can use local AI models for STT, or you can bring your own key (BYOK) to use AI models from OpenAI, Anthropic, Groq, Google, ElevenLabs, or Mistral. It also supports OpenRouter. If you’re using local models, none of your data ever leaves your PC, which I think is pretty cool. I downloaded and tested it with Parakeet V3.
The software records your voice, runs the recording through the transcription model, and then copies the text to your system’s clipboard. When you stop recording, it pastes the clipboard contents wherever your cursor is. The downside of this approach is that you can’t see the text appear as you speak.
The transcription with Parakeet is pretty dang good. It did mess up a handful of words, which may . Whispering stores recordings locally on your machine. You can also run text through AI post-processing, which Whispering calls Transformations, to do all kinds of things with the transcribed text, such as formatting meeting minutes in a specific way. This seems like a really powerful feature, but I need to explore it a bit more to see how well it works.
Whispering seems to have recently changed its name to Epicenter, and the developers are introducing what they call an “ecosystem” of apps that work together, starting with some kind of local AI assistant. To be honest, this is something that gives me pause about Whispering. I really like the simplicity and power it provides, and I hate software bloat with a passion. Whatever direction they take, I hope they keep the software modular so I can keep just the bits I like. Either way, this was the only tool I could get working on my Fedora 43 machine, so I stopped here for now.
Conclusion
For now at least, Whispering seems like the only thing I can get to run on my machine. I’m confident that more options will become available in the future, however, as I know speech to text is exploding in popularity as these transcription models have gotten significantly more sophisticated in the past couple of years. I’ll sit patiently and wait for better Fedora support from some of these other tools. In the meantime, Whispering works for my limited needs.
For more information about speech to text on Linux, this blog post on LinuxVox.com is a wealth of information.