May 16, 2026
Why I Built a Dictation App That Doesn’t Touch the Internet
In early 2025, I sat down to write a PR description and felt a familiar ache in my forearms. I had been typing for fifteen years, and my body was telling me to stop. I tried every cloud dictation app on the market. They were good — genuinely good — but every time I pressed the hotkey and started speaking, a thought looped in the back of my head: someone else can hear this. This is the story of what I did about it.
The moment that started it
It was a Tuesday. I was dictating a PR description through Wispr Flow — explaining a database migration, the reasoning behind a schema change, what reviewers should watch out for. The text came out beautifully. The formatting was perfect. And then I stopped mid-sentence because I realized I was describing my company's internal architecture to a server I knew nothing about.
I am not paranoid about technology. I use cloud services every day. I am not the person who puts tape over their webcam. But there was something about voice — about the rawness and intimacy of audio — that made cloud processing feel different from sending text to a server. Text is discrete and intentional. Audio is ambient and revealing. It carries your mood, your accent, your environment, the conversation happening in the next room. I did not want that signal leaving my machine.
So I did what any developer with more curiosity than sense would do: I opened a terminal and typed pip install openai-whisper.
The prototype that should not have worked
The first prototype was embarrassing. A Python script that recorded audio while you held a key, saved it to a temp file, ran it through Whisper, and pasted the result. It took about 4 seconds to transcribe a 5-second utterance. The latency was terrible. The accuracy was worse — I was using the tiny model, and it mangled anything longer than a sentence. But when it worked, even briefly, something clicked.
The text appeared. The audio file was deleted. Nothing had left my machine. The whole transaction, voice to text, had happened on my MacBook with WiFi turned off. I had used dictation without sending my voice to anyone. It felt like a superpower.
That feeling is what kept me going through the next year. Not the technology — the feeling of ownership. Of knowing that my words were mine, that the processing happened on my terms, that no server log recorded what I said. The prototype was slow and inaccurate, but it proved something important: on-device dictation was possible. The rest was engineering.
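For the curious, the shape of such a prototype can be sketched in a few lines. This is an illustrative reconstruction, not the original script. The names dictate_once, write_wav, and whisper_transcribe are hypothetical, and microphone capture and hotkey handling are left out, so the function takes already-recorded 16-bit mono PCM bytes. The only Whisper calls used are load_model and transcribe from the openai-whisper package, which downloads the model weights once on first use and then runs from local disk.

```python
import os
import tempfile
import wave
from typing import Callable


def write_wav(path: str, pcm: bytes, sample_rate: int = 16000) -> None:
    """Write raw 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


def whisper_transcribe(path: str) -> str:
    """Transcribe a file with openai-whisper (imported lazily).

    Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    """
    import whisper

    model = whisper.load_model("tiny")  # the small, fast, least accurate model
    return model.transcribe(path)["text"].strip()


def dictate_once(
    pcm: bytes,
    transcribe: Callable[[str], str] = whisper_transcribe,
    paste: Callable[[str], None] = print,
) -> str:
    """Save audio to a temp file, transcribe it, delete the file, paste the text."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        write_wav(path, pcm)
        text = transcribe(path)
    finally:
        # The recording is removed even if transcription fails.
        os.remove(path)
    paste(text)
    return text
```

Passing the transcriber and the paste step in as functions keeps the sketch easy to try without a microphone: swap in a stub for transcribe and a list's append for paste, and the save, transcribe, delete, paste sequence can be checked on its own.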
Why “just use a cloud app” was not the answer
When I told people I was building an on-device dictation app, the most common response was: “Why? Wispr Flow exists.” It was a fair question. Wispr Flow and Aqua Voice are excellent products. They solved the dictation problem for most people. Why build another one?
The answer, which took me months to articulate clearly, is this: cloud dictation and on-device dictation are not two versions of the same product. They are different products for different people. The difference is not features or pricing — it is architecture. A cloud dictation tool is a service. It processes your voice on someone else's computer. An on-device dictation tool is a utility. It processes your voice on your computer. The distinction matters for anyone who:
- Works with confidential or proprietary information
- Is subject to regulatory or professional confidentiality requirements (HIPAA, legal privilege)
- Works in environments without reliable internet
- Does not want a subscription for a tool that should be a utility
- Simply believes their voice should stay on their machine
That last group is larger than most people think. It includes journalists protecting sources, lawyers guarding privilege, therapists bound by confidentiality, developers at companies with strict data policies, and a growing number of people who have watched the tech industry's relationship with user data and decided they want less of it.
The decision to make it a real product
For the first few months, Rewisper was a tool I built for myself. I used it every day. I dictated PR descriptions, Slack messages, emails, and documentation. It replaced about 70% of my typing. My RSI symptoms receded. I was productive again, and I was doing it without sending my voice to a server.
But I kept running into the same conversation. I would mention that I used dictation, and another developer would say “I wish I could do that, but I can't send our internal stuff through a cloud service.” Or a lawyer would say “I would love to dictate my briefs, but I cannot risk the privilege issue.” Or a journalist would say “I cannot have my source material on someone else's server.”
Every one of those conversations was a signal. There was a significant group of people who wanted dictation but could not use the existing options, for structural reasons rather than out of preference. They did not prefer privacy; they required it. And nobody was building for them.
I decided to be the one who did.
What I believe that most dictation companies do not
Most dictation companies believe accuracy is the only thing that matters. If they can get word error rate lower than the competition, they win. This is why cloud dictation dominates — throwing a massive language model at the problem produces better accuracy than a local model ever can. It is a competition local models cannot win on raw accuracy alone.
But I believe accuracy is not the only axis that matters. Privacy is not a feature you add to a cloud product — it is an architectural property. You cannot bolt privacy onto a service that receives raw audio. The only way to guarantee privacy is to never receive the audio in the first place. That is the bet Rewisper makes: that enough people care about where their voice goes that they will accept slightly lower accuracy in exchange for knowing it never leaves their machine.
I also believe dictation should be a one-time purchase, not a subscription. Dictation is a utility — like a text editor or a terminal. It does something specific and does not need ongoing server costs to function. Charging monthly for a tool that runs entirely on your device is a business model decision, not a technical necessity. I wanted Rewisper to be something you buy once and own.
What “doesn't touch the internet” actually means
I want to be precise about this, because “offline” is often used loosely. Rewisper does not make network requests during dictation. The speech-to-text model is loaded from your local disk. Inference runs on your Mac's CPU and Neural Engine. Audio is processed in memory and discarded. The transcribed text is inserted directly into your active application. At no point does any audio or text leave your machine through Rewisper.
This is not a privacy policy. It is a system architecture. You can verify it with a packet sniffer. You can turn off your WiFi and watch it work. The guarantee is not contractual — it is physical. There is no server, so there is nothing to trust.
I built Rewisper this way because I wanted a dictation tool I could use for everything: not just casual messages, but internal design docs, confidential bug reports, private conversations. A tool where I never had to stop mid-sentence and think “should I be saying this?” That kind of tool cannot have a server.
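If you are building something similar in Python, the "no network requests during dictation" property can also be asserted from inside the process. The sketch below is an assumption-laden illustration, not how Rewisper is implemented: no_network and NetworkAccessError are hypothetical names, and the guard only catches sockets opened through Python's socket module. Native code can bypass it, so a packet sniffer remains the real test.

```python
import socket
from contextlib import contextmanager


class NetworkAccessError(RuntimeError):
    """Raised when code tries to open a socket inside a no_network() block."""


@contextmanager
def no_network():
    """Temporarily make every Python-level socket creation fail loudly."""
    original_socket = socket.socket

    def blocked(*args, **kwargs):
        raise NetworkAccessError("socket created while network access was forbidden")

    socket.socket = blocked
    try:
        yield
    finally:
        # Always restore the real socket class, even if the body raised.
        socket.socket = original_socket
```

Wrapping the transcription call in `with no_network():` turns any accidental network access in Python code into an immediate error instead of a silent request.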
The people who get it immediately
One of the most satisfying parts of building Rewisper has been watching who becomes an evangelist. It is not the general consumer who tries ten apps and picks the one with the best UI. It is people with a specific, non-negotiable requirement for privacy:
- The security engineer who runs Wireshark on new software before installing it, and who told me Rewisper was the first dictation app to pass.
- The defense attorney who had been handwriting case notes because no cloud dictation service would sign a BAA.
- The journalist working on an investigation who could not risk their source material existing on a third-party server.
- The indie developer who just did not want another subscription and appreciated that the app works forever without phoning home.
These are not niche audiences. They are people with real needs that the existing market simply ignored, because a business model built on subscriptions and cloud processing is more attractive than selling software once. I understand why companies make that choice. I just made a different one.
What I would tell myself two years ago
If you are a developer thinking about building a local-first alternative to a cloud service, here is what I know now that I did not know when I started:
- The market is larger than you think. For every person who tweets about privacy, there are a hundred who quietly choose products based on it. They do not argue about it online. They just buy the thing that respects them.
- Accuracy does not need to beat the cloud. It needs to be good enough that the privacy trade-off feels worth it. That threshold is lower than most engineers assume.
- Local-first is a moat. Cloud dictation companies compete on features and pricing. On-device dictation competes on architecture. You cannot add “runs locally” to a cloud product any more than you can add “privacy” to a surveillance business model. It is either in the foundation or it is not.
- Users can tell the difference. They may not run Wireshark, but they understand the difference between “we promise not to look at your data” and “we never receive your data.” That distinction is the core of Rewisper's appeal.
The bottom line
I built Rewisper because I wanted a dictation tool I could use without thinking about where my voice was going. I wanted to dictate a PR description without wondering who might read it. I wanted to write an email without my audio passing through a data center in a different legal jurisdiction. I wanted a tool that worked on a plane, in a coffee shop with bad WiFi, and five years from now regardless of what happens to the company behind it.
That tool did not exist, so I built it. It runs on my Mac. It runs on thousands of other people's Macs now. It does not touch the internet, because it does not need to.
If you need dictation and you cannot — or will not — send your voice to the cloud, Rewisper was built for you. It was built for me first. I think you will like it.