May 16, 2026
Where Does Your Voice Go When You Use Wispr Flow or Aqua Voice?
Every time you press that dictation hotkey, your voice leaves your Mac and travels somewhere. This post is a factual, sourced look at what Wispr Flow and Aqua Voice say they do with your audio — their Terms of Service, privacy policies, server locations, and retention practices. No speculation, no fear-mongering. Just the publicly documented facts, laid out so you can decide for yourself.
Why this matters
Cloud dictation is fundamentally different from typing. When you type, the server sees text — discrete, intentional, editable. When you dictate, the server receives raw audio. Audio contains more than words: your voiceprint, your accent, your emotional state, background conversations, ambient sounds that reveal where you are. It is a richer signal than text, and that richness is exactly why people want to know where it goes.
Both Wispr Flow and Aqua Voice have published privacy policies and terms of service. What follows is drawn directly from those documents, accessed May 2026.
Wispr Flow
What they collect
Wispr Flow's privacy policy states that they collect the audio you dictate, which is transmitted to their servers for processing. They also collect standard metadata — device information, app version, and usage analytics.
How it's processed
Audio is sent to Wispr's servers, where it is processed by speech-to-text models and large language models for formatting and contextual improvement. The processing pipeline involves both ASR (automatic speech recognition) and LLM-based text refinement. This is what gives Wispr Flow its formatting quality — the models understand context, add punctuation, capitalize appropriately, and structure paragraphs.
Server infrastructure
Wispr Flow uses cloud infrastructure for processing, and their privacy policy references standard cloud service providers. In practice this likely means audio routes through US-based data centers, but the company does not publicly document a complete server map or offer region-locked processing options.
Data retention
Wispr states that they do not permanently store your audio recordings. Audio is processed in-memory during transcription and discarded afterward. However, their policy notes that transcribed text and associated metadata may be retained for service improvement purposes unless you opt out. The opt-out exists, but it is not applied by default, so retention continues until you change the setting yourself.
Third-party sharing
Wispr's policy states they do not sell your data. They may share data with service providers (cloud infrastructure, analytics) who are contractually bound to data processing agreements. These are standard subprocessor relationships, not data sales.
Key takeaway
Wispr Flow processes audio on remote servers and discards it after transcription. Transcribed text may be retained for service improvement unless you opt out. Your voice data leaves your machine and crosses the network.
Aqua Voice
What they collect
Aqua Voice collects audio input, transcribed text, and usage data. Their privacy policy is more explicit than most about the fact that audio is transmitted to their servers — it is the core mechanism of the product, not a side effect.
How it's processed
Aqua Voice uses proprietary models fine-tuned for long-form dictation. Audio is streamed to their servers, transcribed, and returned as text. The product is designed for extended dictation sessions, which means the audio stream is continuous rather than chunked — your voice is streaming to their infrastructure for the duration of your dictation session.
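To make the streaming-versus-chunked distinction concrete, here is a minimal sketch of the two upload patterns. It is a generic illustration, not Aqua Voice's actual protocol: the `send` callback, the frame sizes, and both function names are assumptions made up for this example.

```python
from typing import Callable, Iterable, List

Frame = bytes  # one short slice of captured audio (hypothetical format)


def send_chunked(frames: Iterable[Frame],
                 send: Callable[[bytes], None],
                 frames_per_chunk: int) -> None:
    """Buffer frames on the device and upload them one chunk at a time."""
    buffer: List[Frame] = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == frames_per_chunk:
            send(b"".join(buffer))
            buffer.clear()
    if buffer:  # upload whatever is left at the end of the session
        send(b"".join(buffer))


def send_streaming(frames: Iterable[Frame],
                   send: Callable[[bytes], None]) -> None:
    """Upload every frame as soon as it is captured."""
    for frame in frames:
        send(frame)


# Simulate a session of ten 4-byte frames; the lists stand in for a server.
frames = [bytes([i]) * 4 for i in range(10)]
chunked_uploads: List[bytes] = []
streamed_uploads: List[bytes] = []

send_chunked(frames, chunked_uploads.append, frames_per_chunk=4)
send_streaming(frames, streamed_uploads.append)

print(len(chunked_uploads), len(streamed_uploads))
```

In both patterns the server ends up with the same bytes. What changes is timing: with streaming, audio is in transit for as long as you keep talking.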
Server infrastructure
Aqua Voice runs on cloud infrastructure, predominantly in the United States. Like Wispr, they do not offer region-specific processing or data residency options. If you are subject to data protection or residency requirements (GDPR's rules on cross-border transfers in the EU, for example), this is a relevant consideration: your audio crosses into US jurisdiction during processing.
Data retention
Aqua Voice states that audio is processed in real time and not stored permanently. Transcribed text may be retained for model improvement. Their policy includes language about using “anonymized and aggregated” data for training, which is standard industry practice but worth noting — anonymization of voice data is a complex technical problem, and different companies define it differently.
Third-party sharing
Aqua Voice's policy states they do not sell personal data. They list standard subprocessor relationships for infrastructure and analytics. Their policy is comparable to Wispr Flow's in this regard.
Key takeaway
Aqua Voice streams your audio continuously during dictation sessions. Audio is not permanently stored, but transcribed text may be kept for training. Infrastructure is predominantly US-based, with no data residency options.
The questions their ToS don't answer
Reading these policies side by side, several questions remain unanswered by either service:
- Who has access? Neither policy specifies which employees or contractors can access audio data, under what circumstances, or with what auditing. The policies authorize access for “service improvement” but do not describe access controls.
- What happens during a security incident? Neither company has published a breach disclosure policy specific to voice data. If their infrastructure is compromised, it is unclear what voice data an attacker could access.
- Can law enforcement request your audio? Both policies contain standard legal compliance clauses — they will disclose data if required by law. But neither addresses whether audio specifically (as opposed to account metadata) is subject to such requests, or whether they have ever received one.
- What is “anonymized” data? Voice data is inherently identifying. Your voiceprint is a biometric identifier. When a company says they anonymize voice data for training, it is not clear what that process entails or whether re-identification is possible.
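The anonymization question is easier to see with a toy example. Speaker-recognition systems commonly reduce a voice to a numeric vector (an embedding) and compare vectors by cosine similarity. The sketch below assumes that setup. The three-number vectors and speaker labels are invented for illustration (real embeddings have far more dimensions), and nothing here describes what either company actually does.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(sample: List[float], enrolled: Dict[str, List[float]]) -> str:
    """Return the enrolled speaker whose voiceprint is closest to the sample."""
    return max(enrolled, key=lambda name: cosine_similarity(sample, enrolled[name]))


# Made-up voiceprints for three known speakers.
enrolled = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.1, 0.8, 0.5],
    "speaker_c": [0.4, 0.4, 0.8],
}

# An "anonymized" recording: account ID removed, voice characteristics intact.
anonymous_sample = [0.85, 0.15, 0.35]

match = best_match(anonymous_sample, enrolled)
print(match)  # speaker_a
```

Removing the account ID did not remove the identifier, because the identifier is the voice itself. Whether re-identification is practical against a given dataset depends on details the policies do not disclose.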
How on-device dictation is different
On-device dictation tools like Rewisper and MacWhisper process your voice entirely on your Mac. The audio never leaves your machine. The transcription runs locally; the cleanup pass runs locally too. The result lands in your clipboard — or, optionally, at your cursor — with no server in between.
This architectural difference means that questions about data retention, server location, employee access, and law enforcement requests simply do not apply. The privacy guarantee is structural, not contractual — there is no server to subpoena because the data was never collected.
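One way to picture "structural, not contractual" is to model each pipeline as a list of stages tagged with where they run. This is a simplified sketch: the stage names paraphrase the descriptions in this post, while the `Stage` type and the `stages_off_device` helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    runs_on: str  # either "device" or "server"


def stages_off_device(pipeline: List[Stage]) -> List[str]:
    """List the stages where your data is handled somewhere other than your Mac."""
    return [stage.name for stage in pipeline if stage.runs_on != "device"]


cloud_pipeline = [
    Stage("capture audio", "device"),
    Stage("speech recognition", "server"),
    Stage("text cleanup", "server"),
    Stage("insert text", "device"),
]

local_pipeline = [
    Stage("capture audio", "device"),
    Stage("speech recognition", "device"),
    Stage("text cleanup", "device"),
    Stage("copy to clipboard", "device"),
]

print(stages_off_device(cloud_pipeline))  # ['speech recognition', 'text cleanup']
print(stages_off_device(local_pipeline))  # []
```

For the cloud pipeline, every stage in the first list is governed by a policy you have to trust. For the local pipeline the list is empty, so there is nothing for a policy to govern.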
The trade-off
On-device dictation runs into a hardware ceiling. Local cleanup covers most contextual rewrites well, but a frontier cloud model still has the edge on the most ambitious transformations. The choice is not about which is “better” — it is about which trade-off matches your requirements: maximum accuracy with cloud processing, or maximum privacy with local processing.
The bottom line
Wispr Flow and Aqua Voice are transparent about the fact that they process your voice in the cloud. Their privacy policies are publicly available, reasonably clear, and consistent with standard SaaS practices. Both state they do not permanently store your audio. Both state they do not sell your data.
The open questions are not about dishonesty — they are about inherent structural risks of any cloud service: access controls, breach exposure, legal compulsion, and the limits of anonymization for biometric data.
If those risks are acceptable given the accuracy you need, Wispr Flow and Aqua Voice are strong cloud dictation options. If they are not acceptable, whether because of your industry, your jurisdiction, or your personal standards, on-device dictation exists and works well.
What matters is that you know the difference and can make an informed choice. That is what this post is for.