May 16, 2026

What Your Dictation App Knows About You

Most people think about dictation privacy in terms of words: what you said, and whether the app stores it. But raw audio encodes far more than the words you speak. Your voice carries signals about your identity, your health, your emotional state, your environment, and even your location. This post is about what those signals are — and why the architecture of your dictation tool matters more than its privacy policy.

What audio reveals beyond words

When you speak into a dictation app, you are not just transmitting text that happens to be encoded as sound waves. You are transmitting a rich biometric signal. Here is what a raw audio recording contains, whether you intend it or not:

Your voiceprint

Your voice is as individually identifying as your fingerprint. Voiceprints are used by banks for authentication, by law enforcement for identification, and by advertising networks for cross-device tracking. A dictation app that receives your raw audio has everything it needs to build or match a voiceprint, whether it tells you it does or not.
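To make "build or match a voiceprint" concrete: speaker-verification systems commonly reduce a recording to a fixed-length embedding vector and compare embeddings with cosine similarity. The minimal sketch below assumes the embeddings already exist. A real system would compute them with a trained speaker model, which is not shown. The 0.75 threshold and the toy vectors are illustrative assumptions, not values from any product.

```python
import math
from typing import Sequence

# Illustrative threshold (assumption); real systems tune this on labeled data.
MATCH_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled: Sequence[float], candidate: Sequence[float],
                    threshold: float = MATCH_THRESHOLD) -> bool:
    """Hypothetical verification step: compare a stored voiceprint to a new clip's embedding."""
    return cosine_similarity(enrolled, candidate) >= threshold


# Toy 4-dimensional embeddings; real ones typically have a few hundred dimensions.
stored_voiceprint = [0.9, 0.1, 0.3, 0.2]
new_dictation_clip = [0.85, 0.15, 0.28, 0.25]
someone_else = [0.1, 0.9, 0.2, 0.7]

print(is_same_speaker(stored_voiceprint, new_dictation_clip))  # True
print(is_same_speaker(stored_voiceprint, someone_else))        # False
```

Once one voiceprint is stored, every later clip from the same person can be scored against it. Nothing in this check depends on what was said.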

Your emotional state

Prosody — the rhythm, pitch, and tone of your speech — is a reliable signal of emotional state. Stressed people speak faster and at a higher pitch. Depressed people speak with reduced pitch variation and slower tempo. Modern ML models can detect these patterns with accuracy that rivals clinical assessments. Your dictation app may not be doing this today, but the raw material is in every audio stream.
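Pitch can be read directly from the raw samples. The minimal sketch below estimates a frame's fundamental frequency with plain autocorrelation, then summarizes "pitch variation" as the standard deviation of per-frame estimates. It uses synthetic sine tones in place of real speech. The 16 kHz sample rate and the 60 to 400 Hz search range are assumptions chosen for illustration. Production pitch trackers are considerably more robust than this.

```python
import math
import statistics
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Hz; a common rate for speech audio (assumed here)


def estimate_pitch(frame: Sequence[float], sample_rate: int = SAMPLE_RATE,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate one frame's fundamental frequency from its autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(frame) <= max_lag:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the frame and a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def pitch_variation(frames: List[Sequence[float]]) -> float:
    """Standard deviation of per-frame pitch; low values mean a flatter, more monotone voice."""
    return statistics.pstdev([estimate_pitch(f) for f in frames])


def tone(freq: float, n: int = 1024) -> List[float]:
    """Synthetic stand-in for a voiced speech frame: a pure sine at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


lively = [tone(f) for f in (180, 220, 260)]  # wide pitch swings
flat = [tone(f) for f in (200, 202, 204)]    # nearly monotone

print(estimate_pitch(tone(220)))
print(pitch_variation(lively) > pitch_variation(flat))  # True
```

A few dozen lines of arithmetic on the audio buffer are enough to separate a lively voice from a monotone one. That is why the raw stream, not the transcript, is the sensitive artifact.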

Your health indicators

Vocal biomarkers are an active research area in medicine. Changes in voice can indicate Parkinson's disease, respiratory conditions, cardiovascular issues, and cognitive decline. Researchers have demonstrated that audio recordings can predict coronary artery disease from vocal features alone. Again, your dictation app is not running diagnostic models — but it is collecting the raw signal that those models consume.

Your environment

Background audio tells a story. Traffic noise places you near a road. Office chatter suggests a workplace. A specific airport PA announcement identifies your location. Children's voices indicate a family. The hum of specific appliances can even identify the room you are in. Your dictation app is not analyzing these ambient signals — but a raw audio stream contains them, and they are transmitted alongside your words.
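Electrical mains hum is one concrete ambient signal. It sits at 50 Hz in Europe and most of Asia and at 60 Hz in North America, so its frequency alone hints at region. The minimal sketch below uses the Goertzel algorithm to measure signal power at a single frequency. The synthetic "recording" is a loud tone plus faint hum, and the function names are illustrative assumptions. The sketch does not describe what any dictation app actually does.

```python
import math
from typing import List, Sequence

SAMPLE_RATE = 8_000  # Hz; assumed for this example
DURATION_S = 1.0     # one second of audio gives 1 Hz frequency resolution


def goertzel_power(samples: Sequence[float], sample_rate: int, target_freq: float) -> float:
    """Power in the DFT bin nearest target_freq, computed with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def likely_mains_frequency(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> int:
    """Guess 50 Hz or 60 Hz mains by comparing power at the two hum frequencies."""
    p50 = goertzel_power(samples, sample_rate, 50.0)
    p60 = goertzel_power(samples, sample_rate, 60.0)
    return 50 if p50 > p60 else 60


def synthetic_recording(hum_freq: float) -> List[float]:
    """A loud 220 Hz 'voice' tone plus faint hum at hum_freq (illustrative, not real speech)."""
    n = int(SAMPLE_RATE * DURATION_S)
    return [
        1.0 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE)
        + 0.05 * math.sin(2 * math.pi * hum_freq * i / SAMPLE_RATE)
        for i in range(n)
    ]


print(likely_mains_frequency(synthetic_recording(60.0)))  # 60
print(likely_mains_frequency(synthetic_recording(50.0)))  # 50
```

The hum is only one-twentieth the amplitude of the foreground tone, yet a single-bin measurement still picks it out. A speaker has no practical way to keep signals like this out of the stream.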

Overheard conversations

You might be the one dictating, but your microphone also captures what people around you are saying. If you dictate in an open office, a coffee shop, or at home with family, the audio stream contains their words too. Those people never consented to having their voices transmitted to a server.

Why privacy policies are the wrong tool for this problem

The standard approach to privacy in consumer software is a privacy policy — a document that describes what the company collects, how it uses data, and what it shares. For most software, this model works reasonably well. When you use a notes app, the data is text you intentionally typed, and the privacy policy governs how that text is handled.

Dictation is different. The data is a raw biometric signal that encodes far more than you intended to share. You cannot selectively redact your emotional state from your voice. You cannot filter out background conversations before they reach the microphone. The signal is inherently richer than the text it represents, and that richness is the problem.

A privacy policy is a promise about what a company will do with that rich signal. But promises can change. Companies get acquired. Terms of Service get updated. Data that was “not used for training” today can be used for training tomorrow with an email notification you will not read. The GDPR and CCPA give you rights on paper, but exercising those rights against a company that holds your voice data is not straightforward.

The architectural argument

There is an alternative to trusting promises: choosing an architecture where the data never exists on a server in the first place.

On-device dictation tools — Rewisper, MacWhisper — process audio entirely on your machine. The raw audio is consumed by an on-device speech-to-text model, text is produced, the audio buffer is discarded, and a local cleanup pass runs on the transcript. The result lands on your clipboard or, optionally, at your cursor. No audio is transmitted. No server receives anything.
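The flow above can be sketched as a single function. Everything in this sketch is hypothetical. `transcribe_on_device` and `local_cleanup` stand in for whatever speech model and cleanup pass a given tool actually uses. Python also offers no guarantee that the runtime has not made other copies of the audio. The sketch only illustrates the shape of the pipeline: audio goes in, text comes out, the buffer is overwritten, and there is no network call anywhere.

```python
from typing import Callable

# Hypothetical stand-ins for an on-device ASR model and a local cleanup pass.
Transcriber = Callable[[bytearray], str]
Cleanup = Callable[[str], str]


def transcribe_on_device(audio: bytearray) -> str:
    """Placeholder for a local speech-to-text model; returns fixed text for illustration."""
    return "um so the meeting is moved to thursday"


def local_cleanup(text: str) -> str:
    """Toy cleanup pass: drop a few filler words, capitalize, add a period."""
    fillers = {"um", "uh", "so"}
    cleaned = " ".join(w for w in text.split() if w not in fillers)
    return cleaned[:1].upper() + cleaned[1:] + "."


def dictate(audio: bytearray,
            transcribe: Transcriber = transcribe_on_device,
            cleanup: Cleanup = local_cleanup) -> str:
    """Audio in, text out. The audio buffer is discarded before cleanup runs."""
    try:
        raw_text = transcribe(audio)
    finally:
        # Discard the audio even if transcription fails: zero it in place, then empty it.
        audio[:] = bytes(len(audio))
        audio.clear()
    return cleanup(raw_text)


buffer = bytearray(b"\x01\x02\x03\x04")  # pretend PCM samples
print(dictate(buffer))  # The meeting is moved to thursday.
print(len(buffer))      # 0
```

The final step of placing the text on the clipboard or at the cursor is omitted because it is platform-specific. Using the `finally` block means the buffer is discarded on the error path as well as the success path.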

This is not a promise. It is a property of the architecture itself. The signal never leaves your device, so there is no server-side copy to be breached, subpoenaed, sold, or repurposed. The privacy guarantee is structural.

The trade-off

Structural privacy is stronger than contractual privacy, but it comes with a hardware ceiling. On-device tools handle cleanup locally, which is good enough for most prose but cannot match a frontier cloud model on the longest, most contextual rewrites. Cloud dictation can throw more compute at the problem. The question is whether that edge is worth the signal you give up.

What happens to your voice data over time

Even if a cloud dictation service has strong privacy practices today, your voice data has a property that text data does not: it cannot be rotated. If your password leaks, you change it. If your credit card is stolen, you get a new one. If your voiceprint is compromised, you cannot get a new voice.

Voice data is a permanently identifying biometric. A recording of you speaking from five years ago is still identifiably you. A voiceprint built from your dictation sessions today will match you a decade from now. This is not like a data breach that exposes your email address — it is a breach that exposes a biometric identifier you carry for life.

This is the argument for minimizing the number of entities that ever possess your raw voice data. Every server that touches your audio is a server that could be compromised, subpoenaed, or repurposed. The only server that cannot be any of those things is the one that does not exist.

The secondary data problem

Even when a company's intentions are good, voice data creates secondary privacy risks. Consider:

Model training. If a company uses “anonymized” voice data to train future models, can those models later be prompted to reveal information about individuals in the training set? Research on training data extraction suggests this is a real concern for large models.
Mergers and acquisitions. When a company is acquired, its data assets transfer to the acquirer, including voice data collected under a different privacy policy. The acquiring company's use of that data may differ from what the original privacy policy described.
Legal discovery. If a company becomes party to litigation, its data — including user audio, if retained — may be subject to discovery. Your voice data could become evidence in a case you have nothing to do with.
Government access. Under laws like the CLOUD Act in the US, law enforcement can compel US-based technology companies to produce data in their possession, custody, or control, regardless of whether that data is stored inside or outside the United States. Voice data held by a US company is reachable by US law enforcement.

What you can actually do

Privacy is not binary. It is a spectrum of risk and convenience. Here are the practical options, from most private to most accurate:

Maximum privacy: On-device dictation

Use Rewisper or MacWhisper. Audio never leaves your Mac. No server, no privacy policy to trust, no data to breach. The privacy guarantee is structural. Accuracy is competitive with cloud ASR for everyday speech; cleanup runs locally, so the most aggressive contextual rewrites still favor cloud tools.

Moderate privacy: Cloud dictation, data processing opt-out

Use Wispr Flow or Aqua Voice, but opt out of data retention and model training where those options exist. Your audio still transits their servers, but you minimize what persists. This is a contractual protection, not a structural one.

Hybrid approach

Use on-device dictation for sensitive material (personal notes, work documents, health information) and cloud dictation when accuracy matters more than privacy (public-facing writing, casual messages). You do not have to pick one tool for everything.

The bottom line

Your dictation app knows more about you than you think. Not because it is malicious — because raw audio is an extraordinarily rich signal, and you cannot filter out the parts you did not mean to share.

Cloud dictation services, for all their accuracy and polish, create a permanent biometric data trail on servers you do not control. Privacy policies offer contractual protection, but contracts change and servers get breached. Your voice cannot be rotated.

On-device dictation eliminates the problem by ensuring the audio never leaves your machine. The guarantee is not in the terms of service — it is in the absence of a server. For anyone who cares about voice privacy, that is the difference that matters.

Read: Where Does Your Voice Go When You Use Wispr Flow or Aqua Voice? →
Read: Is Cloud Dictation HIPAA-Compliant? →