in the real world against Deepgram's ~28%
~12M
minutes/month of audio transcribed
10 languages
across all 16 markets
The Problem
Pocket records in the real world. Far-field rooms, phone calls, several people talking at once. Generic speech-to-text falls apart exactly there: wrong names, missing words, no sense of who said what.
And the stakes are high. Therapy sessions and financial calls carry sensitive data that has to be redacted, not just transcribed. Users span 10 languages across 16 markets, so one model has to hold accuracy everywhere.
A transcript users can't trust breaks everything downstream: the summary, the action items, the follow-up.
The Solution
Pocket runs Smallest AI's Pulse STT as the transcription layer in its batch pipeline. Because Pocket processes complete files rather than live streams, it uses Pulse in pre-recorded mode, where accuracy is highest.
One pass returns everything: transcript, speaker labels, PII and PCI redaction, punctuation, timestamps, and noise handling. Automatic language detection covers all 10 market languages. Nothing to stitch together as volume grows.
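To make "one pass returns everything" concrete, here is a minimal sketch of how a downstream step might consume a single-pass result. The `Segment` shape, its field names, and the `[REDACTED]` marker are illustrative assumptions, not Pulse's documented response format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript segment. Hypothetical shape, not Pulse's actual schema."""
    speaker: str   # speaker label, e.g. "speaker_0"
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    text: str      # text assumed to arrive already punctuated and redacted


def render(segments: List[Segment]) -> str:
    """Format segments as timestamped, speaker-labeled lines."""
    return "\n".join(
        f"[{seg.start:.1f}s-{seg.end:.1f}s] {seg.speaker}: {seg.text}"
        for seg in segments
    )


if __name__ == "__main__":
    demo = [
        Segment("speaker_0", 0.0, 4.2, "My card number is [REDACTED]."),
        Segment("speaker_1", 4.2, 6.0, "Thanks, got it."),
    ]
    print(render(demo))
```

Because speaker labels, timestamps, and redaction are assumed to arrive together, the summary and action-item steps can read one structure instead of merging the outputs of separate services.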
The Results
The results:
9.63% WER in the real world vs Deepgram's ~28%
Every transcript ships with speaker labels, redaction, punctuation, and timestamps in a single pass
Pulse holds accuracy across all of Pocket's market languages: [insert table on multilingual benchmarks here]
And it stays ahead as conditions degrade. WER by noise band: [insert table on noise-band benchmarks here]
Full-file batch processing scales with volume and has no per-stream limits, so quality stays consistent across every market and use case.
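For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below uses the standard edit-distance definition; the lowercase-and-split normalization and the sample sentences are assumptions for illustration, since the benchmark's exact scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please send the signed lease to maria before friday morning"
    hypothesis = "please send the signed lease to mario before friday morning"
    # One wrong name in a ten-word reference
    print(f"{word_error_rate(reference, hypothesis):.2%}")
```

A single misheard name in a ten-word sentence already costs 10% WER, which is why errors on names and missing words weigh so heavily in real-world recordings.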
Building something that depends on accurate transcription? See the Pulse model card or talk to our team.
Company name
Pocket (Open Vision Engineering Inc.)
Industry
AI Hardware / Productivity Tech
Company size
SMB
Products used
Speech to text (Pulse)
About the company
Pocket, built by Open Vision Engineering, is a wearable capture device. It records conversations, calls, and meetings, then processes the full audio file in batch into a clean transcript and summary. Therapists, realtors, sales reps, and founders use it for hands-free capture across 16 Western markets, and the platform has 86,700+ customers. Pocket records first and transcribes after. That choice puts all the weight on one thing: the quality of the final transcript.
