Sign up here to receive our free Work Tech newsletter in your inbox.
Among the most broadly useful everyday applications of artificial intelligence is audio transcription powered by ever-improving neural networks.
AI transcription gained new utility during the pandemic, as videoconference meetings could be easily recorded and transcripts shared with colleagues who couldn’t join. As hybrid work has emerged as a long-term trend, automated transcription has grown with it: the tools are dramatically better than just three years ago, the cost is lower, and the utility is proven. As companies have attempted to reduce both the number of people in meetings and the number of meetings overall, AI transcripts help eliminate FOMOOM: fear of missing out on meetings.
We tested several commercially available AI-based transcription tools and found that Fireflies, Rev Max, and Sonix all provide highly accurate transcription. Sonix and Rev Max were slightly better at spelling people’s names, while Fireflies is by far the least expensive option if you need dozens of hours a month of combined meeting and uploaded-file transcription.
The degree of accuracy offered by these three exceeds the level needed for routine business purposes. They’re cheap and easy enough to use that they’re a legitimate option for generating searchable transcripts of everyday audio like your team’s meetings and brainstorming sessions. They also remove the cognitive load and expense of having a dedicated notes-taker or post-event summarizer.
For those who need verbatim transcripts for business, legal, journalistic, and other purposes, our three picks are accurate enough that turning a nearly accurate transcript into an exact record takes only a small amount of effort. Our picks also include interactive clean-up tools that make it easy to check the underlying audio for any word or phrase while reviewing or correcting the results. If you need precise verbatim results, budgeting time for clean-up can still be far more affordable than commissioning transcripts from human transcribers. (Services including Rev let you order human transcription for a premium charge.)
Live captions available in major videoconferencing tools, as well as post-event transcription, increase accessibility across an enterprise for people who are deaf or hard of hearing, or who take in information differently, such as people with ADHD. Only one of our picks, Rev Max, provided such captions, and only in Zoom, as an alternative to Zoom’s built-in captioning.
Tied for best in class with Fireflies on pure transcription, Sonix focuses on producing speaker-accurate records of meetings. However, being the best costs more: Sonix charges for every minute used, at one of two rates depending on the plan.
Pros:
- Extremely high accuracy.
- Correct spelling of many proper names without a custom dictionary.
- Almost perfect recognition of distinct speakers.
Cons:
- Fixed hourly cost makes it quite expensive for heavy monthly use.
- Advertises extensive videoconferencing integrations but lacks documentation for them.
- Audio player worked only in Chrome, not Safari.
With excellent transcription and speaker recognition, Rev.com’s automated service is only a degree or two below its peers. For those spending endless hours in Zoom, however, its unlimited transcription of Zoom meetings can be a substantial price advantage over Sonix. (If you need Rev’s quality for under two hours a month, use Temi: it’s identical in every way except pricing, a flat $0.25 per minute.)
Pros:
- Very high accuracy.
- Excellent speaker recognition.
- Integration with Zoom for live captions or post-meeting processing.
Cons:
- Somewhat expensive per minute beyond 20 hours a month for non-Zoom use.
- Some audio features failed in Safari but worked in Chrome.
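The "under two hours a month" rule of thumb for Temi comes from a simple break-even calculation. The sketch below assumes all of the audio falls within Rev Max's included allotment, so its $29.99 monthly fee is the whole cost, and compares that with Temi's flat $0.25 per minute. The constant and function names are our own, for illustration only.

```python
REV_MAX_MONTHLY = 29.99  # USD per month, Rev Max single-tier plan
TEMI_PER_MINUTE = 0.25   # USD per minute, Temi flat rate


def cheaper_option(minutes_per_month: float) -> str:
    """Return which service costs less for a given monthly volume.

    Assumption: every minute falls within Rev Max's included allotment,
    so Rev Max always costs exactly its monthly fee.
    """
    temi_cost = minutes_per_month * TEMI_PER_MINUTE
    return "Temi" if temi_cost < REV_MAX_MONTHLY else "Rev Max"


# Minutes at which Temi's per-minute charges equal the Rev Max fee
break_even_minutes = REV_MAX_MONTHLY / TEMI_PER_MINUTE
print(f"Break-even: {break_even_minutes:.2f} minutes "
      f"(about {break_even_minutes / 60:.1f} hours) per month")
print(cheaper_option(90), cheaper_option(180))
```

Below roughly 120 minutes a month, paying per minute beats the subscription; above it, the flat fee wins.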
Transcription quality matched only by Sonix, superb speaker identification, and an AI meeting summary that’s almost worthwhile put Fireflies near the top of the list for quality. The service also includes unlimited meeting transcription and 8,000 minutes of uploaded transcription in its $18-per-month tier.
Pros:
- Extremely high accuracy.
- Superb differentiation of speakers.
- Substantially less expensive than all other services for significant monthly usage of a mix of uploaded and meeting transcription.
Cons:
- Barely a con: Fireflies’ recognition of proper names was slightly worse than its peers’.
- Requires Google Calendar or Outlook Calendar integration just to create an account, with no apparent way to bypass it.
Of the many companies that offer AI-based transcription, either exclusively or alongside human transcription and other services, we examined six closely: Fireflies, MeetGeek, Otter.ai, Rev Max, Sonix, and Trint. (For our winnowing rationale, see “How we chose what to review,” below.) We also considered Temi, which is owned by Rev and identical to Rev Max except in pricing.
We picked apart each service’s cost and limitations, along with the range of services it offers and its integration with videoconferencing services such as Google Meet, Microsoft Teams, Webex, and Zoom.
For our test audio, we used the Turing Test of spoken words: a podcast of eight rapidly speaking participants with overlapping comments recorded across three continents. While everyone spoke English, the accents included American (from three parts of the country), New Zealander, and Canadian (Alberta), as well as those of non-native speakers from Germany and Sweden. Two speakers, from Wisconsin and Alberta, have voices that are difficult to tell apart from audio alone.
The three services we picked were able not only to distinguish among all speakers, including the two with the most similar-sounding voices, but also to transcribe speech correctly regardless of accent. This included correctly spelling certain non-English words.
What to watch for: With major firms in a headlong race over AI technology, expect further innovations and options. We’d like to see better summaries (see below). Also keep an eye on Whisper from OpenAI: the company released its high-quality transcription engine as a free, open-source model that has already been incorporated into simpler products.
AI-based voice recognition aims to render speech closely enough to be useful in real time or in later review for reading, searching, and summarizing. No machine-learning offering promises 100% accuracy. Most services that provide an estimate claim accuracy in the 90% to 95% range, which our testing bore out. The services we liked best are at the high end of that range.
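The services don’t say exactly how they calculate their accuracy figures. A common yardstick in speech recognition is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a machine transcript into a correct reference transcript, divided by the number of words in the reference. Accuracy is often stated loosely as 1 minus WER. The sketch below computes WER with a standard word-level edit distance; reading a vendor’s “95% accurate” as a WER of 5% is our assumption, not something the vendors confirm.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dp[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


reference = "we tested several transcription tools on a noisy podcast recording"
hypothesis = "we tested several transcription tools on a noisy pod cast recording"
wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")
```

Here “podcast” heard as “pod cast” counts as one substitution plus one insertion, so a single misheard word costs two errors in a 10-word reference.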
Each of our picks offers the following, at a minimum:
- Speaker identification: Transcripts track multiple speakers and label them distinctly, allowing later modification. Our three top picks were excellent at this.
- Meeting integration: Having a meeting recording roll automatically into transcription with no additional effort is a plus. All three of our top picks provide this, though Fireflies and Sonix work best with Zoom, and Rev Max works directly only with Zoom.
- Transcript editing and annotation: All recommended services offer an editing interface for improving a transcript. Rev Max and Sonix provide better annotation options than Fireflies.
- Export into multiple formats: Getting a transcript out was easy in every service we examined. Our top three picks offer Word, PDF, and one or more standard timecode-based video captioning formats.
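SubRip (SRT) is one widely used timecode-based captioning format: numbered cues, each with a start and end time written as HH:MM:SS,mmm and the caption text, separated by blank lines. The sketch below shows how timestamped, speaker-labeled segments map onto SRT. The segment tuples and function names are hypothetical; each service’s actual export data and options differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(segments: list[tuple[float, float, str, str]]) -> str:
    """Turn (start, end, speaker, text) segments into SRT cues."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues


# Hypothetical segments, as a transcription service might track them
segments = [
    (1.5, 4.0, "Speaker 1", "Welcome to the show."),
    (4.2, 7.85, "Speaker 2", "Thanks for having me."),
]
print(to_srt(segments))
```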
Many also offer AI-based summaries or keyword analysis, providing a high-level view of a transcript at a glance and making it searchable for key information. However, these features are still in their infancy and vary too much in quality and usefulness across transcripts and services to make them a criterion yet. (We liked Fireflies’ summary, but even it included some laughable conclusions.) That could change quickly.
We also picked apart specific features that could matter in choosing among our three picks, or among other services (noted later) that have minor to significant shortcomings but meet other needs:
Zoom integration: All allow audio uploads and integrate with Zoom. Fireflies and Sonix have post-meeting automated processing for other popular videoconferencing systems, while Rev Max offers live captions for Zoom as an alternative to Zoom’s built-in captioning service.
Languages: Some services offer English only, with US or UK accents, while others recognize a broader range, up to 30 or more languages. All three of our picks recognized an impressive number of native and non-native accents. However, if you need non-English transcription, Sonix or Fireflies would be the right starting point.
Editing: Sonix and Rev Max stand out somewhat above Fireflies for better transcript editing and annotation tools. All three provide a range of export options, including at least one popular option for video captioning.
Price: Later in this article we compare costs for 20 hours and 60 hours of usage per month across the six services we examined. Among our top three:
- Fireflies is the cheapest per minute in all scenarios, for both low-volume and high-volume monthly transcription. It offers unlimited meeting transcription and 8,000 minutes (about 133 hours) of uploaded audio at its lowest paid tier.
- Sonix costs seven to 10 times more per minute than Fireflies, because the company charges a fixed hourly rate for all usage: $10 an hour pay-as-you-go, or $5 an hour with a $22-per-month ($100-per-year) subscription.
- Rev Max has the highest per-minute cost for uploaded files beyond 20 hours, at $0.25 per minute. But that is heavily mitigated by the inclusion of unlimited Zoom meeting transcription in its $29.99-per-month single-tier service.
You’ll need to check your usage needs and sign up for free trials of each service to determine which fits best.
Among all our top picks and nearly all other services we tested, the cost can be exceptionally low, both relative to the value delivered and compared with transcription by people. Manual transcripts cost $0.75 to $2.50 per minute, depending on the level of accuracy required, turnaround time, and the amount of industry-specific jargon in the source material. Services that offer live captioning by people during a meeting charge around $150 to $180 per hour ($2.50 to $3 per minute), with minimums.
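To put those rates in concrete terms, the sketch below prices a one-hour meeting using figures quoted in this article. It uses Temi’s flat $0.25 per minute to stand in for automated transcription, since that is the highest automated per-minute rate cited here; subscription plans cost less per minute at volume. The names are our own, for illustration.

```python
MEETING_MINUTES = 60

# (low, high) USD-per-minute rates cited in this article
RATES = {
    "Human transcript": (0.75, 2.50),
    "Live human captions": (150 / 60, 180 / 60),  # $150 to $180 per hour
    "Temi (automated, flat rate)": (0.25, 0.25),
}


def meeting_cost(rate_per_minute: float, minutes: int = MEETING_MINUTES) -> float:
    """Cost in USD of transcribing `minutes` of audio at a per-minute rate."""
    return rate_per_minute * minutes


for service, (low, high) in RATES.items():
    if low == high:
        print(f"{service}: ${meeting_cost(low):,.2f}")
    else:
        print(f"{service}: ${meeting_cost(low):,.2f} to ${meeting_cost(high):,.2f}")
```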
Automated transcription is instantaneous for live events, though the quality may be poorer than with on-demand or post-meeting processing. Offline transcription may take from several seconds to about a minute for each minute of source audio: a one-hour meeting could be ready anywhere from 10 to 60 minutes later, depending on the service’s promises and capabilities. We only tested services that allow direct uploads in addition to meeting integration options.
If you need a mobile app for recording and viewing transcripts that ties directly into a transcription service, note that Fireflies and Sonix lack smartphone apps. Rev offers apps both under its own name and under the separately branded flat-rate Temi service, but these iPhone and Android apps provide only recording and service-request features. They don’t offer integrated transcript viewing or editing.
If a mobile app that displays transcripts is critical for your needs, consider Trint. The company offers full-featured iOS and Android apps for recording, with transcription nearly at the quality of Sonix and Fireflies. The app syncs transcriptions with the central web app, allowing on-the-go viewing. However, Trint scored poorly on speaker identification in our test.
Trint’s pricing is perplexing, though. Its Starter plan is $60 per month ($576 per year) for just seven files per month. Step up slightly to the Advanced plan at $75 per month ($720 per year), and transcription becomes unlimited.
Pricing deep dive
Because of variations in service tiers, price lists don’t provide easy apples-to-apples comparisons among transcription providers. For some services, assembling the full cost required emailing the company or digging through its support documents.
If you expect to use 20 hours (1,200 minutes) of transcription per month, here are the lowest subscription or pay-as-you-go prices on a per-minute basis, using month-to-month pricing and excluding tax:
- Otter.ai, Fireflies, MeetGeek: $0.015/minute with Pro plans at $16.99 (20 hours), $18 (unlimited meetings/up to 133 hours of uploads), and $19 (20 hours)
- Rev Max: $0.025/minute with standard plan at $29.99 (20 hours)
- Trint: $0.06/minute with advanced plan at $75 per month (unlimited)
- Sonix: $0.10/minute or $122 total, with Premium plan at $22 per month base plus $5 per hour for all usage
- Temi: $0.25/minute or $300 total, on flat-rate basis
At a higher level of usage, costs diverge more starkly. For an account that uses 60 hours (3,600 minutes) of transcription per month (say, one person who manages a meeting transcription workflow through their account), the per-minute pricing looks like this:
- MeetGeek: $0.015/minute or $51, with Business tier (40 hours and $0.60 per hour afterward)
- Fireflies: $0.008/minute with Business plan at $29 (unlimited meetings and uploads)
- Otter.ai: $0.008/minute with Business plan at $30 (100 hours)
- Trint: $0.02/minute with Advanced plan at $75 per month (unlimited)
- Sonix: $0.09/minute or $322 total, with Premium plan at $22 per month base plus $5/hour
- Temi: $0.25/minute or $900 total
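The per-minute figures in both lists come from dividing a month’s total cost by the minutes used. The sketch below models four of the pricing structures in play: a flat fee with unlimited use (Trint Advanced), a fee with included minutes plus an overage rate (Rev Max, counting uploads only), a base fee plus metered use (Sonix Premium), and pure pay-as-you-go (Temi). Prices are those quoted in this article; the `Plan` class is our own illustration and ignores taxes, annual discounts, and per-seat pricing.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    monthly_fee: float         # fixed subscription price, USD
    included_minutes: float    # minutes covered by the fee (math.inf = unlimited)
    overage_per_minute: float  # USD per minute beyond the included minutes

    def monthly_cost(self, minutes: float) -> float:
        extra = max(0.0, minutes - self.included_minutes)
        return self.monthly_fee + extra * self.overage_per_minute

    def per_minute(self, minutes: float) -> float:
        return self.monthly_cost(minutes) / minutes


PLANS = [
    Plan("Trint Advanced", 75.00, math.inf, 0.0),
    Plan("Rev Max (uploads only)", 29.99, 20 * 60, 0.25),
    Plan("Sonix Premium", 22.00, 0, 5.00 / 60),  # $5 per hour, metered by the minute
    Plan("Temi", 0.00, 0, 0.25),
]

for minutes in (20 * 60, 60 * 60):
    print(f"{minutes // 60} hours per month:")
    for plan in PLANS:
        print(f"  {plan.name}: ${plan.monthly_cost(minutes):,.2f} "
              f"(${plan.per_minute(minutes):.3f}/minute)")
```

Modeling “unlimited” as an infinite allotment keeps every plan on the same formula; at 60 hours, Sonix comes to $22 plus $300 in metered use, or $322.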
Rev Max has to be broken out separately for fairness, because its plan combines unlimited Zoom transcription with a limited allotment (20 hours) of uploaded-file transcription. For 60 hours, in three scenarios:
- Rev Max, Zoom at 40 hours or more: $0.008/minute or $29.99 total, with standard plan (20 hours of uploads included)
- Rev Max, Zoom at 30 hours, uploads at 30 hours: $0.05/minute or $179.99 ($29.99 plus $150)
- Rev Max, 60 hours of uploads only: $0.17/minute or $629.99 ($29.99 plus $600)
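All three Rev Max scenarios follow from one rule, as the plan is described here: Zoom minutes are unlimited, the first 20 hours of uploads are included in the $29.99 fee, and further uploads cost $0.25 per minute. A minimal sketch, with constant and function names of our own choosing:

```python
BASE_FEE = 29.99            # USD per month
INCLUDED_UPLOAD_HOURS = 20  # uploaded audio covered by the base fee
UPLOAD_OVERAGE = 0.25       # USD per uploaded minute beyond the allotment


def rev_max_monthly_cost(zoom_hours: float, upload_hours: float) -> float:
    """Monthly cost; Zoom transcription is unlimited, so zoom_hours adds nothing."""
    extra_upload_minutes = max(0.0, upload_hours - INCLUDED_UPLOAD_HOURS) * 60
    return BASE_FEE + extra_upload_minutes * UPLOAD_OVERAGE


for zoom, uploads in [(40, 20), (30, 30), (0, 60)]:
    total = rev_max_monthly_cost(zoom, uploads)
    per_minute = total / ((zoom + uploads) * 60)
    print(f"Zoom {zoom} h + uploads {uploads} h: "
          f"${total:,.2f} (${per_minute:.3f}/minute)")
```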
Yearly discounts can reduce the plan portion of the cost by 10% to 20%. Per-seat and enterprise licenses can reduce the cost. Some services pool minutes across licenses in business plans as well.
If your purposes require well over 60 hours of use per month with few user licenses for your company, MeetGeek, Fireflies, Otter.ai, and Trint remain at or below the per-minute prices noted above. Rev Max and Sonix would fall out of contention. Temi makes sense only for relatively modest on-demand purposes.
Fireflies and Trint claim to offer “unlimited” transcription. Fireflies offers no elaboration. Trint qualifies this offer by noting, “Our fair use cap is set so that you’ll almost never hit it, and we’ll tell you if you’re getting close.”
How we chose what to review
After creating a comprehensive list of automated transcription services, we chose to look closely at Fireflies, MeetGeek, Otter.ai, Rev Max, Sonix, and Trint, because they offered the right combination of direct audio uploads and integration with videoconferencing services.
MeetGeek’s focus on meetings means it currently can’t change speaker assignments after transcription, though its transcription quality is high. Otter.ai was a pioneer in AI-based transcription, yet its current transcription quality and speaker recognition were the poorest of the services we tested.
Speak AI and Speechtext.ai offer no integration with meeting software and were left out. Chorus and Gong include forms of AI-based transcription as part of suites of customer-interaction management tools and can’t be evaluated separately.
Several services had a very narrow focus and didn’t fit our business rubric: Alice (investigative journalists), Beey (professional video subtitles), and scribe.com (medical documentation and telehealth). TranscribeMe notes that it isn’t ideal for lower-quality audio, which is common with videoconferences and other ad hoc recordings. Verbit.ai offers no standard monthly pricing plans; it provides only custom quotes based on usage.
Glenn Fleishman has reported on technology since the 1990s as a freelancer variously for the New York Times, the Economist, Wired, Fast Company, and many others. He’s a senior editor at Macworld. From 2019 to 2022, he created 100 Tiny Type Museums.