Among the most broadly useful everyday applications of artificial intelligence is audio transcription powered by ever-improving neural networks.
We tested several commercially available AI-based transcription tools and found Fireflies, Rev Max, and Sonix all provide highly accurate transcription. Sonix and Rev Max were slightly better at spelling people’s names, while Fireflies is by far the least expensive if you need many dozens of hours of combined meeting and uploaded transcription each month.
The degree of accuracy offered by these three exceeds the level needed for routine business purposes. They’re cheap and easy enough to use that they’re a legitimate option for generating searchable transcripts of everyday audio like your team’s meetings and brainstorming sessions. They also remove the cognitive load and expense of having a dedicated note-taker or post-event summarizer.
For those who need verbatim transcripts for business, legal, journalistic, and other purposes, our three picks offer quality high enough that only a small amount of effort is needed to move from nearly accurate to an exact record. Our picks also include tools that allow for interactive transcript clean-up, making it easier to check the underlying audio for any word or phrase while reviewing or improving the results. If you need precision verbatim results, budgeting time for clean-up can still be far more affordable than commissioning transcripts from human transcribers. (Services including Rev let you access human transcription for a premium charge.)
AI transcription gained new utility during the pandemic, as videoconference meetings could be easily recorded and transcripts shared with colleagues who couldn’t join. As hybrid work has emerged as a long-term trend, automated transcription has grown with it: the tools are dramatically better than just three years ago, the cost lower, and the utility proven. As companies have attempted to reduce the number of people in meetings and number of meetings overall, AI transcripts eliminate FOMOOM: fear of missing out on meetings.
Live captions available in major videoconferencing tools, as well as post-event transcription, increase accessibility across an enterprise for people who are deaf, have hearing impairments, or have different modes of information uptake, such as those with ADHD. Only one of our picks, Rev Max, provided such captions, and only in Zoom, as an alternative to built-in Zoom captioning.
Tied for best in class with Fireflies on pure transcription, Sonix focuses on producing speaker-accurate records of meetings. However, being the best costs more, with Sonix charging for every minute used at one of two rates depending on plan.
- Extremely high accuracy
- Correct spelling of many proper names without dictionary
- Almost perfect recognition of distinct speakers
- Fixed hourly cost makes it quite expensive for heavy monthly use
- Suggests substantial videoconference integrations but lacks documentation
- Audio player didn't work in Safari, only Chrome
With excellent transcription and speaker recognition, Rev.com's automated service is only a degree or two below its peers. For those spending endless hours in Zoom, however, the unlimited transcription for Zoom meetings can be a substantial price advantage over Sonix. (If you need Rev's quality for under two hours a month, use Temi: it's identical in every way except that it charges a flat $0.25 per minute of use.)
- Very high accuracy
- Excellent speaker recognition
- Integration with Zoom for live captions or post-meeting processing
- Somewhat expensive per minute above 20 hours per month for non-Zoom uses
- Some audio features failed in Safari, worked in Chrome
The best transcription save for Sonix’s, super speaker identification, and even an almost-worthwhile AI meeting summary put Fireflies almost at the tip-top of the list for quality. The service also includes unlimited meeting transcription and 8,000 minutes of uploaded transcription in its $18/month tier.
- Extremely high accuracy
- Superb differentiation of speaker
- Substantially less expensive than all other services for significant monthly usage of a mix of uploaded and meeting transcription
- Proper name recognition was slightly worse than peers, though this is barely a con
- Requires Google Calendar or Outlook Calendar integration to even set up an account, with no seeming way to bypass
AI-based voice recognition aims to provide a close-enough rendering of speech to be useful in real-time usage or in later review for reading, searching, and summarizing. No machine-learning offering promises 100% accuracy. Most that provide an estimate tend to assert they fall in the 90%-95% range, which appears accurate in our testing. The services we liked best are at the high end of that range.
Each of our picks offers the following, at a minimum:
- Speaker identification: Transcripts track multiple speakers and label them distinctly, allowing later modification. Our three top picks were excellent at this.
- Meeting integration: Having a meeting record roll automatically into transcription with no additional effort is a plus. Our three top picks provide this, though Fireflies and Sonix are best integrated with Zoom, and Rev Max only works directly with Zoom.
- Transcript editing and annotation: All recommended services offer an editing interface for improving a transcript. Rev Max and Sonix provide better annotation options than Fireflies.
- Export into multiple formats: Getting a transcript out of a system is typically easy across all systems we examined. Our top three picks offer Word, PDF, and one or more standard timecode-based video captioning formats.
Many also offer AI-based summaries or keyword analysis, making a high-level look at a transcript available at a glance and searchable for key information. However, these features are still in their infancy and vary too much in quality and usefulness across transcripts and services to make them a criterion yet. (We liked Fireflies’ summary, but even it included some laughable conclusions.) That could change quickly.
We also picked apart specific features that could be important to your decision among our three picks, or other services (noted later) that have minor to significant shortcomings but meet other needs:
Zoom integration: All allow audio uploads and integrate with Zoom. Fireflies and Sonix have post-meeting automated processing for other popular videoconferencing systems, while Rev Max offers live captions for Zoom as an alternative to Zoom’s built-in captioning service.
Languages: Some services offer English only, with US or UK accents, while others offer a broader range, up to recognition among 30 or more languages. All three of our picks recognized an impressive number of native and non-native accents. However, if you need non-English transcription, Sonix or Fireflies would be the right starting points.
Editing: Sonix and Rev Max stand out somewhat above Fireflies for better transcript editing and annotation tools. All three provide a range of export options, including at least one popular option for video captioning.
Price: We provide a comparison later in the article for 20 hours and 60 hours of usage per month across the six services we examined, but of our top three, our evaluation is that:
- Fireflies is the cheapest per minute in all scenarios, for both low-volume and high-volume monthly transcription. It offers unlimited meeting transcription and 8,000 minutes (133.3 hours) of uploaded audio at its lowest-priced paid tier.
- Sonix costs from seven to 10 times more per minute than Fireflies, as the company charges a fixed hourly rate for all usage: $10 an hour for pay-as-you-go and $5 per hour with a $22 per month ($100 per year) subscription.
- Rev Max has the highest per-minute cost for uploaded files after 20 hours at $0.25 per minute. But that is heavily mitigated with the inclusion of unlimited Zoom meeting transcription as part of its $29.95 per month single-tier service.
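To make the pricing comparison concrete, here is a rough Python sketch of the monthly bill for uploaded audio, using only the prices quoted above. The constant and function names are ours. Two points are assumptions rather than vendor facts: that the Rev Max subscription covers the first 20 hours of uploads, and that usage stays under the Fireflies 8,000-minute cap. Meeting transcription, taxes, and annual discounts are ignored.

```python
# Prices as quoted in this article; check each vendor's current plans.
FIREFLIES_MONTHLY = 18.00
FIREFLIES_UPLOAD_CAP_MIN = 8_000      # uploaded minutes included per month
SONIX_PAYG_PER_HOUR = 10.00
SONIX_SUB_MONTHLY = 22.00
SONIX_SUB_PER_HOUR = 5.00
REV_MAX_MONTHLY = 29.95
REV_MAX_INCLUDED_MIN = 20 * 60        # assumption: first 20 hours are included
REV_MAX_OVERAGE_PER_MIN = 0.25


def monthly_upload_cost(hours):
    """Estimate the monthly cost in dollars for `hours` of uploaded audio."""
    minutes = hours * 60
    if minutes > FIREFLIES_UPLOAD_CAP_MIN:
        # The article does not say what Fireflies charges past its cap.
        raise ValueError("usage beyond the Fireflies upload cap is not modeled")
    rev_overage_min = max(0.0, minutes - REV_MAX_INCLUDED_MIN)
    costs = {
        "Fireflies": FIREFLIES_MONTHLY,
        "Sonix (pay-as-you-go)": SONIX_PAYG_PER_HOUR * hours,
        "Sonix (subscription)": SONIX_SUB_MONTHLY + SONIX_SUB_PER_HOUR * hours,
        "Rev Max": REV_MAX_MONTHLY + REV_MAX_OVERAGE_PER_MIN * rev_overage_min,
    }
    # Round to whole cents for display.
    return {name: round(cost, 2) for name, cost in costs.items()}


for hours in (20, 60):
    print(hours, "hours:", monthly_upload_cost(hours))
```

Under these assumptions, 60 hours of uploads comes to $18 on Fireflies, $322 on the Sonix subscription, $600 on Sonix pay-as-you-go, and $629.95 on Rev Max. If most of your audio comes from Zoom meetings rather than uploads, the Rev Max figure drops sharply, since those meetings are covered by the subscription.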
You’ll need to check your usage needs and sign up for free trials of each service to determine which fits best.
Among all our top picks and nearly all other services we tested, the cost can be exceptionally low, both in ratio of value to utility and compared to transcription by people. Manual transcripts cost $0.75 to $2.50 per minute, depending on the level of accuracy required, turnaround time, and the amount of industry-specific jargon in the source material. Services that offer live captioning by people during a meeting charge around $150 to $180 per hour ($2.50 to $3 per minute), with minimums.
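For a sense of scale, the human-transcription rates above can be worked through in a few lines of Python. This is only an illustration of the arithmetic; the function names are ours, and the rates are the ranges quoted in this article, not quotes from any particular provider.

```python
MANUAL_RATE_PER_MIN = (0.75, 2.50)      # dollars per audio minute, low and high
LIVE_CAPTION_PER_HOUR = (150.0, 180.0)  # dollars per hour, low and high


def per_minute(hourly_rate):
    """Convert an hourly rate to a per-minute rate."""
    return hourly_rate / 60


def manual_cost_range(audio_minutes):
    """Low and high cost of a human-made transcript for the given audio length."""
    low, high = MANUAL_RATE_PER_MIN
    return (low * audio_minutes, high * audio_minutes)


print([per_minute(rate) for rate in LIVE_CAPTION_PER_HOUR])  # [2.5, 3.0]
print(manual_cost_range(60))                                 # (45.0, 150.0)
```

A single one-hour meeting transcribed by people therefore runs $45 to $150, more than a full month of any of our three picks at their subscription prices.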
Automated transcription is instantaneous for live events, though the quality may be poorer compared to on-demand or post-meeting processing. Offline transcription may require from seconds to over a minute for each minute of source audio: a one-hour meeting could be ready from 10 to 60 minutes later, depending on the service’s promises and capabilities. We only tested services that allow direct uploads in addition to meeting integration options.
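The turnaround range is simple proportional arithmetic, sketched here in Python. The function name is ours, and the two processing speeds (10 and 60 seconds of processing per minute of audio) are illustrative values chosen to match the 10-to-60-minute range above, not measurements of any service.

```python
def ready_after_minutes(audio_minutes, processing_seconds_per_audio_minute):
    """Estimated wait, in minutes, before an uploaded recording is transcribed."""
    return audio_minutes * processing_seconds_per_audio_minute / 60


# A one-hour meeting at a fast and a slow illustrative processing speed.
print(ready_after_minutes(60, 10))  # 10.0
print(ready_after_minutes(60, 60))  # 60.0
```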
Of several companies focused entirely on AI-based transcription or that offer transcription alongside manual transcription or other services, we opted to examine six closely: Fireflies, MeetGeek, Otter.ai, Rev Max, Sonix, and Trint. (For our winnowing rationale, see “How we chose what to review,” below.) We also considered Temi, owned by and identical to Rev Max except in pricing.
Each service’s cost and limitations were picked apart, along with the range of services they offer and integration with videoconferencing services such as Meet, Teams, Webex, and Zoom.