With advances in machine learning, many services have appeared that convert audio and video recordings into text. Some are free, while others are paid and offer the user more tools. Neural networks for transcribing audio and video content can already do a lot: they distinguish between languages, can edit and format transcripts, and some are trained to understand the specialized speech of doctors or lawyers. Many of them recognize speech with 99% accuracy, and transcription is about ten times faster than a human can manage.
Although the technology is still far from perfect, its adoption has greatly eased the work of people whose professions require constant processing of large volumes of recordings.
This article describes neural network assistants, both free and paid.
Free
Name | Features | Disadvantages |
Speechlogger | Converts voice to text with 84 to 100% accuracy, depending on audio quality; generates subtitles; transcribes audio files; supports many formats, including .mp3, .mp4, .aac, .m4a, .wav, .mpeg; inserts punctuation marks and timestamps | |
Speechpad | Automatically converts dictated speech; edits text with a built-in tool; transcribes video content from YouTube; can work with audio from other browser tabs; allows quick corrections; integrates with Windows, macOS, Linux | Recognizes speech poorly in noisy conditions |
Speechnotes.co | Transcribes dictated text with 90% accuracy; inserts capital letters and punctuation marks and starts new paragraphs via voice commands; supports all file types; sets timestamps; generates summaries; saves the text in the browser, from where it can be printed or sent to a PC or Google Drive | |
Speech to Text | Lets you type text by dictating into a microphone; multilingual; has a built-in editor for simple editing and formatting; exports files in DOC and TXT formats | |
Summarize.tech | Summarizes videos on any topic, in whole or in sections | Processes speech in Russian, but outputs the summaries in English |
Dictation | Designed for creating letters, documents, and emails without typing; works as a speech converter on the website; supports 100 languages; inserts punctuation marks via voice commands; the finished text can be edited, saved to a PC, or sent by email | Does not work with pre-recorded files; low conversion quality |
Paid with a free version
Name | Features | Disadvantages | Free use | Cost |
Speech2Text | Offers API integration; requires no registration; recognizes the voices of multiple speakers; supports 20 languages; high recognition quality and speed; works with files of various formats, including rare ones; fetches content from YouTube links or links to other online hosting services; creates subtitles; has a player with timecodes; the paid version supports teamwork and running conversion on 6 channels simultaneously | No mobile version | 15 min/day | 450 rubles/month for 6 hours; 17600 rubles for unlimited use |
Salut Speech | Supports microphone dictation; transcribes uploaded files; records and transcribes lectures and meetings; can filter out noise; inserts punctuation marks; generates subtitles; available on Telegram | | 100 min/month for individuals | Individuals: 1200 rubles/year for an additional 1000 minutes; legal entities: base rate of 1 kopeck/min |
FollowUp | Transcribes conversations; records tasks, deadlines, responsible persons, and agreements; compiles and distributes summaries; transcription accuracy is 98%; summaries retain 100% of the information | | 100 minutes | 3 rubles/min when buying up to 10 hours; 2.5 rubles/min from 10 to 70 hours; 2 rubles/min from 70 to 140 hours; 1.5 rubles/min from 140 hours |
Yandex SpeechKit | A technology based on the Alice voice assistant, adapted for call centers; recognizes speech in real time; converts files up to 240 minutes long to text; recognizes 10 languages | Cannot edit or format text | | From 267 rubles/month for renting a virtual machine; from 824 rubles for a cluster with a managed database |
Teamlogs | Supports 7 audio and 6 video formats; recognition accuracy is 95%; distinguishes between several speakers; edits and formats the transcript; answers questions about the transcript; extracts key facts; highlights keywords; can draft legal reports | Demands clean recordings and clear voices; understands only Russian and English; recognized text can be downloaded in only three formats: XLSX, SRT, and DOCX | 15 min | 7 rubles/min, or 6 rubles/min when buying more than 5000 minutes |
RealSpeaker | Transcribes audio and video materials up to 180 minutes long; lets you upload files to a personal cloud folder and work with them there; allows editing text without leaving the program interface; supports 38 languages; creates subtitles | Cannot transcribe speech dictated into a microphone; low transcription quality in Russian; weak privacy (all uploaded files are publicly available for 24 hours) | 1.5 minutes | 7 rubles/min |
Wonder Scribe | Converts audio files; no limit on file length or number of files; transcription accuracy is 85%; works with MP3, MP4, WAV, FLAC, AVI files | Supports only Russian | 10 minutes | 300 rubles/hour |
Otter AI | Transcribes online meetings (its intended purpose); connects directly to Google Meet and Zoom; recognizes speech from multiple speakers; exports text to TXT, DOCX, PDF, and SRT (subtitles); works through apps for iOS, Android, Slack, and a Chrome extension | Does not support Russian | Basic package of 300 minutes/month; 30 minutes per recording | Pro plan: $10/month; Business: $20/month; Enterprise: priced individually |
REV.AI | Supports 58 languages; transcribes in real time in 9 languages; detects the dominant language; identifies key topics in the text (English); transcription accuracy is 95%; recognizes names, addresses, and phone numbers well; follows spelling and punctuation rules; generates summaries (English); communicates with the user through context-sensitive translation in 11 languages; exports in multiple formats; sets timestamps | | $8 account credit for recognition upon registration | $0.02/min |
Happy Scribe | Converts audio and video clips online; transcribes recordings; creates subtitles; exports transcription results to any format; no limits on file size or number of files | | Free plan for transcription and subtitle generation | Basic: $10/month for 120 minutes plus export; Pro: $17/month for 300 minutes plus export and support; Business: $29/month for 10 hours and collaboration for three users |
AI Transcription | Transcribes audio and video with 99% accuracy; supports 100 languages; allows recording inside the platform; has a mobile app; video calls can be broadcast at 720p picture quality and 44.1 kHz sound for free and without restrictions; paid features include improved broadcast quality, real-time calls, a video recorder option, and unlimited transcription | | Free plan available | Standard: $19/month; Professional: $29/month; Business: priced individually |
Transcribe | Transcribes lectures, podcasts, interviews, and phone conversations; generates subtitles for YouTube, Facebook, and Vimeo channels; exports text in DOC and TXT formats; accepts uploaded files or dictated text; supports 80 languages | | Trial version | Manual: $20/year; Automatic: $20/year plus $6/hour |
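The tiered FollowUp prices in the table lend themselves to a quick cost estimate. The Python sketch below is illustrative only. It assumes that the per-minute rate of the tier reached applies to the whole purchase, and that a purchase of exactly 10, 70, or 140 hours falls into the cheaper tier; the table does not state either point. The names `FOLLOWUP_TIERS` and `followup_cost` are hypothetical.

```python
# Tiers from the FollowUp row: (minimum hours purchased, rubles per minute).
# Assumption: the rate of the tier reached applies to the entire purchase.
FOLLOWUP_TIERS = [
    (140, 1.5),
    (70, 2.0),
    (10, 2.5),
    (0, 3.0),
]


def followup_cost(minutes: int) -> float:
    """Estimate the price in rubles for buying `minutes` of transcription."""
    hours = minutes / 60
    # Tiers are ordered from the largest threshold down, so the first match wins.
    for min_hours, rate in FOLLOWUP_TIERS:
        if hours >= min_hours:
            return minutes * rate
    raise ValueError("minutes must be non-negative")


if __name__ == "__main__":
    for minutes in (300, 600, 6000, 9000):
        print(f"{minutes} min: {followup_cost(minutes)} rubles")
```

Under these assumptions, 5 hours (300 minutes) is billed at 3 rubles/min, while 100 hours (6000 minutes) is billed at 2 rubles/min. The actual billing rules should be confirmed with the service before relying on such an estimate.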
Paid
Name | Features | Cost |
Whisper | Automatically detects the language (out of 100); high recognition speed; splits the text into paragraphs; inserts punctuation marks; synced with GitHub; exports the finished text | 36 cents/hour |
AI Speech | Transcribes lectures, conferences, and interviews with high accuracy and speed; works with mp3, mp4, wav, flv, avi formats | 3 rubles/min |
TranscribeMe | Transcription accuracy is 99%; follows grammatical rules; suppresses non-verbal noise, which improves the quality of the source recording; exports the transcript to TXT, Word, HTML, PDF, and SRT | $0.07/min |
Deep Scribe | Designed for the medical field; transcribes speech on medical topics; more than 50 options let doctors personalize notes and record patient appointments as they take place | Price is determined after registration |
Conclusion
Some neural networks can work with both voice input and uploaded files, but most still support only one of these modes. The services can be used for real-time transcription or for converting prerecorded content such as negotiations, meetings, and lectures. To transcribe long video files, choose platforms that do not limit recording length.
Many services are trained to insert punctuation, follow spelling rules, break text into paragraphs, and capitalize words. But no matter how advanced a neural network for transcribing audio and video recordings is, its output still cannot be used without editing. Errors still occur, most often because of an unclear recording, a noisy background, or the speaker's poor diction.
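One way to judge how much editing a transcript will need is to compare a short machine-transcribed sample against a manually corrected version. The Python sketch below computes the word error rate (WER), a common speech recognition metric: the number of word substitutions, deletions, and insertions divided by the number of words in the reference. It is a minimal illustration, not the method any of the listed services uses to report accuracy; the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    manual = "the patient has a fever"
    automatic = "the patient has fever"
    print(word_error_rate(manual, automatic))
```

In this sample one of the five reference words is missing from the automatic transcript, so the WER is 0.2. The comparison ignores punctuation handling and only lowercases the text, so a real evaluation would need more careful normalization.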