Many professions involve turning speech into written text. Journalists work with transcripts constantly. Doctors fill out medical records every day and keep patient appointment cards. Court proceedings require a detailed record of everything that happens, since every word matters. Marketing specialists are constantly developing sales concepts and promotional offers. Technology that automatically extracts text from audio has greatly simplified this work in many industries.
Online Voice-to-text Conversion Tools

These are the most affordable services, and they let you transcribe an audio recording into text on the go. The user dictates into a microphone, and a neural network immediately converts the speech into written form. However, online services are still far from perfect: they transcribe quickly, but they make many mistakes. Recognition quality depends heavily on side factors such as background noise, the speaker’s accent, or poor diction.
Not all of these tools format their output, so the result may be a solid block of text without paragraphs or punctuation. It will need manual editing, but that still takes much less time than transcribing the original source by hand while repeatedly rewinding through fragments of the recording.
Name | Pros | Cons |
Google Docs | Transcribes dictation, with no extra features | Speech must be loud and clear. Background noise greatly reduces quality |
Speech to Text BOT | Recognizes speech dictated into the microphone. Adds punctuation and capital letters. Understands several dozen languages. Offers editing, copying, and downloading | Does not work with audio or video recordings |
Speechpad | Notepad for voice input. Integrates with Windows, macOS, and Linux. Supports 15 languages. High recognition accuracy. Includes tools to suppress background noise. Adds timestamps | Does not add punctuation |
Summarize.Tech | Designed for YouTube videos. Produces a short summary of a video (several paragraphs). Adds timestamps | English only. Does not produce a verbatim transcript |
Yandex SpeechKit | Recognizes short dictations up to 60 seconds long | Does not edit, format, or punctuate |
Applications for converting speech to text on mobile devices
These apps are not designed to transcribe long recordings, only short ones such as thoughts and ideas. Think of them as first aid: when your usual work equipment isn’t at hand but you urgently need to send something to someone or jot it down so you don’t forget it, they help out. They let the user dictate a short message and send it to any email address.
Name | What it does |
Google Keep | Converts dictation to text. Saves the finished text, which can be edited and shared by email or on social media. Syncs across all devices signed in to the same account. |
Dictation for iOS | Can transcribe long recordings. Supports 40 languages and can immediately translate recognized speech into the target language. Lets you edit, copy, and share the text by email or on social media. Syncs with all devices on the same account via the cloud |
Speechnotes for Android | Converts dictation to text with high recognition accuracy. Has a built-in keyboard that can be used alongside dictation. Includes editing tools. The finished text can be saved, copied, and shared. The core features are free, and for a nominal $1.50 per month you can add custom keyboard shortcuts and insert frequently used words or phrases. |
Tools for automatic audio and video transcription
Automatic speech recognition (speech-to-text) technology helps capture large volumes of audio, such as conferences and meetings, and convert them into text. It also helps doctors, lawyers, journalists, and teachers in their work, and makes studying easier for students.
Unfortunately, most programs only work well under favorable conditions, and transcription quality drops when those conditions are not met. For example, many services struggle to recognize dialogue against background noise, and they misunderstand speakers who have an accent or poor diction. Developers have yet to overcome these shortcomings. Even so, the programs make the work much easier.
Name | Features | Disadvantages |
FollowUp | Records and recognizes conversations of any length with 98% accuracy. Captures agreements, tasks, responsible persons, and deadlines. Produces a summary that fully preserves the meaning. Sends the summary to participants | |
Speech2Text | Transcribes accurately. Supports 20 languages. Generates subtitles. Adds timestamps. Works with both files and links. Distinguishes between multiple speakers | No mobile version |
Speechlogger | Recognition quality above 85%. Creates subtitles. Transcribes audio files in various formats, including .mp3, .mp4, .aac, .m4a, .wav, and .mpeg. Adds timestamps and punctuation | |
Teamlogs | Recognition quality of 95%. Fast processing of source files. Produces summaries. Creates legal reports. Can edit and format text. Adds timestamps | Supports only Russian and English. Exports text only as XLSX, SRT, or DOCX |
RealSpeaker | Transcribes source files up to 180 minutes long. Lets you work with files by uploading them to your folder in the cloud. Lets you edit text without leaving the program. Supports 38 languages. Creates subtitles | Does not transcribe live dictation. Recognizes Russian poorly. Weak privacy protection |
Manual transcription of audio and video recordings
In many situations neural networks can’t be trusted with transcription, and it has to be done manually: for example, when negotiations must remain strictly confidential. Manual work is also needed when a recording’s quality is so low that the machine “hears” nothing. Only a human can cope in such cases, because a person understands the context and can reconstruct a sentence even when some words are completely inaudible.
Professional transcribers also rely on AI, however, for example the Zapisano service. It is effective for transcribing audio if the user types fast and spells flawlessly. A person listens and types while the neural network simultaneously cleans up the text, removing slang, filler words, repetitions, and slips of the tongue. The service can also translate into other languages and create subtitles.