Beyond academia, computer labs, and localization teams, language AI is making its way into the everyday lives of ordinary people — those without a clear connection to the language industry.
As defined in the Slator Pro Guide: Language AI for Consumers, language AI is an umbrella term covering any AI-enabled conversion or generation of speech or text. This includes machine translation (MT), automated captions, AI dubbing, speech translation, and AI text generation, to name just a few.
What is particularly striking about language AI’s evolution is that it is now widely used at both ends of the production cycle. Content creators can use language AI tools to make their content more accessible, and end-users, viewers, and listeners often do the same on their end. Talk about leveling the playing field!
Access a Website in Any Language
One of the most straightforward examples of widely accepted language AI is the little pop-up window that appears on webpages in an “unfamiliar” language (according to the computer’s or browser’s settings).
Depending on the website, this low-stakes content is a prime target for language AI, since even a rough translation is an improvement over content that was previously inaccessible.
Major brands operating in multiple markets have long offered translated and localized versions of their websites, but a newer AI-powered development is the web browser extension or plugin that can detect content the user may want translated.
These tools run web copy through an MT model and present the translated output in its original location on the webpage. Performance, in terms of quality and the breadth of languages covered, depends on the product, with Google Translate and Microsoft Translator typically supporting the highest numbers of languages.
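The mechanism described above can be sketched in a few lines. This is a toy illustration, not any browser’s actual implementation: the `translate()` function is a stand-in for a real MT API call and is stubbed with a tiny dictionary so the example runs on its own.

```python
# Minimal sketch of a browser-style page translator: each visible text
# segment is run through MT and put back in its original position.

GLOSSARY = {"Hola": "Hello", "mundo": "world"}

def translate(text: str, target_lang: str = "en") -> str:
    """Stub MT call: word-by-word lookup (real systems use neural MT)."""
    return " ".join(GLOSSARY.get(word, word) for word in text.split())

def translate_page(segments: list[str]) -> list[str]:
    """Translate each text segment, preserving its place on the page."""
    return [translate(seg) for seg in segments]

page = ["Hola mundo", "mundo"]
print(translate_page(page))  # ['Hello world', 'world']
```

In a real extension, the segments would be the text nodes of the page’s DOM, and the stub would be replaced by a call to an MT service.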
Transcription for Podcasts, Interviews, Vlogs, and More
AI-enabled transcription is now one of the most ubiquitous examples of language AI for the people (if not by the people).
The multilingual capabilities of automated transcription depend on the tools or models used and on audio quality, but most rely on a workflow that combines automatic speech recognition (ASR), speech-to-text APIs, and machine translation.
Two factors have simultaneously encouraged the widespread adoption of AI transcription, particularly for the plethora of podcasts, interviews, and vlogs flooding people’s feeds: an increased focus on accessibility and a shift in media consumption.
The bottom line: Many viewers and listeners today opt to skim a transcript instead of watching a video, or prefer to watch a video with captions or subtitles. Some platforms have caught on to these trends and now present videos with captions or subtitles displayed by default.
Turn Written Scripts Into Audio with Voice AI
Conversely, and perhaps counterintuitively, demand for audio content is rising at the same time, as reflected in the rapid growth of podcasts over the past five years and in the many online articles that offer readers the option to listen to the text instead of reading it.
But some content creators now regularly skip the recording studio altogether, thanks to text-to-speech (TTS) tools. A major recent advancement is voice cloning, which starts with the user recording a relatively brief sample of their speaking voice. Voice cloning tools can then produce a synthetic voice that mimics the gender, age, cadence, accent, emotion, and other traits of the original speaker.
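The two-step workflow described above, enroll a voice sample, then synthesize new speech with the resulting voice profile, can be sketched as follows. Both functions are hypothetical stand-ins for a real voice-cloning service and are stubbed so the example runs:

```python
# Toy voice-cloning workflow: enrollment produces a voice ID, which is
# then used to render arbitrary text in that voice.

VOICES: dict[str, bytes] = {}

def enroll_voice(sample: bytes) -> str:
    """Stub enrollment: a real service would extract voice traits here."""
    voice_id = f"voice-{len(VOICES)}"
    VOICES[voice_id] = sample
    return voice_id

def synthesize(voice_id: str, text: str) -> bytes:
    """Stub TTS: a real service would render audio in the cloned voice."""
    if voice_id not in VOICES:
        raise ValueError("unknown voice")
    return f"[{voice_id} says: {text}]".encode()

vid = enroll_voice(b"thirty seconds of recorded speech")
print(synthesize(vid, "Welcome back to the channel!"))
```

The point of the split is that enrollment happens once, after which the creator can generate any amount of narration without re-recording.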
Throwing MT into the mix means that the speaker can now appear to communicate in languages they do not actually speak. The appeal, of course, is that a multilingual voice clone will bring a creator’s content to more markets and, consequently, more fans.
Slator Pro Guide: Language AI for Consumers
This 16-page guide explores how consumers are using AI to generate, translate, edit, and dub speech and text in multiple languages.
Dub a Short-Form Video
This is truly the TikTok generation’s moment, and dubbing for short-form videos epitomizes the unique combination of technological know-how and creativity creators can employ from their own home studios.
Standard AI dubbing tools consist of a speech-to-text component, a translation tool, and a voice-generation function; some include a lip-sync component for matching lip movements to the new audio. Here, too, improvements in voice cloning have propelled AI-enabled dubbing from science fiction to reality.
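Chaining the three components named above gives an end-to-end dubbing sketch. All three steps are hypothetical stubs standing in for real services, so the chain runs on its own:

```python
# Toy AI dubbing pipeline: speech-to-text -> translation -> voice generation.

def speech_to_text(audio: bytes) -> str:
    """Stub ASR component."""
    return "welcome to my channel"

def translate(text: str, target_lang: str) -> str:
    """Stub translation component."""
    table = {"welcome to my channel": "bienvenue sur ma chaine"}
    return table.get(text, text) if target_lang == "fr" else text

def generate_voice(text: str) -> bytes:
    """Stub voice-generation component."""
    return f"[audio: {text}]".encode()

def dub(audio: bytes, target_lang: str) -> bytes:
    """Run the full pipeline: transcribe, translate, re-voice."""
    return generate_voice(translate(speech_to_text(audio), target_lang))

print(dub(b"...", "fr"))  # b'[audio: bienvenue sur ma chaine]'
```

An optional lip-sync stage would take the generated audio plus the original video frames as input; it is omitted here for brevity.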
YouTube, one of the most recognizable video platforms on the planet, has designed its own endless scroll of short videos, possibly in response to TikTok’s runaway success. Now, YouTube is reportedly also testing AI-powered dubbing for several languages with hundreds of creators.
Interpret a Conversation with Hand-Held Devices
No discussion of AI-enabled anything would be complete without the obligatory reference to the Babel fish, the holy grail regularly invoked in praise of new translation gadgets.
While the high price point may keep many would-be buyers away for now, each standalone device has had its moment in the spotlight, including devices from Pocketalk and iFLYTEK. A workaround for the time being is to use a translation app from Microsoft or Apple on a smartphone.
And users have already been blown away by these tools, such as the OpenAI speech translation demo that enabled speakers of Italian and English to communicate without a common language. Quality depends partly on the language pair as well as other factors, and while none of these tools are perfect, they seem to be getting pretty close.
Obtain a copy of Slator’s Pro Guide: Language AI for Consumers for a concise and easy-to-read guide to Language AI technology and its adoption, featuring 10 key use cases.