A viral promo video featuring AI-enabled dubbing, lip-synching, and machine translation shows just how quickly language AI products built on GPT-4 and a host of other multimodal, generative AI technologies are proliferating.
Throughout the clip, the dubbed dialogue and intonation match the two men's mouth movements and facial expressions fairly well, demonstrating that consumer-grade machine dubbing has come a long way from the days when target-language expansion, contraction, and tonality made it obvious that a video was dubbed.
Prady Modukuru, the developer, disclosed part of his technology stack in a Twitter post: GPT-4 for translation, ElevenLabs for voice training and text-to-speech, and Wav2lip-2 for lip-synching. The latter is Synchronicity Labs’ closed-beta offering, which the company claims can sync any video to any audio in any language, with no training required.
GPT-4 itself launched only in March 2023, and hundreds of startups (including those on Slator’s Language AI 50 Under 50 list) are now attempting to capitalize on the use cases opened up by multimodal, generative language AI.
Modukuru and other developers have taken Wav2lip-2 from open experimentation on GitHub to a viable product. Generative lip-synching appears to have been in the works since about 2020, as a demo video of a previous iteration shows: in it, the fictional Marvel character Tony Stark speaks the same phrase in multiple languages with almost flawless lip-synching.
I built the simplest way to convert your video to other languages.
all you need is a youtube link. no install required.
— Prady (@therealprady) July 16, 2023
Over the next year or two, it will become clearer which language AI products are merely “a thin layer on top of ChatGPT,” as Afore Capital’s Gaurav Jain put it, and which offer a durable technological edge that can sustain itself in the marketplace.