The algorithm is called Speech2Face and was part of a research paper first published in 2019. A demo is available online if you're curious to check it out for yourself.
Faces seem to be more accurately recreated from longer audio clips, which shouldn't come as much of a surprise. The model was trained on millions of YouTube videos, learning "audio-visual and voice-face correlations" from a wide range of samples.
It's still a work in progress, of course, so it isn't accurate every time. But the potential of a system that registers voices and quickly identifies individuals could be huge, particularly for legal systems and surveillance companies.
Researchers behind the tech are adamant that it is only for scientific purposes, but we already know that larger companies (Facebook, Google, Amazon, and a bunch more) are very interested in advanced Metaverse programmes, Web 3.0, and harvesting user data. The ability to identify anyone this quickly could be devastating in the wrong hands.
DIY Photography also points out that software like this could put the identities of influencers at risk, especially those who keep their faces hidden. TikTokers or YouTubers who make a deliberate effort to mask their identity could be discovered through audio snippets of their voices, taken from any clip they've ever posted.
Still, that's likely far off in the future, as the algorithm is primitive at present. It seems we'll have to accept a future where AI and deepfake technology blur the line between real and artificial, with misinformation likely to remain rampant and harder to stamp out.
Detecting identities through brief voice clips is simply another step along an inevitable path. Let's just hope things don't spiral out of control.