Google’s Lumiere generates realistic AI videos from text prompts

AI video is fast turning from uncanny valley to genuinely realistic, and Google’s Lumiere is the most sophisticated text-to-video generator we’ve seen to date.

Evoking a sense of awe – and a hefty dose of unease – Google recently exhibited how sophisticated AI video has become in just a few years of development.

In the same way that text-to-image generators like Bing Image Creator, DALL-E, and Midjourney can create original images from a single-line prompt, Google’s ‘Lumiere’ application can turn our wildest ideas into fully rendered five-second videos.

Other text-to-video generators are already available, granted, but Google’s attempt is the first to really nail an accurate portrayal of movement to a near-CGI standard.

It achieves this by establishing a base frame and using its highly touted STUNet (Space-Time U-Net) technology to autonomously establish where and how items in the image should move. Once selected, objects within that initial frame then comprise several layers of their own that flow into each other seamlessly.

Lumiere is able to generate 80 frames per clip, compared to the previous maximum of 25 achieved by its closest competitor, Stable Video Diffusion. Though several early results released by Google have a touch of artificiality about them, the leap in overall quality since its 2022 demo is staggering.

Beyond text-to-video, there is also image-to-video generation, which will bring a still picture to life; stylised generation, which can create videos in a specific visual style; and a cinemagraph setting able to animate a specific portion of an existing image – like flowing water, a flickering fire, or smoke from a train engine, for instance.

In terms of market strategy, the late arrival of Lumiere falls in line with Google’s fashionably late policy. Since the early iteration of its generative language tool Bard flopped last year, the tech giant has quietly developed its multimodal vision for generative AI in the background.

Its latest announcement closely follows a showcase for Google’s Gemini language model, which is tipped to make a late challenge for ChatGPT’s crown as the benchmark for the sector.

Looking beyond the commercial buzz for video AI, it would be remiss to ignore the technology’s potential for misuse as it becomes harder to distinguish fictional works from real life content.

The ongoing debacle involving sexually explicit depictions of Taylor Swift and her likeness using text-to-image apps could be just the tip of the iceberg if text-to-video takes off on a similar scale.

Google says it is creating safeguards to ensure fair use of Lumiere, but the paper’s authors haven’t specified exactly how incidents will be prevented. We’re keen to get our hands on the technology, but not if it will open a larger can of worms.