Just ‘Imagen’ it: Google AI tech will create HD video from your text prompt


Within a week of Meta launching Make-A-Video, Google has unveiled an artificial intelligence (AI)-based video generator of its own. The Google text-to-video AI technology is called Imagen Video and it can produce high-definition (HD) videos from a text prompt.

How Imagen Video works

All you need to do is give Imagen Video a text description of what you want, such as “A teddy bear running in New York City.” The system first generates a 16-frame video at three frames per second (FPS) with a resolution of 24×48 pixels. It then “predicts” additional frames and upscales the resolution. The final product is a 128-frame, 24-FPS HD video at 1280×768 pixels.
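As a rough check on these figures, the arithmetic can be sketched in a few lines of Python. The dictionaries and helper functions below are purely illustrative; they only restate the numbers quoted above and are not part of Imagen Video or any Google API.

```python
from fractions import Fraction

# Figures quoted in the article; the names here are illustrative assumptions.
BASE = {"frames": 16, "fps": 3, "pixels": 24 * 48}
FINAL = {"frames": 128, "fps": 24, "pixels": 1280 * 768}


def clip_seconds(spec):
    """Clip duration in seconds, kept as an exact fraction."""
    return Fraction(spec["frames"], spec["fps"])


def growth(base, final, key):
    """How many times larger `key` is in the final clip than in the base clip."""
    return Fraction(final[key], base[key])


if __name__ == "__main__":
    print("base duration (s): ", float(clip_seconds(BASE)))
    print("final duration (s):", float(clip_seconds(FINAL)))
    print("frame count growth:", growth(BASE, FINAL, "frames"))
    print("frame rate growth: ", growth(BASE, FINAL, "fps"))
    print("pixels per frame:  ", BASE["pixels"], "->", FINAL["pixels"])
```

Both clips last the same 16/3 seconds (a little over five seconds). The frame count and the frame rate each grow eightfold, so the predicted frames make the motion smoother rather than the video longer.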

Imagen Video is an extension of Google’s text-to-image system, Imagen, launched in May. In the recently published paper “Imagen Video: High Definition Video Generation with Diffusion Models,” Google claims that Imagen Video has a “high degree of control” and world knowledge. It can generate videos and text animations in a range of artistic styles, and it shows an understanding of 3D objects.

Not yet perfect

However, the Google AI technology has a lot of room for improvement. The video clips are shaky and fuzzy, and the edges of objects are blurry, much like the clips from Make-A-Video. But with the way AI-based tech is progressing, it may only be a matter of months before you can create crisp HD videos from just a text prompt.

To take things forward, the Imagen Video team plans to work with the researchers behind Phenaki, another text-to-video system from Google. Phenaki can turn long, detailed prompts into two-minute videos, though the results are of low quality.

There are safety concerns, too. Imagen Video could be used to produce violent or sexually explicit clips. Google has decided not to release Imagen Video “until these issues are resolved.”

A new trend

AI-based text-to-image systems, such as DALL-E, Midjourney, and Stable Diffusion, took the digital world by storm over the last year. Text-to-video systems are not new either. Earlier this year, researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, a program that can convert text into high-quality short clips. But Imagen Video takes things a notch higher, with the ability to animate captions.

(With agency inputs)
