Artificial intelligence is quickly advancing in the field of video generation. That could have a profound effect on our social media feeds one day.
AI’s creative abilities are outstripping its driving skills. While self-driving car technology is going nowhere, there’s been a remarkable explosion in research around generative models, or artificial intelligence systems that can create images from simple text. In just the past week, AI researchers from Meta Platforms Inc. and Alphabet Inc.’s Google have taken an extraordinary leap forward, developing systems that can generate videos from just about any text prompt one can imagine.
The videos from Facebook-parent Meta look like trippy dream sequences, showing a teddy bear painting flowers or a horse with distended legs galloping over a field. They last about one or two seconds and have a glitchy quality that betrays their source, but they’re still remarkable. The videos generated by Google, of coffee being poured into a cup or a flight over a snowy mountain, look especially realistic.
Google has also built an even more impressive second system called Phenaki that can create longer videos, lasting two minutes or more. Here’s an example of the prompt Google used for one:
“Lots of traffic in futuristic city. An alien spaceship arrives to the futuristic city. The camera gets inside the alien spaceship. The camera moves forward until showing an astronaut in the blue room. The astronaut is typing in the keyboard. The camera moves away from the astronaut. The astronaut leaves the keyboard and walks to the left…”
That’s less than a third of the entire prompt, which reads almost like a movie script with commands such as “camera zooms in.” And here’s the resulting clip, posted on Twitter by Dumitru Erhan, one of Phenaki’s creators at Google Brain: