The most charming video that’s come across my newsfeed in recent days is of a fluffy cat wearing a pirate hat and riding around someone’s lounge on a robotic vacuum cleaner. The video has the familiar, slightly shaky look of a candid clip recorded on someone’s smartphone.
Only, the cat and the vacuum cleaner aren’t real, and neither is the house. The 12-second clip was created by Sora, the new AI-powered text-to-video generator from OpenAI, the company behind the ChatGPT chatbot.
Sora uses a so-called diffusion model, a type of AI that works by examining a vast number of videos and learning to identify the objects and actions in them. It can then assemble completely new videos in response to text prompts. Sora understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
For instance, the prompt for that cat video was: “An adorable kitten pirate riding a robot vacuum around the house.” That’s exactly what Sora delivered.
Text-to-video AI tools are not new, but Sora is winning rave reviews for the incredibly realistic and smooth short video clips it can create. It isn’t limited to Finding Nemo-style animation; it can render lifelike characters, settings and objects.
As I scrolled through these early Sora clips, it dawned on me that our perception of what is real and what is manufactured will be upended in the next couple of years as AI video is fine-tuned.
If, like me, you’ve been shooting your own movie script in your head for years, Sora could eventually let you bring it to life with the finesse of Sir Peter Jackson, minus the massive visual-effects budget. Its clips are currently limited to 60 seconds, given the intensive computing power required to generate them. But video is just a sequence of still images, and given access to enough processing power, Sora will be able to piece together feature-length films.
Sora is still a bit glitchy. In one video of a New York street, a yellow taxi disappears behind a pedestrian and re-emerges painted grey. But soon you’ll simply be able to prompt Sora to “fix the taxi cabs in the background”.
Sora is in limited release while OpenAI “red-teams” the application, probing it for bugs and for ways it could be misused.
Although Sora is set to revolutionise DIY film-making, its potential as a tool for spreading misinformation is also very real. “This is the reason we are not deploying the system yet,” Mira Murati, OpenAI’s chief technology officer, told the Wall Street Journal.
Then there’s the contentious issue of who owns the images Sora is trained on and whether content owners have the right to opt out or be paid for licensing their work. Murati said Sora’s videos are based on “publicly available data and licensed data”, but was vague on the details. “I’m not sure; I’m not confident about it,” she told the WSJ when asked if YouTube or Facebook videos are used.
The European Union’s new AI Act stipulates that deepfake images and videos created by programs like Sora must be clearly labelled as manipulated content. Watermarking AI-generated videos will help identify Sora’s work, but in the world of social media, where videos go viral in seconds, will anyone pause to check whether the clip they are sharing is real or a figment of Sora’s imagination?