The most charming video that’s come across my newsfeed in recent days is of a fluffy cat wearing a pirate hat and riding around someone’s lounge on a robotic vacuum cleaner. The video has the familiar, slightly shaky look of a candid clip recorded on someone’s smartphone.
Only, the cat and the vacuum cleaner aren’t real, and neither is the house. The 12-second clip was created by Sora, the new AI-powered text-to-video generator from OpenAI, the company behind the ChatGPT chatbot.
Sora uses a so-called diffusion model, a type of AI that works by examining a vast number of videos and learning to identify the objects and actions in them. It can then assemble completely new videos in response to text prompts. Sora understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
For instance, the prompt for that cat video was: “An adorable kitten pirate riding a robot vacuum around the house.” That’s exactly what Sora delivered.
Text-to-video AI tools are not new, but Sora is winning rave reviews for the incredibly realistic and smooth short video clips it can create. It isn’t limited to Finding Nemo-style animation; it can render lifelike characters, settings and objects.
As I scrolled through these early Sora clips, it dawned on me that our perception of what is real and what is manufactured will be upended in the next couple of years as AI video is fine-tuned.
If, like me, you’ve been shooting your own movie script in your head for years, Sora could eventually let you bring it to life with the finesse of Sir Peter Jackson, minus the massive visual-effects budget. Its clips are currently limited to 60 seconds, given the intensive computing power required to generate them. But video is just a sequence of still images, and given access to enough processing power, Sora will be able to piece together feature-length films.
Sora is still a bit glitchy. In one video of a New York street, a yellow taxi disappears behind a pedestrian and re-emerges painted grey. But soon you’ll simply be able to prompt Sora to “fix the taxi cabs in the background”.
Sora is in limited release while OpenAI “red-teams” the application, probing it for bugs and for ways it could be misused.
Although Sora is set to revolutionise DIY film-making, its potential as a tool for spreading misinformation is also very real. “This is the reason we are not deploying the system yet,” Mira Murati, OpenAI’s chief technology officer, told the Wall Street Journal.
Then there’s the contentious issue of who owns the images Sora is trained on and whether content owners have the right to opt out or be paid for licensing their work. Murati said Sora’s videos are based on “publicly available data and licensed data”, but was vague on the details. “I’m not sure; I’m not confident about it,” she told the WSJ when asked if YouTube or Facebook videos are used.
The European Union’s new AI Act stipulates that deepfake images and videos created by programs like Sora must be clearly labelled as manipulated content. Watermarking AI-generated videos will help identify Sora’s work, but in the world of social media, where videos go viral in seconds, will anyone pause to check whether the clip they are sharing is real or a figment of Sora’s imagination?