A fake video of Obama talking went viral several years ago.
The magic of CGI was once the preserve of Hollywood. Now anyone can digitally manipulate videos to make people believe something that isn't true. Danny Fortson reports on a growing problem.
Hany Farid knows Barack Obama better than anyone — except, perhaps, the former president's wife. He has dissected every tic, every subtle sharpening of his crow's feet, his gaze towards the floor that inevitably comes before he delivers bad news. In all, Farid has catalogued 576 hours of footage of Obama speaking — equal to 3½ weeks of video — sliced it into 10-second chunks and studied it as if it were the Zapruder footage of JFK's assassination.
Farid is not a stalker. The professor at the University of California, Berkeley, is one of the world's foremost digital forensics specialists, and his deep dive into Obama's face is part of a frantic effort to combat a menace as devastating as anything since the dawn of the internet: deepfakes, or videos altered using artificial intelligence (AI), which show people doing and saying things they haven't.
Since the technology emerged in a dark corner of the web two years ago, it has advanced in leaps and bounds. What was once the preserve of Hollywood studios is now available to anyone with a decent graphics processor and internet connection.
"Anything can be faked," laments Farid, sitting in his spartan office inside the School of Information, a grand brick building in the shadow of Berkeley's famed bell tower. His work on Obama, a common target, led him to invent a new detection method. But it is an uphill battle. "This technology is something you can now just download from GitHub [a popular online community that hosts free open-source software]. It will run for a few hours or a day, and it will just generate it for you automatically. Eventually it's going to be something you can just download in an app," Farid says. "That's where we're headed."
In China, one already exists. Zao, which was released earlier this year, transplants a user's face to surprising effect onto actors in famous films: Leonardo DiCaprio in Titanic, for example.
Deepfakes are most commonly employed, however, to place the faces of A-list female stars such as Gal Gadot, Emma Watson and Ariana Grande onto hardcore pornography. According to research from Deeptrace Labs, a Dutch cyber-security company, about 96% of all deepfake videos are pornographic. But the potential uses reach beyond the seedy side of the web and are already leaking into politics, society and even business.
In the past few months, videos have appeared online to showcase the technology's power. Boris Johnson was seen to endorse Jeremy Corbyn in the election; the US comedian Bill Hader morphed into Tom Cruise on a late-night talk show; and Mark Zuckerberg confessed his grand plan to manipulate the masses. "The more you express yourself," said the Facebook billionaire in a doctored video from June, "the more we own you."
Faked videos themselves are only part of the problem. The growing ease with which they are made dents our faith in a much deeper way. Andrew Gully, a technical research manager at Jigsaw, the division of Google tasked with fighting the darkest elements of the internet, explains: "The most dangerous, insidious thing about deepfakes is not necessarily whether people will believe them or not, it's the fact that they hack everybody's understanding of what is true and what is not."
The result is what many in the field refer to as the "liar's dividend": anyone can plausibly claim that any recording — a video that would otherwise end a political campaign or career, for example — is fake. "Our historical belief that video and audio are reliable records of reality is no longer tenable," writes Giorgio Patrini, the founder of Deeptrace Labs.
And so a quiet arms race is unfolding. On one side is a rapidly growing community of hobbyists, developers, fraudsters and political opponents who recognise the power of deepfakes to swing an election, embarrass a lover or manipulate financial markets. On the other side are artists, academics, Big Tech platforms and civil society organisations that are racing to come up with tools — and laws — to counter the problem.
Dan Coats, then the US director of national intelligence, warned this year that the technology will surely be deployed in the misinformation wars, "to augment influence campaigns directed against the United States and our allies and partners". It is hard to imagine that the American war machine will not be availing itself of such a powerful cyber-weapon. Indeed, it has already tried.
Misinformation is not new, nor is using the tools of the day to disseminate it. The newspaper baron William Randolph Hearst reportedly told an artist working for his New York Journal back in 1897: "You furnish the pictures and I'll furnish the war." In the 1990s, Photoshop made it easy to doctor pictures. Russian bot armies helped sow division via fake social media accounts in the 2016 US presidential election.
Farid argues, however, that the deepfake phenomenon is different. "Video was the last bastion of trust," he says. "Ten years ago, if in a court of law you presented me with a three-minute video of somebody doing something, I'd say it was nearly impossible to fake. I can't say that any more. How do you have democracy if you don't have shared facts?" What makes it even more dangerous, however, is social media, which provides a distribution platform that is unique in human history. "We've got 1bn uploads to Facebook a day, 500 hours of video uploaded to YouTube every minute, a couple of billion tweets a day," Farid says. "The landscape is very, very complicated."
Adam Schiff, the Democratic congressman who led the Trump impeachment hearings, recently warned developers of deepfake technology: "This is a Pandora's box you're opening."
It all began on Reddit. On November 2, 2017, a user with the handle "deepfakes" created a forum on the social media and discussion group website and uploaded pornographic videos with the digitally transplanted faces of female celebrities such as Scarlett Johansson. Reddit took down the forum three months later, but Pandora's box had been cracked.
Cottage industry
Today, deepfake development is a cottage industry. According to Deeptrace Labs, more than 100,000 people spread across 20 known deepfake forums are toiling away, day in, day out, uploading code libraries for free, allowing others to tweak, improve and share algorithms that are, at an astounding speed, getting better at automatically crafting faked videos. Portals have cropped up to offer custom deepfakes for as little as $3. All they require is 250 pictures of the chosen person and two days of processing. A customised voice-cloning service sells synthetic audio for $10 for every 50 words. We're all vulnerable, explains Britt Paris, an assistant professor of informatics at Rutgers University, New Jersey. "With thousands of images of many of us online, in the cloud and on our devices, anyone with a public social media profile is fair game to be faked."
In practical terms, it is almost always women who are targeted. "That the new technology's first notable amateur output depicts one woman engaged in a sex act without her consent highlights how the technology can be wielded to harass or harm individuals," Paris says. There are a growing number of cases where "regular people" are being targeted. The Indian investigative journalist Rana Ayyub, for example, was the victim of a brutal online smear campaign that saw her appear in a faked pornographic video.
Even to one of its founding fathers, the speed with which deepfake technology is being democratised is surprising. Hao Li, a professor at the University of Southern California who has been dubbed the world's "top deepfake artist" by the MIT Technology Review, was drawn into this world by tragedy.
On November 30, 2013, Paul Walker, the square-jawed star of the Fast & Furious film franchise, was driving at nearly 100mph when he ploughed into an electricity pylon in Santa Clarita, California. His Porsche Carrera GT nearly sheared in two and burst into flames. He and his passenger died, and the Hollywood producers had a problem.
Shooting for the latest instalment of the film series, Furious 7, was only half complete. So they called Li, who had begun to make a name for himself for his ability to digitise faces. "It was a cool project," Li recalls. "There were already a lot of people making movies with realistic digital humans, but the thing that was really new was Walker was young, people knew what he looked like and the settings weren't dark. It was always pretty bright." Li, who wears his bleached hair in a mohawk that cascades down the side of his head, had the added challenge of creating a digital Paul Walker that would be convincing to audiences watching the film on 90ft cinema screens.
In short, he had little room for error. So Li assembled a team and went about developing an on-set facial-tracking system to composite Walker's face onto the body of one of his two brothers or a stuntman acting out his part. He created new ways to get algorithms to do what previously would be painstakingly carried out by animators.
It worked. Li's digital confection was in more than 200 scenes, and no one was the wiser. Furious 7 grossed US$1.5bn. "You need a project like that to show it is possible. You needed a budget like that," he says.
Creating a walking, talking digital likeness of Walker took months and millions of dollars. The Star Wars franchise has since been able to regenerate the characters played by Peter Cushing and Carrie Fisher. Today, many of the core technologies that Li developed have been refined and are now out in the wild, available for free download.
Big Tech is belatedly scrambling to rein it all in. In September, Google Research and the Google division Jigsaw released to the public hundreds of deepfake videos featuring paid actors that the companies themselves had produced. The goal, says Jigsaw's Gully, was to provide researchers with material on which to train detection tools. "We have a pretty strong geopolitical nexus to a lot of the work we do," Gully says. "With deepfakes in particular, the potential malicious use cases became clear."
Dessa, a Toronto-based AI start-up, has created one of the world's best synthetic voice systems. Its computer-generated Sir David Attenborough can wax lyrical about the Serengeti, or tell you to wash the dishes. But what really got Dessa noticed was fakejoerogan.com, a website that plays algorithmically manufactured audio clips that are indistinguishable from the bombastic podcast host Joe Rogan.
Dessa pulled it off by training its RealTalk speech generator with hours of Rogan's shows. In a statement, Dessa co-founder Ragavan Thurairatnam says it was an attempt to "get the public to take notice of just how realistic deepfakes were becoming, and to understand the tremendous implications at hand for society as a result". He adds: "The unfortunate reality is that, at some time in the near future, deepfake audio and video will be weaponised."
Thurairatnam speaks from experience. One of Dessa's biggest fans is the CIA, which asked to buy the technology. What it wanted to do with it was not clear. Dessa politely turned the US spy agency down.
Washington is not just viewing it as a weapon; it is putting up defences as well. In 2016, the Defense Advanced Research Projects Agency (Darpa), the Pentagon's research arm, launched Media Forensics, or MediFor, to fund researchers working on ways to catch manipulated imagery. The initiative predated deepfakes, but the technology is now its primary focus.
Facebook is also tackling the problem, announcing in September the US$10m Deepfake Detection Challenge in partnership with Oxford University, MIT, UC Berkeley and other esteemed academic institutions. Under the scheme, it too will create a dataset of faked videos, offering cash prizes for participants who come up with the best ways of exposing them. Farid applauds the efforts, but says such schemes point to the gravity of the situation. "What do you do when you can't solve a problem? You create a dataset, and have someone else solve it," he says.
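For researchers who do take up such datasets, the usual starting point is a frame-level classifier: a network shown thousands of face crops labelled real or fake and asked to tell them apart. The sketch below is a minimal, hypothetical illustration of that idea; the folder paths, the tiny network and the training settings are placeholder assumptions, not any challenge's actual baseline.

```python
# Minimal sketch: train a small CNN to label individual face crops real or fake.
# Assumes a folder "frames/train" with subfolders "real/" and "fake/" (hypothetical paths).
import torch
import torch.nn as nn
from torchvision import datasets, transforms

tf = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
train = datasets.ImageFolder("frames/train", transform=tf)   # labels come from subfolder names
loader = torch.utils.data.DataLoader(train, batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),  # one logit: fake or not
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):
    for images, labels in loader:
        logits = model(images).squeeze(1)
        loss = loss_fn(logits, labels.float())
        opt.zero_grad(); loss.backward(); opt.step()
```

In practice, entrants use far deeper networks and video-level cues, but the shape of the problem is the same: it is only as good as the faked examples it has been shown, which is exactly why the platforms are manufacturing those examples themselves.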
Indeed, despite efforts around the world, the ability to produce deepfakes is evolving far faster than our ability to uncover them. Nick Dufour of Google Research likens it to the Red Queen hypothesis in biology, which explains how species stave off extinction. "When you have two species interacting in the same environment, they're always co-evolving. One is driving the other, and we're seeing a similar kind of thing here. Detection methods that worked well last year don't work that well any more."
There is a reason for this. Take "face swaps", a common form of deepfake. Most are produced using generative adversarial networks, or GANs. These systems contain two competing neural networks, known as the generator and the discriminator.
The generator network's job is to produce a convincing image for every frame of footage — replacing Jennifer Lawrence's face with Steve Buscemi's, for example. Meanwhile the discriminator network compares the faked footage with real video and hunts for flaws, or "artefacts" as they are known in the field. The systems work in a tight loop, whirring away and learning from each other until the artefacts are smoothed out and the system produces a passable fake.
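In code, that loop is surprisingly compact. The sketch below, in PyTorch, shows the bare mechanics of a GAN: a generator trying to fool a discriminator, and a discriminator trying not to be fooled. The toy network sizes and the random stand-in "frames" are illustrative assumptions; a real face-swap system trains on aligned face crops of the two people involved and uses far larger models.

```python
# Minimal sketch of the generator/discriminator loop described above.
# Random tensors stand in for face frames; real systems feed aligned face crops.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 64 * 64   # hypothetical sizes for a tiny 64x64 example
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
loss = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.rand(32, img_dim) * 2 - 1   # stand-in for real face frames
    fake = G(torch.randn(32, latent_dim))

    # Discriminator: learn to separate real frames from generated ones
    d_loss = loss(D(real), torch.ones(32, 1)) + loss(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: learn to produce frames the discriminator accepts as real
    g_loss = loss(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The important point for detection is in that second loss: the generator is rewarded precisely for erasing whatever the discriminator can spot.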
The problem is that the primary method of exposing doctored content relies on discriminator tools that are at the heart of how many deepfakes are generated — and they are self-improving. "If you have this detector, you can basically train another one to fool it," Li says. "At some point it has to converge to something where you get every pixel right, and it's practically impossible to detect."
We are not at perfection yet, and that is reason for hope, says Gully. Watch a typical deepfake and you can tell that something is off. Perhaps the image is blurry around the edges. Or, as is often the case, the person's mouth moves in ways that don't align with the words they are saying. (The lips not touching when someone says a word starting with "B", "P" or "M" is a telltale sign.) In those rough edges lies opportunity, which is why so much time and money is being poured in now, before the process becomes too advanced — and truly automatic. "There tends to be a bit of hyperbole out there that someone would be able to create a large amount of deepfakes at scale and with ease," Gully says. "There's still a tremendous amount of tuning and tweaking that one really has to do in order to create something that's ultra-realistic."
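The lip cue, at least, is simple enough to check by hand. The sketch below measures the gap between the inner lips in every frame of a clip and reports whether the mouth ever fully closes; a real check would align those measurements with the moments a transcript says the speaker utters a "B", "P" or "M". The video path and the closure threshold are arbitrary assumptions for illustration.

```python
# Rough sketch of the lip-closure cue: does the mouth ever actually shut?
import cv2
import mediapipe as mp

UPPER_LIP, LOWER_LIP = 13, 14        # MediaPipe Face Mesh inner-lip landmark indices

def min_lip_gap(video_path: str) -> float:
    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    smallest = float("inf")
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.multi_face_landmarks:
            continue
        lm = result.multi_face_landmarks[0].landmark
        gap = abs(lm[LOWER_LIP].y - lm[UPPER_LIP].y)   # normalised image coordinates
        smallest = min(smallest, gap)
    cap.release()
    return smallest

if __name__ == "__main__":
    # If the lips never come close to touching, that is one (weak) warning sign.
    print("suspicious" if min_lip_gap("clip.mp4") > 0.01 else "lips do close")
```

It is a crude, easily fooled heuristic, which is rather the point: each individual artefact is fixable, so detectors cannot rely on any one of them for long.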
The Pentagon is not complacent. In recognition that it is falling behind, Darpa this year launched a follow-on effort to MediFor called Semantic Forensics, or SemaFor. In its solicitation for bids, it admits that "purely statistical detection methods are quickly becoming insufficient for detecting falsified media assets".
The goal of this new initiative is to create a class of tools that look at deepfakes more holistically. Rather than focusing on minute "artefacts", they might step back and check for subtle inconsistencies that are common in deepfakes, such as mismatched earrings. The underlying conceit is that a producer of a deepfake would need to get every detail right, while only a single mistake would need to be detected to expose it. In military speak, those failures "provide an opportunity for defenders to gain an asymmetric advantage".
This digital deception arms race brings to mind the inspiration for the Red Queen theory: Lewis Carroll's Through the Looking Glass. Seeking to explain this strange new place, Looking-Glass Land, the Red Queen tells Alice: "Now, here, you see, it takes all the running you can do, to keep in the same place."
The political worry
Jeremy Corbyn had had enough.
After a bruising campaign, the Labour leader finally decided to throw in the towel. In a video address last month, he urged all party members to back Boris Johnson, "a prime minister that works for the many and not for the few".
The video, published weeks before the December 12 election, was created by Barnaby Francis, who works under the pseudonym Bill Posters. The British video artist is behind some of the best-known deepfakes, including a viral Zuckerberg clip this summer in which the Facebook founder refers to his "control of billions of people's stolen data".
Francis says the Corbyn stunt had a clear goal: to expose how ill-suited Britain's "Victorian" regulations are for a society in which AI-generated videos threaten to catapult us into a post-truth world. "We hacked the personas of UK politicians to make a visceral point — that these new forms of computational propaganda need to be understood and restricted," he says.
In the tech giants' long war against regulation, deepfakes may prove to be their Waterloo. The internet has been built on two pillars. The first is a purist techno-utopianism, which holds that if you can build something, even if it is a frighteningly powerful tool, you should, because technology is inherently good. The second is a clause in the 1996 Communications Decency Act, called section 230, which has allowed the techno-utopians to pursue their wildest ambitions without any worry that they will be held legally responsible if it all goes pear-shaped. Specifically, section 230 absolves tech companies of any liability for content posted on their sites by others. It is the foundation upon which Facebook, Twitter, Reddit, Google and YouTube have built their empires.
When section 230 was passed, fewer than 10% of American or British adults even had access to the internet. As the web morphed into a utility at the centre of billions of people's lives, Big Tech clung to section 230 for dear life, because the alternative, being liable for what 2bn individuals might post on your site, was too scary to countenance.
That hands-off approach to policing their own domains has become increasingly problematic, leading to bizarre episodes such as the one that unfolded on Facebook this summer. A video of Nancy Pelosi, the Democratic speaker of the House of Representatives, purported to show her slurring her words, apparently drunk. The footage was a so-called "shallow fake" because it was not digitally altered, simply slowed down. YouTube removed it.
Facebook refused to take it down, however, because it somehow did not violate terms of service that expressly prohibit posting "misleading, discriminatory or fraudulent" content. Zuckerberg tied himself into knots in October, admitting that he was worried about an "erosion of truth" while defending the company's decision to let politicians lie in political ads on the platform.
What happens when we can all fabricate video with a few swipes on an easy-to-use app? When deepfakes are generated in their millions? Farid, who testified before Congress this summer, is hopeful that peeling back section 230's blanket immunity has become one of those rare issues where both parties agree.
"If we don't change the way we think about social media's responsibility with respect to news, political campaigns, civil unrest, online revenge porn and so on, this is all for nothing," he says. "The margins are really thin. Most nations are very divided. If you release something 48 hours before an election, it doesn't matter if we figure out it's fake — it's going to work."