'We are outgunned': Top AI researchers race to detect 'deepfake' videos

By Drew Harwell

Washington Post

12 Jun, 2019 09:38 PM

US President Donald Trump. Deep fake videos could have an impact on the 2020 US presidential election. Photo / AP

Top artificial-intelligence researchers across the US are racing to defuse an extraordinary political weapon: computer-generated fake videos that could undermine candidates and mislead voters during the 2020 presidential campaign.

And they have a message: We're not ready.

The researchers have designed automatic systems that can analyse videos for the telltale indicators of a fake, assessing light, shadows, blinking patterns - and, in one potentially groundbreaking method, even how a candidate's real-world facial movements relate to each other, like the angle they tilt their head when they smile.

But for all that progress, the researchers say they remain vastly overwhelmed by a technology they fear could herald a damaging new wave of disinformation campaigns, much in the same way fake news stories and deceptive Facebook groups were deployed to influence public opinion during the 2016 race.

Powerful new AI software has effectively democratised the creation of convincing "deepfake" videos, making it easier than ever to fabricate someone appearing to say or do something they didn't really do, from harmless satires and film tweaks to targeted harassment and deepfake porn.

And researchers fear it's only a matter of time before the videos are deployed for maximum damage - to sow confusion, fuel doubt or undermine an opponent, potentially on the eve of a White House vote.

"We are outgunned," said Hany Farid, a computer-science professor and digital-forensics expert at the University of California, Berkeley. "The number of people working on the video-synthesis side, as opposed to the detector side, is 100 to 1."

These AI-generated videos have yet to drive their own political scandal in the US. But even simple tweaks to existing videos can create turmoil, as happened with the recent viral spread of a video of House Speaker Nancy Pelosi, distorted to make her speech stunted and slurred. That video was viewed more than three million times.

Deepfakes have already made their appearance elsewhere: In central Africa last year, a video of Gabon's long-unseen President Ali Bongo, who was believed to be in poor health or already dead, was decried as a deepfake by his political opponents and cited as the trigger, a week later, for an unsuccessful coup by the Gabonese military.

And in Malaysia, a viral clip of a man's seeming confession to having sex with a local cabinet minister is being questioned as a potential deepfake. He "does not look like this. . . . His body isn't as built as in the video," a local politician said, according to the Malay Mail newspaper in Kuala Lumpur.

The threat of deepfakes, named for the "deep learning" AI techniques used to create them, has become a personal one on Capitol Hill, where lawmakers believe the videos could threaten national security, the voting process - and, potentially, their reputations. The House Permanent Select Committee on Intelligence will hold a hearing tomorrow in which AI experts are expected to discuss how deepfakes could evade detection and leave an "enduring psychological impact."

"People can duplicate me speaking and saying anything . . . and it's a complete fabrication," former President Barack Obama told an audience in Canada last month. "The marketplace of ideas that is the basis of our democratic practice has difficulty working if we don't have some common baseline of what's true and what's not."

Rachel Thomas, the co-founder of Fast.ai, a machine-learning lab in San Francisco, says a disinformation campaign using deepfake videos would likely catch fire due to the reward structure of the modern Web, in which shocking material drives bigger audiences - and can spread further and faster than the truth.

"Fakes often, particularly now, don't have to be that compelling to still have an impact," Thomas said. "We are these social creatures that end up going with the crowd into seeing what the other people are seeing. It would not be that hard for a bad actor to have that kind of influence on public conversation."

No law regulates deepfakes, though some legal and technical experts have recommended adapting current laws covering libel, defamation, identity fraud, or impersonating a government official. But concerns of overregulation abound: The dividing line between a First-Amendment-protected parody and deepfake political propaganda may not always be clear-cut.

And some worry that the potential hype or hysteria of fake videos could even erode how people accept video evidence - especially when, as Sam Gregory, a programme director at the human-rights group Witness, said, "there's still plenty of truth out there." Misinformation researcher Aviv Ovadya calls this problem "reality apathy": "It's too much effort to figure out what's real and what's not, so you're more willing to just go with whatever your previous affiliations are."

It might already be leaving an impact. In a Pew Research study released this month, about two-thirds of Americans surveyed said altered videos and images had become a major problem for understanding the basic facts of current events. More than a third said "made-up news" had led them to reduce the amount of news they get overall.

There also are fears that deepfakes could lead to people denying legitimate videos - a phenomenon the law professors Robert Chesney and Danielle Citron call "the liar's dividend." President Donald Trump, for instance, has told people the Access Hollywood video, in which he boasted of assaulting women, was doctored. (After the real audio was first revealed by the Washington Post in October 2016, Trump apologised for the remarks.)

Officials with the Democratic and Republican parties and the nation's top presidential campaigns say they can do little in advance to prepare for the damage, and are counting on social networks and video sites to find and remove the worst fakes. But the tech companies have differing policies on takedowns, and most don't require that uploaded videos be true.

The technology is progressing rapidly. AI researchers at the Skolkovo Institute of Science and Technology in Moscow last month unveiled a "few-shot" AI system that could create a convincing fake of someone with only a few still photos of their face. The lead researcher, Egor Zakharov, said he could not discuss it due to ongoing peer review, but in a statement the team said that the "net effect" of making video special-effects technologies more widely available "has been positive . . . (and) we believe that the case of neural avatar technology will be no different."

Another group of AI researchers, including from Stanford and Princeton universities, just debuted a separate system that can edit what someone appears to be saying on video, just by changing some text, with the AI swapping around the person's voiced syllables and mouth movement to leave only a seamlessly altered "talking head."

The lead researcher, Ohad Fried, said the technology could be used to enhance low-budget filmmaking and help localise videos to international languages and audiences. But he also said it could be abused to falsify video or "slander prominent individuals." Video made using the tool, he said, should be presented as synthetic. But he said regulators, tech companies and journalists should play a more leading role in researching how to unmask fakes.

"In general people do need to understand that video may not be an accurate representation of what happened," he said.

Deepfake video is just one part of how AI is revolutionising disinformation. New natural-language AI systems like GPT-2, by the research lab OpenAI, can feed on written text and spit out many more paragraphs in a similar tone, theme and style - a boon, perhaps, to spam chatbots and "fake news" creators, even if the underlying ideas sometimes trend toward gibberish.

The technique has already been used to automatically parrot political leaders' speaking style after "learning" from hours of United Nations speeches. To counteract it, researchers at the University of Washington and the Allen Institute for Artificial Intelligence last month unveiled a fake-text-detector system, called Grover, that could potentially expose what it calls machine-generated "neural fake news."

Convincing fake audio is also on the horizon, including from Facebook AI researchers, who have replicated a person's voice using computer-generated speech that sounds deceivingly lifelike. The system, MelNet, learned its impersonations by listening to hundreds of hours of TED Talks and audiobooks; in samples, the system can make Bill Gates, Jane Goodall and others say sentences like "A cramp is no small danger on a swim."

In AI circles, identifying fake media has long received less attention, funding and institutional backing than creating it: Why sniff out other people's fantasy creations when you can design your own? "There's no money to be made out of detecting these things," said Nasir Memon, a professor of computer science and engineering at New York University.

Much of the funding for researching ways of detecting deepfakes comes from the Defense Advanced Research Projects Agency, the Pentagon's high-tech research arm, which in 2016 launched a "Media Forensics" program that sponsored more than a dozen academic and corporate groups pursuing high-level research. Matt Turek, a computer-vision expert who leads the DARPA programme, called synthetic-media detection a "defensive technology" against not just foreign adversaries but domestic political antagonists and Internet trolls.

"Nation-states have had the ability to manipulate media since, essentially, the beginning of media," Turek said. But a strong enough fake-spotting system would make it so groups with more limited resources would face "enough computational burden to make it not worth the risk."

The trick for unravelling a deepfake, researchers said, is building a tool that works in what cryptography circles call a "trustless environment," where authoritative details of the video's creator, origin and distribution can be impossible to trace. And speed is critical: With every minute that an investigator spends debunking video, a clip can expand that much further across the Web.

Forensic researchers have homed in on a range of subtle indicators that could serve as giveaways, such as the shape of light and shadows, the angles and blurring of facial features, or the softness and weight of clothing and hair. But in some cases, a trained video editor can go through the fake to smooth out possible errors, making it that much harder to assess.

With one new method, researchers at the University of California, Berkeley, and the University of Southern California built a detective AI system, fed it hours of video of high-level leaders and trained it to look for hyper-precise "facial action units" - data points of their facial movements, tics and expressions, including when they raise their upper lip and how their heads rotate when they frown.
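
As a rough illustration of the idea of comparing how a person's facial movements relate to each other, the sketch below correlates pairs of movement measurements and compares the result with a reference clip. It is a toy under stated assumptions, not the researchers' system: the track names (`smile`, `head_tilt`), the sample numbers and the distance measure are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def signature(tracks):
    """Map every pair of named movement tracks to the correlation between them."""
    return {
        (a, b): pearson(tracks[a], tracks[b])
        for a, b in combinations(sorted(tracks), 2)
    }


def distance(sig_a, sig_b):
    """Mean absolute difference between two signatures over the pairs they share."""
    shared = sig_a.keys() & sig_b.keys()
    return sum(abs(sig_a[k] - sig_b[k]) for k in shared) / len(shared)


# Hypothetical per-frame measurements (made-up numbers, for illustration only).
reference = {"smile": [0.1, 0.4, 0.8, 0.5, 0.2], "head_tilt": [1.0, 3.9, 8.1, 5.2, 2.1]}
consistent = {"smile": [0.2, 0.6, 0.9, 0.4, 0.1], "head_tilt": [2.0, 6.1, 9.0, 4.2, 1.1]}
suspect = {"smile": [0.1, 0.4, 0.8, 0.5, 0.2], "head_tilt": [8.0, 5.0, 1.0, 4.0, 7.0]}

d_consistent = distance(signature(reference), signature(consistent))
d_suspect = distance(signature(reference), signature(suspect))
print(d_consistent < d_suspect)
```

In this toy, the clip whose head tilt moves against the smile, rather than with it, lands far from the reference signature. A real system would need many more measurements and a trained classifier.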

To test these "soft biometric" models, Farid and his team worked with a team of digital-avatar designers to create some deepfakes of their own, swapping the faces of Elizabeth Warren, Hillary Clinton and Donald Trump onto their own impersonators on Saturday Night Live. The system has scored high in accuracy on gauging a number of different kinds of fakes: videos of a satirical human impersonator; "face-swap" fakes, popular in social-media apps; "lip-sync" fakes, in which the real face remains but the mouth is substituted; and "puppet-master" fakes, in which a target's face is placed onto an actor's body.

The research, titled "Protecting World Leaders Against Deep Fakes," was partially developed with funding from Google, Microsoft and DARPA. It will be revealed alongside other techniques next week in California at the Conference on Computer Vision and Pattern Recognition, a landmark annual summit sponsored by the biggest names in American and Chinese AI.

Sam Gregory, a programme director at Witness, a human-rights group that helps train amateur journalists around the world to record abuse, said the world's social media platforms need to unify around a "shared immune system" designed to find and stop viral fakes. Scanning top politicians' faces using Farid's method, Gregory said, would offer protection to high-level leaders, but not to local politicians, journalists or other people who could be vulnerable to attack.

Farid wants media outlets to have access to the deepfake-detector tool so they can assess news-making video when it arises. But making the system more widely available carries its own threat, by potentially allowing deepfake creators to examine the code and find workarounds. This cat-and-mouse game is a long-running frustration for forensic researchers, ensuring that even a promising detection method is only of temporary use.

Siwei Lyu, director of a computer-vision lab at the State University of New York at Albany, helped pioneer research last year that found many deepfakes had a telltale clue: a lack of blinking. It was an investigative victory - until two weeks later, when Lyu received an email from a deepfake creator who said they had solved the problem in their latest fakes.

More than just a technical hurdle, Lyu believes media manipulation can have a broader psychological effect, by subtly shifting people's understandings of politicians, events and ideas.

"Everybody knows it's a fake video. But they watch it," Lyu said. "It's generating an illusion. It can wreak a lot of damage. It's very hard to remove. And it can come from anywhere. With the Internet, all the boundaries are becoming blurred."

High-definition fake videos often are the easiest to detect, researchers said. The more detail in a video, the more opportunities for the fake to reveal its flaws. But the modern Web works against that advantage because most social-media and messaging sites compress the videos into formats that make them quicker and easier to share, removing critical clues.

To some, that challenge appears insurmountable, and it has led some researchers to instead pursue an authentication system that would fingerprint footage right as it's captured. It could help make fakes easier to spot, but would require agreement from makers of smartphones, cameras and websites - a far-off proposal that could take years.
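
To make the fingerprinting idea concrete, here is a minimal sketch of what stamping footage at capture could look like, assuming a keyed hash (HMAC-SHA256) computed over the raw bytes. The article does not say how such a system would be built; the device key, function names and sample bytes below are hypothetical.

```python
import hashlib
import hmac

# Assumption: each capture device holds a secret key. This value is made up.
DEVICE_KEY = b"hypothetical-device-secret"


def fingerprint(footage: bytes, key: bytes = DEVICE_KEY) -> str:
    """Return a keyed SHA-256 fingerprint of the footage bytes."""
    return hmac.new(key, footage, hashlib.sha256).hexdigest()


def is_unaltered(footage: bytes, recorded: str, key: bytes = DEVICE_KEY) -> bool:
    """Check footage against the fingerprint recorded at capture time."""
    return hmac.compare_digest(fingerprint(footage, key), recorded)


original = b"raw video bytes captured by the camera"
tag = fingerprint(original)  # stored alongside the footage when it is recorded

tampered = original.replace(b"camera", b"editor")
print(is_unaltered(original, tag))   # the untouched bytes match the recorded tag
print(is_unaltered(tampered, tag))   # any change to the bytes breaks the match
```

A byte-exact check like this would also flag an innocent copy that a website has re-compressed, which hints at why such a scheme would need buy-in from device makers and sites alike.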

"I worked on detection for 15 years. It doesn't work," said Nasir Memon, a professor of computer science and engineering at New York University. "Facebook videos? Things thrown around in WhatsApp? . . . It may never work. Meanwhile, the adversary has really gone up a few notches."

Political campaigns that have long prepared defences against bruising video gaffes said they were stumped on how to prepare for the new weapons of mass deception. Several campaign officials said they pinned their hopes on the tech companies acting more aggressively to police for fakes.

A Democratic National Committee official said it has helped train campaigns on how to combat disinformation and push for takedowns from the social-media sites. A Republican National Committee official said it is encouraging employees to stay on alert for suspicious content, and that its digital team works with the tech giants to flag harmful posts and accounts.

But the tech giants' policies don't align on whether fakes should be deleted or flagged, demoted and preserved. YouTube, for instance, quickly pulled the distorted Pelosi video, saying it violated its "deceptive practices" policies. But Facebook kept it online, saying in a statement to the Post that "we don't have a policy that stipulates that the information you post on Facebook must be true."

YouTube said it is "exploring and investing in ways to address synthetic media" and compared it to previous challenges, such as fighting spam and finding copyright-infringing videos, that it has tackled with a mix of software and human review.

Facebook is funding some universities' manipulated-media research and, in a statement to the Post, said "combating misinformation is one of the most important things we can do." The company was targeted by its own fake this week, when an altered video of chief executive Mark Zuckerberg appeared to show him boasting of his "total control" over the world's data. (The fake remains online.)

Twitter said it challenges more than eight million accounts a week that attempt to spread content through "manipulative tactics." But fact-checking every tweet is not feasible, the company said, adding that it doesn't "think we should set the precedent of intervening to decide what is and is not truthful online."

The company added that "the clarification of falsehoods happens in seconds" on the site due to real-time checks from other users, and that "typically factually inaccurate material gains very little distribution on Twitter until it is" disproved. The company could not offer any statistics to support that claim.

Perhaps the most pervasive problem for modern visual storytelling, researchers said, is not sophisticated fake videos but mis-attributed real ones: Footage of a real protest march or violent skirmish, for instance, captioned as if it had happened somewhere else.

The detection systems have taken on a newfound urgency due to the upcoming election, but there is also a growing interest from corporate America to protect against viral frauds. Shamir Allibhai, the founder of Amber, a small fake-detection start-up, said his firm is working now with a test group of corporate clients seeking a shield against deepfakes that could show, for instance, a chief executive saying racist or misogynistic slurs.

In a world where video has played a pivotal role in shaping modern history, researchers said it's nevertheless critical to find a way to spot the fakes - and some fear what could happen if the authority of video slips away.

"As a consequence of this, even truth will not be believed," Memon said. "The man in front of the tank at Tiananmen Square moved the world. Nixon on the phone cost him his presidency. Images of horror from concentration camps finally moved us into action. If the notion of not believing what you see is under attack, that is a huge problem. One has to restore truth in seeing again."