Deepfakes: Weta helping Hollywood create the perfect digital human

By Tim Bradshaw

Financial Times

10 Oct, 2019 08:39 PM

Tech advances help film-makers but could lead to a glut of videos involving politicians or porn.

Every Hollywood actor is desperate to cling to their youth. Now, Will Smith, the star of Independence Day and Men in Black, can be 23 forever. But unlike his Botoxed peers, the secret of Smith's fresh face is a new breed of digital doppelgänger, offering unprecedented realism.

In Gemini Man, his latest blockbuster which is in NZ cinemas now, the 51-year-old actor plays a

The 23-year-old Smith clone, known in the movie as Junior, is not the real actor hidden under layers of make-up or prosthetics. Instead, he is a completely digital recreation, constructed from his skeleton to the tips of his eyelashes by New Zealand-based visual effects studio Weta Digital.

Hollywood insiders estimate that the Junior character alone cost tens of millions of dollars to make — perhaps twice as much as hiring the real Will Smith.

Actor Will Smith stars alongside his 23-year-old 'skin double' in the new film Gemini Man. Photo / Supplied

Yet just a few weeks before Gemini Man's premiere, another, far cheaper, digital clone of Will Smith appeared in a reboot of 1999's hit science fiction movie, The Matrix. In a two-minute YouTube video, Smith took the place of Keanu Reeves to play Neo, taking the red pill and pausing bullets in mid-air.

The clip was made without Gemini Man's US$138m budget. Instead its creator, a YouTuber known only as Sham00k, employed free software called DeepFaceLab to superimpose Smith's face on to Reeves' within The Matrix footage. So-called "deepfakes" like these have been used to turn comedian Jordan Peele into Barack Obama or actor Bill Hader into Tom Cruise, with each clip more believable than the last.

Deepfakes and the high-end effects seen in Gemini Man offer two alternative paths to manipulating people in videos. But as the two techniques converge, the cost of a fully digital human is plummeting. The "uncanny valley" is finally being bridged, prompting some in Silicon Valley to wonder when virtual assistants, such as Alexa, will no longer be just a disembodied voice.

"The price of realism has dropped dramatically in the last 20 years," says Paul Franklin, co-founder and creative director at award-winning visual effects studio DNEG. "Things that were the domain of companies like DNEG can now be done with off-the-shelf software. It's inevitable [that] the kinds of techniques in Gemini Man will be stock-in-trade in the next 10 years."

The fact that it has never been easier for Weta wannabes to insert people into short videos has led to warnings from politicians, privacy activists and Hollywood itself. Convincing fake videos could be used to manipulate electorates, defraud companies or bully individuals — even if for now, deepfakers' principal hobby is to insert unwitting celebrities into pornography.

A report in September from Deeptrace Labs, a cyber security start-up whose technology detects manipulated videos, found that the number of deepfakes posted online had almost doubled in the past six months to 14,678. Of those, 96 per cent are classified as porn.

"It's definitely evolving very fast," says Katja Bego, a data scientist who is researching deepfakes at Nesta, a tech-focused non-profit organisation. Facebook, Google and Microsoft have driven efforts to improve deepfake detection, hoping to prevent misleading videos from spreading across their networks.

Creating realistic digital people the traditional Hollywood way is still a daunting task. Bringing Junior to life took "hundreds of hours of painstaking animators' and modellers' time", says Stuart Adcock, head of facial motion at Weta, which was founded by Lord of the Rings director Peter Jackson. "At times it felt more like we were making a real human from the ground up than a visual effect."

But with advances in machine learning and processing power available on smartphones and cloud computing systems, some predict that Gemini Man-style effects could one day become as accessible as selfie-retouching smartphone apps like Facetune are today.

De-ageing plays a prominent role in Netflix's forthcoming crime drama The Irishman with actors Al Pacino and Robert De Niro. Photo / Supplied

"Deepfakes are the next step in a long chain of the democratisation of media production," says Peter Rojas, a venture capital investor at Betaworks Ventures. "Deepfakes are the democratisation of CGI. It's not that different to what blogging did for publishing."

Deepfakes are barely two years old but the biggest change in recent months is the amount of input data required to create a convincing video. In September, Chinese app Zao caused a viral sensation by allowing users to trade places with Leonardo DiCaprio in a selection of scenes from movies such as Titanic. Because Zao's range of clips is limited and pre-selected, the process takes just a few seconds and requires only a single photograph of the face-swapper.

"Before, it was easy to do this for celebrities and politicians because you have a tonne of moving footage for them [on the internet]," Ms Bego says. "Now you just need one picture of a normal person."

Despite the pace of deepfakes' progress, traditional Hollywood effects studios such as Weta see little application for the technology in today's blockbusters.

Deepfakes may be popping up on smartphones in YouTube clips and Facebook feeds but in Gemini Man, Junior's digital face is shown in lingering close-ups across a vast Imax screen. While the effect is more convincing in scenes set in dark catacombs than in bright sunlight, it has nonetheless been hailed as a breakthrough for human realism. The difference is obvious even from the effects of two or three years ago, such as Princess Leia's brief appearance in the 2016 Star Wars spin-off, Rogue One.

"The really tricky thing is the way the human face moves . . . that has been a holy grail for visual effects forever," says Franklin. DNEG has worked on films including The Avengers series and Ex Machina.

"We are all experts in what faces look like," he says. "If something is even slightly off — if the muscles around the mouth don't move correctly or the eyes don't look in the right direction — we all know about it instantly." That is why many deepfakes are still easy to spot.

Achieving the level of quality seen in Gemini Man or Avengers: Endgame's "Smart Hulk" is costly and time-consuming. "In high-end visual effects, we price it out in millions of dollars per minute," says Franklin. "It's incredibly labour intensive." Even on television shows and video games, where budgets are typically more constrained, a "virtual human" effect might come with a six-figure bill.

In Hollywood, that investment pays off if audiences flock to see it on the big screen. Jesse Sisgold, president and chief operating officer at Skydance Media, one of the production companies behind Gemini Man, says the film's "revolutionary technology . . . establishes a new benchmark for the theatrical experience".

The first year of production on Gemini Man at Weta was spent building a digital version of Will Smith as he is now. It included a model of his skull, a photogrammetric map of his skin pores and face lines, and just the right mix of digital oil and water to make his eyes look real.

Ang Lee, director of Gemini Man, with Will Smith on set. Photo / Supplied

Then, Weta's Adcock explains, his team compared that model to a 23-year-old "skin double", as well as drawing on footage from Smith's 1990s movies and photos of him as young as 8, to determine how features such as his nose, chin and jaw had aged.

Adding to the challenge, Adcock says, was living up to the audience's memories of Smith, who has been a familiar face to millions since The Fresh Prince of Bel-Air first aired in 1990.

For one shot, director Ang Lee asked the Weta team to make Junior look as though he was a "ruthless assassin" but sympathetic enough that the audience would still want to "sit down and enjoy a nice warm bowl of chicken soup" with the character.

"We wrestled with that concept for a while before finally landing on the recipe," says Adcock. Weta made a "small tweak" to the epicanthic fold, where the upper eyelid meets the inner corner of the eye, and put "more softness" in the eyes.

"Technically it's a huge challenge but there are also many creative choices at play to make shots work," he says. "It's a balance of art and science. We can't just have one-click solutions."

The process used by Weta and other effects studios is at odds with the idea of fully automated deepfakes — and points to a broader challenge with artificial intelligence systems. Deep learning and neural networks are "black boxes" that take data as input and spit out a result, without explaining what happens in between. "Deepfakes allow you to get a result that is convincing in some cases but imagine art directing eye behaviour from one frame to the next," says Adcock. "That is the level of control we need."

Suranga Chandratillake, a tech investor at Balderton Capital, says today's deepfake creation systems are fragmented and incomplete. Despite the promise of instant fakery, the best-quality examples still require a lot of manual fine-tuning to ensure a convincing clip.

"When you read the hyperbolic stuff that the world is going to change [due to deepfakes], that depends on it being really good and instant. That just can't be done," he says. "I'm not sure the current approach will ever get you there."

This "man behind the curtain" problem affects other AI-led systems, such as self-driving cars, he adds. Automation can get you 90 per cent of the way there, but manual intervention is still required to reach the desired destination safely.

That adds to another challenge for deepfake producers today: their almost complete lack of a business model or corporate sponsorship. "The interesting hurdle [to overcome] would be if there is progress in commercialising this," says Bego. "There is not that much money being pumped into making these much better."

That may be starting to change. Despite many in the visual effects industry dismissing deepfakes as a gimmick, the first Hollywood movie to incorporate the technique was released earlier this year — without audiences even noticing.

Deepfakery shaved several years off British actor Bill Nighy in Pokémon Detective Pikachu, according to Tim Webber, chief creative officer of Framestore, the visual effects group that worked on the movie adaptation of the video-game franchise.

"The reason we ended up using deepfake was partly a wish to experiment with it," says Webber. "We had played around with it before, not terribly seriously, and it hadn't worked."

Deepfakery shaved several years off British actor Bill Nighy in Pokémon Detective Pikachu. Photo / Supplied

Just as models in fashion magazines might be given the Photoshop treatment, "de-ageing" techniques are widely used (though not often advertised) to digitally airbrush Hollywood stars. De-ageing plays a prominent role in Netflix's forthcoming crime drama The Irishman, to make Robert De Niro and Al Pacino look younger in flashback scenes. In most cases, unlike the fully digital Junior in Gemini Man, de-ageing involves CGI models being melded with, or pasted on top of, standard camera footage from the actors.

In Detective Pikachu, though, Framestore's deepfakes tinkering made it to the big screen. Nighy's character, Howard Clifford, is de-aged for just a few seconds, when his younger self is shown in a low-resolution archival news clip in the opening sequences.

"We were only doing de-ageing on a few shots so it wasn't worth us building a full computer-generated model of an actor's face," Webber says. "We could use a younger picture of that actor to train the deepfake model."

While development has been "incredibly rapid", Webber says, it is "a little hard to predict how things will progress". The free, open-source nature of deepfake software — and the underground community who use it — is holding back commercialisation, he says.

That could be where Silicon Valley comes in. Apple, Google and Facebook, as well as games developers such as Fortnite maker Epic, have been hiring talent from California-based visual effects companies such as Industrial Light & Magic, which was founded by Star Wars creator George Lucas, and Pixar, the Disney-owned computer animation pioneer. Last week, it emerged that Apple had acquired UK-based iKinema, which specialises in "full body" motion capture for games and films.

That has the visual effects industry speculating about what this might lead to — from upgrades to avatars, such as Apple's personalised "Memoji" or Snap's Bitmoji, to full visualisations of digital assistants like Alexa and Siri.

The tech companies "sit in the middle" between big budget Hollywood-style effects and the DIY feel of deepfakes, says Steve Caulkin, chief technical officer at Cubic Motion, which works on digital animation for video games, TV and films. "They potentially have the means to create pretty high-end digital humans."

Combining Silicon Valley's vast data troves and AI expertise with Hollywood visual effects could mean that one day, every smartphone owner has their own private version of Gemini Man's Junior — a realistic avatar to represent them in the digital world.

"What I'm excited about," Smith joked at a preview screening of the film earlier this year, "is there's a completely digital 23-year-old version of myself I can make movies with now".
