An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft

Washington Post

5 Sep, 2019 08:57 PM

Thieves used voice-mimicking software to imitate a company executive's speech and dupe his subordinate into sending hundreds of thousands of dollars to a secret account, the company's insurer said, in a remarkable case that some researchers are calling one of the world's first publicly reported artificial-intelligence heists.

The managing director of a British energy company, believing his boss was on the phone, followed orders one Friday afternoon in March to wire 220,000 euros (NZ$380,800) to an account in Hungary, said representatives from the French insurance giant Euler Hermes, which declined to name the company.

The request was "rather strange," the director noted later in an email, but the voice was so lifelike that he felt he had no choice but to comply. The insurer, whose case was first reported by the Wall Street Journal, provided new details on the theft to The Washington Post on Wednesday, including an email from the employee tricked by what the insurer is referring to internally as "the false Johannes."

Now being developed by a wide range of Silicon Valley titans and AI startups, such voice-synthesis software can copy the rhythms and intonations of a person's voice and be used to produce convincing speech. Tech giants such as Google and smaller firms such as the "ultrarealistic voice cloning" startup Lyrebird have helped refine the resulting fakes and made the tools more widely available for free and unlimited use.

But the synthetic audio and AI-generated videos, known as "deepfakes," have fueled growing anxieties over how the new technologies can erode public trust, empower criminals and make traditional communication - from business deals and family phone calls to presidential campaigns - that much more vulnerable to computerised manipulation.

"Criminals are going to use whatever tools enable them to achieve their objectives cheapest," said Andrew Grotto, a fellow at Stanford University's Cyber Policy Center and a former senior director for cybersecurity policy at the White House during the Obama and Trump administrations.

"This is a technology that would have sounded exotic in the extreme 10 years ago, now being well within the range of any lay criminal who's got creativity to spare," Grotto added.

Developers of the technology have pointed to its positive uses, saying it can help humanise automated phone systems and help people who have lost the ability to speak talk again. But its unregulated growth has also sparked concern over its potential for fraud, targeted hacks and cybercrime.

Researchers at the cybersecurity firm Symantec said they have found at least three cases of executives' voices being mimicked to swindle companies. The company declined to name the victim companies, or say whether the Euler Hermes case was one of them, but noted that the losses in one of the cases totaled in the millions of dollars.

The systems work by processing a person's voice and breaking it down into components, like sounds or syllables, that can then be rearranged to form new phrases with similar speech patterns, pitch and tone. The insurer did not know which software was used, but a number of the systems are freely offered on the Web and require little sophistication, speech data or computing power.
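The insurer did not know which software was used, and the article does not describe any specific product's internals. As a rough illustration of the "break it into components and rearrange them" idea, here is a toy Python sketch of concatenative synthesis under loudly stated assumptions: sine tones stand in for recorded syllables, and the unit labels, the `UNITS` inventory, the `synthesize` helper and the 40-sample crossfade are all hypothetical. Many current voice-cloning tools rely on neural networks rather than literal splicing, so this shows only the general principle.

```python
import math

SAMPLE_RATE = 8000  # samples per second (toy value, not a real speech setting)


def tone(freq_hz: float, dur_s: float) -> list[float]:
    """Stand-in for a recorded speech unit: a plain sine tone."""
    n = int(round(SAMPLE_RATE * dur_s))
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical "unit inventory" cut from a speaker's recordings.
# A real system would store sounds or syllables extracted from real audio.
UNITS: dict[str, list[float]] = {
    "pay": tone(220.0, 0.20),
    "the": tone(180.0, 0.10),
    "sup": tone(240.0, 0.15),
    "plier": tone(200.0, 0.25),
}


def crossfade_concat(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two units, blending `overlap` samples linearly to soften the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    blend = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from a toward b
        blend.append(a[start + i] * (1 - w) + b[i] * w)
    return a[:start] + blend + b[overlap:]


def synthesize(labels: list[str], overlap: int = 40) -> list[float]:
    """Rearrange stored units into a new phrase by looking up and joining them."""
    out: list[float] = []
    for label in labels:
        unit = UNITS[label]
        out = crossfade_concat(out, unit, overlap) if out else list(unit)
    return out


if __name__ == "__main__":
    phrase = synthesize(["pay", "the", "sup", "plier"])
    print(len(phrase), "samples")
```

Calling `synthesize(["pay", "the", "sup", "plier"])` splices four stored units into a phrase the "speaker" never recorded in that order; each crossfade shortens the result by the overlap length. The audible seams such splicing can leave are one source of the glitches that, as described below, thieves have explained away as background noise.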

Lyrebird, for instance, advertises the "most realistic artificial voices in the world" and allows anyone to create a voice-mimicking "vocal avatar" by uploading at least a minute of real-world speech.

The company, which did not respond to requests for comment, has defended releasing the software widely, saying it will help acclimate people to the new reality of a fast-improving and "inevitable" technology "so that society can adapt." In an ethics statement, the company wrote, "Imagine that we had decided not to release this technology at all. Others would develop it and who knows if their intentions would be as sincere as ours."

Saurabh Shintre, a senior researcher who studies such "adversarial attacks" in Symantec's California-based research lab, said the audio-generating technology has in recent years seen "transformative" progress, due to breakthroughs in how the algorithms process data and compute results. The amount of recorded speech needed to train the voice-impersonating tools to produce compelling mimicries, he said, is also shrinking rapidly.

The technology is imperfect, and some of the faked voices wouldn't fool a listener in a "calm, collected environment," Shintre said. But in some cases, thieves have employed methods to explain the quirks away, saying the fake audio's background noises, glitchy sounds or delayed responses are actually due to the speaker being in an elevator, in a car or in a rush to the next flight.

Beyond the technology's capabilities, the thieves have also depended on age-old scam tactics to boost their effectiveness, using time pressures, like an impending deadline, or social pressures, like a desire to appease the boss, to make the listener move past any doubts. In some cases, criminals have targeted the financial gatekeepers in company accounting or budget departments, knowing they may have the capability to send the money instantly.

"When you create a stressful situation like this for the victim, their ability to question themselves for a second - 'Wait, what the hell is going on, why is the CEO calling me?' - goes away, and that lets them get away with it," Shintre said.

Euler Hermes representatives said the company, a U.K.-based subsidiary of a German energy firm, contacted law enforcement but has yet to name any potential suspects. The insurer, which sells policies to businesses covering fraud and cybercrime, said it is covering the company's full claim.

The victimised director was first called late one Friday afternoon in March, and the voice demanded he urgently wire money to a supplier in Hungary to spare the company late-payment fines. The fake executive referred to the director by name and sent the financial details over email.

The director and his boss had spoken directly a number of times, said Euler Hermes spokeswoman Antje Wolters, who noted that the call was not recorded. "The software was able to imitate the voice, and not only the voice: the tonality, the punctuation, the German accent," she said.

After the thieves made a second request, the director grew suspicious and called his boss directly. Then the thieves called back, unraveling the ruse: The fake "'Johannes' was demanding to speak to me whilst I was still on the phone to the real Johannes!" the director wrote in an email the insurer shared with The Post.

The money, totaling 220,000 euros, was funneled through accounts in Hungary and Mexico before being scattered elsewhere, Euler Hermes representatives said. No suspects have been named, the insurer said, and the money has disappeared.

AI developers are working to build systems that can detect and combat fake audio, but the voice-mimicking technology is evolving rapidly. Google, for instance, has invested in research and funded challenges to automatically recognise "spoofed" speech. But the company has also developed some of the world's most persuasive voice AI, including with its Duplex service, which can call restaurants to book a table using a lifelike, computer-generated voice.

"There's a tension in the commercial space between wanting to make the best product and considering the bad applications that product could have," said Charlotte Stanton, the director of the Silicon Valley office of the think tank Carnegie Endowment for International Peace. "Researchers need to be more cautious as they release technology as powerful as voice-synthesis technology, because clearly it's at a point where it can be misused."