The market for the technology is growing rapidly despite questions from scientists about whether it works.
For most of the past year, students at True Light College, a secondary school for girls in Kowloon, Hong Kong, have been attending classes from home. But unlike most children around the world forced into home-schooling during the pandemic, the students at True Light are being watched as they sit at their desks. Unblinking eyes scrutinise each child's facial expressions through her computer's camera.
The "eyes" belong to a piece of software called 4 Little Trees, an artificial intelligence program that claims it can read the children's emotions as they learn. The program's goal is to help teachers make distance learning more interactive and personalised, by responding to an individual student's reactions in real time.
The 4 Little Trees algorithm works by measuring micro-movements of muscles on the girls' faces, and attempts to identify emotions such as happiness, sadness, anger, surprise and fear. The company says the algorithms generate detailed reports regarding each student's emotional state for teachers, and can also gauge motivation and focus. It alerts students to "get their attention back when they are off track".
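The company does not disclose how its system is built, but tools of this kind typically follow a common pipeline: detect the face in each video frame, extract features from facial movements, score those features against a fixed menu of emotion categories, then aggregate the scores into reports and attention alerts. The Python sketch below is purely illustrative of that general pipeline; the label set, the scoring step and the "off track" threshold are hypothetical stand-ins, not 4 Little Trees' code.

```python
from dataclasses import dataclass
from collections import Counter

# A fixed label set of the kind described in the article (Ekman-style categories).
EMOTIONS = ["happiness", "sadness", "anger", "surprise", "fear"]

@dataclass
class FrameResult:
    emotion: str      # top-scoring category for this frame
    attention: float  # 0.0 (looking away) .. 1.0 (facing camera)

def classify_frame(frame_features: dict) -> FrameResult:
    """Hypothetical per-frame classifier.

    A real system would feed facial-landmark or muscle-movement features
    into a trained model; here we simply pick the category with the
    highest pre-computed score to show the shape of the output.
    """
    scores = {e: frame_features.get(e, 0.0) for e in EMOTIONS}
    top = max(scores, key=scores.get)
    return FrameResult(emotion=top, attention=frame_features.get("attention", 1.0))

def session_report(frames: list[FrameResult], attention_floor: float = 0.5) -> dict:
    """Aggregate per-frame results into the kind of report a teacher might see."""
    counts = Counter(f.emotion for f in frames)
    off_track = sum(1 for f in frames if f.attention < attention_floor)
    return {
        "dominant_emotion": counts.most_common(1)[0][0],
        "emotion_breakdown": dict(counts),
        "off_track_share": off_track / len(frames),
        "alert": off_track / len(frames) > 0.3,  # illustrative threshold only
    }

# Toy usage with made-up feature scores for three frames.
frames = [
    classify_frame({"happiness": 0.7, "attention": 0.9}),
    classify_frame({"sadness": 0.6, "attention": 0.4}),
    classify_frame({"anger": 0.5, "attention": 0.2}),
]
print(session_report(frames))
```

Even in this toy version, the consequential decisions (which emotions exist, how scores map to labels, where the alert threshold sits) are design choices made by the vendor rather than facts about the student.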
Its founder, Vicky Lim, a former teacher, says it reads the children's feelings correctly about 85 per cent of the time. The popularity of the software has exploded during the pandemic, with the number of schools using 4 Little Trees in Hong Kong growing from 34 to 83 over the past year, according to Lim.
4 Little Trees uses one of a family of new algorithms that its creators claim can recognise human emotion and state of mind, such as tiredness, stress and anxiety, through the analysis of facial expression, micro-gestures, eye tracking and voice tones.
The technology is a natural evolution of facial recognition systems, which identify individuals. But it is far more invasive: it claims not just to understand how someone is feeling in the moment, but also to decode their intentions and predict their personality, based on fleeting expressions.
Hundreds of firms around the world are working on emotion-decoding technology, in an effort to teach computers how to predict human behaviour. American tech giants including Amazon, Microsoft and Google all offer basic emotion analysis, while smaller companies such as Affectiva and HireVue tailor it to specific sectors such as automotive, advertising and recruitment.
Disney has used the software to test volunteers' reactions to a range of its films including Star Wars: The Force Awakens and Zootopia. Car companies like Ford, BMW and Kia Motors want to use it to assess driver alertness. Marketing firms like Millward Brown have tested it to gauge how audiences respond to advertisements for clients like Coca-Cola and Intel.
And it has already begun creeping into public spaces too. Emotion recognition systems have received funding for use by Lincolnshire police in the UK to identify suspicious people, while cameras were once deployed in London's Piccadilly Circus to analyse people's emotional reactions to the adverts on the large billboards.
While the technology has been piloted for several years, it is only now becoming more sophisticated. Emotion recognition-enabled cameras have been installed in Xinjiang, the north-western Chinese region where an estimated 1m mostly Uyghur Muslims are being held in detention camps. Li Xiaoyu, a policing expert and party cadre from the public security bureau in Altay city in Xinjiang, told the FT in 2019 that the technology was deployed mostly at customs to "rapidly identify criminal suspects by analysing their mental state".
No matter the application, the goal is the same: to make humans less inscrutable and easier to predict at scale. With office staff and students working remotely during coronavirus, business is booming: the emotion detection industry is projected to almost double from US$19.5 billion in 2020 to US$37.1 billion by 2026, according to market research firm Markets and Markets.
"During the pandemic, technology companies have been pitching their emotion recognition software as a way to monitor workers and students remotely," says Kate Crawford, co-founder of the AI Now Institute at New York University and a scholar of the social implications of artificial intelligence. "Similar tools [to 4 Little Trees] have been marketed to provide surveillance for remote workers, and are already used in remote job interviews. [Emotion detection] is going to have a significant impact on the world, from workplaces to schools to public places."
'Universal' emotions?
As corporations and governments enthusiastically roll out emotion recognition on the public, critics point out a major flaw with the technology: for many scientists, there is little evidence to show it works accurately. Research into these algorithms suggests that while they might be able to decode facial expressions, that doesn't necessarily translate to what a person is really feeling or thinking, or what they plan to do next.
In a review commissioned by the Association for Psychological Science in 2019, five distinguished scientists from the field were asked to scrutinise the available evidence. Over two years, the reviewers looked at more than 1,000 different studies of emotion recognition technology. They found that emotions are expressed in a huge variety of ways, which makes it hard to reliably infer how someone feels from a simple set of facial movements.
"People, on average, the data show, scowl less than 30 per cent of the time when they're angry," wrote Lisa Feldman Barrett, a psychologist at Northwestern University and one of the reviewers. "So, scowls are . . . an expression of anger — one among many. That means that more than 70 per cent of the time, people do not scowl when they're angry. And on top of that, they scowl often when they're not angry."
The authors added that it was "not possible to confidently infer happiness from a smile, anger from a scowl, or sadness from a frown, as much of current technology tries to do when applying what are mistakenly believed to be the scientific facts."
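Barrett's base-rate point can be made concrete with a rough calculation. The only number below taken from the review is the figure that people scowl less than 30 per cent of the time when they are angry; the other two inputs are assumptions chosen purely for illustration. Under those assumptions, Bayes' rule shows how little a detected scowl actually says about anger.

```python
# Illustrative Bayes calculation: how much does a scowl tell us about anger?
# Only p_scowl_given_angry comes from the review quoted above; the other
# two numbers are assumptions chosen for illustration.
p_scowl_given_angry = 0.30      # review figure: people scowl <30% of the time when angry
p_angry = 0.05                  # assumed: a person is angry in 5% of observed moments
p_scowl_given_not_angry = 0.10  # assumed: scowls also occur when concentrating, squinting, etc.

p_scowl = (p_scowl_given_angry * p_angry
           + p_scowl_given_not_angry * (1 - p_angry))
p_angry_given_scowl = p_scowl_given_angry * p_angry / p_scowl

print(f"P(angry | scowl) = {p_angry_given_scowl:.2f}")  # ~0.14 under these assumptions
```

On these illustrative numbers, a system that labels every scowl as anger would be wrong roughly six times out of seven, which is the substance of the reviewers' objection to treating facial movements as a readout of feeling.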
Those "scientific facts" that Barrett and her colleagues refer to, which form the basis of much of emotion recognition software, are mostly the work of a single man — American psychologist Paul Ekman. In the 1960s, Ekman travelled to Papua New Guinea to perform a series of experiments to prove his hypothesis that all humans, regardless of culture, gender, geography or circumstance, exhibit the same set of six universal emotions: fear, anger, joy, sadness, disgust and surprise.
This framework has been used by a number of companies to train machines in the language of human emotion. "Many machine-learning papers cite Ekman as though his categories are unproblematic, often ignoring the more complex issues of context, conditioning, relationality and culture," says Crawford. "The 'universal emotion' theory fits the tools."
Ekman, now 87 and retired, still defends his research but claims it is being misused by companies trying to build commercial products. "I don't think much of the latest research, it's not been replicated and it seems to be ideologically driven," he says, referring to companies being motivated by profit, rather than science. "It's yet to be demonstrated that algorithms can be trained accurately as facial measurement tools. It takes a human 50-100 hours to get trained on our [analysis system] in a reliable and consistent fashion."
Beyond measurement, he says companies need to invest in research to prove links between expressions and behaviour. "Simply measuring the face doesn't tell you whether your interpretation of it in that instance is correct or incorrect. Most of what I was seeing was what I would call pseudoscience — they weren't doing the research to show the interpretation of the measurements was correct," he says.
"If I tell you that you just activated the ventalis muscle — what does that mean? Well, it depends on when you did it, who you are, you need evidence. You have to separate measurements from significance."
The problem with inferring intention, according to critics of the technology, is that it results in error-filled and biased decision-making in highly sensitive areas such as education, policing, hiring and border controls.
"Any time you want to use an automated system to do decision-making, you need training data. And that has to be labelled by someone — someone has to make judgments about what each facial expression means," says Suresh Venkatasubramanian, a machine learning scientist at the University of Utah, who specialises in bias and discrimination, and sat on the ethics board of US AI recruitment start-up HireVue until he resigned in late 2019. "We have no reliable indicators for that. No doubt we can draw certain signals about what I'm feeling, but if I'm not smiling, it doesn't mean I'm not happy. So there's a lot of noise in the system."
Reproducing bias
One of the areas where bias in an emotional AI system can be particularly high stakes is recruitment. These algorithms track the facial expressions of jobseekers to draw conclusions about their employability, including assessments of their dependability, conscientiousness, emotional intelligence and cognitive ability. Companies in this space include HireVue and London-based Human, whose software analyses video-based applications. HireVue claims to have more than 700 customers, including large employers such as GE, Hilton and Delta, for its AI-based system. Human counts Unilever among its clients.
However, critics believe that using AI in this way can perpetuate biases that already exist in the data used to train these algorithms. For example, a team of reporters at Bayerischer Rundfunk, or Bavarian Public Broadcasting, tested the AI software of Retorio, an AI hiring start-up in Munich, and found that the algorithm responded differently to the same candidate in different outfits, such as glasses and headscarves. The company said this was partly because the video-based assessment system had been trained based on how a chosen group of human recruiters perceived jobseekers and their personalities, so the algorithm was reproducing the gut feelings and innate biases of those humans.
The fundamental flaw with the outputs of emotion-tracking systems, Venkatasubramanian believes, is that machines can't adjust their behaviour the way humans do. "When you're interacting with one person and you make a mistake about their feelings, you can get feedback and very quickly adjust your internal model," he says. "But a machine can't do that. It is building a model from some data and scaling it to thousands more people; it doesn't have the ability to adjust in the moment if it misread what you said."
In 2019, US non-profit the Electronic Privacy Information Center filed a complaint against HireVue with the Federal Trade Commission, alleging unfair trade practices by "using biometric data and secret algorithms in a manner that causes substantial and widespread harm". It claimed the AI tools "were unproven, invasive and prone to bias".
In January, HireVue announced it would no longer use facial analysis to do job assessments, and recommended other AI recruitment firms do the same. It said it had "concluded that for the significant majority of jobs and industries, visual analysis has far less correlation to job performance than other elements of our algorithmic assessment."
The company will, however, continue using job applicants' language as a way to assess their employability, which it said had considerable "predictive power". It added: "Our algorithms do not see significant additional predictive power when non-verbal data is added to language data."
Another widespread criticism of emotion recognition by algorithms is that it is not universally applicable; people of different cultures express their feelings in unique ways. Andrew McStay, a professor at Bangor University in Wales, has spent half a decade exploring emotional AI technologies and how they have been used. "The premise that there are six basic emotions is profoundly problematic, it is a very western-centric view," he says. "Psychologists agree . . . that emotions are a social label applied to physiological states."
Ekman himself has studied cultural differences in how emotions can be expressed, showing in a seminal experiment in the late 1980s that there are differences between how American and Japanese students react to violent films. The differences, he found, were based on whether someone from their own culture was in the same room or not. Among students at the University of California at Berkeley, there was no difference in their reactions whether there was another American in the room or not, but for the Japanese, "there was a huge difference. Particularly if it was someone in an authoritative position, they showed a completely different set of expressions," Ekman says.
He adds: "The Japanese were following their own rules about who can show what emotions to whom, and when it can be shown. People in every culture in the world learn those rules, about what emotions can be displayed and when."
Privacy problem
Despite concerns about the current accuracy and biases of emotional AI, many scientists are confident the technology will improve as the data used to train algorithms are better suited to the applications, and as companies begin to design country-specific solutions.
McStay says companies are already using cultural awareness as a way to differentiate their products. "Empath, a company in Tokyo, sees a real market opportunity because they recognise that people do emote differently, behaviour is different in Japan and the UK, there are different levels of acceptability in terms of degree to which emotion should be expressed," he says.
Meanwhile, Hong Kong-based 4 Little Trees uses Chinese faces to train its student surveillance systems, to improve local accuracy. "I think we are going to see more start-ups and companies pushing back against the idea that expressions are universal, companies are finding local opportunities to interpret the local people and context," McStay says.
But even if facial expression algorithms become highly accurate, many critics question whether machines should ever make decisions about how humans will react, particularly without our permission. This question of a person's right to privacy about their feelings was addressed by the EU's proposed AI regulations published last month. The proposal defined emotion recognition technologies as "high-risk" and called for explicit consent from those it is used on.
"After researching the history and shaky scientific foundations of these tools, I'm convinced that they should be strongly regulated," says Crawford. "In many cases, we won't know the full extent of how many companies are using these tools, as they are often used without transparent disclosure or employee consent."
Ekman, the founding father of emotion interpretation, agrees. "Watching someone's facial expressions is an invasion of privacy, especially if it's done without their knowledge," he says. "I strongly believe there should be laws passed that prohibit the recording of facial expression, let alone its interpretation or measurement, without informed consent."
Written by: Madhumita Murgia
© Financial Times