Interestingly, the full video recordings of the therapy session were then given to experts to classify. Unlike the AI, they made their predictions using psychological assessment based on the vocal (and other) attributes - including the words spoken and body language. Surprisingly, their prediction of the eventual outcome (they were correct in 75.6% of the cases) was inferior to predictions made by the AI based only on vocal characteristics (79.3%). Clearly there are elements encoded in the way we speak that not even experts are aware of. But the best results came from combining the automated assessment with the experts' assessment (79.6% correct).
The significance of this is not so much about involving AI in marriage counselling or getting couples to speak more nicely to each other (however meritorious that would be). It lies in revealing how much information about our underlying feelings is encoded in the way we speak - some of it completely unknown to us.
Words written on a page or a screen have lexical meanings derived from their dictionary definitions. These are modified by the context of surrounding words. There can be great complexity in writing. But when words are read aloud, they take on additional meanings that are conveyed by word stress, volume, speaking rate and tone of voice. In a typical conversation there is also meaning in how long each speaker talks for, and how quickly one or the other might interject.
Consider the simple question "Who are you?". Try speaking this with stress on different words: "WHO are you?", "Who ARE you?" and "Who are YOU?". Listen to these - the semantic meaning can change with how we read, even when the words stay the same.
Computers reading 'leaking senses'?
It is unsurprising that words convey different meanings depending on how they are spoken. It is also unsurprising that computers can interpret some of the meaning behind how we choose to speak (maybe one day they will even be able to understand irony).
But this research takes matters further than just looking at the meaning conveyed by a sentence. It seems to reveal underlying attitudes and thoughts that lie behind the sentences. This is a much deeper level of understanding.
The therapy participants were not reading words like actors. They were just talking naturally - or as naturally as they could in a therapist's office. And yet the analysis revealed information about their mutual feelings that they were "leaking" inadvertently into their speech. This may be one of the first steps in using computers to determine what we are really thinking or feeling. Imagine for a moment conversing with future smartphones - will we "leak" information that they can pick up? How will they respond?
Could they advise us about potential partners by listening to us talking together? Could they detect a propensity towards antisocial behaviour, violence, depression or other conditions? It would not be a great leap to imagine the devices themselves as future therapists - interacting with us in various ways to track the effectiveness of interventions that they are delivering.
Don't worry just yet: we are years away from such a future. But it does raise privacy issues, especially as we interact more deeply with computers at the same time as they are becoming more powerful at analysing the world around them.
When we pause to consider the other human senses apart from sound (speech), perhaps we also leak information through sight (such as body language, blushing), touch (temperature and movement) or even smell (pheromones). If smart devices can learn so much by listening to how we speak, one wonders how much more they could glean from the other senses.
This article was originally published on The Conversation.