A codec (compressor-decompressor) is software, usually, that takes massive image files and video streams and squeezes them down to a manageable size. Nvidia's Maxine replaces the codec with AI, and that really could change a whole lot of things in the coming years.
I was talking to Dr Andrew Chen of New Zealand Covid contact tracing fame about Maxine. Andrew is also interested in AI and its use in, for example, facial recognition in CCTV systems, which is likely to open up a number of ethical cans of worms.
Nvidia's early claims for how efficient Maxine is compared to codecs are impressive. By sending only keyframes plus key points on people's faces and how they move during video calls, then using AI to recreate the image at the other end, Maxine uses about a tenth of the data that codec-based systems do. That means high quality on low-bandwidth connections, and the ability to build massive cloud-based systems using microservices.
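To make that arithmetic concrete, here's a rough back-of-envelope sketch in Python. The bitrate, landmark count and keyframe size are my own illustrative assumptions, not Nvidia's published figures:

# Back-of-envelope comparison (illustrative numbers only): a conventional
# codec streams compressed pixels every frame, while a keypoint scheme
# sends one keyframe up front and small landmark updates per frame after.

FPS = 30                      # frames per second, assumed
CALL_SECONDS = 60             # one minute of video

# Assumed conventional codec: a 720p video call at ~1.5 Mbit/s
codec_bits = 1_500_000 * CALL_SECONDS

# Assumed keypoint scheme: ~130 facial landmarks, 2 coords as 16-bit values
landmark_bits_per_frame = 130 * 2 * 16
keyframe_bits = 100_000 * 8   # one ~100 kB compressed keyframe, assumed
keypoint_bits = keyframe_bits + landmark_bits_per_frame * FPS * CALL_SECONDS

print(f"codec:    {codec_bits / 8 / 1024:,.0f} kB")
print(f"keypoint: {keypoint_bits / 8 / 1024:,.0f} kB")
print(f"ratio:    ~{codec_bits / keypoint_bits:.0f}x less data")

With those assumed numbers the keypoint scheme comes out around eleven times smaller, which lines up with the "about a tenth" claim.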
But wait, there's more, as the TV shouts at you if you turn on free-to-air during the day: Maxine can take the key facial points of people in video calls and reanimate them with a machine learning technique called a generative adversarial network (GAN), with very realistic results.
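For the curious, here's a toy sketch in Python of what the receive side of such a scheme could look like. This is not Maxine's actual API; the generator function is just a stand-in for a trained GAN, and the data is random filler:

import numpy as np

def generator(keyframe: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """Stand-in for a trained generator: warp/redraw the keyframe so the
    face matches the target landmark positions. A real system would run a
    GAN trained on face video; here we just return the keyframe as-is."""
    return keyframe

# One 256x256 RGB keyframe, sent once at call start (random stand-in data).
keyframe = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

# Per-frame payload: ~130 (x, y) facial landmarks instead of full pixels.
for _ in range(30):                      # one second of video at 30 fps
    landmarks = np.random.rand(130, 2)   # stand-in for tracked face points
    frame = generator(keyframe, landmarks)
    # a real client would now render `frame` as the reconstructed video

The point of the design is that the expensive part, the pixels, only crosses the network once; everything after that is a trickle of coordinates.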
More realistic virtual backgrounds, removing or adding background sounds, and very creative avatars should be coming to the video calls you're guaranteed to be in next year thanks to Covid-19. Real-time translation and captioning too, which might make the regular, painfully unintelligible video calls tolerable.
Maxine has a sidekick AI framework called Jarvis that developers can use to build human-like virtual assistants, which you might be able to program to take part in video calls while you do something healthier, like going for a run.
If you actually need to pop into the call, you could do it from your smartphone, with the video call application hiding that you're outside, sweaty and out of breath because lockdown made you even less fit than before. Just an idea; I haven't tried Maxine to see if it can do that yet.
Using AI like this is not a new idea, and it opens up a great many possibilities, but it raises some questions at the same time.
Like: what are we watching, and what will we be watching? It's clearly not the real world. Can this be abused?
Chen mentioned the risk of your keyframes, and perhaps the identifying points on your face, being nicked by attackers and used to reanimate and impersonate you.
That's one of those rich-pickings situations, because if you get a video call from someone you know, and the caller looks and sounds like that person, well ... you're going to think it's the person you know and be duped. Your favourite politician calling you unexpectedly, say, or the bank manager with a credit application.
Real-time deepfakes could be next, in other words, and we'll need protective features in our apps to identify them and alert us. Somewhat ironically, such detectors are also AI-based, and they're worryingly thin on the ground so far.
Either way, the AI-fidelity future is coming at us rapidly. If I were a developer working on games, messaging, graphics, audio, you name it, this is the area I'd put money and effort into.