Online exclusive
Consumer tech: The AI revolution gathers pace as Google announces slew of new products
After a year of soul-searching as it watched rival OpenAI claim the limelight with ChatGPT, Google is ready to spend the rest of 2024 “making AI helpful for everyone”.
That was chief executive Sundar Pichai’s pitch last week at Google’s I/O developers conference, where he and senior executives outlined how Google is setting about building artificial intelligence into all of its products, from the Android mobile operating system to the ubiquitous Google search engine.
The company had an enviable lead in AI, including its acquisition of the AI startup DeepMind, but it has been on the back foot when it comes to building AI into its products. (For the record, DeepMind was co-founded by New Zealander Shane Legg, who is now its chief AGI – artificial general intelligence – scientist.)
OpenAI’s deal with Microsoft means the technology underpinning ChatGPT is built into Microsoft 365 productivity apps such as Word, PowerPoint and Teams, allowing users willing to pay the extra $30 a month for a Copilot licence to generate presentations from scratch by entering text prompts and to summarise email threads in Outlook.
Google’s response was Bard, its generative AI chatbot, which has since been rebranded as Gemini. Now numerous versions of the Gemini large language model (LLM) are being deployed to embed AI across Google’s products. Here are some of the key AI services Google showed off at the I/O conference that are expected to be available this year.
AI Overviews
You may occasionally see AI Overviews at the top of Google’s search results when you enter a query into the search box. Rather than just displaying a collection of relevant links, as Google has traditionally done, the search engine will now try to summarise the answer in a short blurb.
An overview features links to sources and lets you quickly ask follow-up questions to drill deeper into the subject. It’s a relatively small change, but it could have massive ramifications for how people search. Website owners fret that people will no longer click through to their sites, instead gleaning the information from Google’s AI-generated summaries.
I’ve seen AI Overviews pop up for more complicated queries during the past couple of months and found them to be useful for research. But the information summarised isn’t always accurate, so you shouldn’t rely on the overviews alone; check the source material as well.
As well as AI Overviews, Google plans to use generative AI for certain types of search queries, providing AI-generated summaries of reviews, discussions from social media, and lists of suggestions. Initially, this will be applied to queries related to inspiration, dining options and recipes. It has plans to expand to other areas like movies, books, hotels and e-commerce. A big question will be how relevant these search summaries will be for the New Zealand market.
Gmail’s AI sidebar
Gmail, the world’s most popular free email service, now boasts 1.8 billion users, so an AI overhaul for Gmail could change how people all over the world use email. An AI-powered feature that helps users auto-generate entire emails will soon be available in Gmail, expanding on the existing Smart Reply and Smart Compose features.
It can read attachments to summarise the text contained in them. Pressing the “summarise” button in the sidebar will give you brief summaries of lengthy emails, so you don’t have to wade through them in full to understand what the key points are.
An executive at Google Workspace, the premium service for businesses that bundles in Gmail and Google Drive, showed how Gemini could be used to draft a detailed email response to a contractor. Workflows will let users set up automatic responses to certain emails, such as adding a receipt or invoice attachment to your expenses spreadsheet.
Project Astra
The most impressive demo of Google’s new AI functions gave us a glimpse of Project Astra, an effort within the Google DeepMind AI lab to use your phone’s camera to identify objects, answer questions about the world around you and even find misplaced items.
This is like a supercharged version of Google Lens, which for years has been able to tell you about objects in front of your camera and translate signs. The demo revolved around voice commands being given to Gemini and at one stage, the woman in the video switched from using her phone to donning a pair of AI glasses, reminiscent of the defunct Google Glass device. By pointing the camera in the glasses at a diagram of a computer server set-up, she was able to get Gemini to recommend how the architecture of the set-up could be improved.
This was all done in real time, with very little delay between question and answer. If Project Astra can live up to the demo in real life, it will be a hugely valuable tool. Another camera-based AI feature lets the user ask questions using video content, such as filming a problem with a product and uploading the clip for identification. The feature aims to save time and enhance the search experience.
Gemini Live
A new experience called Gemini Live will allow users to have in-depth voice chats with the AI, which can understand and respond to surroundings via photos or videos captured by smartphones. This is the sort of truly intelligent conversational experience Alexa, Siri and Google Assistant have never been able to deliver.
Two stand-alone devices, the Rabbit R1 and the Humane AI Pin, have emerged in recent months touting their conversational AI abilities, though early reviews of both have revealed major limitations. Gemini Live appears to be Google’s effort to supersede them by building conversational AI into any smartphone.
Veo and Imagen
Veo is Google’s answer to Sora, the breathtakingly realistic text-to-video generator OpenAI has in preview. Veo is an AI model that can create 1080p (high-definition) video clips from text prompts and will apparently be integrated into platforms such as YouTube Shorts.
Imagen 3 is the latest text-to-image model from Google, capable of generating lifelike images with fewer visual artefacts. This model will be available to select creators and integrated into Google’s machine learning platform, Vertex AI.
OpenAI isn’t standing still
The day before Google unveiled its slew of new AI-related products, arch-rival OpenAI released a new version of its wildly popular ChatGPT chatbot. OpenAI claims the new model, GPT-4o, will turn ChatGPT into a true personal assistant, much as Google is promising with Gemini Live and Project Astra.
GPT-4o is free to use, though paying customers get additional functionality and higher usage limits. I’ve been using GPT-4o for a few days and it’s certainly an improvement on the previous free version of ChatGPT, with faster response times and new abilities such as offering a detailed summary of photos you upload to it.
The era of the AI agents
This flurry of announcements from OpenAI and Google shows the pace of development in artificial intelligence remains relentless. We even saw two senior OpenAI executives depart the firm last week, raising concerns that the technology is advancing too quickly and that OpenAI is not doing enough to focus on safety and on ensuring the technology advances in line with the needs of humanity.
While large language models, which underpin useful AI functions, continue to advance, we appear to be entering a new phase of the AI race, where the focus is on creating AI agents that are useful for tackling real-world problems that extend beyond finding information. These agents are becoming multimodal in nature, able to hold lengthy, real-time conversations and to speak to users in a human-like manner.
Demos are one thing; the real value of these AI tools will be tested when they are in the hands of millions of users. But we are certainly in the midst of a step-change in the world of AI, with a handful of companies competing vigorously to produce the AI agents to rule them all.