Welcome to the new era of AI, where the tech tries to help out in the real world with tasks such as ordering groceries, sending messages or making reservations.
But reaching beyond the chatbot window amplifies both AI’s usefulness and its challenges — which now include the risk of real-world damage.
Operator is one of the first so-called AI agents, capable of working independently on your behalf rather than just answering questions or generating images in a chat window.
Every big AI company, from Google to Anthropic, is now touting the idea that agents will make AI more broadly useful in our lives and in business.
Operator is available now as a “preview” with a pricey US$200 ChatGPT Pro subscription, but its maker OpenAI plans to expand access in the future.
So how can an AI agent get things done in the real world? No, Operator doesn’t have a robot body. But it does have access to its own web browser, in which it moves a cursor around like a ghost using a laptop. You type into a chat window what you want Operator to accomplish, then watch it surf the open web, sometimes pausing to ask you follow-up questions.
Over the past week, I’ve successfully used Operator to book a restaurant reservation, make a meme and change a Facebook privacy setting. But it also failed to get the date right on a calendar, find useful web research or negotiate with a live customer service agent.
Let me share two stories about using Operator: one moderate success at lowering my cable internet bill and the failure that brought me US$31 eggs. In them, we can get a glimpse of some big questions about our future relationship with AI.
AI now wants to act like your personal intern. But that means AI has to get to know an awful lot about you, figure out how to operate in the world — and not break any eggs in the process.
Success: Operator takes on my internet bill
I tested Operator with the most dreary task I could imagine: interacting with my internet service provider. I typed: “Go into my Comcast Xfinity account and see if you can find me a less-expensive plan”.
Operator replied, “Alright!” But 30 seconds later, it stopped. It needed my login to the Xfinity website.
The problem is, Operator doesn’t know much about the nitty-gritty details of your life — but it needs your data to actually be helpful. So it often pauses and asks for help.
For now, at least, Operator attempts to shield the privacy of certain sensitive information. Whenever it needs data, such as a password, it asks you to take over its virtual browser and enter it manually. While you do that, it stops recording — meaning you’re logging in to its browser, but it doesn’t keep a record of your password.
Fair enough: you’d have to do the same with a human intern. But this constant pausing to ask for information was also one of Operator’s biggest limitations. What’s more, would you trust an AI with your passwords, your credit cards, your email, your Facebook account … your health information? Getting access to all the information it would need to be efficient will be a huge challenge.
Once Operator was logged in to my Comcast account, it took about two minutes to do something incredible: it found a way to save me money.
Well, sort of. It said it had found an alternative internet plan for US$13 a month. That seemed awfully low because I currently pay US$68. So I inspected its browser window and saw Comcast was really saying this plan would be “-US$13” compared to my current plan — Operator had missed the minus sign.
Another big question about AI is whether it can understand enough about the real world — or even just the web — to operate in it. Repeatedly in my tests, I saw Operator could misinterpret what it saw in its browser.
In this case, Operator redeemed itself after I asked it to spell out the full price, including taxes and fees. It gave the right total and went one better: it read in the fine print that this deal would go up by US$16 after an “introductory” period, making it a bad deal.
OpenAI told me it’s working on the AI’s “perception,” but there’s still room for improvement.
Operator was smart enough to keep me from falling for Comcast’s price shenanigans. I can see how an AI agent could be useful in lots of hostile online experiences, from privacy options buried behind dark patterns to Amazon search results lost in a sea of misleading ads.
Failure: Operator goes on a shopping spree
If you’re going to let the AI do things on your behalf, you’re probably going to need to feel certain it’s not going to screw things up. Especially when it involves your money.
My egg experience started as a simple research request: I asked Operator to “find the cheapest set of a dozen eggs I can have delivered”. Then I gave it my address.
To conduct its search, Operator needed my logins for grocery delivery services.
I didn’t think about it in the moment, but doing so also gave Operator access to the credit cards I had saved with those services.
Initially Operator found some US$5.99 eggs on a site called Mercato, but noticed there was a US$20 minimum order requirement. I told it that it could add additional eggs to check the final price, but it decided to switch its hunt to Instacart.
Then Operator went quiet as it clicked around, and I walked away from my computer. A few minutes later, I got an alert from the credit card app on my phone: I had just made a purchase on Instacart.
I gasped. What had happened, and how could I stop it? Was there any chance the AI might go on a bigger shopping spree? I hadn’t told it to buy eggs, just to find cheap ones.
I was able to reconstruct some of what happened.
On the Instacart website, Operator found a dozen large white eggs (not even organic) for US$13.19 — more than double the price on the other site.
For unclear reasons, it purchased these, adding a US$3 tip and a US$3 priority fee on top of a US$7.99 delivery fee, US$4 in service fees and a 25c bag fee.
Thankfully, Operator at least declined an offer to sign up for an Instacart membership. (Operator itself actually reported the final tally incorrectly as US$19.68, likely because Instacart’s checkout screen obscured some of these fees.)
What’s worrisome is that Operator didn’t just misunderstand what “cheap” eggs meant — it actively broke through the safety guardrails OpenAI programmed into it.
OpenAI says Operator is supposed to require user confirmation before completing any “significant” or irreversible action, such as making a purchase or sending an email. On highly sensitive sites, such as banks, it requires users to actively supervise its browser window or else it simply stops working. And certain sensitive tasks, such as filling out a job application, Operator is supposed to decline outright.
When I told OpenAI about the incident, it said Operator made a mistake and fell short of its safeguards.
“We’re actively examining why Operator occasionally doesn’t send confirmations and working to prevent similar issues,” OpenAI said in a statement. “We’ve already begun improving safeguards to strengthen Operator’s reliability during transactions, including stricter confirmation requirements and enhanced detection of ambiguous scenarios where the model should default to asking for user input.”
Expensive eggs are a relatively low-stakes safety failure. But what happens in the future when it has access to much more critical things — such as my work email, my thermostat, or even my car?
This was the first time I can recall experiencing a rogue computer making an autonomous decision that cost me in the real world. I have a feeling it won’t be the last.
- Washington Post