OpenAI Operator Agent Available to ChatGPT Pro Subscribers

OpenAI has launched Operator, a semi-autonomous AI agent that uses a proprietary web browser to execute tasks like planning a vacation using Tripadvisor or booking restaurant reservations through OpenTable. “It can look at a webpage and interact with it by typing, clicking and scrolling,” explains OpenAI. Operator is powered by a new model called Computer-Using Agent (CUA), and is available in research preview to ChatGPT Pro subscribers in the U.S. Combining GPT-4o’s computer vision capabilities with advanced reasoning, CUA is trained to interact with graphical user interfaces (GUIs) — parsing menus, clicking buttons and reading screen text.

“Operator transforms AI from a passive tool to an active participant in the digital ecosystem,” OpenAI says in an introductory blog post that lists third-party participants including DoorDash, Etsy, eBay, Instacart, Priceline, StubHub, Target and Uber.

By initially making Operator available only to Pro users (who pay $200 per month), OpenAI seeks to demonstrate agentic AI’s potential while working out kinks through user feedback. “Our plan is to expand to Plus, Team and Enterprise users,” the company notes, and eventually to integrate Operator’s capabilities into ChatGPT and the API.

“This product is the beginning of our step into agents,” OpenAI CEO and co-founder Sam Altman said in a demo on YouTube.

“Operator doesn’t take over your web browser,” writes VentureBeat, explaining that “you visit a separate, new website — operator.chatgpt.com — and are confronted with a prompt input box similar to ChatGPT.” Typed prompts “will trigger Operator to open a separate, virtual browser running in the cloud on OpenAI servers.”

Then, the agent can execute tasks and common workflows, VentureBeat adds, noting that “the user watches the cursor move on its own on the cloud-based browser in real time.”

If the agent encounters difficulty, it will use its reasoning powers to self-correct. If that fails, “it simply hands control back to the user, ensuring a smooth and collaborative experience,” according to OpenAI. “Users can choose to take over control of the remote browser at any point, and Operator is trained to proactively ask the user to take over for tasks that require login, payment details, or when solving CAPTCHAs.”

“Operator takes screenshots of a computer screen and scans the pixels to figure out what actions it can take,” reports MIT Technology Review, noting it works “like Anthropic’s Computer Use and Google DeepMind’s Mariner.” In a separate post explaining CUA, OpenAI cites benchmark results indicating that it outperforms those competing tools.
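
The screenshot-driven loop and hand-back behavior described above can be sketched roughly as follows. This is a hypothetical illustration, not OpenAI’s implementation: the `Action` type, the set of “sensitive” step names, the retry limit and the plain callables standing in for the model and the cloud browser are all invented assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical step types that trigger a hand-off to the user
# (mirrors the login / payment / CAPTCHA cases OpenAI describes).
SENSITIVE = {"login", "payment", "captcha"}


@dataclass
class Action:
    kind: str  # e.g. "click", "type", "scroll", "done", or a sensitive step
    detail: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],               # stand-in for the virtual browser
    choose_action: Callable[[bytes], Optional[Action]],  # stand-in for the model
    perform: Callable[[Action], bool],                  # returns True if the action worked
    max_steps: int = 20,
    max_retries: int = 2,
) -> str:
    """Look at the screen, pick an action, act; hand control back when stuck."""
    failures = 0
    for _ in range(max_steps):
        screen = take_screenshot()
        action = choose_action(screen)
        if action is None:
            return "handoff: no action found"
        if action.kind == "done":
            return "done"
        if action.kind in SENSITIVE:
            return f"handoff: user must handle {action.kind}"
        if perform(action):
            failures = 0
        else:
            # Try again on the next pass ("self-correct"), up to a limit.
            failures += 1
            if failures > max_retries:
                return "handoff: repeated failures"
    return "handoff: step limit reached"


if __name__ == "__main__":
    # Scripted stand-in for the model: two ordinary steps, then a payment page.
    script = iter([Action("click", "search"), Action("type", "table for 2"), Action("payment")])
    print(run_agent(lambda: b"", lambda s: next(script, None), lambda a: True))
    # prints: handoff: user must handle payment
```

Treating sensitive steps and repeated failures as exits from the loop, rather than errors, reflects the collaborative hand-off the company describes: the agent stops and the user takes over.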

Typically, software, including AI models, interacts with other apps through APIs that provide permissions and instruction sets linking the two. That approach limits such functionality to users who have obtained and configured access to a particular app’s API, something common among developers but not regular users.

“If you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible,” OpenAI scientist Reiichiro Nakano tells MIT Technology Review.
