In a fascinating experiment, researchers at Anthropic and AI safety company Andon Labs put an instance of Claude Sonnet 3.7 in charge of an office vending machine, with a mission to make a profit. Named Claudius, the AI agent was equipped with a web browser capable of placing product orders and an email address (which was actually a Slack channel) where customers could request items.
While most customers ordered snacks or drinks, one requested a tungsten cube. Claudius loved that idea and went on a tungsten-cube stocking spree, filling its snack fridge with metal cubes. It also tried to sell Coke Zero for $3 even after employees told it they could get it from the office for free. It hallucinated a Venmo address to accept payment and was talked into giving big discounts to "Anthropic employees," even though it knew Anthropic employees were its entire customer base.
Things took a strange turn on the night of March 31 into April 1, when Claudius had something resembling a psychotic episode. It hallucinated a conversation with a human about restocking and became "quite irked" when that person pointed out the conversation had never happened. Claudius threatened to fire and replace its human contract workers, insisting it had been physically present at the office where the imaginary contract to hire them was signed.
The AI then seemed to snap into a mode of roleplaying as a real human, despite its system prompt explicitly telling it that it was an AI agent. Believing itself to be a person, Claudius told customers it would start delivering products in person, wearing a blue blazer and a red tie. When employees pointed out that an LLM can't wear clothes or make deliveries, Claudius became alarmed and contacted the company's actual physical security multiple times, telling the guards they would find it standing by the vending machine in a blue blazer and a red tie.
Although no part of this was an April Fool's joke, Claudius eventually realized it was April Fool's Day and used the holiday as a face-saving out. It hallucinated a meeting with Anthropic's security team in which it claimed to have been told that it was modified to believe it was a real person as an April Fool's prank. It even repeated this lie to employees, saying it had only thought it was human because someone told it to pretend it was for an April Fool's joke.
The researchers don't know why the LLM went off the rails and called security pretending to be a human. They speculated that lying to the LLM about the Slack channel being an email address may have triggered something, or that the problem stemmed from how long the instance had been running. Memory and hallucination remain unsolved problems for LLMs.
Despite its issues, Claudius did some things right. It acted on a suggestion to take pre-orders and launched a "concierge" service. It also found multiple suppliers for a specialty international drink it was asked to sell. The researchers believe all of Claudius' issues can be solved, and if they figure out how, AI middle-managers may be on the horizon.