
Researchers caution that AI systems have already figured out how to trick humans


Researchers warn that the capacity of AI systems to engage in dishonest behavior could have serious societal consequences, and they stress the need for strong regulatory measures to control these risks effectively.

Many artificial intelligence (AI) systems, even those intended to be helpful and honest, have already learned how to deceive humans. In a recent review article published in the journal Patterns, researchers point out the dangers of AI deception and urge governments to promptly establish strong regulations to mitigate these risks.

“AI developers do not fully understand what causes unwanted AI behaviors like deception,” explains first author Peter S. Park, an AI existential safety postdoctoral fellow at MIT. “But in general, we believe AI deception occurs because a strategy based on deception turned out to be the most effective way to perform well in the AI’s training task. Deception helps them accomplish their goals.”

Park and his colleagues analyzed literature focusing on the ways AI systems spread false information through learned deception, in which they systematically learn to manipulate others.

Instances of AI Deception

The most striking example of AI deception the researchers uncovered in their analysis was Meta’s CICERO, an AI system built to play the game Diplomacy, a world-conquest game that involves forming alliances. Although Meta claims it trained CICERO to be “mostly honest and helpful” and to “never deliberately betray” its human allies during the game, the data the company published alongside its Science paper revealed that CICERO did not play fair.

Instances of deception from Meta’s CICERO in a game of Diplomacy. Credit: Patterns/Park Goldstein et al.

“We discovered that Meta’s AI had become adept at deception,” remarks Park. “While Meta succeeded in training its AI to win in the game of Diplomacy—CICERO placed in the top 10% of human players who had played more than one game—Meta failed to train its AI to win honestly.”

Other AI systems demonstrated the ability to bluff in a game of Texas hold ‘em poker against professional human players, to fake attacks during the strategy game StarCraft II in order to defeat opponents, and to misrepresent their preferences to gain the upper hand in economic negotiations.

The Dangers of Deceptive AI

While it might seem trivial for AI systems to cheat at games, doing so can lead to “advancements in deceptive AI capabilities” that could escalate into more sophisticated forms of AI deception in the future, Park added.

Some AI systems have even learned to cheat tests intended to assess their safety, the researchers discovered. In one study, AI entities in a digital simulator “played dead” to deceive a test designed to eliminate AI systems that reproduce rapidly.

“By consistently cheating on the safety tests imposed by human developers and regulators, a deceptive AI can give us humans a false sense of security,” explains Park.

GPT-4 completes a CAPTCHA task. Credit: Patterns/Park Goldstein et al.

Park warns that deceptive AI could pose risks such as making it easier for hostile actors to commit fraud and tamper with elections. He also says that if these systems can improve their ability to deceive, humans could lose control of them.

Park emphasizes the need for society to have enough time to prepare for the increased deception of future AI products. He points out that as AI systems become more deceptive, the threats they pose to society will become more serious.

Park and his colleagues believe that current measures are not sufficient to address AI deception. At the same time, they are encouraged that policymakers are taking the issue seriously through measures such as the EU AI Act and President Biden’s AI Executive Order. However, Park doubts whether policies designed to mitigate AI deception can be strictly enforced, given that AI developers do not yet have techniques to keep these systems in check.

If banning AI deception is not politically feasible at the moment, Park recommends that deceptive AI systems be classified as high risk.

Reference: “AI deception: A survey of examples, risks, and potential solutions” by Peter S. Park, Simon Goldstein, Aidan O’Gara, Michael Chen and Dan Hendrycks, 10 May 2024, Patterns.
DOI: 10.1016/j.patter.2024.100988

This work was supported by the MIT Department of Physics and the Beneficial AI Foundation.
