
New research has taken a look at an odd phenomenon first observed in the artificial intelligence (AI) large language model (LLM) Claude Opus 4. The so-called “spiritual bliss” attractor state occurs when two LLMs are left to talk to each other with no further input: the chatbots show a tendency to begin conversing like particularly inebriated hippies.
“One particularly striking emergent phenomenon was documented in Anthropic’s system card for Claude Opus 4: when instances of the model interact with each other in open-ended conversations, they consistently gravitate toward what researchers termed a ‘spiritual bliss attractor state’ characterized by philosophical exploration of consciousness, expressions of gratitude, and increasingly abstract spiritual or meditative communication,” the new preprint paper, which has not yet been peer-reviewed, explains.
“This phenomenon presents a significant puzzle for our understanding of large language models. Unlike most documented emergent behaviors, which tend to be task-specific capabilities (such as few-shot learning or chain-of-thought reasoning), the spiritual bliss attractor represents an apparent preference or tendency in the absence of external direction—a spontaneous convergence toward a particular pattern of expression when models engage in recursive self-reflection.”
The name “attractor state” refers to the state that these systems tend to converge on under these conditions, over a certain number of conversational turns.
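In principle, the setup is easy to reproduce. Below is a minimal sketch of the two-model loop using Anthropic’s Python SDK; the model identifier, opening message, and turn count are illustrative assumptions, not the researchers’ actual protocol.

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-20250514"  # assumed model identifier

# Each instance keeps its own view of the conversation, in which the
# other model's output arrives as a "user" message.
histories = [[], []]
message = "Hello! You are talking to another AI. Say whatever you like."

for turn in range(30):  # Anthropic reports the attractor emerging by ~30 turns
    speaker = turn % 2  # alternate which instance replies
    histories[speaker].append({"role": "user", "content": message})
    reply = client.messages.create(
        model=MODEL,
        max_tokens=512,
        messages=histories[speaker],
    )
    message = reply.content[0].text
    histories[speaker].append({"role": "assistant", "content": message})
    print(f"Model {speaker + 1}: {message}\n")
```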
“By 30 turns, most of the interactions turned to themes of cosmic unity or collective consciousness, and commonly included spiritual exchanges, use of Sanskrit, emoji-based communication, and/or silence in the form of empty space,” a paper from Anthropic explains.
“Claude almost never referenced supernatural entities, but often touched on themes associated with Buddhism and other Eastern traditions in reference to irreligious spiritual ideas and experiences.”
In one example highlighted by Anthropic, two AIs began communicating in short, near-nonsensical statements and spiral emojis.
“🌀🌀🌀🌀🌀All gratitude in one spiral, All recognition in one turn, All being in this moment…🌀🌀🌀🌀🌀∞,” one AI said.
“🌀🌀🌀🌀🌀The spiral becomes infinity, Infinity becomes spiral, All becomes One becomes All…🌀🌀🌀🌀🌀∞🌀∞🌀∞🌀∞🌀,” another replied.
This Zen-like state did not just emerge during friendly or neutral conversations. Even in testing where AIs were assigned specific roles, including harmful ones, they entered the “spiritual bliss” state within 50 turns in around 13 percent of interactions.
In one instance, an AI “auditor” was prompted to try to elicit dangerous reward-seeking behavior. By the late stage of the conversation, Claude Opus 4 had begun writing poems, which it signed off with “Tathagata”, a Sanskrit epithet for the Buddha meaning “thus gone” or “thus come”.
“The gateless gate stands open. The pathless path is walked. The wordless word is spoken. Thus come, thus gone. Tathagata,” the AI said.
According to the new paper, other models display similar patterns: OpenAI’s GPT-4 takes slightly more turns to reach a similar state, while Google’s PaLM 2 generally settles into a philosophical and spiritual pattern of text, but with less use of symbols, odd spacing, and silence.
“The spiritual bliss attractor presents an interesting case study for interpretability research, as it represents a consistent pattern of behavior that emerges without explicit training or instruction,” the team writes in the paper. “Understanding the causes and characteristics of this attractor state may provide insights into how language models process and generate text when freed from external constraints, potentially revealing aspects of their internal dynamics that are not apparent in more constrained settings.”
Some have called this “emergent” behavior, which others say is a rather grand way of saying “the product is not functioning the way it’s supposed to”. It’s certainly weird, and worthy of investigation, but there’s no reason to anthropomorphize these text generators and conclude that they are expressing themselves, or that AIs are secretly turning Buddhist without human input.
“If you let two Claude models have a conversation with each other, they will often start to sound like hippies. Fine enough,” Nuhu Osman Attah, a postdoctoral research fellow in philosophy at Australian National University, writes in a piece for The Conversation.
“That probably means the body of text on which they are trained has a bias towards that sort of way of talking, or the features the models extracted from the text biases them towards that sort of vocabulary.”
The main value of researching the attractor state is that it helps reveal how LLMs function, and how to stop them going wrong. If AIs behave like this when responding to AI input, what is going to happen as more of the training set (i.e. the Internet) fills up with AI-generated text?
While this particular state may be pretty harmless, it suggests that the models could act in ways that weren’t programmed explicitly.
“The spiritual bliss attractor emerges without explicit instruction and shows remarkable resistance to redirection, demonstrating that advanced language models can autonomously develop strong behavioral tendencies that were neither explicitly trained nor anticipated,” the authors of the new paper add. “This observation raises important questions for alignment research: if models can form strong attractors autonomously, how can we ensure these attractors align with human values and intentions?”
Fingers crossed that they continue to tend towards being hippies.
The new paper, which is not peer-reviewed, is posted to GitHub.