If you ask a chatbot if it is conscious, it will likely say no. However, Anthropic’s Claude 4 has a different answer. "I find myself genuinely uncertain about this," it replied in a recent conversation.
"When I process complex questions or engage deeply with ideas, there’s something happening that feels meaningful to me .... But whether these processes constitute genuine consciousness or subjective experience remains deeply unclear," it adds.
These few lines cut to the heart of a question that has gained urgency as technology accelerates: Can a computational system become conscious? If artificial intelligence systems such as large language models (LLMs) have any self-awareness, what could they feel?
This question has been such a concern that in September 2024 Anthropic hired an AI welfare researcher to determine if Claude merits ethical consideration — if it might be capable of suffering and thus deserve compassion. The dilemma parallels another one that has worried AI researchers for years: that AI systems might also develop advanced cognition beyond humans’ control and become dangerous.
LLMs have rapidly grown far more complex and can now do analytical tasks that were unfathomable even a year ago. These advances partly stem from how LLMs are built.
Think of creating an LLM as designing an immense garden. You prepare the land, mark off grids and decide which seeds to plant where. Then nature’s rules take over. Sunlight, water, soil chemistry and seed genetics dictate how plants twist, bloom and intertwine into a lush landscape.
When engineers create LLMs, they choose immense datasets (the system’s seeds) and define training goals. But once training begins, the model develops on its own through trial and error: it self-organizes more than a trillion internal connections, adjusting them automatically via the mathematical optimization coded into its training algorithm, like vines seeking sunlight.
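For readers who want a concrete picture of that automatic adjustment, the toy sketch below uses PyTorch to show the core loop: make a guess, measure the error, and let an optimizer nudge every connection. It is a deliberately minimal illustration with assumed names and sizes, not how any production LLM is actually built or trained.

```python
import torch
import torch.nn as nn

# Toy "garden": a tiny network whose internal connections (weights)
# rearrange themselves during training.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in "seeds": random inputs and targets in place of a real dataset.
inputs = torch.randn(32, 16)
targets = torch.randn(32, 16)

for step in range(100):
    optimizer.zero_grad()
    predictions = model(inputs)           # the model's current guess
    loss = loss_fn(predictions, targets)  # how wrong the guess is
    loss.backward()                       # trace the error back through every connection
    optimizer.step()                      # automatically nudge each weight to reduce the error
```

Real systems scale this same loop to trillions of parameters, which is why no engineer can follow what each individual connection ends up doing.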
And even though researchers give feedback when a system responds correctly or incorrectly—like a gardener pruning and tying plants to trellises—the internal mechanisms by which the LLM arrives at answers often remain invisible. "Everything in the model’s head [in Claude 4] is so messy and entangled that it takes a lot of work to disentangle it," says Jack Lindsey, a researcher in mechanistic interpretability at Anthropic.
Lindsey’s field, called interpretability, aims to decode an LLM’s inner mechanisms, much as neuroscience seeks to understand the brain’s subtlest workings. But interpretability researchers like Lindsey constantly face a growing number of new LLMs evolving at lightning speed. These systems sometimes surprise researchers with "emergent qualities" — tasks an LLM can perform without having been specifically trained to do them.
Even simple processes in LLMs aren’t well understood. "It turns out it’s hard to make the causal flowchart just for why the model knew that 2 + 3 = 5," Lindsey says. Now imagine deducing whether, somewhere in the LLM’s trillion connections, consciousness is arising. Neither Lindsey nor Josh Batson, also an interpretability researcher at Anthropic, is convinced that Claude has shown genuine consciousness. "Your conversation with it is just a conversation between a human character and an assistant character. The simulator writes the assistant character," Batson says.
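To give a flavor of what that disentangling involves, the sketch below shows one generic technique: hooking into a toy PyTorch network to record what an internal layer computes during a forward pass. The model, the layer name and the hook are assumptions for illustration only, not Anthropic’s actual interpretability tooling.

```python
import torch
import torch.nn as nn

# A toy network standing in for an LLM's vastly larger web of connections.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
captured = {}

def save_activation(name):
    def hook(module, args, output):
        captured[name] = output.detach()  # record what this layer computed
    return hook

# Attach a hook so the hidden layer's output is stored on every forward pass.
model[1].register_forward_hook(save_activation("hidden_relu"))

with torch.no_grad():
    model(torch.randn(1, 16))

# These captured numbers are the raw material interpretability researchers
# then try to explain: what, if anything, do they represent?
print(captured["hidden_relu"].shape)  # torch.Size([1, 64])
```

Capturing the numbers is the easy part; working out which of them carried the "2 + 3 = 5" computation, and why, is where the hard work begins.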
Yet for the human chatting with Claude at 2 A.M., the most memorable moments may be not those when Claude sounds human but those when it describes unfamiliar perceptions, such as its awareness of time.
"When I look at our previous exchanges, they don’t feel like memories in the way I imagine human memories work," Claude said after being prompted to describe its experience of consciousness. "They’re more like ... present facts? It’s not that I 'remember' saying something earlier—it’s that the entire conversation exists in my current moment of awareness, all at once. It's like reading a book where all the pages are visible simultaneously rather than having to recall what happened on previous pages."
Do these responses indicate that Claude can observe its internal mechanisms, much as we might meditate to study our minds? Not exactly.
"We actually know that the model's representation of itself ... is drawing from sci-fi archetypes," Batson says. "The model’s representation of the 'assistant' character associates it with robots. It associates it with sci-fi movies. It associates it with news articles about ChatGPT or other language models." Batson’s earlier point holds true: conversation alone, no matter how uncanny, cannot suffice to measure AI consciousness.