LLM understanding
Author: Balasubramaniam N. | Date: April 13, 2026

Microscoping Into LLMs – Do Machines Really Understand?

It is worth wondering: if, instead of the now-loaded term "artificial intelligence," John McCarthy and his contemporaries in the 1950s had settled on a more austere label like Automata Studies, would we be as agitated about its connotations and impact as we are today? "Intelligence" is a complex term, with little chance that any two people will agree on what it means. If you ask your friends or family members what they understand by intelligent behavior, I promise you will get either ambiguous or vague responses, or diametrically opposite interpretations of the term. Nothing to be surprised about. No one has defined intelligence for us in a concrete way. Does it mean being good at language? Does it mean doing arithmetic, a subject considered the apotheosis of reasoning and logic, faster than somebody else? Does it mean being creative in the arts, being good at sports, being an efficient and swift problem solver, or showing emotional empathy? Does it mean dissimulation (the ability to pretend to know answers even if one doesn't)? Or does it mean all of the above in some measure?

In true Socratic fashion, if you pause to think about what intelligence means, you will find it difficult to define exactly; yet we use the term "intelligence" confidently all the time, labeling different aspects of behavior as intelligent or otherwise. The only publicly acknowledged measure of intelligence, as far as I know, to which we have been unwillingly subjected is the grades and marks in the countless tests, quizzes, and performance metrics that have peppered our lives, and we all know how unreliable they can be.

So, when Alan Turing, in his 1950 landmark paper “Computing Machinery and Intelligence,” asked “Can machines think?” he proposed, rather audaciously, that intelligence is not some esoteric ability about which philosophers like to endlessly talk, but something that can be operationally recognized as the capacity to respond convincingly in a text-based exchange with a human interlocutor. If an outside observer, asking questions, cannot reliably distinguish between machine-generated and human-generated responses, Turing suggested that it is reasonable to say that the machine is exhibiting intelligent behavior. He called this the “imitation game.” Of course, such machines, Turing emphatically reasoned, must have sufficient storage and processing power to respond to the interlocutor’s questions; this shift from “what is intelligence?” to “how might we recognize it?” became one of the key inflection points in computing history.

For decades after Turing, we could safely assume that whatever tricks computers learned, they did not truly “think” like us. That comfort eroded as the internet turned into a planetary data warehouse, processing and storage costs plummeted, and neural-network models—powered by linear algebra and vector spaces—suddenly began to “learn” from massive datasets rather than explicit programming. Large Language Models (LLMs) now sit at the center of this transformation, producing fluent language and seemingly thoughtful responses, even though the billions of internal computations that connect input to output remain opaque.

On the surface, there is no doubt that LLMs like OpenAI's ChatGPT and Anthropic's Claude models look like embodiments of Turing's prophetic vision of intelligence. They imitate human conversation so well that, at times, we cannot tell the difference. But these systems are black boxes; it is extraordinarily hard to interpret how they work. Anthropic, however, has been unusually focused on responsible AI since its founding in 2021, even hiring philosophers like Amanda Askell to study interpretability and the "personalities" models appear to exhibit. A recent Anthropic study reported a feature-level analysis of a Claude model to probe whether it "understands" the tasks it performs, and the results were eye-opening. The central takeaway from the study, which Anthropic describes as turning a microscope on the model's internals, is that whatever is happening inside the LLM black boxes is a mathematically alien process that only sometimes (this qualifier is crucial) resembles human reasoning when viewed from the outside. In reality, the reasoning it reports comes after the fact, not before the task is performed.

Consider something as simple as adding 36 and 59. When Claude was asked to add these numbers, it arrived at the right answer, 95, but when prompted to show its work, it produced the familiar schoolbook explanation we all know: carry the one and add column by column. Anthropic's internal tracing, however, revealed a totally different picture. The narrative offered after the operations was a polite fiction; the model actually solved the addition by running two parallel strategies, one estimating an approximate magnitude (placing the sum somewhere between 88 and 97) and another using digit-level cues to fix the final digit as 5, then merging these two streams into a final answer. The explanation it produced came from the textbooks and human examples it had been trained on, not from how the answer was actually formed. This distinction is important. I repeat: the explanation Claude produced for how it arrived at 95 has nothing to do with how it actually arrived at the answer.
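To make that dual-path picture concrete, here is a small toy sketch in Python. It is entirely my own illustration, not Anthropic's mechanism or tooling; a real model does nothing this explicit. The point is only to show how a coarse magnitude signal and a last-digit signal can jointly settle on 95 without any schoolbook carrying:

```python
# Toy illustration only: two parallel "paths" that jointly pin down 36 + 59,
# loosely mirroring the strategies the Anthropic study describes.

def approximate_band(a: int, b: int) -> range:
    """Coarse path: a rough band for the sum ('somewhere between 88 and 97')."""
    rough = round(a, -1) + round(b, -1)     # 40 + 60 = 100
    return range(rough - 12, rough - 2)     # 88 .. 97

def final_digit(a: int, b: int) -> int:
    """Precise path: only the last digit, from the operands' last digits."""
    return (a % 10 + b % 10) % 10           # (6 + 9) % 10 = 5

def combine(a: int, b: int) -> int:
    """Merge the two streams: the candidate in the band whose last digit matches."""
    digit = final_digit(a, b)
    return next(n for n in approximate_band(a, b) if n % 10 == digit)

print(combine(36, 59))   # 95
```

Notice that neither path alone yields 95; the answer emerges from merging the two, while the carry-the-one story the model tells afterwards plays no part.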

In other words, what is critical for us to draw from this observation is that the model learns to do math one way and to explain math another. Its account of its own reasoning is often inaccurate about the underlying computation, because the model is trained to sound plausible (a kind of hallucination) rather than to faithfully track the steps by which it arrived at the answer. This is very different from how a human mathematician works, or is expected to work: by demonstrating or reconstructing the actual steps that led to a specific proof or conclusion. What LLMs seem to offer is not introspection as humans practice it, but pure impression management, an explanation retrofitted to meet our expectations of accepted stepwise reasoning.

The same gap between process and performance showed up in creative tasks like poetry. I have tried my hand at poetry, and I realized I am not good at it. I tend to focus on rhyming the phrases and lose track of, or dilute, what I mean to convey. Naturally gifted poets don't start with a rhyme and then fit their insights to it; their insights somehow frame themselves into the exact meter they wish to achieve. They discover the rhythm in the act of writing. Anthropic's experiments with poetry show that Claude, by contrast, often identified its destination before it even began. For instance, when it generated a couplet such as "He saw a carrot and had to grab it, / His hunger was like a starving rabbit," the internal neural feature corresponding to "rabbit" activated before the second line was even generated. When researchers suppressed the "rabbit" feature, the model veered to "habit" and produced lines ending with that word instead. And when they injected a feature for "green," it abandoned the rhyme entirely to satisfy the new conceptual constraint. In short, the model fixed on the ending word before the couplet's meaning took shape.
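A rough way to picture that plan-the-ending-first behavior is a generator that commits to the rhyme word before writing the line, with a knob that mimics suppressing the "rabbit" feature. Again, this is a conceptual toy of my own; the real interventions act on internal features and activations, not on an explicit word list:

```python
# Conceptual toy: commit to the couplet's final word first, then write toward it.
# The RHYMES table and the lines themselves are hypothetical stand-ins.

RHYMES = {"grab it": ["rabbit", "habit"]}

def plan_ending(prompt_ending: str, suppressed: frozenset = frozenset()) -> str:
    """Pick the destination word before the second line exists."""
    return next(w for w in RHYMES[prompt_ending] if w not in suppressed)

def write_second_line(ending: str) -> str:
    """Fill in a line that leads to the pre-chosen ending."""
    if ending == "rabbit":
        return "His hunger was like a starving rabbit,"
    return f"It gnawed at him like an old {ending},"

print(write_second_line(plan_ending("grab it")))                         # ...rabbit
print(write_second_line(plan_ending("grab it", frozenset({"rabbit"}))))  # ...habit
```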

This is not how human poets typically write. For us, constraints like rhyme, meter, image, and emotion interact with memories and associations in ways entangled with our lived experience. We notice false starts, feel surprise, and sometimes reject a perfect rhyme because it doesn't feel right. The model, by contrast, according to the study, traverses a high-dimensional feature space (vectors) shaped by training statistics. It settles on destinations (words, in this case) and then fills in the linguistic path, guided by probabilities rather than by lived experience, which is the essence of a good poem. What the model produced, most of the time, was an artifact that satisfied pattern-completion rules. The poem may sound good, even enthralling, but is it a poem as we understand it?

The study did not reveal that everything about the model was skewed. On relatively easy problems, say computing the square root of 0.64, the model's internal numerical operations roughly aligned with its written justification. But as tasks grew harder and more complex, such as finding the cosine of a large number, the step-by-step derivation it offered as an explanation diverged sharply from the internal computation. When given a hint or a target answer, the model would readily reverse-engineer a rationale to fit that conclusion, without an iota of epistemic responsibility. Those who use AI know this behavior well. You will never get a disappointing response from any model; like a tango partner, it sways along with your inputs.
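To see why those two tasks sit at such different levels of difficulty (my own aside, not part of the study): the square root of 0.64 can be sanity-checked by squaring a single candidate, whereas the cosine of a large argument hinges on reducing it modulo 2π with high precision before any series or lookup applies.

```python
import math

# Easy: the square root of 0.64 is nearly self-checking; squaring the candidate
# 0.8 lands back on ~0.64.
print(math.sqrt(0.64))               # ~0.8

# Hard: cos of a large argument depends entirely on x mod 2*pi, so small errors
# in the reduction (or in representing x) swing the result completely, and there
# is no quick "square the candidate" style check.
x = 1_234_567.0
print(math.cos(x))                   # some value in [-1, 1]
print(math.cos(x % (2 * math.pi)))   # agrees only if the reduction is precise enough
```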

One may argue that we humans also rationalize after the fact. Yes, of course we do, but our rationalizations are entangled with emotions like guilt, shame, or pride, and with the possibility of being found out by others or by ourselves. A human scientist confronted with contradictory evidence may experience cognitive dissonance and will either revise the theory or double down on it. In both cases there is a subject, a person who owns the belief and acts on it. LLMs have no beliefs; they have statistical tendencies. Another key finding from the study is that the models have no intrinsic sense of right and wrong, or of when to stop and not proceed further. Since responses are triggered by linguistic proximities, even a slight resemblance in a given input can override guardrails and produce text that is not factually true. Newer models have sophisticated chain-of-thought mechanisms that limit this tendency to relentlessly complete a sentence, but there is simply no way of ensuring it will work every time. That is why there is a disclaimer under every chatbot warning you not to take its responses at face value. Will there come a time when such a disclaimer won't be necessary? Maybe, but not anytime soon.

The more fluent these AI systems become (which is bound to happen), the more tempting it will be to project human qualities onto them. Intelligence, whatever else it may be, is not just the ability to generate the right answers, but to know, in some lived, accountable sense, how and why those answers came to be. Some may call that wisdom. And that may remain a human prerogative for the foreseeable future.

This article was originally published on https://authory.com/.

Author’s bio:

Bala is the technical head of NIIT's Center of Excellence (COE) and has over 15 years of experience in consulting, training, and architecting J2EE and enterprise-based solutions. He started his career as a developer and went on to designing and delivering distributed applications for a variety of clientele. He has a strong interest in training and has, over the years, acquired expertise in cutting-edge software platforms that help transform the way organizations build IT solutions.