Discussion about this post

Neha Balamurugan

I really enjoyed this piece! It made me think about an extension of the thought experiment:

What if the man in the room isn’t fed the original Chinese, but a translated version of it -- translated by another man in another room? That second man could represent a vision encoder, converting the world into embeddings before the language model ever sees it. The first man (the LLM) then reasons about those embeddings without ever accessing the original world.

To me, this feels close to how visual language models operate -- layers of translation mediating between perception and reasoning. It raises a question about “understanding in degrees”: if each layer compresses and abstracts meaning, how much of what we call social inference can survive that mediation?

Do you think these layered translations limit the possibility of genuine understanding in multimodal systems, or could emergent reasoning eventually bridge that gap?
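To make the "two rooms" picture concrete, here is a minimal LLaVA-style sketch of the pipeline the comment describes: a vision encoder translates pixels into embeddings, a projection bridges them into the language model's space, and the LLM only ever reasons over those embeddings. The model names and the untrained projection layer are illustrative assumptions, not a system discussed in the post; real multimodal models learn that bridge.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          CLIPImageProcessor, CLIPVisionModel)

# Illustrative model choices (assumptions, not from the post).
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
llm = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# "Second man in the second room": compress the image into patch embeddings.
def encode_image(image):
    pixels = processor(images=image, return_tensors="pt").pixel_values
    return vision(pixel_values=pixels).last_hidden_state  # (1, num_patches, 768)

# Bridge between rooms: project visual features into the LLM's embedding space.
# In a trained VLM this projection is learned; here it is random, for shape only.
project = torch.nn.Linear(vision.config.hidden_size, llm.config.hidden_size)

def answer(image, question):
    visual_tokens = project(encode_image(image))        # the "translated Chinese"
    text_ids = tokenizer(question, return_tensors="pt").input_ids
    text_tokens = llm.get_input_embeddings()(text_ids)
    # "First man in the first room": the LLM sees only embeddings,
    # never the original scene.
    inputs = torch.cat([visual_tokens, text_tokens], dim=1)
    return llm(inputs_embeds=inputs).logits
```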

Shridhar Jayanthi

I'm a little stuck on the challenge to 'understanding' being "he doesn't know what a hamburger tastes like." I've never eaten headcheese, but I understand the concept of "headcheese," so much so that I'm disgusted by the idea of it and do not plan to ever eat it. The poor man cannot know what a hamburger tastes like because he never had access to any stimulus that would have allowed him to know that (i.e., he never saw, smelled, or ate a hamburger). But I agree with you that if he were operating from a compressed translation manual with associations, say "hamburger" is "food," he would eventually come to understand that a hamburger is food in the same way a "hot dog" is food. Is the proposition just that a machine that leans purely on text cannot know the "non-textual" qualities of an entity it knows only through text?

