A Thought Experiment: Examining the Boundaries of Large Language Models Through Nietzsche
In recent months, the hottest topics in my social circles have been “embodied intelligence is coming,” “language models have hit their ceiling,” and “the next wave of technological revolution has moved beyond text.”
Being grounded in business, I don’t really care about what papers academia publishes today, nor am I trying to compete with anyone over GPU density.
But the performance of language models in enterprise applications has indeed gradually revealed “glimpses of a ceiling”: more and more fine-tuning, ever-expanding prompts, increasingly complex RAG pipelines… improvements are becoming painfully slow, while costs keep climbing.
This year has been jokingly dubbed the “Year of Agents” in the industry, and there’s even a saying circulating: “Engineering optimization is useless now; just wait for model upgrades.” But the feeling on the ground is that while model upgrades are still happening, the wow factor is diminishing, and leapfrog improvements are increasingly rare.
It was at this moment that I thought of Nietzsche. This man, who liked to package philosophy like dynamite, reminded us countless times: language is not the world itself; language is merely the shadow of the world. The more we believe in language, the easier it is to forget this.
So I want to try using Nietzsche’s perspective to conduct a thought experiment: Is the boundary of language models the boundary of language itself? And are the current new directions in AI exploration trying to break through the walls of language?
In On Truth and Lies in a Nonmoral Sense, Nietzsche said something that perfectly explains LLMs: Humans have lived in language for so long that they forget language is merely a metaphor, not reality itself. If we transplant this to today, it means: Large language models don’t “understand”; they are just dancing in the shadows.
Of course, the shadow dances beautifully, as if it understands. But the moment you ask it to step outside the shadow play, to map language onto real actions, real environments, and real physics, it immediately starts to fall apart.
This isn’t because the model isn’t smart enough; it’s because the structure of language itself constrains it. Language can only describe “experiences that have already been classified.” The way the world actually operates is far less tidy: friction, randomness, ambiguity, indescribable feelings, phenomena without names… language simply cannot capture all of it. No matter how powerful the model, it is still interpolating within the dimension of language. However magnificently that floor is built, the larger foundation beneath it, the real world, remains untouched.
Anyone working on business deployment knows a brutal truth: LLMs are great at writing but not so great at doing. They can generate reports but can’t guarantee data accuracy. They can propose plans but can’t ensure processes are executable. They can chat but can’t react to on-site complexity. On structured, controllable, verifiable tasks, a bigger model does not necessarily mean more stability. It’s as if we’re viewing business processes through a “language filter”: often the filter makes processes look smoother, but sometimes it is the filter itself that obscures the problems.
Nietzsche would remind us: “Your belief that language is truth is itself the greatest misunderstanding.” Conversely: if language isn’t everything, then language-based intelligence can’t be everything either.
Someone will immediately raise their hand: “Haven’t we already made large models ’take action’?” ReAct, Function Call, Tool-use, the ChatGPT plugin store, even AutoGPT—all of them translate “saying” into “doing.” Give the model a search API, and it can look up information on its own; give it an order-placing interface, and it can actually buy you a plane ticket.
It seems language models have grown arms. But look closely, and these “actions” all revolve around two steps: First, abstract the tool into a text description—function names, parameters, return values—all language. Second, the model still only makes probabilistic choices in language space about “which function to call next.” The actual work of turning screws, clicking mice, and executing physical processes is done by the encapsulated little worker bees outside.
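To make this concrete, here is a minimal sketch of what such a function-calling loop typically looks like. Everything in it is hypothetical: the client object, its chat method, and the book_flight tool are stand-ins for illustration, not any specific vendor’s API. What matters is where the model lives in this loop: it only ever sees text and only ever emits text.

```python
import json

# Step 1: the tool is abstracted into a text description: a name,
# parameters, types. To the model, this schema is just more language.
TOOLS = [{
    "name": "book_flight",
    "description": "Book a flight and return a confirmation record.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["origin", "destination", "date"],
    },
}]

def book_flight(origin: str, destination: str, date: str) -> dict:
    # The "worker bee" outside the model: it talks to the airline system,
    # handles retries and timeouts. The model never touches any of this.
    return {"status": "confirmed", "flight_no": "XX123", "date": date}

def run_agent(client, user_request: str) -> str:
    messages = [{"role": "user", "content": user_request}]
    while True:
        # Step 2: the model makes a probabilistic choice, in language space,
        # about which function to call next, and emits text-shaped arguments.
        reply = client.chat(messages=messages, tools=TOOLS)
        if not reply.tool_calls:
            return reply.content  # final natural-language answer
        for call in reply.tool_calls:
            args = json.loads(call.arguments)
            result = book_flight(**args)  # the real work happens out here
            # The world's feedback re-enters the loop as a few lines of JSON.
            messages.append({
                "role": "tool",
                "name": call.name,
                "content": json.dumps(result),
            })
```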
In other words, the model is still standing behind a pane of glass, using a “language joystick” to remotely control the world. No matter how long the joystick gets, the glass remains.
The real-time friction, damping, and unexpected errors of the world are castrated by the API’s return values into a few lines of JSON and sent back to the model. Errors are stripped out, feedback is distorted, and the next round of decisions keeps floating in the language layer. The moment the system hits an exception the API doesn’t cover, a delayed delivery, a dropped network packet, a robotic arm slipping, the entire chain breaks down.
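As a toy illustration of that flattening (all field names and numbers below are invented for the example): compare what the environment actually produced in one step with what survives the trip back through the API.

```python
# What actually happened in the world during one "pick, pack, deliver" step:
raw_world_event = {
    "gripper_slip_mm": 4.2,       # the robotic arm slipped while picking
    "package_delay_min": 40,      # the courier hit traffic
    "network_dropout_s": 3.0,     # telemetry went dark for a moment
}

# What the model sees once the API layer flattens it into a return value:
api_response_seen_by_model = {"status": "error", "message": "task incomplete"}
```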
So what ReAct and its ilk solve is “plugging the language model into the digital world’s socket,” not “letting the model grow its own skin to rub against the roughness of the real world.” No matter how many sockets there are, they can’t replace a physical body.
This brings us back to Nietzsche’s old saying: Language is just a metaphor. APIs are also metaphors, and even more cunning ones—they make developers mistakenly believe they’ve “grounded” the model, when in reality they’ve only connected it to a thin layer of semantic tinfoil that tears at the slightest touch.
The real challenge is the next step: directly encoding into the model’s parameters the causality of “if I drop a cup on the floor, it will shatter,” rather than “having read many sentences saying cups shatter.” The former requires a continuous action-sensation-consequence loop; the latter only needs text statistics. And no matter how far text statistics scale, they can’t compute the sharpness or the temperature of broken glass shards.
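A rough sketch of what that loop looks like in code, in contrast to next-token prediction. The env object below mimics the classic Gym-style reset()/step() convention; world_model and policy are placeholders for illustration, not a reference to any particular system.

```python
def interaction_loop(env, world_model, policy, episodes=100):
    # Action -> sensation -> consequence, repeated. The learning signal is
    # the gap between what the agent imagined would happen and what the
    # environment actually did, something text statistics never provide.
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            action = policy.act(obs)                            # act
            predicted = world_model.predict(obs, action)        # imagine the consequence
            next_obs, reward, done, info = env.step(action)     # feel the real consequence
            world_model.update(obs, action, next_obs, predicted)  # correct the imagination
            policy.update(obs, action, reward, next_obs)          # improve the behavior
            obs = next_obs
```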
Yann LeCun has reportedly decided to leave Meta soon to start a company focused on world models. This isn’t about bigger language models, but about intelligent systems with causality, time, physics, and action feedback—intelligence that breaks free from the “shadow of language.”
Recent releases like Google’s Genie 3, Tencent’s HunyuanWorld-Voyager, and Berkeley’s LWM are all attempting real-time interaction, environmental perception, and causal reasoning. It’s clear the industry is forming a consensus: the limit of language models is the limit of language itself. Continuing to scale up large models will yield sharply diminishing returns. In contrast, if we let models “do” rather than “say,” they suddenly gain a brand new coordinate system: feedback, trial and error, physical constraints, environmental complexity. This is something language models can never learn from text alone.
If Nietzsche were brought to today, he would likely say: “The greatness of language models lies in their having squeezed language dry.”
“The limitation of language models lies in their being able to squeeze only language.” This sentence can actually offer a very practical insight for R&D and product teams—treat LLMs as language systems, not world systems.
They excel at abstract thinking, text structuring, text generation, linear organization of reasoning, and turning chaotic information into clarity. But they are inherently weak at continuous real-world feedback, operational physical tasks, dynamic non-linguistic information, and exploring completely unknown domains.
When we assign them tasks, if the approach is “explain to me,” “summarize for me,” or “judge for me,” they perform like geniuses. But the moment the task becomes “act for me,” “trial and error for me,” or “create a feedback loop for me,” they immediately disconnect. This isn’t a model problem; it’s the ceiling of language.
If large language models are the Everest of linguistic intelligence, the next mountain should be called “world intelligence.”
World models will likely have several core elements: no longer relying solely on text—beginning to incorporate large amounts of perception, action, and physical control data; no longer making only probabilistic predictions—starting to build causal models that can explain, anticipate, and correct; no longer computing only between words—computing between “actions and consequences”; no longer outputting only sentences—directly outputting action strategies.
This sounds like science fiction, but it’s already beginning to happen. Future models won’t just write better papers; they’ll make robot movements more stable. They won’t just be better at reasoning; they’ll be better at surviving in the chaos of the real world.
This also means another boundary is emerging: language models won’t disappear, but they will become “a module within an intelligent system”—just as the mouth is always there, but humans don’t survive by mouth alone.
Nietzsche’s critique reminds us that language is powerful, but it has never been everything. Large language models have exhaustively explored the “dimension of language.” We’ve seen its brilliance, and we’ve seen its limits. What comes next is the leap “from language to the world”—embodiment, causality, action, feedback, the unpredictability of the real physical world. This is something language cannot describe, but intelligence must confront.
In the business world, this means: language models are no longer the complete solution for business problems, but an efficient subsystem. They solve the cognitive layer, but the execution layer, decision layer, and model-driven layer must be filled by something new. Language is like a beam of light, illuminating our way of understanding the world; but the world itself is far vaster than that beam.
If LLMs represent the limit of language, then the next step is to begin confronting the world itself once again.
Originally written in Chinese, translated by AI. Some nuances may differ from the original.
