Recent research shows that large language models (LLMs) can perform exceptionally well on a wide range of tasks without having formed a coherent understanding of the world or the rules that govern it. As a result, these models may fail unexpectedly when the environment changes or when they are asked to handle a slightly different task.
LLMs can accomplish remarkable feats, such as composing poetry or creating functional software, despite being designed primarily to predict the next word in a sentence.
Their ability to excel at certain tasks can give the impression that they are grasping fundamental concepts about reality.
However, a new study challenges this assumption. Researchers demonstrated that a widely used generative AI model can provide nearly flawless driving instructions in New York City, yet doesn’t actually have an accurate mental representation of the city’s layout.
Even though the model appeared to navigate quite effectively, its performance significantly dropped when the researchers modified street layouts and imposed detours.
Upon further analysis, the researchers discovered that the maps generated internally by the model included numerous fictitious streets twisting between the actual grid and linking distant intersections.
This raises critical concerns for the use of generative AI in real-world applications, as a model that seems to function well in a particular scenario may fail if conditions change even slightly.
“There’s hope that because LLMs can achieve such fantastic results in language, we can apply these tools across different scientific fields. However, determining if LLMs are truly capturing coherent models of the world is crucial if we aim to leverage these technologies for new discoveries,” remarks senior author Ashesh Rambachan, an assistant professor of economics and principal investigator at the MIT Laboratory for Information and Decision Systems (LIDS).
Rambachan’s research team includes lead author Keyon Vafa, a postdoctoral researcher at Harvard; Justin Y. Chen, a graduate student in electrical engineering and computer science at MIT; Jon Kleinberg, a university professor at Cornell; and Sendhil Mullainathan, an MIT professor in EECS and economics, also associated with LIDS. The findings will be presented at the upcoming Conference on Neural Information Processing Systems.
New metrics
The researchers concentrated on a specific type of generative AI model called a transformer, which underpins LLMs like GPT-4. These transformers are trained using vast amounts of linguistic data to predict the next token in a sequence.
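To make that training objective concrete, here is a minimal sketch of next-token prediction. It assumes the open-source Hugging Face transformers library and the public gpt2 checkpoint purely for illustration; the study does not specify any particular implementation.

```python
# Minimal sketch of next-token prediction: score every vocabulary entry as a
# candidate continuation of a prompt and pick the highest-scoring one.
# Assumes: pip install torch transformers; the "gpt2" checkpoint is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Turn left onto Broadway, then"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The model's guess for the next token is the highest-scoring vocabulary entry
# at the final position of the prompt.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))
```

During training, the model is nudged so that the true next token in its training data gets the highest score; everything else the model appears to "know" emerges from that single objective.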
However, merely assessing the correctness of their predictions does not suffice for scientists aiming to understand whether an LLM has developed a true world model, the researchers assert.
For instance, they found that a transformer could frequently predict valid moves in Connect 4 without having any grasp of the game's rules.
To address this, the team devised two novel metrics to evaluate a transformer’s understanding of the world. Their assessments were centered on a category of problems referred to as deterministic finite automata (DFAs).
A DFA is defined by a set of states, like the intersections one must pass through to reach a destination, together with concrete rules that dictate how each action moves you from one state to the next.
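As a rough illustration (not code from the study), the toy DFA below models a handful of intersections and the turns allowed at each one; the states and transitions are invented for the example.

```python
# A toy deterministic finite automaton (DFA): a set of states (here, intersections)
# plus a transition rule mapping (current state, action) to the next state.
# States "A"-"D" and the allowed turns are hypothetical, chosen only to illustrate.
TRANSITIONS = {
    ("A", "go_straight"): "B",
    ("A", "turn_right"): "C",
    ("B", "turn_left"): "D",
    ("C", "go_straight"): "D",
}

def run_dfa(start, actions):
    """Follow a sequence of actions from a start state; return None if a move is illegal."""
    state = start
    for action in actions:
        if (state, action) not in TRANSITIONS:
            return None  # the DFA has no rule for this move, so the sequence is rejected
        state = TRANSITIONS[(state, action)]
    return state

# Two different routes can end at the same intersection ("D"), which is exactly
# the kind of hidden structure the researchers' metrics probe.
print(run_dfa("A", ["go_straight", "turn_left"]))   # D
print(run_dfa("A", ["turn_right", "go_straight"]))  # D
print(run_dfa("A", ["turn_left"]))                  # None (no such turn at A)
```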
They selected two scenarios to frame as DFAs: navigating city streets in New York and playing the board game Othello.
“We needed test environments with clear world models. This allows us to rigorously analyze what it means to reconstruct that model,” clarifies Vafa.
Transformers generate their outputs from sequences, that is, ordered lists of data points. The first metric, called sequence distinction, says a model has formed a coherent world model if it recognizes that two different states, such as two different Othello boards, are in fact different.
The second metric, called sequence compression, says that a transformer with a coherent world model should recognize that two identical states, such as two identical Othello boards, share the same sequence of possible next moves.
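As a rough sketch of the idea behind both checks, and not the authors' exact metrics, the code below compares a model's predicted next moves across pairs of prefixes: prefixes that reach the same ground-truth DFA state should get matching next-move sets (compression), while prefixes that reach different states should get differing ones (distinction). The model_next_moves input is a hypothetical stand-in for querying the trained transformer, and the real metrics are more involved than this single-step comparison.

```python
from itertools import combinations

def world_model_checks(prefixes, true_state, model_next_moves):
    """Score compression and distinction over all pairs of prefixes.

    prefixes:         action sequences (as tuples) that the model has been shown
    true_state:       dict mapping each prefix to the DFA state it actually reaches
    model_next_moves: dict mapping each prefix to the set of next moves the model deems valid
    Returns (compression score, distinction score) as fractions of passing pairs.
    """
    comp_pass = comp_total = dist_pass = dist_total = 0
    for p, q in combinations(prefixes, 2):
        same_state = true_state[p] == true_state[q]
        same_moves = model_next_moves[p] == model_next_moves[q]
        if same_state:
            comp_total += 1
            comp_pass += same_moves       # compression: same state should mean same continuations
        else:
            dist_total += 1
            dist_pass += not same_moves   # distinction: different states should mean different continuations
    compression = comp_pass / comp_total if comp_total else None
    distinction = dist_pass / dist_total if dist_total else None
    return compression, distinction
```

In this simplified form, a model that predicts the same next moves for every prefix would score perfectly on compression but fail distinction, which is why the two checks are meant to be read together.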
The team employed these metrics to evaluate two types of transformers: one trained on data derived from random sequences and another on data produced by following specific strategies.
Incoherent world models
To their surprise, the researchers found that the transformers that made random choices demonstrated more accurate world models, potentially due to exposure to a broader range of possible next steps during training.
“In Othello, if you observe two random players as opposed to top-tier competitors, you might witness the complete range of potential moves—even the poor choices that skilled players would avoid,” explains Vafa.
Even though the transformers consistently generated accurate navigation directions and valid Othello moves, the two metrics revealed that only one of them formed a coherent world model for Othello, and none performed well at forming a coherent model in the navigation example.
The researchers illustrated the consequences of this by introducing detours in their New York City map, which resulted in a complete failure of all navigation models.
“I was astonished by how rapidly performance declined when we implemented a detour. Just closing off 1 percent of the streets led to accuracy dropping from nearly 100 percent to 67 percent,” Vafa notes.
The maps the models generated looked like an imagined version of New York City, with countless nonexistent streets crisscrossing on top of the real grid. They often contained implausible overpasses above other streets, or streets with impossible orientations.
These findings indicate that transformers can perform remarkably well on certain tasks without comprehending the underlying rules. The researchers believe that for scientists to develop LLMs capable of accurately representing world models, a different methodology is required.
“Often, we observe these models displaying impressive capabilities and assume they must understand something about reality. We hope to encourage a more careful consideration of this question, rather than relying solely on our intuitions to answer it,” Rambachan concludes.
Looking ahead, the research team is interested in exploring a wider range of problems, especially those with partially known rules. They also intend to apply their evaluation metrics to practical, scientific challenges.
This research is supported, in part, by organizations such as the Harvard Data Science Initiative, the National Science Foundation Graduate Research Fellowship, the Vannevar Bush Faculty Fellowship, the Simons Collaboration grant, and a grant from the MacArthur Foundation.