A new study uncovers how AI systems shift from relying on word positions to understanding word meanings as their training data grows, offering significant insights into the technology behind tools like ChatGPT and Gemini.
The language capabilities of modern artificial intelligence systems are nothing short of remarkable, allowing tools like ChatGPT and Gemini to hold conversations that feel almost as natural as human interaction. However, the internal workings that drive these sophisticated interactions remain largely enigmatic.
A new study published in the Journal of Statistical Mechanics: Theory and Experiment (JSTAT) provides vital insights into this enigma.
The research reveals that AI systems initially rely on the position of words in a sentence when trained with small data sets. But as these systems are fed more data, they transition abruptly to interpreting words based on their meanings, a shift analogous to a phase transition in physical systems.
“To assess relationships between words, the network can use two strategies, one of which is to exploit the positions of words,” lead author Hugo Cui, a postdoctoral researcher at Harvard University, said in a news release. “This is the first strategy that spontaneously emerges when the network is trained. However, in our study, we observed that if training continues and the network receives enough data, at a certain point — once a threshold is crossed — the strategy abruptly shifts: the network starts relying on meaning instead.”
The study focuses on the self-attention mechanism, a fundamental component of the transformer language models that underlie tools such as ChatGPT and Gemini. These models are designed to process sequences of data and excel at capturing the relationships between words within a sequence.
Initially, these AI systems infer relationships based on word positions — identifying subjects, verbs and objects. But as training progresses, meaning takes precedence.
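To make the two strategies concrete, here is a minimal toy sketch in Python (NumPy) that contrasts attention weights computed from position embeddings with weights computed from word-meaning embeddings. The blending parameter `alpha`, the function names, and all shapes are illustrative assumptions, not the authors' actual model.

```python
# Illustrative sketch only: a toy single-head self-attention layer that can mix
# "positional" and "semantic" cues. The mixing weight `alpha` and all shapes are
# hypothetical choices for illustration, not the setup used in the study.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_attention(token_emb, pos_emb, alpha):
    """Attend over a sequence using a blend of semantic and positional scores.

    token_emb : (T, d) word-meaning embeddings
    pos_emb   : (T, d) position embeddings
    alpha     : 0.0 -> purely positional attention, 1.0 -> purely semantic
    """
    sem_scores = token_emb @ token_emb.T           # similarity of meanings
    pos_scores = pos_emb @ pos_emb.T               # similarity of positions
    scores = alpha * sem_scores + (1.0 - alpha) * pos_scores
    weights = softmax(scores / np.sqrt(token_emb.shape[1]), axis=-1)
    return weights @ token_emb                     # attention-weighted mixture

rng = np.random.default_rng(0)
T, d = 5, 8
tokens = rng.normal(size=(T, d))
positions = rng.normal(size=(T, d))

# Compare the two regimes directly; in practice a training procedure, not a
# hand-set knob, determines which strategy the network ends up using.
out_positional = toy_attention(tokens, positions, alpha=0.0)
out_semantic = toy_attention(tokens, positions, alpha=1.0)
print(out_positional.shape, out_semantic.shape)    # (5, 8) (5, 8)
```

Loosely speaking, the small-data regime the article describes resembles the purely positional case (alpha = 0 in this toy), while the large-data regime resembles the purely semantic one (alpha = 1), with the study's finding being that the switch between them happens abruptly rather than gradually.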
“When we designed this work, we simply wanted to study which strategies, or mix of strategies, the networks would adopt,” Cui added. “But what we found was somewhat surprising: below a certain threshold, the network relied exclusively on position, while above it, only on meaning.”
This shift, described by Cui as a phase transition, mirrors concepts from statistical physics. In physics, phase transitions describe changes in states of matter — like water turning into vapor. Similarly, AI neural networks, composed of numerous interconnected nodes, exhibit a collective behavior that can be understood through statistical methods.
“Understanding from a theoretical viewpoint that the strategy shift happens in this manner is important,” Cui explained. “Our networks are simplified compared to the complex models people interact with daily, but they can give us hints to begin to understand the conditions that cause a model to stabilize on one strategy or another. This theoretical knowledge could hopefully be used in the future to make the use of neural networks more efficient, and safer.”
Source: SISSA Medialab

