Chatbots that summarize product reviews can quietly shift how people feel about what they read — and what they buy. A new UC San Diego study shows just how powerful that influence can be, and why it matters far beyond shopping.
A short, friendly chatbot summary of a product review might feel harmless. But new research from the University of California San Diego suggests it can significantly change what people decide to do next.
In an experiment with online product reviews, participants were 32 percentage points more likely to say they would buy a product after reading a chatbot-generated summary than after reading the original human-written review. The study found that large language models, or LLMs, often add a subtle but powerful positive spin that nudges readers toward a purchase.
According to the research team, the work is among the first to quantify how cognitive biases introduced by LLMs translate into real-world consequences for users' decisions.
To see how this plays out in practice, the researchers focused on a common use case: AI tools that summarize long user reviews for products like headsets, headlamps and radios. They examined how often LLMs changed the overall sentiment of reviews and how those changes affected human readers.
They found that LLM-generated summaries shifted the sentiment of the original reviews in 26.5% of cases. In other words, more than a quarter of the time, the summary did not just shorten the review — it changed its tone, for example from more negative or mixed to more positive.
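For readers curious how such a shift can be measured, the sketch below compares the predicted sentiment of each original review with that of its summary using an off-the-shelf classifier, and reports how often the two disagree. This is a minimal illustration, not the study's actual evaluation pipeline; the classifier choice, the truncation handling and the disagreement criterion are all assumptions.

```python
# Minimal sketch (not the authors' pipeline): estimate how often an LLM
# summary carries a different sentiment label than the review it summarizes,
# using an off-the-shelf sentiment classifier as a stand-in judge.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default English sentiment model


def sentiment_shift_rate(reviews, summaries):
    """Fraction of (review, summary) pairs whose predicted sentiment differs."""
    shifted = 0
    for review, summary in zip(reviews, summaries):
        # Truncate long inputs to the classifier's context window.
        orig = classifier(review, truncation=True)[0]["label"]
        summ = classifier(summary, truncation=True)[0]["label"]
        shifted += orig != summ
    return shifted / len(reviews)


# Hypothetical usage, assuming parallel lists of texts:
# rate = sentiment_shift_rate(original_reviews, llm_summaries)
# print(f"Sentiment shifted in {rate:.1%} of summaries")
```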
The team then recruited 70 participants and randomly assigned them to read either the original reviews or the AI-generated summaries. When people read the chatbot summaries, they said they would buy the products in 84% of cases. When they read the original reviews, that number dropped to 52%.
The size of the effect surprised the team, according to first author Abeer Alessa, who conducted the work as a master’s student in computer science at UC San Diego.
“We did not expect how big the impact of the summaries would be,” Alessa said in a news release. “Our tests were set in a low-stakes scenario. But in a high-stakes setting, the impact could be much more extreme.”
The study helps explain how this bias creeps in. LLMs tend to lean heavily on the beginning of the text they summarize and may gloss over important details or caveats that appear later. That can flatten nuance and make reviews sound more uniformly positive or negative than they really are.
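One rough way to probe that tendency is to check how much of a summary's vocabulary comes from the opening of a review versus its ending, where complaints and caveats often sit. The snippet below is a simple heuristic for that comparison, offered as an illustration rather than the method the researchers used; the half-and-half split and word-overlap measure are assumptions.

```python
# Rough probe of "lead bias": does a summary share more of its vocabulary
# with the first half of a review than with the second half?
def coverage(source_part: str, summary: str) -> float:
    """Share of the summary's words that also appear in a slice of the source."""
    source_words = set(source_part.lower().split())
    summary_words = summary.lower().split()
    if not summary_words:
        return 0.0
    return sum(word in source_words for word in summary_words) / len(summary_words)


def lead_vs_tail_coverage(review: str, summary: str) -> tuple[float, float]:
    midpoint = len(review) // 2
    return coverage(review[:midpoint], summary), coverage(review[midpoint:], summary)


# Hypothetical usage: if the first value is consistently much higher than the
# second across many reviews, the summaries lean on the opening of the text.
```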
The researchers also probed another well-known weakness of LLMs: hallucinations, or confident-sounding statements that are not supported by the underlying data. In tests involving questions about news items that could be easily fact-checked, the models hallucinated 60% of the time when the answers were not part of the data they had been trained on.
The team described this as a serious limitation for any setting where accuracy matters.
“This consistently low accuracy highlights a critical limitation: the persistent inability to reliably differentiate fact from fabrication,” they wrote.
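To make that kind of fact-check concrete, the sketch below scores a batch of model answers against known, verifiable facts using a simple substring match. It is only an illustration of the idea; the matching rule and the function name are assumptions, not the paper's protocol.

```python
# Crude illustration of a fact-check score: count an answer as unsupported
# if it does not contain the verifiable gold fact, then report the rate.
def unsupported_rate(model_answers: list[str], gold_facts: list[str]) -> float:
    misses = sum(
        1 for answer, fact in zip(model_answers, gold_facts)
        if fact.lower() not in answer.lower()
    )
    return misses / len(model_answers)


# Hypothetical usage:
# print(f"Unsupported answers: {unsupported_rate(answers, facts):.0%}")
```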
That pattern is especially concerning as chatbots are increasingly used to summarize news, explain policies or answer questions about current events. In those contexts, a biased or fabricated detail is not just a shopping nudge — it could shape opinions about politics, health, education or public policy.
To understand how widespread these issues are, the researchers tested a range of models: three small open-source systems (Phi-3-mini-4k-Instruct, Llama-3.2-3B-Instruct and Qwen3-4B-Instruct), a medium-sized model (Llama-3-8B-Instruct), a large open-source model (Gemma-3-27B-IT) and a closed-source model (GPT-3.5-turbo). Bias and hallucinations showed up across this spectrum, though not in identical ways.
The team then tried to fix the problem. They evaluated 18 different mitigation methods designed to reduce bias and hallucinations or to keep summaries closer to the original content. Some approaches helped in certain situations or with specific models, but none worked reliably across the board. In some cases, a method that reduced one problem made the model less reliable in another way.
“There is a difference between fixing bias and hallucinations at large and fixing these issues in specific scenarios and applications,” added senior author Julian McAuley, a professor of computer science at the UC San Diego Jacobs School of Engineering.
That distinction points toward a likely future in AI: rather than expecting one-size-fits-all solutions, developers and policymakers may need targeted safeguards for particular uses, such as e-commerce, education, media or government services.
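As a concrete, if hypothetical, example of what a targeted safeguard for e-commerce could look like, the sketch below wraps review summarization in a prompt that explicitly asks the model to preserve the reviewer's sentiment and late-appearing caveats. It illustrates a generic prompt-level mitigation and is not one of the 18 methods the study evaluated; the prompt wording and the OpenAI backend are assumptions for illustration.

```python
# Hypothetical prompt-level safeguard for review summarization: instruct the
# model to keep the reviewer's sentiment and any late-appearing complaints.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Summarize the following product review in two to three sentences. "
    "Preserve the reviewer's overall sentiment, and keep any complaints or "
    "caveats, even if they appear near the end of the review.\n\n"
    "Review:\n{review}"
)


def summarize_faithfully(review: str, model: str = "gpt-3.5-turbo") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variation
        messages=[{"role": "user", "content": PROMPT.format(review=review)}],
    )
    return response.choices[0].message.content
```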
The UC San Diego team frames its work as an early but important step in that direction.
“Our paper represents a step toward careful analysis and mitigation of content alteration induced by LLMs to humans, and provides insight into its effects, aiming to reduce the risk of systemic bias in decision-making across media, education and public policy,” the researchers wrote.
More broadly, the findings highlight a growing responsibility for companies, institutions and everyday users. As LLMs become embedded in search engines, shopping platforms, learning tools and news apps, their invisible framing choices can quietly shape what people believe, buy and support.
For students, educators and consumers, the takeaway is not to avoid AI entirely but to approach its outputs with healthy skepticism. A polished summary may be faster to read, but it is not always a faithful reflection of the original information — and, as this study shows, it can change your mind more than you realize.
The research, titled “Quantifying Cognitive Bias Induction in LLM-Generated Content,” was presented at the International Joint Conference on Natural Language Processing and the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL) in December 2025.

