{"id":24333,"date":"2025-05-14T21:50:04","date_gmt":"2025-05-14T21:50:04","guid":{"rendered":"https:\/\/www.tun.com\/home\/?p=24333"},"modified":"2025-05-14T21:50:05","modified_gmt":"2025-05-14T21:50:05","slug":"leading-chatbots-often-exaggerate-scientific-findings-new-study","status":"publish","type":"post","link":"https:\/\/www.tun.com\/home\/leading-chatbots-often-exaggerate-scientific-findings-new-study\/","title":{"rendered":"Leading Chatbots Often Exaggerate Scientific Findings: New Study"},"content":{"rendered":"\n<div class=\"wp-block-uagb-blockquote uagb-block-e7eb3fc3 uagb-blockquote__skin-border uagb-blockquote__stack-img-none\"><blockquote class=\"uagb-blockquote\"><div class=\"uagb-blockquote__content\">Researchers reveal that prominent chatbots tend to exaggerate scientific conclusions, with accuracy prompts surprisingly leading to more overgeneralizations. The findings stress the need for vigilant use of AI in scientific communication.<\/div><footer><div class=\"uagb-blockquote__author-wrap uagb-blockquote__author-at-left\"><\/div><\/footer><\/blockquote><\/div>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-group is-content-justification-space-between is-nowrap is-layout-flex wp-container-core-group-is-layout-0dfbf163 wp-block-group-is-layout-flex\"><div style=\"font-size:16px;\" class=\"has-text-align-left wp-block-post-author\"><div class=\"wp-block-post-author__content\"><p class=\"wp-block-post-author__name\">The University Network<\/p><\/div><\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n<p>Leading 
chatbots, including ChatGPT and DeepSeek, often misrepresent scientific findings by exaggerating conclusions in up to 73% of cases, according to new research. The study, conducted by Uwe Peters from Utrecht University and Benjamin Chin-Yee from Western University in Canada and the University of Cambridge in the UK, highlights significant accuracy issues in AI-generated science summaries.<\/p>\n\n\n\n<p>The researchers tested 10 of the most prominent large language models (LLMs), including ChatGPT, DeepSeek, Claude and LLaMA, analyzing nearly 5,000 summaries of research articles from prestigious scientific journals, such as Nature, Science and The Lancet. <\/p>\n\n\n\n<p>They discovered that six out of 10 models consistently stretched the conclusions of the original texts, often transforming cautious, study-specific language into misleading, sweeping statements.<\/p>\n\n\n\n<p>\u201cStudents, researchers and policymakers may assume that if they ask ChatGPT to avoid inaccuracies, they\u2019ll get a more reliable summary. Our findings prove the opposite,\u201d Peters said in a news release.<\/p>\n\n\n\n<p>Interestingly, efforts to counteract these inaccuracies by prompting the models for accuracy backfired. When explicitly asked to avoid inaccuracies, the models were almost twice as likely to produce overgeneralized conclusions as when given a simple summary request.<\/p>\n\n\n\n<p><a href=\"https:\/\/royalsocietypublishing.org\/doi\/10.1098\/rsos.241776\" target=\"_blank\" rel=\"noopener\" title=\"\">Published<\/a> in Royal Society Open Science, the study underscores a concerning trend: newer AI models, such as ChatGPT-4o and DeepSeek, were less accurate than their older counterparts. This poses additional risks in scientific communication, where precision is critical.<\/p>\n\n\n\n<p>The researchers compared the AI-generated summaries to those written by humans. 
Notably, chatbots were nearly five times more likely to produce broad generalizations than human writers. <\/p>\n\n\n\n<p>\u201cWorse still, overall, newer AI models, like ChatGPT-4o and DeepSeek, performed worse than older ones,\u201d added Peters.<\/p>\n\n\n\n<p>The issue stems from overgeneralizations being prevalent in the human scientific writing that the models are trained on, Chin-Yee explained. <\/p>\n\n\n\n<p>Additionally, human users\u2019 preference for clear, broadly applicable language might push the models toward overgeneralization during training.<\/p>\n\n\n\n<p>To mitigate these risks, the researchers recommend using LLMs like Claude, which demonstrated the highest accuracy, and adjusting settings to lower a chatbot\u2019s &#8220;temperature,&#8221; a parameter that controls the randomness, and thus the perceived creativity, of its output. They also advocate for prompts that instruct models to report findings indirectly and in the past tense.\u00a0<\/p>\n\n\n\n<p>\u201cIf we want AI to support science literacy rather than undermine it, we need more vigilance and testing of LLMs in science communication contexts,\u201d Peters added.<\/p>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Source:<\/strong> <a href=\"https:\/\/www.uu.nl\/en\/news\/most-leading-chatbots-routinely-exaggerate-science-findings\" target=\"_blank\" rel=\"noopener\" title=\"\">Utrecht University<\/a>\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Leading chatbots, including ChatGPT and DeepSeek, often misrepresent scientific findings by exaggerating conclusions in up to 73% of cases, according to new research. 
The study, conducted by Uwe Peters from Utrecht University and Benjamin Chin-Yee from Western University in Canada and the University of Cambridge in the UK, highlights significant accuracy issues in AI-generated science [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-no-separators","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8],"tags":[188,344,254],"class_list":["post-24333","post","type-post","status-publish","format-standard","hentry","category-ai","tag-university-of-cambridge","tag-utrecht-university","tag-western-university"],"acf":[],"aioseo_notices":[],"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"The University Network","author_link":"https:\/\/www.tun.com\/home\/author\/funky_junkie\/"},"uagb_comment_info":0,"uagb_excerpt":"Leading chatbots, including ChatGPT and DeepSeek, often misrepresent scientific findings by exaggerating conclusions in up to 73% of cases, according to new research. 
The study, conducted by Uwe Peters from Utrecht University and Benjamin Chin-Yee from Western University in Canada and the University of Cambridge in the UK, highlights significant accuracy issues in AI-generated science&hellip;","_links":{"self":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/24333","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/comments?post=24333"}],"version-history":[{"count":6,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/24333\/revisions"}],"predecessor-version":[{"id":24415,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/24333\/revisions\/24415"}],"wp:attachment":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/media?parent=24333"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/categories?post=24333"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/tags?post=24333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}