{"id":6794,"date":"2024-10-04T15:29:03","date_gmt":"2024-10-04T15:29:03","guid":{"rendered":"https:\/\/www.tun.com\/home\/?p=6794"},"modified":"2024-10-16T20:32:26","modified_gmt":"2024-10-16T20:32:26","slug":"ai-models-reliability-falters-despite-advancements-new-study-finds","status":"publish","type":"post","link":"https:\/\/www.tun.com\/home\/ai-models-reliability-falters-despite-advancements-new-study-finds\/","title":{"rendered":"AI Models\u2019 Reliability Falters Despite Advancements, New Study Finds"},"content":{"rendered":"\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-uagb-blockquote uagb-block-e7eb3fc3 uagb-blockquote__skin-border uagb-blockquote__stack-img-none\"><blockquote class=\"uagb-blockquote\"><div class=\"uagb-blockquote__content\">A new comprehensive study highlights the increasing unreliability of advanced AI language models, exposing significant mismatches between their performance and human expectations. 
Researchers emphasize the need for fundamental changes in AI design and development.<\/div><footer><div class=\"uagb-blockquote__author-wrap uagb-blockquote__author-at-left\"><\/div><\/footer><\/blockquote><\/div>\n\n\n\n<div class=\"wp-block-group is-content-justification-space-between is-nowrap is-layout-flex wp-container-core-group-is-layout-0dfbf163 wp-block-group-is-layout-flex\"><div style=\"font-size:16px;\" class=\"has-text-align-left wp-block-post-author\"><div class=\"wp-block-post-author__content\"><p class=\"wp-block-post-author__name\">The University Network<\/p><\/div><\/div>\n\n\n<div class=\"wp-block-uagb-social-share uagb-social-share__outer-wrap uagb-social-share__layout-horizontal uagb-block-ee584a31\">\n<div class=\"wp-block-uagb-social-share-child uagb-ss-repeater uagb-ss__wrapper uagb-block-ec619ce7\"><span class=\"uagb-ss__link\" data-href=\"https:\/\/www.facebook.com\/sharer.php?u=\" tabindex=\"0\" role=\"button\" aria-label=\"facebook\"><span class=\"uagb-ss__source-wrap\"><span class=\"uagb-ss__source-icon\"><svg xmlns=\"https:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 512 512\"><path d=\"M504 256C504 119 393 8 256 8S8 119 8 256c0 123.8 90.69 226.4 209.3 245V327.7h-63V256h63v-54.64c0-62.15 37-96.48 93.67-96.48 27.14 0 55.52 4.84 55.52 4.84v61h-31.28c-30.8 0-40.41 19.12-40.41 38.73V256h68.78l-11 71.69h-57.78V501C413.3 482.4 504 379.8 504 256z\"><\/path><\/svg><\/span><\/span><\/span><\/div>\n\n\n\n<div class=\"wp-block-uagb-social-share-child uagb-ss-repeater uagb-ss__wrapper uagb-block-32d99934\"><span class=\"uagb-ss__link\" data-href=\"https:\/\/twitter.com\/share?url=\" tabindex=\"0\" role=\"button\" aria-label=\"twitter\"><span class=\"uagb-ss__source-wrap\"><span class=\"uagb-ss__source-icon\"><svg xmlns=\"https:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 512 512\"><path d=\"M389.2 48h70.6L305.6 224.2 487 464H345L233.7 318.6 106.5 464H35.8L200.7 275.5 26.8 48H172.4L272.9 180.9 389.2 48zM364.4 421.8h39.1L151.1 88h-42L364.4 
421.8z\"><\/path><\/svg><\/span><\/span><\/span><\/div>\n\n\n\n<div class=\"wp-block-uagb-social-share-child uagb-ss-repeater uagb-ss__wrapper uagb-block-1d136f14\"><span class=\"uagb-ss__link\" data-href=\"https:\/\/www.linkedin.com\/shareArticle?url=\" tabindex=\"0\" role=\"button\" aria-label=\"linkedin\"><span class=\"uagb-ss__source-wrap\"><span class=\"uagb-ss__source-icon\"><svg xmlns=\"https:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 448 512\"><path d=\"M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z\"><\/path><\/svg><\/span><\/span><\/span><\/div>\n<\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n<p>A new study spearheaded by researchers from the Valencian Institute for Research in Artificial Intelligence (VRAIN) at the Polytechnic University of Valencia (UPV), the Valencian Graduate School and Research Network in Artificial Intelligence (ValgrAI) and the University of Cambridge has unveiled startling findings about the reliability of large language models.<\/p>\n\n\n\n<p>Recent advancements in AI, including models like OpenAI\u2019s GPT, Meta&#8217;s LLaMA and BLOOM, have captivated the world with their enhanced problem-solving abilities. However, the study&#8217;s results indicate that these models often stumble on simpler tasks, despite their proficiency with more complex ones.<\/p>\n\n\n\n<p>&#8220;Models can solve certain complex tasks in line with human abilities, but at the same time, they fail on simple tasks in the same domain. For example, they can solve several PhD-level mathematical problems. 
Still, they can get a simple addition wrong,&#8221; Jos\u00e9 Hern\u00e1ndez-Orallo, a researcher at VRAIN UPV and ValgrAI, said in a <a href=\"https:\/\/www.upv.es\/noticias-upv\/noticia-14817-los-grandes-mo-en.html\" title=\"\">news release<\/a>.<\/p>\n\n\n\n<p>The study investigated three critical aspects affecting the reliability of these models.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Task Difficulty Mismatch<\/h2>\n\n\n\n<p>The research, <a href=\"https:\/\/www.nature.com\/articles\/s41586-024-07930-y\" title=\"\">published<\/a> in the journal Nature, revealed a significant mismatch between how difficult humans find a task and how reliably the models perform on it.<\/p>\n\n\n\n<p>&#8220;[T]here is no \u2018safe zone\u2019 in which models can be trusted to work perfectly,&#8221; added Yael Moros Daval, a researcher at VRAIN UPV, emphasizing the inconsistency.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Propensity for Incorrect Answers<\/h2>\n\n\n\n<p>Recent models are more inclined to give a wrong answer than to abstain when they are uncertain, a stark contrast to human behavior.<\/p>\n\n\n\n<p>&#8220;This puts the onus on users to detect faults during all their interactions with models,&#8221; added Lexin Zhou, a researcher at VRAIN UPV.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sensitivity to Problem Statements<\/h2>\n\n\n\n<p>Effective question formulation remains challenging. Prompts that succeed on complex tasks may still fail on simpler ones.<\/p>\n\n\n\n<p>&#8220;Users can be influenced by prompts that work well in complex tasks but, at the same time, get incorrect answers in simple tasks,&#8221; added co-author C\u00e8sar Ferri, a researcher at VRAIN UPV and ValgrAI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implications<\/h2>\n\n\n\n<p>The implications of these findings are profound, especially for general-purpose AI used in high-risk applications. 
The researchers argue that human supervision cannot fully compensate for these inherent reliability issues due to the overconfidence users place in these models.<\/p>\n\n\n\n<p>&#8220;Our results suggest a fundamental change is needed in the design and development of general-purpose AI,&#8221; concluded Wout Schellaert, a researcher at VRAIN UPV. <\/p>\n\n\n\n<p>This call to action resonates as the use of AI continues to expand into critical areas like health care, finance and autonomous systems.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new study spearheaded by researchers from the Valencian Institute for Research in Artificial Intelligence (VRAIN) at the Polytechnic University of Valencia (UPV), the Valencian Graduate School and Research Network in Artificial Intelligence (ValgrAI) and the University of Cambridge has unveiled startling findings about the reliability of large language models. Recent advancements in AI, including [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-no-separators","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8],"tags":[],"class_list":["post-6794","post","type-post","status-publish","format-standard","hentry","category-ai"],"acf":[],"aioseo_notices":[],"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"The University Network","author_link":"https:\/\/www.tun.com\/home\/author\/funky_junkie\/"},"uagb_comment_info":0,"uagb_excerpt":"A new study spearheaded by researchers from the Valencian Institute for Research in Artificial Intelligence (VRAIN) at the 
Polytechnic University of Valencia (UPV), the Valencian Graduate School and Research Network in Artificial Intelligence (ValgrAI) and the University of Cambridge has unveiled startling findings about the reliability of large language models. Recent advancements in AI, including&hellip;","_links":{"self":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/6794","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/comments?post=6794"}],"version-history":[{"count":7,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/6794\/revisions"}],"predecessor-version":[{"id":6870,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/6794\/revisions\/6870"}],"wp:attachment":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/media?parent=6794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/categories?post=6794"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/tags?post=6794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}