{"id":34344,"date":"2026-02-11T08:41:00","date_gmt":"2026-02-11T08:41:00","guid":{"rendered":"https:\/\/www.tun.com\/home\/?p=34344"},"modified":"2026-02-11T13:41:45","modified_gmt":"2026-02-11T13:41:45","slug":"uc-san-diego-team-teaches-ai-to-truly-show-its-work","status":"publish","type":"post","link":"https:\/\/www.tun.com\/home\/uc-san-diego-team-teaches-ai-to-truly-show-its-work\/","title":{"rendered":"UC San Diego Team Teaches AI to Truly &#8216;Show Its Work&#8217;"},"content":{"rendered":"\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-uagb-blockquote uagb-block-e7eb3fc3 uagb-blockquote__skin-border uagb-blockquote__stack-img-none\"><blockquote class=\"uagb-blockquote\"><div class=\"uagb-blockquote__content\">A new training method from UC San Diego helps AI reason more like a careful student, not a guesser, especially on math problems that mix text and images. The approach could power safer AI tutors and more reliable analysis of charts, reports and scientific papers.<\/div><footer><div class=\"uagb-blockquote__author-wrap uagb-blockquote__author-at-left\"><\/div><\/footer><\/blockquote><\/div>\n\n\n\n<div class=\"wp-block-group is-content-justification-space-between is-nowrap is-layout-flex wp-container-core-group-is-layout-0dfbf163 wp-block-group-is-layout-flex\"><div style=\"font-size:16px;\" class=\"has-text-align-left wp-block-post-author\"><div class=\"wp-block-post-author__content\"><p class=\"wp-block-post-author__name\">The University Network<\/p><\/div><\/div>\n\n\n<\/div>\n<\/div><\/div>\n\n\n\n<p>Artificial intelligence systems are getting better at answering tough questions, but they still have a bad habit: they can guess correctly without really understanding the problem.<\/p>\n\n\n\n<p>Engineers at the University of California San Diego say they have built a smarter way to train AI so it has to show its work, especially on complex tasks that combine text and images, such as math word problems with charts and diagrams.<\/p>\n\n\n\n<p>Their <a href=\"https:\/\/openreview.net\/pdf?id=ZyiBk1ZinG\" target=\"_blank\" rel=\"noopener\" title=\"\">new method<\/a>, presented at the <a href=\"https:\/\/neurips.cc\/\" target=\"_blank\" rel=\"noopener\" title=\"\">NeurIPS conference<\/a> in December 2025, pushed AI models to the top of widely used tests of visual mathematical reasoning. The researchers say the same ideas could lead to more trustworthy AI tutors, as well as tools that can reliably analyze business reports, complex charts and scientific papers with less risk of making things up.<\/p>\n\n\n\n<p>Most current AI systems are trained and evaluated almost entirely on whether they land on the right final answer. <\/p>\n\n\n\n<p>Study senior author Pengtao Xie, a professor in the Department of Electrical and Computer Engineering at the UC San Diego Jacobs School of Engineering, compared that approach to a familiar classroom experience.<\/p>\n\n\n\n<p>\u201cThey are graded much like students taking a multiple-choice test,\u201d he said in a news release. \u201cIf they select the right answer, they still receive full credit, even if they guessed.\u201d<\/p>\n\n\n\n<p>The UC San Diego team\u2019s method flips that script. 
Instead of rewarding an AI model just for being right, the system scores how well the model reasons its way through a problem.<\/p>\n\n\n\n<p>\u201cIt gets rewarded for thinking logically, step by step, rather than just guessing correctly,\u201d Xie added. \u201cIf it gets the right answer using the wrong logic, it doesn\u2019t get rewarded.\u201d<\/p>\n\n\n\n<p>That shift, from asking \u201cDid the AI get it right?\u201d to \u201cDid the AI think it through?\u201d is more than a philosophical change. In high-stakes settings like medical diagnosis, financial analysis or engineering design, a confident but poorly reasoned answer can be dangerous. A system that is trained to value sound reasoning over lucky guesses is better positioned to flag uncertainty, avoid shortcuts and provide explanations that humans can check.<\/p>\n\n\n\n<p>Until now, this kind of \u201cprocess-based\u201d training has mostly been applied to text-only models. Extending it to multimodal models \u2014 systems that must interpret both language and images \u2014 adds another layer of difficulty: the training data itself.<\/p>\n\n\n\n<p>AI models learn from massive collections of example problems and solutions. But not all data are created equal. Some datasets are rich, detailed and challenging. Others are noisy, too simple or only loosely related to the task. If a model treats all of that material as equally useful, it can slow down or even confuse its learning.<\/p>\n\n\n\n<p>Xie illustrated the problem with a vivid comparison.<\/p>\n\n\n\n<p>\u201cIt\u2019s like trying to learn calculus when half of your reading list consists of kindergarten coloring books,\u201d he said.<\/p>\n\n\n\n<p>To tackle this, the team built a training system that acts as a kind of smart curator. Instead of feeding the model every example with the same importance, the method learns to assign different weights to different datasets. 
High-quality, challenging examples count more; low-quality or irrelevant ones are downplayed.<\/p>\n\n\n\n<p>The system then checks its own progress on a separate set of problems and uses that feedback to keep adjusting how it prioritizes training data over time.<\/p>\n\n\n\n<p>\u201cOur system doesn\u2019t just learn from everything,\u201d added Xie. \u201cIt learns what is worth learning from. It emphasizes quality over quantity.\u201d<\/p>\n\n\n\n<p>This two-part strategy \u2014 grading the reasoning process and curating the training data \u2014 paid off in tests. When evaluated on multiple benchmarks that measure visual and mathematical reasoning, the team\u2019s system consistently outperformed other training methods.<\/p>\n\n\n\n<p>On MathVista, a widely used benchmark that tests how well AI can solve math word problems that include charts and diagrams, a model trained with the UC San Diego method achieved a top public score of 85.2%, according to the researchers. The result was verified by MathVista\u2019s organizers.<\/p>\n\n\n\n<p>Beyond raw scores, the team sees the work as a step toward making advanced reasoning AI more accessible. Many of today\u2019s most capable models are huge, proprietary systems that require enormous computing resources to train and run. The new training approach helps smaller, open models narrow that gap, according to Xie.<\/p>\n\n\n\n<p>\u201cYou don\u2019t need a trillion-dollar computing cluster to get state-of-the-art reasoning,\u201d he said.<\/p>\n\n\n\n<p>That could open the door for schools, small companies and individual developers to build specialized AI tools that run on personal computers or modest servers, rather than relying entirely on tech giants\u2019 cloud platforms.<\/p>\n\n\n\n<p>For students, one promising application is AI tutors that can walk through a math or science problem line by line, checking each step for logical consistency instead of just spitting out an answer. 
For professionals, better multimodal reasoning could mean AI systems that can read a financial report, interpret its graphs and tables, and explain their implications in clear language \u2014 all while being less likely to misread a chart or invent a trend.<\/p>\n\n\n\n<p>Next, the team plans to go even more granular in how they judge training data quality, moving from scoring entire datasets to evaluating individual questions and problems. They are also working to make the training process faster and less computationally demanding.<\/p>\n\n\n\n<p>As AI systems become more deeply embedded in education, business and research, methods like this one, which push models to think carefully rather than just answer quickly, may play a key role in making the technology both more powerful and more trustworthy.<\/p>\n\n\n\n<div style=\"height:12px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Source: <\/strong><a href=\"https:\/\/today.ucsd.edu\/story\/a-smarter-way-for-ai-to-understand-text-and-images\" target=\"_blank\" rel=\"noopener\" title=\"\">University of California San Diego<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new training method from UC San Diego helps AI reason more like a careful student, not a guesser, especially on math problems that mix text and images. 
The approach could power safer AI tutors and more reliable analysis of charts, reports and scientific papers.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-no-separators","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8],"tags":[206],"class_list":["post-34344","post","type-post","status-publish","format-standard","hentry","category-ai","tag-uc-san-diego"],"acf":[],"aioseo_notices":[],"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"The University Network","author_link":"https:\/\/www.tun.com\/home\/author\/funky_junkie\/"},"uagb_comment_info":0,"uagb_excerpt":"A new training method from UC San Diego helps AI reason more like a careful student, not a guesser, especially on math problems that mix text and images. 
The approach could power safer AI tutors and more reliable analysis of charts, reports and scientific papers.","_links":{"self":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/34344","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/comments?post=34344"}],"version-history":[{"count":4,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/34344\/revisions"}],"predecessor-version":[{"id":34361,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/34344\/revisions\/34361"}],"wp:attachment":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/media?parent=34344"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/categories?post=34344"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/tags?post=34344"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}