{"id":34419,"date":"2026-02-17T11:57:00","date_gmt":"2026-02-17T11:57:00","guid":{"rendered":"https:\/\/www.tun.com\/home\/?p=34419"},"modified":"2026-02-18T20:57:29","modified_gmt":"2026-02-18T20:57:29","slug":"mit-ai-model-learns-yeast-dna-language-to-cut-drug-costs","status":"publish","type":"post","link":"https:\/\/www.tun.com\/home\/mit-ai-model-learns-yeast-dna-language-to-cut-drug-costs\/","title":{"rendered":"MIT AI Model Learns Yeast DNA Language to Cut Drug Costs"},"content":{"rendered":"\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-uagb-blockquote uagb-block-e7eb3fc3 uagb-blockquote__skin-border uagb-blockquote__stack-img-none\"><blockquote class=\"uagb-blockquote\"><div class=\"uagb-blockquote__content\">MIT chemical engineers used a large language model to learn how industrial yeast reads DNA, then used it to make protein drugs more efficiently. The approach could help cut the time and cost of bringing new biologic medicines to patients.<\/div><footer><div class=\"uagb-blockquote__author-wrap uagb-blockquote__author-at-left\"><\/div><\/footer><\/blockquote><\/div>\n\n\n\n<div class=\"wp-block-group is-content-justification-space-between is-nowrap is-layout-flex wp-container-core-group-is-layout-0dfbf163 wp-block-group-is-layout-flex\"><div style=\"font-size:16px;\" class=\"has-text-align-left wp-block-post-author\"><div class=\"wp-block-post-author__content\"><p class=\"wp-block-post-author__name\">The University Network<\/p><\/div><\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n<p>A new 
artificial intelligence model that reads DNA like a language could help make protein-based drugs and vaccines faster and cheaper to produce.<\/p>\n\n\n\n<p>MIT chemical engineers have adapted the same kind of large language models that power chatbots to study the genetic code of an industrial yeast widely used to manufacture medicines. By learning the yeast\u2019s preferred patterns of DNA, the model can suggest better genetic recipes for making valuable proteins, from human growth hormone to cancer-fighting antibodies.<\/p>\n\n\n\n<p>In lab tests, those AI-designed DNA sequences helped yeast cells produce more protein than sequences generated by leading commercial tools for five of six therapeutic proteins tested, the researchers report in a paper <a href=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.2522052123\" target=\"_blank\" rel=\"noopener\" title=\"\">published<\/a> in the <em>Proceedings of the National Academy of Sciences<\/em>.<\/p>\n\n\n\n<p>For drug makers, that kind of boost could translate into shorter development timelines and lower manufacturing costs for biologics, complex medicines made by living cells that are often among the most expensive treatments on the market.<\/p>\n\n\n\n<p>The goal is to bring more predictability to a process that is still surprisingly manual, according to senior author J. Christopher Love, the Raymond A. and Helen E. St. Laurent Professor of Chemical Engineering at MIT.<\/p>\n\n\n\n<p>\u201cToday, those steps are all done by very laborious experimental tasks,\u201d Love, who is also a member of the Koch Institute for Integrative Cancer Research and faculty co-director of the MIT Initiative for New Manufacturing, said in a news release. 
\u201cWe have been looking at the question of where could we take some of the concepts that are emerging in machine learning and apply them to make different aspects of the process more reliable and simpler to predict.\u201d<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Learning yeast\u2019s genetic \u201csyntax\u201d<\/h3>\n\n\n\n<p>Industrial yeasts such as <em>Komagataella phaffii<\/em> and <em>Saccharomyces cerevisiae<\/em> are the workhorses of the biopharmaceutical industry. They help produce billions of dollars\u2019 worth of protein drugs and vaccines every year, including insulin, hepatitis B vaccines and monoclonal antibodies.<\/p>\n\n\n\n<p>To turn yeast into a miniature factory for a new protein drug, engineers insert a gene encoding that protein into the yeast\u2019s genome and then fine-tune the cells\u2019 growth and production conditions. For biologic drugs, this development phase can account for a significant share of the overall cost of bringing a product to market.<\/p>\n\n\n\n<p>A key design decision is how to write the DNA sequence for the gene. Proteins are built from 20 amino acids, but the genetic code has 64 possible three-letter \u201ccodons,\u201d 61 of which specify amino acids and three of which signal a stop. That means most amino acids can be spelled several different ways in DNA.<\/p>\n\n\n\n<p>Different organisms favor different codons. Traditional codon optimization tools usually pick the most common codons in the host organism, on the theory that cells are better equipped to use them. But that simple strategy can backfire. If a cell keeps seeing the same codon for a particular amino acid, it can run short on the matching transfer RNA molecules needed to assemble proteins, slowing production.<\/p>\n\n\n\n<p>The MIT team wanted a more nuanced approach that could capture the full context of how codons are arranged in real genes.<\/p>\n\n\n\n<p>They turned to an encoder-decoder large language model, a type of AI that normally learns patterns in text. 
Instead of feeding it sentences, they trained it on the amino acid sequences and matching DNA sequences for roughly 5,000 proteins that <em>K. phaffii<\/em> naturally produces, using a public database from the National Center for Biotechnology Information.<\/p>\n\n\n\n<p>\u201cThe model learns the syntax or the language of how these codons are used,\u201d Love said. \u201cIt takes into account how codons are placed next to each other, and also the long-distance relationships between them.\u201d<\/p>\n\n\n\n<p>Once trained, the model could take the amino acid sequence of a desired protein and propose a DNA sequence for <em>K. phaffii<\/em> that should produce it efficiently.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Beating commercial tools in head-to-head tests<\/h3>\n\n\n\n<p>To see how well their AI system worked, the researchers asked it to design codon-optimized genes for six different proteins, including human growth hormone, human serum albumin and trastuzumab, a monoclonal antibody used to treat cancer.<\/p>\n\n\n\n<p>They also generated optimized DNA sequences for the same proteins using four commercially available codon optimization tools that represent different strategies for choosing codons.<\/p>\n\n\n\n<p>\u201cWe made sure to cover a variety of different philosophies of doing codon optimization and benchmarked them against our approach,\u201d said lead author Harini Narayanan, a former MIT postdoctoral researcher.<\/p>\n\n\n\n<p>The team then inserted each version of each gene into <em>K. phaffii<\/em> cells and measured how much of the target protein the yeast produced. For five of the six proteins, the sequences from the MIT model led to the highest yields. 
For the remaining protein, the model\u2019s design came in second.<\/p>\n\n\n\n<p>\u201cWe\u2019ve experimentally compared these approaches and showed that our approach outperforms the others,\u201d Narayanan added.<\/p>\n\n\n\n<p>Beyond the performance gains, Love emphasized the potential impact on how quickly new protein drugs can move from concept to production.<\/p>\n\n\n\n<p>\u201cHaving predictive tools that consistently work well is really important to help shorten the time from having an idea to getting it into production. Taking away uncertainty ultimately saves time and money,\u201d he said.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Discovering hidden biological rules<\/h3>\n\n\n\n<p><em>K. phaffii<\/em>, formerly known as <em>Pichia pastoris<\/em>, is already used to make dozens of commercial products, including medicines and food ingredients such as hemoglobin. That made it a natural starting point for the MIT team.<\/p>\n\n\n\n<p>But the researchers also wanted to know whether their approach could generalize to other species. They trained similar models on genetic data from humans, cows and other organisms. Each model produced different codon predictions, suggesting that species-specific models are needed to get the best results.<\/p>\n\n\n\n<p>When the team probed how the yeast model was making its decisions, they found that it had picked up on real biological principles that were never explicitly programmed into it.<\/p>\n\n\n\n<p>For example, the model learned to avoid certain repeated DNA elements that can interfere with gene expression. 
It also appeared to group amino acids based on chemical traits such as how they interact with water, reflecting underlying biophysical rules of protein structure.<\/p>\n\n\n\n<p>\u201cNot only was it learning this language, but it was also contextualizing it through aspects of biophysical and biochemical features, which gives us additional confidence that it is learning something that\u2019s actually meaningful and not simply an optimization of the task that we gave it,\u201d Love added.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Opening the toolbox<\/h3>\n\n\n\n<p>Researchers in Love\u2019s lab have already started using the new model to design genes for proteins they want <em>K. phaffii<\/em> to produce. They have also released the code so other scientists can adapt it for their own work with <em>K. phaffii<\/em> or train similar models for different organisms.<\/p>\n\n\n\n<p>In the long run, tools like this could become part of a broader AI-assisted pipeline for biologics manufacturing, helping scientists move from a protein idea on paper to a robust production process with fewer trial-and-error experiments.<\/p>\n\n\n\n<div style=\"height:11px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Source: <\/strong><a href=\"https:\/\/news.mit.edu\/2026\/new-ai-model-could-cut-costs-developing-protein-drugs-0216\" target=\"_blank\" rel=\"noopener\" title=\"\">Massachusetts Institute of Technology<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>MIT chemical engineers used a large language model to learn how industrial yeast reads DNA, then used it to make protein drugs more efficiently. 
The approach could help cut the time and cost of bringing new biologic medicines to patients.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-no-separators","format":"standard","meta":{"_acf_changed":false,"_uag_custom_page_level_css":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8],"tags":[103],"class_list":["post-34419","post","type-post","status-publish","format-standard","hentry","category-ai","tag-mit"],"acf":[],"aioseo_notices":[],"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"The University Network","author_link":"https:\/\/www.tun.com\/home\/author\/funky_junkie\/"},"uagb_comment_info":0,"uagb_excerpt":"MIT chemical engineers used a large language model to learn how industrial yeast reads DNA, then used it to make protein drugs more efficiently. 
The approach could help cut the time and cost of bringing new biologic medicines to patients.","_links":{"self":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/34419","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/comments?post=34419"}],"version-history":[{"count":9,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/34419\/revisions"}],"predecessor-version":[{"id":34481,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/posts\/34419\/revisions\/34481"}],"wp:attachment":[{"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/media?parent=34419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/categories?post=34419"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tun.com\/home\/wp-json\/wp\/v2\/tags?post=34419"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}