A new UNC-Chapel Hill study shows that advanced AI can pinpoint where plant specimens were collected with near-human accuracy, slashing the time and cost of digitizing vast natural history collections. The breakthrough could open billions of records to scientists studying climate change and biodiversity loss.
A new study led by the University of North Carolina at Chapel Hill suggests artificial intelligence could finally crack one of the biggest bottlenecks in digitizing the world’s natural history collections.
The researchers found that advanced AI tools known as large language models, or LLMs, can determine where plant specimens were originally collected with near-human accuracy, but in a fraction of the time and at far lower cost than traditional methods.
The process, called georeferencing, involves turning often vague or old-fashioned location notes on specimen labels into precise coordinates on a map. It is essential for turning physical herbarium sheets into usable digital data for scientists studying biodiversity, climate change and ecosystem shifts.
The UNC team reports that LLMs not only handled this task well, but actually outperformed standard approaches. In tests, the models georeferenced specimens with an error margin of less than 10 kilometers, while working far faster and more cheaply than existing tools and manual workflows.
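To make the 10-kilometer figure concrete: georeferencing error is usually reported as the great-circle distance between the predicted point and a trusted reference point. The sketch below is one minimal way to compute it with the haversine formula. The function names, the 10 km default threshold and the sample coordinates are illustrative assumptions, not details taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(predicted, reference, threshold_km=10.0):
    """True if the predicted (lat, lon) lies within threshold_km of the reference.

    The 10 km default mirrors the error margin quoted above; it is an
    assumption here, not the study's evaluation code.
    """
    return haversine_km(*predicted, *reference) < threshold_km


# Hypothetical coordinates, for illustration only.
reference = (35.9132, -79.0558)
prediction = (35.95, -79.05)
print(round(haversine_km(*prediction, *reference), 1), "km")
print(within_threshold(prediction, reference))
```

A spherical Earth model is accurate enough for a threshold of this size. A workflow that needs meter-level precision would use an ellipsoidal distance instead.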
First author Yuyang Xie, a postdoctoral researcher in UNC’s Department of Biology, said the work targets a long-standing choke point in museum and herbarium digitization.
“Our study explores how large language models can take on one of the biggest bottlenecks in digitizing plant collections,” Xie said in a news release. “We are pioneering the use of these tools for georeferencing, a breakthrough that will accelerate the digitization of plant specimens and unlock new possibilities for ecological research.”
Around the world, an estimated 2-3 billion natural history specimens are stored in cabinets and vaults. Only a small share have been fully digitized with usable spatial data. That gap limits scientists’ ability to track biodiversity loss, monitor species’ movements as the climate warms and analyze how ecosystems are changing over time.
Without coordinates, many specimens remain essentially invisible to modern data-driven research, even if they were collected decades or centuries ago.
Traditional georeferencing is slow and labor-intensive. It often requires experts to interpret handwritten labels, historical place names or vague directions, then use specialized software and maps to estimate coordinates. Quality control can involve multiple rounds of review.
By contrast, LLMs are designed to read and interpret natural language. The UNC team showed that these models can parse messy or ambiguous label text, infer the most likely location and return coordinates quickly and consistently.
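In code, that read-infer-return loop might look like the sketch below. It only builds a prompt from the label text and parses a free-text reply into validated coordinates; the call to the model itself is left out because the article does not say which models or prompts the team used. The prompt wording, function names and reply format are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical prompt wording; the study's actual prompts are not given here.
PROMPT_TEMPLATE = (
    "You are georeferencing a herbarium specimen.\n"
    "Locality text from the label: {locality}\n"
    "Reply with the most likely location as: latitude, longitude "
    "(decimal degrees)."
)

# Matches the first "number, number" pair in a reply, e.g. "35.91, -79.05".
_COORD_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")


def build_prompt(locality: str) -> str:
    """Wrap the raw label text in the (assumed) instruction template."""
    return PROMPT_TEMPLATE.format(locality=locality.strip())


def parse_coordinates(reply: str) -> Optional[Tuple[float, float]]:
    """Extract (lat, lon) from a model reply, or None if absent or out of range."""
    match = _COORD_RE.search(reply)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None  # reject physically impossible coordinates
    return lat, lon


prompt = build_prompt("3 miles N of town, along creek")
print(prompt)
print(parse_coordinates("Most likely location: 35.91, -79.05"))
```

Returning `None` for unparseable or out-of-range replies gives a natural hook for routing those records to a human reviewer rather than accepting a bad point.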
Xiao Feng, an assistant professor of biology at UNC and corresponding author on the study, said recent AI advances are reshaping what is possible.
“Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate,” Feng said in the news release. “This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions.”
Because LLMs can be scaled up across millions of records, the approach could dramatically accelerate efforts to bring natural history collections online. Faster georeferencing means more specimens can be included in global databases that feed into conservation planning, species distribution models and climate impact studies.
The study, published in the journal Nature Plants, is among the first to rigorously test LLMs on georeferencing tasks and to compare their performance with existing methods. The researchers found that the AI tools approached human-level accuracy and outperformed existing automated tools while cutting both time and cost, suggesting that institutions with limited resources could still benefit.
For herbaria and museums, this shift could be transformative. Many institutions are sitting on vast backlogs of undigitized specimens because they lack staff and funding to process them using traditional workflows.
“This technology allows us to unlock millions of records that are currently sitting in cabinets,” Xie added. “With the power of LLMs, we can rapidly digitize plant specimen data that will be critical for addressing global environmental challenges.”
Those challenges include tracking invasive species, identifying climate refuges where vulnerable plants might survive, and documenting how species ranges have shifted over the last century. Historical specimens, once digitized and mapped, become time capsules that reveal how life on Earth has changed.
The UNC team’s findings also point to broader possibilities. If LLMs can reliably georeference plant specimens, similar approaches could be applied to other natural history collections, from insects and fungi to vertebrate animals. Any collection with text labels and locality information could, in principle, be processed using AI.
The study offers a hopeful glimpse of how AI and traditional curation can work together. Human experts remain essential for verifying tricky cases, interpreting rare or unusual records and guiding research questions. But with LLMs taking on the repetitive, time-consuming work, experts may be able to focus more on analysis and discovery.
For students and early-career scientists, the shift could mean faster access to richer datasets and more opportunities to tackle big questions about biodiversity and climate.
By showing that AI can handle one of the most stubborn technical hurdles in digitization, the UNC researchers have opened the door to a future where the world’s plant collections are not just preserved on paper, but fully searchable, mappable and ready to inform urgent environmental decisions.

