Engineers at the University of Hong Kong have built two deep-learning tools that make it easier to spot cancer-linked mutations and decode RNA. The open-source algorithms could expand access to precision medicine and speed genomic discovery.
Two new artificial intelligence tools from The University of Hong Kong promise to make it easier, faster and cheaper to detect genetic mutations tied to cancer and to decode the RNA messages that keep our cells running.
Researchers in HKU’s Faculty of Engineering have developed ClairS-TO and Clair3-RNA, a pair of deep-learning algorithms that sharpen the analysis of long-read DNA and RNA sequencing data. The tools are designed to tackle long-standing bottlenecks in cancer diagnostics and RNA-based genomic research, and the studies describing each tool are published in Nature Communications.
The work is led by Ruibang Luo, an associate professor in HKU’s School of Computing and Data Science, whose lab focuses on bioinformatics algorithms and clinical informatics. Luo’s team has spent years building the Clair series, a family of AI-driven genomic tools that has become widely used in the field.
The latest additions push that effort further, according to Luo.
“ClairS-TO and Clair3-RNA, along with other algorithms in the Clair series, have established a solid foundation for deep-learning-driven genetic mutation discovery, and accelerated the adoption of precision medicine and clinical genomics,” Luo said in a news release.
Why long-read sequencing matters
Genetic sequencing technologies read the order of DNA or RNA letters in our cells. Long-read sequencing, a newer generation of this technology, can capture continuous stretches of genetic material, revealing complex regions and structural changes that short-read methods can miss.
Those long reads are especially valuable in cancer, where tumors often carry a tangled mix of mutations, and in RNA studies, which probe how genes are turned on and off. But the same richness that makes long-read data powerful also makes it hard to interpret. Distinguishing real mutations from technical errors or natural RNA editing has been a major challenge.
The HKU team built ClairS-TO and Clair3-RNA to address those pain points.
A new way to read tumor DNA without normal tissue
ClairS-TO is aimed squarely at cancer diagnostics. Traditionally, labs compare DNA from a patient’s tumor with DNA from their healthy tissue to identify somatic mutations: changes that arise in the tumor but are not present in the rest of the body. That “tumor-normal” pairing helps filter out inherited (germline) variants, which appear in both samples, so that the changes specific to the tumor stand out.
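At its simplest, the tumor-normal comparison behaves like a set subtraction: variants found in both samples are treated as inherited, and those found only in the tumor become candidate somatic mutations. The toy Python sketch below illustrates only that idea. The `Variant` record, the `candidate_somatic` function and the positions are all made up for illustration, and real pipelines also weigh read depth, allele frequency and sequencing quality.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Simplified variant record: chromosome, position, reference and alternate base."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_somatic(tumor: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Return variants seen in the tumor but absent from the matched normal sample.

    Variants present in both samples are treated as inherited (germline) and dropped.
    """
    return tumor - normal


# Hypothetical calls at made-up positions, for illustration only.
tumor_calls = {
    Variant("chr2", 12345, "C", "T"),  # present only in the tumor
    Variant("chr1", 67890, "A", "G"),  # also present in healthy tissue
}
normal_calls = {
    Variant("chr1", 67890, "A", "G"),
}

print(sorted(candidate_somatic(tumor_calls, normal_calls)))
# [Variant(chrom='chr2', pos=12345, ref='C', alt='T')]
```

Without the normal sample, the subtraction step has nothing to subtract, which is the gap a tumor-only caller has to fill by other means.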
In practice, though, a matched normal sample is not always available. It may be too costly or invasive to collect, or the patient’s healthy tissue may not have been stored. That can limit access to high-quality genomic testing, particularly in resource-constrained settings.
ClairS-TO is designed to work with tumor-only samples. It uses a dual-network deep-learning architecture: one network focuses on confirming genuine mutations, while a second network is trained to reject sequencing errors and other noise. By learning patterns in long-read tumor data, the system can infer which changes are likely to be true somatic variants even without a normal sample for comparison.
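The article does not say how ClairS-TO combines its two networks. As a purely hypothetical illustration of the general idea, the sketch below assumes each candidate site already has two scores, one from a network that supports real mutations and one from a network that flags noise, and keeps a site only when the first supports it and the second does not reject it. The `Candidate` class, `call_somatic` function and the 0.5 thresholds are invented for this example and are not ClairS-TO’s actual decision rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate mutation site with scores from two hypothetical networks."""
    site: str
    p_real: float   # confirming network: probability the change is a genuine mutation
    p_noise: float  # rejecting network: probability the change is an error or other noise


def call_somatic(candidates: list[Candidate],
                 real_min: float = 0.5,
                 noise_max: float = 0.5) -> list[str]:
    """Keep a site only if the confirming network supports it
    and the rejecting network does not flag it as noise.

    The thresholds are illustrative, not values used by ClairS-TO.
    """
    return [c.site for c in candidates
            if c.p_real >= real_min and c.p_noise < noise_max]


candidates = [
    Candidate("site_1", p_real=0.92, p_noise=0.05),  # both networks agree: kept
    Candidate("site_2", p_real=0.81, p_noise=0.70),  # looks real but flagged as noise: dropped
    Candidate("site_3", p_real=0.30, p_noise=0.20),  # weak support: dropped
]

print(call_somatic(candidates))  # ['site_1']
```

Requiring agreement from two independently trained models is one way to trade a little sensitivity for fewer false positives, which matters when no normal sample is available to catch mistakes.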
This approach can make tumor DNA analysis more cost-effective and practical when sample material is limited. In clinical settings, that could mean more patients can receive detailed genomic profiling of their cancers, which in turn can guide targeted therapies and clinical trial enrollment.
First deep-learning variant caller for long-read RNA
While ClairS-TO tackles DNA in tumors, Clair3-RNA focuses on RNA, the molecule that carries genetic instructions from DNA to the cell’s protein-making machinery.
RNA sequencing reveals which genes are active, in what forms, and at what levels. Long-read RNA sequencing goes a step further by capturing full-length transcripts, making it easier to see how exons are pieced together and to detect rare or complex isoforms.
However, RNA comes with its own complications. Cells naturally edit some RNA molecules, and sequencing technologies can introduce errors. Both can masquerade as mutations, making it difficult to pinpoint true genetic variants.
Clair3-RNA is described as the world’s first deep-learning-based small variant caller built specifically for long-read RNA sequencing. It uses neural network models trained to distinguish real mutations from sequencing errors and RNA editing events. That allows researchers and clinicians to analyze gene expression and genetic variants in the same data, with higher confidence.
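A-to-I (adenosine-to-inosine) editing is one well-known example of how editing can masquerade as a mutation: sequencers read inosine as G, so an edited site can look like an A-to-G change. The naive rule below flags mismatches consistent with that pattern. It is a hypothetical illustration of the problem, with an invented function name, and not Clair3-RNA’s method, which learns to separate such cases from data rather than applying a fixed rule.

```python
def looks_like_a_to_i_editing(ref: str, alt: str, strand: str) -> bool:
    """Flag a mismatch that is consistent with A-to-I RNA editing.

    Inosine is read as G. For a transcript on the '+' strand, editing appears
    as A>G against the reference genome; for a '-' strand transcript, the same
    event appears as T>C on the reference strand.
    """
    if strand == "+":
        return ref == "A" and alt == "G"
    if strand == "-":
        return ref == "T" and alt == "C"
    raise ValueError(f"unknown strand: {strand!r}")


# (reference base, observed base, transcript strand) for three made-up sites
sites = [("A", "G", "+"), ("C", "T", "+"), ("T", "C", "-")]
print([looks_like_a_to_i_editing(r, a, s) for r, a, s in sites])  # [True, False, True]
```

A fixed rule like this would also discard genuine A-to-G mutations, which illustrates why telling editing apart from true variants is hard to do with simple filters.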
In practical terms, Clair3-RNA could help scientists study how mutations affect RNA processing, identify disease-associated variants directly from RNA, and better understand how gene activity changes in conditions such as cancer, neurological disorders and immune diseases.
Building on a widely used AI toolkit
ClairS-TO and Clair3-RNA extend the existing Clair series, which already includes Clair3, an industry-standard tool for long-read variant calling. The Clair algorithms are known for their speed, accuracy and robustness, and they are released as open-source software.
According to HKU, the Clair tools have been downloaded more than 400,000 times and are widely adopted by leading research institutes and sequencing companies around the world. That broad uptake means new capabilities can spread quickly into both research and clinical pipelines.
For students and early-career scientists, the Clair series also offers a hands-on example of how computer science and engineering can directly impact medicine. Deep-learning models, once associated mainly with image recognition or language processing, are now central to how researchers read and interpret the genome.
What comes next
The HKU team’s latest work highlights how AI can make cutting-edge genomics more accessible. By reducing the need for matched normal samples and by taming the complexity of RNA data, ClairS-TO and Clair3-RNA could lower barriers for hospitals and labs that want to adopt long-read sequencing.
Future directions are likely to include further training on diverse patient populations, integration with clinical reporting systems, and expansion to other types of genomic variation. As long-read sequencing technologies continue to improve and drop in cost, tools like these will be critical for turning raw data into actionable insights.
For patients, the long-term promise is more accurate cancer diagnoses and more personalized treatment plans. For researchers, it is a clearer view of how DNA and RNA changes shape health and disease — and a faster path from genomic discovery to real-world impact.
Source: The University of Hong Kong

