Disinfecting drinking water saves lives, but it can also create hundreds of little-known chemical byproducts. A new AI model from Stevens Institute of Technology is helping scientists quickly flag which ones may be most toxic — and guide future regulations.
Disinfecting drinking water is one of the biggest public health success stories in history, wiping out deadly outbreaks of cholera, typhoid and other waterborne diseases. But the same chemicals that make water safe to drink can also create new compounds that scientists are still racing to understand.
A team led by Stevens Institute of Technology has now built an artificial intelligence model that can rapidly predict the potential toxicity of more than a thousand of these disinfection byproducts. Their work, published in Environmental Science & Technology Letters, could help regulators and utilities focus on the compounds most likely to pose health risks — and keep tap water safe for generations to come.
When utilities add chlorine, chloramine and other disinfectants to water drawn from rivers, lakes or aquifers, those chemicals react with natural organic matter dissolved in the water. The reactions can produce hundreds of disinfection byproducts, or DBPs. A few of these, such as trihalomethanes and haloacetic acids, have been linked in past research to increased risks of bladder cancer and impaired fetal development.
Federal regulators already keep an eye on some of these compounds. But Tao Ye, an assistant professor at Stevens who uses AI to study complex environmental chemistry, notes that the current rules cover only a small slice of the problem.
“There are 11 such byproducts regulated by the EPA,” Ye said in a news release. “However, so far research has identified several hundred more, which we don’t know much about — and they may be more toxic than the ones that are regulated.”
Understanding which of those unregulated byproducts are truly dangerous is not straightforward. Traditional toxicology studies expose cells or organisms to one chemical at a time and measure the effects. That approach is rigorous but slow.
“Traditional toxicity testing in the lab is often time-consuming, labor-intensive, and expensive, which limits how many disinfection byproducts can be evaluated,” Ye added.
To break that bottleneck, Ye teamed up with his doctoral student Rabbi Sikder and collaborator Peng Gao at the Harvard T.H. Chan School of Public Health. Their goal was to train a machine learning model that could learn from existing lab data and then predict toxicity for many more chemicals that have never been tested directly.
The team started by combing through the scientific literature for studies that had already measured how specific DBPs affect biological systems.
“We used the laboratory testing data reported in previous literature,” Sikder said.
From those papers, the team assembled a detailed dataset.
“We collected those chemical names, their chemical structures, along with experimental exposure conditions and their corresponding toxicity values. We found toxicity values for 227 known chemicals and used them to build a machine learning predictive model to predict the toxicity for the unknown ones,” added Sikder.
The resulting AI system is what scientists call a semi-supervised learning model, meaning it learns from a mix of labeled and unlabeled examples. The 227 chemicals with measured toxicity serve as anchors, and the model learns patterns that let it estimate toxicity for related compounds that have no direct test data.
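The article does not describe the model's internals. To show roughly what "semi-supervised" can mean, here is a minimal sketch of self-training, one common semi-supervised technique. It is not the Stevens team's method. A simple k-nearest-neighbors regressor is fitted to labeled compounds. It then pseudo-labels the untested compounds it is most confident about and adds them to its own training set, round by round. The descriptors (hypothetical halogen-atom counts), the toxicity scores, and the function names `knn_predict` and `self_train` are all made-up illustrations, not data or code from the study.

```python
import math
from statistics import mean, pstdev

def knn_predict(train, x, k=3):
    """Predict a score for x as the mean target of its k nearest training points.

    Returns (prediction, spread). Spread is the population standard deviation
    of the neighbors' targets; lower spread means the neighbors agree more.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    targets = [y for _, y in nearest]
    return mean(targets), pstdev(targets)

def self_train(labeled, unlabeled, k=3, per_round=2):
    """Self-training: repeatedly pseudo-label the unlabeled points the current
    model is most confident about, then treat them as training data."""
    train = list(labeled)
    pending = list(unlabeled)
    predictions = {}
    while pending:
        scored = [(*knn_predict(train, x, k), x) for x in pending]
        scored.sort(key=lambda t: t[1])  # lowest neighbor spread first
        for pred, _spread, x in scored[:per_round]:
            predictions[x] = pred
            train.append((x, pred))  # the pseudo-label joins the training set
            pending.remove(x)
    return predictions

# Hypothetical toy data: descriptors are (chlorine atoms, bromine atoms);
# scores are invented relative toxicity values, not measurements.
labeled = [((3, 0), 1.0), ((2, 1), 1.6), ((1, 2), 2.3), ((0, 3), 3.0), ((0, 0), 0.1)]
unlabeled = [(0, 2), (1, 1), (0, 1)]

predictions = self_train(labeled, unlabeled)
for compound, score in predictions.items():
    print(compound, round(score, 2))
```

Every pseudo-label here is an average of existing labels. So in this toy version, predictions always stay inside the range of the measured values. A real toxicity model would use richer chemical descriptors and a more capable learner.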
“AI and machine learning are fundamentally transforming this process by enabling rapid, scalable toxicity screening, allowing us to assess hundreds of compounds that would otherwise be impractical to test experimentally,” added Ye.
Once trained, the model was applied to 1,163 disinfection byproducts that can form during common water treatment processes. The analysis suggested that some of these unregulated compounds may have potential toxicity two to 10 times higher than certain DBPs that are already regulated by the Environmental Protection Agency.
That does not mean that every glass of tap water is laced with dangerous chemicals. In real-world systems, the mix and concentration of DBPs vary widely depending on the source water, the amount and type of disinfectant used, and how long the water spends in pipes and storage tanks.
Asked whether the findings mean tap water is unsafe, Ye was clear: “Not at all.”
The list of more than a thousand byproducts represents a theoretical universe of compounds that can form under different conditions, not what any single household is exposed to. A utility drawing from a mountain reservoir, for example, will see a different set of reactions than one treating a shallow, algae-rich lake.
Ye emphasized that the new AI tool is about staying ahead of potential risks, not sounding an alarm.
“What we are doing here is our due diligence to see what else may need to be regulated, depending on what’s in the water and what you use to clean it,” he added. “All in all, our tap water is safe to drink, and our research intends to make it even safer.”
Now that the model exists, the team hopes other scientists and agencies will use it to prioritize which DBPs deserve deeper study in the lab and in real-world water systems. By flagging high-priority compounds, AI can help focus limited testing resources where they matter most.
The work also points to a broader shift in environmental science. As chemical databases grow and computing power becomes cheaper, researchers are increasingly turning to AI to screen large numbers of pollutants, from pesticides to industrial solvents, before they become widespread problems.
For students and early-career scientists, this kind of project illustrates how data science and public health can intersect in practical ways — using algorithms not as black boxes, but as tools to protect communities.
Ye and his colleagues are also thinking about what individuals can do at home, especially those who remain uneasy about DBPs in their tap water.
“As researchers, we are always trying to do two things — advance the science and inform the public. The first thing in this case is understanding the mechanisms behind the formation of toxic compounds. And the second one is how to reduce these chemicals in our tap water, which you can do in two different ways. You can filter the water with various widely available household filters. Or you can boil it because when you boil it, these chemicals evaporate,” Ye added. “Both methods are easy to do at home.”
As utilities continue to rely on disinfection to prevent deadly disease, tools like this AI model could help ensure that the cure does not create new, hidden risks. By combining chemistry, computing and public health, the Stevens team is working to keep a basic human need — safe drinking water — as reliable as possible in a changing world.
Source: Stevens Institute of Technology

