A breakthrough study from the University of Waterloo’s Cybersecurity and Privacy Institute unveils vulnerabilities in AI watermarking, suggesting that deepfakes can bypass current detection techniques.
Researchers at the University of Waterloo’s Cybersecurity and Privacy Institute have made a startling discovery that exposes vulnerabilities in the methods used to detect AI-generated content. Their study reveals that AI image watermarks, promoted as a solution to combat deepfakes, can be effectively removed without knowing the watermark’s design, or even whether an image is watermarked at all.
As AI-generated images and videos grow increasingly lifelike, concerns are mounting over the potential misuse of this technology in arenas such as politics, the legal system and daily life.
“People want a way to verify what’s real and what’s not because the damages will be huge if we can’t,” lead author Andre Kassis, a doctoral candidate in computer science at the University of Waterloo, said in a news release. “From political smear campaigns to non-consensual pornography, this technology could have terrible and wide-reaching consequences.”
Major AI companies like OpenAI, Meta and Google tout invisible, encoded watermarks as a reliable means to identify AI-generated content. These watermarks are designed to be imperceptible to human users yet robust enough to withstand image manipulations, such as cropping or resolution changes. The companies argue that these coded signatures can enable the development of effective public tools for distinguishing genuine content from AI-generated material.
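To make those two properties concrete, here is a minimal, self-contained toy sketch in Python. It is not any company’s actual scheme; the sinusoidal pattern, the frequencies FX and FY, the 1% amplitude and the detection threshold are all illustrative assumptions. Because the pattern is defined in cycles per pixel, its spectral signature survives cropping, which is the flavor of robustness the companies describe.

```python
import numpy as np

# Illustrative frequencies for the toy watermark, in cycles per pixel.
FX, FY = 0.23, 0.17

def embed(img, amp=0.01):
    """Tile a faint sinusoid over a grayscale image in [0, 1]; ~1% of the pixel range."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    return np.clip(img + amp * np.sin(2 * np.pi * (FX * xx + FY * yy)), 0.0, 1.0)

def detect(img, ratio=10.0):
    """True if the spectrum has an outsized peak at the watermark frequency."""
    h, w = img.shape
    spec = np.abs(np.fft.fft2(img - img.mean()))
    peak = spec[int(round(FY * h)) % h, int(round(FX * w)) % w]
    return peak > ratio * np.median(spec)

# Imperceptible: pixels move by at most 1% of their range.
# Robust to cropping: frequencies in cycles per pixel survive a crop,
# so the spectral peak reappears at the same normalized position.
rng = np.random.default_rng(0)
base = np.tile(np.linspace(0.2, 0.8, 256), (256, 1)) + rng.normal(0, 0.02, (256, 256))
marked = embed(base)
print(detect(marked))                    # True
print(detect(marked[40:220, 30:200]))    # True: survives cropping
print(detect(base))                      # False: no watermark present
```

The same property that makes the toy mark robust, a stable statistical footprint in the spectrum, is also what an attacker can hunt for, which is precisely the tension the Waterloo study exploits.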
Contrary to these claims, the Waterloo team developed a tool named UnMarker, which can efficiently strip watermarks from images. UnMarker does so without any prior knowledge of the watermarking algorithm or its internal parameters, and without ever interacting with the watermark detector.
This makes UnMarker the first practical, universal tool capable of removing watermarks in real-world settings.
“While watermarking schemes are typically kept secret by AI companies, they must satisfy two essential properties: they need to be invisible to human users to preserve image quality, and they must be robust, that is, resistant to manipulation of an image like cropping or reducing resolution,” added Urs Hengartner, an associate professor in the David R. Cheriton School of Computer Science at the University of Waterloo.
The researchers’ key insight is that to satisfy both properties, a watermark must subtly manipulate the image’s pixel intensities in its spectral (frequency) domain. Using a statistical attack, UnMarker identifies these unusually prominent frequencies and distorts them, rendering the watermark unrecognizable to detection tools while keeping the changes imperceptible to the human eye.
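As a rough illustration of that idea, the sketch below is a crude stand-in under stated assumptions, not the UnMarker algorithm itself, whose optimization is far more careful about preserving image quality. It blindly hunts for narrow spikes in an image’s frequency spectrum and flattens them to the level of their spectral neighborhood; the 7x7 window and the log-space margin are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import median_filter

def strip_spectral_spikes(img, size=7, margin=3.0):
    """Blindly flatten narrow spikes in the spectrum of a grayscale image in [0, 1].

    Any FFT bin that towers over the median of its spectral neighbourhood
    (by `margin` in log space, i.e. roughly a factor of 20) is pulled back
    down to that median. Periodic patterns a watermark may have planted
    show up as exactly such spikes, while the broadband content that makes
    up the visible image is left essentially untouched.
    """
    spec = np.fft.fft2(img)
    mag, phase = np.abs(spec), np.angle(spec)

    log_mag = np.log1p(mag)
    local = median_filter(log_mag, size=size, mode="wrap")
    spikes = log_mag > local + margin
    spikes[0, 0] = False                      # never touch the DC (brightness) bin

    mag[spikes] = np.expm1(local[spikes])     # flatten spikes to the local baseline
    cleaned = np.real(np.fft.ifft2(mag * np.exp(1j * phase)))
    return np.clip(cleaned, 0.0, 1.0)
```

Run against the toy scheme sketched earlier, this flattens the planted peak and the detector above falls silent, even though the attacker never learns the frequencies FX and FY.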
In trials, UnMarker effectively removed watermarks over 50% of the time on various AI models, including Google’s SynthID and Meta’s Stable Signature.
“If we can figure this out, so can malicious actors,” Kassis added. “Watermarking is being promoted as this perfect solution, but we’ve shown that this technology is breakable. Deepfakes are still a huge threat. We live in an era where you can’t really trust what you see anymore.”
The research, published in the proceedings of the 46th IEEE Symposium on Security and Privacy, underscores the need for more robust ways to detect AI-generated content, as current watermarking strategies prove insufficient against sophisticated attacks.
Source: University of Waterloo