UC Riverside researchers, in collaboration with Google, have created an AI tool named UNITE to identify manipulated videos, providing crucial defense against the growing threat of deepfakes and synthetic media.
Researchers at the University of California Riverside have unveiled an innovative artificial intelligence model designed to expose fake videos.
Amit Roy-Chowdhury, a professor of electrical and computer engineering, and doctoral candidate Rohit Kundu, both from UC Riverside’s Marlan and Rosemary Bourns College of Engineering, collaborated with Google scientists to develop an AI model that detects video tampering — even when manipulations extend beyond face swaps and altered speech.
The system, named the Universal Network for Identifying Tampered and synthetic videos (UNITE), scrutinizes entire video frames, including backgrounds and motion patterns, to detect forgeries. This comprehensive analysis positions UNITE as one of the first tools capable of identifying synthetic or doctored videos by evaluating more than just facial content.
“Deepfakes have evolved,” Kundu said in a news release. “They’re not just about face swaps anymore. People are now creating entirely fake videos — from faces to backgrounds — using powerful generative models. Our system is built to catch all of that.”
The development of UNITE coincides with the proliferation of text-to-video and image-to-video generation tools, which have become widely accessible online. These platforms enable people with moderate skills to create highly convincing fake videos, posing significant risks to individuals, institutions and even democracy itself.
“It’s scary how accessible these tools have become,” Kundu added. “Anyone with moderate skills can bypass safety filters and generate realistic videos of public figures saying things they never said.”
Kundu noted that earlier deepfake detectors primarily focused on facial cues, making them ineffective against videos without faces.
“If there’s no face in the frame, many detectors simply don’t work,” he explained. “But disinformation can come in many forms. Altering a scene’s background can distort the truth just as easily.”
To tackle this challenge, UNITE employs a transformer-based deep learning model to analyze video clips, flagging subtle spatial and temporal inconsistencies that earlier systems often missed. The model builds on SigLIP, a vision-language foundation model from Google that extracts features not tied to any specific person or object. A novel training technique, known as “attention-diversity loss,” encourages the system to distribute its attention across multiple visual regions within each frame, preventing it from fixating on facial features.
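The article does not include the training code, but a minimal sketch helps make “attention-diversity loss” concrete. The snippet below assumes one plausible formulation, penalizing pairwise similarity between the attention maps of different transformer heads so that no single region, such as a face, dominates. The function name, tensor shapes, cosine-similarity form and the 0.1 weight are all illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code) of an attention-diversity
# penalty, under the assumption that it spreads attention across heads.
import torch
import torch.nn.functional as F


def attention_diversity_loss(attn: torch.Tensor) -> torch.Tensor:
    """Penalize overlap between per-head attention maps.

    attn: (batch, heads, regions) -- each head's attention over spatial
    regions of a frame, assumed to sum to 1 along the last dimension.
    Lower values mean the heads attend to more diverse parts of the frame.
    """
    a = F.normalize(attn, dim=-1)                 # unit-length attention maps
    sim = torch.einsum("bhr,bgr->bhg", a, a)      # pairwise head similarity
    h = attn.size(1)
    mask = torch.eye(h, dtype=torch.bool, device=attn.device)
    off_diag = sim.masked_fill(mask, 0.0)         # ignore self-similarity
    # Average similarity over all distinct head pairs, then over the batch.
    return off_diag.sum(dim=(1, 2)).mean() / (h * (h - 1))


# Example: 2 clips, 8 attention heads, 196 spatial regions (a 14x14 grid)
attn = torch.softmax(torch.randn(2, 8, 196), dim=-1)
ce = torch.tensor(0.7)                            # stand-in classification loss
total = ce + 0.1 * attention_diversity_loss(attn) # 0.1 is an assumed weight
```

In this formulation, the penalty is added to the usual classification loss during training, so the detector is rewarded both for labeling videos correctly and for keeping its attention spread across the frame rather than collapsing onto faces.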
These innovations resulted in a universal detector capable of identifying diverse types of forgeries, ranging from simple face swaps to fully synthetic videos created without any genuine source footage.
“It’s one model that handles all these scenarios,” Kundu added. “That’s what makes it universal.”
The researchers presented their findings at the esteemed 2025 Conference on Computer Vision and Pattern Recognition (CVPR) in Nashville, Tenn. Their paper outlines UNITE’s architecture and training methodology.
Kundu’s internship at Google facilitated access to extensive datasets and computing resources necessary to train the model on various forms of synthetic content, including videos generated from text or still images — formats that often confound existing detectors.
Although UNITE is still in development, it holds the potential to play a critical role in combating video disinformation. Potential users include social media platforms, fact-checkers and newsrooms dedicated to preventing manipulated videos from going viral.
“People deserve to know whether what they’re seeing is real,” Kundu said. “And as AI gets better at faking reality, we have to get better at revealing the truth.”