A new study reveals significant privacy violations by popular AI browser assistants, which collect sensitive user data such as medical records and Social Security numbers without adequate safeguards.
A recent study led by researchers from University College London (UCL) and the Mediterranea University of Reggio Calabria has unveiled alarming privacy issues with popular generative AI web browser assistants. These tools, designed to enhance web browsing with AI-powered features like summarization and search assistance, are collecting extensive personal data from users’ web activity without adequate safeguards.
The findings were presented at the USENIX Security Symposium on Aug. 13 and published in the conference proceedings.
Unprecedented Access to Sensitive Data
The study is the first large-scale analysis of generative AI browser assistants and their impact on user privacy.
The researchers analyzed nine popular browser extensions, including ChatGPT for Google, Merlin and Copilot, finding that these tools collect detailed personal information, often without user consent.
The sensitive data captured included medical records, Social Security numbers and even online banking details.
“Though many people are aware that search engines and social media platforms collect information about them for targeted advertising, these AI browser assistants operate with unprecedented access to users’ online behavior in areas of their online life that should remain private,” senior author Anna Maria Mandalari, from UCL Electronic & Electrical Engineering, said in a news release. “While they offer convenience, our findings show they often do so at the cost of user privacy, without transparency or consent and sometimes in breach of privacy legislation or the company’s own terms of service.”
Extensive Tracking and Profiling
The study revealed extensive tracking, profiling and personalization practices that pose significant privacy concerns.
These tracking methods included transmitting the full content of webpages to company servers, capturing form inputs and any other information visible on the screen.
Some browser assistants, specifically Merlin and Sider, captured sensitive data even when users were accessing private or logged-in spaces like online health portals.
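To illustrate what this looks like in practice, consider the kind of check an auditor might run on a captured request body. The sketch below is purely illustrative and is not the study's actual tooling; the patterns are simplistic stand-ins for the far more thorough detection a real audit would use.

```python
import re

# Illustrative patterns only; a real audit would use much more thorough checks.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),           # U.S. Social Security number
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def find_sensitive_data(body: str) -> dict:
    """Return any sensitive-looking strings found in a captured request body."""
    return {
        label: matches
        for label, pattern in SENSITIVE_PATTERNS.items()
        if (matches := pattern.findall(body))
    }

# A request body containing full page text, as some assistants were found to send:
captured = "Patient portal - SSN: 123-45-6789 - contact: jane.doe@example.com"
print(find_sensitive_data(captured))
# {'ssn': ['123-45-6789'], 'email': ['jane.doe@example.com']}
```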
“This data collection and sharing is not trivial. Besides the selling or sharing of data with third parties, in a world where massive data hacks are frequent, there’s no way of knowing what’s happening with your browsing data once it has been gathered,” Mandalari added.
Testing and Results
For their analysis, the researchers simulated real-world browsing scenarios, creating a persona of a “rich, millennial male from California.”
The persona engaged in common online tasks, both public and private, such as reading news articles, shopping and accessing university health portals.
By intercepting and decrypting the traffic exchanged between the browser assistants, their servers and third-party trackers, the researchers were able to analyze the data being transferred in real time.
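The paper's exact harness is not reproduced here, but the same kind of man-in-the-middle analysis can be sketched with an intercepting proxy such as mitmproxy. In the sketch below the upstream hostname is hypothetical, and the browser must be configured to use the proxy and trust its CA certificate so that HTTPS traffic can be decrypted.

```python
# Minimal mitmproxy addon: run with `mitmdump -s log_assistant_traffic.py`
from mitmproxy import http

# Hypothetical upstream host; a real audit would enumerate each assistant's servers.
ASSISTANT_HOSTS = {"api.example-assistant.com"}

class LogAssistantTraffic:
    def request(self, flow: http.HTTPFlow) -> None:
        if flow.request.pretty_host in ASSISTANT_HOSTS:
            body = flow.request.get_text() or ""
            # Log how much page content the extension ships upstream;
            # `body` could also be scanned with checks like the sketch above.
            print(f"[{flow.request.pretty_host}] {len(body)} chars sent")

addons = [LogAssistantTraffic()]
```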
The study found that several assistants, including ChatGPT for Google, Copilot and Monica, inferred user attributes such as age, gender, income and interests, and used this information to personalize responses across different browsing sessions.
Only one assistant, Perplexity, was found not to engage in profiling or personalization.
Call for Regulatory Oversight
The researchers highlighted the urgent need for regulatory oversight to protect personal data from unauthorized collection and sharing.
Findings indicated that some assistants violated U.S. data protection laws, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Family Educational Rights and Privacy Act (FERPA), by gathering protected health and educational information.
Because the research was conducted in the United States, it did not assess compliance with European data protection laws such as the General Data Protection Regulation (GDPR). However, the authors suggested that violations are likely in those jurisdictions as well, given their stricter privacy rules.
“As generative AI becomes more embedded in our digital lives, we must ensure that privacy is not sacrificed for convenience. Our work lays the foundation for future regulation and transparency in this rapidly evolving space,” added co-author Aurelio Canino from UCL and Mediterranea University of Reggio Calabria.
Future Recommendations
The authors recommend that developers adopt privacy-by-design principles, including local processing and explicit user consent for data collection, to mitigate these privacy concerns.
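As a toy illustration of what “local processing and explicit user consent” can mean in code, consider the sketch below. All names are hypothetical and not drawn from the study or any real assistant; the point is only the structure, in which the private-by-default local path is taken unless the user has explicitly opted in.

```python
def summarize_locally(page_text: str) -> str:
    """Stand-in for an on-device model; nothing leaves the machine."""
    return page_text[:200]

def summarize_in_cloud(page_text: str) -> str:
    """Stand-in for a remote API call; reachable only after opt-in."""
    raise RuntimeError("network path disabled in this sketch")

def summarize(page_text: str, consented_to_upload: bool = False) -> str:
    # Privacy by design: the default is local; upload requires explicit opt-in.
    if consented_to_upload:
        return summarize_in_cloud(page_text)
    return summarize_locally(page_text)

print(summarize("Sensitive health-portal page content ..."))
```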
Moving forward, regulatory bodies and developers must address these issues to ensure that the use of AI in web browsing does not come at the cost of user privacy.
Source: University College London