Innovative AI Agent Autonomously Solves Cybersecurity Challenges

NYU Tandon researchers have introduced EnIGMA, an AI agent that autonomously solves cybersecurity challenges, highlighting its potential to transform real-world applications and enhance security measures.

Researchers from the NYU Tandon School of Engineering, along with NYU Abu Dhabi and other collaborating institutions, have unveiled an advanced AI agent designed to autonomously address complex cybersecurity challenges. Dubbed EnIGMA, this breakthrough was presented at the International Conference on Machine Learning (ICML) 2025, showcasing an impressive leap forward in the field.

“EnIGMA is about using Large Language Model agents for cybersecurity applications,” co-author Meet Udeshi, a NYU Tandon doctoral student, said in a news release.

EnIGMA is the culmination of collaborative efforts led by Udeshi under the guidance of Ramesh Karri, chair of NYU Tandon’s Electrical and Computer Engineering Department (ECE) and a faculty member of the NYU Center for Cybersecurity and NYU Center for Advanced Technology in Telecommunications (CATT), and Farshad Khorrami, ECE professor and CATT faculty member.

The research has also seen contributions from diverse academic voices across NYU Abu Dhabi, Princeton University and Stanford University.

Traditional AI systems have already proven their worth in domains such as software development and web navigation, but cybersecurity posed unique challenges that rendered many existing AI frameworks insufficient.

However, EnIGMA leverages a specialized framework known as “Interactive Agent Tools” to translate visual cybersecurity programs into text-based formats comprehensible by AI models.

“We have to restructure those interfaces to feed it into an LLM properly. So we’ve done that for a couple of cybersecurity tools,” Udeshi added.

This innovative approach enables the AI to process text, overcoming the limitations of traditional graphical interfaces commonly used in cybersecurity tools like debuggers and network analyzers.

A pivotal component of EnIGMA’s development was the creation of a customized Capture The Flag (CTF) benchmark dataset. These CTF challenges, which simulate real-world vulnerabilities and are used in academic competitions, played a crucial role in training the AI.

“CTFs are like a gamified version of cybersecurity used in academic competitions. They’re not true cybersecurity problems you would face in the real world, but they are very good simulations,” added Udeshi.

The research demonstrated EnIGMA’s superiority across 390 CTF challenges spread over four benchmarks, achieving state-of-the-art results. The AI agent managed to solve more than three times as many challenges compared to previous systems.

Reflecting on the evolution of AI models, Udeshi noted, “Claude 3.5 Sonnet from Anthropic was the best model, and GPT-4o was second at that time.”

In the course of their research, the team discovered a phenomenon termed “soliloquizing,” where the AI generates hallucinated observations without actually interacting with its environment. This finding may have significant implications for AI safety and reliability moving forward.

Beyond academic accolades, the implications of EnIGMA extend into practical applications.

According to Udeshi, “If you think of an autonomous LLM agent that can solve these CTFs, that agent has substantial cybersecurity skills that you can use for other cybersecurity tasks as well.”

The prospect of such an agent applying its capabilities to real-world vulnerability assessments, autonomously attempting hundreds of different approaches, signals a transformative shift in cybersecurity operations.

However, the dual-use nature of this technology necessitates caution. While it promises to enhance cybersecurity defenses, there is also the potential for misuse.

In light of this, the researchers have informed major AI companies, including Meta, Anthropic and OpenAI, about their findings.

Source: NYU Tandon School of Engineering