New Machine Learning Method Enhances Fraud Detection

Researchers at Florida Atlantic University have unveiled a cutting-edge machine learning technique that significantly improves fraud detection. The new method reduces false positives and minimizes the need for further inspection, offering a promising solution for sectors where quickly processing large amounts of data is critical.

Researchers from Florida Atlantic University’s College of Engineering and Computer Science have developed a machine learning method aimed at fraud detection in sectors such as health care and finance. The approach reduces false positives and the number of cases requiring further inspection, both of which matter for preventing financial losses and improving operational efficiency.

Fraud is a pervasive issue in the United States, driven increasingly by technology. Today, 93% of credit card fraud involves remote account access rather than physical theft. In 2023, consumer-reported fraud losses surpassed $10 billion for the first time. Medicare fraud alone costs $60 billion annually, while identity theft resulted in $16.4 billion in losses in 2021. Overall, government losses due to improper payments have exceeded $2.7 trillion since 2003.

Given these staggering figures, deploying machine learning for fraud detection has become crucial. Traditional methods often falter due to messy or unlabeled data, and the rarity of fraud cases compared to normal ones presents additional challenges.

To tackle these issues, the FAU research team has engineered a method for generating binary class labels in highly imbalanced datasets without relying on labeled data. This marks a significant advantage in industries where privacy concerns and the high cost of labeling are substantial obstacles.

The new method was tested on two real-world, large-scale datasets with severe class imbalances: European credit card transactions and Medicare Part D claims. Both datasets, which range from hundreds of thousands to millions of records, provided a realistic test bed for evaluating fraud detection techniques.

The research, published in the Journal of Big Data, demonstrated that the team’s labeling method effectively addresses the difficulty of labeling severely imbalanced data in an unsupervised framework. Unlike traditional methods, the approach evaluates newly generated fraud and non-fraud labels directly without relying on a supervised classifier.

“The use of machine learning in fraud detection brings many advantages,” senior author Taghi Khoshgoftaar, the Motorola Professor in the Department of Electrical Engineering and Computer Science at FAU, said in a news release. “Machine learning algorithms can label data much faster than human annotation, significantly improving efficiency. Our method represents a major advancement in fraud detection, especially in highly imbalanced datasets. It reduces the workload by minimizing cases that require further inspection, which is crucial in sectors like Medicare and credit card fraud, where fast data processing is vital to prevent financial losses and enhance operational efficiency.”

The study found that the new method outperformed the widely used Isolation Forest algorithm, offering a more efficient way to identify fraud while minimizing the need for further investigation. The result supports the method’s reliability in generating binary class labels for fraud detection, even in challenging datasets.

The solution is scalable, does not depend on expensive and time-consuming labeled data, and requires minimal human intervention.

“Our method generates labels for both fraud or positive and non-fraud or negative instances, which are then refined to minimize the number of fraud labels,” added first author Mary Anne Walauskis, a doctoral candidate in the Department of Electrical Engineering and Computer Science at FAU. “By applying our method, we minimize false positives, or in other words, genuine instances marked as fraud, which is key to improving fraud detection. This approach ensures that only the most confidently identified fraud cases are retained, enhancing accuracy and reducing unnecessary alarms, making fraud detection more efficient.”
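The trade-off Walauskis describes can be illustrated with a small worked example. The sketch below is not the team's code: the scores, the ground-truth labels and the helper `confusion_counts` are all hypothetical, chosen only to show how keeping fewer, higher-confidence fraud labels lowers the false-positive count.

```python
def confusion_counts(scores, truth, top_k):
    """Flag the top_k highest-scoring records; return (true positives, false positives)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = ranked[:top_k]
    true_pos = sum(1 for i in flagged if truth[i] == 1)
    false_pos = len(flagged) - true_pos
    return true_pos, false_pos


# Hypothetical anomaly scores (higher = more suspicious) and hidden ground truth
# (1 = fraud, 0 = genuine). Real data would be far larger and far more imbalanced.
scores = [0.97, 0.95, 0.91, 0.88, 0.74, 0.70, 0.52, 0.40, 0.31, 0.12]
truth = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for top_k in (6, 4, 3):
    tp, fp = confusion_counts(scores, truth, top_k)
    print(f"flag top {top_k}: {tp} true fraud, {fp} false positives")
```

In this toy data, shrinking the flagged set from six records to three removes both false positives, at the cost of missing one genuine fraud case with a lower score. Choosing where to stop is the part the researchers say they are working to automate.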

The technique combines an ensemble of three unsupervised learning methods, implemented with the scikit-learn library, with a percentile-gradient approach. By refining the labels, the method ensures that only the most confidently identified fraud cases are flagged, reducing false positives and unnecessary investigations.
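For readers who want a feel for what such a pipeline might look like, here is a minimal sketch using scikit-learn. It rests on several assumptions. The article does not name the three unsupervised methods, so the detectors below (Isolation Forest, Local Outlier Factor and a robust-covariance Elliptic Envelope) are stand-ins. The fixed percentile cutoff is a simplification, not the paper's percentile-gradient procedure. The function name `ensemble_labels`, the helper `rank_normalize` and the synthetic data are invented for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def rank_normalize(scores):
    """Convert raw scores to ranks in (0, 1] so different detectors are comparable."""
    ranks = np.argsort(np.argsort(scores))
    return (ranks + 1) / len(scores)


def ensemble_labels(X, percentile=99.5, random_state=0):
    """Label the highest-scoring records as fraud (1) and the rest as non-fraud (0)."""
    X = StandardScaler().fit_transform(X)

    # Each detector reports "higher = more normal", so negate to get anomaly scores.
    iso = IsolationForest(random_state=random_state).fit(X)
    s_iso = -iso.score_samples(X)

    lof = LocalOutlierFactor(n_neighbors=20).fit(X)
    s_lof = -lof.negative_outlier_factor_

    env = EllipticEnvelope(random_state=random_state).fit(X)
    s_env = -env.score_samples(X)

    # Average the rank-normalized scores, then keep only the top slice as fraud.
    combined = np.mean([rank_normalize(s) for s in (s_iso, s_lof, s_env)], axis=0)
    threshold = np.percentile(combined, percentile)
    labels = (combined > threshold).astype(int)
    return labels, combined


# Synthetic, heavily imbalanced data: 2,000 ordinary records and 10 far-off ones.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(2000, 5))
unusual = rng.normal(0.0, 1.0, size=(10, 5)) + 6.0 * rng.choice([-1.0, 1.0], size=(10, 5))
X = np.vstack([normal, unusual])

labels, combined = ensemble_labels(X, percentile=99.5)
print("records flagged as fraud:", int(labels.sum()))
print("injected records flagged:", int(labels[-10:].sum()))
```

Raising `percentile` flags fewer records, which is the lever that trades missed cases against false alarms. No labeled examples are used at any point. The labels come entirely from how unusual each record looks to the three detectors.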

Stella Batalama, dean of the College of Engineering and Computer Science, commented on the broader implications.

“This innovative approach holds great promise for industries plagued by fraud, offering a more accessible and effective way to identify fraudulent activity and safeguard both financial and health care systems. Addressing fraud is key to mitigating its broad societal impact,” she said.

The research team continues to refine its method, including automating the determination of the optimal number of positive instances, with a view to large-scale application in fraud detection.

Source: Florida Atlantic University