Document Type: Research Paper
Abstract
The growing volume of unwanted email (spam) has made spam filtering essential to secure digital communication. Spam invades user privacy, undermines system integrity, and reduces productivity. Spam accounts for over 50% of global email traffic and has recently evolved into more sophisticated forms, including phishing attacks, image-embedded scams, and other methods. Traditional spam filtering methods no longer suffice. State-of-the-art deep learning methods greatly improve spam filtering by automating feature extraction from text, images, and mixed media. These techniques use Convolutional Neural Networks (CNNs) for feature localization; Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) variants, for sequential processing of text and other media; and transformer models such as BERT and RoBERTa for contextual analysis of text. Hybrid methods report spam detection accuracies of up to 99.33%, though they require extensive resources.
Nonetheless, several challenges remain unresolved: deficiencies and gaps in datasets, limited interpretability, privacy concerns, and the resource consumption of large-scale training. Potential solutions such as self-supervised learning, lighter models, and adaptive techniques like reinforcement learning and continual learning show promise. Explainable AI (XAI) frameworks are growing in popularity because they increase transparency and trust. Deep learning is clearly a paradigm-shifting technology for spam filtering; the primary remaining issue is achieving operational efficiency.