CTICTR | BABCOCK UNIVERSITY COLLEGE OF POSTGRADUATE STUDIES JOURNALS

A Random Forest Classifier-Based Email Spam Detection Model

Adedoyin S. Adebanjo, Oreoluwa A. Adesegha, Elizabeth Ogungbefun, Faysal O. Aliyu, Emmanuel Mgbeahuruike, Babajide E. Adeoti, and Emmanuel Oyerinde

Abstract

Email spam is a persistent threat to productivity and security, and traditional rule-based filters often struggle to keep up with evolving spam techniques. This study presents a spam detection model based on a Random Forest Classifier trained on a publicly available dataset. The email text was preprocessed with Natural Language Processing (NLP) methods, namely tokenization and stop-word removal, and features were extracted using TF-IDF weighting. The model was evaluated using accuracy, precision, recall, and F1-score. It achieved 99% accuracy, with precision of 97% for legitimate emails and 100% for spam, recall of 99% for both classes, and F1-scores of 98% for legitimate emails and 99% for spam. These results demonstrate the effectiveness of Random Forest for spam detection and its promise as a basis for reliable, adaptable email filtering systems that improve security and user experience.

Keywords: Email Spam Detection, Ensemble Learning, Random Forest Classifier, Supervised Machine Learning.
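
The abstract reports precision, recall, and F1-score separately for legitimate emails and for spam. As an illustration of how such per-class figures are typically computed, the following is a minimal sketch using the standard definitions. The label names "ham" and "spam", the helper functions, and the toy predictions are assumptions made for this example; they are not the study's data or code.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[str], y_pred: List[str], label: str) -> Dict[str, float]:
    """Precision, recall, and F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of emails whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (illustrative only, not results from the study).
y_true = ["ham", "ham", "ham", "spam", "spam", "spam", "spam", "spam"]
y_pred = ["ham", "ham", "spam", "spam", "spam", "spam", "spam", "ham"]

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    for label in ("ham", "spam"):
        print(label, per_class_metrics(y_true, y_pred, label))
```

Because each class is treated in turn as the positive class, precision and recall can differ between legitimate emails and spam even when overall accuracy is high, which is why the per-class values are reported alongside accuracy.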