Abstract

The current research examines one of the most effective approaches to spam email detection based on machine learning and natural language processing (NLP). The study is set against rising cyber threats and growing email volume, and aims to classify messages correctly as spam or ham by combining Logistic Regression with NLP preprocessing (tokenization, lemmatization, and TF-IDF vectorization). The research questions address the effectiveness of this approach and the interpretation of its results. The data come from a publicly available Kaggle dataset containing 5,572 labeled email messages. A quantitative approach was applied, comprising model training, evaluation, and visualization. The model achieved high accuracy, precision, and ROC-AUC, indicating that it is effective. The conclusions show that, given properly preprocessed input, Logistic Regression can provide a low-cost yet strongly performing method for spam detection. Suggestions for further study include exploring ensemble methods, addressing class imbalance, and investigating deep learning models for real-time implementation.
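
The pipeline named in the abstract (tokenization, TF-IDF vectorization, Logistic Regression, then evaluation by precision and ROC-AUC) can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses only the Python standard library, the handful of messages is invented for illustration and is not the Kaggle dataset, lemmatization is omitted, and the model is trained with plain stochastic gradient descent without regularization. All function names (tokenize, fit_idf, tfidf, train_logreg, predict_proba, precision, roc_auc) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (lemmatization is omitted here)."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency per term, from the training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(vectors, labels, epochs=300, lr=0.5):
    """Stochastic gradient descent on the logistic loss (assumed settings)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            p = sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))
            err = p - y  # gradient of the loss with respect to the logit
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


def predict_proba(text, idf, weights, bias):
    """Estimated probability that a message is spam."""
    vec = tfidf(text, idf)
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))


def precision(scores, labels, threshold=0.5):
    """Share of messages flagged as spam that really are spam."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def roc_auc(scores, labels):
    """Probability that a random spam message outscores a random ham message."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy messages (1 = spam, 0 = ham); not drawn from the thesis data.
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free reward today",
    "urgent winner claim cash prize",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send the report today",
    "thanks for the lunch yesterday",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

test_texts = [
    "free prize winner",
    "claim free cash now",
    "lunch meeting at the office",
    "send the report tomorrow",
]
test_labels = [1, 1, 0, 0]

idf = fit_idf(train_texts)
weights, bias = train_logreg([tfidf(t, idf) for t in train_texts], train_labels)
test_scores = [predict_proba(t, idf, weights, bias) for t in test_texts]

print("precision:", precision(test_scores, test_labels))
print("ROC-AUC:", roc_auc(test_scores, test_labels))
```

The IDF table is fitted on the training messages only and then reused for the held-out messages, so no information from the test set leaks into the features. On a real corpus a library implementation with regularization and a proper train/test split would be the more practical choice.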

Library of Congress Subject Headings

Spam filtering (Electronic mail)--Data processing; Machine learning; Natural language processing (Computer science)

Publication Date

12-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Ioannis Karamitsos

Advisor/Committee Member

Sanjay Modak

Recommended Citation

Almarri, Rashed, "Comparative Analysis of Machine Learning Models for Spam Email Detection" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12395

Campus

RIT Dubai

Plan Codes

PROFST-MS

