Abstract

The growth of digital interactions across social media platforms increases the risk of phishing schemes and spam messages that compromise personal and organisational security. This thesis tackles such threats using machine learning and natural language processing techniques to detect malicious communications with high accuracy. Three datasets were used in this study, two for phishing and one for spam. The research explores the performance of classification models, including Logistic Regression, Support Vector Machine, Naïve Bayes, and Linear Discriminant Analysis, as well as a hybrid model combining Logistic Regression and Naïve Bayes. Additionally, feature selection tools, such as SHapley Additive exPlanations (SHAP), were used to identify the dominant features in the datasets, so that the machine learning algorithms focus only on impactful features and achieve computational efficiency. Key performance metrics, namely accuracy, precision, recall, and F1-score, reveal that the Support Vector Machine and Logistic Regression models perform well, with the hybrid model achieving a balance of precision and recall for both phishing and spam detection. Insights from feature analysis show that specific web address characteristics are crucial in phishing detection, while email content drives spam identification. This research underscores the potential of machine learning to enhance digital security and lays a foundation for future refinements that adapt to emerging threats and diversify data sources for stronger detection capabilities.

Publication Date

2024

Document Type

Thesis

Student Type

Graduate

Degree Name

Computing Security (MS)

Department, Program, or Center

Electrical Engineering

Advisor

Huda Saadeh

Advisor/Committee Member

Kevser Akpinar

Advisor/Committee Member

Ali Asi

Campus

RIT Dubai
