Abstract

The growth of digital interactions across social media platforms increases the risk of phishing schemes and spam messages that compromise personal and organisational security. This thesis tackles these threats using machine learning and natural language processing techniques to detect malicious communications with high accuracy. Three datasets were used in this study: two for phishing and one for spam. The research explores the performance of classification models, including Logistic Regression, Support Vector Machine, Naïve Bayes, and Linear Discriminant Analysis, as well as a hybrid model combining Logistic Regression and Naïve Bayes. Additionally, feature selection tools such as SHapley Additive exPlanations (SHAP) were used to identify the dominant features in the datasets, so that the machine learning algorithms focus only on impactful features and achieve computational efficiency. Key performance metrics, such as accuracy, precision, recall, and F1-score, reveal that the Support Vector Machine and Logistic Regression models perform well, with the hybrid model achieving a balance of precision and recall for both phishing and spam detection. Insights from feature analysis show that specific web address characteristics are crucial in phishing detection, while email content drives spam identification. This research underscores the potential of machine learning to enhance digital security and lays a foundation for future refinements that adapt to emerging threats and diversify data sources for stronger detection capabilities.

Library of Congress Subject Headings

Phishing--Prevention; Spam filtering (Electronic mail); Social media--Security measures; Machine learning; Natural language processing (Computer science)

Publication Date

2024

Document Type

Thesis

Student Type

Graduate

Degree Name

Computing Security (MS)

Department, Program, or Center

Electrical Engineering

Advisor

Huda Saadeh

Advisor/Committee Member

Kevser Akpinar

Advisor/Committee Member

Ali Asi

Recommended Citation

Al Shehhi, Ahmed, "Machine Learning Based Solutions for Detecting Social Media Spam and Phishing Incidents" (2024). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/11929

Campus

RIT Dubai

Plan Codes

COMPSEC-MS
