Abstract

The growing use of the internet has led to new websites emerging every day (Total number of Websites - Internet Live Stats, 2020). Web surfing has become important for everyone, regardless of occupation, age, or location. However, as internet use increases, so does vulnerability to malware attacks delivered through malicious websites (Softpedia, 2016). Identifying and dealing with such websites has historically been difficult because it is challenging to separate benign websites from malicious ones. By applying machine learning algorithms to large datasets, it is now possible to detect such websites in advance. Classifiers trained with algorithms such as logistic regression and Support Vector Machine (SVM) can detect malicious websites so that users can be warned of the risk before they visit them. This project applies a variety of classification algorithms to the Kaggle Malicious and Benign Website Dataset to determine whether a website is malicious. We show that machine learning models can detect malicious websites with a reasonable degree of certainty: 90% of the 75 malicious websites in the test set were identified. We also determine which features were critical in predicting the likelihood of a website being malicious. Most of these key features are readily available (URL length, number of special characters, country, age of website).
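
As a rough illustration of how two of the readily available features named above (URL length and number of special characters) might be derived from a raw URL, consider the minimal sketch below. The function name `url_features`, the dictionary keys, and the definition of a "special character" as any non-alphanumeric character are assumptions made for this example; the project and the dataset may define these features differently.

```python
def url_features(url: str) -> dict:
    """Return two simple lexical features for a URL string.

    Hypothetical helper for illustration only; not taken from the project.
    """
    return {
        # Total number of characters in the URL.
        "url_length": len(url),
        # Assumption: a "special character" is any character that is
        # neither a letter nor a digit (e.g. ':', '/', '.', '?', '=').
        "special_char_count": sum(1 for ch in url if not ch.isalnum()),
    }


features = url_features("http://a.b/c?d=1")
print(features)
```

Features of this kind could then be combined with the other attributes mentioned (country, age of website) and passed to a classifier such as logistic regression or an SVM.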

Publication Date

4-20-2020

Document Type

Master's Project

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research (Dubai)

Advisor

Sanjay Modak

Advisor/Committee Member

Ehsan Warriach

Campus

RIT Dubai
