Abstract

The notion of attack surface refers to the critical points on the boundary of a software system that are accessible from outside or contain content valuable to attackers. The ability to identify the attack surface components of a software system plays a significant role in the effectiveness of security analysis approaches such as vulnerability analysis. Most prior work focuses on the security analysis approach itself and uses an approximation of the attack surface. There have been few attempts to create a comprehensive list of attack surface components. Although a limited number of studies have focused on attack surface analysis, they defined attack surface components based on project-specific hypotheses in order to evaluate the security risk of specific types of software applications. This thesis provides a comprehensive attack surface model and proposes novel approaches for automating the detection of attack surface components in source code. By leveraging a qualitative analysis approach, we empirically identify an extensive list of attack surface components. To this end, we conduct a Grounded Theory (GT) analysis on 1444 previously published vulnerability reports and weaknesses. We extract vulnerability information from two publicly available repositories: 1) Common Vulnerabilities and Exposures (CVE) and 2) Common Weakness Enumeration (CWE). We ask three key questions: where attacks come from, what they target, and how they emerge. To answer these questions, we define three core categories of attack surface components: Entry points, Targets, and Mechanisms. Using the GT analysis, we extract the attack surface concepts related to each category from the collected vulnerability information and provide a comprehensive categorization that represents the attack surface components of software systems from various perspectives. This research introduces 254 new attack surface components that did not previously exist in the literature.
In this study, we propose two new generic approaches based on Language Models (LM) that can be used to detect different types of attack surface components: 1) a probability-based classification approach using a novel term weighting technique; and 2) a novel Natural Language Inference (NLI) model based on the pre-trained CodeBERT model. We evaluate the approaches on identifying nine different types of attack surface components using a Java dataset collected from GitHub. The experimental results show that the term weighting approach can detect attack surface components with an F-score higher than 80%. The proposed CodeBERT NLI approach can detect attack surface components with an F-score higher than 92%, and for some attack surface components the F-score is 100%. We also evaluate ChatGPT's performance in identifying attack surface components. ChatGPT's responses show that its capability varies across component types: it detects different attack surface components with F-scores between 30% and 90%. Finally, we compare the three approaches and show that the proposed CodeBERT NLI approach performs best, ahead of the term weighting approach and ChatGPT.

Publication Date

8-2023

Document Type

Dissertation

Student Type

Graduate

Degree Name

Computing and Information Sciences (Ph.D.)

Department, Program, or Center

Computing and Information Sciences Ph.D, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Mehdi Mirakhorli

Advisor/Committee Member

Christian Newman

Advisor/Committee Member

Mohamed Wiem Mkaouer

Campus

RIT – Main Campus
