Bug localization is one of the most important stages of the bug-fixing process, and poor practices make debugging a tedious task. Investigating bugs can account for a large portion of the total cost of a software project. An automated strategy that provides a list of source code files ranked by how likely each is to contain the root cause of the problem would help development teams narrow the search space and increase productivity. In this work, I have replicated the bug localization approach presented in \cite{ye2014learning}, which applies the learning-to-rank technique to rank the relevant files for each bug. This technique applies domain knowledge by evaluating the textual similarity between bug reports and both source code files and API specification documents, together with bug-fixing and code-change history. For a given bug report, the ranking function is a linear combination of weighted features, with the weights trained on previously solved bug reports. In addition to replicating this technique, I have extended the study by evaluating how different text preprocessing techniques, such as stemming and lemmatization, and a randomized selection of training folds affect the overall performance of the ranking model. I found that lemmatization and randomized selection of the training folds both have an adverse effect on the performance of the ranking model, resulting in lower accuracy and precision.
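The ranking function described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the two features (a raw term-count cosine similarity and a prior fix count), the hand-set weights, and the names `cosine_similarity`, `score`, and `rank_files` are all assumptions made for illustration. In the replicated approach the weights are learned from previously solved bug reports rather than fixed by hand.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term counts (an assumed, simplified feature)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score(features, weights):
    """Linear combination of weighted features."""
    return sum(w * f for w, f in zip(weights, features))


def rank_files(bug_report, files, weights):
    """Rank files for one bug report, highest score first.

    files maps a file name to (source_text, prior_fix_count); both fields are hypothetical.
    """
    scored = []
    for name, (text, fix_count) in files.items():
        features = [cosine_similarity(bug_report, text), fix_count]
        scored.append((score(features, weights), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


if __name__ == "__main__":
    files = {
        "Parser.java": ("null pointer in parser", 0),
        "Window.java": ("render window widget", 0),
    }
    # Weights are hand-set here; a learning-to-rank model would fit them to training data.
    print(rank_files("parser throws null pointer", files, [1.0, 0.1]))
```

The top-ranked entry is the file whose weighted feature sum is largest, which is how a ranked list reduces the number of files a developer has to inspect.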

Library of Congress Subject Headings

Ranking and selection (Statistics); Debugging in computer science; Pattern perception; Natural language processing (Computer science); Information retrieval

Publication Date


Document Type


Student Type


Degree Name

Software Engineering (MS)

Department, Program, or Center

Software Engineering (GCCIS)


Mohamed Wiem Mkaouer

Advisor/Committee Member

Christian Newman

Advisor/Committee Member

Yasmin El-Glaly


RIT – Main Campus

Plan Codes