Machine-learning algorithms have the potential to support trace retrieval methods making significant reductions in costs and human-involvement required for the creation and maintenance of traceability links between system requirements, system architecture, and the source code. These algorithms can be trained how to detect the relevant architecture and can then be sent to find it on its own. However, the long-term reductions in cost and effort face a significant upfront cost in the initial training of the algorithm. This cost comes in the form of needing to create training sets of code, which train the algorithm how to identify traceability links. These supervised or semi-supervised training methods require the involvement of highly trained, and thus expensive, experts to collect, and format, these data-sets. In this thesis, three baseline methods training datasets creation are presented. These methods are (i) Manual Expert-based, which involves a human-compiled dataset, (ii) Automated Web-Mining, which creates training datasets by collecting and data-mining APIs (specifically from technical-programming websites), and (iii) Automated Big-Data Analysis, which data-mines ultra-large code repositories to generate the training datasets. The trace-link creation accuracy achieved using each of these three methods is compared, and the cost/benefit comparisons between them is discussed. Furthermore, in a related area, potential correlations between training set size and the accuracy of recovering trace links is investigated. The results of this area of study indicate that the automated techniques, capable of creating very large training sets, allow for sufficient reliability in the problem of tracing architectural tactics. This indicates that these automated methods have potential applications in other areas of software traceability.

Library of Congress Subject Headings

Machine learning; Software architecture; Computer software--Development

Publication Date


Document Type


Student Type


Degree Name

Software Engineering (MS)

Department, Program, or Center

Software Engineering (GCCIS)


Mehdi Mirakhorli

Advisor/Committee Member

Scott Hawker

Advisor/Committee Member

Stephanie Ludi


Physical copy available from RIT's Wallace Library at Q352.5 .Z64 2015


RIT – Main Campus

Plan Codes