Abstract
This thesis details the development of a part-of-speech tagger that determines grammar patterns in source code identifiers. These grammar patterns aid in naming identifiers properly, which improves reader comprehension. The tagger continues the effort of a previous Ensemble Tagger [62], but focuses on increasing the tagging rate while maintaining accuracy, so that the tagger is scalable. The Scalable Tagger will be trained on open source data sets, with a machine learning model and training features chosen to best meet the needs for accuracy and tagging rate. The results of the experiment will be contrasted with those of the Ensemble Tagger to determine the Scalable Tagger’s efficacy.
Library of Congress Subject Headings
Natural language processing (Computer science); Software engineering; Machine learning
Publication Date
5-10-2023
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
J. Scott Hawker
Advisor/Committee Member
Mohamed Wiem Mkaouer
Advisor/Committee Member
Christian Newman
Recommended Citation
Burris, Gavin, "Efficiently Annotating Source Code Identifiers Using a Scalable Part of Speech Tagger" (2023). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/11471
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS