Automated app review analysis is an important avenue for extracting a variety of requirements-related information. Typically, the first step toward such analysis is preparing a training dataset, in which developers (experts) select a set of reviews and manually annotate them for a given task. A sufficiently large training set is important both for achieving high prediction accuracy and for avoiding over-fitting. With millions of reviews available, preparing such a training set is laborious. We propose incorporating active learning, a machine learning paradigm, to reduce the human effort involved in app review analysis. Our app review classification framework exploits three active learning strategies based on uncertainty sampling. We apply these strategies to an existing dataset of 4,400 app reviews to classify reviews as features, bugs, rating, and user experience. We find that active learning yields significantly higher prediction accuracy than a randomly chosen training dataset under multiple scenarios.
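The abstract does not name its three uncertainty-sampling strategies. A common trio in the active learning literature is least-confidence, margin, and entropy sampling, and the sketch below assumes those three. All function names are hypothetical. Given a classifier's predicted class probabilities for each unlabeled review (for example over feature, bug, rating, and user experience), it ranks the pool and returns the reviews a human annotator should label next.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical scoring functions: the thesis abstract does not specify which
# uncertainty measures its three strategies use. Higher score = more uncertain.

def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)

def margin(probs: Sequence[float]) -> float:
    """Negated gap between the two most likely classes (needs >= 2 classes).

    A small gap means the model is torn between two labels, so the score is high.
    """
    top_two = sorted(probs, reverse=True)[:2]
    return -(top_two[0] - top_two[1])

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

STRATEGIES: Dict[str, Callable[[Sequence[float]], float]] = {
    "least_confidence": least_confidence,
    "margin": margin,
    "entropy": entropy,
}

def select_queries(pool_probs: List[Sequence[float]], k: int, strategy: str) -> List[int]:
    """Return indices of the k unlabeled reviews the model is least certain about."""
    score = STRATEGIES[strategy]
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Illustrative probabilities over (feature, bug, rating, user experience).
pool = [
    [0.97, 0.01, 0.01, 0.01],  # confidently a feature
    [0.40, 0.38, 0.12, 0.10],  # torn between feature and bug
    [0.25, 0.25, 0.25, 0.25],  # uniform: maximally uncertain
    [0.70, 0.10, 0.10, 0.10],
]

for name in STRATEGIES:
    print(name, select_queries(pool, k=2, strategy=name))
```

In a full pool-based loop, the selected reviews would be labeled, added to the training set, and the classifier retrained before the remaining pool is scored again. The random baseline that the abstract compares against would instead draw `k` indices uniformly from the pool at each round.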

Degree Name

Software Engineering (MS)

Department, Program, or Center

Software Engineering (GCCIS)


Pradeep Murukannaiah

Advisor/Committee Member

Mohamed Wiem Mkaouer

Advisor/Committee Member

J. Scott Hawker


RIT – Main Campus