Abstract
The widespread use of mobile applications has generated massive amounts of behavioural data that reflect users' routines, preferences, and lifestyles. In telecommunications, gender and other demographic details are typically collected at SIM registration, but this data often reflects the administrative owner of the SIM rather than the actual user of the device. This inconsistency can undermine customer profiling, segmentation accuracy, and the effectiveness of targeted marketing. This study examines whether the registered gender data in telecommunication customer relationship management (CRM) systems is valid, using behaviour-based gender likelihood as a proxy for validation. A quantitative, positivist research approach is applied to a publicly available secondary dataset, TalkingData Mobile User Demographics, which provides anonymised mobile app usage records and demographic labels. Behavioural features are derived at the device level and capture intensity of use, application types, preferences, and frequency of use. Exploratory data analysis is performed to identify behavioural patterns across gender groups, and supervised machine learning models are built using the Logistic Regression, Gradient Boosting, and Random Forest algorithms. Model performance is evaluated using accuracy, precision, recall, F1-score, and AUC-ROC, and the Synthetic Minority Over-sampling Technique (SMOTE) is used to address class imbalance. The results show that mobile app usage exhibits measurable, gender-related behavioural patterns. The most influential predictors are application diversity and category engagement, which outperform purely frequency-based measures. The ensemble methods, Random Forest and Gradient Boosting, outperform Logistic Regression, reaching a classification accuracy of up to 74%. A behavioural mismatch rate of 25.93% is found between the predicted gender and the gender recorded in the CRM, indicating substantial discrepancies in the administrative demographic data.
Publication Date
5-4-2026
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Ehsan Warriach
Recommended Citation
Alblooshi, Saleha, "Behaviour-Based Gender Prediction Using Mobile App Usage Patterns" (2026). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12634
Campus
RIT Dubai

Comments
This thesis has been embargoed. The full text will be available on or around 2/10/2027.