Abstract
Variable selection is of utmost importance in aviation safety, where the data contain a large number of highly correlated predictors and flight safety has to be predicted accurately. Variable selection methods have not been encouraged in medical research, where subject-matter knowledge is limited. For this reason, Genell, Nemes, Steineck, and Dickman (2010) conducted a simulation study comparing Bayesian Model Averaging and stepwise regression, to motivate medical researchers to apply automatic variable selection to their regression models and take advantage of it. In this era of data science and machine learning, we extend this comparative study by considering machine learning algorithms. Various studies have shown that the Recursive Feature Elimination (RFE) algorithm reduces the effect of correlation on the variable importance measure and results in minimal prediction error. In this study, we compare RFE-RF, RFE-SVM, and Bayesian Model Averaging (BMA) on simulated data in the presence of correlation, varying the sample size (n = 30, 300) for p = 45 variables so that both cases, n < p and n > p, are considered. Our results show that, of the three models, RFE-RF selects the highest percentage of true predictors. However, although the overall percentage of true predictors selected is highest for RFE-RF, the estimated probability of selecting correlated true predictors is better for the Bayesian approach than for the other methods.
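As a rough illustration of the kind of experiment the abstract describes, the sketch below runs RFE with a random forest on simulated correlated predictors at n = 30 and n = 300 with p = 45. It is not the thesis code. The use of Python with scikit-learn, the equicorrelated Gaussian design, the correlation of 0.7, the five true predictors with unit coefficients, the continuous response, and the helper names `simulate` and `rfe_rf_selected` are all assumptions made here for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE


def simulate(n, p=45, n_true=5, rho=0.7, seed=0):
    """Draw n rows of p equicorrelated Gaussian predictors.

    Assumed design: only the first n_true columns influence y,
    each with coefficient 1, plus standard normal noise.
    """
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:n_true] = 1.0
    y = X @ beta + rng.normal(size=n)
    return X, y


def rfe_rf_selected(X, y, n_select=5, seed=0):
    """Return the column indices kept by RFE wrapped around a random forest."""
    forest = RandomForestRegressor(n_estimators=50, random_state=seed)
    # step=1 drops the least important feature at each iteration.
    selector = RFE(forest, n_features_to_select=n_select, step=1)
    selector.fit(X, y)
    return np.flatnonzero(selector.support_)


# One run per sample size: n < p (30 < 45) and n > p (300 > 45).
for n in (30, 300):
    X, y = simulate(n)
    chosen = rfe_rf_selected(X, y)
    hits = int(np.sum(chosen < 5))  # columns 0..4 are the true predictors
    print(f"n={n}: selected {chosen.tolist()}, true predictors recovered: {hits}/5")
```

A single run like this says little on its own. A comparison of the kind reported in the abstract would repeat the simulation many times and record how often each true predictor is selected by each method.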
Library of Congress Subject Headings
Variables (Mathematics); Bayesian statistical decision theory; Recursion theory
Publication Date
12-18-2019
Document Type
Thesis
Student Type
Graduate
Degree Name
Applied Statistics (MS)
Department, Program, or Center
School of Mathematical Sciences (COS)
Advisor
Ernest Fokoue
Advisor/Committee Member
Robert Parody
Advisor/Committee Member
Carol Marchetti
Recommended Citation
Rumao, Sailee, "Exploration of Variable Importance and Variable selection techniques in presence of correlated variables" (2019). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10296
Campus
RIT – Main Campus
Plan Codes
APPSTAT-MS