Abstract

Variable selection is of utmost importance in aviation safety, where the data contain a large number of highly correlated predictors and flight safety has to be predicted accurately. Variable selection methods have not been encouraged in medical research, where subject-matter knowledge is limited. For this reason, Genell, Nemes, Steineck, and Dickman (2010) conducted a simulation study comparing Bayesian Model Averaging and stepwise regression, to motivate medical researchers to apply automatic variable selection to their regression models and take advantage of it. In this era of data science and machine learning, we extend that comparative study to machine learning algorithms. Various studies have shown that the recursive feature elimination (RFE) algorithm reduces the effect of correlation on the variable importance measure and results in minimal prediction error. In this study, we compare RFE-RF, RFE-SVM, and Bayesian Model Averaging (BMA) on simulated data in the presence of correlation, varying the sample size (30, 300) for 45 variables so that both cases, n < p and n > p, are considered. Our results show that the percentage of true predictors selected is highest for RFE-RF among the three models. However, although the overall percentage of true predictors selected is highest for RFE-RF, the estimated probability of selecting correlated true predictors is better for BMA than for the other methods.
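
As a rough illustration of the kind of experiment the abstract describes, the following minimal sketch simulates correlated predictors and runs recursive feature elimination with a random forest (RFE-RF) at both sample sizes. It is not the thesis code: the use of scikit-learn, the equicorrelated block of five true predictors, the correlation of 0.7, the unit coefficients, and the helper names `simulate` and `rfe_rf_select` are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE


def simulate(n, p=45, n_true=5, rho=0.7, seed=0):
    """Draw n rows of p predictors; the first n_true are correlated and drive y.

    The correlation structure and coefficients are assumed, not taken from the thesis.
    """
    rng = np.random.default_rng(seed)
    cov = np.eye(p)
    cov[:n_true, :n_true] = rho      # equicorrelated block among true predictors
    np.fill_diagonal(cov, 1.0)       # restore unit variances on the diagonal
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:n_true] = 1.0              # only the first n_true predictors matter
    y = X @ beta + rng.normal(scale=1.0, size=n)
    return X, y, np.arange(n_true)


def rfe_rf_select(X, y, n_keep=5, seed=0):
    """Return the indices of the predictors retained by RFE with a random forest."""
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    # Drop the 5 least important predictors per round until n_keep remain.
    selector = RFE(forest, n_features_to_select=n_keep, step=5)
    selector.fit(X, y)
    return np.flatnonzero(selector.support_)


for n in (30, 300):  # n < p and n > p, with p = 45
    X, y, true_idx = simulate(n)
    selected = rfe_rf_select(X, y)
    hit_rate = np.isin(true_idx, selected).mean()
    print(f"n={n}: selected={selected.tolist()}, "
          f"share of true predictors found={hit_rate:.2f}")
```

Repeating such a run over many simulated data sets, and over RFE-SVM and BMA as well, would give the kind of selection percentages the abstract compares; a single run like this one says little on its own.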

Library of Congress Subject Headings

Variables (Mathematics); Bayesian statistical decision theory; Recursion theory

Publication Date

12-18-2019

Document Type

Thesis

Student Type

Graduate

Degree Name

Applied Statistics (MS)

Department, Program, or Center

School of Mathematical Sciences (COS)

Advisor

Ernest Fokoue

Advisor/Committee Member

Robert Parody

Advisor/Committee Member

Carol Marchetti

Campus

RIT – Main Campus

Plan Codes

APPSTAT-MS
