Abstract
Variable selection plays a major role in multivariate high-dimensional statistical modeling. We therefore need to select a consistent model that avoids overfitting in prediction, enhances interpretability, and identifies the relevant variables. We explore various continuous, nearly unbiased, sparse and accurate techniques for linear models based on coefficient paths, such as penalized maximum likelihood and nonconvex penalties, and iterative Sure Independence Screening (SIS). The convex penalized (pseudo-)likelihood approach based on the elastic net uses a mixture of the ℓ1 (Lasso) and ℓ2 (ridge regression) penalties to simultaneously achieve automatic variable selection, continuous shrinkage, and selection of groups of correlated variables. Variable selection using coefficient paths for the minimax concave penalty (MCP) starts by applying penalization at the same rate as the Lasso, and then smoothly relaxes the rate down to zero as the absolute value of the coefficient increases. The sure screening method is based on correlation learning; it computes componentwise estimators and uses AIC to tune the regularization parameter of the penalized-likelihood Lasso. To reflect the continuous nature of spectral data, we use the functional data approach, approximating each spectrum by a finite linear combination of B-spline basis functions. MCP, SIS and functional regression rest on the assumption that the predictors are independent. However, the high-dimensional grapevine dataset suffers from ill-conditioning of the covariance matrix due to multicollinearity. Under collinearity, the elastic-net regularization path computed via coordinate descent yields the best results, controlling the sparsity of the model, with cross-validation used to reduce bias in variable selection. Iterative stepwise multiple linear regression reduces complexity and enhances the predictability of the model by selecting only significant predictors.
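To illustrate the description of MCP in the abstract, the following is a minimal sketch of the penalization rate (the derivative of the penalty with respect to the coefficient's absolute value) for the Lasso and for MCP. It assumes the standard MCP parameterization, with rate max(lam - |beta| / gamma, 0); the function names and the parameters lam and gamma are illustrative and are not taken from the thesis.

```python
def lasso_rate(beta: float, lam: float) -> float:
    """Penalization rate of the Lasso: constant at lam for any nonzero coefficient."""
    return lam


def mcp_rate(beta: float, lam: float, gamma: float) -> float:
    """Penalization rate of MCP under the standard parameterization (an assumption here).

    The rate equals lam at beta = 0, matching the Lasso, then decreases
    linearly in |beta| and stays at zero once |beta| >= gamma * lam.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return max(lam - abs(beta) / gamma, 0.0)


if __name__ == "__main__":
    lam, gamma = 1.0, 3.0  # illustrative values only
    for beta in (0.0, 0.75, 1.5, 3.0, 6.0):
        print(beta, lasso_rate(beta, lam), mcp_rate(beta, lam, gamma))
```

With these illustrative values, the MCP rate starts at the Lasso rate for a zero coefficient and reaches zero once the coefficient's absolute value is at least gamma * lam, so large coefficients are no longer shrunk.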
Library of Congress Subject Headings
Grapes--Genetics--Data processing; Functional analysis; Multivariate analysis
Publication Date
5-19-2017
Document Type
Thesis
Student Type
Graduate
Degree Name
Applied Statistics (MS)
Department, Program, or Center
School of Mathematical Sciences (COS)
Advisor
Peter Bajorski
Advisor/Committee Member
Jan van Aardt
Advisor/Committee Member
Ernest Fokoue
Recommended Citation
Jha, Uday Kant, "High-Dimensional Linear and Functional Analysis of Multivariate Grapevine Data" (2017). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9473
Campus
RIT – Main Campus
Plan Codes
APPSTAT-MS
Comments
Physical copy available from RIT's Wallace Library at QA320 .J43 2017