Abstract
The questions raised by high-dimensional data are interesting and challenging. Our study targets the particular setting known as "large p, small n." When the dimensionality is massively larger than the number of observations, any estimate of the covariance matrix and its inverse is severely affected. The definition of high dimension in statistics has changed over the decades, and modern datasets with thousands of dimensions demand deeper understanding yet are hindered by the curse of dimensionality. We review and extend previous studies to cope with this curse, paving a new way toward robust estimation and applying it to outlier detection and classification.
We explore random subspace learning and adapt classification and outlier detection algorithms to its framework. Our proposed methods handle both high-dimension, low-sample-size data and traditional low-dimensional, high-sample-size data. Essentially, we avoid the computational bottleneck of techniques such as the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower dimensional subspaces. Both the theoretical and computational development of our approach reveal that it is computationally more efficient than regularized methods in the high-dimensional, low-sample-size setting, and it often competes favorably with existing methods in terms of the percentage of outliers correctly detected.
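To make the idea of computing MCD quantities in lower dimensional subspaces concrete, the following is a minimal sketch, not the thesis's exact algorithm. It assumes illustrative choices: B randomly drawn subspaces of size d, scikit-learn's MinCovDet as the MCD estimator, averaging of squared robust Mahalanobis distances across subspaces, and a chi-square quantile as the outlier cutoff; the function name rss_mcd_outliers and all parameter values are hypothetical.

```python
# Sketch: random-subspace outlier detection with MCD (illustrative, not the thesis's scheme).
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def rss_mcd_outliers(X, d=5, B=50, alpha=0.975, random_state=0):
    """Average MCD-based squared Mahalanobis distances over B random subspaces of size d."""
    rng = np.random.default_rng(random_state)
    n, p = X.shape
    d = min(d, p)
    scores = np.zeros(n)
    for _ in range(B):
        features = rng.choice(p, size=d, replace=False)      # draw a random subspace
        mcd = MinCovDet(random_state=random_state).fit(X[:, features])
        scores += mcd.mahalanobis(X[:, features])             # squared robust distances
    scores /= B
    cutoff = chi2.ppf(alpha, df=d)                            # cutoff in the subspace dimension
    return scores, scores > cutoff

# Example in a "large p, small n" regime: 200 observations, 1000 dimensions.
X = np.random.default_rng(1).normal(size=(200, 1000))
X[:5] += 6.0                                                  # shift a few rows to act as outliers
scores, flags = rss_mcd_outliers(X, d=5, B=50)
print(flags[:10])
```

Because each MCD fit runs on only d features instead of all p, the determinant computations stay cheap even when p is in the thousands; how the subspace scores are aggregated and thresholded is one possible design choice among several.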
Library of Congress Subject Headings
Algorithms; Machine learning; Outliers (Statistics); Dimension reduction (Statistics); Classification--Data processing
Publication Date
5-2016
Document Type
Thesis
Student Type
Graduate
Degree Name
Applied Statistics (MS)
Department, Program, or Center
School of Mathematical Sciences (COS)
Advisor
Ernest Fokoue
Advisor/Committee Member
Steven LaLonde
Advisor/Committee Member
Joseph Voelkel
Recommended Citation
Liu, Bohan, "Random Subspace Learning on Outlier Detection and Classification with Minimum Covariance Determinant Estimator" (2016). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9066
Campus
RIT – Main Campus
Comments
Physical copy available from RIT's Wallace Library at QA276.7 .L48 2016