The questions raised by high-dimensional data are interesting and challenging. Our study targets a particular type of data in this setting, namely "large p, small n". Since the dimensionality massively exceeds the number of observations, any estimate of the covariance matrix and its inverse is severely affected. The definition of high dimension in statistics has changed over the decades. Modern datasets with thousands of dimensions demand deeper understanding but are hindered by the curse of dimensionality. We review and extend previous studies to cope with the curse, paving a new way toward robust estimation, which we then apply to outlier detection and classification.

We explore random subspace learning and extend other classification and outlier detection algorithms to fit its framework. Our proposed methods can handle both high-dimension low-sample-size and traditional low-dimension high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and computational development of our approach reveal that it is computationally more efficient than regularized methods in the high-dimension low-sample-size setting, and it often competes favorably with existing methods as far as the percentage of correctly detected outliers is concerned.
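To illustrate the core idea, the sketch below scores observations by averaging Mahalanobis-type distances computed in many random low-dimensional feature subspaces, where the small d-by-d covariance matrices remain invertible even when p greatly exceeds n. This is a generic illustration of the random-subspace principle, not the thesis's exact estimator: the function name, the number of subspaces, the ridge term, and the simple mean aggregation are all illustrative assumptions.

```python
import numpy as np

def random_subspace_outlier_scores(X, n_subspaces=100, subspace_dim=5, seed=0):
    """Score each row of X (n observations, p features) by averaging
    squared Mahalanobis distances over random feature subspaces.

    Illustrative sketch only: defaults and the mean aggregation rule
    are assumptions, not the method described in the abstract."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    d = min(subspace_dim, p)
    scores = np.zeros(n)
    for _ in range(n_subspaces):
        # draw a random subspace of d features without replacement
        feats = rng.choice(p, size=d, replace=False)
        Z = X[:, feats]
        mu = Z.mean(axis=0)
        # the covariance is only d x d, so it is cheap to invert and
        # stays non-singular even when p >> n (small ridge for safety)
        cov = np.cov(Z, rowvar=False) + 1e-8 * np.eye(d)
        diff = Z - mu
        # squared Mahalanobis distance of every row in this subspace
        md2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        scores += md2
    return scores / n_subspaces
```

Rows with the largest averaged scores are the candidate outliers; in practice one would threshold the scores, for example against a chi-square quantile.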

Library of Congress Subject Headings

Algorithms; Machine learning; Outliers (Statistics); Dimension reduction (Statistics); Classification--Data processing

Degree Name

Applied Statistics (MS)

Department, Program, or Center

School of Mathematical Sciences (COS)


Ernest Fokoue

Advisor/Committee Member

Steven LaLonde

Advisor/Committee Member

Joseph Voelkel


Physical copy available from RIT's Wallace Library at QA276.7 .L48 2016


RIT – Main Campus