For storage systems in the post-exascale era, a fundamental challenge facing the research community is providing efficient and scalable access to stored data while meeting the performance requirements of big data applications. In this thesis, we studied the limitations of existing state-of-the-art architectures and proposed a system that addresses the challenges of scalability and high performance. Our proposed solution, called MITRA, supports several scientific data formats, namely Hierarchical Data Format (HDF), network Common Data Form (netCDF), and Comma-Separated Values (CSV), and is composed of several software components that work together to provide high I/O throughput to user applications. The key novelty of MITRA lies in its support for a variety of file formats, its generation and indexing of metadata for scientific datasets, and its optimization of data lookup time, all while allowing the storage subsystem to scale as the volume of data grows. MITRA generates and manages its indices in a relational database, which can be accessed efficiently through conventional application programming interfaces (APIs). We evaluated the performance of MITRA and compared it with traditional approaches in terms of ingestion speed, content processing, lookup time, and the scalability of the generated indices. Our evaluation shows that MITRA's rich metadata indices improve lookup performance by reducing the search space, including for lookups of metadata that is not present in the indices. Moreover, as the indices grow in size, MITRA scales better than the existing approach by balancing the load across the available hardware resources.
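To make the idea of a metadata index held in a relational database more concrete, here is a minimal sketch in Python. It assumes a single SQLite table stands in for MITRA's relational index and that only CSV column names are extracted as metadata (HDF and netCDF would need third-party readers). The table name `metadata` and the functions `create_index`, `ingest_csv`, and `lookup` are hypothetical illustrations, not MITRA's actual schema or API.

```python
import csv
import io
import sqlite3


def create_index(conn):
    # Hypothetical schema: one row per (file, attribute) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        " file TEXT NOT NULL,"
        " format TEXT NOT NULL,"
        " attribute TEXT NOT NULL)"
    )
    # A B-tree index on the attribute column speeds up lookups by name.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_attr ON metadata(attribute)")


def ingest_csv(conn, name, text):
    """Extract the column names from a CSV header and record them as metadata."""
    header = next(csv.reader(io.StringIO(text)))
    conn.executemany(
        "INSERT INTO metadata (file, format, attribute) VALUES (?, ?, ?)",
        [(name, "CSV", col.strip()) for col in header],
    )
    conn.commit()


def lookup(conn, attribute):
    """Return the files whose indexed metadata contains `attribute`.

    Only the index table is queried; the data files themselves are never opened.
    """
    rows = conn.execute(
        "SELECT DISTINCT file FROM metadata WHERE attribute = ? ORDER BY file",
        (attribute,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_index(conn)
ingest_csv(conn, "run1.csv", "time,temperature,pressure\n0,290.1,101.3\n")
ingest_csv(conn, "run2.csv", "time,humidity\n0,0.45\n")
print(lookup(conn, "temperature"))  # ['run1.csv']
print(lookup(conn, "time"))  # ['run1.csv', 'run2.csv']
```

Because a lookup consults only the index table, files that lack the requested attribute are ruled out without being read, and a query for an attribute that was never indexed simply returns an empty list. This is one plausible reading of how an index can narrow the search space; the thesis itself does not specify this schema.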

Library of Congress Subject Headings

Metadata--Management; Research--Abstracting and indexing; Big data

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)


Mustafa Rafique

Advisor/Committee Member

Michael Mior

Advisor/Committee Member

Ifeoma Nwogu


RIT – Main Campus
