Prabin Dutta


The Human Genome Project generated vast amounts of DNA sequenced data scattered in disparate data sources in a variety of formats. Integrating biological data and extracting information held in DNA sequences are major ongoing tasks for biologists and software professionals. This thesis explored issues of finding, extracting, merging and synthesizing information from multiple disparate data sources containing DNA sequenced data, which is composed of 3 billion chemical building blocks of bases. We proposed a biological data integration framework based on typical usage patterns to simplify these issues for biologists. The framework uses a relational database management system at the backend, and provides techniques to extract, store, and manage the data. This framework was implemented, evaluated, and compared with existing biological data integration solutions.

Library of Congress Subject Headings

Human Genome Project--Data processing; Database management; Computational biology; Relational databases

Publication Date


Document Type


Student Type


Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)


Rajendra Raj

Advisor/Committee Member

Paul Tymann

Advisor/Committee Member

Warren Carithers


Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's The Wallace Library at: QH447 .D88 2008


RIT – Main Campus