Abstract
The Human Genome Project generated vast amounts of DNA sequenced data, scattered across disparate data sources in a variety of formats. Integrating biological data and extracting the information held in DNA sequences are major ongoing tasks for biologists and software professionals. This thesis explored the problems of finding, extracting, merging, and synthesizing information from multiple disparate data sources containing DNA sequenced data; the human genome alone is composed of about 3 billion bases, the chemical building blocks of DNA. We proposed a biological data integration framework, based on typical usage patterns, to simplify these tasks for biologists. The framework uses a relational database management system as its backend and provides techniques to extract, store, and manage the data. The framework was implemented, evaluated, and compared with existing biological data integration solutions.
Library of Congress Subject Headings
Human Genome Project--Data processing; Database management; Computational biology; Relational databases
Publication Date
2008
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Rajendra Raj
Advisor/Committee Member
Paul Tymann
Advisor/Committee Member
Warren Carithers
Recommended Citation
Dutta, Prabin, "A framework for integrating DNA sequenced data" (2008). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/7767
Campus
RIT – Main Campus
Comments
Note: imported from RIT’s Digital Media Library, running on DSpace, to RIT Scholar Works. Physical copy available through The Wallace Library at RIT, call number QH447 .D88 2008