Abstract
ALFRED is a central, curated repository of allele frequency data for anthropologically defined human populations. To study and estimate the relationships and similarities between populations, researchers require a large and complete data set. However, the data set within ALFRED is not complete: not all the populations in the database have been typed for all the polymorphisms. Mining ALFRED for the largest complete data set is equivalent to the 'Maximal Biclique' problem in graph theory. This problem is proven to be NP-complete, so no polynomial-time algorithm is known that is guaranteed to find the optimal solution. This project describes a heuristic (the Largest Maximal Biclique Heuristic, LMBH) which finds the largest complete data set from ALFRED in real time. The program is compared to various other methods, including Wen-Chieh Chang's implementation of the 'maximal biclique' algorithm proposed by Alexe et al. The algorithm efficiently mines ALFRED to extract the largest complete data set, and the results are made available to researchers in a uniform data exchange format through a Web site. Since ALFRED is updated frequently, the LMBH program is set up to mine ALFRED on a regular basis and provide researchers with the most up-to-date, largest complete data set from ALFRED.
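To make the problem statement concrete, the following is a minimal sketch of the kind of search the abstract describes. It is not the LMBH algorithm from the thesis, whose details are not given here. It is a simple greedy heuristic, assuming the data can be represented as a mapping from each population to the set of polymorphisms typed in it. The function name `greedy_complete_block` and the sample data are hypothetical. The sketch repeatedly discards the population or polymorphism with the largest share of missing entries until every remaining population is typed for every remaining polymorphism, which corresponds to a biclique in the population-polymorphism bipartite graph. Being greedy, it may miss the largest such block.

```python
def greedy_complete_block(typed):
    """Greedy sketch (an assumption, not the thesis's LMBH).

    typed: dict mapping population name -> set of polymorphisms typed in it.
    Returns (populations, polymorphisms) such that every returned population
    is typed for every returned polymorphism.
    """
    pops = set(typed)
    sites = set().union(*typed.values())

    while pops and sites:
        # Missing cells per population and per polymorphism in the current block.
        pop_missing = {p: len(sites - typed[p]) for p in pops}
        site_missing = {s: sum(1 for p in pops if s not in typed[p]) for s in sites}

        # Sorting first makes tie-breaking deterministic.
        worst_pop = max(sorted(pops), key=pop_missing.get)
        worst_site = max(sorted(sites), key=site_missing.get)

        if pop_missing[worst_pop] == 0:
            break  # no missing cells remain: the block is complete

        # Drop whichever has the larger fraction of missing cells.
        pop_frac = pop_missing[worst_pop] / len(sites)
        site_frac = site_missing[worst_site] / len(pops)
        if pop_frac >= site_frac:
            pops.remove(worst_pop)
        else:
            sites.remove(worst_site)

    return pops, sites


# Hypothetical toy data: four populations, four polymorphisms.
example = {
    "A": {"s1", "s2", "s3"},
    "B": {"s1", "s2", "s3"},
    "C": {"s1", "s2"},
    "D": {"s4"},
}
pops, sites = greedy_complete_block(example)
print(sorted(pops), sorted(sites))
```

On this toy input the sketch returns one complete block of six cells; an equally large block (three populations by two polymorphisms) also exists, which illustrates why a heuristic's answer depends on its tie-breaking choices.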
Library of Congress Subject Headings
Population genetics--Data processing; Genetic algorithms; Data mining; Allelomorphism; Graph theory; Bipartite graphs
Publication Date
5-24-2006
Document Type
Thesis
Department, Program, or Center
Biomedical Sciences (CHST)
Advisor
Osier, Michael - Chair
Advisor/Committee Member
Reynolds, Carl
Advisor/Committee Member
Halavin, James
Recommended Citation
Uduman, Mohamed, "Identifying the largest complete data set from ALFRED" (2006). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/2734
Campus
RIT – Main Campus
Comments
Note: imported from RIT’s Digital Media Library (running on DSpace) to RIT Scholar Works. Physical copy available through RIT’s Wallace Library at: QH455 .U48 2006