Abstract
A wide range of researchers is beginning to use customized statistical methods for analyzing data as hardware and software become cheaper and more widely available. Cluster Rank Analysis (CRA) is a multivariate statistical algorithm that previously existed as an inefficient service-oriented application. This thesis describes how CRA was optimized and parallelized using an available computing cluster and both open source and custom software. A command-line submission system for CRA jobs was then developed, along with a Web retrieval system for the results of analyses. A subsequent timing study revealed a speedup that quickly rose to 15 with 35 processors and should reach a proposed maximum of 19 given over 100 processors. This speedup was limited primarily by the serial portion of the code; the Ethernet communication network was sufficient for this application. With as few as 10 processors involved in parallel runs, the average runtime dropped from over 100 minutes to approximately 15 minutes, and it fell further to 6 minutes with 80 processors. The locations of the bottlenecks suggest that further performance increases are possible through additional parallelization. This work with CRA illustrates (1) the speed with which high-performance in-house applications can be developed and (2) the speed and efficiency with which statistical analyses of complex data structures can be carried out given commodity hardware and software resources.
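The statement that speedup is limited by the serial portion of the code can be illustrated with Amdahl's law. The sketch below is not taken from the thesis: it assumes the simple Amdahl model applies, and the function names are hypothetical. It uses the abstract's figure of a speedup of 15 at 35 processors only as an input for estimating a serial fraction. Real runs also carry communication and scheduling overhead, so this idealized model need not reproduce the abstract's proposed maximum of 19.

```python
def amdahl_speedup(processors: int, serial_fraction: float) -> float:
    """Ideal speedup under Amdahl's law for a given serial fraction.

    Assumes the parallel portion scales perfectly and ignores overhead.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def serial_fraction_from_speedup(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: estimate the serial fraction from one
    observed (speedup, processors) pair. Requires processors > 1."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # Input taken from the abstract: speedup of 15 at 35 processors.
    s = serial_fraction_from_speedup(15.0, 35)
    print(f"estimated serial fraction: {s:.4f}")

    # Model predictions at other processor counts (idealized, no overhead).
    for p in (10, 35, 80, 100):
        print(f"{p:>4} processors: speedup {amdahl_speedup(p, s):.1f}")

    # As the processor count grows without bound, speedup approaches 1/s.
    print(f"asymptotic limit: {1.0 / s:.1f}")
```

Under this model the serial fraction alone caps the achievable speedup no matter how many processors are added, which matches the abstract's reasoning that further gains would come from parallelizing more of the code.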
Library of Congress Subject Headings
Cluster analysis--Data processing; Biology--Research--Data processing; Parallel processing (Electronic computers)
Publication Date
2007
Document Type
Thesis
Student Type
Graduate
Degree Name
Bioinformatics (MS)
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences (COS)
Advisor
Michael Osier
Advisor/Committee Member
Dina Newman
Advisor/Committee Member
Paul Shipman
Recommended Citation
Esposito, Anthony G. Jr., "Parallelizing the Cluster Rank Analysis application" (2007). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/7786
Campus
RIT – Main Campus
Comments
Physical copy available from RIT's Wallace Library at QA278 .E77 2007