Abstract
De novo genomic sequencing, the process of determining the sequence of a genome that has not previously been elucidated, presents unique challenges, especially for larger genomes. Modern high-throughput sequencing technologies address the problem of covering the entire genome in a reasonable time by fragmenting the genome into portions that can be examined in a massively parallel fashion. While these technologies have saved considerable time and cost in the chemical process of determining the sequence of a genome, they produce sets of many tens of millions of sequence fragments called reads, each of which is typically only on the order of 100 to 300 bases long. Assembling these reads into a genomic sequence is computationally demanding.
A variety of assembly software packages are readily available for this purpose. In this project, a set of genomic assemblers was selected for examination. These programs were then tested with an Illumina data set for the grape species Vitis romanetii. Experimental runs with this data set were performed to evaluate the run time required as well as the contiguity, completeness, and accuracy of the resulting assemblies. Different approaches to quality control preprocessing of the sequence data were also explored and evaluated. The results strongly support the use of the program MaSuRCA, run with data that has not been preprocessed for quality control. The second-highest recommendation is ABySS, run with data preprocessed via QuorUM error correction.
It was also hoped that these tests would produce at least the beginnings of a draft genome for V. romanetii. The assemblies that came closest to publication quality were produced by MaSuRCA. Examination of these with the assessment software BUSCO suggests that the best of these assemblies may well be approaching publishable quality.
Library of Congress Subject Headings
Genomics--Data processing; Grapes--Genetics--Data processing
Publication Date
6-20-2019
Document Type
Thesis
Student Type
Graduate
Degree Name
Bioinformatics (MS)
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences (COS)
Advisor
Michael V. Osier
Advisor/Committee Member
Lance Cadle-Davidson
Advisor/Committee Member
Julie A. Thomas
Recommended Citation
Olsen, Lars J., "Functional Comparison of Current Software Tools for Genomic Assembly from High Throughput Sequencing Data" (2019). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10121
Campus
RIT – Main Campus
Plan Codes
BIOINFO-MS