Abstract
Within the biological sciences, spreadsheets are commonly used as a data entry and storage medium. While this practice is simple and generally well understood, the unrestrained flexibility of the spreadsheet medium allows errors to accumulate and potentially propagate. Such errors impede accurate analysis, hindering research. The underlying problem is that the error correction facilities of typical spreadsheet programs are lackluster at best, if they exist at all. For this reason, Error Sentinel was developed. Error Sentinel is a spreadsheet program with programmable error correction facilities. These facilities allow users to define exactly what constitutes clean data, along with corrections for erroneous data. Such rules are specified via a custom visual programming language. Once error correction rules are written, users entering data need not be familiar with the rules, or even have programming skills, in order to use them. Error Sentinel can be used interactively, like a typical spreadsheet program, or non-interactively, as with more traditional error correction techniques. To test Error Sentinel's real-world capabilities, it was successfully applied to the correction of the mtHaplogroups data set. This application has shown that Error Sentinel requires far less time and code to perform error correction than previous methods. Benchmarking has shown that these gains come at only a modest cost in performance. While Error Sentinel appears quite simplistic compared to typical spreadsheet programs, its error correction facilities are robust, and it is fully capable of being applied to arbitrary data sets represented in the spreadsheet medium.
Library of Congress Subject Headings
Electronic spreadsheets--Computer programs; Biology--Data processing; Error-correcting codes (Information theory)
Publication Date
6-21-2011
Document Type
Thesis
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences (COS)
Advisor
Osier, Michael
Advisor/Committee Member
Skuse, Gary
Advisor/Committee Member
Newman, Dina
Recommended Citation
Dewey, Kyle, "Error sentinel: A Rule-based spreadsheet program for intelligent data entry, error correction, and curation" (2011). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/4078
Campus
RIT – Main Campus
Comments
Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT’s Wallace Library at: HF5548.2 .D49 2011