Abstract
Uncertain data is an increasingly prevalent topic in database research, given the advance of instruments which inherently generate uncertainty in their data. In particular, the problem of indexing uncertain data for range queries has received considerable attention. To efficiently process range queries, existing approaches mainly focus on reducing the number of disk I/Os. However, due to the inherent complexity of uncertain data, processing a range query may incur high computational cost in addition to the I/O cost. In this paper, I present a novel indexing strategy focusing on one-dimensional uncertain continuous data, called threshold interval indexing. Threshold interval indexing is able to balance I/O cost and computational cost to achieve an optimal overall query performance. A key ingredient of the proposed indexing structure is a dynamic interval tree. The dynamic interval tree is much more resistant to skew than R-trees, which are widely used in other indexing structures. This interval tree optimizes pruning by storing x-bounds, or pre-calculated probability boundaries, at each node. In addition to the basic threshold interval index, I present two variants, called the strong threshold interval index and the hyper threshold interval index, which leverage x-bounds not only for pruning but also for accepting results. Furthermore, I present a more efficient memory-loaded versions of these indexes, which reduce the storage size so the primary interval tree can be loaded into memory. Each index description includes methods for querying, parallelizing, updating, bulk loading, and externalizing. I perform an extensive set of experiments to demonstrate the effectiveness and efficiency of the proposed indexing strategies.
Library of Congress Subject Headings
Data mining; Uncertainty (Information theory); Information storage and retrieval systems; Indexing
Publication Date
2010
Document Type
Thesis
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Rege, Mangeet
Recommended Citation
Knight, Andrew, "Threshold interval indexing techniques for complicated uncertain data" (2010). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/672
Campus
RIT – Main Campus
Comments
Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works. Physical copy available through RIT's The Wallace Library at: QA76.9.D343 K64 2010