Abstract
The goal of scene classification is to automatically assign a scene image to a semantic category (e.g., "building" or "river") by analyzing the visual content of the image. This is a challenging problem because scene images are highly variable and often ambiguous, and they may be captured under a wide range of illumination and scale conditions. At the same time, it is a fundamental problem in computer vision: by providing contextual information, scene classification can guide other processes such as image browsing, content-based image retrieval and object recognition. This thesis implements two scene classification systems: one based on Spatial Pyramid Matching (SPM) and the other based on Hierarchical Dirichlet Processes (HDP). Both approaches build on the widely used "bag-of-words" representation, which is a histogram of quantized visual features. SPM represents an image as a "spatial pyramid", produced by computing histograms of local features at multiple levels of increasing resolution. "Spatial Pyramid Matching" then estimates the overall perceptual similarity between two images, and this similarity is used as a support vector machine (SVM) kernel. In the second approach, HDP models the "bag-of-words" representations of images: each image is described as a mixture of latent themes, and each theme as a mixture of words. The number of themes is inferred automatically from the data, and the themes are shared not only among images within one scene category but also across all categories. Both systems are tested on three popular datasets from the field and their performance is compared. In addition, combining the two approaches yields better performance than either system alone.
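The spatial pyramid idea can be illustrated with a short sketch. It assumes that local features have already been quantized into visual words (the vocabulary-building step, e.g. k-means, is omitted), that feature coordinates are normalized to [0, 1), and that levels are weighted as in the standard spatial pyramid match kernel of Lazebnik et al., where level 0 gets weight 1/2^L and level l >= 1 gets weight 1/2^(L-l+1). The function names `pyramid_histograms` and `spm_kernel` are hypothetical, and the thesis's actual vocabulary size, number of levels and normalization are not stated here.

```python
import numpy as np

def pyramid_histograms(xs, ys, words, vocab_size, levels):
    """Hypothetical helper: per-level visual-word histograms for one image.

    Level l splits the image into a 2**l x 2**l grid, so its histogram array
    has shape (4**l, vocab_size). xs, ys are normalized to [0, 1) and
    words holds the (already quantized) visual-word index of each feature.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    words = np.asarray(words, dtype=int)
    hists = []
    for l in range(levels + 1):
        cells = 2 ** l
        # Grid cell along each axis; clip so a coordinate of 1.0 stays in the last cell.
        cx = np.minimum((xs * cells).astype(int), cells - 1)
        cy = np.minimum((ys * cells).astype(int), cells - 1)
        cell_id = cy * cells + cx
        h = np.zeros((cells * cells, vocab_size))
        # Count each feature once in its (cell, word) bin.
        np.add.at(h, (cell_id, words), 1)
        hists.append(h)
    return hists

def spm_kernel(hx, hy):
    """Hypothetical helper: weighted sum of per-level histogram intersections."""
    L = len(hx) - 1
    # Histogram intersection at each level, summed over all cells and words.
    inter = [np.minimum(a, b).sum() for a, b in zip(hx, hy)]
    k = inter[0] / 2 ** L                   # coarsest level, weight 1/2^L
    for l in range(1, L + 1):
        k += inter[l] / 2 ** (L - l + 1)    # finer levels receive larger weights
    return k
```

Because the level weights sum to 1, the kernel of an image with itself equals its number of features. Two features with the same visual word in opposite corners match only at level 0, so with `levels=2` they contribute 1/4. A Gram matrix built from `spm_kernel` could then be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. This is an illustrative choice, not necessarily the toolkit used in the thesis.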
Library of Congress Subject Headings
Image processing--Digital techniques; Images, Photographic--Classification; Image analysis; Dirichlet forms
Publication Date
2010
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Gaborski, Roger
Recommended Citation
Yin, Haohui, "Scene classification using spatial pyramid matching and hierarchical Dirichlet processes" (2010). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/248
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS
Comments
Note: imported from RIT’s Digital Media Library running on DSpace to RIT Scholar Works in December 2013.