Abstract
Over the past decades, deep learning methods have emerged as the predominant approach in computer vision. Most approaches rely on complex interactions among pixels, which require heavy computation. Human vision, however, possesses an attention mechanism that selectively focuses on relevant parts of the visual field while disregarding irrelevant information. This can be likened to a clustering approach, in which individual pixels are decomposed and reorganized into relevant concepts to address various tasks. This dissertation explores vision clustering by integrating prototypical learning with Transformer architectures. At its core is the Prototypical Vision Clustering Transformer, a framework that combines the rigor of classical clustering algorithms with the dynamic capabilities of queries within Transformer models. The framework rests on the hypothesis that machine vision can better approximate human perceptual accuracy by abstracting visual characteristics into refined prototypical forms. Using mechanisms such as Cross-Attention Prototyping, the research redefines traditional attention to support an expectation-maximization clustering approach, with the aim of improving the fidelity and granularity of prototype mappings. The resulting framework is designed to interpret and categorize complex visual data with high precision. Together, these contributions outline an approach to prototypical vision clustering and a foundation for future work on automated visual systems that aspire to parallel the subtleties of human visual discernment.
Library of Congress Subject Headings
Computer vision--Data processing; Optical pattern recognition; Image processing--Digital techniques; Deep learning (Machine learning); Computer algorithms
Publication Date
12-20-2024
Document Type
Dissertation
Student Type
Graduate
Degree Name
Electrical and Computer Engineering (Ph.D)
Department, Program, or Center
Electrical and Computer Engineering Technology
College
Kate Gleason College of Engineering
Advisor
None provided
Recommended Citation
Liang, James Chenhao, "Toward Prototypical Vision Clustering" (2024). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12002
Campus
RIT – Main Campus
Plan Codes
ECE-PHD