Abstract

Over the past decades, deep learning methods have emerged as the predominant approach in computer vision. Most of these approaches rely on complex interactions among pixels and involve heavy computation. Human vision, however, possesses an attention mechanism that selectively focuses on relevant parts of the visual field while disregarding irrelevant information. This can be likened to a clustering approach, in which individual pixels are decomposed and reorganized into relevant concepts to address various tasks. This dissertation explores vision clustering by integrating prototypical learning with Transformer architectures. At its core is the Prototypical Vision Clustering Transformer, a framework that integrates the rigor of classical clustering algorithms with the dynamic capabilities of queries within Transformer models. The framework is predicated on the hypothesis that machine vision can better mimic human perceptual accuracy by abstracting visual characteristics into refined prototypical forms. Employing mechanisms such as Cross-Attention Prototyping, the research redefines traditional attention paradigms to yield an expectation-maximization clustering approach, thereby improving the fidelity and granularity of prototype mappings. These methodological advances produce a robust framework for interpreting and categorizing complex visual data with high precision. Together, they outline an approach to prototypical vision clustering and lay groundwork for future automated visual systems that aspire to parallel the subtleties of human visual discernment.

Library of Congress Subject Headings

Computer vision--Data processing; Optical pattern recognition; Image processing--Digital techniques; Deep learning (Machine learning); Computer algorithms

Publication Date

12-20-2024

Document Type

Dissertation

Student Type

Graduate

Degree Name

Electrical and Computer Engineering (Ph.D.)

Department, Program, or Center

Electrical and Computer Engineering Technology

College

Kate Gleason College of Engineering

Advisor

None provided

Campus

RIT – Main Campus

Plan Codes

ECE-PHD
