Toward Prototypical Vision Clustering

Abstract

Over the past decades, deep learning methods have emerged as the predominant approach in computer vision. Most of these approaches rely on complex interactions among pixels, which involve heavy computation. Human vision, however, possesses a distinctive attention mechanism that selectively focuses on relevant parts of the visual field while disregarding irrelevant information. This can be likened to a clustering process, in which individual pixels are decomposed and reorganized into meaningful concepts to address various tasks. This dissertation explores the frontier of vision clustering by integrating prototypical learning with Transformer architectures, marking a significant paradigm shift in how vision clustering is approached. At the heart of this work is the Prototypical Vision Clustering Transformer, a framework that combines the rigor of classical clustering algorithms with the dynamic capabilities of queries in Transformer models. This combination rests on the hypothesis that machine vision can more closely approach human perceptual accuracy by abstracting visual characteristics into refined prototypes. Through mechanisms such as Cross-Attention Prototyping, the research reinterprets the conventional attention mechanism as an expectation-maximization clustering procedure, thereby improving the fidelity and granularity of prototype mappings. These methodological advances yield a robust framework capable of interpreting and categorizing complex visual data with unprecedented precision. Taken together, this research outlines a new approach to prototypical vision clustering and lays groundwork for future automated visual systems that aim to match the subtleties of human visual discernment.
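To make the idea of "attention as expectation-maximization clustering" concrete, the following is a minimal, generic sketch and not the dissertation's Cross-Attention Prototyping implementation. It assumes that pixel features and prototypes are plain vectors, that similarity is a scaled dot product, and that the softmax is taken over the prototype axis (so each pixel's weights form a soft cluster assignment). The function names `cross_attention_em` and `assign_soft` and the `temperature` parameter are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def assign_soft(pixels, protos, temperature):
    """E-step (assumed form): each pixel attends to every prototype; the softmax
    runs over the prototype axis, so each pixel's weights sum to 1."""
    return [softmax([dot(x, c) / temperature for c in protos]) for x in pixels]


def cross_attention_em(pixels, prototypes, iters=10, temperature=1.0):
    """Hypothetical EM-style soft clustering of pixel features into prototypes."""
    protos = [list(p) for p in prototypes]
    dim = len(pixels[0])
    for _ in range(iters):
        resp = assign_soft(pixels, protos, temperature)
        # M-step: each prototype becomes the assignment-weighted mean of pixel features
        new_protos = []
        for k, old in enumerate(protos):
            weights = [r[k] for r in resp]
            total = sum(weights)
            if total < 1e-12:
                new_protos.append(old)  # no pixel claimed this prototype; keep it
                continue
            new_protos.append(
                [sum(w * x[d] for w, x in zip(weights, pixels)) / total for d in range(dim)]
            )
        protos = new_protos
    # Hard labels from one final E-step with the updated prototypes
    resp = assign_soft(pixels, protos, temperature)
    labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
    return protos, labels


if __name__ == "__main__":
    pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    protos, labels = cross_attention_em(pixels, [[1.0, 0.0], [0.0, 1.0]], iters=5, temperature=0.1)
    print(labels)  # [0, 0, 1, 1]
```

Here the prototypes play the role of queries and the pixels act as keys and values: the E-step is an attention map normalized per pixel, and the M-step is the attention-weighted aggregation that updates each query, which is what lets a Transformer-style update be read as a clustering iteration.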

Library of Congress Subject Headings

Computer vision--Data processing; Optical pattern recognition; Image processing--Digital techniques; Deep learning (Machine learning); Computer algorithms

Publication Date

12-20-2024

Document Type

Dissertation

Student Type

Graduate

Degree Name

Electrical and Computer Engineering (Ph.D)

Department, Program, or Center

Electrical and Computer Engineering Technology

College

Kate Gleason College of Engineering

Advisor

None provided

Recommended Citation

Liang, James Chenhao, "Toward Prototypical Vision Clustering" (2024). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12002

Campus

RIT – Main Campus

Plan Codes

ECE-PHD
