Abstract
The emergence of large, general-purpose foundation models has sparked significant interest in the broader machine learning community. Among the many models being released, the Segment Anything Model (SAM) has demonstrated exceptional capabilities for object segmentation in various settings. Given its expansive training data, SAM has been applied to image segmentation across downstream tasks ranging from tumour segmentation to aerial object detection. However, the majority of SAM's pretraining data consists of naturally-occurring images, whose characteristics differ significantly from images of tumours or images taken by drones. To align SAM with these previously unseen domains, the model must be fine-tuned so that it leverages its prior knowledge while learning generalizable features from new images, thereby improving performance on a given target dataset. However, given their extremely large size and number of parameters, foundation models such as SAM are too costly to adapt with traditional fine-tuning methods. To overcome this limitation, a new family of methods known as Parameter-Efficient Fine-Tuning (PEFT) techniques has emerged to effectively and efficiently tailor these large models to application domains outside their training data. While there has been considerable research on developing new PEFT techniques, different methods modify a model's representation in different ways, making the selection of the most appropriate method for a particular domain of interest a non-trivial task. To this end, we propose a new framework, Mixture-of-PEFTs (MoPEFT), inspired by traditional Mixture-of-Experts (MoE) methodologies, and use it to fine-tune SAM. Our MoPEFT framework incorporates three different PEFT techniques as submodules and learns to dynamically activate the ones that are best suited for a given data-task setup. We evaluate our method on the Segment Anything Model across 22 datasets spread over 5 domains and show that MoPEFT consistently outperforms other fine-tuning methods on the MESS benchmark.
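
To make the MoE-over-PEFT idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the thesis implementation: a frozen base projection (standing in for a SAM encoder layer) is combined with a learned gate that mixes the outputs of several PEFT branches. The specific branches shown (LoRA, a bottleneck adapter, and a learned bias shift) and all names are assumptions for illustration; the thesis may use a different set of submodules and routing scheme.

import torch
import torch.nn as nn

class LoRABranch(nn.Module):
    # Low-rank update: down-project then up-project, initialized to a zero update.
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        return self.up(self.down(x))

class AdapterBranch(nn.Module):
    # Bottleneck adapter: down-project, nonlinearity, up-project.
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(),
                                 nn.Linear(bottleneck, dim))

    def forward(self, x):
        return self.net(x)

class BiasBranch(nn.Module):
    # Learned additive shift applied to every token.
    def __init__(self, dim):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.shift.expand_as(x)

class MoPEFTLayer(nn.Module):
    # Frozen base projection plus a gated mixture of PEFT updates (MoE-style routing).
    def __init__(self, dim):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.weight.requires_grad_(False)  # base (SAM) weights stay frozen
        self.base.bias.requires_grad_(False)
        self.branches = nn.ModuleList([LoRABranch(dim), AdapterBranch(dim), BiasBranch(dim)])
        self.gate = nn.Linear(dim, len(self.branches))  # router over PEFT "experts"

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)  # per-token mixture weights
        update = sum(w.unsqueeze(-1) * branch(x)
                     for w, branch in zip(weights.unbind(-1), self.branches))
        return self.base(x) + update

# Usage: tokens from a frozen encoder block, shape (batch, num_tokens, dim)
layer = MoPEFTLayer(dim=256)
tokens = torch.randn(2, 64, 256)
out = layer(tokens)  # (2, 64, 256)

In this sketch only the PEFT branches and the gate are trainable, so the parameter count added on top of the frozen backbone stays small while the gate can weight each branch differently per token and per dataset.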
Publication Date
8-2024
Document Type
Thesis
Student Type
Graduate
Degree Name
Data Science (MS)
Department, Program, or Center
Software Engineering, Department of
College
Golisano College of Computing and Information Sciences
Advisor
Andreas Savakis
Advisor/Committee Member
Travis Desell
Advisor/Committee Member
Qi Yu
Recommended Citation
Sahay, Rajat, "A Mixture-of-Experts Approach to Fine-Tuning the Segment Anything Model" (2024). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/11880
Campus
RIT – Main Campus