Abstract

The Connectionist Temporal Classification (CTC) loss is the most commonly used loss function in Optical Music Recognition (OMR). However, OMR suffers from a massive class imbalance problem, which is exacerbated by the spiky distribution problem of CTC loss: the blank token introduced by CTC is vastly overpredicted and appears in timesteps where predicting a non-blank token would make more sense, since CTC collapses repeated tokens into a single token. This work posits that alternative loss functions that optimize for increased entropy of the prior probability distribution output by the model will lead to better generalization and lower error rates. The three main loss functions tested are FocalCTC, SR-CTC, and EnCTC, each of which optimizes for increased entropy in a different aspect of the estimated prior distribution. Experiments are conducted on all three; of these, FocalCTC and EnCTC show an improvement over baseline CTC.

Publication Date

12-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Richard Zanibbi

Advisor/Committee Member

Richard Lange

Advisor/Committee Member

Joe Geigel

Campus

RIT – Main Campus
