Abstract
Connectionist Temporal Classification (CTC) is the most commonly used loss function in Optical Music Recognition (OMR). However, OMR suffers from severe class imbalance, and CTC's spiky distribution problem exacerbates it: the blank token that CTC introduces is vastly overpredicted, appearing at timesteps where a non-blank token would be the more sensible prediction, given that CTC collapses repeated tokens into a single token. This work posits that alternative loss functions to CTC that optimize for increased entropy in the model's prior probability distribution output will lead to better generalization and lower error rates. The three main loss functions tested are FocalCTC, SR-CTC, and EnCTC, each of which optimizes for increased entropy in a different aspect of the estimated prior distribution. Experiments are conducted on all three; FocalCTC and EnCTC both show an improvement over baseline CTC.
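As a rough illustration of the collapse rule and the entropy notion mentioned above, the following sketch may help. It is not taken from the thesis: the blank symbol "-", the note labels "C4" and "E4", and the example probability vectors are all assumptions chosen for illustration. It shows that a blank-dominated ("spiky") path and a path that repeats non-blank tokens collapse to the same label sequence, and that a sharply peaked per-timestep distribution has lower entropy than a flatter one.

```python
import math
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol, chosen for this sketch only


def ctc_collapse(path, blank=BLANK):
    """Apply the standard CTC collapse: merge adjacent repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(path)]
    return [tok for tok in merged if tok != blank]


def entropy(probs):
    """Shannon entropy (in nats) of one per-timestep probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Two illustrative frame-level paths over 8 timesteps (labels are made up).
spiky_path = ["-", "-", "C4", "-", "-", "-", "E4", "-"]
smooth_path = ["C4", "C4", "C4", "-", "E4", "E4", "E4", "E4"]

# Both paths decode to the same label sequence after collapsing.
spiky_labels = ctc_collapse(spiky_path)
smooth_labels = ctc_collapse(smooth_path)

# A sharply peaked distribution versus a flatter one over three classes.
peaked_entropy = entropy([0.98, 0.01, 0.01])
flatter_entropy = entropy([0.60, 0.20, 0.20])
```

Because many frame-level paths map to one label sequence, a model trained with plain CTC is free to concentrate probability on the blank-heavy ones; the high-entropy alternatives named in the abstract are described as pushing against that concentration.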
Publication Date
12-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science, Department of
College
Golisano College of Computing and Information Sciences
Advisor
Richard Zanibbi
Advisor/Committee Member
Richard Lange
Advisor/Committee Member
Joe Geigel
Recommended Citation
Saynganthone, Hritik, "An Examination of High-Entropy Alternatives of Connectionist Temporal Classification Loss for Optical Music Recognition Using Convolutional Recurrent Neural Networks" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12492
Campus
RIT – Main Campus
