Abstract

Large language models (LLMs) have demonstrated strong performance on a range of reasoning tasks; however, their reliability often depends not only on model size or training data but also on inference-time strategies. Existing inference-time methods are typically evaluated in isolation and under differing experimental assumptions, making it difficult to draw systematic conclusions about their relative effectiveness. This thesis presents a controlled empirical study of inference-time scaling strategies for LLMs under fixed inference-time compute budgets. The findings reveal that no single strategy dominates uniformly. PRM-guided selection with the IBM Granite verifier achieves the highest absolute accuracy across arithmetic and compositional tasks. Multi-agent debate with heterogeneous critic-solver pairs achieves the highest overall accuracy on object counting, surpassing PRM guidance. Majority voting provides reliable, compute-efficient improvements across most settings when answer diversity is sufficient; beam search helps, but its gains rarely grow with beam width. We also show that verifier choice and aggregation method (notably, last-step aggregation) substantially affect PRM performance, and that model heterogeneity in debate consistently outperforms self-debate.

Publication Date

4-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Christopher Homan

Advisor/Committee Member

Eduardo Coelho De Lima

Advisor/Committee Member

Richard Lange

Campus

RIT – Main Campus
