Abstract

This report presents a novel influence function for gradient-boosted decision trees (GBDTs), a widely used class of predictive models. Influence estimation aims to quantify how individual training samples affect a model’s predictions, offering valuable insights for model debugging, data quality analysis, and interpretability. Existing influence functions for GBDTs—such as LeafInfluence, LeafInfluenceSP, and BoostIn—have shown varying degrees of success, with BoostIn currently recognized as the state-of-the-art in terms of estimation quality and computational efficiency. In this work, we propose BoostInLCA, a new influence function that extends BoostIn by incorporating information from non-leaf nodes via the Lowest Common Ancestor (LCA) path, thereby relaxing the strict leaf-node matching constraint of BoostIn. Our source code is available at https://github.com/AnuragChoubey95/abcboost_influence.git. We integrate BoostInLCA into the open-source ABCBoost framework and conduct an extensive empirical evaluation comparing BoostInLCA to BoostIn across five distinct experiments on 13 real-world tabular datasets. Our results show that BoostInLCA outperforms BoostIn in 3 out of the 5 experiments, while performing comparably in the remaining cases.
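To illustrate the LCA-path idea the abstract describes, the sketch below finds the lowest common ancestor of the leaves reached by two examples in a single decision tree: the deeper the LCA, the longer the decision path the two examples share, even when they land in different leaves. This is a minimal, hypothetical illustration (the `Node` class and `lca_depth` function are invented here for exposition), not the BoostInLCA implementation from the ABCBoost repository.

```python
class Node:
    """A node in a binary decision tree (toy representation for illustration)."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, leaf_id=None):
        self.feature = feature      # index of the feature to split on (None for leaves)
        self.threshold = threshold  # route left if x[feature] <= threshold
        self.left = left
        self.right = right
        self.leaf_id = leaf_id      # non-None only for leaf nodes

def root_to_leaf_path(node, x):
    """Return the list of nodes visited when routing example x from the root to a leaf."""
    path = [node]
    while node.leaf_id is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
        path.append(node)
    return path

def lca_depth(root, x_a, x_b):
    """Depth of the lowest common ancestor of the leaves reached by x_a and x_b
    (the root has depth 0). The LCA is the last node shared by both paths."""
    depth = -1
    for a, b in zip(root_to_leaf_path(root, x_a), root_to_leaf_path(root, x_b)):
        if a is not b:
            break
        depth += 1
    return depth

# Toy depth-2 tree: root splits on feature 0, its right child splits on feature 1.
leaf0, leaf1, leaf2 = Node(leaf_id=0), Node(leaf_id=1), Node(leaf_id=2)
right = Node(feature=1, threshold=0.5, left=leaf1, right=leaf2)
root = Node(feature=0, threshold=0.5, left=leaf0, right=right)
```

Under strict leaf matching (as in BoostIn), a pair of examples in different leaves contributes nothing; an LCA-based view instead grades how much of the tree's routing they share.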

Library of Congress Subject Headings

Decision trees; Hierarchical clustering (Cluster analysis); Deep learning (Machine learning)

Publication Date

5-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

College

Golisano College of Computing and Information Sciences

Advisor

Weijie Zhao

Advisor/Committee Member

Arthur Nunes

Advisor/Committee Member

Leon Reznik

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
