Abstract
This report presents a novel influence function for gradient-boosted decision trees (GBDTs), a widely used class of predictive models. Influence estimation aims to quantify how individual training samples affect a model's predictions, offering valuable insights for model debugging, data quality analysis, and interpretability. Existing influence functions for GBDTs, such as LeafInfluence, LeafInfluenceSP, and BoostIn, have shown varying degrees of success, with BoostIn currently recognized as the state of the art in both estimation quality and computational efficiency. In this work, we propose BoostInLCA, a new influence function that extends BoostIn by incorporating information from non-leaf nodes via the Lowest Common Ancestor (LCA) path, thereby relaxing BoostIn's strict leaf-node matching constraint. Our source code is available at https://github.com/AnuragChoubey95/abcboost_influence.git. We integrate BoostInLCA into the open-source ABCBoost framework and conduct an extensive empirical evaluation comparing BoostInLCA to BoostIn across five distinct experiments on 13 real-world tabular datasets. Our results show that BoostInLCA outperforms BoostIn in 3 out of the 5 experiments, while performing comparably in the remaining cases.
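To make the LCA-path idea concrete, the sketch below shows, for a single decision tree, how one might measure the depth at which the root-to-leaf paths of a training sample and a test sample diverge. This is a minimal illustration of the concept described in the abstract, not the BoostInLCA implementation in ABCBoost; the Node structure, its field names, and the lca_depth helper are hypothetical.

```python
# Minimal sketch of the LCA-path notion on one decision tree.
# Not the thesis implementation: Node layout and function names are illustrative only.

from dataclasses import dataclass
from typing import Optional, List


@dataclass
class Node:
    feature: int = -1                 # feature index used for the split (-1 for a leaf)
    threshold: float = 0.0            # split threshold
    left: Optional["Node"] = None     # child taken when x[feature] <= threshold
    right: Optional["Node"] = None    # child taken when x[feature] > threshold


def root_to_leaf_path(root: Node, x: List[float]) -> List[Node]:
    """Return the sequence of nodes visited when routing x through the tree."""
    path, node = [], root
    while node is not None:
        path.append(node)
        if node.left is None and node.right is None:   # reached a leaf
            break
        node = node.left if x[node.feature] <= node.threshold else node.right
    return path


def lca_depth(root: Node, x_train: List[float], x_test: List[float]) -> int:
    """Depth of the lowest common ancestor of the leaves reached by the two samples.

    Strict leaf matching only credits a training sample when both samples land in
    the same leaf; the LCA depth instead grades how much of the root-to-leaf path
    the two samples share (root has depth 0).
    """
    p_train = root_to_leaf_path(root, x_train)
    p_test = root_to_leaf_path(root, x_test)
    shared = 0
    for a, b in zip(p_train, p_test):
        if a is b:
            shared += 1
        else:
            break
    return shared - 1
```

Under these assumptions, two samples that reach the same leaf share their entire path, while samples that split apart near the root share only a short prefix, which is the kind of graded, non-leaf information the abstract refers to.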
Library of Congress Subject Headings
Decision trees; Hierarchical clustering (Cluster analysis); Deep learning (Machine learning)
Publication Date
5-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
College
Golisano College of Computing and Information Sciences
Advisor
Weijie Zhao
Advisor/Committee Member
Arthur Nunes
Advisor/Committee Member
Leon Reznik
Recommended Citation
Choubey, Anurag, "Enhancing Influence Estimation in Gradient Boosted Decision Trees Through Hierarchical Analysis" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12099
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS