Abstract

Knowledge graphs represent real-world data in a directed graph format in which two entities connected by a predicate represent one fact. Link prediction models predict new relationships from existing entities and predicates, and are trained on benchmarking knowledge graphs. These benchmarking knowledge graphs contain redundancies that are believed to artificially inflate link prediction results. It is assumed in the link prediction field that a model that scores well on a highly redundant benchmarking knowledge graph may not perform equally well on other, more complex knowledge graphs. This research introduces new analysis methods and evaluation metrics for measuring redundancies in knowledge graphs. We use Horn rules to define five redundancy types: near-duplicate, near-reverse, symmetric, transitive, and Cartesian product. The support and confidence of these rules quantify the redundancy. Using these quantified results, we pursue two main goals: (1) measuring the level of redundancy in benchmarking knowledge graphs and (2) offsetting link prediction results based on predicate-specific redundancies within knowledge graphs.

Reporting redundancy levels with a single metric (1) confirms the high redundancy of FB15k, WN18, and YAGO3-10, which are known to be highly redundant. We also find high levels of redundancy for BioKG, which is a new result. To offset link prediction results (2), we use predicate-specific redundancy values as weights for several metrics borrowed from the information retrieval field: RR, R@k, and BPM@k. This method decreased the values of every metric for FB15k, WN18RR, and YAGO3-10, indicating that redundancies artificially inflate link prediction scores on these knowledge graphs. However, Hetionet and NELL-995 show increased values across every metric, indicating that redundancies do not have the same impact on those knowledge graphs. The remaining knowledge graphs show mixed results across link prediction models and metrics.

These results indicate that redundancies in benchmarking knowledge graphs may not have a uniform impact across knowledge graphs, link prediction models, and evaluation metrics. The new methods introduced to measure redundancy provide important insights for interpreting link prediction behavior. Awareness of the level of redundancy in a knowledge graph becomes all the more important, since redundancies lead to unpredictable link prediction results. Because we cannot demonstrate a consistent impact of redundancies, the lower performance of link prediction on knowledge graphs with redundancy removed cannot be explained by a lack of redundancy. We argue that removing redundancy from knowledge graphs is not a valid way of handling it and should not be used as a solution to the problems that redundancies present.
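As a concrete illustration of how support and confidence quantify one of the five redundancy types, the sketch below computes both values for the symmetric Horn rule p(x, y) ⇒ p(y, x) over a toy triple set. The function name and data layout are illustrative assumptions, not the thesis's implementation.

```python
# Minimal sketch (hypothetical helper): support and confidence of the
# symmetric Horn rule p(x, y) => p(y, x) for a single predicate.
# Support = number of (x, y) pairs for which both body and head hold;
# confidence = support / number of pairs satisfying the body.

def symmetric_rule_stats(triples, predicate):
    """Return (support, confidence) of p(x, y) => p(y, x) for `predicate`."""
    pairs = {(h, t) for h, p, t in triples if p == predicate}
    body = len(pairs)                                 # pairs matching the rule body
    support = sum((t, h) in pairs for h, t in pairs)  # pairs whose reverse also exists
    confidence = support / body if body else 0.0
    return support, confidence

# Toy graph: "married_to" is fully symmetric here, "knows" is not.
kg = [
    ("alice", "married_to", "bob"),
    ("bob", "married_to", "alice"),
    ("alice", "knows", "carol"),
]
print(symmetric_rule_stats(kg, "married_to"))  # (2, 1.0)
print(symmetric_rule_stats(kg, "knows"))       # (0, 0.0)
```

Similarly, a sketch of the offsetting idea: predicate-specific redundancy values reweight the reciprocal rank (RR) of each test triple before averaging. The weighting scheme weight = 1 − redundancy is an assumption chosen for illustration; the thesis's exact formula may differ.

```python
# Minimal sketch (assumed weighting scheme): redundancy-weighted mean
# reciprocal rank, so highly redundant predicates contribute less.

def weighted_mrr(ranks_by_predicate, redundancy):
    """ranks_by_predicate: list of (predicate, rank) for test triples;
    redundancy: mapping predicate -> redundancy value in [0, 1]."""
    total = 0.0
    for p, rank in ranks_by_predicate:
        weight = 1.0 - redundancy.get(p, 0.0)  # assumed: weight = 1 - redundancy
        total += weight * (1.0 / rank)
    return total / len(ranks_by_predicate)

ranks = [("married_to", 1), ("knows", 3)]
print(weighted_mrr(ranks, {"married_to": 0.9, "knows": 0.1}))
# (0.1 * 1.0 + 0.9 * (1/3)) / 2 = 0.2
```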

Publication Date

4-28-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Carlos Rivero

Advisor/Committee Member

Zachary Butler

Advisor/Committee Member

Matthew Fluet

Campus

RIT – Main Campus
