Abstract

Knowledge graphs support many applications, such as product recommendation and web search. However, knowledge graphs are typically incomplete. Fact-prediction algorithms aim to expand knowledge graphs by predicting missing facts. These algorithms train models on positive facts, which are present in the knowledge graph at hand, and on negative facts, which are not present and are obtained by corrupting information in the positive facts. Although it is generally assumed that negative facts drive the accuracy of fact-prediction algorithms, this assumption has not yet been thoroughly examined. In this work, we investigate whether negative facts indeed drive fact-prediction accuracy by employing different negative fact generation strategies in translation-based algorithms, a popular branch of fact-prediction algorithms. We propose a new negative fact generation strategy that uses knowledge from immediate neighbors to corrupt a fact. Our extensive experiments on well-known benchmark datasets show that negative facts indeed drive the accuracy of fact-prediction models, and that this accuracy changes dramatically depending on the negative fact generation strategy used to train and test the models. Assuming that the strategies generate negative facts with different levels of semantic plausibility, we observe that models trained using certain strategies cannot distinguish missing facts from nonsensical or semantically related facts. Additionally, our results show that the accuracy of models trained using the local-closed world assumption, the most common negative fact generation strategy, can be matched by a combination of neighborhood-based and nonsensical strategies. This implies that fact-prediction algorithms can be trained on individual subgraphs instead of the whole knowledge graph, which opens new research avenues.
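
To make the idea of corrupting a positive fact concrete, the following is a minimal sketch and not the thesis's implementation. It assumes facts are (head, relation, tail) triples and contrasts two ways of choosing replacement entities: drawing from every entity in the graph, and drawing only from the immediate neighbors of the fact's head and tail. The toy graph, the function names (`neighbors`, `candidate_negatives`, `sample_negative`), and the exact definition of "neighbor" are all assumptions made for illustration; the abstract does not specify these details.

```python
import random

# Toy knowledge graph (hypothetical data): a set of (head, relation, tail) triples.
FACTS = {
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("alice", "lives_in", "paris"),
    ("acme", "based_in", "berlin"),
    ("tokyo", "capital_of", "japan"),
}
ENTITIES = {h for h, _, _ in FACTS} | {t for _, _, t in FACTS}


def neighbors(entity, known_facts):
    """Entities that share at least one fact with `entity` (assumed notion of neighbor)."""
    result = set()
    for head, _, tail in known_facts:
        if head == entity:
            result.add(tail)
        if tail == entity:
            result.add(head)
    result.discard(entity)
    return result


def candidate_negatives(fact, replacements, known_facts):
    """Corrupt the head or the tail with each replacement entity.

    Corrupted triples that already appear in the graph are dropped,
    so every returned triple is absent from the known facts.
    """
    head, relation, tail = fact
    candidates = []
    for entity in sorted(replacements):
        for negative in ((entity, relation, tail), (head, relation, entity)):
            if negative not in known_facts:
                candidates.append(negative)
    return candidates


def sample_negative(fact, replacements, known_facts, rng):
    """Pick one corrupted triple at random, or None if none can be built."""
    candidates = candidate_negatives(fact, replacements, known_facts)
    return rng.choice(candidates) if candidates else None


positive = ("alice", "works_at", "acme")

# Strategy A (assumed baseline): any entity in the graph may replace head or tail.
random_negatives = candidate_negatives(positive, ENTITIES, FACTS)

# Strategy B (assumed reading of "neighborhood-based"): only immediate
# neighbors of the head or the tail may be used as replacements.
nearby = neighbors(positive[0], FACTS) | neighbors(positive[1 + 1], FACTS)
neighborhood_negatives = candidate_negatives(positive, nearby, FACTS)

rng = random.Random(0)
one_negative = sample_negative(positive, nearby, FACTS, rng)
```

In this sketch the neighborhood-based strategy never proposes `tokyo` or `japan`, because neither shares a fact with `alice` or `acme`. It therefore needs only the subgraph around the fact being corrupted, whereas the first strategy needs the full entity set.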

Library of Congress Subject Headings

Data mining; Statistics; Control theory

Publication Date

11-2020

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Carlos R. Rivero

Advisor/Committee Member

Zack Butler

Advisor/Committee Member

Ifeoma Nwogu

Recommended Citation

Bansal, Iti, "Analyzing the Impact of Negative Sampling on Fact-Prediction Algorithms" (2020). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10610

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
