Abstract
Knowledge graphs are useful for many applications, such as product recommendation and web search engines. However, knowledge graphs are typically incomplete. Fact-prediction algorithms aim to expand knowledge graphs by predicting missing facts. These algorithms train models using the positive facts present in the knowledge graph at hand together with generated negative facts that are not present in it. Negative facts are obtained by corrupting information in the positive facts. Although it is generally assumed that negative facts drive the accuracy of fact-prediction algorithms, this assumption has not yet been thoroughly examined. In this work, we investigate whether negative facts indeed drive fact-prediction accuracy by employing different negative fact generation strategies in translation-based algorithms, a popular branch of fact-prediction algorithms. We propose a new negative fact generation strategy that uses knowledge from immediate neighbors to corrupt a fact. Our extensive experiments on well-known benchmark datasets show that negative facts indeed drive the accuracy of fact-prediction models, and that this accuracy changes dramatically depending on the negative fact generation strategy used for training and testing models. Assuming that the strategies generate negative facts with different levels of semantic plausibility, we observe that models trained using certain strategies are not able to distinguish missing facts from nonsensical or semantically related facts. Additionally, our results show that the accuracy of models trained using the local closed-world assumption, the most common negative fact generation strategy, can be achieved with a combination of neighborhood-based and nonsensical strategies. This implies that fact-prediction algorithms can be trained using individual subgraphs instead of the whole knowledge graph, opening new research avenues.
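To make the idea of corrupting a positive fact concrete, the following is a minimal Python sketch and not the thesis's implementation. It assumes facts are (head, relation, tail) triples and contrasts two hypothetical strategies: replacing the tail with any entity in the graph, and replacing it with an entity drawn from the immediate neighbors of the fact's head and tail. The toy graph, the function names, and the exact definition of "neighbor" are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict

# Toy knowledge graph of (head, relation, tail) facts.
# All entities and relations here are hypothetical.
FACTS = {
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "rochester"),
    ("alice", "lives_in", "rochester"),
    ("carol", "lives_in", "boston"),
}


def entities(facts):
    """Return every entity that appears as a head or a tail."""
    return sorted({e for h, _, t in facts for e in (h, t)})


def neighbors(facts):
    """Map each entity to the entities it shares at least one fact with."""
    adj = defaultdict(set)
    for h, _, t in facts:
        adj[h].add(t)
        adj[t].add(h)
    return adj


def corrupt_tail(fact, candidates, facts, rng):
    """Swap the tail for a candidate so the result is not a known fact.

    Returns None when no candidate yields a valid negative.
    """
    h, r, t = fact
    valid = [c for c in candidates
             if c != t and c != h and (h, r, c) not in facts]
    return (h, r, rng.choice(valid)) if valid else None


def random_negative(fact, facts, rng):
    """Assumed baseline: corrupt with any entity in the whole graph."""
    return corrupt_tail(fact, entities(facts), facts, rng)


def neighborhood_negative(fact, facts, rng):
    """Assumed neighborhood variant: corrupt only with entities adjacent
    to the fact's head or tail."""
    h, _, t = fact
    adj = neighbors(facts)
    return corrupt_tail(fact, sorted(adj[h] | adj[t]), facts, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    fact = ("alice", "works_at", "acme")
    print("random:      ", random_negative(fact, FACTS, rng))
    print("neighborhood:", neighborhood_negative(fact, FACTS, rng))
```

In this sketch the neighborhood variant only needs the subgraph around the fact being corrupted, which is in the spirit of the abstract's closing remark about training on individual subgraphs, while the random variant needs the full entity list.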
Library of Congress Subject Headings
Data mining; Statistics; Control theory
Publication Date
11-2020
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Carlos R. Rivero
Advisor/Committee Member
Zack Butler
Advisor/Committee Member
Ifeoma Nwogu
Recommended Citation
Bansal, Iti, "Analyzing the Impact of Negative Sampling on Fact-Prediction Algorithms" (2020). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/10610
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS