Abstract
A knowledge base is a large collection of real-world facts that can be interpreted by both humans and machines. Most knowledge bases are incomplete: some are extracted from natural language sources that contain gaps, while others are developed and extended manually. Rule mining is the process of discovering rules that succinctly capture the inference patterns present in a knowledge base. Executing these rules infers new, missing facts that help complete the knowledge base. The mined rules also help identify errors in the knowledge base and understand its content better. This thesis deals with one popular rule mining algorithm, AMIE.

Knowledge bases do not contain negative facts, so to measure the quality of the mined rules, we need to deduce negative evidence from the actual (positive) facts present in the knowledge base. The standard approach, the closed world assumption, treats the knowledge base as complete and considers any missing fact a negative. However, knowledge bases operate under the open world assumption, that is, missing information is treated as unknown rather than false. AMIE introduces a less restrictive measure, based on the partial completeness assumption, in which a missing fact is considered either negative or unknown depending on the positive facts present in the knowledge base. The confidence of a given rule is measured by counting the facts in the knowledge base that fit the rule. A rule contains multiple components (atoms), and each component is matched against the whole knowledge base. This confidence measure follows Prolog semantics, where different components of the rule can be matched to the same element of a fact. We observed that this way of measuring confidence does not always obtain the best result. In this thesis, we explore a new approach using Graph semantics, which prevents different components of the rule from sharing the same element of a fact, resulting in a confidence gain for certain rules.
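To make the difference between the confidence measures and the two semantics concrete, here is a minimal sketch on a hypothetical toy knowledge base. It assumes that Graph semantics means injective matching (distinct rule variables must bind to distinct entities) and applies the partial completeness assumption (PCA) to the subject of the head atom; the data and the helper names `matches` and `confidence` are illustrative only, not AMIE's code or the thesis's implementation.

```python
# Toy knowledge base of (subject, relation, object) triples (hypothetical data).
KB = {
    ("alice", "hasChild", "carol"),
    ("bob", "hasChild", "carol"),
    ("dave", "hasChild", "erin"),
    ("frank", "hasChild", "erin"),
    ("alice", "marriedTo", "bob"),
    ("dave", "marriedTo", "frank"),
}

# Rule: hasChild(?x, ?z) AND hasChild(?y, ?z) => marriedTo(?x, ?y); "?" marks a variable.
BODY = [("?x", "hasChild", "?z"), ("?y", "hasChild", "?z")]
HEAD = ("?x", "marriedTo", "?y")


def matches(atoms, kb, injective, binding=None):
    """Yield every variable binding that satisfies all atoms in the KB.

    injective=False: Prolog semantics, distinct variables may bind to the same entity.
    injective=True: assumed Graph semantics, distinct variables bind to distinct entities.
    """
    binding = {} if binding is None else binding
    if not atoms:
        yield dict(binding)
        return
    (s, rel, o), rest = atoms[0], atoms[1:]
    for fs, frel, fo in kb:
        if frel != rel:
            continue
        new, ok = dict(binding), True
        for term, value in ((s, fs), (o, fo)):
            if not term.startswith("?"):
                ok = term == value          # a constant must match exactly
            elif term in new:
                ok = new[term] == value     # an already-bound variable must agree
            elif injective and value in new.values():
                ok = False                  # entity already used by another variable
            else:
                new[term] = value
            if not ok:
                break
        if ok:
            yield from matches(rest, kb, injective, new)


def confidence(body, head, kb, injective, pca=False):
    """Standard (closed world) or PCA confidence of body => head."""
    hs, hrel, ho = head
    # Distinct head-variable pairs (x, y) predicted by the body.
    pairs = {(b[hs], b[ho]) for b in matches(body, kb, injective)}
    support = sum((x, hrel, y) in kb for x, y in pairs)
    if pca:
        # PCA on the head subject (an assumption of this sketch): a predicted pair
        # stays in the denominator only if x already has some fact for the head relation.
        known = {s for s, r, _ in kb if r == hrel}
        pairs = {(x, y) for x, y in pairs if x in known}
    return support / len(pairs) if pairs else 0.0


for name, injective in (("Prolog", False), ("Graph", True)):
    std_conf = confidence(BODY, HEAD, KB, injective)
    pca_conf = confidence(BODY, HEAD, KB, injective, pca=True)
    print(f"{name}: standard={std_conf}, PCA={pca_conf}")
# Prolog: standard=0.25, PCA=0.5
# Graph: standard=0.5, PCA=1.0
```

Under Prolog semantics the body also yields reflexive pairs such as (alice, alice), which can never be correct marriedTo predictions and pull the confidence down; requiring distinct bindings removes them, which is one way a rule can gain confidence under Graph semantics.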
Our experimental results show that we mine more rules when the confidence measure uses Graph semantics rather than Prolog semantics, and that the confidence of certain rules improves under Graph semantics. AMIE uses the information spread (functionality) around one component of the rule to dictate how confidence is measured. We demonstrate confidence gains for certain rules by altering the definition of functionality. In addition, we propose a new Apriori-based algorithm for rule mining. We conceptualize rules as graphs and generate unique graph patterns for rules of various sizes. We then expand these patterns instead of growing rules, which reduces the number of queries to the knowledge base.
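The notion of functionality can be illustrated in a few lines. This sketch uses the definition commonly given for AMIE, fun(r) = (number of distinct subjects of r) / (number of facts of r), together with its inverse over objects, and picks the more functional argument of the head relation as the side on which the PCA is applied. The altered definition proposed in the thesis is not reproduced here, and the relations and data are hypothetical.

```python
# Toy knowledge base of (subject, relation, object) triples (hypothetical data).
KB = {
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "paris"),
    ("carol", "livesIn", "rome"),
    ("alice", "hasChild", "carol"),
    ("alice", "hasChild", "dan"),
    ("bob", "hasChild", "erin"),
}


def _facts(kb, rel):
    """All (subject, object) pairs of one relation."""
    return [(s, o) for s, r, o in kb if r == rel]


def functionality(kb, rel):
    """fun(r) = #distinct subjects / #facts; 1.0 means one object per subject."""
    facts = _facts(kb, rel)
    return len({s for s, _ in facts}) / len(facts) if facts else 0.0


def inverse_functionality(kb, rel):
    """ifun(r) = #distinct objects / #facts; 1.0 means one subject per object."""
    facts = _facts(kb, rel)
    return len({o for _, o in facts}) / len(facts) if facts else 0.0


def pca_side(kb, rel):
    """Argument on which the PCA is applied: the more functional side of rel."""
    if functionality(kb, rel) >= inverse_functionality(kb, rel):
        return "subject"
    return "object"


for rel in ("livesIn", "hasChild"):
    print(rel, round(functionality(KB, rel), 2),
          round(inverse_functionality(KB, rel), 2), pca_side(KB, rel))
# livesIn 1.0 0.67 subject
# hasChild 0.67 1.0 object
```

Because the PCA side follows directly from these two ratios, changing how functionality is defined changes which facts count as negative evidence, and with it the resulting confidence.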
Library of Congress Subject Headings
Data mining; Natural language processing (Computer science)
Publication Date
5-9-2023
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Carlos R. Rivero
Advisor/Committee Member
Michael Minor
Advisor/Committee Member
Thomas J. Borrelli
Recommended Citation
Gangadhar, Bhaskar Krishna, "Rule Mining from Knowledge Bases: Semantics, Queries, and Estimations" (2023). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/11439
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS