Abstract

A knowledge base is a large collection of real-world facts that can be interpreted by both humans and machines. Most knowledge bases are incomplete: some are extracted from natural language sources that contain gaps, while others are manually developed and extended. Rule mining is the process of discovering rules that succinctly capture the inference patterns present in the knowledge base at hand. These rules can be executed to infer new, missing facts and so complete the knowledge base. The rules also help identify errors in the knowledge base and help understand its content better. This thesis deals with one popular rule mining algorithm, AMIE. Knowledge bases do not contain negative facts, so in order to measure the quality of the mined rules, we need to deduce negative evidence from the actual (positive) facts present in the knowledge base. In the standard approach, we assume the knowledge base is complete, and any missing information is considered negative. However, knowledge bases operate under the open world assumption, that is, missing information is treated as unknown. AMIE introduces a less restrictive measure in which facts are considered either negative or unknown depending on the positive facts present in the knowledge base. The confidence of a given rule is measured by counting the number of occurrences of facts in the knowledge base that fit the rule. A rule contains multiple components, and each component is matched against the whole knowledge base. This confidence measure follows Prolog semantics, where different components of the rule can share the same element of a fact. We observed that this approach to measuring confidence does not always obtain the best result. In this thesis, we explore a new approach using Graph semantics, which restricts different components of the rule from sharing the same element of a fact, resulting in a confidence gain for certain rules.
Our experimental results show that we mine more rules when we use Graph semantics for the confidence measure than when we use Prolog semantics. The confidence of certain rules also improves with Graph semantics. AMIE uses the information spread (functionality) around one component of the rule to dictate how the confidence will be measured. We demonstrate a gain in confidence for certain rules by altering the definition of functionality. In addition, we propose a new Apriori algorithm for rule mining. We conceptualize rules as graphs and generate unique graph patterns for rules of various sizes. We then expand these patterns instead of growing rules, which reduces the number of queries to the knowledge base.
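
The difference between the two semantics can be illustrated with a small sketch. It is not AMIE's implementation: the toy knowledge base, the rule, and the helper names (`match`, `confidence`) are hypothetical, the measure shown is the standard confidence rather than AMIE's less restrictive one, and "Graph semantics" is read here as requiring distinct variables to bind to distinct entities, which is an assumption based on the description above.

```python
# Toy knowledge base of (subject, predicate, object) facts. Hypothetical data.
KB = {
    ("alice", "hasParent", "carol"),
    ("bob", "hasParent", "carol"),
    ("alice", "siblingOf", "bob"),
    ("bob", "siblingOf", "alice"),
}

# Rule: hasParent(?x, ?z) AND hasParent(?y, ?z) => siblingOf(?x, ?y)
BODY = [("?x", "hasParent", "?z"), ("?y", "hasParent", "?z")]
HEAD = ("?x", "siblingOf", "?y")


def is_var(term):
    return term.startswith("?")


def match(atoms, kb, binding, injective):
    """Yield every variable binding that satisfies all atoms in kb.

    injective=False: Prolog-style matching, variables may share an entity.
    injective=True: assumed Graph-style matching, distinct variables must
    bind to distinct entities.
    """
    if not atoms:
        yield dict(binding)
        return
    (s, p, o), rest = atoms[0], atoms[1:]
    for fs, fp, fo in kb:
        if fp != p:
            continue
        new = dict(binding)
        ok = True
        for term, value in ((s, fs), (o, fo)):
            if not is_var(term):
                ok = term == value
            elif term in new:
                ok = new[term] == value
            elif injective and value in new.values():
                ok = False  # entity already used by another variable
            else:
                new[term] = value
            if not ok:
                break
        if ok:
            yield from match(rest, kb, new, injective)


def confidence(body, head, kb, injective):
    """Standard confidence: head-variable pairs confirmed by the KB, divided
    by all head-variable pairs produced by the body. Assumes both head terms
    are variables that also occur in the body."""
    hs, hp, ho = head
    body_pairs = {(b[hs], b[ho]) for b in match(body, kb, {}, injective)}
    support = {(x, y) for (x, y) in body_pairs if (x, hp, y) in kb}
    return len(support) / len(body_pairs) if body_pairs else 0.0


prolog_conf = confidence(BODY, HEAD, KB, injective=False)
graph_conf = confidence(BODY, HEAD, KB, injective=True)
```

In this toy case, Prolog-style matching also produces the pairs (alice, alice) and (bob, bob), which the knowledge base does not confirm as siblings, so the rule is penalized. Under the injective reading those pairs are never generated, which is one way a rule's confidence can rise.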

Library of Congress Subject Headings

Data mining; Natural language processing (Computer science)

Publication Date

5-9-2023

Document Type

Thesis

Student Type

Graduate

Degree Name

Computer Science (MS)

Department, Program, or Center

Computer Science (GCCIS)

Advisor

Carlos R. Rivero

Advisor/Committee Member

Michael Minor

Advisor/Committee Member

Thomas J. Borrelli

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS
