Abstract

The growing volume of biomedical literature available to researchers is creating a need for tools that extract information from these sources automatically. One kind of information of particular interest is the linkage between genes and diseases. These linkages can help researchers interpret large-scale genomics studies and make logical connections between gene expression levels and certain phenotypes. Finding and collecting this information at a practical scale requires automated methods of information extraction. In this paper, I propose a method for the automated extraction and database storage of linkages between genes and diseases from MEDLINE text, using a combination of term co-occurrence and natural language processing techniques. The method incorporates pre-defined lexicons for genes and diseases, tokenization, statistically driven part-of-speech tagging and chunking, and template matching based on a set of training templates to find relationship-containing statements in the MEDLINE text. An experiment on a test set of 50 abstracts demonstrates that this method can extract disease-gene relationships from MEDLINE text successfully, with a precision of 97% and a recall between 51% and 78%.
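
As a rough illustration of the co-occurrence and template-matching idea described in the abstract, the sketch below flags sentences in which a gene term and a disease term appear together, and marks whether a trigger phrase lies between them. It is not the thesis's implementation: the lexicons, the trigger phrases, and the function names are hypothetical placeholders, and the part-of-speech tagging and chunking steps are left out entirely.

```python
import re

# Hypothetical toy lexicons; the thesis's actual gene and disease
# lexicons are not reproduced here.
GENE_LEXICON = {"BRCA1", "TP53", "CFTR"}
DISEASE_LEXICON = {"breast cancer", "cystic fibrosis"}

# Hypothetical trigger phrases standing in for trained templates of the
# form: TERM ... <trigger> ... TERM.
TRIGGERS = ("associated with", "linked to", "implicated in")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_terms(sentence, lexicon):
    """Return (term, start, end) for the first whole-word, case-insensitive
    match of each lexicon term in the sentence."""
    hits = []
    for term in sorted(lexicon):
        pattern = r"\b" + re.escape(term) + r"\b"
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            hits.append((term, match.start(), match.end()))
    return hits


def extract_relationships(text):
    """Return (gene, disease, template_matched) for every gene/disease pair
    that co-occurs in a sentence. template_matched is True when a trigger
    phrase appears in the text between the two mentions."""
    results = []
    for sentence in split_sentences(text):
        genes = find_terms(sentence, GENE_LEXICON)
        diseases = find_terms(sentence, DISEASE_LEXICON)
        for gene, g_start, g_end in genes:
            for disease, d_start, d_end in diseases:
                # Text lying strictly between the two mentions.
                between = sentence[min(g_end, d_end):max(g_start, d_start)].lower()
                matched = any(trigger in between for trigger in TRIGGERS)
                results.append((gene, disease, matched))
    return results


if __name__ == "__main__":
    sample = (
        "Mutations in BRCA1 are associated with breast cancer. "
        "CFTR was sequenced in patients without cystic fibrosis symptoms. "
        "TP53 is a tumor suppressor."
    )
    for row in extract_relationships(sample):
        print(row)
```

In this sketch, co-occurrence alone yields a candidate pair, and the trigger check separates sentences that state a relationship from those that merely mention both terms. A real system would need far richer templates and lexicons than the placeholders used here.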

Library of Congress Subject Headings

Medical genetics--Data processing; Text processing (Computer science); Information retrieval; MEDLINE; Medicine--Research

Publication Date

2005

Document Type

Thesis

Student Type

Graduate

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)

Advisor

Debra Burhans

Advisor/Committee Member

Jun Xu

Advisor/Committee Member

David Lawlor

Comments

Physical copy available from RIT's Wallace Library at RB155 .P34 2005

Recommended Citation

Paine, Jennifer R., "Automated extraction of disease-gene relationships from MEDLINE" (2005). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/7953

Campus

RIT – Main Campus
