Abstract
In this age of Internet, Natural Language Processing (NLP) techniques are the key sources for providing information required by users. However, with the extensive usage of available data, a secondary level of wrappers that interact with NLP tools have become necessary. These tools must extract a concise summary from the primary data set retrieved. The main reason for using text summarization techniques is to obtain this secondary level of information. Text summarization using NLP techniques is an interesting area of research with various implications for information retrieval.
This report deals with the use of Latent Semantic Analysis (LSA) for generic text summarization and compares it with other models available. It proposes text summarization using LDS in conjunction with open-source NLP frameworks such as Mahout and Lucene. The LSA algorithm can be scaled to multiple large-sized documents using these framworks. The performance of this algorithm is then compared with other models commonly used for summarization and Recall-Oriented Understudy of Gisting Evaluation (ROUGE) scores. This project implements a text summarization framework, which uses available open-source tools and cloud resources to summarize documents from many languages such as, in the case of this study, English and Hindi.
Publication Date
6-2014
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Rajendra K.Raj
Advisor/Committee Member
Xumin Liu
Advisor/Committee Member
Leon Reznik
Recommended Citation
Prabhala, Bhargav, "Scalable Multi-document Summarization Using Natural Language Processing" (2014). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/8969
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS