Abstract
Language is more than a tool for conveying information; it is used in all aspects of our lives. Yet only a small number of the roughly 7,000 languages spoken worldwide are highly resourced in human language technologies (HLT). Although Africa is home to over 2,000 languages, only a few of them are highly resourced, that is, have a considerable amount of parallel digital data available.
We present a novel approach to machine translation (MT) for under-resourced languages that improves model quality using a paradigm called "humans in the loop."
This thesis describes the work carried out to create a Bambara-French MT system, including data discovery, data preparation, model hyper-parameter tuning, the development of a crowdsourcing platform for humans in the loop, vocabulary sizing, and segmentation. We achieved a BLEU (bilingual evaluation understudy) score of 17.5. The results confirm that MT for Bambara is viable despite our small data set. This work has the potential to help reduce language barriers between the people of Sub-Saharan Africa and the rest of the world.
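BLEU scores a system's output by how many of its word n-grams (up to length 4) also appear in a reference translation, combined as a geometric mean and multiplied by a brevity penalty for outputs shorter than the references. The following is a minimal, illustrative sketch of corpus-level BLEU. It assumes whitespace tokenization, exactly one reference per sentence, and no smoothing. It is not the evaluation script used in this thesis, and the function names are hypothetical. Published scores such as the 17.5 above depend on the tokenizer and tool used, so this sketch should not be expected to reproduce them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumes whitespace tokenization and applies no smoothing, so the score
    is 0.0 whenever some n-gram order has no matches at all.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the hypotheses, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["le chat est sur le tapis"]
    ref = ["le chat est sur un tapis"]
    print(round(corpus_bleu(hyp, ref), 2))  # 53.73
```

In this example one substituted word already lowers the score to about 54. A single score is therefore easier to interpret when it is compared with other systems evaluated on the same test set with the same tool.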
Library of Congress Subject Headings
Bambara language--Translation into French; Translators (Computer programs); Translating and interpreting--Data processing; Computational linguistics; Corpora (Linguistics); Human-computer interaction
Publication Date
8-2020
Document Type
Thesis
Student Type
Graduate
Degree Name
Computer Science (MS)
Department, Program, or Center
Computer Science (GCCIS)
Advisor
Christopher M. Homan
Advisor/Committee Member
Marcos Zampieri
Advisor/Committee Member
Sarah Luger
Recommended Citation
Tapo, Allahsera Auguste, "Machine-assisted translation by Human-in-the-loop Crowdsourcing for Bambara" (2020). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/10584
Campus
RIT – Main Campus
Plan Codes
COMPSCI-MS