Abstract
This thesis presents the design and evaluation of an AI-powered multimodal tour guide that uses image recognition and personalised storytelling to enhance cultural heritage experiences. Traditional approaches to learning about monuments rely on static plaques, generic tour content, or manual web searches, which limit personalisation, interactivity, and accessibility. To address these limitations, the study implemented a "Multimodal Monument Explorer" that allows users to upload a photo of a landmark or describe it in natural language, and then receive rich, context-aware explanations in both text and audio form. The system integrates a persistent vector database of monument images, OpenCLIP-based visual embeddings, and a large multimodal language model to identify visually similar landmarks and generate tailored narratives. Multiple guide personas (e.g., historian, epic storyteller, comedian) adapt the narrative style to user preferences while preserving factual accuracy. A Streamlit interface orchestrates retrieval, multimodal reasoning, and real-time text-to-speech synthesis. The research followed a design science approach with a mixed-methods evaluation strategy. A quantitative study on 100 image–query pairs showed high retrieval performance (P@1 = 0.87, P@3 = 0.94, P@5 = 0.98, MRR = 0.91), with end-to-end response times under eight seconds for both text and image interactions. A user study with 20 participants yielded an average System Usability Scale score of 88.5 ("Excellent") and very positive ratings for narrative quality, engagement, and persona enhancement. The findings demonstrate the technical feasibility and user value of multimodal, persona-driven AI tour guides. They also offer practical design insights for future AI-driven cultural heritage applications at the intersection of data analytics, human–AI interaction, and digital tourism.
Library of Congress Subject Headings
Tour guides (Manuals)--Interactive multimedia--Design; Tour guides (Manuals)--Automation; Multimodal user interfaces (Computer systems)--Design
Publication Date
12-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Advisor
Sanjay Modak
Advisor/Committee Member
Khalid Ezzeldeen
Recommended Citation
Aljawi, Abdulla Ibrahim, "AI-Powered Multimodal Tour Guide: Enhancing Cultural Tourism with Image Recognition, and Personalized Storytelling" (2025). Thesis. Rochester Institute of Technology. Accessed from https://repository.rit.edu/theses/12436
Campus
RIT Dubai
Plan Codes
PROFST-MS
