Abstract

The use of Generative AI has increased substantially in the last few years. Data science is also a rapidly growing field with a high demand for skilled professionals. The goal of this thesis is to explore the potential of Generative AI, specifically ChatGPT, to facilitate data science education, with a focus on how ChatGPT can be used as a tutor to help solve practical exercises. The capabilities of Generative AI were explored, and a few different models were compared on data science tasks. An exploratory analysis was conducted to compare these models and choose the best one for the thesis. Best practices for solving practical exercises with Generative AI, including prompt engineering practices, were also explored and are described in this thesis. The effectiveness of ChatGPT as a data science tutor was evaluated in three ways. First, a series of sessions was conducted with ChatGPT to solve problems and explain data science concepts in a structured way, and the accuracy of its answers was analyzed. Second, a study was conducted to measure how helpful ChatGPT is to participants solving data science questions. Third, another Generative AI model, Claude, was used to test how ChatGPT acts as a tutor for undergraduate students. It was found that ChatGPT provides more factually correct answers than Gemini and is better at solving problems and explaining concepts than GitHub Copilot. ChatGPT has limitations in computation and solution building for data science, but its use can facilitate students' learning process. Topics such as schema building, data creation, query writing, normalization, itemset mining, and clustering can be learned and understood with ChatGPT. Using it for educational purposes will facilitate faster and better learning for students compared to scenarios where its use is prohibited.
ChatGPT can be used to solve questions and explain answers. Students can work through questions step by step with ChatGPT's help and learn the process of solving them. Some questions can be solved easily with simple prompts, while others require more structured prompts. The results of our evaluations are mostly positive, with a few limitations. They show that ChatGPT can be used as a tutor for learning data science but cannot be the only source of learning; some prior guidance or knowledge is needed to use Generative AI effectively. Our main takeaway is that Generative AI cannot substitute for teachers but can act as a personalized tutor for each student. It can explain textbook solutions in more detail and can also help with error resolution. A large number of participants also said that they are likely to use ChatGPT as a data science tutor in the future. In conclusion, ChatGPT has the potential to revolutionize data science education by acting as a personalized tutor, enhancing the learning experience, and bridging gaps in understanding complex concepts.

Library of Congress Subject Headings

ChatGPT; Data mining--Study and teaching--Automation; Artificial intelligence--Educational applications

Publication Date

5-2025

Document Type

Thesis

Student Type

Graduate

College

Golisano College of Computing and Information Sciences

Advisor

Carlos R. Rivero

Advisor/Committee Member

Zachary Butler

Advisor/Committee Member

Aaron Deever

Campus

RIT – Main Campus

Plan Codes

COMPSCI-MS