Abstract

The rise of deepfake audio technology poses a serious threat to information credibility, personal security, and media integrity. This thesis investigates machine learning techniques for detecting synthetic audio from acoustic features, including Mel-Frequency Cepstral Coefficients (MFCCs), spectral centroid, chroma_stft, and zero-crossing rate. The dataset was sourced from Kaggle and contains labeled samples of real and fake audio clips. The research trained and evaluated five machine learning models, namely Support Vector Machines (SVM), Random Forest, XGBoost, Logistic Regression, and Neural Networks, to determine the most effective approach for deepfake audio classification. Each model was assessed using standard performance metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. The study followed the CRISP-DM methodology, covering data preprocessing, feature extraction, model training, and performance evaluation, to support methodological rigor and reproducibility. The Neural Network model consistently outperformed all other classifiers, achieving 98% accuracy when trained on all variables and maintaining 96% accuracy, recall, and precision when using ten features selected through Random Forest. This indicates that the model remains robust and efficient, and generalizes well, even with a reduced feature set. In contrast, traditional models such as Logistic Regression and LSVM achieved accuracies around 92%, and their performance decreased notably after dimensionality reduction. Sensitivity analysis and partial dependence plots further confirmed that key features, particularly rms and MFCC components, had the strongest influence on classification outcomes. Overall, this research demonstrates that combining deep learning with optimized acoustic feature selection enables accurate and interpretable detection of deepfake audio. The study contributes a scalable and generalizable detection framework that can be applied to real-world verification systems, supporting advancements in digital forensics, cybersecurity, and media authenticity.
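
As a small illustration of two of the acoustic features named in the abstract, the sketch below computes zero-crossing rate and rms energy for a single frame of samples using only the Python standard library. It is a minimal sketch under stated assumptions, not necessarily how the thesis computed these features: the function names, the sign convention at zero, and the synthetic 100 Hz test tone are all hypothetical choices made for this example, and audio libraries typically add framing and padding that are omitted here.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumption for this sketch: a sample equal to zero counts as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms(frame):
    """Root-mean-square energy of a frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Hypothetical test signal: one second of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [
    math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)
]

tone_zcr = zero_crossing_rate(tone)  # roughly 200 crossings over 7999 pairs
tone_rms = rms(tone)                 # close to 1 / sqrt(2) for a unit sine
```

In a classification pipeline of the kind the abstract describes, values like these would be computed per clip (often averaged over frames) and placed alongside the MFCC, spectral centroid, and chroma features as inputs to the models.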

Library of Congress Subject Headings

Deepfakes--Data processing; Natural language processing (Computer science); Forensic acoustics; Automatic speech recognition; Machine learning; Neural networks (Computer science)

Publication Date

12-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Sanjay Modak

Advisor/Committee Member

Hammou Messatfa

Campus

RIT Dubai

Plan Codes

PROFST-MS
