Abstract
The rise of deepfake audio technology poses a serious threat to information credibility, personal security, and media integrity. This thesis investigates machine learning techniques for detecting synthetic audio through the analysis of acoustic features, including Mel-Frequency Cepstral Coefficients (MFCCs), spectral centroid, chroma_stft, and zero-crossing rate. The dataset was sourced from Kaggle and contains labeled samples of real and fake audio clips. The research trained and evaluated multiple machine learning models (Support Vector Machines (SVM), Random Forest, XGBoost, Logistic Regression, and Neural Networks) to determine the most effective approach for deepfake audio classification. Each model was assessed using standard performance metrics: accuracy, precision, recall, F1-score, and ROC-AUC. The study followed the CRISP-DM methodology, encompassing data preprocessing, feature extraction, model training, and performance evaluation, to support methodological rigor and reproducibility. The findings show that the Neural Network model consistently outperformed all other classifiers, achieving 98% accuracy when trained on all variables and maintaining 96% accuracy, recall, and precision when using ten features selected through Random Forest. This indicates that the model remains robust and efficient, and generalizes well, even with a reduced feature set. In contrast, traditional models such as Logistic Regression and LSVM achieved accuracies of around 92%, and their performance decreased notably after dimensionality reduction. Sensitivity analysis and partial dependence plots further confirmed that key features, particularly rms and the MFCC components, had the strongest influence on classification outcomes. Overall, this research demonstrates that combining deep learning with optimized acoustic feature selection enables accurate and interpretable detection of deepfake audio. The study contributes a scalable and generalizable detection framework that can be applied to real-world verification systems, supporting advances in digital forensics, cybersecurity, and media authenticity.
Library of Congress Subject Headings
Deepfakes--Data processing; Natural language processing (Computer science); Forensic acoustics; Automatic speech recognition; Machine learning; Neural networks (Computer science)
Publication Date
12-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Sanjay Modak
Advisor/Committee Member
Hammou Messatfa
Recommended Citation
Alfalasi, Rashed, "Deepfake Audio Detection" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12458
Campus
RIT Dubai
Plan Codes
PROFST-MS
