Abstract
This study investigates the detection of real, fully fake, and partially fake (PF) audio using classical machine-learning models together with a segment-level analysis model. Unlike most existing research, which focuses solely on binary real-vs-fake classification, this work introduces a three-class detection framework and constructs realistic PF samples by inserting short synthetic speech segments into otherwise genuine audio recordings. The method combines MFCC and spectral feature engineering, Wav2Vec2 embeddings, controlled PF synthesis, and several models: SVM, Random Forest, XGBoost, and an attention-based RNN. Segment-level windowing allows fine-grained study of temporal transitions and manipulation boundaries. The findings show that the classical models are highly accurate on fully fake audio, which is attributed to the presence of prominent global artefacts, but not on PF samples, whose short manipulations blend into the acoustic patterns of the surrounding real speech. XGBoost had the highest overall performance, and SVM had the highest PF recall among the classical baselines. Segment-level predictions also showed instability at manipulation boundaries and exposed the limitations of MFCC features for capturing short-scale transitions. These results reveal a severe security threat: PF audio can evade traditional detectors with only minimal changes to key phrases. The study concludes that although classical pipelines remain useful for detecting full deepfakes, temporal structure, multimodal signals, and adversarial resistance are key to robust PF detection, and that future research should focus on real-time deployment and on larger, more diverse datasets.
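The PF construction and segment-level windowing described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `splice_segment` and `window_labels`, the choice to overwrite a span of the real recording instead of lengthening it, and the 50% majority rule for labelling a window are all assumptions made for illustration.

```python
def splice_segment(real, fake, start):
    """Overwrite real[start:start+len(fake)] with the synthetic samples.

    Assumption: the splice replaces real samples so the total length is
    preserved. Returns the mixed waveform and per-sample labels
    (0 = real, 1 = fake).
    """
    end = start + len(fake)
    if start < 0 or end > len(real):
        raise ValueError("fake segment must fit inside the real recording")
    mixed = real[:start] + fake + real[end:]
    labels = [0] * start + [1] * len(fake) + [0] * (len(real) - end)
    return mixed, labels


def window_labels(labels, win, hop, threshold=0.5):
    """Label each window as fake (1) when the share of fake samples
    in it reaches `threshold` (hypothetical majority rule)."""
    out = []
    for begin in range(0, len(labels) - win + 1, hop):
        frac = sum(labels[begin:begin + win]) / win
        out.append(1 if frac >= threshold else 0)
    return out


# Toy example: 10 "real" samples with a 4-sample synthetic span at index 3.
real = [0.0] * 10
fake = [1.0] * 4
mixed, sample_labels = splice_segment(real, fake, start=3)
segment_labels = window_labels(sample_labels, win=4, hop=2)
print(segment_labels)  # [0, 1, 1, 0]
```

In this toy case the first and last windows each contain one fake sample but are still labelled real. That is one simple way to see why windows straddling a manipulation boundary are hard to label and to predict.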
Library of Congress Subject Headings
Deepfakes--Data processing; Natural language processing (Computer science); Forensic acoustics; Automatic speech recognition; Machine learning
Publication Date
12-2025
Document Type
Thesis
Student Type
Graduate
Degree Name
Professional Studies (MS)
Department, Program, or Center
Graduate Programs & Research
Advisor
Sanjay Modak
Advisor/Committee Member
Ioannis Karamitsos
Recommended Citation
Alhelli, Ahmad, "Segment-Level Machine Learning for Detecting Partial Deepfake Audio: An RNN–SVM Hybrid Approach For Real-World Adversarial Environments" (2025). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/12447
Campus
RIT Dubai
Plan Codes
PROFST-MS
