Abstract

This thesis examines machine learning approaches for predicting failures in electrical power distribution transformers, with the goal of helping utility operators intervene before outages occur. The dataset covers 16,000 distribution transformers operated by Compañía Energética de Occidente (CEO), a Colombian utility serving 42 municipalities in the Cauca Department. Each transformer record includes geographic location, rated power capacity, self-protection features, ceramic insulation criticality levels, removable connector configurations, customer categories, user counts, estimated unsupplied energy, installation types, network topology, and secondary line lengths. Failure event histories were also included, which allowed the problem to be framed as a supervised binary classification task. Data from CEO's information systems for 2019 and 2020 was extracted, merged, and cleaned to produce a dataset of 31,746 observations across 16 features. Missing values and outliers were addressed, features were encoded and scaled, and the data was split into training (25,397 instances) and testing (6,349 instances) subsets to support reliable evaluation. Two models were trained and compared: a support vector machine (SVM) using kernel methods, and a random forest built from an ensemble of decision trees. Both were applied to the same binary classification task: predicting whether a transformer would fail based on its feature profile. Across all reported metrics (accuracy, precision, recall, and F1-score), the random forest outperformed the SVM on the held-out test set. It achieved 89.6% accuracy versus 84.9% for the SVM, with F1-scores of 0.83 and 0.70, respectively. The performance gap likely reflects the random forest's capacity to model non-linear feature interactions without overfitting. These results support the use of machine learning for transformer failure prediction in real utility contexts. Operators can use such models to rank assets by failure risk and focus maintenance resources accordingly. Future work could incorporate environmental monitoring data or more granular condition indicators to further improve prediction accuracy.
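As a rough illustration of the evaluation metrics and the risk ranking mentioned in the abstract, the sketch below computes accuracy, precision, recall, and F1-score from binary labels and sorts assets by a predicted failure score. It is not the thesis code: the function names `binary_metrics` and `rank_by_risk`, the transformer IDs, and all labels and scores are hypothetical and invented for illustration. Only the split sizes (25,397 and 6,349) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for labels in {0, 1} (1 = failure)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed failures
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy units correctly passed
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual failures).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def rank_by_risk(scores):
    """Return asset IDs ordered from highest to lowest predicted failure score."""
    return [asset for asset, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)]


# Split sizes reported in the abstract; the fraction is simple arithmetic on them.
train_fraction = 25_397 / (25_397 + 6_349)

# Toy labels and scores, invented for illustration only (not thesis data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)

hypothetical_scores = {"T-001": 0.12, "T-002": 0.87, "T-003": 0.45}
ranking = rank_by_risk(hypothetical_scores)

print(f"train fraction: {train_fraction:.3f}")
print(metrics)
print(ranking)
```

The reported split sizes correspond to roughly an 80/20 division of the 31,746 observations. In practice the scores passed to `rank_by_risk` would come from a trained classifier's predicted failure probabilities. Here they are placeholders.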

Library of Congress Subject Headings

Electric transformers--Evaluation--Data processing; Supervised learning (Machine learning)

Publication Date

4-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Ioannis Karamitsos

Campus

RIT Dubai

Plan Codes

PROFST-MS
