Abstract

The proliferation of AI-assisted offensive tools has introduced a new category of cyber attacker that combines the speed of automation with the adaptive reasoning previously associated only with skilled human operators. Despite the richness of behavioral data captured by SSH honeypots, existing analyses treat interaction logs primarily as evidence of malicious activity rather than as a dataset capable of distinguishing between attacker types. This thesis investigates whether human-driven, traditionally automated, and AI-assisted attackers produce distinguishable behavioral signatures within SSH honeypot interactions, and whether machine learning techniques can reliably classify attacker behavior from session-level features. A controlled experimental architecture was developed comprising a Cowrie medium-interaction SSH honeypot and a centralized ELK-based logging infrastructure. A dataset of 281,621 sessions was collected over a two-month period, encompassing Hydra-based automated credential attacks, human-conducted sessions, and AI-assisted sessions generated using PentestGPT and ChatGPT. Twenty-four session-level features were engineered from command execution patterns, authentication behavior, session timing, attack phase distribution, and interaction depth. Eight machine learning classifiers were evaluated using five-fold stratified cross-validation, with SMOTE oversampling applied within training folds to address a raw class imbalance of approximately 6,231:1 between automated and human sessions. Ensemble methods achieved the strongest overall performance, with Gradient Boosting and Random Forest both reaching a macro F1 of 0.809. AI-assisted sessions achieved F1 scores above 0.888 in the top three models, and human session precision was 1.000 across all models and evaluation protocols, meaning that whenever a classifier predicted a session as human-driven, that prediction was correct. The best human leave-one-out F1, 0.732, was achieved by k-nearest neighbors (KNN).
These findings demonstrate that medium-interaction SSH honeypot data provides sufficient behavioral signal to distinguish automated and AI-assisted attacker activity from human-driven intrusions, and that classification of attacker automation level is achievable with current machine learning techniques.
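The evaluation protocol described in the abstract (stratified folds, with oversampling applied only inside each training fold) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy data, and the class sizes are all hypothetical, and the oversampler is a simplified SMOTE-like stand-in that interpolates between random same-class points rather than k-nearest neighbours as SMOTE proper does. The point it shows is that synthetic samples are generated after the split, so the held-out fold contains only original sessions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote_like_oversample(X, y, seed=0):
    """Balance classes by interpolating between two random same-class points.

    Simplification (assumption): real SMOTE interpolates toward one of the
    k nearest same-class neighbours, not toward a random partner.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            a, b = rng.choice(rows), rng.choice(rows)
            t = rng.random()
            X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            y_out.append(label)
    return X_out, y_out


def cross_validation_splits(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test); only the training side is oversampled."""
    folds = stratified_folds(y, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        X_train = [X[idx] for idx in train_idx]
        y_train = [y[idx] for idx in train_idx]
        # Oversample after splitting so no synthetic point leaks into the test fold.
        X_train, y_train = smote_like_oversample(X_train, y_train, seed + i)
        yield X_train, y_train, [X[idx] for idx in test_idx], [y[idx] for idx in test_idx]


# Toy, made-up data: 50 "automated" and 10 "human" sessions with two features each.
X = [[float(i), float(i % 3)] for i in range(60)]
y = ["automated"] * 50 + ["human"] * 10

splits = list(cross_validation_splits(X, y, k=5))
for X_tr, y_tr, X_te, y_te in splits:
    print(len(y_tr), y_tr.count("human"), len(y_te), y_te.count("human"))
```

Oversampling before the split would place interpolated copies of minority sessions on both sides of it and inflate the reported scores, which is why the sketch keeps the test fold untouched.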

Publication Date

4-2026

Document Type

Thesis

Student Type

Graduate

Degree Name

Cybersecurity (MS)

Department, Program, or Center

Cybersecurity, Department of

College

Golisano College of Computing and Information Sciences

Advisor

Bo Yuan

Advisor/Committee Member

James Rice

Advisor/Committee Member

Rob Olson

Campus

RIT – Main Campus
