Abstract

Dark web markets use Bitcoin and related cryptocurrencies to transfer and launder proceeds of crime while obscuring the real-world identities of operators. This creates an increasingly challenging environment for regulators and law-enforcement agencies, who must analyse suspicious transactions on decentralised, public, pseudonymous blockchains and on ever denser and noisier Bitcoin transaction graphs. This thesis investigates whether, and how, blockchain analytics and machine learning can help detect illicit transactions on the Bitcoin network. The first part of the thesis is a systematic literature review of recent contributions from academia and industry that examine major solution approaches for identifying illicit flows on public blockchains, particularly within dark market ecosystems such as Silk Road and its successors. The review identifies recurring patterns in data sources, feature engineering strategies, and model families used for flow identification, and highlights key gaps related to labelling, performance evaluation, and practical deployment. The second part is an applied case study based on the publicly available Elliptic Bitcoin transaction graph dataset, an extensive graph constructed using law-enforcement intelligence with labelled licit and illicit transactions. After data cleaning and a temporal split into training, validation, and test sets, two supervised learning models, Random Forest and Extreme Gradient Boosting (XGBoost), are trained on a representative subset of informative graph-based and transaction-based features. Model performance is evaluated on the test set, with particular emphasis on the illicit class, using precision, recall, F1-score, and ROC–AUC. Random Forest achieves the highest overall accuracy and recall, while XGBoost delivers very similar performance with competitive AUC scores and stable behaviour. These findings indicate that tree-based ensemble methods can classify Bitcoin transactions effectively using anonymised, non-identifying features derived from the transaction graph.
The third part is a simple Flask web application presented as a proof of concept for end-to-end model deployment. It shows how submitted transaction records in the Elliptic format can be scored by the trained models and returned with corresponding risk assessments, and sketches how real-time transaction data could be obtained from a public blockchain API in a future system. However, because the anonymised Elliptic features do not map one-to-one onto raw blockchain fields, this integration remains demonstrative rather than fully operational. The thesis concludes with practical implications and recommendations for law-enforcement agencies and suggests future research directions, including more advanced graph-based approaches and improved labelling strategies.
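As a rough illustration of the evaluation protocol summarised in the abstract, the sketch below shows a temporal split followed by precision, recall, and F1 computed for the illicit class only. It is a minimal sketch under stated assumptions, not the thesis's implementation: the record layout (`time_step`, `label`), the function names `temporal_split` and `illicit_class_metrics`, and the cut points used in the example are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, int]  # hypothetical layout: {"time_step": int, "label": int}


def temporal_split(
    rows: Sequence[Record], train_end: int, val_end: int
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split records by time step so later steps never leak into training.

    train_end and val_end are illustrative cut points, not the thesis's.
    """
    train = [r for r in rows if r["time_step"] <= train_end]
    val = [r for r in rows if train_end < r["time_step"] <= val_end]
    test = [r for r in rows if r["time_step"] > val_end]
    return train, val, test


def illicit_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Precision, recall, and F1 for the positive (here: illicit) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy records: one per time step, with every third step marked illicit (1).
    toy = [{"time_step": t, "label": int(t % 3 == 0)} for t in range(1, 11)]
    train, val, test = temporal_split(toy, train_end=6, val_end=8)
    print(len(train), len(val), len(test))
    print(illicit_class_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Splitting on the time step rather than at random keeps the test set strictly later than the training data, which is closer to how a deployed classifier would encounter new transactions. Reporting metrics for the illicit class alone avoids the inflated scores that overall accuracy can give when licit transactions dominate.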

Library of Congress Subject Headings

Money laundering investigation--Data processing; Drug traffic--Prevention--Data processing; Bitcoin; Blockchains (Databases); Anomaly detection (Computer security); Dark Web

Publication Date

12-2025

Document Type

Thesis

Student Type

Graduate

Degree Name

Professional Studies (MS)

Department, Program, or Center

Graduate Programs & Research

Advisor

Sanjay Modak

Advisor/Committee Member

Ioannis Karamitsos

Campus

RIT Dubai

Plan Codes

PROFST-MS
