Recently, deep learning techniques from computer science have dramatically improved the ability of computers to recognize objects in images, raising the possibility of fully automated computer-aided diagnosis in medicine. Among machine learning models, the convolutional neural network (CNN) is one of the most studied and validated artificial neural networks for image recognition. Not only does it perform well, but the design of most modern CNN hidden layers also allows the model to extract meaningful features without the need for prior knowledge. Thus, the pathology community is showing increasing interest in comparing CNNs to human judgment. As demonstrated by a number of studies reporting image analysis models that can accurately localize cells, classify them into different cell types, and predict patient outcome, the pathology field is incorporating artificial intelligence technologies into diagnosis. Although using deep neural networks to recognize pathological slides is not a new idea and is showing promising results, their requirement for a large quantity of training data can be a major obstacle for many less common histopathological cases. In bladder cancer, the Tumor-Node-Metastasis (TNM) system defines T1 bladder cancer as the invasion of tumor cells into the lamina propria (LP). However, pathologists often struggle to confirm LP and/or muscularis mucosae invasion using hematoxylin & eosin (H&E) stains from bladder biopsies. Accurately reporting the presence of tumor invasion, which is associated with worse clinical outcomes, is critical for adequate patient management. In this thesis, we developed various traditional machine learning models and compared their performance with that of two CNNs, VGG16 and VGG19, on histology image classification, distinguishing non-invasive from invasive bladder tumors.
Using approximately 1,200 H&E images from non-invasive and invasive bladder cancer tissues, our results showed that the traditional machine learning methods with human-directed features outperformed the fully automated CNN model by as much as 12 percentage points. For the 2-class classification task of distinguishing non-invasive from invasive bladder cancer tissues, we achieved around 91–96% accuracy using classic machine learning classifiers such as random forest, logistic regression, and probabilistic neural network, whereas the CNN with VGG16 as hidden layers achieved only around 84%. In addition to performance, because feature extraction in the pipeline is transparent, we were able to evaluate and rank the patterns in the bladder histological images based on their relative importance in prediction. Classic machine learning methods thus provided a well-rounded approach under limited data size.

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)


Feng Cui

Advisor/Committee Member

Rui Li

Advisor/Committee Member

Hiroshi Miyamoto


RIT – Main Campus