Background. Identifying the causal mechanisms that support the cardio-metabolic benefits of exercise could benefit individuals who cannot exercise. With appropriate causal discovery algorithms, causal pathways can be recovered even from sparsely sampled data, which can help direct pharmaceutical drug discovery toward drugs that maintain muscle. Objective. The purpose of this study was to infer novel causal source-target interactions active in sparsely sampled data and to embed these in a broader causal network extracted from the literature, in order to test their alignment with community-wide prior knowledge and their mechanistic validity in the context of regulatory feedback dynamics. Methods. To this end, the analysis focused on the female STRRIDE1/PD dataset to determine which Causal Directed Acyclic Graph (C-DAG) the observed data predict. Analytes with more than 5 missing values were dropped from further analysis to retain higher confidence in the resulting graphs. The PC algorithm, named after its authors Peter Spirtes and Clark Glymour, was executed for 10,000 iterations on randomly sampled columns of the filtered dataset, with intensity and amount held fixed as the first two columns so that their effect on the resulting DAG could be observed. Across the 10,000 iterations, interactions that appeared in more than 45%, 50%, 65%, and 75% of runs, and in 100% of runs, were examined. The interactions that appeared in more than 50% of runs were then compared with a literature-mined dataset built using MedScan Natural Language Processing (NLP), part of Pathway Studio. Results. Full consensus across all sub-sampled networks produced 136 fully conserved interactions. Of these 136 interactions, 64 were resolved as direct causal interactions, 5 were not direct causal interactions, and 67 could only be described as associative. Of the 64 interactions predicted at 50% consensus, 11 (about 17%) were recovered by text mining of the 285 peer-reviewed journals. Of these 11, 4 were completely recovered and 7 were only partially recovered. For example, LDL → ApoB was completely recovered, whereas HDL → insulin sensitivity was partially recovered. Conclusion. Only 17% of the predicted interactions were found through literature mining; the remaining 83% were a mix of novel interactions and self-interactions that require further work. Of the remaining interactions, 53 remain novel and give insight into how different clinical parameters interact with cholesterol molecules and biological markers, and how these interact with one another.
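The sub-sampling and consensus procedure described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The function names (`drop_sparse_columns`, `edge_frequencies`, `consensus`, `toy_discover`), the per-run subset size, the toy table values, and the choice to count an edge's frequency over all runs are all assumptions introduced here. The causal discovery step is passed in as a callable, and `toy_discover` is a hypothetical stand-in that does not perform the PC algorithm.

```python
import random
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set, Tuple

Edge = Tuple[str, str]                    # (source, target)
Table = Dict[str, List[Optional[float]]]  # column name -> values, None = missing


def drop_sparse_columns(table: Table, max_missing: int = 5) -> Table:
    """Keep only columns with at most `max_missing` missing (None) values."""
    return {name: values for name, values in table.items()
            if sum(v is None for v in values) <= max_missing}


def edge_frequencies(table: Table,
                     fixed: Sequence[str],
                     discover: Callable[[Table], Iterable[Edge]],
                     n_iter: int,
                     subset_size: int,
                     seed: int = 0) -> Dict[Edge, float]:
    """Fraction of runs in which each directed edge was reported.

    Each run keeps the `fixed` columns first, adds a random subset of the
    remaining columns, and calls `discover` on that sub-table.
    Assumption: frequencies are normalised by the total number of runs.
    """
    rng = random.Random(seed)
    candidates = [c for c in table if c not in fixed]
    counts: Counter = Counter()
    for _ in range(n_iter):
        columns = list(fixed) + rng.sample(candidates, subset_size)
        sub = {c: table[c] for c in columns}
        counts.update(set(discover(sub)))  # count each edge once per run
    return {edge: n / n_iter for edge, n in counts.items()}


def consensus(freq: Dict[Edge, float], threshold: float) -> Set[Edge]:
    """Edges whose frequency reaches `threshold` (inclusive, an assumption)."""
    return {edge for edge, f in freq.items() if f >= threshold}


def toy_discover(sub: Table) -> List[Edge]:
    """Hypothetical stand-in for a causal discovery call; NOT the PC algorithm."""
    return [("LDL", "ApoB")] if "LDL" in sub and "ApoB" in sub else []


# Invented toy values, for illustration only.
table: Table = {
    "intensity": [1.0, 2.0, 1.0, 2.0],
    "amount": [1.0, 1.0, 2.0, 2.0],
    "LDL": [3.1, 2.9, 3.4, 3.0],
    "ApoB": [0.9, 0.8, 1.0, 0.9],
    "HDL": [1.2, None, 1.3, 1.1],
}
clean = drop_sparse_columns(table, max_missing=5)
freq = edge_frequencies(clean, ["intensity", "amount"], toy_discover,
                        n_iter=1000, subset_size=2)
print(consensus(freq, 0.25))
```

In practice `discover` would wrap a real PC implementation, and `consensus` would be evaluated at each of the thresholds named in the Methods. Counting an edge only over the runs in which both of its endpoints were sampled is an alternative normalisation that would give different frequencies.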

Library of Congress Subject Headings

Medical literature--Data processing; Biological literature--Data processing; Exercise--Health aspects--Data processing; Causation; Inference; Data mining

Publication Date


Document Type


Student Type


Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)


Gary R. Skuse

Advisor/Committee Member

Gordon Broderick

Advisor/Committee Member

Matthew Morris


RIT – Main Campus

Plan Codes