Chris Parkin


Computational biology has attacked the problem of isoelectric point (pI) prediction with limited success, achieving a rough accuracy of only 30%. In 2005, Matthew Conte studied the relationship between sequence characteristics and the accuracy of isoelectric point prediction. His results indicated that charges on adjacent amino acids could significantly affect the predicted pI of a protein. In this study we introduce an evolutionary computation approach aimed at accounting for these problematic dipeptides. For each possible dipeptide formed from charged amino acids (7 chargeable groups, giving 7 × 7 = 49 possibilities), the algorithm predicts a pKa value that, when included in the pI prediction algorithm, should yield a more accurate prediction. Accounting for these adjacent charged amino acids improved pI prediction for the proteins with the greatest deviation between experimental and predicted pI (ΔpI > 0.7). However, these improvements did not generalize: incorporating the new values had the reverse effect on the remaining proteins, most notably those in the most accurate data set (ΔpI < 0.1). While this research lays a foundation for improving the pI prediction algorithm, further work is needed to increase overall accuracy.
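The abstract describes adding dipeptide-specific pKa values to a pI calculation and fitting those values with evolutionary computation. The sketch below illustrates both pieces under stated assumptions: a Henderson-Hasselbalch net-charge model solved for pI by bisection, an override table keyed by charged dipeptide, and a simple (1+1) evolution strategy that mutates the 49 override values to reduce the mean |ΔpI|. The pKa scale, the rule that an override replaces the side-chain pKa of the first residue in the pair, and the names `net_charge`, `isoelectric_point`, `fitness` and `evolve` are all hypothetical; the thesis's actual algorithm and parameters are not given here.

```python
import random

# Illustrative pKa values (ASSUMED; published pKa scales differ).
PKA = {
    "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1,  # acidic: negative when deprotonated
    "H": 6.0, "K": 10.5, "R": 12.5,           # basic: positive when protonated
    "Nterm": 8.6, "Cterm": 3.6,
}
ACIDIC = {"D", "E", "C", "Y"}
BASIC = {"H", "K", "R"}
CHARGED = ACIDIC | BASIC  # 7 chargeable groups -> 7 * 7 = 49 ordered dipeptides


def net_charge(seq, ph, overrides=None):
    """Henderson-Hasselbalch net charge of `seq` at the given pH.

    `overrides` is a HYPOTHETICAL dict mapping a charged dipeptide such as
    ("D", "K") to a pKa used for the FIRST residue of that pair.
    """
    overrides = overrides or {}
    charge = 1.0 / (1.0 + 10 ** (ph - PKA["Nterm"]))   # N-terminus (+)
    charge -= 1.0 / (1.0 + 10 ** (PKA["Cterm"] - ph))  # C-terminus (-)
    for i, aa in enumerate(seq):
        if aa not in CHARGED:
            continue
        pka = PKA[aa]
        if i + 1 < len(seq) and (aa, seq[i + 1]) in overrides:
            pka = overrides[(aa, seq[i + 1])]  # dipeptide-specific value
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - pka))
        else:
            charge -= 1.0 / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, overrides=None, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection: net charge falls monotonically with pH, so find its zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid, overrides) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def fitness(overrides, data):
    """Mean |predicted pI - experimental pI| over (sequence, pI) pairs; lower is better."""
    return sum(abs(isoelectric_point(s, overrides) - pi) for s, pi in data) / len(data)


def evolve(data, generations=200, sigma=0.3, seed=0):
    """A minimal (1+1) evolution strategy over the 49 dipeptide pKa values.

    One simple evolutionary scheme, ASSUMED for illustration; not necessarily
    the algorithm used in the study.
    """
    rng = random.Random(seed)
    pairs = [(a, b) for a in sorted(CHARGED) for b in sorted(CHARGED)]
    parent = {p: PKA[p[0]] for p in pairs}  # start from the default pKa values
    best = fitness(parent, data)
    for _ in range(generations):
        child = dict(parent)
        p = rng.choice(pairs)  # mutate one dipeptide's pKa with Gaussian noise
        child[p] = min(14.0, max(0.0, child[p] + rng.gauss(0.0, sigma)))
        score = fitness(child, data)
        if score <= best:  # keep the mutant if it is no worse than the parent
            parent, best = child, score
    return parent, best
```

Calling `evolve(data)` with a list of `(sequence, experimental_pI)` pairs returns the tuned override table and its mean |ΔpI|. Because this objective averages over every protein, a table that helps the poorly predicted proteins can still hurt the well-predicted ones.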

Library of Congress Subject Headings

Isoelectric focusing--Data processing; Proteins--Analysis; Evolutionary computation

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)

Advisor

Paul Craig


RIT – Main Campus