Matthew Conte


Proteins often do not migrate in two-dimensional electrophoresis (2DE) as their primary sequence would predict. The predicted isoelectric point (pI) frequently does not coincide with the pI value measured experimentally. This study investigates the reasons for these differences. First, 2DE data from the E. coli proteome were collected and formatted. The dataset was split into three subsets, each representing a different level of pI discrepancy (ΔpI). The protein sequence data for each ΔpI subset were run through a pipeline, and at each stage the three ΔpI subsets were compared with one another. The pipeline began with a naïve approach (considering individual amino acid frequencies), followed by the application of four different alphabets that represent sequences more simply by grouping similar amino acids according to their charge, functional, chemical, and hydrophobic properties. The final step examined the dipeptides of all of these sequences using both the 20-amino-acid alphabet and the simplified groupings. The alphabet dipeptide analysis showed that certain dipeptide sequences correlate well with differences between predicted and experimental pI.
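The pipeline stages described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the charge-based grouping, the ΔpI thresholds, and all function names are assumptions chosen only to show the shape of the analysis (splitting by ΔpI, counting residue frequencies, reducing the alphabet, and counting dipeptides).

```python
from collections import Counter

# Hypothetical charge-based alphabet (an assumption, not the thesis's grouping):
# '+' for basic residues, '-' for acidic residues, '0' for everything else.
CHARGE_ALPHABET = {"K": "+", "R": "+", "H": "+", "D": "-", "E": "-"}


def reduce_sequence(seq, alphabet, default="0"):
    """Rewrite a protein sequence by mapping each residue to its group symbol."""
    return "".join(alphabet.get(res, default) for res in seq.upper())


def residue_frequencies(seq):
    """Relative frequency of each symbol in a sequence (the naive stage)."""
    total = len(seq)
    if total == 0:
        return {}
    return {sym: n / total for sym, n in Counter(seq).items()}


def dipeptide_frequencies(seq):
    """Relative frequency of each overlapping pair of adjacent symbols."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in pairs.items()}


def split_by_delta_pi(records, low=0.25, high=1.0):
    """Split (sequence, predicted_pI, experimental_pI) records into three
    subsets by |delta pI|. The thresholds are placeholders, not the
    values used in the thesis."""
    subsets = {"small": [], "medium": [], "large": []}
    for seq, predicted, experimental in records:
        delta = abs(predicted - experimental)
        if delta < low:
            subsets["small"].append(seq)
        elif delta < high:
            subsets["medium"].append(seq)
        else:
            subsets["large"].append(seq)
    return subsets
```

Under these assumptions, each ΔpI subset would be passed through `residue_frequencies` and `dipeptide_frequencies`, once on the raw 20-letter sequences and once after `reduce_sequence`, and the resulting frequency tables compared across the three subsets.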

Library of Congress Subject Headings

Amino acid sequence; Isoelectric focusing; Proteins--Analysis; Bioinformatics

Publication Date

Summer 2005

Document Type


Student Type


Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)


Gary R. Skuse

Advisor/Committee Member

Paul A. Craig

Advisor/Committee Member

Douglas P. Merrill


Physical copy available from RIT's Wallace Library at QP551 .C66 2005


RIT – Main Campus