Abstract
Proteins often do not migrate in two-dimensional electrophoresis (2DE) as their primary sequence would predict. The predicted isoelectric point (pI) frequently does not coincide with the experimental pI obtained in the laboratory. This study was motivated by the search for the reasons behind these differences. Initially, 2DE data from the E. coli proteome were collected and formatted. This dataset was split into three subsets, each representing a different level of pI discrepancy (ΔpI). The protein sequence data for each ΔpI subset were run through a pipeline. At each stage of the pipeline, the data were analyzed by comparing the three ΔpI subsets to one another. The pipeline began with a naïve approach (considering individual amino acid frequencies), followed by the application of four different alphabets that represent sequences more simply by grouping similar amino acids according to their charge, functional, chemical, and hydrophobic properties. The final step in the pipeline examined the dipeptides of all of these sequences using both the 20 amino acid alphabet and the simplified groupings. An evaluation of the alphabet dipeptide analysis demonstrated the existence of certain dipeptide sequences that correlate well with differences between predicted pI and experimental pI.
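The pipeline stages described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names, the charge-based grouping in `CHARGE_ALPHABET`, and the ΔpI cutoffs `low` and `high` are all assumptions chosen for illustration, since the abstract does not state the actual alphabets or subset boundaries.

```python
from collections import Counter

# Hypothetical charge-based reduced alphabet (an assumption, not the
# grouping used in the thesis): basic residues map to "+", acidic
# residues to "-", and everything else to a neutral symbol.
CHARGE_ALPHABET = {"K": "+", "R": "+", "H": "+", "D": "-", "E": "-"}


def delta_pi_subset(predicted_pi, experimental_pi, low=0.25, high=1.0):
    """Assign a protein to one of three ΔpI subsets.

    The cutoffs `low` and `high` are illustrative placeholders.
    """
    delta = abs(predicted_pi - experimental_pi)
    if delta < low:
        return "small"
    if delta < high:
        return "medium"
    return "large"


def residue_frequencies(seq):
    """Naive stage: relative frequency of each symbol in the sequence."""
    if not seq:
        return {}
    counts = Counter(seq)
    return {symbol: n / len(seq) for symbol, n in counts.items()}


def reduce_sequence(seq, alphabet, default="0"):
    """Alphabet stage: rewrite a sequence using a grouped alphabet."""
    return "".join(alphabet.get(residue, default) for residue in seq)


def dipeptide_counts(seq):
    """Dipeptide stage: count overlapping pairs of adjacent symbols."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))
```

Under these assumptions, `dipeptide_counts` can be applied either to the original 20-letter sequence or to the output of `reduce_sequence`, and the resulting counts compared across the three ΔpI subsets.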
Library of Congress Subject Headings
Amino acid sequence; Isoelectric focusing; Proteins--Analysis; Bioinformatics
Publication Date
Summer 2005
Document Type
Thesis
Student Type
Graduate
Degree Name
Bioinformatics (MS)
Department, Program, or Center
Thomas H. Gosnell School of Life Sciences (COS)
Advisor
Gary R. Skuse
Advisor/Committee Member
Paul A. Craig
Advisor/Committee Member
Douglas P. Merrill
Recommended Citation
Conte, Matthew, "Isoelectric point prediction from the amino acid sequence of a protein" (2005). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/8076
Campus
RIT – Main Campus
Comments
Physical copy available from RIT's Wallace Library at QP551 .C66 2005