Generating motifs from known active sites and matching those motifs to an uncharacterized protein is a classic way of inferring protein function. Until now, motif generation has been based purely on enzymatic function. This approach does not account for cases in which very different active sites arrive at the same function, for example through convergent evolution. A secondary metric on which to base motif generation is therefore necessary. Such a metric exists in the form of UniProt designations, which identify homologous proteins at the whole-protein level, or Pfam designations, which identify homology at the active-site level.

Here, we describe a tool that generates highly selective motifs using these metrics. We were able to collapse a large number of proteins into their representative motifs with little loss in sensitivity, creating an “average” representation of each motif. These motifs will aid in characterizing proteins of known structure but unknown function.
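The abstract does not say how motifs are represented or how the “average” is computed. The following is a minimal sketch of the general idea of grouping active-site motifs by a homology label (such as a Pfam accession) instead of by enzymatic function, then averaging each group into one representative motif. It is not the thesis's method. It assumes each motif is a fixed-length list of 3D residue coordinates already superposed in a common frame; the helper names `average_motifs` and `rmsd`, the label `PF_EXAMPLE`, and all coordinates are hypothetical.

```python
from collections import defaultdict
import math


def average_motifs(motifs):
    """Group (label, coords) motifs by label and return per-position mean coordinates.

    Assumption: motifs sharing a label have the same number of residues and are
    already superposed, so no alignment step is performed here.
    """
    groups = defaultdict(list)
    for label, coords in motifs:
        groups[label].append(coords)

    averages = {}
    for label, members in groups.items():
        n_pos = len(members[0])
        if any(len(m) != n_pos for m in members):
            raise ValueError(f"motifs labelled {label} differ in length")
        # Mean x, y, z for each residue position across all members of the group.
        averages[label] = [
            tuple(sum(m[i][axis] for m in members) / len(members) for axis in range(3))
            for i in range(n_pos)
        ]
    return averages


def rmsd(a, b):
    """Root-mean-square deviation of two equal-length coordinate lists (no superposition)."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


# Hypothetical usage: two pre-aligned two-residue motifs sharing one family label.
motifs = [
    ("PF_EXAMPLE", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("PF_EXAMPLE", [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]),
]
avg = average_motifs(motifs)
print(avg["PF_EXAMPLE"])                      # [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(motifs[0][1], avg["PF_EXAMPLE"]))  # 1.0
```

The RMSD of each member to its group average gives one rough way to judge how much detail is lost when a family is collapsed to a single motif. In practice the structures would first need to be superposed (for example with the Kabsch algorithm), which this sketch deliberately skips.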

Library of Congress Subject Headings

Proteins--Analysis--Data processing; Proteomics

Degree Name

Bioinformatics (MS)

Department, Program, or Center

Thomas H. Gosnell School of Life Sciences (COS)


Paul A. Craig

Advisor/Committee Member

Gary Skuse

Advisor/Committee Member

Feng Cui


Campus

RIT – Main Campus
