Protein-Centric Data Integration for Functional Analysis of Comparative Proteomics Data
High-throughput proteomic, microarray, protein interaction and other experimental methods all generate long lists of proteins and/or genes that have been identified, or whose accumulation has varied, under the experimental conditions studied. Biologists can find these lists difficult to sort through and interpret. Here we describe a next step in data analysis: a bottom-up approach to data integration. Starting with protein sequence identifications, we map them to a common representation of the protein. We then bring in a wide variety of structural, functional, genetic, and disease information related to these proteins from annotated knowledge bases. Finally, we use this information to categorize the lists by Gene Ontology (GO) terms and by mappings to biological pathway databases. We illustrate with examples how this approach can help identify important processes in large, complex lists.
Key words: Gene Ontology, Biological pathways, Protein database, UniProtKB, Proteomics, Bioinformatics
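The two core steps of the bottom-up approach described in the abstract are (1) mapping heterogeneous identifiers onto one canonical protein entry and (2) grouping the mapped proteins by GO term. The following is a minimal sketch of those two steps, assuming small in-memory lookup tables. All identifiers, accessions, and annotations in it (`geneA`, `probe_17`, `PROT_A`, and so on) are hypothetical toy data, and the helper names `map_to_canonical` and `categorize_by_go` are illustrative. This is not the authors' implementation. A real pipeline would draw these tables from a knowledge base such as UniProtKB and its GO annotations.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy mapping from source-specific identifiers (gene symbols,
# array probes, database-specific IDs) to one canonical protein accession.
SOURCE_TO_CANONICAL: Dict[str, str] = {
    "geneA": "PROT_A",
    "probe_17": "PROT_A",  # two source IDs collapse onto the same protein
    "geneB": "PROT_B",
    "geneC": "PROT_C",
}

# Hypothetical toy GO annotations keyed by canonical accession.
CANONICAL_TO_GO: Dict[str, Set[str]] = {
    "PROT_A": {"GO:0006096"},                # glycolytic process
    "PROT_B": {"GO:0006096", "GO:0006412"},  # glycolytic process, translation
    "PROT_C": {"GO:0006412"},                # translation
}


def map_to_canonical(ids: List[str]) -> Set[str]:
    """Map raw identifiers to canonical accessions; unmapped IDs are dropped."""
    return {SOURCE_TO_CANONICAL[i] for i in ids if i in SOURCE_TO_CANONICAL}


def categorize_by_go(proteins: Set[str]) -> Dict[str, List[str]]:
    """Group canonical proteins under every GO term they are annotated with."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for prot in sorted(proteins):
        for term in sorted(CANONICAL_TO_GO.get(prot, set())):
            groups[term].append(prot)
    return dict(groups)


if __name__ == "__main__":
    # A toy "hit list" as it might come out of an experiment.
    hits = ["geneA", "probe_17", "geneB", "geneC", "unknown_id"]
    proteins = map_to_canonical(hits)
    for term, members in sorted(categorize_by_go(proteins).items()):
        print(term, len(members), members)
```

Because mapping happens before categorization, redundant entries such as `geneA` and `probe_17` are counted once. A protein annotated with several GO terms appears in every matching group, as it would in a GO-based summary of a real list.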