Abstract
Having reviewed the framework underlying computational function prediction in the previous chapter, we now discuss two advanced sequence-based function prediction methods developed in our group: the Protein Function Prediction (PFP) method and the Extended Similarity Group (ESG) method. PFP extends the traditional homology search by incorporating functional associations between pairs of Gene Ontology (GO) terms. These associations are derived from how often the two terms co-occur in the annotations of the same proteins in the database. PFP also considers sequences that are only very weakly similar to the query, which increases its sensitivity and its ability to predict low-resolution functional terms. ESG, in contrast, recursively searches the sequence similarity space around the query to find consensus annotations in the query's neighborhood. The last part of the chapter discusses the network structure of the gene functional space, which is built by connecting proteins that share functional similarity and whose function annotations were enriched with PFP predictions. The similarity of this network's structure to that of protein-protein interaction networks and metabolic pathway networks is also discussed.
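The co-occurrence idea behind PFP can be illustrated with a short Python sketch. This is not PFP's implementation, and its scoring may differ in detail. The sketch rests on three assumptions. First, the association between two GO terms is taken to be a simple conditional frequency: the fraction of database proteins annotated with term a that also carry term b. Second, each homology hit is weighted by -log10(E) plus a hypothetical offset. Third, the names `build_association`, `score_terms`, and `offset` are illustrative only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_association(annotations):
    """Estimate associations between GO terms from their co-occurrence.

    annotations maps a protein ID to its set of GO terms. assoc[a][b] is the
    fraction of proteins annotated with term a that are also annotated with
    term b, i.e. an estimate of P(b | a). This simplified measure is an
    assumption made for illustration.
    """
    term_count = Counter()
    pair_count = Counter()
    for terms in annotations.values():
        term_count.update(terms)
        # Count each unordered pair once per protein, in both directions.
        for a, b in combinations(sorted(terms), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    assoc = defaultdict(dict)
    for (a, b), n in pair_count.items():
        assoc[a][b] = n / term_count[a]
    return assoc


def score_terms(hits, annotations, assoc, offset=2.0):
    """Score candidate GO terms for a query from its homology-search hits.

    hits is a list of (protein_id, e_value) pairs. Each hit gets a weight of
    -log10(E) + offset, floored at zero. The offset (a hypothetical value)
    lets weakly similar hits with large E-values still contribute. A hit's
    own terms receive the full weight. Terms that co-occur with them receive
    the weight scaled by the association P(other | term).
    """
    scores = Counter()
    for protein_id, e_value in hits:
        e_value = max(e_value, 1e-300)  # guard against log10(0) when E = 0.0
        weight = max(-math.log10(e_value) + offset, 0.0)
        for term in annotations.get(protein_id, ()):
            scores[term] += weight
            for other, p in assoc.get(term, {}).items():
                scores[other] += weight * p
    return scores
```

Consider a hit to a database protein annotated only with GO:C. That hit also raises the score of any term that co-occurs with GO:C, in proportion to how often proteins carrying GO:C also carry that term. This is how association-based scoring can suggest terms that no individual hit is annotated with directly.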
Acknowledgements
MC is supported by grants from the Purdue Research Foundation and the Showalter Trust. DK also acknowledges a grant from the National Institutes of Health (GM075004) and grants from the National Science Foundation (DMS800568, EF0850009, IIS0915801).
Copyright information
© 2011 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Chitale, M., Kihara, D. (2011). Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks. In: Kihara, D. (eds) Protein Function Prediction for Omics Era. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0881-5_2
DOI: https://doi.org/10.1007/978-94-007-0881-5_2
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-0880-8
Online ISBN: 978-94-007-0881-5
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)