Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks

Chitale, Meghana; Kihara, Daisuke

doi:10.1007/978-94-007-0881-5_2

Abstract

After reviewing the underlying framework required for computational function prediction in the previous chapter, we discuss two advanced sequence-based function prediction methods developed in our group: the Protein Function Prediction (PFP) method and the Extended Similarity Group (ESG) method. PFP extends the traditional homology search by incorporating functional associations between pairs of Gene Ontology terms, based on how frequently the terms co-occur in annotations of the same proteins in the database. PFP also considers sequences that are only very weakly similar to the query, thereby increasing its sensitivity and its ability to predict low-resolution functional terms. ESG, in contrast, recursively searches the sequence similarity space around the query to find consensus annotations in the neighborhood. The last part of the chapter discusses the network structure of the gene functional space built by connecting proteins with functional similarity, where function annotation was enriched with PFP predictions. The similarity of this network to the structures of protein-protein interaction networks and metabolic pathway networks is also discussed.
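
To make the co-occurrence idea concrete, the following is a minimal Python sketch, not the published PFP implementation. It assumes a toy annotation table mapping protein identifiers to Gene Ontology terms and a list of search hits with E-values. The function names, the `evalue_offset` parameter, and the log-based hit weight are illustrative assumptions that do not come from this abstract. The sketch estimates how often one term appears on proteins that carry another term, then lets each hit, including weak ones, vote for both its own terms and the associated terms.

```python
from collections import Counter, defaultdict
from itertools import permutations
import math


def cooccurrence_probabilities(annotations):
    """Estimate P(a | b): the fraction of proteins annotated with b that also carry a.

    `annotations` maps a protein ID to an iterable of GO term IDs (hypothetical layout).
    Returns a nested dict: result[b][a] = P(a | b).
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        unique = set(terms)
        term_counts.update(unique)
        # Ordered pairs (a, b) with a != b, counted once per protein.
        pair_counts.update(permutations(unique, 2))
    cond = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        cond[b][a] = n / term_counts[b]
    return dict(cond)


def score_terms(hits, annotations, cond_prob, evalue_offset=2.0):
    """Score GO terms from sequence hits, propagating through co-occurrence.

    `hits` is a list of (protein_id, e_value) pairs. The weight
    -log10(E) + evalue_offset is an assumed scheme: the offset keeps hits
    with E-values above 1 in play, so weakly similar sequences still count.
    """
    scores = Counter()
    for protein_id, evalue in hits:
        weight = -math.log10(evalue) + evalue_offset
        if weight <= 0:
            continue  # hit is too weak to contribute under this offset
        for term in set(annotations.get(protein_id, ())):
            scores[term] += weight  # direct annotation of the hit
            for associated, p in cond_prob.get(term, {}).items():
                scores[associated] += weight * p  # associated term, down-weighted
    return dict(scores)


if __name__ == "__main__":
    toy_annotations = {
        "P1": ["GO:1", "GO:2"],
        "P2": ["GO:1"],
        "P3": ["GO:1", "GO:2"],
    }
    cond = cooccurrence_probabilities(toy_annotations)
    print(score_terms([("P2", 1e-3), ("P3", 10.0)], toy_annotations, cond))
```

In this toy setup, a hit annotated only with `GO:1` still gives `GO:2` a partial score because the two terms often appear together, which is the kind of association the abstract describes.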

Acknowledgements

MC is supported by grants from the Purdue Research Foundation and the Showalter Trust. DK also acknowledges grants from the National Institutes of Health (GM075004) and the National Science Foundation (DMS800568, EF0850009, IIS0915801).

Author information

Authors and Affiliations

Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, USA
Meghana Chitale
Department of Biological Sciences; Department of Computer Science, Markey Center for Structural Biology, College of Science, Purdue University, West Lafayette, IN, 47907, USA
Daisuke Kihara

Corresponding author

Correspondence to Daisuke Kihara.

Editor information

Editors and Affiliations

Dept. Biological Science, Purdue University, N. University St. 305, West Lafayette, 47907-2107, Indiana, USA
Daisuke Kihara

About this chapter

Cite this chapter

Chitale, M., Kihara, D. (2011). Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks. In: Kihara, D. (ed.) Protein Function Prediction for Omics Era. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0881-5_2

DOI: https://doi.org/10.1007/978-94-007-0881-5_2
Published: 29 March 2011
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-0880-8
Online ISBN: 978-94-007-0881-5
eBook Packages: Biomedical and Life Sciences (R0)
