Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks

Protein Function Prediction for Omics Era

Abstract

Having reviewed the framework underlying computational function prediction in the previous chapter, we discuss here two advanced sequence-based function prediction methods developed in our group: the Protein Function Prediction (PFP) method and the Extended Similarity Group (ESG) method. PFP extends the traditional homology search by incorporating functional associations between pairs of Gene Ontology terms, derived from how frequently the two terms co-occur in the annotations of the same proteins in the database. PFP also considers sequences that are only very weakly similar to the query, thereby increasing its sensitivity and its ability to predict low-resolution functional terms. ESG, in contrast, recursively searches the sequence similarity space around the query to find consensus annotations in the neighborhood. The last part of the chapter discusses the network structure of the gene functional space, built by connecting proteins that are functionally similar. In constructing this network, function annotations were enriched with PFP predictions. The similarity of the network's structure to that of protein-protein interaction networks and metabolic pathway networks is also discussed.
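The co-occurrence idea behind PFP can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy annotation table, and the `shift` constant (which lets hits with E-values above 1 still contribute) are assumptions made for illustration only. Each hit contributes a weight derived from its E-value, and that weight is passed on to every GO term in proportion to how often the term co-occurs with the hit's own annotations.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_probs(annotations):
    """Estimate P(fa | fj): the fraction of proteins annotated with
    term fj that are also annotated with term fa.
    Keys of the returned mapping are (fa, fj) tuples."""
    term_count = Counter()
    pair_count = Counter()
    for terms in annotations.values():
        terms = set(terms)
        term_count.update(terms)
        # Ordered pairs of distinct terms found on the same protein.
        pair_count.update(permutations(terms, 2))
    probs = defaultdict(float)
    for (fa, fj), n in pair_count.items():
        probs[(fa, fj)] = n / term_count[fj]
    for term in term_count:
        probs[(term, term)] = 1.0  # a term always co-occurs with itself
    return probs


def score_terms(hits, annotations, probs, shift=2.0):
    """Score GO terms for a query from its search hits.
    hits: list of (protein_id, e_value) pairs.
    shift: assumed constant so that weak hits (E-value > 1) still count."""
    all_terms = {t for terms in annotations.values() for t in terms}
    scores = defaultdict(float)
    for protein_id, e_value in hits:
        weight = -math.log10(e_value) + shift
        if weight <= 0:
            continue  # hit is too weak to contribute
        for fj in annotations.get(protein_id, ()):
            for fa in all_terms:
                p = probs.get((fa, fj), 0.0)
                if p:
                    scores[fa] += weight * p
    return dict(scores)


# Toy data (hypothetical identifiers, not real GO terms or proteins).
toy_annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:A"},
    "P3": {"GO:B", "GO:C"},
}
toy_probs = cooccurrence_probs(toy_annotations)
toy_scores = score_terms([("P1", 1e-3), ("P3", 10.0)], toy_annotations, toy_probs)
for term, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 2))
```

In this toy run the weak hit P3 (E-value 10) still adds a small contribution, and GO:C receives a score through its association with GO:B even though the strong hit P1 is not annotated with it. Both effects are the kind of sensitivity gain the abstract describes.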

Acknowledgements

MC is supported by grants from the Purdue Research Foundation and the Showalter Trust. DK also acknowledges grants from the National Institutes of Health (GM075004) and the National Science Foundation (DMS800568, EF0850009, IIS0915801).

Corresponding author

Correspondence to Daisuke Kihara.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Chitale, M., Kihara, D. (2011). Enhanced Sequence-Based Function Prediction Methods and Application to Functional Similarity Networks. In: Kihara, D. (eds) Protein Function Prediction for Omics Era. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0881-5_2
