InterPro Protein Classification

Bioinformatics for Comparative Proteomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 694)

Abstract

Improvements in nucleotide sequencing technology have resulted in an ever-increasing number of nucleotide and protein sequences being deposited in databases. Unfortunately, manual classification and annotation cannot keep pace with the rate at which these sequences are generated, so a growing proportion of sequences remains unannotated. Automatic annotation tools can help redress the balance. A number of groups produce protein signatures that describe protein families, functional domains or conserved sites within related groups of proteins. Protein signature databases include CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs. Their approaches range from characterising small conserved motifs that can identify members of a family or subfamily to the use of hidden Markov models that describe the conservation of residues over entire domains or whole proteins. To increase their value as protein classification tools, the protein signatures from these 11 databases have been combined into a single, powerful annotation tool: the InterPro database (http://www.ebi.ac.uk/interpro/) (Hunter et al., Nucleic Acids Res 37:D211–D215, 2009). InterPro is an open-source protein resource used for the automatic annotation of proteins. It scales to the analysis of entire newly sequenced genomes through a downloadable version of InterProScan, which can be incorporated into an existing local pipeline. InterPro provides structural information from PDB (Kouranov et al., Nucleic Acids Res 34:D302–D305, 2006), the corresponding structural classifications from CATH (Cuff et al., Nucleic Acids Res 37:D310–D314, 2009) and SCOP (Andreeva et al., Nucleic Acids Res 36:D419–D425, 2008), and homology models from ModBase (Pieper et al., Nucleic Acids Res 37:D347–D354, 2009) and SwissModel (Kiefer et al., Nucleic Acids Res 37:D387–D392, 2009), allowing the protein signatures to be compared directly with the available structural information.
This chapter reviews the signature methods found in the InterPro database and provides an overview of the InterPro resource itself.
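Motif-based signatures such as PROSITE patterns are written in a compact syntax that maps closely onto regular expressions. The following is a minimal Python sketch, not the PROSITE scanning software, that converts a simple pattern into a regular expression and scans a sequence with it. It assumes only the common pattern elements (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<`/`>` for N- and C-terminal anchors) and does not handle rarer forms such as `>` inside brackets. The function names and the test sequence are illustrative; the example pattern is the classic P-loop motif `[AG]-x(4)-G-K-[ST]`.

```python
import re

# One PROSITE pattern element, e.g. "C", "x", "[ST]", "{P}", "x(2,4)", "<M" or "{P}>".
_ELEMENT = re.compile(r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, residue, low, high, c_term = match.groups()
        if residue == "x":
            token = "."  # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            token = residue  # a single residue or an [..] set is already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + token + ("$" if c_term else ""))
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("[AG]-x(4)-G-K-[ST]."))  # [AG].{4}GK[ST]
    print(find_motif("[AG]-x(4)-G-K-[ST].", "MTEYKLVVVGAGGVGKSALTIQ"))  # [(10, 'GAGGVGKS')]
```

Because `re.finditer` reports only non-overlapping matches, overlapping motif occurrences would be missed; a lookahead-based expression would be needed to report them all.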

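Hidden Markov models score every position of a domain rather than a few key residues. A full profile HMM also models insertions and deletions (Pfam, for example, builds its models with the HMMER package); the sketch below swaps in a simpler, ungapped relative, a position-specific scoring matrix (PSSM), to illustrate the underlying idea of per-position log-odds scoring. The toy alignment, the pseudocount of 1 and the uniform background frequency of 1/20 are assumptions chosen for illustration, not parameters used by any InterPro member database.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log2-odds matrix from equal-length, ungapped aligned sequences.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(len(aligned[0])):
        column = [seq[i] for seq in aligned]
        # Pseudocounts keep residues unseen in this column from getting probability zero.
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Return (score, 0-based start) of the highest-scoring ungapped window.

    Returns (-inf, -1) if the sequence is shorter than the profile.
    """
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        # Residues outside the 20 standard amino acids contribute 0 (neutral).
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    motif_pssm = build_pssm(["GKS", "GKS", "GKT"])
    print(best_window(motif_pssm, "AAGKSAA"))
```

Conserved columns (here the invariant G and K) contribute large positive scores, while residues never seen in a column are mildly penalised, which is the same intuition a profile HMM applies across a whole domain.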

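When InterProScan runs in a local pipeline, each member-database match is reported together with the InterPro entry (if any) into which that signature is integrated, which is what lets InterPro combine evidence from several databases. The sketch below groups matches by protein and InterPro entry. It assumes the tab-separated layout documented for InterProScan 5 (protein accession in column 1, analysis in column 4, signature accession in column 5, InterPro accession in column 12); the InterProScan version described in this chapter may use a different layout, so check the column order against the version you run. All accessions in the example rows are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict


def group_by_interpro(tsv_text):
    """Group member-database matches by (protein, InterPro entry).

    Assumes the InterProScan 5 TSV column order: 1 protein accession,
    4 analysis (member database), 5 signature accession, 12 InterPro accession.
    """
    groups = defaultdict(set)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 12:
            continue  # blank line or no InterPro column: skip the row
        protein, analysis, signature, interpro = row[0], row[3], row[4], row[11]
        if interpro in ("", "-"):
            continue  # signature not integrated into an InterPro entry
        groups[(protein, interpro)].add(f"{analysis}:{signature}")
    return dict(groups)


# Hypothetical rows: two signatures integrated into the same entry, one unintegrated.
EXAMPLE = "\n".join([
    "query1\tmd5\t120\tPfam\tPF_A\tdesc\t5\t80\t1e-20\tT\tdate\tIPR_X\tExample family",
    "query1\tmd5\t120\tPROSITE\tPS_B\tdesc\t10\t17\t-\tT\tdate\tIPR_X\tExample family",
    "query1\tmd5\t120\tSMART\tSM_C\tdesc\t90\t110\t1e-5\tT\tdate\t-\t-",
])

if __name__ == "__main__":
    print(group_by_interpro(EXAMPLE))
```

Grouping on the InterPro accession rather than on individual signatures means that agreement between, say, a Pfam HMM and a PROSITE pattern shows up as two pieces of evidence for one entry.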

References

  1. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215.

  2. Kouranov A, Xie L, de la Cruz J, Chen L, Westbrook J, Bourne PE, Berman HM. (2006) The RCSB PDB information portal for structural genomics. Nucleic Acids Res. 34, D302–D305.

  3. Cuff AL, Sillitoe I, Lewis T, Redfern OC, Garratt R, Thornton J, Orengo CA. (2009) The CATH classification revisited – architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Res. 37, D310–D314.

  4. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 36, D419–D425.

  5. Pieper U, Eswar N, Webb BM, Eramian D, Kelly L, Barkan DT, Carter H, Mankoo P, Karchin R, Marti-Renom MA, Davis FP, Sali A. (2009) MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 37, D347–D354.

  6. Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T. (2009) The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 37, D387–D392.

  7. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol. 215, 403–410.

  8. Pearson WR. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.

  9. UniProt Consortium. (2009) The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res. 37, D169–D174.

  10. Servant F, Bru C, Carrère S, Courcelle E, Gouzy J, Peyruc D, Kahn D. (2002) ProDom: automated clustering of homologous domains. Brief Bioinform. 3(3), 246–251.

  11. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402.

  12. Sigrist CJA, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P. (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 3, 265–274.

  13. Gribskov M, Lüthy R, Eisenberg D. (1990) Profile analysis. Methods Enzymol. 183, 146–159.

  14. Lima T, Auchincloss AH, Coudert E, Keller G, Michoud K, Rivoire C, Bulliard V, de Castro E, Lachaize C, Baratin D, Phan I, Bougueleret L, Bairoch A. (2009) HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. 37, D471–D478.

  15. Attwood TK. (2002) The PRINTS database: a resource for identification of protein families. Brief Bioinform. 3(3), 252–263.

  16. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 235(5), 1501–1531.

  17. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A. (2008) The Pfam protein families database. Nucleic Acids Res. 36, D281–D288.

  18. Heger A, Wilton CA, Sivakumar A, Holm L. (2005) ADDA: a domain database with global coverage of the protein universe. Nucleic Acids Res. 33, D188–D191.

  19. Letunic I, Doerks T, Bork P. (2009) SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229–D232.

  20. Haft DH, Selengut JD, White O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Res. 31(1), 371–373.

  21. Wu CH, Nikolskaya A, Huang H, Yeh LS, Natale DA, Vinayaka CR, Hu ZZ, Mazumder R, Kumar S, Kourtesis P, Ledley RS, Suzek BE, Arminski L, Chen Y, Zhang J, Cardenas JL, Chung S, Castro-Alvear J, Dinkov G, Barker WC. (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res. 32, D112–D114.

  22. Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, Kitano H, Thomas PD. (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 33, D284–D288.

  23. Wilson D, Pethica R, Zhou Y, Talbot C, Vogel C, Madera M, Chothia C, Gough J. (2009) SUPERFAMILY – sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386.

  24. Yeats C, Lees J, Reid A, Kellam P, Martin N, Liu X, Orengo C. (2008) Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Res. 36, D414–D418.

  25. Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. (2005) InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120.

  26. Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A. (2009) BioMart Central Portal – unified access to biological data. Nucleic Acids Res. 37, W23–W27.

  27. Jones P, Côté RG, Cho SY, Klie S, Martens L, Quinn AF, Thorneycroft D, Hermjakob H. (2008) PRIDE: new developments and new datasets. Nucleic Acids Res. 36, D878–D883.

  28. Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L. (2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33, D428–D432.

  29. Reference Genome Group of the Gene Ontology Consortium. (2009) The Gene Ontology’s Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 5(7), e1000431.

  30. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H. (2007) IntAct – open source resource for molecular interaction data. Nucleic Acids Res. 35, D561–D565.

  31. Fleischmann A, Darsow M, Degtyarenko K, Fleischmann W, Boyce S, Axelsen KB, Bairoch A, Schomburg D, Tipton KF, Apweiler R. (2004) IntEnz, the integrated relational enzyme database. Nucleic Acids Res. 32, D434–D437.

  32. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 37, D233–D238.

  33. Harmar AJ, Hills RA, Rosser EM, Jones M, Buneman OP, Dunbar DR, Greenhill SD, Hale VA, Sharman JL, Bonner TI, Catterall WA, Davenport AP, Delagrange P, Dollery CT, Foord SM, Gutman GA, Laudet V, Neubig RR, Ohlstein EH, Olsen RW, Peters J, Pin JP, Ruffolo RR, Searls DB, Wright MW, Spedding M. (2009) IUPHAR-DB: the IUPHAR database of G protein-coupled receptors and ion channels. Nucleic Acids Res. 37, D680–D685.

  34. Degtyarenko K, Contrino S. (2004) COMe: the ontology of bioinorganic proteins. BMC Struct Biol. 4, 3.

  35. Rawlings ND, Morton FR, Kok CY, Kong J, Barrett AJ. (2008) MEROPS: the peptidase database. Nucleic Acids Res. 36, D320–D325.

  36. Whelan S, de Bakker PI, Quevillon E, Rodriguez N, Goldman N. (2006) PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res. 34, D327–D331.

  37. Golovin A, Henrick K. (2008) MSDmotif: exploring protein sites and motifs. BMC Bioinformatics. 9, 312.

  38. Petryszak R, Kretschmann E, Wieser D, Apweiler R. (2005) The predictive power of the CluSTr database. Bioinformatics. 21(18), 3604–3609.

  39. Haft DH, Selengut JD, Brinkac LM, Zafar N, White O. (2005) Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics. 21(3), 293–306.

  40. Jimenez RC, Quinn AF, Garcia A, Labarga A, O’Neill K, Martinez F, Salazar GA, Hermjakob H. (2008) Dasty2, an Ajax protein DAS client. Bioinformatics. 21(14), 3198–3199.

  41. Prlić A, Down TA, Hubbard TJ. (2005) Adding some SPICE to DAS. Bioinformatics. 21(Suppl 2), ii40–ii41.

  42. Hartshorn MJ. (2002) AstexViewer: a visualisation aid for structure-based drug design. J Comput Aided Mol Des. 16(12), 871–881.

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

McDowall, J., Hunter, S. (2011). InterPro Protein Classification. In: Wu, C., Chen, C. (eds) Bioinformatics for Comparative Proteomics. Methods in Molecular Biology, vol 694. Humana Press. https://doi.org/10.1007/978-1-60761-977-2_3

  • DOI: https://doi.org/10.1007/978-1-60761-977-2_3

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-60761-976-5

  • Online ISBN: 978-1-60761-977-2

  • eBook Packages: Springer Protocols