InterPro and InterProScan: Tools for Protein Sequence Classification and Comparison
  • Nicola Mulder
  • Rolf Apweiler
Part of the Methods In Molecular Biology™ book series (MIMB, volume 396)


Protein sequence classification and comparison have become increasingly important in the current “omics” revolution, in which scientists are developing functional genomics and proteomics technologies for large-scale protein function prediction. However, functional classification also matters to the bench scientist who wants to analyze a single protein, a small set of proteins, or even a single genome. A number of tools are available for sequence classification, such as sequence similarity searches, motif- or pattern-finding software, and protein signatures for identifying protein families and domains. One such tool, InterPro, is a documentation resource that integrates the major protein signature databases into a single resource for protein annotation. Protein sequences are searched with the InterProScan software to identify signatures from the InterPro member databases: Pfam, PROSITE, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D, and PANTHER. The InterPro database can be searched to retrieve precalculated matches for UniProtKB proteins or to find additional information on protein families and domains. For completely sequenced genomes, users can retrieve InterPro-based analyses of all nonredundant proteins in the proteome and can run comparisons between proteomes of their choice. This chapter describes how to use InterPro and InterProScan for protein sequence classification and comparative proteomics.
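The abstract describes InterProScan reporting matches from several member databases, each integrated under an InterPro entry, and proteome comparisons built on those entries. The Python sketch below illustrates that idea on tab-separated match output. It assumes the column layout of the InterProScan 5 TSV format (protein accession in column 1, analysis in column 4, signature accession in column 5, start and end in columns 7 and 8, InterPro accession in column 12); the InterProScan version covered by this chapter may lay its output out differently, so check the column order for your release. All protein, signature, and entry identifiers are made up, and `parse_matches`, `entry_counts`, `compare_proteomes`, and `row` are hypothetical helpers, not part of InterPro or InterProScan. Counting proteins per entry is only one simple basis for comparing proteomes, not the comparison method described in the chapter.

```python
from collections import Counter, defaultdict

# Column indices (0-based) ASSUMED from the InterProScan 5 TSV layout;
# verify them against the documentation for the version you run.
PROTEIN, ANALYSIS, SIGNATURE, START, END, INTERPRO = 0, 3, 4, 6, 7, 11


def parse_matches(tsv_text):
    """Group signature matches by protein, then by InterPro entry.

    Returns {protein: {interpro_acc: [(analysis, signature, start, end), ...]}}.
    Matches whose signature is not integrated into an InterPro entry
    (missing, empty, or "-" InterPro column) are skipped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        entry = cols[INTERPRO] if len(cols) > INTERPRO else ""
        if entry in ("", "-"):
            continue
        grouped[cols[PROTEIN]][entry].append(
            (cols[ANALYSIS], cols[SIGNATURE], int(cols[START]), int(cols[END]))
        )
    return grouped


def entry_counts(grouped):
    """Count the proteins that carry each InterPro entry (once per protein)."""
    counts = Counter()
    for entries in grouped.values():
        counts.update(entries.keys())
    return counts


def compare_proteomes(counts_a, counts_b):
    """List (entry, count_a, count_b), largest absolute difference first."""
    rows = [(e, counts_a.get(e, 0), counts_b.get(e, 0))
            for e in set(counts_a) | set(counts_b)]
    return sorted(rows, key=lambda r: (-abs(r[1] - r[2]), r[0]))


def row(protein, analysis, signature, start, end, entry):
    """Build one hypothetical TSV line; unused columns get placeholders."""
    return "\t".join([protein, "md5", "300", analysis, signature, "desc",
                      str(start), str(end), "1e-10", "T", "date", entry, "entry-desc"])


# Made-up identifiers, for illustration only.
proteome_a = "\n".join([
    row("A1", "Pfam", "PF_X", 10, 250, "IPR_X"),
    row("A1", "SMART", "SM_X", 12, 248, "IPR_X"),  # second member database, same entry
    row("A2", "Pfam", "PF_X", 5, 240, "IPR_X"),
    row("A2", "PROSITE", "PS_Y", 300, 320, "IPR_Y"),
    row("A3", "PRINTS", "PR_Z", 1, 50, "-"),       # not integrated: skipped
])
proteome_b = "\n".join([
    row("B1", "Pfam", "PF_X", 8, 245, "IPR_X"),
    row("B2", "PROSITE", "PS_Y", 100, 120, "IPR_Y"),
    row("B3", "PROSITE", "PS_Y", 90, 110, "IPR_Y"),
    row("B4", "PROSITE", "PS_Y", 95, 115, "IPR_Y"),
])

for entry, n_a, n_b in compare_proteomes(entry_counts(parse_matches(proteome_a)),
                                         entry_counts(parse_matches(proteome_b))):
    print(f"{entry}\t{n_a}\t{n_b}")  # tab-separated: IPR_Y 1 3, then IPR_X 2 1
```

Grouping by InterPro accession rather than by signature is the point of the sketch: protein A1 is hit by both a Pfam and a SMART signature, yet it is counted once under the single entry that integrates them. To use real output, pass the text of your own TSV file to `parse_matches`.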


Keywords: Functional classification, domain, protein family, comparative proteomics, InterPro



Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Nicola Mulder
    • European Bioinformatics Institute, Cambridge, UK
  • Rolf Apweiler
    • European Bioinformatics Institute, Cambridge, UK
