Data Mining in Proteomics pp 123-145

Part of the Methods in Molecular Biology book series (MIMB, volume 696)

Tranche Distributed Repository and ProteomeCommons.org

  • Bryan E. Smith
  • James A. Hill
  • Mark A. Gjukich
  • Philip C. Andrews


Tranche is a distributed repository designed to redundantly store and disseminate data sets for the proteomics community. It offers several features important to researchers, including support for large data files, prepublication access controls, licensing options, and assurance of both data provenance and integrity. Tranche integrates tightly with ProteomeCommons.org, an online community resource that offers a variety of useful tools for proteomics researchers, including project management and data annotation. In this chapter, we discuss the development of Tranche and ProteomeCommons.org, paying particular attention to why it is desirable that data be publicly available and unrestricted, as well as to the challenges facing data archiving and open access. We then provide a technical overview of Tranche and ProteomeCommons.org, together with step-by-step instructions for using these resources, including the graphical user interface (GUI), command-line tools, and Application Programmer Interface (API). We end with a brief discussion of current and future development efforts and collaborations.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Bryan E. Smith (1)
  • James A. Hill (1)
  • Mark A. Gjukich (1)
  • Philip C. Andrews (1)

  1. Departments of Biological Chemistry, Bioinformatics and Chemistry, University of Michigan, Ann Arbor, USA
