Managing Data in High Throughput Laboratories: An Experience Report from Proteomics

  • Thodoros Topaloglou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4215)


Scientific laboratories are rich in data management challenges. This paper describes an end-to-end information management infrastructure for a high-throughput industrial proteomics laboratory. A unique feature of the platform is a data and application integration framework that connects heterogeneous data, applications and processes across the entire laboratory production workflow. We also define a reference architecture, organized by the phases of the laboratory data lifecycle, for implementing similar solutions. Each phase is modeled as a set of workflows that integrate programs and databases through sequences of steps and their associated communication and data transfers. We discuss the issues that arise in each phase and describe how they were addressed in the proteomics implementation.


Analysis Database; Sequence Assignment; Laboratory Information Management System; Complex Biological Sample; Integrate Microbial Genome
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Thodoros Topaloglou (1)
  1. Information Engineering, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario
