An Identity Crisis in the Life Sciences

  • Jun Zhao
  • Carole Goble
  • Robert Stevens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4145)


myGrid is an e-Science project that helps life scientists build workflows that gather data from distributed, autonomous, replicated and heterogeneous resources. The provenance logs of workflow executions are recorded as RDF graphs. The log of a single workflow run can be used to trace the history of that run's execution. By aggregating the provenance logs of many workflow runs, however, one can also gather the provenance of a common data product that is shared across multiple derivation paths. Successful aggregation relies on accurate and universal identification of each data product, and the nature of bioinformatics data and services makes this difficult. We describe the identity problem in bioinformatics data, and present a protocol for managing identity co-references and allocating identities to gathered and computed data products. Overcoming this problem means that the provenance of workflows in bioinformatics and other domains can be exploited to enhance the practice of e-Science.
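To make the aggregation idea concrete, here is a minimal sketch in Python. It is not the protocol presented in the paper: it assumes that each provenance log can be treated as a set of (subject, predicate, object) triples, that co-references arrive as pairs of identifiers known to denote the same data product, and that the smallest identifier in each co-reference group serves as the canonical one. All identifiers, the `ex:derivedFrom` predicate, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all plain strings


def find(parent: Dict[str, str], term: str) -> str:
    """Return the canonical identifier for term (union-find with path halving)."""
    parent.setdefault(term, term)
    while parent[term] != term:
        parent[term] = parent[parent[term]]
        term = parent[term]
    return term


def union(parent: Dict[str, str], a: str, b: str) -> None:
    """Record that a and b co-refer; the smaller identifier becomes canonical."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a == root_b:
        return
    if root_b < root_a:
        root_a, root_b = root_b, root_a
    parent[root_b] = root_a


def aggregate(logs: Iterable[Set[Triple]],
              corefs: Iterable[Tuple[str, str]]) -> Set[Triple]:
    """Union several provenance logs after rewriting co-referent identifiers."""
    parent: Dict[str, str] = {}
    for a, b in corefs:
        union(parent, a, b)
    merged: Set[Triple] = set()
    for log in logs:
        for s, p, o in log:
            merged.add((find(parent, s), p, find(parent, o)))
    return merged


# Two hypothetical runs that used the same sequence under different identifiers.
run1 = {("ex:run1/alignment", "ex:derivedFrom", "ex:dbA/P12345")}
run2 = {("ex:run2/hits", "ex:derivedFrom", "ex:dbB/X00001")}
corefs = [("ex:dbA/P12345", "ex:dbB/X00001")]  # assumed to be the same data product

merged = aggregate([run1, run2], corefs)
canonical = "ex:dbA/P12345"
derived = sorted(s for s, p, o in merged
                 if p == "ex:derivedFrom" and o == canonical)
print(derived)
```

Without the co-reference pair, the two runs would remain disconnected in the merged graph. With it, both derived products are reachable from a single node, which is the effect the abstract attributes to accurate identification.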


Keywords: Data Product; Resource Description Framework; Identity Protocol; Identity Service; Identity Crisis



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jun Zhao (1)
  • Carole Goble (1)
  • Robert Stevens (1)

  1. School of Computer Science, University of Manchester, U.K.
