Using Semantic Web Tools to Integrate Experimental Measurement Data on Our Own Terms

  • M. Scott Marshall
  • Lennart Post
  • Marco Roos
  • Timo M. Breit
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4277)

Abstract

The -omics data revolution, galvanized by the development of the web, has resulted in large numbers of valuable public databases and repositories. Scientists wishing to employ this data for their research are faced with the question of how to approach data integration. Ad hoc solutions can result in diminished generality, interoperability, and reusability, as well as loss of data provenance. One of the promising notions that the Semantic Web brings to the life sciences is that experimental data can be described with relevant life science terms and concepts. Subsequent integration and analysis can then take advantage of those terms, exposing logic that might otherwise only be available from the interpretation of program code. In the context of a biological use case, we examine a general Semantic Web approach to integrating experimental measurement data with Semantic Web tools such as Protégé and Sesame. The approach to data integration that we define is based on the linking of data with OWL classes. The general pattern that we apply consists of 1) building application-specific ontologies for “myModel”, 2) identifying the concepts involved in the biological hypothesis, 3) finding data instances of the concepts, 4) finding a common domain to be used for integration, and 5) integrating the data. Our experience with current tools indicates a few Semantic Web bottlenecks, such as a general lack of ‘semantic disclosure’ from public data resources and the need for better ‘interval join’ performance from RDF query engines.
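
To make the ‘interval join’ mentioned above concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the common domain of step 4 is a one-dimensional coordinate axis (for example, positions along a chromosome) and that each data instance covers a closed interval on that axis. The names `Interval` and `interval_join`, and the sample labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """A labelled region on a shared coordinate axis (hypothetical model)."""
    label: str
    start: int  # inclusive
    end: int    # inclusive


def interval_join(left: List[Interval],
                  right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (left, right) pair whose closed intervals overlap.

    Sweeps over all intervals in order of start position and keeps, per
    side, the intervals that may still overlap later ones.
    """
    # Tag each interval with its side: 0 = left input, 1 = right input.
    events = [(iv.start, 0, iv) for iv in left]
    events += [(iv.start, 1, iv) for iv in right]
    events.sort(key=lambda e: (e[0], e[1]))

    active: Tuple[List[Interval], List[Interval]] = ([], [])
    pairs: List[Tuple[Interval, Interval]] = []
    for start, side, iv in events:
        other = 1 - side
        # Discard intervals on the other side that ended before this start.
        active[other][:] = [o for o in active[other] if o.end >= start]
        # Everything still active on the other side overlaps this interval.
        for o in active[other]:
            pairs.append((iv, o) if side == 0 else (o, iv))
        active[side].append(iv)
    return pairs


if __name__ == "__main__":
    # Hypothetical sample data: annotated regions and measured regions.
    regions = [Interval("geneA", 100, 500), Interval("geneB", 800, 1200)]
    measurements = [
        Interval("probe1", 450, 600),
        Interval("probe2", 700, 750),
        Interval("probe3", 1200, 1300),
    ]
    for region, measurement in interval_join(regions, measurements):
        print(region.label, "overlaps", measurement.label)
```

A naive join compares every pair of intervals, which grows with the product of the two input sizes. The sweep shown here sorts once and only revisits intervals that are still open, which is one way to see why overlap conditions expressed as plain comparison filters can be costly for a query engine that does not treat intervals specially.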

Keywords

Data Integration Ontology Alignment Experimental Measurement Data Approach Data Integration Rule Interchange Format 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • M. Scott Marshall (1)
  • Lennart Post (1, 2)
  • Marco Roos (1)
  • Timo M. Breit (1)

  1. Integrative Bioinformatics Unit
  2. Nuclear Organisation Group Institute for Informatics, Swammerdam Institute for Life Sciences Faculty of Science, University of Amsterdam
