Semantic Web Approach to Database Integration in the Life Sciences

Chapter in: Semantic Web

Abstract

This chapter describes the challenges involved in integrating databases that store diverse but related types of life sciences data. A major challenge is the syntactic and semantic heterogeneity of these databases, which creates a strong need for standardized syntactic and semantic data representations. We discuss how to address this need with emerging Semantic Web technologies based on the Resource Description Framework (RDF) standard. The chapter presents two use cases, YeastHub and LinkHub, which demonstrate how RDF database technology can be used to build data warehouses that facilitate the integration of genomic/proteomic data and identifiers.
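The abstract's core idea is that RDF gives heterogeneous sources a shared data model, so they can be loaded into one warehouse and queried together. The small sketch below illustrates this. It is not YeastHub's or LinkHub's implementation. The namespaces, identifiers, and the helpers `match` and `protein_symbols` are all hypothetical, and plain Python sets stand in for an RDF database. Each fact is a (subject, predicate, object) triple. Because both sources name genes with the same URIs, merging them is a set union, and a cross-source question becomes a join on the shared URI.

```python
# Toy illustration of RDF-style integration; not the chapter's actual code.
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical namespaces and identifiers, for illustration only.
GENE = "http://example.org/gene/"
PROT = "http://example.org/protein/"
EX = "http://example.org/vocab#"

# Source A: a gene-annotation database exported as triples.
gene_db: Set[Triple] = {
    (GENE + "G1", EX + "symbol", "ABC1"),
    (GENE + "G2", EX + "symbol", "XYZ2"),
}

# Source B: a protein database that refers to genes by the same URIs.
protein_db: Set[Triple] = {
    (PROT + "P1", EX + "encodedBy", GENE + "G1"),
    (PROT + "P2", EX + "encodedBy", GENE + "G2"),
}


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


def protein_symbols(graph: Set[Triple]) -> List[Tuple[str, str]]:
    """Join across sources: protein -> encoding gene -> gene symbol."""
    rows = []
    for prot, _, gene in match(graph, p=EX + "encodedBy"):
        for _, _, symbol in match(graph, s=gene, p=EX + "symbol"):
            rows.append((prot, symbol))
    return sorted(rows)


# Loading both sources into one "warehouse" is a plain set union here,
# because there are no blank nodes and the shared URIs already agree.
warehouse = gene_db | protein_db

if __name__ == "__main__":
    print(protein_symbols(warehouse))
```

Running the script prints `[('http://example.org/protein/P1', 'ABC1'), ('http://example.org/protein/P2', 'XYZ2')]`. The union works only because the two hypothetical sources already agree on gene URIs. Real databases usually use different identifier schemes, so a mapping layer between identifiers would be needed first, which is the kind of identifier integration the abstract attributes to LinkHub. A real RDF store would also replace the `match` helper with a standard query language such as SPARQL.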

Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Cheung, KH., Smith, A.K., Yip, K.Y.L., Baker, C.J.O., Gerstein, M.B. (2007). Semantic Web Approach to Database Integration in the Life Sciences. In: Baker, C.J.O., Cheung, KH. (eds) Semantic Web. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48438-9_2