Abstract
This chapter describes the challenges involved in the integration of databases storing diverse but related types of life sciences data. A major challenge in this regard is the syntactic and semantic heterogeneity of life sciences databases. There is a strong need for standardizing the syntactic and semantic data representations. We discuss how to address this by using the emerging Semantic Web technologies based on the Resource Description Framework (RDF) standard. This chapter presents two use cases, namely YeastHub and LinkHub, which demonstrate how to use the latest RDF database technology to build data warehouses that facilitate integration of genomic/proteomic data and identifiers.
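The idea of integrating databases through RDF can be illustrated with a minimal sketch. It is not the YeastHub or LinkHub implementation: the namespace, predicates, and identifiers below are hypothetical, the data are made up for illustration, and plain Python tuples stand in for a real RDF store. The sketch only shows that when two sources express their records as subject-predicate-object triples with shared identifiers, merging them is a set union and a cross-source question becomes a chain of triple-pattern matches.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical namespace; every identifier below is illustrative only.
EX = "http://example.org/"

# Source A (assumed): a gene database stating which protein a gene encodes.
source_a: Set[Triple] = {
    (EX + "gene/G1", EX + "encodes", EX + "protein/P1"),
    (EX + "gene/G2", EX + "encodes", EX + "protein/P2"),
}

# Source B (assumed): a protein database stating family membership.
source_b: Set[Triple] = {
    (EX + "protein/P1", EX + "memberOfFamily", EX + "family/F1"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def families_for_gene(graph: Set[Triple], gene: str) -> List[str]:
    """Follow gene -> protein -> family across the merged graph."""
    families = []
    for _, _, protein in match(graph, s=gene, p=EX + "encodes"):
        for _, _, family in match(graph, s=protein, p=EX + "memberOfFamily"):
            families.append(family)
    return sorted(families)


# Because both sources use the same identifier for protein P1,
# integration is just the union of the two triple sets.
merged: Set[Triple] = source_a | source_b

print(families_for_gene(merged, EX + "gene/G1"))
```

Neither source alone can answer which family a gene's product belongs to; the shared protein identifier is what links them. In practice, agreeing on such identifiers and on the meaning of predicates is the hard part that the chapter's syntactic and semantic standardization addresses.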
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Cheung, KH., Smith, A.K., Yip, K.Y.L., Baker, C.J.O., Gerstein, M.B. (2007). Semantic Web Approach to Database Integration in the Life Sciences. In: Baker, C.J.O., Cheung, KH. (eds) Semantic Web. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-48438-9_2
DOI: https://doi.org/10.1007/978-0-387-48438-9_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-48436-5
Online ISBN: 978-0-387-48438-9
eBook Packages: Biomedical and Life Sciences (R0)