Abstract
We present GeneTegra, an ontology-based information integration environment. We demonstrate its ability to query multiple data sources and evaluate the relative performance of different data repositories. GeneTegra uses Semantic Web standards to resolve the semantic and syntactic diversity of the large and increasingly complex body of publicly available data. It provides mechanisms to create ontology models of data sources using the OWL 2 Web Ontology Language, and to define, plan, and execute queries against these models using the SPARQL query language. Supported data source formats include relational databases, XML, and RDF. Experimental results show that GeneTegra obtains equivalent results from different data repositories containing the same data, illustrating that the proposed methods can query heterogeneous sources using a single modeling paradigm.
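The central claim, that one query posed against a shared model returns equivalent results from different repositories holding the same data, can be illustrated with a small sketch. This is not GeneTegra's implementation: the namespace, table layout, XML layout, and sample values below are hypothetical toy data, and a hand-written triple-pattern matcher stands in for a real SPARQL engine. The pattern evaluated corresponds roughly to `SELECT ?tissue WHERE { ?s ex:gene "CDH1" . ?s ex:tissue ?tissue }`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical namespace and property names, used only for illustration.
EX = "http://example.org/cancer#"

# Toy data, invented for this sketch: (id, gene, tissue).
ROWS = [(1, "CDH1", "breast"), (2, "TP53", "colon"), (3, "CDH1", "colon")]

XML_TEXT = """
<samples>
  <sample id="1"><gene>CDH1</gene><tissue>breast</tissue></sample>
  <sample id="2"><gene>TP53</gene><tissue>colon</tissue></sample>
  <sample id="3"><gene>CDH1</gene><tissue>colon</tissue></sample>
</samples>
"""


def triples_from_sqlite(conn):
    """Map each row of the (hypothetical) 'sample' table to triples."""
    triples = set()
    for sample_id, gene, tissue in conn.execute(
        "SELECT id, gene, tissue FROM sample"
    ):
        subject = f"{EX}sample/{sample_id}"
        triples.add((subject, EX + "gene", gene))
        triples.add((subject, EX + "tissue", tissue))
    return triples


def triples_from_xml(xml_text):
    """Map each <sample> element to the same triple shape."""
    triples = set()
    for node in ET.fromstring(xml_text).iter("sample"):
        subject = f"{EX}sample/{node.get('id')}"
        triples.add((subject, EX + "gene", node.findtext("gene")))
        triples.add((subject, EX + "tissue", node.findtext("tissue")))
    return triples


def match(triples, patterns):
    """Evaluate a conjunction of triple patterns; '?name' terms are variables."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                compatible = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            compatible = False
                            break
                    elif term != value:
                        compatible = False
                        break
                if compatible:
                    extended.append(candidate)
        solutions = extended
    return solutions


def tissues_for_gene(triples, gene):
    """Run the same two-pattern query against any triple set."""
    patterns = [("?s", EX + "gene", gene), ("?s", EX + "tissue", "?tissue")]
    return sorted({b["?tissue"] for b in match(triples, patterns)})


# Repository 1: a relational database (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, gene TEXT, tissue TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", ROWS)

answers_from_sqlite = tissues_for_gene(triples_from_sqlite(conn), "CDH1")
# Repository 2: an XML document holding the same data.
answers_from_xml = tissues_for_gene(triples_from_xml(XML_TEXT), "CDH1")

print(answers_from_sqlite == answers_from_xml)
```

The two mapping functions play the role of source-specific models, while `tissues_for_gene` is written once against the shared triple vocabulary, which is the sense in which both sources are queried under the same modeling paradigm.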
Keywords
- Data integration
- Ontology
- Semantic Web
- SPARQL
- OWL
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shironoshita, E.P., Jean-Mary, Y.R., Bradley, R.M., Buendia, P., Kabuka, M.R. (2012). Cancer Data Integration and Querying with GeneTegra. In: Bodenreider, O., Rance, B. (eds) Data Integration in the Life Sciences. DILS 2012. Lecture Notes in Computer Science, vol 7348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31040-9_6
DOI: https://doi.org/10.1007/978-3-642-31040-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31039-3
Online ISBN: 978-3-642-31040-9
eBook Packages: Computer Science, Computer Science (R0)