Database Support for Enabling Data-Discovery Queries over Semantically-Annotated Observational Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7600)


Observational data plays a critical role in many scientific disciplines, and scientists are increasingly interested in performing broad-scale analyses using observational data collected as part of many smaller scientific studies. However, while these data sets often contain similar types of information, they are typically represented using very different structures and with little semantic information about the data itself. This creates significant challenges for researchers who wish to discover existing data sets based on data semantics (observation and measurement types) and data content (the values of measurements within a data set). We present a formal framework to address these challenges that consists of a semantic observational model (to uniformly represent observation and measurement types), a high-level semantic annotation language (to map tabular resources into the model), and a declarative query language that allows researchers to express data-discovery queries over heterogeneous (annotated) data sets. To demonstrate the feasibility of our framework, we also present implementation approaches for efficiently answering discovery queries over semantically annotated data sets. In particular, we propose two storage schemes (in-place databases, rdb, and materialized databases, mdb) to store the source data sets and their annotations. We also present two query schemes (ExeD and ExeH) for evaluating discovery queries, and we report the results of extensive experiments comparing their effectiveness.
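To make the idea of a data-discovery query concrete, the following is a minimal sketch, not the paper's annotation or query language. It assumes that an annotation simply maps each column of a tabular data set to a measurement type (entity, characteristic, unit), and that a discovery query asks for data sets containing at least one value of a given type that satisfies a condition. All names here (`MeasurementType`, `AnnotatedDataset`, `discover`, and the sample data) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MeasurementType:
    """Hypothetical semantic type attached to a column by an annotation."""
    entity: str          # what is observed, e.g. "Tree"
    characteristic: str  # what is measured, e.g. "Height"
    unit: str            # measurement standard, e.g. "Meter"


@dataclass
class AnnotatedDataset:
    """A tabular data set plus a column-to-type annotation (illustrative only)."""
    name: str
    rows: List[Dict[str, float]]
    annotations: Dict[str, MeasurementType]


def discover(datasets: List[AnnotatedDataset],
             entity: str,
             characteristic: str,
             condition: Callable[[float], bool]) -> List[str]:
    """Return names of data sets with a matching column holding a matching value."""
    hits = []
    for ds in datasets:
        for column, mtype in ds.annotations.items():
            # Semantic match: the column's annotated type, not its name.
            if mtype.entity != entity or mtype.characteristic != characteristic:
                continue
            # Content match: at least one measured value satisfies the condition.
            if any(condition(row[column]) for row in ds.rows if column in row):
                hits.append(ds.name)
                break
    return hits


# Two data sets that store tree heights under different column names.
DATASETS = [
    AnnotatedDataset(
        name="plot_survey",
        rows=[{"ht": 12.5}, {"ht": 21.0}],
        annotations={"ht": MeasurementType("Tree", "Height", "Meter")},
    ),
    AnnotatedDataset(
        name="seedling_study",
        rows=[{"tree_height_m": 0.4}, {"tree_height_m": 1.1}],
        annotations={"tree_height_m": MeasurementType("Tree", "Height", "Meter")},
    ),
]

# Discovery query: data sets with a tree height of at least 20.
result = discover(DATASETS, "Tree", "Height", lambda v: v >= 20)
```

In this sketch the query is evaluated directly against the source rows together with their annotations. The abstract names two storage schemes (rdb and mdb) and two query schemes (ExeD and ExeH); the sketch does not attempt to reproduce any of them.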


Keywords: Complex Query; Conjunctive Normal Form; Query Evaluation; Semantic Annotation; Disjunctive Normal Form
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Dept. of Computer Science, New Mexico State University, USA
  2. Dept. of Computer Science, Gonzaga University, USA
  3. NCEAS, University of California Santa Barbara, USA
