Converting and Annotating Quantitative Data Tables

  • Mark van Assem
  • Hajo Rijgersberg
  • Mari Wigham
  • Jan Top
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6496)


Companies, governmental agencies and scientists produce large amounts of quantitative (research) data, consisting of measurements ranging from, e.g., the surface temperature of an ocean to the viscosity of a sample of mayonnaise. Such measurements are stored in tables in, e.g., spreadsheet files and research reports. To integrate and reuse such data, a semantic description of the data is needed. However, the notation used is often ambiguous, which makes automatic interpretation and conversion to RDF or another suitable format difficult. For example, the table header cell “f (Hz)” refers to frequency measured in hertz, but the symbol “f” can also refer to the unit farad or to the quantities force or luminous flux. Current annotation tools for this task either work on less ambiguous data or perform a more limited task. We introduce new disambiguation strategies based on an ontology, which improve performance on “sloppy” datasets not yet targeted by existing systems.
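To make the “f (Hz)” example concrete, here is a minimal sketch of one possible ontology-based check. It is not the authors' method. It assumes a tiny hand-written ontology: the `UNITS` and `SYMBOL_CANDIDATES` tables, the `Annotation` class and the `annotate_header` function are all hypothetical stand-ins. The sketch keeps only those readings of the header symbol that denote a quantity measurable in the bracketed unit.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-ontology: unit symbol -> (unit name, quantity kind it measures).
UNITS = {
    "Hz": ("hertz", "frequency"),
    "F": ("farad", "capacitance"),
    "N": ("newton", "force"),
    "lm": ("lumen", "luminous flux"),
}

# Hypothetical symbol table: an ambiguous symbol and all the things it may denote.
SYMBOL_CANDIDATES = {
    "f": [("unit", "farad"), ("quantity", "frequency"),
          ("quantity", "force"), ("quantity", "luminous flux")],
}

# Matches header cells of the form "symbol (unit)", e.g. "f (Hz)".
HEADER_PATTERN = re.compile(r"^\s*(?P<symbol>[^()]+?)\s*\((?P<unit>[^()]+)\)\s*$")


@dataclass
class Annotation:
    quantity: str
    unit: str


def annotate_header(cell: str) -> Optional[Annotation]:
    """Return a quantity/unit annotation, or None if the cell stays ambiguous."""
    match = HEADER_PATTERN.match(cell)
    if match is None:
        return None  # not in "symbol (unit)" form
    symbol = match.group("symbol")
    unit_symbol = match.group("unit").strip()
    if unit_symbol not in UNITS:
        return None  # unit unknown to this toy ontology
    unit_name, measured_kind = UNITS[unit_symbol]
    # Since a unit is given in brackets, read the symbol as a quantity and keep
    # only quantity readings that this unit can actually measure.
    quantities = [name for kind, name in SYMBOL_CANDIDATES.get(symbol, [])
                  if kind == "quantity" and name == measured_kind]
    if len(quantities) != 1:
        return None  # no consistent reading, or still ambiguous
    return Annotation(quantity=quantities[0], unit=unit_name)


print(annotate_header("f (Hz)"))  # Annotation(quantity='frequency', unit='hertz')
```

The same symbol resolves differently depending on its unit: under these assumptions, “f (N)” would be read as force and “f (lm)” as luminous flux. A real system would draw the unit-to-quantity links from a full units-of-measure ontology rather than from hard-coded tables.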


Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mark van Assem (1)
  • Hajo Rijgersberg (3)
  • Mari Wigham (2, 3)
  • Jan Top (1, 2, 3)

  1. VU University Amsterdam, The Netherlands
  2. Top Institute Food and Nutrition, The Netherlands
  3. Wageningen University and Research Centre, The Netherlands