Assigning Semantic Labels to Data Sources

  • S.K. Ramnandan
  • Amol Mittal
  • Craig A. Knoblock
  • Pedro Szekely
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9088)


There is strong demand for finding and integrating heterogeneous data sources, which requires mapping the attributes of a source to the concepts and relationships defined in a domain ontology. In this paper, we present a new approach to finding these mappings, a task we call semantic labeling. Previous approaches map each data value individually, typically by learning a model based on features extracted from the data using supervised machine-learning techniques. Our approach differs in that we take a holistic view of the data values corresponding to a semantic label and use techniques that treat this data collectively, which makes it possible to capture characteristic properties of the values associated with a semantic label as a whole. Our approach supports both textual and numeric data and proposes the top \(k\) semantic labels along with their associated confidence scores. Our experiments show that the approach has higher label prediction accuracy, has lower time complexity, and is more scalable than existing systems.
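To make the idea of treating a column's values collectively more concrete, here is a minimal sketch for numeric data. The abstract does not name the specific techniques used, so the two-sample Kolmogorov-Smirnov statistic below is only an assumed stand-in for a collective comparison. The function names, the training values, and the "1 - statistic" confidence score are likewise hypothetical, chosen purely for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The largest gap between two step functions occurs at one of the data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


def top_k_labels(column_values, labeled_values, k=3):
    """Rank candidate labels by how closely their value distribution matches the column.

    The score is 1 - KS statistic: an illustrative confidence in [0, 1], not a p-value.
    """
    scored = [
        (label, 1.0 - ks_statistic(column_values, values))
        for label, values in labeled_values.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical training data: values previously seen for each semantic label.
training = {
    "age": [23, 35, 41, 52, 67, 29, 48],
    "year": [1987, 1999, 2004, 2010, 2015, 1975],
    "price": [9.99, 19.5, 4.25, 120.0, 59.0],
}

# An unlabeled column is compared as a whole against each label's values.
new_column = [31, 44, 58, 26, 39]
ranking = top_k_labels(new_column, training, k=2)
for label, confidence in ranking:
    print(f"{label}: {confidence:.3f}")
```

The point of the sketch is that no single value such as 44 is classified on its own. The whole column is compared against the whole set of values known for each label, and the candidates are returned as a ranked top-\(k\) list with scores.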


Semantic labeling, Source modeling



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • S.K. Ramnandan (1)
  • Amol Mittal (2)
  • Craig A. Knoblock (3)
  • Pedro Szekely (3)
  1. Indian Institute of Technology - Madras, Chennai, India
  2. Indian Institute of Technology - Delhi, New Delhi, India
  3. University of Southern California, Los Angeles, USA
