Advertisement

From Web Tables to Concepts: A Semantic Normalization Approach

  • Katrin BraunschweigEmail author
  • Maik Thiele
  • Wolfgang Lehner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9381)

Abstract

Relational Web tables, embedded in HTML or published on data platforms, have become an important resource for many applications, including question answering or entity augmentation. To utilize the data, we require some understanding of what the tables are about. Previous research on recovering Web table semantics has largely focused on simple tables, which only describe a single semantic concept. However, there is also a significant number of de-normalized multi-concept tables on the Web. Treating these as single-concept tables results in many incorrect relations being extracted. In this paper, we propose a normalization approach to decompose multi-concept tables into smaller single-concept tables. First, we identify columns that represent keys or identifiers of entities. Then, we utilize the table schema as well as intrinsic data correlations to identify concept boundaries and split the tables accordingly. Experimental results on real Web tables show that our approach is feasible and effectively identifies semantic concepts.

Keywords

Web tables Conceptualization Normalization Semantics 

References

  1. 1.
    Bahmani, A., Naghibzadeh, M., Bahmani, B.: Automatic database normalization and primary key generation. In: Canadian Conference on Electrical and Computer Engineering, CCECE 2008, pp. 000011–000016, May 2008Google Scholar
  2. 2.
    Cafarella, M.J., Halevy, A.Y., Khoussainova, N.: Data integration for the relational web. Proc. VLDB Endow. 2, 1090–1101 (2009)CrossRefGoogle Scholar
  3. 3.
    Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: Webtables: exploring the power of tables on the web. Proc. VLDB Endow. 1(1), 538–549 (2008)CrossRefGoogle Scholar
  4. 4.
    Cafarella, M.J., Halevy, A.Y., Zhang, Y., Wang, D.Z., Wu, E.: Uncovering the relational web. In: WebDB (2008)Google Scholar
  5. 5.
    Das Sarma, A., Fang, L., Gupta, N., Halevy, A.Y., Lee, H., Wu, F., Xin, R., Yu, C.: Finding related tables. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 817–828 (2012)Google Scholar
  6. 6.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)CrossRefGoogle Scholar
  7. 7.
    Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: Tane: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)CrossRefzbMATHGoogle Scholar
  8. 8.
    Ilyas, I.F., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: Cords: automatic discovery of correlations and soft functional dependencies. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD 2004, New York, NY, USA, pp. 647–658. ACM (2004)Google Scholar
  9. 9.
    Sorrentino, S., Bergamaschi, B., Gawinecki, M., Po, L.: Schema normalization for improving schema matching. In: Laender, A.H.F., Castano, S., Dayal, U., Casati, F., de Oliveira, J.P.M. (eds.) ER 2009. LNCS, vol. 5829, pp. 280–293. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  10. 10.
    Venetis, P., Halevy, A., Madhavan, J., Paşca, M., Shen, W., Wu, F., Miao, G., Wu, C.: Recovering semantics of tables on the web. Proc. VLDB Endow. 4(9), 528–538 (2011)CrossRefGoogle Scholar
  11. 11.
    Wang, D.Z., Dong, X.L., Sarma, A.D., Franklin, M.J., Halevy, A.Y.: Functional dependency generation and applications in pay-as-you-go data integration systems. In: 12th International Workshop on the Web and Databases, WebDB 2009, Providence, Rhode Island, USA, 28 June 2009Google Scholar
  12. 12.
    Wang, J., Wang, H., Wang, Z., Zhu, K.Q.: Understanding tables on the web. In: Atzeni, P., Cheung, D., Ram, S. (eds.) ER 2012. LNCS, vol. 7532, pp. 141–155. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  13. 13.
    Yakout, M., Ganjam, K., Chakrabarti, K., Chaudhuri, S.: Infogather: entity augmentation and attribute discovery by holistic matching with web tables. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, New York, NY, USA, pp. 97–108. ACM (2012)Google Scholar
  14. 14.
    Zhang, M., Chakrabarti, K.: Infogather+: Semantic matching and annotation of numeric and time-varying attributes in web tables. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, pp. 145–156. ACM (2013)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Katrin Braunschweig
    • 1
    Email author
  • Maik Thiele
    • 1
  • Wolfgang Lehner
    • 1
  1. 1.Technische Universität DresdenDresdenGermany

Personalised recommendations