Mapping Entity-Attribute Web Tables to Web-Scale Knowledge Bases

  • Xiaolu Zhang
  • Yueguo Chen
  • Jinchuan Chen
  • Xiaoyong Du
  • Lei Zou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7826)

Abstract

There are many entity-attribute tables on the Web that can be utilized for enriching the entities of knowledge bases (KBs). This requires the schema mapping (matching) between the Web tables and the huge KBs. Existing solutions on schema mapping are inadequate for mapping a Web table and a KB, because of many reasons such as (1) there are many duplicates of entities and their types in a KB; (2) the schema of KB is often implicit, informal, and evolving over time; (3) the KB is typically very large in volume. In this paper, we propose a pure instance-based schema mapping solution to statistically find the effective mapping between a Web table and a KB via the matched data examples. Besides, we propose efficient solutions on finding the matched data examples as well as the overall mapping of a table and a KB. Experiments over real data sets show that our solution is much more accurate than the two baselines of existing solutions. Results also show that our solution is feasible for the mapping of Web tables to large scale KBs.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Alexe, B., Chiticariu, L., Miller, R.J., Tan, W.C.: Muse: Mapping understanding and design by example. In: ICDE, pp. 10–19 (2008)Google Scholar
  2. 2.
    Alexe, B., ten Cate, B., Kolaitis, P.G., Tan, W.C.: Characterizing schema mappings via data examples. ACM Trans. Database Syst. 36(4), 23 (2011)CrossRefGoogle Scholar
  3. 3.
    Aumueller, D., Do, H.H., Massmann, S., Rahm, E.: Schema and ontology matching with coma++. In: SIGMOD Conference, pp. 906–908 (2005)Google Scholar
  4. 4.
    Bellahsene, Z., Bonifati, A., Rahm, E. (eds.): Schema Matching and Mapping. Springer (2011)Google Scholar
  5. 5.
    Benjelloun, O., Garcia-Molina, H., Menestrina, D., Su, Q., Whang, S.E., Widom, J.: Swoosh: a generic approach to entity resolution. VLDB J. 18(1), 255–276 (2009)CrossRefGoogle Scholar
  6. 6.
    Bollacker, K.D., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD Conference, pp. 1247–1250 (2008)Google Scholar
  7. 7.
    Cafarella, M.J., Halevy, A.Y., Khoussainova, N.: Data integration for the relational web. PVLDB 2(1), 1090–1101 (2009)Google Scholar
  8. 8.
    Cafarella, M.J., Halevy, A.Y., Wang, D.Z., Wu, E., Zhang, Y.: Webtables: exploring the power of tables on the web. PVLDB 1(1), 538–549 (2008)Google Scholar
  9. 9.
    Do, H.H., Rahm, E.: Coma - a system for flexible combination of schema matching approaches. In: VLDB, pp. 610–621 (2002)Google Scholar
  10. 10.
    Engmann, D., Maßmann, S.: Instance matching with coma++. In: BTW Workshops, pp. 28–37 (2007)Google Scholar
  11. 11.
    Gonzalez, H., Halevy, A.Y., Jensen, C.S., Langen, A., Madhavan, J., Shapley, R., Shen, W.: Google fusion tables: data management, integration and collaboration in the cloud. In: SoCC, pp. 175–180 (2010)Google Scholar
  12. 12.
    Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers (2011)Google Scholar
  13. 13.
    Kang, J., Naughton, J.F.: On schema matching with opaque column names and data values. In: SIGMOD Conference, pp. 205–216 (2003)Google Scholar
  14. 14.
    Madhavan, J., Bernstein, P.A., Doan, A., Halevy, A.Y.: Corpus-based schema matching. In: ICDE, pp. 57–68 (2005)Google Scholar
  15. 15.
    Madhavan, J., Bernstein, P.A., Rahm, E.: Generic schema matching with cupid. In: VLDB, pp. 49–58 (2001)Google Scholar
  16. 16.
    Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: A versatile graph matching algorithm and its application to schema matching. In: ICDE, pp. 117–128 (2002)Google Scholar
  17. 17.
    Popa, L., Velegrakis, Y., Miller, R.J., Hernández, M.A., Fagin, R.: Translating web data. In: VLDB, pp. 598–609 (2002)Google Scholar
  18. 18.
    Preda, N., Kasneci, G., Suchanek, F.M., Neumann, T., Yuan, W., Weikum, G.: Active knowledge: dynamically enriching rdf knowledge bases by web services. In: SIGMOD Conference, pp. 399–410 (2010)Google Scholar
  19. 19.
    Qian, L., Cafarella, M.J., Jagadish, H.V.: Sample-driven schema mapping. In: SIGMOD Conference, pp. 73–84 (2012)Google Scholar
  20. 20.
    Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)MATHCrossRefGoogle Scholar
  21. 21.
    Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: a core of semantic knowledge. In: WWW, pp. 697–706 (2007)Google Scholar
  22. 22.
    Wu, W., Li, H., Wang, H., Zhu, K.Q.: Probase: a probabilistic taxonomy for text understanding. In: SIGMOD Conference, pp. 481–492 (2012)Google Scholar
  23. 23.
    Yakout, M., Ganjam, K., Chakrabarti, K., Chaudhuri, S.: Infogather: entity augmentation and attribute discovery by holistic matching with web tables. In: SIGMOD Conference, pp. 97–108 (2012)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Xiaolu Zhang
    • 1
  • Yueguo Chen
    • 1
  • Jinchuan Chen
    • 1
  • Xiaoyong Du
    • 1
  • Lei Zou
    • 2
  1. 1.Renmin University of ChinaChina
  2. 2.Peking UniversityChina

Personalised recommendations