TabEL: Entity Linking in Web Tables

  • Chandra Sekhar Bhagavatula
  • Thanapon Noraset
  • Doug Downey
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9366)

Abstract

Web tables form a valuable source of relational data. The Web contains an estimated 154 million HTML tables of relational data, with Wikipedia alone containing 1.6 million high-quality tables. Extracting the semantics of Web tables to produce machine-understandable knowledge has become an active area of research.

A key step in extracting the semantics of Web content is entity linking (EL): the task of mapping a phrase in text to its referent entity in a knowledge base (KB). In this paper we present TabEL, a new EL system for Web tables. TabEL differs from previous work by weakening the assumption that the semantics of a table can be mapped to pre-defined types and relations found in the target KB. Instead, TabEL enforces soft constraints in the form of a graphical model that assigns higher likelihood to sets of entities that tend to co-occur in Wikipedia documents and tables. In experiments, TabEL significantly reduces error when compared to current state-of-the-art table EL systems, including a 75% error reduction on Wikipedia tables and a 60% error reduction on Web tables. We also make our parsed Wikipedia table corpus and test datasets publicly available for future work.
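To make the idea of soft co-occurrence constraints concrete, the following is a minimal illustrative sketch, not the TabEL model itself. It assumes a toy candidate dictionary with prior probabilities and a hand-written co-occurrence score table (all names and numbers here are hypothetical), and it uses a simple iterative re-assignment loop in which each cell picks the candidate that maximizes its prior times its average co-occurrence with the entities currently assigned to the other cells.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical toy data (not from the paper): candidate entities per mention,
# each with a prior probability of being the intended referent.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Chicago": {"Chicago_(city)": 0.6, "Chicago_(band)": 0.4},
    "Boston": {"Boston_(city)": 0.6, "Boston_(band)": 0.4},
    "Journey": {"Journey_(band)": 1.0},
    "Styx": {"Styx_(band)": 1.0},
}

# Hypothetical co-occurrence scores. In a real system these would be estimated
# from a corpus; here they are written by hand for illustration.
_BANDS = ["Chicago_(band)", "Boston_(band)", "Journey_(band)", "Styx_(band)"]
COOCCURRENCE: Dict[frozenset, float] = {
    frozenset(pair): 0.9 for pair in combinations(_BANDS, 2)
}
COOCCURRENCE[frozenset(("Chicago_(city)", "Boston_(city)"))] = 0.8
DEFAULT_COOCCURRENCE = 0.05  # score for pairs that rarely appear together


def cooccurrence(a: str, b: str) -> float:
    """Symmetric co-occurrence score between two entities."""
    return COOCCURRENCE.get(frozenset((a, b)), DEFAULT_COOCCURRENCE)


def link_by_prior(mentions: List[str]) -> Dict[str, str]:
    """Baseline: link each mention independently to its highest-prior candidate."""
    return {m: max(CANDIDATES[m], key=CANDIDATES[m].get) for m in mentions}


def link_collectively(mentions: List[str], max_iters: int = 10) -> Dict[str, str]:
    """Re-assign each mention until no assignment changes (or max_iters is hit)."""
    assignment = link_by_prior(mentions)
    for _ in range(max_iters):
        changed = False
        for m in mentions:
            others = [assignment[o] for o in mentions if o != m]

            def score(candidate: str) -> float:
                # Soft constraint: prior weighted by average co-occurrence
                # with the entities currently assigned to the other cells.
                if not others:
                    return CANDIDATES[m][candidate]
                coherence = sum(cooccurrence(candidate, o) for o in others) / len(others)
                return CANDIDATES[m][candidate] * coherence

            best = max(CANDIDATES[m], key=score)
            if best != assignment[m]:
                assignment[m] = best
                changed = True
        if not changed:
            break
    return assignment


column = ["Chicago", "Boston", "Journey", "Styx"]
prior_links = link_by_prior(column)
collective_links = link_collectively(column)
```

In this toy column, the prior-only baseline links "Chicago" and "Boston" to the cities, while the collective pass switches both to the bands because they co-occur with the unambiguous band entities in the same column. No entity types are consulted at link time; only pairwise co-occurrence scores influence the result.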

Keywords

Web tables, Entity linking, Named entity disambiguation, Graphical models

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Chandra Sekhar Bhagavatula (1)
  • Thanapon Noraset (1)
  • Doug Downey (1)
  1. Northwestern University, Evanston, USA
