TRank: Ranking Entity Types Using the Web of Data

  • Alberto Tonon
  • Michele Catasta
  • Gianluca Demartini
  • Philippe Cudré-Mauroux
  • Karl Aberer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8218)

Abstract

Much of today's Web search and browsing activity centers on entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities, such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on SERPs, and that is instrumental for many applications, is the entity type. However, background knowledge bases usually associate an entity not with a single generic type but with a set of more specific types, which may or may not be relevant given the document context. For example, the Linked Open Data cloud records that Tom Hanks is a person, an actor, and a person from Concord, California. All of these types are correct, but some may be too general to be interesting (e.g., person), others may be interesting but already known to the user (e.g., actor), and others may be irrelevant in the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods that find the most relevant entity type based on collection statistics and on the graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections, at different levels of granularity (e.g., sentences and paragraphs), and over different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to display to the end user, while still being highly scalable.
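To make the task concrete, the following is a minimal Python sketch of ranking one entity's types using two illustrative signals: a hierarchy-based one (the depth of a type, so that more specific types rank higher) and a context-based one (how many other entities in the same context share the type). It is not the TRank method evaluated in the paper. The toy hierarchy `PARENT`, the helpers `depth` and `rank_types`, the type names, and the way the two signals are combined are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical toy type hierarchy (child type -> parent type; the root maps to None).
# Real hierarchies such as DBpedia, Freebase, or schema.org are far larger.
PARENT = {
    "Thing": None,
    "Person": "Thing",
    "Actor": "Person",
    "Director": "Person",
    "PeopleFromConcordCalifornia": "Person",
}


def depth(t: str) -> int:
    """Number of edges between type t and the root of the hierarchy."""
    d = 0
    while PARENT[t] is not None:
        t = PARENT[t]
        d += 1
    return d


def rank_types(entity_types, context_types):
    """Rank the types of one entity, most relevant first (illustrative heuristic).

    entity_types:  list of types attached to the target entity.
    context_types: one list of types per other entity in the same context
                   (sentence, paragraph, ...).
    Deeper (more specific) types come first; among equally deep types, the one
    shared by more context entities wins; remaining ties break alphabetically.
    """
    # Count each type at most once per context entity.
    support = Counter(t for types in context_types for t in set(types))
    return sorted(entity_types, key=lambda t: (-depth(t), -support[t], t))


if __name__ == "__main__":
    tom_hanks = ["Person", "Actor", "PeopleFromConcordCalifornia"]
    # Hypothetical context: a sentence about a film that also mentions an actor and a director.
    context = [["Person", "Actor"], ["Person", "Director"]]
    print(rank_types(tom_hanks, context))
    # ['Actor', 'PeopleFromConcordCalifornia', 'Person']
```

Here the overly general type (Person) drops to the bottom despite having the most context support, and the film-related context lifts Actor above the equally specific but context-irrelevant type. Ordering by depth first and context support second is only one possible combination; a real system could instead weight and combine several such signals, for instance with a learned model.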



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alberto Tonon (1)
  • Michele Catasta (2)
  • Gianluca Demartini (1)
  • Philippe Cudré-Mauroux (1)
  • Karl Aberer (2)

  1. eXascale Infolab, University of Fribourg, Switzerland
  2. EPFL, Lausanne, Switzerland
