Generating Semantic Aspects for Queries

  • Dhruv Gupta
  • Klaus Berberich
  • Jannik Strötgen
  • Demetrios Zeinalipour-Yazti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11503)


Large document collections can be hard to explore if the user expresses her information need as a small set of keywords. The ambiguous intents behind such short queries often lead to long-winded query sessions and many query reformulations. To alleviate this problem, we propose the novel concept of semantic aspects (e.g., \(\langle \{\textsf{michael-phelps}\}, \{\textsf{athens, beijing, london}\}, [2004, 2016] \rangle\) for an ambiguous query) and present the xFactor algorithm, which generates them from annotations in documents. Semantic aspects lift document contents into a meaningful structured representation, allowing the user to sift through many documents without reading their contents. The aspects are created by analyzing semantic annotations in the form of temporal, geographic, and named entity annotations. We evaluate our approach on a novel testbed of over 5,000 aspects on Web-scale document collections amounting to more than 450 million documents. Our results show that the xFactor algorithm finds relevant aspects for highly ambiguous queries.
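To make the structure of a semantic aspect concrete, the following minimal sketch represents one as a triple of an entity set, a location set, and a time interval, and builds it from annotated documents. It is an illustration only and not the xFactor algorithm: the class names `AnnotatedDoc` and `SemanticAspect`, the function `summarize`, and the aggregation rule (entities shared by all documents, union of locations, span of years) are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class AnnotatedDoc:
    """Hypothetical container for the semantic annotations of one document."""
    entities: FrozenSet[str]   # named entity annotations
    locations: FrozenSet[str]  # geographic annotations
    years: FrozenSet[int]      # temporal annotations, simplified to years


@dataclass(frozen=True)
class SemanticAspect:
    """Triple <entities, locations, [begin, end]> as in the abstract's example."""
    entities: FrozenSet[str]
    locations: FrozenSet[str]
    interval: Tuple[int, int]


def summarize(docs: Iterable[AnnotatedDoc]) -> SemanticAspect:
    """Toy aggregation (an assumption, not xFactor): keep the entities shared
    by every document, the union of all locations, and the span of all years."""
    docs = list(docs)
    if not docs:
        raise ValueError("need at least one annotated document")
    years = [y for d in docs for y in d.years]
    if not years:
        raise ValueError("need at least one temporal annotation")
    entities = frozenset.intersection(*(d.entities for d in docs))
    locations = frozenset().union(*(d.locations for d in docs))
    return SemanticAspect(entities, locations, (min(years), max(years)))


# Made-up annotations chosen to reproduce the example aspect from the abstract.
docs = [
    AnnotatedDoc(frozenset({"michael-phelps"}), frozenset({"athens"}), frozenset({2004})),
    AnnotatedDoc(frozenset({"michael-phelps", "usain-bolt"}), frozenset({"beijing"}), frozenset({2008})),
    AnnotatedDoc(frozenset({"michael-phelps"}), frozenset({"london"}), frozenset({2012, 2016})),
]
aspect = summarize(docs)
print(sorted(aspect.entities), sorted(aspect.locations), aspect.interval)
```

With these made-up annotations the sketch yields the entity michael-phelps, the locations athens, beijing, and london, and the interval [2004, 2016]. Such a triple summarizes a group of documents without requiring the user to read them.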



Copyright information

© Springer Nature Switzerland AG 2019

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Dhruv Gupta (1, 2)
  • Klaus Berberich (1, 3)
  • Jannik Strötgen (4)
  • Demetrios Zeinalipour-Yazti (5)

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Graduate School of Computer Science, Saarbrücken, Germany
  3. htw saar, Saarbrücken, Germany
  4. Bosch Center for Artificial Intelligence, Renningen, Germany
  5. University of Cyprus, Nicosia, Cyprus
