Supporting Scholarly Search with Keyqueries

  • Matthias Hagen
  • Anna Beyer
  • Tim Gollub
  • Kristof Komlossy
  • Benno Stein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)


We deal with a problem faced by scholars every day: identifying relevant papers on a given topic. In particular, we focus on the scenario where a scholar can come up with a few papers (e.g., suggested by a colleague) and then wants to find “all” the other related publications. Our approach is based on the concept of keyqueries: we formulate keyqueries from the input papers and suggest their top results as candidates for related work.
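The abstract does not define a keyquery, so the following is only a minimal sketch, assuming the definition used in the keyquery literature: a keyquery for a document is a minimal query that returns that document among the top-k results of a reference search engine. The toy engine, the corpus, and the names `search`, `keyqueries`, and `related_candidates` are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations


def search(query, corpus):
    """Toy engine (assumption): rank documents containing all query
    terms by summed term frequency, ties broken by document id."""
    scored = []
    for doc_id, text in corpus.items():
        tokens = text.lower().split()
        if all(term in tokens for term in query):
            score = sum(tokens.count(term) for term in query)
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def keyqueries(doc_id, vocabulary, corpus, k=2, max_len=2):
    """Enumerate minimal queries that rank doc_id within the top k."""
    found = []
    for length in range(1, max_len + 1):
        for query in combinations(sorted(vocabulary), length):
            # Minimality: skip supersets of an already found keyquery.
            if any(set(q) <= set(query) for q in found):
                continue
            if doc_id in search(query, corpus)[:k]:
                found.append(query)
    return found


def related_candidates(input_ids, vocabulary, corpus, k=2):
    """Collect the top-k results of each keyquery as candidates."""
    candidates = []
    for doc_id in input_ids:
        for query in keyqueries(doc_id, vocabulary, corpus, k):
            for hit in search(query, corpus)[:k]:
                if hit not in input_ids and hit not in candidates:
                    candidates.append(hit)
    return candidates


# Hypothetical four-paper corpus; "p1" plays the role of the input paper.
corpus = {
    "p1": "keyquery retrieval for scholarly search",
    "p2": "citation graph recommendation",
    "p3": "scholarly search with query formulation",
    "p4": "keyquery extraction from web documents",
}
vocabulary = {"keyquery", "scholarly", "search"}
result = related_candidates(["p1"], vocabulary, corpus, k=2)
print(result)
```

In this sketch the candidate vocabulary is given by hand. A real system would have to extract candidate terms from the input papers and query an actual retrieval engine.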

We compare our approach to three baselines that also represent the different ways in which humans search for related work: (1) a citation-graph-based approach focusing on cited and citing papers, (2) a method formulating queries from the paper abstracts, and (3) the “related articles” functionality of Google Scholar. The effectiveness is measured in a Cranfield-style user study on a corpus of 200,000 papers. The results indicate that our novel keyquery-based approach is on a par with the strong citation and Google Scholar baselines but returns substantially different results; a combination of the different approaches yields the best results.


Keywords (added by machine, not by the authors): Retrieval Model, Collaborative Filter, Keyword Query, Relevance Judgment, Query Formulation



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Matthias Hagen (1), Email author
  • Anna Beyer (1)
  • Tim Gollub (1)
  • Kristof Komlossy (1)
  • Benno Stein (1)

  1. Bauhaus-Universität Weimar, Weimar, Germany
