Tag-Based Paper Retrieval: Minimizing User Effort with Diversity Awareness

  • Quoc Viet Hung Nguyen
  • Son Thanh Do
  • Thanh Tam Nguyen
  • Karl Aberer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9049)


As the number of published scientific papers continues to grow, most modern paper management systems (e.g. ScienceWise, Mendeley, CiteULike) support tag-based retrieval. In such systems, each paper is associated with a set of tags, allowing users to search for relevant papers by formulating tag-based queries. One of the most critical issues in tag-based retrieval is that users often have difficulty formulating their information needs precisely. To address this issue, our paper tackles the problem of automatically suggesting new tags to a user who is formulating a query. The suggested tags are selected so as to resolve query ambiguity in two respects: informativeness and diversity. The former reduces the user effort needed to find the desired papers, while the latter broadens the variety of information shown to the user. By studying the theoretical properties of this problem, we propose a heuristic-based algorithm with several salient performance guarantees. We also demonstrate the efficiency of our approach through extensive experiments on real-world datasets.
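To make the setting concrete, the following is a minimal illustrative sketch of tag suggestion that balances informativeness and diversity. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function names (matching_papers, suggest_tags), the trade_off parameter, and both scoring functions are assumptions chosen for illustration: informativeness is approximated by how evenly a tag splits the current result set, and diversity by the Jaccard distance between the papers covered by a candidate tag and those covered by tags already chosen.

```python
from typing import Dict, List, Set


def matching_papers(papers: Dict[str, Set[str]], query: Set[str]) -> Set[str]:
    """Return the ids of papers whose tag set contains every query tag."""
    return {pid for pid, tags in papers.items() if query <= tags}


def suggest_tags(
    papers: Dict[str, Set[str]],
    query: Set[str],
    k: int,
    trade_off: float = 0.5,  # hypothetical weight: 1.0 = informativeness only
) -> List[str]:
    """Greedily pick up to k tags to suggest for refining `query`.

    Illustrative heuristic only; the scoring functions are assumptions,
    not the measures defined in the paper.
    """
    results = matching_papers(papers, query)

    # For each candidate tag, the matching papers that carry it.
    cover: Dict[str, Set[str]] = {}
    for pid in results:
        for tag in papers[pid] - query:
            cover.setdefault(tag, set()).add(pid)

    def informativeness(tag: str) -> float:
        # 1.0 when the tag keeps half of the results, 0.0 when it keeps all.
        frac = len(cover[tag]) / len(results)
        return 1.0 - abs(2.0 * frac - 1.0)

    def diversity(tag: str, chosen: List[str]) -> float:
        # Smallest Jaccard distance to the paper sets of already chosen tags.
        if not chosen:
            return 1.0
        return min(
            1.0 - len(cover[tag] & cover[c]) / len(cover[tag] | cover[c])
            for c in chosen
        )

    chosen: List[str] = []
    candidates = set(cover)
    while candidates and len(chosen) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates),
            key=lambda t: trade_off * informativeness(t)
            + (1.0 - trade_off) * diversity(t, chosen),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Toy data, invented for this example.
example_papers = {
    "p1": {"ml", "ranking", "diversity"},
    "p2": {"ml", "ranking"},
    "p3": {"ml", "tagging"},
    "p4": {"ml", "tagging", "diversity"},
    "p5": {"databases"},
}
print(suggest_tags(example_papers, {"ml"}, k=2))
```

A greedy selection is used here only because it is simple to follow; whether it matches the heuristic or the performance guarantees mentioned above cannot be determined from the abstract alone.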


Keywords (added by machine, not by the authors): Ranking Score, User Query, Query Suggestion, Query Size, Domain Coverage





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Quoc Viet Hung Nguyen (1, email author)
  • Son Thanh Do (1)
  • Thanh Tam Nguyen (1)
  • Karl Aberer (1)
  1. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
