Scientometrics, Volume 102, Issue 3, pp 2301–2322

Cluster-based polyrepresentation as science modelling approach for information retrieval

Abstract

The increasing number of publications makes searching and accessing the produced literature a challenging task. A recent development in bibliographic databases is to combine advanced information retrieval techniques with bibliographic means such as citations. In this work we present an approach that combines a cognitive information retrieval framework based on the principle of polyrepresentation with document clustering, enabling the user to explore a collection more interactively than by merely examining a ranked result list. Our approach uses representations of the information need as well as different document representations, including citations. To evaluate our ideas we employ a simulated user strategy based on a cluster ranking approach. We report on the potential effectiveness of our approach and on several strategies by which users can achieve higher search effectiveness through cluster browsing. Our results confirm that the proposed polyrepresentative cluster browsing strategy can, in principle, significantly improve search effectiveness. However, further evaluations, including a more refined user simulation, are needed.
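To make the general idea concrete, the sketch below shows one plausible way to rank document clusters by fusing retrieval scores from several document representations (e.g. title, abstract, citation context), in the spirit of polyrepresentation. It is a minimal illustration only: the CombSUM-style summation across representations, the averaging over cluster members, and all identifiers are assumptions for exposition, not the paper's actual model.

```python
# Hypothetical sketch: polyrepresentative cluster ranking.
# Per-representation retrieval scores are fused per document by summation
# (a CombSUM-style combination), and each cluster is scored by the mean
# fused score of its member documents. Illustrative only.

from collections import defaultdict


def rank_clusters(clusters, rep_scores):
    """Rank clusters by mean fused per-representation score.

    clusters:   dict mapping cluster_id -> list of doc_ids
    rep_scores: dict mapping representation name -> {doc_id: score}
    Returns a list of (cluster_id, score) pairs, best cluster first.
    """
    # Fuse evidence across representations per document (sum of scores).
    fused = defaultdict(float)
    for scores in rep_scores.values():
        for doc_id, score in scores.items():
            fused[doc_id] += score

    # Score a cluster as the mean fused score over its members.
    cluster_scores = {
        cid: sum(fused[d] for d in docs) / max(len(docs), 1)
        for cid, docs in clusters.items()
    }
    return sorted(cluster_scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: two clusters, three document representations.
    clusters = {"c1": ["d1", "d2"], "c2": ["d3"]}
    rep_scores = {
        "title":    {"d1": 0.9, "d2": 0.2, "d3": 0.5},
        "abstract": {"d1": 0.7, "d2": 0.4, "d3": 0.6},
        "citation": {"d1": 0.8, "d2": 0.1, "d3": 0.3},
    }
    for cid, score in rank_clusters(clusters, rep_scores):
        print(cid, round(score, 2))
```

A simulated user could then be modelled as traversing the ranked clusters in order and examining their member documents, which is the kind of cluster browsing strategy the abstract evaluates.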

Keywords

Information retrieval · Polyrepresentation · Document clustering · Bibliometrics · Simulated user

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2014

Authors and Affiliations

Institute for Research in Applicable Computing, University of Bedfordshire, Luton, UK
