Topic Difficulty Prediction in Entity Ranking

  • Anne-Marie Vercoustre
  • Jovan Pehcevski
  • Vladimir Naumovski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5631)

Abstract

Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction, where the goal is to tag the names of entities in documents, entity ranking focuses on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most have been evaluated on the INEX Wikipedia test collection. In this paper, we show that knowing the predicted difficulty class of a topic can be used to further improve entity ranking performance. To predict topic difficulty, we train a classifier that uses features extracted from an INEX topic definition to assign the topic to one of several experimentally pre-determined classes. The predicted class is then used to dynamically set the optimal values of the retrieval parameters of our entity ranking system. Our experiments suggest that topic difficulty prediction is a promising approach that could be exploited to improve the effectiveness of entity ranking.
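The abstract describes a two-stage pipeline: extract features from a topic definition, classify the topic into a difficulty class, then look up the retrieval parameters tuned for that class. The sketch below illustrates that flow under loudly stated assumptions. The topic fields, the three features, the two classes ("easy", "difficult"), the parameter names `alpha` and `beta`, and all numeric values are hypothetical placeholders, not taken from the paper. A nearest-centroid classifier is swapped in for illustration only; the abstract does not say which classifier the authors use.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Topic:
    """Hypothetical, simplified view of an INEX entity ranking topic."""
    title: str
    categories: list[str]  # target categories given in the topic (assumed field)
    examples: list[str]    # example entities given in the topic (assumed field)


def extract_features(topic: Topic) -> tuple[float, float, float]:
    # Illustrative features only: title length in words,
    # number of target categories, number of example entities.
    return (float(len(topic.title.split())),
            float(len(topic.categories)),
            float(len(topic.examples)))


def train_centroids(samples):
    """samples: list of (feature_vector, class_label); returns label -> mean vector."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def predict(centroids, features):
    # Nearest-centroid rule: choose the class whose mean feature vector
    # is closest (Euclidean distance) to this topic's features.
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical per-class settings for two made-up score weights; real values
# would come from experiments on training topics for each difficulty class.
PARAMS_BY_CLASS = {
    "easy": {"alpha": 0.1, "beta": 0.8},
    "difficult": {"alpha": 0.3, "beta": 0.5},
}


def retrieval_params(centroids, topic: Topic) -> dict[str, float]:
    """Predict the topic's difficulty class and return that class's parameters."""
    return PARAMS_BY_CLASS[predict(centroids, extract_features(topic))]


# Toy training data: (features, difficulty class). Invented for illustration.
training = [
    ((3.0, 1.0, 3.0), "easy"),
    ((4.0, 1.0, 4.0), "easy"),
    ((7.0, 2.0, 1.0), "difficult"),
    ((8.0, 3.0, 1.0), "difficult"),
]
centroids = train_centroids(training)

topic = Topic(title="Nobel laureates in physics",
              categories=["physicists"],
              examples=["Albert Einstein", "Niels Bohr", "Marie Curie"])
print(retrieval_params(centroids, topic))
```

With these toy numbers the topic's features (4, 1, 3) lie closest to the "easy" centroid, so the "easy" parameter set is returned. The nearest-centroid rule only keeps the example dependency-free; any classifier that maps a feature vector to a difficulty class could take its place without changing the parameter-lookup step.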

Keywords

Average Average Precision, Target Category, Test Collection, Information Retrieval Evaluation, Entity Ranking

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Anne-Marie Vercoustre (1)
  • Jovan Pehcevski (2)
  • Vladimir Naumovski (2)

  1. INRIA, Rocquencourt, France
  2. Faculty of Management and Information Technologies, Skopje, Macedonia