Marrying Relevance and Genre Rankings: An Exploratory Study

  • Pavel Braslavski
Chapter
Part of the Text, Speech and Language Technology book series (TLTB, volume 42)

Abstract

In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. We propose a method for automatically extracting a formality score, akin to a readability score, by applying canonical discriminant analysis to a sample of genres of decreasing formality. We then consider the effects of aggregating genre-related and text-relevance rankings. Evaluation of the results shows moderate positive effects. The findings suggest that further research is needed on the implicit use of genre-related information in Web search.
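As an illustration of what merging two rankings can look like, the following is a minimal sketch that assumes a weighted linear combination of min-max normalized scores. The abstract does not state which aggregation scheme the chapter uses, so the function names (`min_max`, `merge_rankings`), the `weight` parameter, and the example scores are all hypothetical and should not be read as the author's method.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_rankings(relevance: Dict[str, float],
                   formality: Dict[str, float],
                   weight: float = 0.8) -> List[str]:
    """Rank documents by a weighted sum of two normalized scores.

    ASSUMPTION: linear interpolation is used here only as an example of
    rank aggregation; `weight` is the share given to text relevance.
    Documents without a formality score get 0.0 for that component.
    """
    rel = min_max(relevance)
    form = min_max(formality)
    combined = {
        doc: weight * rel[doc] + (1.0 - weight) * form.get(doc, 0.0)
        for doc in rel
    }
    # Sort by descending combined score; break ties by document id.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


if __name__ == "__main__":
    # Hypothetical scores for three documents.
    relevance = {"d1": 2.0, "d2": 1.8, "d3": 0.5}
    formality = {"d1": 0.1, "d2": 0.9, "d3": 0.5}
    print(merge_rankings(relevance, formality, weight=1.0))
    print(merge_rankings(relevance, formality, weight=0.5))
```

With `weight=1.0` the result is the pure text-relevance order. Lowering the weight lets a document with a high formality score overtake a slightly more relevant one, which is the kind of effect an aggregation experiment would measure.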

Keywords

Genre; Readability; Relevance ranking; Information retrieval; Web search; ROMIP

Notes

Acknowledgments

We would like to thank Mikhail Ageev and Andrei Tselishchev for their help with data processing. We also thank Yandex (www.yandex.ru) for providing us with the experimental data. Many thanks to Matthew McCool and the volume editors for their valuable comments on the draft.

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. Institute of Engineering Science RAS, Ekaterinburg, Russia