Semistructured Data Search Evaluation

  • Ralf Schenkel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8173)


Semistructured data is of increasing importance in many application domains, but one of its core use cases is representing documents. Consequently, effectively retrieving information from semistructured documents is an important problem that has seen work from both the information retrieval (IR) and database (DB) communities. Comparing the large number of retrieval models and systems is a non-trivial task for which established benchmark initiatives such as TREC, with their focus on unstructured documents, are not appropriate. This chapter gives an overview of semistructured data in general and of the INEX initiative for the evaluation of XML retrieval, focusing on its most prominent track, the Adhoc Search Track.





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Ralf Schenkel, Universität Passau, Germany
