TopX 2.0 at the INEX 2008 Efficiency Track

A (Very) Fast Object-Store for Top-k-Style XML Full-Text Search
  • Martin Theobald
  • Mohammed AbuJarour
  • Ralf Schenkel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5631)

Abstract

For the INEX 2008 Efficiency Track, we finished and evaluated our brand-new TopX 2.0 prototype just in time. Complementing our long-running effort on efficient top-k query processing on top of a relational back-end, we have now switched to compressed object-oriented storage for text-centric XML data with direct access to customized inverted files, along with a complete reimplementation of the engine in C++. Our INEX 2008 experiments demonstrate efficiency gains of up to a factor of 30 compared to the previous Java/JDBC-based TopX 1.0 implementation over a relational back-end. Using a top-15, focused (i.e., non-overlapping) retrieval mode, TopX 2.0 achieves overall runtimes of less than 51 seconds for the entire batch of 568 Efficiency Track topics in their content-and-structure (CAS) version and less than 29 seconds for the content-only (CO) version. This corresponds to an average of merely 89 ms per CAS query and 49 ms per CO query.
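
As a quick consistency check, the per-query averages can be related to the batch totals using only the figures quoted in the abstract. The sketch below assumes that each average is the total batch runtime divided evenly over all 568 topics; the abstract does not spell out how the averages were computed.

```latex
% Upper bounds on the per-query average implied by the batch totals
\[
\frac{51\,\mathrm{s}}{568} \approx 89.8\,\mathrm{ms} \quad \text{(CAS)},
\qquad
\frac{29\,\mathrm{s}}{568} \approx 51.1\,\mathrm{ms} \quad \text{(CO)}
\]
% Batch totals implied by the reported per-query averages
\[
568 \times 89\,\mathrm{ms} \approx 50.55\,\mathrm{s} < 51\,\mathrm{s},
\qquad
568 \times 49\,\mathrm{ms} \approx 27.83\,\mathrm{s} < 29\,\mathrm{s}
\]
```

Under that assumption, the reported averages of 89 ms (CAS) and 49 ms (CO) are consistent with the stated upper bounds on the total runtimes.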

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Martin Theobald (1)
  • Mohammed AbuJarour (3)
  • Ralf Schenkel (1, 2)

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Saarland University, Saarbrücken, Germany
  3. Hasso Plattner Institute, Potsdam, Germany
