Inverted Lists Compression Using Contextual Information

  • Conference paper
Advances in Information Processing and Protection

Abstract

In this paper we present a new approach to the compression of inverted lists in the indexes of information retrieval systems. The technique exploits contextual information obtained from an unsupervised clustering process run on the document collection. A substantial improvement in the compression factor is achieved.
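
The abstract does not describe the coding scheme, so the following is only an illustrative sketch of why cluster information can help inverted-list compression in general, not the authors' method. It assumes a standard d-gap representation with variable-byte coding, and it assumes that documents from the same cluster receive neighbouring identifiers. The function names and the document IDs are hypothetical.

```python
def vbyte_encode(n: int) -> bytes:
    """Variable-byte code: 7 payload bits per byte, low-order group first.
    The high bit is set only on the final byte of each number."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)   # continuation byte, high bit clear
        n >>= 7
    out.append(n | 0x80)       # terminating byte, high bit set
    return bytes(out)


def encode_postings(doc_ids) -> bytes:
    """Encode an inverted list as variable-byte coded gaps between sorted IDs."""
    out = bytearray()
    prev = 0
    for doc_id in sorted(doc_ids):
        out += vbyte_encode(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> list:
    """Inverse of encode_postings: rebuild the sorted document IDs."""
    ids, prev, value, shift = [], 0, 0, 0
    for byte in data:
        if byte & 0x80:                      # last byte of this gap
            value |= (byte & 0x7F) << shift
            prev += value
            ids.append(prev)
            value, shift = 0, 0
        else:                                # more bytes follow
            value |= byte << shift
            shift += 7
    return ids


# Hypothetical inverted list for one term, under two ID assignments.
scattered = [5, 1000, 40000, 900000]   # IDs assigned without regard to content
clustered = [700, 701, 703, 704]       # same 4 documents, IDs grouped by cluster

print(len(encode_postings(scattered)))  # 9 bytes
print(len(encode_postings(clustered)))  # 5 bytes
```

When documents that share vocabulary sit close together in the ID space, the gaps within a term's list are small and each takes fewer bytes. This is the general effect that clustering-aware schemes aim for; the actual gain on a real collection depends on the data and on the coder used.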

Copyright information

© 2007 Springer Science+Business Media, LLC

About this paper

Cite this paper

Czerski, D., Ciesielski, K., Dramiński, M., Kłopotek, M.A., Wierzchoń, S.T. (2007). Inverted Lists Compression Using Contextual Information. In: Pejaś, J., Saeed, K. (eds) Advances in Information Processing and Protection. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-73137-7_6

  • DOI: https://doi.org/10.1007/978-0-387-73137-7_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-73136-0

  • Online ISBN: 978-0-387-73137-7

  • eBook Packages: Computer Science, Computer Science (R0)
