Inverted Lists Compression Using Contextual Information

Czerski, Dariusz; Ciesielski, Krzysztof; Dramiński, Michał; Kłopotek, Mieczysław A.; Wierzchoń, Sławomir T.

doi:10.1007/978-0-387-73137-7_6

Abstract

In this paper we present a new approach to the compression of inverted lists in the indexes of information retrieval systems. The technique exploits contextual information obtained from an unsupervised clustering process run on the document collection. A substantial improvement in the compression factor is achieved.
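
The abstract does not spell out the mechanism, so the following is only a generic illustration of why cluster information can help inverted list compression, not the authors' method. It assumes a common baseline (gap encoding of sorted document identifiers followed by variable-byte coding) and a hypothetical cluster assignment `cluster_of`; all function names are invented for this sketch. Renumbering documents so that members of one cluster receive adjacent identifiers tends to shrink the gaps in the lists of terms concentrated in that cluster.

```python
from typing import Dict, List


def vbyte_encode(numbers: List[int]) -> bytes:
    """Variable-byte encode positive integers, 7 payload bits per byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits; more bytes follow
            n >>= 7
        out.append(n | 0x80)      # high bit set marks the final byte
    return bytes(out)


def gaps(postings: List[int]) -> List[int]:
    """Turn sorted document IDs into the first ID plus successive differences."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def compressed_size(postings: List[int]) -> int:
    """Size in bytes of a postings list after gap and variable-byte coding."""
    return len(vbyte_encode(gaps(sorted(postings))))


def renumber_by_cluster(cluster_of: Dict[int, int]) -> Dict[int, int]:
    """Map old IDs to new consecutive IDs, keeping each cluster contiguous."""
    ordered = sorted(cluster_of, key=lambda d: (cluster_of[d], d))
    return {old: new for new, old in enumerate(ordered, start=1)}


# Hypothetical collection: 40000 documents spread over 200 clusters.
cluster_of = {doc: doc % 200 for doc in range(1, 40001)}

# Hypothetical term that occurs in every document of cluster 7.
postings = [doc for doc, c in cluster_of.items() if c == 7]

new_id = renumber_by_cluster(cluster_of)
before = compressed_size(postings)
after = compressed_size([new_id[doc] for doc in postings])
print(before, after)
```

In this toy setup the original identifiers of the term's documents are 200 apart, so almost every gap needs two bytes; after renumbering the gaps collapse to 1 and fit in one byte each. Real collections are far less regular, and the gain depends on how well the clustering matches term distributions.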

Author information

Authors and Affiliations

Institute of Computer Sci., Polish Acad. of Sciences, Ordona 21, 01-237 Warsaw, Poland
Dariusz Czerski, Krzysztof Ciesielski, Michał Dramiński, Mieczysław A. Kłopotek & Sławomir T. Wierzchoń

Editor information

Editors and Affiliations

Szczecin University of Technology, Faculty of Computer Science, Zolnierska 49, 71-210 Szczecin, Poland
Jerzy Pejaś
Bialystok Technical University, Faculty of Computer Science, Wiejska 45A, 15-351 Bialystok, Poland
Khalid Saeed

About this paper

Cite this paper

Czerski, D., Ciesielski, K., Dramiński, M., Kłopotek, M.A., Wierzchoń, S.T. (2007). Inverted Lists Compression Using Contextual Information. In: Pejaś, J., Saeed, K. (eds) Advances in Information Processing and Protection. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-73137-7_6

DOI: https://doi.org/10.1007/978-0-387-73137-7_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-73136-0
Online ISBN: 978-0-387-73137-7
eBook Packages: Computer Science, Computer Science (R0)