Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions

Spyropoulos, Vasilis; Kotidis, Yannis

doi:10.1007/978-3-319-22849-5_30

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9261)

Abstract

Many modern applications naturally lead to the implementation of inverted indexes for effectively managing large collections of data items. Creating an inverted index on a low-cardinality data domain replicates data descriptors across lists, which increases storage overhead. For example, the use of RFID or similar sensing devices in supply chains produces massive tracking datasets that need effective spatial or spatio-temporal indexes. As the volume of data grows large relative to the number of spatial locations or time epochs, many of the resulting lists inevitably share large subsets of common items. In this paper we present techniques that exploit this characteristic of modern big-data applications to losslessly compress the resulting inverted indexes by discovering large common item sets and adapting the index so as to store just one copy of them. We apply our method in the supply chain domain using modern big-data tools and show that in many cases our techniques achieve compression ratios that exceed 50%.
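
The idea of storing a shared item set only once can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a tiny in-memory index, factors out the plain intersection of two hand-picked lists, and uses hypothetical names (`build_inverted_index`, `factor_common_items`, `lookup`, `stored_entries`) and made-up location labels purely for illustration.

```python
from collections import defaultdict

def build_inverted_index(records):
    """Map each dimension value (e.g. a location) to the set of item ids seen there."""
    index = defaultdict(set)
    for item_id, value in records:
        index[value].add(item_id)
    return dict(index)

def factor_common_items(index, key_a, key_b):
    """Keep one copy of the items shared by two lists, plus a residual list per key."""
    shared_items = index[key_a] & index[key_b]
    compact = {key: set(items) for key, items in index.items()}
    compact[key_a] = index[key_a] - shared_items
    compact[key_b] = index[key_b] - shared_items
    return compact, {"keys": (key_a, key_b), "items": shared_items}

def lookup(compact, shared_block, key):
    """Rebuild the original list for a key, so the compression is lossless."""
    items = set(compact[key])
    if key in shared_block["keys"]:
        items |= shared_block["items"]
    return items

def stored_entries(compact, shared_block):
    """Count item ids physically stored: residual lists plus the single shared copy."""
    return sum(len(items) for items in compact.values()) + len(shared_block["items"])

# Hypothetical tracking data: items 1-5 pass through both locations.
records = [(i, "dock") for i in range(1, 7)]
records += [(i, "warehouse") for i in (1, 2, 3, 4, 5, 7)]

index = build_inverted_index(records)
compact, shared = factor_common_items(index, "dock", "warehouse")

before = sum(len(items) for items in index.values())
after = stored_entries(compact, shared)
print(before, after)  # 12 7
```

In this toy case the shared set replaces ten stored ids with five, and `lookup` still returns each original list in full. Choosing which lists to group and which common item sets to extract at scale is the problem the paper addresses; this sketch sidesteps it by fixing the pair by hand.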

This research has been co-financed by the European Union (European Social Fund ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: RECOST.

Author information

Authors and Affiliations

Athens University of Economics and Business, 76 Patission Street, Athens, Greece
Vasilis Spyropoulos & Yannis Kotidis

Authors

Vasilis Spyropoulos
Yannis Kotidis

Corresponding author

Correspondence to Vasilis Spyropoulos.

Editor information

Editors and Affiliations

Hewlett-Packard Enterprise, Sunnyvale, California, USA
Qiming Chen
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Blaise Pascal University, Aubiere, France
Farouk Toumani
University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker

About this paper

Cite this paper

Spyropoulos, V., Kotidis, Y. (2015). Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_30

DOI: https://doi.org/10.1007/978-3-319-22849-5_30
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)