Adaptive Tuple Differential Coding

  • Jean-Paul Deveaux
  • Andrew Rau-Chaplin
  • Norbert Zeh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)

Abstract

It is desirable to employ compression techniques in Relational OLAP systems to reduce disk space requirements and increase disk I/O throughput. Tuple Differential Coding (TDC) techniques have been introduced to compress views on a tuple level by storing only the differences between consecutive ordered tuples. These techniques work well for highly regular data in which the differences between tuples are fairly constant but are less effective on real data containing either skew or outliers. In this paper we introduce Adaptive Tuple Differential Coding (ATDC), which employs optimization techniques to analyze blocks of tuples to detect large tuple differences, with the purpose of isolating them to minimize their negative effect on the compression of neighbouring tuples. Our experiments show that this new algorithm provides an increase in compression ratio of 15–30% over TDC on typical real datasets.
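
To make the idea of storing only the differences between consecutive ordered tuples concrete, here is a minimal Python sketch of basic tuple differential coding. It is not the authors' implementation and it does not cover the adaptive (ATDC) step. It assumes each attribute value is already an integer in a known range, and it maps a tuple to a single integer with a mixed-radix encoding; the function names and the `cardinalities` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def tuple_to_ordinal(t: Sequence[int], cardinalities: Sequence[int]) -> int:
    """Map a tuple to one integer using mixed-radix (assumed) encoding."""
    n = 0
    for value, card in zip(t, cardinalities):
        if not 0 <= value < card:
            raise ValueError(f"value {value} outside domain of size {card}")
        n = n * card + value
    return n


def ordinal_to_tuple(n: int, cardinalities: Sequence[int]) -> Tuple[int, ...]:
    """Invert tuple_to_ordinal."""
    values = []
    for card in reversed(cardinalities):
        n, v = divmod(n, card)
        values.append(v)
    return tuple(reversed(values))


def tdc_encode(tuples: Sequence[Sequence[int]],
               cardinalities: Sequence[int]) -> List[int]:
    """Sort the tuples, then keep the first ordinal and successive differences."""
    ordinals = sorted(tuple_to_ordinal(t, cardinalities) for t in tuples)
    if not ordinals:
        return []
    return [ordinals[0]] + [b - a for a, b in zip(ordinals, ordinals[1:])]


def tdc_decode(encoded: Sequence[int],
               cardinalities: Sequence[int]) -> List[Tuple[int, ...]]:
    """Rebuild the sorted tuples by accumulating the stored differences."""
    out = []
    running = 0
    for d in encoded:
        running += d
        out.append(ordinal_to_tuple(running, cardinalities))
    return out


if __name__ == "__main__":
    cards = (4, 10, 100)  # hypothetical attribute domain sizes
    rows = [(0, 2, 0), (0, 1, 5), (3, 9, 99), (0, 1, 7)]
    encoded = tdc_encode(rows, cards)
    print(encoded)
    print(tdc_decode(encoded, cards))
```

In this toy input the last tuple lies far from the others, so its difference is much larger than its neighbours' differences. If every difference in a block were stored with the same fixed width (an assumption about the storage layout, not something stated in the abstract), that single outlier would set the width for the whole block, which is the kind of effect the abstract says ATDC tries to isolate.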

Keywords

Compression Ratio; Compression Algorithm; High Compression Ratio; Compression Time; Disk Block
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Jean-Paul Deveaux (1)
  • Andrew Rau-Chaplin (1)
  • Norbert Zeh (1)
  1. Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada