Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Data Reduction

  • Rui Zhang
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_533

Definition

Data reduction is the reduction of some aspect of data, typically its volume. The reduction can also target other aspects, such as dimensionality when the data is multidimensional. Reducing any aspect of the data usually also reduces its volume.

Data reduction is meaningful only in relation to a purpose, and that purpose dictates the requirements for the data reduction technique. The most basic purpose is to save storage space, which requires a technique that compresses the data into a more compact format and can restore the original data when it needs to be examined. Today, storage space is often not the primary concern, and the need for data reduction frequently comes from database applications. In that case, the purpose is to save computational cost or disk access cost in query processing.
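The two purposes above can be contrasted in a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from this entry: the synthetic values, the use of `zlib` for lossless compression, and the choice of uniform random sampling as the query-oriented reduction are all picked here for the example.

```python
import random
import zlib

# Synthetic data (an assumption for illustration): 10,000 small integers.
values = [i % 100 for i in range(10_000)]

# Purpose 1: save storage space. The reduction must be lossless,
# so decompressing has to give back exactly the original bytes.
raw = ",".join(str(v) for v in values).encode("utf-8")
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

# Purpose 2: save query processing cost. A uniform random sample
# (one possible reduction, chosen here as an assumption) answers an
# aggregate query approximately while reading far fewer values.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(values, 500)
exact_mean = sum(values) / len(values)
approx_mean = sum(sample) / len(sample)

print("bytes before/after compression:", len(raw), len(packed))
print("round trip is lossless:", restored == raw)
print("exact mean vs. sample estimate:", exact_mean, approx_mean)
```

The requirements differ between the two cases. The storage-oriented reduction has to be exactly reversible, whereas the query-oriented reduction gives up exactness in return for touching only 5% of the values.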

Historical Background

The need for data reduction...

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Melbourne, Melbourne, Australia
  2. Dataware Ventures, Tucson, USA
  3. Dataware Ventures, Redondo Beach, USA

Section editors and affiliations

  • Xiaofang Zhou (1)
  1. School of Inf. Tech. & Elec. Eng., Univ. of Queensland, Brisbane, Australia