The VLDB Journal, Volume 27, Issue 5, pp 719–744

Compressed linear algebra for large-scale machine learning

  • Ahmed Elgohary
  • Matthias Boehm
  • Peter J. Haas
  • Frederick R. Reiss
  • Berthold Reinwald
Special Issue Paper


Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. For good performance, it is crucial to fit the data into single-node or distributed main memory and to enable fast matrix-vector operations on in-memory data. General-purpose heavyweight and lightweight compression techniques struggle to achieve both good compression ratios and decompression fast enough to enable block-wise uncompressed operations. Therefore, inspired by database compression and sparse matrix formats, we initiate work on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and linear algebra operations such as matrix-vector multiplication are then executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operation performance close to the uncompressed case together with good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements of up to 9.2x.
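To illustrate what "executing matrix-vector multiplication directly on the compressed representation" can mean, here is a minimal sketch under simplifying assumptions. It is not the paper's actual encoding: each hypothetical column group simply maps every distinct non-zero value tuple to a plain list of row offsets, and the grouping of columns is fixed by hand rather than chosen by a sampling-based planner. The point it shows is that the dot product with the vector is computed once per distinct tuple and then scattered to all rows containing that tuple, so the cost scales with the number of distinct tuples and offsets rather than with the number of matrix cells.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical simplified column group: (column indices, {distinct value tuple: row offsets}).
ColumnGroup = Tuple[Tuple[int, ...], Dict[Tuple[float, ...], List[int]]]


def compress(X: Sequence[Sequence[float]], groups: Sequence[Sequence[int]]) -> List[ColumnGroup]:
    """Encode a dense row-major matrix X into offset-list column groups.

    The column grouping is supplied by the caller (an assumption of this sketch).
    """
    compressed: List[ColumnGroup] = []
    for cols in groups:
        key = tuple(cols)
        offsets: Dict[Tuple[float, ...], List[int]] = defaultdict(list)
        for i, row in enumerate(X):
            t = tuple(row[j] for j in key)
            if any(x != 0 for x in t):  # all-zero tuples are not stored
                offsets[t].append(i)
        compressed.append((key, dict(offsets)))
    return compressed


def matvec(compressed: List[ColumnGroup], v: Sequence[float], n_rows: int) -> List[float]:
    """Compute q = X v without decompressing X."""
    q = [0.0] * n_rows
    for cols, offsets in compressed:
        for t, rows in offsets.items():
            # One partial dot product per distinct tuple ...
            s = sum(x * v[j] for x, j in zip(t, cols))
            # ... scattered to every row in which that tuple occurs.
            for i in rows:
                q[i] += s
    return q


if __name__ == "__main__":
    X = [[1, 0, 2], [1, 0, 2], [0, 3, 2], [1, 3, 0]]
    cg = compress(X, [[0, 1], [2]])
    print(matvec(cg, [1.0, 2.0, 0.5], len(X)))  # [2.0, 2.0, 7.0, 7.0]
```

In this toy matrix, the repeated tuple (1, 0) in columns 0 and 1 and the repeated value 2 in column 2 are each multiplied by the vector only once, which is the kind of redundancy that value-based column compression is meant to exploit.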


Keywords: Machine learning, Large-scale, Declarative, Linear algebra, Lossless compression



We thank Alexandre Evfimievski and Prithviraj Sen for thoughtful discussions on compressed linear algebra and code generation, Srinivasan Parthasarathy for pointing us to the related work on graph compression, as well as our reviewers for their valuable comments and suggestions.



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. IBM Research – Almaden, San Jose, USA
  2. University of Maryland, College Park, USA
