A linear algebra approach to OLAP

  • Original Article
Formal Aspects of Computing

Abstract

Inspired by the relational algebra of data processing, this paper addresses the foundations of data analytical processing from a linear algebra perspective. In particular, it investigates how aggregation operations essential to the quantitative analysis of data, such as cross tabulations and data cubes, can be expressed solely in terms of matrix multiplication, transposition and the Khatri–Rao variant of the Kronecker product. The approach offers a basis for deriving an algebraic theory of data consolidation that handles both the quantitative and the qualitative sides of data science in a natural, elegant and typed way. It also shows potential for parallel analytical processing, since the theory of parallelizing such matrix operations is well established.
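As an illustration of the kind of encoding the abstract describes, the following minimal sketch computes a cross tabulation and a flattened two-dimensional cube using only matrix multiplication, transposition and a column-wise Kronecker (Khatri–Rao) product. It is not taken from the paper: the toy table (`models`, `colors`, `sales`), the helper names (`projection`, `matmul`, `khatri_rao`) and the choice to represent the measure as a diagonal matrix are all assumptions made for this example.

```python
# Hypothetical toy table: one entry per record (assumed data, not from the paper).
models = ["Chevy", "Chevy", "Ford", "Ford", "Ford"]
colors = ["Red", "Blue", "Red", "Blue", "Red"]
sales = [5, 87, 64, 99, 8]


def projection(column, domain):
    """0/1 matrix with one row per domain value and one column per record."""
    return [[1 if value == d else 0 for value in column] for d in domain]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def khatri_rao(a, b):
    """Column-wise Kronecker product; a and b must have the same number of columns."""
    assert len(a[0]) == len(b[0])
    return [[x * y for x, y in zip(ra, rb)] for ra in a for rb in b]


def diagonal(v):
    """Square matrix carrying the measure vector v on its diagonal."""
    return [[x if i == j else 0 for j in range(len(v))] for i, x in enumerate(v)]


model_domain = ["Chevy", "Ford"]
color_domain = ["Blue", "Red"]

t_model = projection(models, model_domain)  # 2 x 5
t_color = projection(colors, color_domain)  # 2 x 5

# Cross tabulation: rows are models, columns are colors, cells are summed sales.
ctab = matmul(matmul(t_model, diagonal(sales)), transpose(t_color))

# The same totals as one vector: pair the two dimensions with Khatri-Rao,
# then multiply by the measure written as a column vector.
paired = khatri_rao(t_model, t_color)  # 4 x 5
cube = [row[0] for row in matmul(paired, [[s] for s in sales])]

print(ctab)  # [[87, 5], [99, 72]]
print(cube)  # [87, 5, 99, 72]
```

In this sketch the Khatri–Rao result is simply the row-major flattening of the cross tabulation, which is one way to see why adding a further dimension only requires pairing in one more projection matrix.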

Author information

Corresponding author

Correspondence to Hugo Daniel Macedo.

Additional information

Communicated by Eerke Boiten

About this article

Cite this article

Macedo, H.D., Oliveira, J.N. A linear algebra approach to OLAP. Form Asp Comp 27, 283–307 (2015). https://doi.org/10.1007/s00165-014-0316-9

  • DOI: https://doi.org/10.1007/s00165-014-0316-9
