Abstract
Inspired by the relational algebra of data processing, this paper addresses the foundations of data analytical processing from a linear algebra perspective. In particular, it investigates how aggregation operations such as cross tabulations and data cubes, which are essential to the quantitative analysis of data, can be expressed solely in terms of matrix multiplication, transposition and the Khatri–Rao variant of the Kronecker product. The approach offers a basis for deriving an algebraic theory of data consolidation that handles both the quantitative and the qualitative sides of data science in a natural, elegant and typed way. It also shows potential for parallel analytical processing, since the theory of parallelizing such matrix operations is well established.
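To make the claim about cross tabulations concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's own notation or implementation. The toy data set, the helper names `projection_matrix` and `khatri_rao`, and the choice of encoding each attribute as a 0/1 matrix with one row per attribute value and one column per record are all assumptions made for illustration.

```python
import numpy as np


def projection_matrix(column, domain):
    """Encode an attribute as a 0/1 matrix: one row per domain value, one column per record."""
    t = np.zeros((len(domain), len(column)), dtype=int)
    for j, value in enumerate(column):
        t[domain.index(value), j] = 1
    return t


def khatri_rao(a, b):
    """Column-wise Kronecker product: column j of the result is kron(a[:, j], b[:, j])."""
    assert a.shape[1] == b.shape[1], "both matrices need one column per record"
    return np.einsum("ij,kj->ikj", a, b).reshape(a.shape[0] * b.shape[0], a.shape[1])


# Hypothetical toy data: five sales records with two dimensions and one measure.
model = ["A", "B", "A", "B", "A"]
year = [2013, 2013, 2014, 2014, 2014]
sales = np.array([10, 20, 30, 40, 50])

models = ["A", "B"]
years = [2013, 2014]

t_model = projection_matrix(model, models)  # shape (2, 5)
t_year = projection_matrix(year, years)     # shape (2, 5)

# Cross tabulation of sales by model (rows) and year (columns),
# using only matrix multiplication and transposition.
crosstab = t_model @ np.diag(sales) @ t_year.T

# The same aggregates as a single vector indexed by (model, year) pairs,
# obtained by pairing the two dimensions with the Khatri-Rao product.
pairs = khatri_rao(t_model, t_year)         # shape (4, 5)
cube_vector = pairs @ sales

# Totals per year, obtained by multiplying with a row vector of ones.
year_totals = np.ones(len(models), dtype=int) @ crosstab

print(crosstab)
print(cube_vector)
print(year_totals)
```

In this sketch the Khatri-Rao product pairs the two dimensions record by record, so `cube_vector` holds the same numbers as `crosstab` flattened row by row. Dense arrays are used only for readability; realistic data sets would presumably call for sparse matrices.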
Communicated by Eerke Boiten
Cite this article
Macedo, H.D., Oliveira, J.N. A linear algebra approach to OLAP. Form Asp Comp 27, 283–307 (2015). https://doi.org/10.1007/s00165-014-0316-9
DOI: https://doi.org/10.1007/s00165-014-0316-9