
PASTA: a parallel sparse tensor algorithm benchmark suite

CCF Transactions on High Performance Computing

Abstract

Tensor methods have gained increasing attention from various applications, including machine learning, quantum chemistry, healthcare analytics, social network analysis, data mining, and signal processing, to name a few. Sparse tensors and their algorithms have become critical to further improving the performance of these methods and enhancing the interpretability of their output. This work presents a parallel sparse tensor algorithm benchmark suite (PASTA) for single- and multi-core CPUs. To the best of our knowledge, this is the first benchmark suite for the sparse tensor world. PASTA targets: (1) helping application users evaluate different computer systems using its representative computational workloads; and (2) providing insights for better utilizing existing computer architectures and systems, along with inspiration for future designs. The benchmark suite is publicly released at https://gitlab.com/tensorworld/pasta, under version 0.1.0.
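
For concreteness, sparse tensor workloads of this kind typically stream a tensor stored in coordinate (COO) format, the common baseline representation in this space. The following C sketch shows a minimal COO container and the per-nonzero loop shape such kernels share; the struct and field names are illustrative assumptions, not PASTA's actual data structures:

```c
#include <stddef.h>

/* A minimal coordinate (COO) sparse tensor. Field names are illustrative
 * only and are not taken from PASTA's source. */
typedef struct {
    size_t nmodes;   /* tensor order N */
    size_t *dims;    /* dims[m] = size of mode m */
    size_t nnz;      /* number of stored nonzeros */
    size_t **inds;   /* inds[m][z] = mode-m index of nonzero z */
    double *vals;    /* vals[z] = value of nonzero z */
} sptensor_coo_t;

/* Kernel shape shared by these workloads: stream the nonzeros once and do
 * a small amount of per-nonzero work; here, a squared Frobenius norm. */
double sptensor_sqnorm(const sptensor_coo_t *X)
{
    double s = 0.0;
    for (size_t z = 0; z < X->nnz; ++z)
        s += X->vals[z] * X->vals[z];
    return s;
}
```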

Notes

  1. Our convention for the dimensions of \(\mathbf{U}\) differs from Kolda and Bader’s definition (Kolda and Bader 2009). In particular, we transpose the factor matrices \(\mathbf{U}\), which leads to a more efficient Ttm under the row-major storage convention of the C language; the sketch below illustrates why.
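
To make the note concrete, here is a minimal C sketch of the per-nonzero Ttm update under the transposed convention. The function name, signature, and layout details are illustrative assumptions for exposition, not PASTA's actual API:

```c
#include <stddef.h>

/* One per-nonzero update of a mode-n sparse Ttm: Y(fiber, :) += x * U(i_n, :),
 * with U stored row-major as an (I_n x R) array, i.e., the transposed
 * convention described in the note above. */
static void ttm_update(double *y_fiber,      /* output row of Y, length R   */
                       double x,             /* one nonzero value of X      */
                       const double *U,      /* factor matrix, I_n x R      */
                       size_t i_n,           /* mode-n index of the nonzero */
                       size_t R)
{
    const double *u = U + i_n * R;  /* one contiguous row: unit-stride reads */
    for (size_t r = 0; r < R; ++r)
        y_fiber[r] += x * u[r];
    /* Under the (J x I_n) layout of Kolda and Bader (2009), the same update
     * would read U[r * I_n + i_n], a stride-I_n column walk that touches a
     * new cache line on nearly every iteration. */
}
```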

References

  • Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems. Google Brain Team, California (2015)

  • Acar, E., Aykut-Bingol, C., Bingol, H., Bro, R., Yener, B.: Multiway analysis of epilepsy tensors. Bioinformatics 23(13), i10–i18 (2007). https://doi.org/10.1093/bioinformatics/btm210

  • Acar, E., Dunlavy, D.M., Kolda, T.G., Mørup, M.: Scalable tensor factorizations for incomplete data. Chemometr. Intell. Lab. Syst. 106(1), 41–56 (2011)

  • Acar, E., Kolda, T.G., Dunlavy, D.M.: All-at-once optimization for coupled matrix and tensor factorizations (2011)

  • Anandkumar, A., Ge, R., Hsu, D., Kakade, S.M., Telgarsky, M.: Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15(1), 2773–2832 (2014)

  • Austin, W., Ballard, G., Kolda, T.G.: Parallel tensor compression for large-scale scientific data. In: 2016 IEEE international parallel and distributed processing symposium (IPDPS), pp. 912–922. https://doi.org/10.1109/IPDPS.2016.67

  • Bader, B.W., Kolda, T.G.: Efficient MATLAB computations with sparse and factored tensors. SIAM J. Sci. Comput. 30(1), 205–231 (2007). https://doi.org/10.1137/060676489

  • Bader, B.W., Kolda, T.G., et al.: MATLAB Tensor Toolbox (Version 3.0-dev) (2017). https://www.tensortoolbox.org

  • Battaglino, C., Ballard, G., Kolda, T.G.: A practical randomized CP tensor decomposition. SIAM J. Matrix Anal. Appl. 39(2), 876–901 (2018)

  • Benson, A.R., Gleich, D.F., Leskovec, J.: Tensor spectral clustering for partitioning higher-order network structures. arXiv:1502.05058 (2015)

  • Beutel, A., Kumar, A., Papalexakis, E., Talukdar, P.P., Faloutsos, C., Xing, E.P.: FLEXIFACT: scalable flexible factorization of coupled tensors on hadoop. In: NIPS 2013 big learning workshop (2013)

  • Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th international conference on parallel architectures and compilation techniques, ACM, pp. 72–81 (2008)

  • Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011). https://doi.org/10.1561/2200000016

  • Bro, R., Sidiropoulos, N.D., Giannakis, G.B.: A fast least squares algorithm for separating trilinear mixtures. In: Independent Component Analysis (1999)

  • Calvin, J.A., Valeev, E.F.: TiledArray: a massively-parallel, block-sparse tensor framework (Version v0.6.0). https://github.com/valeevgroup/tiledarray (2016)

  • Cao, B., He, L., Kong, X., Philip, S.Y., Hao, Z., Ragin, A.B.: Tensor-based multi-view feature selection with applications to brain diseases. In: Data Mining (ICDM), 2014 IEEE international conference on, pp. 40–49 (2014). https://doi.org/10.1109/ICDM.2014.26

  • Cao, B., Kong, X., Yu, P.S.: A review of heterogeneous data mining for brain disorders. arXiv:1508.01023 (2015)

  • Carroll, J.D., Chang, J.-J.: Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition. Psychometrika 35(3), 283–319 (1970). https://doi.org/10.1007/BF02310791

  • Carroll, J.D., Pruzansky, S., Kruskal, J.B.: CANDELINC: a general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika 45(1980), 3–24 (1980)

  • Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Lee, S., Skadron, K.: Rodinia: a benchmark suite for heterogeneous computing. In: 2009 IEEE international symposium on workload characterization (IISWC), pp. 44–54 (2009). https://doi.org/10.1109/IISWC.2009.5306797

  • Cheng, D., Peng, R., Liu, Y., Perros, I.: SPALS: fast alternating least squares via implicit leverage scores sampling. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.), Advances in neural information processing systems 29, Curran Associates, Inc., pp. 721–729 (2016). http://papers.nips.cc/paper/6436-spals-fast-alternating-least-squares-via-implicit-leverage-scores-sampling.pdf

  • Chi, E.C., Kolda, T.G.: On tensors, sparsity, and nonnegative factorizations. SIAM J. Matrix Anal. Appl. 33(4), 1272–1299 (2012)

  • Choi, J., Liu, X., Smith, S., Simon, T.: Blocking optimization techniques for sparse tensor computation. In: 2018 IEEE international parallel and distributed processing symposium (IPDPS), pp. 568–577 (2018). https://doi.org/10.1109/IPDPS.2018.00066

  • Choi, J.H., Vishwanathan, S.: DFacTo: distributed factorization of tensors. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.), Advances in neural information processing systems 27, Curran Associates, Inc., pp. 1296–1304 (2014)

  • Cichocki, A.: Tensor decompositions: a new concept in brain data analysis? arXiv:1305.0395 (2013)

  • Cichocki, A.: Era of big data processing: a new approach via tensor networks and tensor decompositions. arXiv:1403.2048 (2014)

  • Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., Phan, H.A.: Tensor decompositions for signal processing applications: from two-way to multiway component analysis. Signal Process. Magz. IEEE 32(2), 145–163 (2015). https://doi.org/10.1109/MSP.2013.2297439

  • Cichocki, A., Lee, N., Oseledets, I.V., Phan, A., Zhao, Q., Mandic, D.: Low-rank tensor networks for dimensionality reduction and large-scale optimization problems: perspectives and challenges PART 1. arXiv:1609.00893 (2016)

  • Cohen, N., Sharir, O., Shashua, A.: On the expressive power of deep learning: a tensor analysis. arXiv:1509.05009 (2015)

  • Davidson, I., Gilpin, S., Carmichael, O., Walker, P.: Network discovery via constrained tensor analysis of fMRI data. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’13). ACM, New York, pp. 194–202 (2013). https://doi.org/10.1145/2487575.2487619

  • De Lathauwer, L.: Decompositions of a higher-order tensor in block terms-part I: lemmas for partitioned matrices. SIAM J. Matrix Anal. Appl. 30(3), 1022–1032 (2008). https://doi.org/10.1137/060661685

  • De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(2000), 1253–1278 (2000)

  • De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and rank-\((R_1,R_2,\ldots,R_N)\) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21(4), 1324–1342 (2000). https://doi.org/10.1137/S0895479898346995

  • De Lathauwer, L., Vervliet, N., Boussé, M., Debals, O.: Dealing with curse and blessing of dimensionality through tensor decompositions (2017)

  • Dixit, K.M.: The SPEC benchmarks. Parallel Comput. 17(10–11), 1195–1209 (1991)

  • Eldén, L., Savas, B.: A Newton–Grassmann method for computing the best multilinear rank-(\({r}_1,\,{r}_2,\,{r}_3\)) approximation of a tensor. SIAM J. Matrix Anal. Appl. 31(2), 248–271 (2009). https://doi.org/10.1137/070688316

  • Epifanovsky, E., Wormit, M., Kuś, T., Landau, A., Zuev, D., Khistyaev, K., Manohar, P., Kaliman, I., Dreuw, A., Krylov, A.I.: New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations. J. Comput. Chem. 34(26), 2293–2309 (2013). https://doi.org/10.1002/jcc.23377

  • Evenbly, G., Vidal, G.: Algorithms for entanglement renormalization. Phys. Rev. B 79(14), 144108 (2009)

  • Fang, X., Pan, R.: Fast DTT: a near linear algorithm for decomposing a tensor into factor tensors. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 967–976 (2014)

  • Giovannetti, V., Montangero, S., Fazio, R.: Quantum multiscale entanglement renormalization ansatz channels. Phys. Rev. Lett. 101(18), 180503 (2008)

  • Gorodetsky, A., Karaman, S., Marzouk, Y.: Efficient high-dimensional stochastic optimal motion control using tensor-train decomposition. In: Robotics: Science and Systems (2015)

  • Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl. 31(4), 2029–2054 (2010). https://doi.org/10.1137/090764189

  • Grasedyck, L., Kressner, D., Tobler, C.: A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen 36(1), 53–78 (2013)

  • Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl. 15(5), 706–722 (2009). https://doi.org/10.1007/s00041-009-9094-9

  • Hansen, S., Plantenga, T., Kolda, T.G.: Newton-based optimization for Kullback–Leibler nonnegative tensor factorizations. Optim. Methods Softw. 30(2015), 1002–1029 (2015)

  • Harshman, R., Lundy, M.: Uniqueness proof for a family of models sharing features of Tucker’s three-mode factor analysis and PARAFAC/Candecomp. Psychometrika 61(1), 133–154 (1996). http://EconPapers.repec.org/RePEc:spr:psycho:v:61:y:1996:i:1:p:133-154

  • Harshman, R.A.: Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics 16, 1–84 (1970)

  • Harshman, R.A.: PARAFAC2: mathematical and technical notes. UCLA Working Papers in Phonetics 22, 30–44 (1972)

  • Hein, E., Conte, T., Young, J.S., Eswar, S., Li, J., Lavin, P., Vuduc, R., Riedy, J.: An initial characterization of the Emu Chick. In: 2018 IEEE international parallel and distributed processing symposium workshops (IPDPSW), p. 10 (2018)

  • Henderson, J., Ho, J.C., Kho, A.N., Denny, J.C., Malin, B.A., Sun, J., Ghosh, J.: Granite: diversified, sparse tensor factorization for electronic health record-based phenotyping. In: 2017 IEEE international conference on healthcare informatics (ICHI), pp. 214–223 (2017). https://doi.org/10.1109/ICHI.2017.61

  • Hitchcock, F.L.: The expression of a tensor or a polyadic as a sum of products. J. Math. Phys. 6(1), 164–189 (1927)

  • Ho, J.C., Ghosh, J., Steinhubl, S.R., Stewart, W.F., Denny, J.C., Malin, B.A., Sun, J.: Limestone: high-throughput candidate phenotype generation via tensor factorization. J. Biomed. Inf. 52(2014), 199–211 (2014c)

  • Ho, J.C., Ghosh, J., Sun, J.: Extracting phenotypes from patient claim records using nonnegative tensor factorization. In: Brain informatics and health, Springer, pp. 142–151 (2014a)

  • Ho, J.C., Ghosh, J., Sun, J.: Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’14), ACM, New York, pp. 115–124 (2014b). https://doi.org/10.1145/2623330.2623658

  • Hutchinson, B., Deng, L., Dong, Y.: Tensor deep stacking networks. Pattern Anal. Mach. Intell. IEEE Trans. 35(8), 1944–1957 (2013)

  • Ishteva, M., Absil, P., Van Huffel, S., De Lathauwer, L.: Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme. SIAM J. Matrix Anal. Appl. 32(1), 115–135 (2011). https://doi.org/10.1137/090764827

  • Janzamin, M., Sedghi, H., Anandkumar, A.: Generalization bounds for neural networks through tensor factorization. arXiv:1506.08473 (2015)

  • Jeon, I., Papalexakis, E.E., Kang, U., Faloutsos, C.: HaTen2: billion-scale tensor decompositions (Version 1.0). http://datalab.snu.ac.kr/haten2/ (2015)

  • Jiang, M., Cui, P., Wang, F., Xu, X., Zhu, W., Yang, S.: FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 1186–1195 (2014)

  • Jiang, T., Sidiropoulos, N.D.: Kruskal’s permutation lemma and the identification of CANDECOMP/PARAFAC and bilinear models with constant modulus constraints. Signal Process. IEEE Trans. 52(9), 2625–2636 (2004)

  • Jouppi, N.P., Young, C., Patil, N., Patterson, D., et al.: In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th annual international symposium on computer architecture (ISCA ’17). ACM, New York, NY, pp. 1–12 (2017). https://doi.org/10.1145/3079856.3080246

  • Kaliman, I.A., Krylov, A.I.: New algorithm for tensor contractions on multi-core CPUs, GPUs, and accelerators enables CCSD and EOM-CCSD calculations with over 1000 basis functions on a single compute node. J. Comput. Chem. 38(11), 842–853 (2017). https://doi.org/10.1002/jcc.24713

  • Kang, U., Papalexakis, E., Harpale, A., Faloutsos, C.: GigaTensor: scaling tensor analysis up by 100 times - algorithms and discoveries. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’12). ACM, New York, pp. 316–324 (2012). https://doi.org/10.1145/2339530.2339583

  • Kapteyn, A., Neudecker, H., Wansbeek, T.: An approach to n-mode components analysis. Psychometrika 51(2), 269–275 (1986). https://doi.org/10.1007/BF02293984

  • Karatzoglou, A., Amatriain, X., Baltrunas, L., Oliver, N.: Multiverse recommendation: N-dimensional tensor factorization for context-aware collaborative filtering. In: Proceedings of the 4th ACM conference on recommender systems (RecSys ’10). ACM, New York, pp. 79–86 (2010). https://doi.org/10.1145/1864708.1864727

  • Karlsson, L., Kressner, D., Uschmajew, A.: Parallel algorithms for tensor completion in the CP format. Parallel Comput. 57(2016), 222–234 (2016). https://doi.org/10.1016/j.parco.2015.10.002

  • Kaya, O., Uçar, B.: Scalable sparse tensor decompositions in distributed memory systems. In: Proceedings of the international conference for high performance computing, networking, storage and analysis (SC ’15). ACM, New York, Article 77, p. 11 (2015). https://doi.org/10.1145/2807591.2807624

  • Kaya, O., Uçar, B.: Parallel candecomp/parafac decomposition of sparse tensors using dimension trees. SIAM J. Sci. Comput. 40(1), C99–C130 (2018). https://doi.org/10.1137/16M1102744

  • Khoromskaia, V., Khoromskij, B.N.: Tensor numerical methods in quantum chemistry. Walter de Gruyter GmbH & Co KG (2018)

  • Kiers, H.A.L., der Kinderen, A.: A fast method for choosing the numbers of components in Tucker3 analysis. Br. J. Math. Stat. Psychol. 56(1), 119–125 (2003)

  • KleinOsowski, A.J., Lilja, D.J.: MinneSPEC: a new SPEC benchmark workload for simulation-based computer architecture research. IEEE Comput. Archit. Lett. 1(1), 7–7 (2002)

  • Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009). https://doi.org/10.1137/07070111X

  • Kolda, T.G., Bader, B.W.: The TOPHITS model for higher-order web link analysis. Workshop Link Anal. Counterterror. Secur. 7, 26–29 (2006)

  • Kolda, T.G., Sun, J.: Scalable tensor decompositions for multi-aspect data mining. In: Proceedings of the 2008 eighth IEEE international conference on data mining (ICDM ’08). IEEE Computer Society, Washington, DC, pp. 363–372 (2008). https://doi.org/10.1109/ICDM.2008.89

  • Köppl, C., Werner, H.-J.: Parallel and low-order scaling implementation of Hartree-Fock exchange using local density fitting. J. Chem. Theory Comput. 12(7), 3122–3134 (2016). https://doi.org/10.1021/acs.jctc.6b00251

  • Latchoumane, C.-F.V., Vialatte, F.-B., Solé-Casals, J., Maurice, M., Wimalaratna, S.R., Hudson, N., Jeong, J., Cichocki, A.: Multiway array decomposition analysis of EEGs in Alzheimer’s disease. J. Neurosci. Methods 207(1), 41–50 (2012)

  • Lebedev, V., Ganin, Y., Rakhuba, M., Oseledets, I., Lempitsky, V.: Speeding-up convolutional neural networks using fine-tuned CP-decomposition. arXiv:1412.6553 (2014)

  • Lee, C., Potkonjak, M., Mangione-Smith, W.H.: MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. In: Proceedings of the 30th annual ACM/IEEE international symposium on microarchitecture, IEEE Computer Society, pp. 330–335 (1997)

  • Lewis, C.A., Calvin, J.A., Valeev, E.F.: Clustered low-rank tensor format: introduction and application to fast construction of Hartree-Fock exchange. J. Chem. Theory Comput. 12(12), 5868–5880 (2016). https://doi.org/10.1021/acs.jctc.6b00884

  • Li, A., Song, S.L., Chen, J., Liu, X., Tallent, N., Barker, K.: Tartan: evaluating modern GPU interconnect via a multi-GPU benchmark suite. In: 2018 IEEE international symposium on workload characterization (IISWC), IEEE, pp. 191–202 (2018)

  • Li, J.: Scalable tensor decompositions in high performance computing environments. Ph.D. Dissertation. Georgia Institute of Technology, Atlanta, GA (2018)

  • Li, J., Choi, J., Perros, I., Sun, J., Vuduc, R.: Model-driven sparse CP decomposition for higher-order tensors. In: 2017 IEEE international parallel and distributed processing symposium (IPDPS), pp. 1048–1057 (2017). https://doi.org/10.1109/IPDPS.2017.80

  • Li, J., Ma, Y., Vuduc, R.: ParTI!: a parallel tensor infrastructure for multicore CPU and GPUs (Version 1.0.0). https://github.com/hpcgarage/ParTI (2018)

  • Li, J., Ma, Y., Yan, C., Vuduc, R.: Optimizing sparse tensor times matrix on multi-core and many-core architectures. In: Proceedings of the 6th workshop on irregular applications: architectures and algorithms (IA3 ’16). IEEE Press, Piscataway, pp. 26–33 (2016). https://doi.org/10.1109/IA3.2016.10

  • Li, J., Sun, J., Vuduc, R.: HiCOO: hierarchical storage of sparse tensors. In: Proceedings of the ACM/IEEE international conference on high performance computing, networking, storage and analysis (SC), Dallas, TX (2018)

  • Li, J., Tan, G., Chen, M., Sun, N.: SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication. In: Proceedings of the 34th ACM SIGPLAN conference on programming language design and implementation (PLDI ’13), ACM, New York, pp. 117–126 (2013). https://doi.org/10.1145/2491956.2462181

  • Li, J., Uçar, B., Çatalyürek, Ü.V., Sun, J., Barker, K., Vuduc, R.: Efficient and effective sparse tensor reordering. In: Proceedings of the ACM international conference on supercomputing (ICS ’19). ACM, New York, pp. 227–237 (2019). https://doi.org/10.1145/3330345.3330366

  • Li, Z., Uschmajew, A., Zhang, S.: On Convergence of the maximum block improvement method. SIAM J. Optim. 25(1), 210–233 (2015). https://doi.org/10.1137/130939110

  • Liu, B., Wen, C., Sarwate, A.D., Dehnavi, M.M.: A unified optimization approach for sparse tensor operations on GPUs. In: 2017 IEEE international conference on cluster computing (CLUSTER), pp. 47–57 (2017). https://doi.org/10.1109/CLUSTER.2017.75

  • Ma, Y., Li, J., Wu, X., Yan, C., Sun, J., Vuduc, R.: Optimizing sparse tensor times matrix on GPUs. J. Parallel Distrib. Comput. (2018). https://doi.org/10.1016/j.jpdc.2018.07.018

  • Manzer, S., Epifanovsky, E., Krylov, A.I., Head-Gordon, M.: A general sparse tensor framework for electronic structure theory. J. Chem. Theory Comput. 13(3), 1108–1116 (2017). https://doi.org/10.1021/acs.jctc.6b00853

  • Matsubara, Y., Sakurai, Y., van Panhuis, W.G., Faloutsos, C.: FUNNEL: automatic mining of spatially coevolving epidemics. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp. 105–114 (2014)

  • Mohlenkamp, M.J.: Musings on multilinear fitting (2010)

  • Mørup, M., Hansen, L.K., Arnfred, S.M., Lim, L., Madsen, K.H.: Shift invariant multilinear decomposition of neuroimaging data. NeuroImage 42(4), 1439–1450 (2008). http://www2.imm.dtu.dk/pubdb/p.php?5551

  • Nakatani, N., Chan, G.K.-L.: Efficient tree tensor network states (TTNS) for quantum chemistry: generalizations of the density matrix renormalization group algorithm. J. Chem. Phys. 138(13), 134113 (2013). https://doi.org/10.1063/1.4798639

  • Nisa, I., Li, J., Sukumaran-Rajam, A., Vuduc, R.W., Sadayappan, P.: Load-balanced sparse MTTKRP on GPUs. arXiv:1904.03329 (2019)

  • Novikov, A., Izmailov, P., Khrulkov, V., Figurnov, M., Oseledets, I.V.: Tensor train decomposition on TensorFlow (T3F). arXiv:1801.01928 (2018)

  • Novikov, A., Podoprikhin, D., Osokin, A., Vetrov, D.: Tensorizing neural networks. arXiv:1509.06569 (2015)

  • Novikov, A., Rodomanov, A., Osokin, A., Vetrov, D.: Putting MRFs on a tensor train. In: Jebara, T., Xing, E.P. (eds.), Proceedings of the 31st international conference on machine learning (ICML-14), JMLR Workshop and Conference Proceedings, pp. 811–819 (2014). http://jmlr.org/proceedings/papers/v32/novikov14.pdf

  • Oh, H.: Tensors in power system computation I: distributed computation for optimal power flow, DC OPF. arXiv:1605.06735 (2016)

  • Orús, R.: A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann. Phys. 349(2014), 117–158 (2014)

  • Oseledets, I.V.: Tensor-train decomposition. SIAM J. Sci. Comput. 33(5), 2295–2317 (2011). https://doi.org/10.1137/090752286

  • Papalexakis, E.E., Akoglu, L., Ienco, D.: Do more views of a graph help? Community detection and clustering in multi-graphs. In: Proceedings of the 16th international conference on information fusion, FUSION 2013, Istanbul, July 9–12, 2013, pp. 899–905. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=6641090

  • Papalexakis, E.E., Faloutsos, C., Mitchell, T.M., Talukdar, P.P., Sidiropoulos, N.D., Murphy, B.: Turbo-SMT: accelerating coupled sparse matrix-tensor factorizations by 200x. In: Proceedings of the 2014 SIAM international conference on data mining (SDM), pp. 118–126 (2014). https://doi.org/10.1137/1.9781611973440.14

  • Papalexakis, E.E., Faloutsos, C., Sidiropoulos, N.D.: ParCube: sparse parallelizable tensor decompositions. In: Proceedings of the 2012 European conference on machine learning and knowledge discovery in databases - Volume Part I (ECML PKDD’12). Springer, Berlin, pp. 521–536 (2012). https://doi.org/10.1007/978-3-642-33460-3_39

  • Papalexakis, E.E., Faloutsos, C., Sidiropoulos, N.D.: ParCube: sparse parallelizable CANDECOMP-PARAFAC tensor decomposition. ACM Trans. Knowl. Discov. Data 10(1), Article 3, 25 pages (2015). https://doi.org/10.1145/2729980

  • Papalexakis, E.E., Sidiropoulos, N.D.: Co-clustering as multilinear decomposition with sparse latent factors. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp. 2064–2067 (2011)

  • Peng, C., Calvin, J.A., Pavošević, F., Zhang, J., Valeev, E.F.: Massively parallel implementation of explicitly correlated coupled-cluster singles and doubles using tiledarray framework. J. Phys. Chem. A 120(51), 10231–10244 (2016). https://doi.org/10.1021/acs.jpca.6b10150

  • Perros, I., Chen, R., Vuduc, R., Sun, J.: Sparse hierarchical Tucker factorization and its application to healthcare. In: Proceedings of the 2015 IEEE international conference on data mining (ICDM ’15). IEEE Computer Society, Washington, DC, pp. 943–948 (2015). https://doi.org/10.1109/ICDM.2015.29

  • Perros, I., Papalexakis, E.E., Wang, F., Vuduc, R., Searles, E., Thompson, M., Sun, J.: SPARTan: scalable PARAFAC2 for large & sparse data. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’17). ACM, New York, pp. 375–384 (2017). https://doi.org/10.1145/3097983.3098014

  • Phipps, E.T., Kolda, T.G.: Software for sparse tensor decomposition on emerging computing architectures. arXiv:1809.09175 (2018)

  • Poovey, J.A., Conte, T.M., Levy, M., Gal-On, S.: A benchmark characterization of the EEMBC benchmark suite. IEEE Micro 29(5) (2009)

  • Rajih, M., Comon, P.: Enhanced line search: a novel method to accelerate Parafac. In: 2005 13th European signal processing conference, pp. 1–4 (2005)

  • Ravindran, N., Sidiropoulos, N.D., Smith, S., Karypis, G.: Memory-efficient parallel computation of tensor and matrix products for big tensor decompositions. Proceedings of the Asilomar conference on signals, systems, and computers (2014)

  • Rendle, S., Balby Marinho, L., Nanopoulos, A., Schmidt-Thieme, L.: Learning optimal ranking with tensor factorization for tag recommendation. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’09), ACM, New York, pp. 727–736 (2009). https://doi.org/10.1145/1557019.1557100

  • Reynolds, M., Doostan, A., Beylkin, G.: Randomized alternating least squares for canonical tensor decompositions: application to a PDE with random data. SIAM J. Sci. Comput. 38(5), A2634–A2664 (2016). https://doi.org/10.1137/15M1042802

  • Romera-Paredes, B., Aung, M.H., Bianchi-Berthouze, N., Pontil, M.: Multilinear multitask learning. In: Proceedings of the 30th international conference on machine learning (ICML ’13), Volume 28. JMLR.org, pp. III–1444–III–1452 (2013). http://dl.acm.org/citation.cfm?id=3042817.3043098

  • Savas, B., Lim, L.: Quasi-Newton methods on Grassmannians and multilinear approximations of tensors. SIAM J. Sci. Comput. 32(6), 3352–3393 (2010). https://doi.org/10.1137/090763172

  • Schein, A., Paisley, J., Blei, D.M., Wallach, H.: Bayesian Poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’15). ACM, New York, pp. 1045–1054 (2015). https://doi.org/10.1145/2783258.2783414

  • Sedaghati, N., Mu, T., Pouchet, L.-N., Parthasarathy, S., Sadayappan, P.: Automatic selection of sparse matrix representation on GPUs. In: Proceedings of the 29th ACM on international conference on supercomputing (ICS ’15). ACM, New York, pp. 99–108 (2015). https://doi.org/10.1145/2751205.2751244

  • Setiawan, H., Huang, Z., Devlin, J., Lamar, T., Zbib, R., Schwartz, R.M., Makhoul, J.: Statistical machine translation features with multitask tensor networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the Asian Federation of natural language processing, ACL 2015, July 26-31, 2015, Beijing, Volume 1: Long Papers, pp. 31–41. (2015). http://aclweb.org/anthology/P/P15/P15-1004.pdf

  • Sidiropoulos, N.D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E.E., Faloutsos, C.: Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process. 65(13), 3551–3582 (2017). https://doi.org/10.1109/TSP.2017.2690524

  • Sidiropoulos, N.D., Giannakis, G.B., Bro, R.: Blind PARAFAC receivers for DS-CDMA systems. Signal Process. IEEE Trans. 48(3), 810–823 (2000)

  • Signoretto, M., Dinh, Q.T., De Lathauwer, L., Suykens, J.A.K.: Learning with tensors: a framework based on convex optimization and spectral regularization. Mach. Learn. 94(3), 303–351 (2014). https://doi.org/10.1007/s10994-013-5366-3

  • Smith, S., Beri, A., Karypis, G.: Constrained tensor factorization with accelerated AO-ADMM. In: 46th international conference on parallel processing (ICPP ’17). IEEE (2017)

  • Smith, S., Choi, J.W., Li, J., Vuduc, R., Park, J., Liu, X., Karypis, G.: FROSTT: the formidable repository of open sparse tensors and tools (2017). http://frostt.io/

  • Smith, S., Karypis, G.: Tensor-matrix products with a compressed sparse tensor. In: Proceedings of the 5th workshop on irregular applications: architectures and algorithms. ACM, Article 7 (2015)

  • Smith, S., Karypis, G.: A medium-grained algorithm for distributed sparse tensor factorization. In: 2016 IEEE international parallel and distributed processing symposium (IPDPS), IEEE (2016)

  • Smith, S., Karypis, G.: Accelerating the Tucker decomposition with compressed sparse tensors. In: European conference on parallel processing, Springer, New York (2017)

  • Smith, S., Park, J., Karypis, G.: An exploration of optimization algorithms for high performance tensor completion. Proceedings of the 2016 ACM/IEEE conference on supercomputing (2016)

  • Smith, S., Park, J., Karypis, G.: Sparse tensor factorization on many-core processors with high-bandwidth memory. 31st IEEE international parallel & distributed processing symposium (IPDPS’17) (2017)

  • Smith, S., Ravindran, N., Sidiropoulos, N., Karypis, G.: SPLATT: efficient and parallel sparse tensor-matrix multiplication. In: Proceedings of the 29th IEEE international parallel & distributed processing symposium, (IPDPS) (2015)

  • Socher, R., Chen, D., Manning, C.D., Ng, A.: Reasoning with neural tensor networks for knowledge base completion. In: Advances in neural information processing systems, pp. 926–934 (2013)

  • Solomonik, E., Hoefler, T.: Sparse tensor algebra as a parallel programming model. arXiv:1512.00066 (2015)

  • Song, C., Martínez, T.J.: Atomic orbital-based SOS-MP2 with tensor hypercontraction. I. GPU-based tensor construction and exploiting sparsity. J. Chem. Phys. 144(17), 174111 (2016). https://doi.org/10.1063/1.4948438

  • Song, H.A., Hooi, B., Jereminov, M., Pandey, A., Pileggi, L.T., Faloutsos, C.: PowerCast: mining and forecasting power grid sequences. In: ECML/PKDD (2017)

  • Song, Z., Woodruff, D.P., Zhang, H.: Sublinear time orthogonal tensor decomposition. In: Proceedings of the 30th international conference on neural information processing systems (NIPS’16), Curran Associates Inc., USA, 793–801 (2016). http://dl.acm.org/citation.cfm?id=3157096.3157185

  • Sorber, L., Domanov, I., Van Barel, M., De Lathauwer, L.: Exact line and plane search for tensor optimization. Comput. Optim. Appl. 63(1), 121–142 (2016). https://doi.org/10.1007/s10589-015-9761-5

  • Sorber, L., Van Barel, M., De Lathauwer, L.: Optimization-based algorithms for tensor decompositions: canonical polyadic decomposition, decomposition in rank-(\({L}_{r},\,{L}_{r},\,1\)) terms, and a new generalization. SIAM J. Optim. 23(2), 695–720 (2013). https://doi.org/10.1137/120868323

  • Su, B.-Y., Keutzer, K.: clSpMV: a cross-platform OpenCL SpMV Framework on GPUs. In: Proceedings of the 26th ACM international conference on supercomputing (ICS ’12). ACM, New York, pp. 353–364 (2012). https://doi.org/10.1145/2304576.2304624

  • Sun, J., Papadimitriou, S., Lin, C.-Y., Cao, N., Liu, S., Qian, W.: MultiVis: content-based social network exploration through multi-way visual analysis. In: SDM, Vol. 9. SIAM, 1063–1074 (2009)

  • Sun, J., Tao, D., Faloutsos, C.: Beyond streams and graphs: dynamic tensor analysis. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’06). ACM, New York, pp. 374–383 (2006). https://doi.org/10.1145/1150402.1150445

  • Sun, J.-T., Zeng, H.-J., Liu, H., Lu, Y., Chen, Z.: CubeSVD: a novel approach to personalized web search. In: Proceedings of the 14th international conference on world wide web (WWW ’05). ACM, New York, 382–390 (2005). https://doi.org/10.1145/1060745.1060803

  • Symeonidis, P., Nanopoulos, A., Manolopoulos, Y.: Tag recommendations based on tensor dimensionality reduction. In: Proceedings of the 2008 ACM conference on recommender systems (RecSys ’08). ACM, New York, NY, pp. 43–50 (2008). https://doi.org/10.1145/1454008.1454017

  • Tao, D., Li, X., Wu, X., Hu, W., Maybank, S.J.: Supervised tensor learning. Knowl. Inf. Syst. 13(1), 1–42 (2007). https://doi.org/10.1007/s10115-006-0050-6

  • Tomasi, G., Bro, R.: A comparison of algorithms for fitting the PARAFAC model. Comput. Stat. Data Anal. 50(7), 1700–1734 (2006). https://doi.org/10.1016/j.csda.2004.11.013

  • Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3), 279–311 (1966). https://doi.org/10.1007/BF02289464

  • Vervliet, N., Debals, O., Sorber, L., Van Barel, M., De Lathauwer, L.: Tensorlab (Version 3.0). http://www.tensorlab.net (2016)

  • Vervliet, N., De Lathauwer, L.: A randomized block sampling approach to canonical polyadic decomposition of large-scale tensors. IEEE J. Select. Top. Signal Process. 10(2), 284–295 (2016). https://doi.org/10.1109/JSTSP.2015.2503260

  • Wang, F., Zhang, P., Qian, B., Wang, X., Davidson, I.: Clinical risk prediction with multilinear sparse logistic regression. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp. 145–154 (2014)

  • Wang, H., Thoss, M.: Multilayer formulation of the multiconfiguration time-dependent Hartree theory. J. Chem. Phys. 119(3), 1289–1299 (2003). https://doi.org/10.1063/1.1580111

  • Wang, L., Zhan, J., Luo, C., Zhu, Y., Yang, Q., He, Y., Gao, W., Jia, Z., Shi, Y., Zhang, S., Zheng, C., Lu, G., Zhan, K., Li, X., Qiu, B.: BigDataBench: a big data benchmark suite from internet services. In: 2014 IEEE 20th international symposium on high performance computer architecture (HPCA), pp. 488–499 (2014). https://doi.org/10.1109/HPCA.2014.6835958

  • Wang, Y., Chen, R., Ghosh, J., Denny, J.C., Kho, A., Chen, Y., Malin, B.A., Sun, J.: Rubik: knowledge guided tensor factorization and completion for health data analytics. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’15), ACM, New York, pp. 1265–1274 (2015). https://doi.org/10.1145/2783258.2783395

  • Wimalawarne, K., Sugiyama, M., Tomioka, R.: Multitask learning meets tensor factorization: task imputation via convex optimization. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.), Advances in neural information processing systems 27, Curran Associates, Inc., 2825–2833 (2014). http://papers.nips.cc/paper/5628-multitask-learning-meets-tensor-factorization-task-imputation-via-convex-optimization.pdf

  • Wright, S., Nocedal, J.: Numerical optimization. Springer, New York (1999)

  • Xu, Y., Zhang, L., Liu, W.: Cubic analysis of social bookmarking for personalized recommendation. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds.), Frontiers of WWW research and development-APWeb 2006, Lecture notes in computer science, Vol. 3841. Springer, Berlin, pp. 733–738 (2006). https://doi.org/10.1007/11610113_66

  • Yu, D., Deng, L., Seide, F.: Large vocabulary speech recognition using deep tensor neural networks. In: INTERSPEECH (2012)

  • Yu, Q.R., Liu, Y.: Learning from multiway data: simple and efficient tensor regression. arXiv:1607.02535 (2016)

  • Yu, R., Li, G., Liu, Y.: Tensor regression meets Gaussian processes. arXiv:1710.11345 (2017)

  • Yu, R., Zheng, S., Anandkumar, A., Yue, Y.: Long-term forecasting using tensor-train RNNs (2018). https://openreview.net/forum?id=HJJ0w--0W

  • Zhang, Z., Batselier, K., Liu, H., Daniel, L., Wong, N.: Tensor computation: a new framework for high-dimensional problems in EDA. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 36(4), 521–536 (2017)

  • Zhang, Z., Yang, X., Oseledets, I.V., Karniadakis, G.E., Daniel, L.: Enabling high-dimensional hierarchical uncertainty quantification by ANOVA and tensor-train decomposition. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 34(1), 63–76 (2015)

  • Zhao, Q., Caiafa, C.F., Mandic, D.P., Zhang, L., Ball, T., Schulze-Bonhage, A., Cichocki, A.: Multilinear subspace regression: an orthogonal tensor decomposition approach. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.), Advances in neural information processing systems 24, Curran Associates, Inc., pp. 1269–1277 (2011). http://papers.nips.cc/paper/4328-multilinear-subspace-regression-an-orthogonal-tensor-decomposition-approach.pdf

  • Zhao, Y., Li, J., Liao, C., Shen, X.: Bridging the gap between deep learning and sparse matrix format selection. In: Proceedings of the 23rd ACM SIGPLAN symposium on principles and practice of parallel programming (PPoPP ’18), ACM, New York, pp. 94–108 (2018). https://doi.org/10.1145/3178487.3178495

  • Zhou, G., Cichocki, A., Xie, S.: Decomposition of big tensors with low multilinear rank. arXiv:1412.1885 (2014)

  • Zhou, H., Li, L., Zhu, H.: Tensor regression with applications in neuroimaging data analysis. J. Am. Stat. Assoc. 108(502), 540–552 (2013). https://doi.org/10.1080/01621459.2013.776499

Acknowledgements

This research was partially funded by the US Department of Energy, Office of Advanced Scientific Computing Research (ASCR) under Award No. 66150: “CENATE: The Center for Advanced Technology Evaluation”. Pacific Northwest National Laboratory (PNNL) is a multiprogram national laboratory operated for DOE by Battelle Memorial Institute under Contract DE-AC05-76RL01830. This work was partially supported by the High Performance Data Analytics (HPDA) program at Pacific Northwest National Laboratory.

Author information

Correspondence to Jiajia Li.


Cite this article

Li, J., Ma, Y., Wu, X. et al. PASTA: a parallel sparse tensor algorithm benchmark suite. CCF Trans. HPC 1, 111–130 (2019). https://doi.org/10.1007/s42514-019-00012-w
