Fast Higher-Order Functions for Tensor Calculus with Tensors and Subtensors

  • Cem Bassoy
  • Volker Schatz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10860)


Tensors analysis has become a popular tool for solving problems in computational neuroscience, pattern recognition and signal processing. Similar to the two-dimensional case, algorithms for multidimensional data consist of basic operations accessing only a subset of tensor data. With multiple offsets and step sizes, basic operations for subtensors require sophisticated implementations even for entrywise operations.

In this work, we discuss the design and implementation of optimized higher-order functions that operate entrywise on tensors and subtensors with any non-hierarchical storage format and arbitrary number of dimensions. We propose recursive multi-index algorithms with reduced index computations and additional optimization techniques such as function inlining with partial template specialization. We show that single-index implementations of higher-order functions with subtensors introduce a runtime penalty of an order of magnitude than the recursive and iterative multi-index versions. Including data- and thread-level parallelization, our optimized implementations reach 68% of the maximum throughput of an Intel Core i9-7900X. In comparison with other libraries, the average speedup of our optimized implementations is up to 5x for map-like and more than 9x for reduce-like operations. For symmetric tensors we measured an average speedup of up to 4x.


  1. 1.
    Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI 2016, pp. 265–283. USENIX Association, Berkeley (2016)Google Scholar
  2. 2.
    Andres, B., Köthe, U., Kröger, T., Hamprecht, F.A.: Runtime-flexible multi-dimensional arrays and views for C++98 and C++0x. CoRR abs/1008.2909 (2010)Google Scholar
  3. 3.
    Brazell, M., Li, N., Navasca, C., Tamon, C.: Solving multilinear systems via tensor inversion. SIAM J. Matrix Anal. Appl. 34, 542–570 (2013)MathSciNetCrossRefGoogle Scholar
  4. 4.
    Cohen, N.H.: Eliminating redundant recursive calls. ACM Trans. Program. Lang. Syst. (TOPLAS) 5(3), 265–299 (1983)CrossRefGoogle Scholar
  5. 5.
    Fanaee-T, H., Gama, J.: Multi-aspect-streaming tensor analysis. Knowl.-Based Syst. 89, 332–345 (2015)CrossRefGoogle Scholar
  6. 6.
    Garcia, R., Lumsdaine, A.: MultiArray: a C++ library for generic programming with arrays. Softw. Pract. Exper. 35(2), 159–188 (2005)CrossRefGoogle Scholar
  7. 7.
    Hackbusch, W.: Numerical tensor calculus. Acta Numer. 23, 651–742 (2014)MathSciNetCrossRefGoogle Scholar
  8. 8.
    Harrison, A.P., Joseph, D.: Numeric tensor framework: exploiting and extending einstein notation. J. Comput. Sci. 16, 128–139 (2016)CrossRefGoogle Scholar
  9. 9.
    Kolda, T.G., Sun, J.: Scalable tensor decompositions for multi-aspect data mining. In: Proceedings of the 8th IEEE International Conference on Data Mining, pp. 363–372. IEEE, Washington (2008)Google Scholar
  10. 10.
    Lim, L.H.: Tensors and hypermatrices. In: Hogben, L. (ed.) Handbook of Linear Algebra, 2nd edn. Chapman and Hall, New York (2017)Google Scholar
  11. 11.
    Liu, Y.A., Stoller, S.D.: From recursion to iteration: what are the optimizations? ACM SIGPLAN Not. 34(11), 73–82 (1999)CrossRefGoogle Scholar
  12. 12.
    Savas, B., Eldén, L.: Handwritten digit classification using higher order singular value decomposition. Pattern Recogn. 40(3), 993–1003 (2007)CrossRefGoogle Scholar
  13. 13.
    Suter, S.K., Makhynia, M., Pajarola, R.: Tamresh - tensor approximation multiresolution hierarchy for interactive volume visualization. In: Proceedings of the 15th Eurographics Conference on Visualization, EuroVis 2013, pp. 151–160. Eurographics Association (2013)CrossRefGoogle Scholar
  14. 14.
    Veldhuizen, T.L.: Arrays in blitz++. In: Caromel, D., Oldehoeft, R.R., Tholburn, M. (eds.) ISCOPE 1998. LNCS, vol. 1505, pp. 223–230. Springer, Heidelberg (1998). Scholar
  15. 15.
    Ward, M.P., Bennett, K.H.: Recursion removal/introduction by formal transformation: an aid to program development and program comprehension. Comput. J. 42(8), 650–650 (1999)CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Fraunhofer IOSBEttlingenGermany

Personalised recommendations