Solving Weighted Least Squares (WLS) problems on ARM-based architectures


The Weighted Least Squares (WLS) algorithm is applied to numerous optimization problems, but it demands substantial computational resources, especially when complex arithmetic is involved. This work aims to accelerate the solution of a WLS problem by reducing the computational cost (relying on BLAS/LAPACK routines) and by lowering the arithmetic precision from double to single. As a test case, we design an IIR filter for a graphic equalizer, where the numerical errors due to single precision are easily visualized. In addition, given the importance of low-power architectures for this kind of implementation, we evaluate the performance, scalability, and energy efficiency of each method on two different processors implementing the ARMv7 architecture, which is widely used in current power-constrained mobile devices. Results show that, on architectures of this type, the method with a high theoretical computational cost is more efficient than other methods with lower theoretical cost.
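To make the setting concrete, the following is a minimal sketch of a complex-valued WLS solve, min over x of sum_i w_i |(Ax - b)_i|^2, carried out in double and in single precision. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which solution methods are compared, so two textbook approaches are shown (normal equations with a Cholesky factorization, and a QR factorization of the scaled matrix). The problem sizes, the random data, and the function names `wls_normal_equations` and `wls_qr` are hypothetical. NumPy is used here because its `numpy.linalg` routines are backed by LAPACK.

```python
import numpy as np


def wls_normal_equations(A, b, w):
    """Solve min_x sum_i w_i |(A x - b)_i|^2 via the normal equations."""
    Aw = A * w[:, None]                # W A, with W = diag(w)
    G = A.conj().T @ Aw                # A^H W A (Hermitian positive definite)
    rhs = A.conj().T @ (w * b)         # A^H W b
    L = np.linalg.cholesky(G)          # G = L L^H
    y = np.linalg.solve(L, rhs)        # solve L y = rhs
    return np.linalg.solve(L.conj().T, y)  # solve L^H x = y


def wls_qr(A, b, w):
    """Solve the same problem via a QR factorization of sqrt(W) A."""
    sw = np.sqrt(w)
    Q, R = np.linalg.qr(A * sw[:, None])   # reduced QR
    return np.linalg.solve(R, Q.conj().T @ (sw * b))


# Hypothetical test problem: 200 weighted equations, 8 complex unknowns.
rng = np.random.default_rng(0)
m, n = 200, 8
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
b = A @ x_true                         # consistent right-hand side
w = rng.uniform(0.5, 2.0, m)           # positive weights

# Double precision (complex128 data, float64 weights).
x64 = wls_normal_equations(A, b, w)

# Single precision (complex64 data, float32 weights).
A32, b32, w32 = A.astype(np.complex64), b.astype(np.complex64), w.astype(np.float32)
x32 = wls_normal_equations(A32, b32, w32)
x32_qr = wls_qr(A32, b32, w32)

# Maximum absolute deviation from the known solution for each variant.
err64 = np.max(np.abs(x64 - x_true))
err32 = np.max(np.abs(x32 - x_true))
err32_qr = np.max(np.abs(x32_qr - x_true))
```

Comparing `err64` with `err32` and `err32_qr` gives a rough feel for the accuracy lost when moving from double to single precision. The normal-equations route needs fewer floating-point operations but squares the condition number of the problem, which is the usual trade-off against QR-based solvers.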



  1. The character x in the routine names should be replaced by s, d, c, or z to indicate single- or double-precision arithmetic on real or complex values.
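The naming convention in this note can be sketched as a small lookup. The helper `routine_name` and the table `PRECISION_PREFIX` are hypothetical names introduced only for illustration. The prefix letters follow the standard BLAS/LAPACK scheme: s and d for single and double precision real, c and z for single and double precision complex.

```python
# Standard BLAS/LAPACK precision prefixes, keyed by (value kind, precision).
PRECISION_PREFIX = {
    ("real", "single"): "s",
    ("real", "double"): "d",
    ("complex", "single"): "c",
    ("complex", "double"): "z",
}


def routine_name(template, kind, precision):
    """Expand a template such as 'xGEMM' into a concrete routine name."""
    if not template.startswith("x"):
        raise ValueError("template must start with the placeholder 'x'")
    # Replace the leading placeholder and lower-case the remainder.
    return PRECISION_PREFIX[(kind, precision)] + template[1:].lower()


# Example: the four variants of the matrix-matrix product template xGEMM.
gemm_variants = [
    routine_name("xGEMM", kind, precision)
    for kind in ("real", "complex")
    for precision in ("single", "double")
]
```

Under this convention, moving a complex-valued computation from double to single precision corresponds to switching from the z variants of the routines to the c variants.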




This work started in spring 2016, when Jose A. Belloch was a visiting postdoctoral researcher at Budapest University of Technology and Economics, funded by the European Network COST Action IC1305 within the Short Term Scientific Mission program under reference COST-SPASM-ECOST-STSM-IC1305-020416-072431. Dr. Jose A. Belloch is supported by GVA contract APOSTD/2016/069. The researchers from Universitat Jaume I are supported by the CICYT project TIN2014-53495-R of MINECO and FEDER. The authors from the Universitat Politècnica de València are supported by MINECO Projects TEC2015-67387-C4-1-R, PROMETEOII/2014/003 and CAPAP-H5 network TIN2014-53522-REDT. The researcher from UCM is supported by the EU (FEDER) and the Spanish MINECO, under Grants TIN 2015-65277-R and TIN2012-32180. The work of Balázs Bank was supported by the ÚNKP-16-4-III New National Excellence Program of the Ministry of Human Capacities, Hungary.

Author information



Corresponding author

Correspondence to Jose A. Belloch.


Cite this article

Belloch, J.A., Bank, B., Igual, F.D. et al. Solving Weighted Least Squares (WLS) problems on ARM-based architectures. J Supercomput 73, 530–542 (2017).



  • WLS
  • Audio processing
  • Low power processors
  • ARM® Cortex