Abstract
The Weighted Least Squares (WLS) algorithm is applied to numerous optimization problems but demands substantial computational resources, especially when complex arithmetic is involved. This work aims to accelerate the solution of a WLS problem in two ways: by reducing the computational cost (relying on BLAS/LAPACK routines) and by lowering the arithmetic precision from double to single. As a test case, we design an IIR filter for a graphic equalizer, an application in which the numerical errors introduced by single precision are easy to visualize. In addition, given the importance of low-power architectures for this kind of implementation, we evaluate the performance, scalability, and energy efficiency of each method on two different processors implementing the ARMv7 architecture, which is widely used in current power-constrained mobile devices. Results show that, on architectures of this type, the method with a higher theoretical computational cost is more efficient than other methods with a lower theoretical cost.
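The abstract does not spell out the WLS formulation. As a minimal sketch, assuming the common form min_x Σ_k w_k |(Ax − b)_k|² with real positive weights w_k and complex-valued A and b, the weighted problem reduces to an ordinary least-squares problem once each row k is scaled by √w_k. The code below solves it that way with NumPy and checks the result against the normal equations (A^H W A) x = A^H W b, with W = diag(w). The function names, random data, and problem sizes are illustrative assumptions, not the paper's method or its equalizer design problem, and everything here runs in double precision.

```python
import numpy as np


def wls_lstsq(A, b, w):
    """Minimize sum_k w[k] * |(A @ x - b)[k]|**2 for real weights w > 0.

    Each equation k is scaled by sqrt(w[k]), which turns the weighted
    problem into an ordinary least-squares problem for numpy.linalg.lstsq.
    """
    sw = np.sqrt(w)
    x, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)
    return x


def wls_normal_equations(A, b, w):
    """Solve (A^H W A) x = A^H W b with W = diag(w)."""
    AhW = A.conj().T * w  # multiplies column k of A^H by w[k]
    return np.linalg.solve(AhW @ A, AhW @ b)


# Illustrative complex-valued test problem (random data, assumed sizes).
rng = np.random.default_rng(0)
m, n = 200, 8
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)
w = rng.uniform(0.1, 10.0, size=m)

x_ls = wls_lstsq(A, b, w)
x_ne = wls_normal_equations(A, b, w)
print("max |x_ls - x_ne| =", np.max(np.abs(x_ls - x_ne)))
```

The two routes trade cost against accuracy: forming A^H W A is cheaper for tall problems, but its condition number is the square of that of W^{1/2}A, so rounding errors grow faster, which becomes more visible when the arithmetic is lowered to single precision.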
Notes
The character x in the routine names should be replaced by s, d, c, or z to indicate single- or double-precision arithmetic on real or complex values (s: single real, d: double real, c: single complex, z: double complex).
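The naming rule in the note can be captured in a few lines. This is a minimal sketch with a hypothetical helper, routine_name, that expands a generic name such as xGEMM into its concrete BLAS/LAPACK variant; xGEMM and xGELS are used only as familiar illustrations and are not necessarily the routines evaluated in the paper.

```python
# Hypothetical helper: expand the "x" placeholder of a generic
# BLAS/LAPACK routine name into the concrete precision/domain variant.
PREFIX = {
    ("single", "real"): "S",
    ("double", "real"): "D",
    ("single", "complex"): "C",
    ("double", "complex"): "Z",
}


def routine_name(generic: str, precision: str, domain: str) -> str:
    """Return e.g. 'CGEMM' for ('xGEMM', 'single', 'complex')."""
    if not generic.lower().startswith("x"):
        raise ValueError("generic name must start with the placeholder 'x'")
    return PREFIX[(precision, domain)] + generic[1:].upper()


print(routine_name("xGEMM", "single", "complex"))  # CGEMM
print(routine_name("xGELS", "double", "complex"))  # ZGELS
```

Switching a complex-valued solver from double to single precision therefore amounts, at the library level, to calling the C-prefixed routines instead of the Z-prefixed ones.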
Acknowledgements
This work started in spring 2016, when Jose A. Belloch was a visiting postdoctoral researcher at the Budapest University of Technology and Economics through a Short Term Scientific Mission (STSM) of the European Network COST Action IC1305 (reference COST-SPASM-ECOST-STSM-IC1305-020416-072431). Dr. Jose A. Belloch is supported by GVA contract APOSTD/2016/069. The researchers from Universitat Jaume I are supported by the CICYT project TIN2014-53495-R of MINECO and FEDER. The authors from the Universitat Politècnica de València are supported by MINECO projects TEC2015-67387-C4-1-R and PROMETEOII/2014/003 and by the CAPAP-H5 network TIN2014-53522-REDT. The researcher from UCM is supported by the EU (FEDER) and the Spanish MINECO under grants TIN2015-65277-R and TIN2012-32180. The work of Balázs Bank was supported by the ÚNKP-16-4-III New National Excellence Program of the Ministry of Human Capacities, Hungary.
About this article
Cite this article
Belloch, J.A., Bank, B., Igual, F.D. et al. Solving Weighted Least Squares (WLS) problems on ARM-based architectures. J Supercomput 73, 530–542 (2017). https://doi.org/10.1007/s11227-016-1910-9
DOI: https://doi.org/10.1007/s11227-016-1910-9