Data clustering for efficient approximate computing

Abstract

Given the saturation of single-threaded performance improvements in general-purpose processors, novel architectural techniques are required to meet emerging demands. In this paper, we propose a generic acceleration framework for approximate algorithms that replaces function execution with look-ups in dedicated memory tables. A strategy based on the K-Means clustering algorithm is used at compile time to learn mappings from arbitrary function inputs to frequently occurring outputs. At run time, these learned values are fetched from dedicated look-up tables, and the best result is selected by a Nearest-Centroid Classifier implemented in hardware. The proposed approach improves over the state-of-the-art neural acceleration solution, with nearly \(3\times\) better performance, energy reductions from \(18.72\%\) up to \(90.99\%\), and \(17\%\) area savings at similar levels of quality, thus opening new opportunities for performance harvesting in approximate accelerators.
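The two phases named in the abstract can be illustrated with a small software sketch. It is not the authors' implementation: the helper names (`kmeans`, `build_table`, `approx_call`), the training trace, the choice of `math.hypot` as the target function, and the decision to store the function value at each centroid are all assumptions made for illustration. In the paper the run-time selection is done in hardware, which plain Python can only mimic.

```python
import math

def nearest(centroids, point):
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((c - p) ** 2 for c, p in zip(centroids[j], point)),
    )

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means with a deterministic, evenly spaced initialisation.

    Assumes len(points) >= k.
    """
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids

def build_table(func, training_inputs, k):
    """'Compile-time' step: cluster the inputs, precompute one output per centroid.

    Storing func(centroid) is an assumption of this sketch.
    """
    centroids = kmeans(training_inputs, k)
    outputs = [func(*c) for c in centroids]
    return centroids, outputs

def approx_call(table, point):
    """'Run-time' step: nearest-centroid classification followed by a table read."""
    centroids, outputs = table
    return outputs[nearest(centroids, point)]

# Hypothetical training trace: inputs concentrated around three operating points.
training = [(bx + dx, by + dy)
            for bx, by in [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
            for dx in (0.0, 0.2) for dy in (0.0, 0.2)]
table = build_table(math.hypot, training, k=3)
approx = approx_call(table, (3.05, 4.0))
exact = math.hypot(3.05, 4.0)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

The number of clusters `k` is the quality knob in this sketch: more centroids mean a larger table but a smaller distance between a query and its nearest stored entry.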

Notes

  1.

    Presenting the details of this algorithm or ANN training is beyond the scope of this paper. We present here an overview with only enough details to allow a comparison with the approximation approach we developed.

Author information

Corresponding author

Correspondence to Michael G. Jordan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The authors would also like to thank CNPq and FAPERGS for partial support.

About this article

Cite this article

Jordan, M.G., Brandalero, M., Malfatti, G.M. et al. Data clustering for efficient approximate computing. Des Autom Embed Syst 24, 3–22 (2020). https://doi.org/10.1007/s10617-019-09228-z

Keywords

  • Approximate computing
  • Approximate memoization
  • Data clustering
  • Reuse