Balancing the learning ability and memory demand of a perceptron-based dynamically trainable neural network

Richter, Edward; Valancius, Spencer; McClanahan, Josiah; Mixter, John; Akoglu, Ali

doi:10.1007/s11227-018-2374-x

Published: 16 April 2018

The Journal of Supercomputing, Volume 74, pages 3211–3235 (2018)

Abstract

Artificial neural networks (ANNs) have become a popular means of solving complex problems in prediction-based applications such as image and natural language processing. Two prominent challenges in the neural network domain are the practicality of hardware implementation and dynamic training of the network. In this study, we address these challenges with a development methodology that balances the hardware footprint and the quality of the ANN. We use the well-known perceptron-based branch prediction problem as a case study for demonstrating this methodology. This problem is well suited to analyzing dynamic hardware implementations of ANNs because the predictor is implemented in hardware and trains dynamically. Using our hierarchical configuration search space exploration, we show that we can decrease the memory footprint of a standard perceptron-based branch predictor by 2.3× with only a 0.6% decrease in prediction accuracy.
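The abstract uses perceptron-based branch prediction as its case study. The sketch below is a minimal Python illustration of the classic perceptron predictor of Jiménez and Lin (2001), not the authors' implementation. The table size, history length, weight width, and the names `PerceptronPredictor` and `storage_bits` are illustrative assumptions. It shows how the predictor trains online and why its memory demand scales with the number of perceptrons, the history length, and the weight width.

```python
import math
from typing import List


class PerceptronPredictor:
    """Minimal perceptron branch predictor after Jimenez and Lin (2001).

    Hypothetical, illustrative parameters (not the paper's configuration):
    num_perceptrons table rows, history_length global-history bits,
    and weight_bits-bit signed saturating weights.
    """

    def __init__(self, num_perceptrons: int = 256, history_length: int = 16,
                 weight_bits: int = 8) -> None:
        self.n = num_perceptrons
        self.h = history_length
        self.weight_bits = weight_bits
        # Signed saturation bounds, e.g. [-128, 127] for 8-bit weights.
        self.w_max = 2 ** (weight_bits - 1) - 1
        self.w_min = -(2 ** (weight_bits - 1))
        # Training threshold suggested by Jimenez and Lin: floor(1.93*h + 14).
        self.theta = math.floor(1.93 * history_length + 14)
        # Each row holds a bias weight w0 followed by h history weights.
        self.table: List[List[int]] = [[0] * (history_length + 1)
                                       for _ in range(num_perceptrons)]
        # Global history stored as +1 (taken) / -1 (not taken), newest first.
        self.history: List[int] = [-1] * history_length

    def _output(self, pc: int) -> int:
        # Dot product of the selected perceptron's weights with the history.
        w = self.table[pc % self.n]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        # Predict taken when the perceptron output is non-negative.
        return self._output(pc) >= 0

    def _clamp(self, v: int) -> int:
        return max(self.w_min, min(self.w_max, v))

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when confidence |y| is at most theta.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.table[pc % self.n]
            w[0] = self._clamp(w[0] + t)
            for i, xi in enumerate(self.history, start=1):
                w[i] = self._clamp(w[i] + t * xi)
        # Shift the resolved outcome into the global history register.
        self.history = [t] + self.history[:-1]

    def storage_bits(self) -> int:
        # Weight storage only: N perceptrons x (h + 1) weights x weight_bits.
        return self.n * (self.h + 1) * self.weight_bits


if __name__ == "__main__":
    bp = PerceptronPredictor(num_perceptrons=64, history_length=8, weight_bits=8)
    outcomes = [i % 2 == 0 for i in range(1000)]  # alternating T, N, T, N, ...
    correct = 0
    for taken in outcomes:
        correct += bp.predict(0x400) == taken
        bp.update(0x400, taken)
    print(f"accuracy: {correct / len(outcomes):.3f}")
    print(f"weight storage: {bp.storage_bits()} bits")
```

With 64 perceptrons, 8 history bits, and 8-bit weights, `storage_bits()` gives 64 × 9 × 8 = 4608 bits. Shrinking any of the three factors reduces the footprint at some potential cost in accuracy; the 2.3× reduction reported above comes from the authors' own configuration search, which this sketch does not reproduce.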

Acknowledgements

Research reported in this publication was supported in part by Raytheon Missile Systems under the contract 2017-UNI-0008. The content is solely the responsibility of the authors and does not necessarily represent the official views of Raytheon Missile Systems.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA
Edward Richter, Spencer Valancius, Josiah McClanahan, John Mixter & Ali Akoglu

Corresponding author

Correspondence to Edward Richter.

About this article

Cite this article

Richter, E., Valancius, S., McClanahan, J. et al. Balancing the learning ability and memory demand of a perceptron-based dynamically trainable neural network. J Supercomput 74, 3211–3235 (2018). https://doi.org/10.1007/s11227-018-2374-x

Issue Date: July 2018
