
FPGA-Accelerated Machine Learning Inference as a Service for Particle Physics Computing

  • Javier Duarte
  • Philip Harris
  • Scott Hauck
  • Burt Holzman
  • Shih-Chieh Hsu
  • Sergo Jindariani
  • Suffian Khan
  • Benjamin Kreis
  • Brian Lee
  • Mia Liu
  • Vladimir Lončar
  • Jennifer Ngadiuba
  • Kevin Pedro
  • Brandon Perez
  • Maurizio Pierini
  • Dylan Rankin
  • Nhan Tran (corresponding author)
  • Matthew Trahms
  • Aristeidis Tsaris
  • Colin Versteeg
  • Ted W. Way
  • Dustin Werran
  • Zhenbin Wu
Original Article

Abstract

Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. Machine learning algorithms, whose applications in particle physics for simulation, reconstruction, and analysis are growing rapidly, are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) ms with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference on current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service in the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
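As an illustration of the transfer-learning setup summarized above, the following minimal sketch builds a ResNet-50 classifier with a frozen ImageNet-trained backbone and a small trainable head. It is an assumption-laden example: the 224×224×3 input shape, two-class softmax head, and optimizer are illustrative choices, not the configuration used in the paper.

```python
import tensorflow as tf

# Frozen ImageNet-trained ResNet-50 backbone; only the small task-specific
# head below is trained on the physics dataset (transfer learning).
backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
backbone.trainable = False

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. signal vs. background
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Because only the final dense layers are updated during training, retraining remains feasible on comparatively small physics datasets while the convolutional features of the pretrained network are reused.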

Keywords

Particle physics · Heterogeneous computing · FPGA · Machine learning
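The sketch below illustrates the inference-as-a-service access pattern described in the abstract: a CPU client sends a single preprocessed image (batch of one) to a remote accelerator service and measures the round-trip latency. The endpoint URL and JSON payload format are hypothetical placeholders; the service studied in the paper is accessed from within the experiments' own software frameworks rather than through this toy client.

```python
import time
import numpy as np
import requests

# Hypothetical endpoint for a remote ResNet-50 inference service
# (illustrative only; not the actual Brainwave service address).
SERVICE_URL = "http://fpga-inference.example.org/v1/resnet50:predict"

def classify_remote(image: np.ndarray) -> dict:
    """Send one preprocessed image to the remote service, time the
    round trip, and return the decoded JSON response with class scores."""
    payload = {"instances": image.tolist()}
    start = time.perf_counter()
    response = requests.post(SERVICE_URL, json=payload, timeout=5.0)
    response.raise_for_status()
    latency_ms = 1e3 * (time.perf_counter() - start)
    print(f"round-trip latency: {latency_ms:.1f} ms")
    return response.json()

if __name__ == "__main__":
    # Dummy 224x224 RGB image standing in for a preprocessed jet/event image.
    dummy_image = np.random.rand(1, 224, 224, 3).astype(np.float32)
    print(classify_remote(dummy_image))
```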

Notes

Acknowledgements

We would like to thank the entire Microsoft Azure Machine Learning, Bing, and Project Brainwave teams for the development of and opportunity to preview and study the acceleration platform. In particular, we would like to acknowledge Doug Burger, Eric Chung, Jeremy Fowers, Daniel Lo, Kalin Ovtcharov, and Andrew Putnam for their support and enthusiasm. We would like to thank Lothar Bauerdick and Oliver Gutsche for seed funding through USCMS computing operations. We would like to thank Alex Himmel and other NOvA collaborators for support and comments on the manuscript. Part of this work was conducted at “iBanks,” the AI GPU cluster at Caltech. We acknowledge NVIDIA, SuperMicro, and the Kavli Foundation for their support of “iBanks.” Part of this work was conducted using Google Cloud resources provided by the MIT Quest for Intelligence program. Part of this work is supported through IRIS-HEP under NSF grant 1836650. We thank the organizers of the publicly available top tagging dataset (and others like it) for providing benchmarks for the physics community. The authors thank the NOvA collaboration for the use of its Monte Carlo software tools and data and for the review of this manuscript. This work was supported by the US Department of Energy and the US National Science Foundation. NOvA receives additional support from the Department of Science and Technology, India; the European Research Council; the MSMT CR, Czech Republic; the RAS, RMES, and RFBR, Russia; CNPq and FAPEG, Brazil; and the State and University of Minnesota. We are grateful for the contributions of the staff at the Ash River Laboratory, Argonne National Laboratory, and Fermilab.

Compliance with Ethical Standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Javier Duarte (1)
  • Philip Harris (2)
  • Scott Hauck (3)
  • Burt Holzman (1)
  • Shih-Chieh Hsu (3)
  • Sergo Jindariani (1)
  • Suffian Khan (4)
  • Benjamin Kreis (1)
  • Brian Lee (4)
  • Mia Liu (1)
  • Vladimir Lončar (5, 6)
  • Jennifer Ngadiuba (5)
  • Kevin Pedro (1)
  • Brandon Perez (4)
  • Maurizio Pierini (5)
  • Dylan Rankin (2)
  • Nhan Tran (1, corresponding author)
  • Matthew Trahms (3)
  • Aristeidis Tsaris (1)
  • Colin Versteeg (4)
  • Ted W. Way (4)
  • Dustin Werran (3)
  • Zhenbin Wu (7)

  1. Fermi National Accelerator Laboratory, Batavia, USA
  2. Massachusetts Institute of Technology, Cambridge, USA
  3. University of Washington, Seattle, USA
  4. Microsoft, Redmond, USA
  5. CERN, CH-1211 Geneva 23, Switzerland
  6. Institute of Physics Belgrade, University of Belgrade, Belgrade, Serbia
  7. University of Illinois at Chicago, Chicago, USA
