Accelerating the SPICE Circuit Simulator Using an FPGA: A Case Study

Abstract

Spatial processing of sparse, irregular, double-precision floating-point computation on a single FPGA enables up to an order of magnitude speedup and energy savings over a conventional microprocessor for the Simulation Program with Integrated Circuit Emphasis (SPICE) circuit simulator. To deliver this speedup, we develop a parallel, heterogeneous, FPGA-based architecture customized for SPICE. To parallelize the complete simulator, we decompose SPICE into its three constituent phases (Model Evaluation, Sparse Matrix-Solve, and Iteration Control) and customize a spatial architecture for each phase independently. Our heterogeneous FPGA organization mixes very long instruction word (VLIW), dataflow, and streaming architectures into a cohesive, unified design. We program this parallel architecture with a high-level, domain-specific framework that identifies, exposes, and exploits the parallelism available in SPICE using streaming (SCORE framework), data-parallel (Verilog-AMS models), and dataflow (KLU matrix solver) patterns. Our FPGA architecture outperforms conventional processors through a combination of factors, including high utilization of statically scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, and streaming, overlapped processing of the control algorithms. We expect approaches that exploit spatial parallelism to become important as frequency scaling continues to slow and modern processing architectures turn to parallelism (e.g., multi-core processors, GPUs) because of power-consumption constraints.
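
To make the three-phase decomposition concrete, the following is a minimal software sketch of one DC operating-point analysis, written as plain sequential Python rather than as the FPGA design the chapter describes. The example circuit (a 5 V source, two 1 kΩ resistors, and one diode), the component values, the initial guess, and the function names `model_evaluation`, `matrix_solve`, and `simulate_dc` are all assumptions made for illustration. A dense Gaussian elimination stands in for the sparse KLU solver.

```python
import math

# Hypothetical circuit: 5 V source -> R1 -> node 1 -> diode -> node 2 -> R2 -> ground.
V_SRC, R1, R2 = 5.0, 1e3, 1e3
I_SAT, V_T = 1e-14, 0.02585  # assumed diode saturation current and thermal voltage


def model_evaluation(vd):
    """Phase 1: evaluate the nonlinear diode and linearize it around vd."""
    e = math.exp(vd / V_T)
    i_d = I_SAT * (e - 1.0)   # diode current at vd
    g = I_SAT / V_T * e       # small-signal conductance dI/dV at vd
    i_eq = i_d - g * vd       # equivalent current source of the linear model
    return g, i_eq


def matrix_solve(a, b):
    """Phase 2: solve A x = b (dense elimination with partial pivoting,
    standing in for a sparse LU solver)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (b[k] - s) / a[k][k]
    return x


def simulate_dc(v_init=(0.7, 0.0), tol=1e-9, max_iter=50):
    """Phase 3: iteration control (Newton-Raphson loop with a convergence test)."""
    v = list(v_init)
    for it in range(1, max_iter + 1):
        g, i_eq = model_evaluation(v[0] - v[1])
        # Nodal equations (KCL at nodes 1 and 2) with the linearized diode stamped in.
        a = [[1.0 / R1 + g, -g],
             [-g, g + 1.0 / R2]]
        b = [V_SRC / R1 - i_eq, i_eq]
        v_new = matrix_solve(a, b)
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < tol:
            return v, it, True
    return v, max_iter, False


if __name__ == "__main__":
    voltages, iterations, converged = simulate_dc()
    print(voltages, iterations, converged)
```

In this sketch the model evaluation is independent per device, the solve is dominated by fine-grained arithmetic dependencies, and the outer loop is sequential control, which loosely mirrors why the abstract assigns a different architecture to each phase. The initial guess near the diode's forward voltage is chosen so the undamped Newton iteration converges quickly; production simulators use voltage limiting instead.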

Copyright information

© Springer Science+Business Media, LLC 2013

Authors and Affiliations

  1. Nanyang Technological University, Singapore, Singapore
  2. University of Pennsylvania, Philadelphia, USA
