Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection

  • Conference paper
High Performance Computing (ISC High Performance 2023)


Quantum circuit simulation provides the foundation for developing quantum algorithms and verifying quantum supremacy. Among the various methods for quantum circuit simulation, tensor network contraction has grown in popularity because it can simulate larger numbers of qubits. During tensor contraction, the input tensors are reshaped into matrices and multiplied by a GEMM operation, and these GEMM operations can account for up to 90% of the total computation time. GEMM throughput can be improved by using mixed-precision hardware such as Tensor Cores, but a straightforward implementation yields insufficient fidelity for deep and large quantum circuits. Prior work has demonstrated that compensated summation, with special care for the rounding mode, can fully recover the FP32 precision of SGEMM even when using TF32 or FP16 Tensor Cores. When such techniques are applied to quantum circuit simulation, the exponent range becomes a critical issue: TF32 supports almost the same exponent range as FP32, whereas FP16 supports a much smaller one. In this work, we use exponent range statistics of the input tensor elements to select which Tensor Cores to use for the GEMM. We evaluate our method on Random Circuit Sampling (RCS), including Sycamore's quantum circuit, and show that it achieves up to 1.86 times higher throughput while maintaining accuracy.
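The pipeline described above can be sketched on the CPU in NumPy. The sketch covers three steps: reshaping a contraction into a GEMM, emulating SGEMM by splitting each FP32 operand into a high and a low part stored in a lower precision, and choosing FP16 or TF32 from the inputs' exponent range. It is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are:

- `float32` matmuls stand in for Tensor Core GEMMs.
- Real-valued tensors replace the complex ones of an actual simulator.
- The helper names (`to_tf32`, `split`, `emulated_sgemm`, `select_format`, `contract`) are hypothetical.
- The 2**11 residual scale and the exponent thresholds in `select_format` are assumed values.

```python
import numpy as np

# NumPy float32 matmuls stand in for Tensor Core GEMMs (inputs holding
# FP16/TF32-representable values, FP32 accumulation). Helper names and
# thresholds below are hypothetical, chosen for illustration only.

FP16_RESIDUAL_SCALE = np.float32(2.0 ** 11)  # assumed scale for the FP16 low part


def to_tf32(x):
    """Round to TF32 (10 explicit mantissa bits) with round-to-nearest-even."""
    u = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    lsb = (u >> np.uint32(13)) & np.uint32(1)  # last mantissa bit that is kept
    u = (u + np.uint32(0x0FFF) + lsb) & np.uint32(0xFFFFE000)
    return u.view(np.float32)


def split(x, fmt):
    """Split FP32 x into (hi, lo, scale) with x ~= hi + lo / scale."""
    x = np.asarray(x, dtype=np.float32)
    if fmt == "fp16":
        hi = x.astype(np.float16).astype(np.float32)
        # Scaling the residual keeps it out of FP16's underflow range.
        lo = ((x - hi) * FP16_RESIDUAL_SCALE).astype(np.float16).astype(np.float32)
        return hi, lo, FP16_RESIDUAL_SCALE
    hi = to_tf32(x)
    lo = to_tf32(x - hi)  # TF32 shares FP32's exponent range, so no scaling
    return hi, lo, np.float32(1.0)


def emulated_sgemm(a, b, fmt):
    """Three low-precision GEMMs; the lo*lo term is dropped."""
    a_hi, a_lo, sa = split(a, fmt)
    b_hi, b_lo, sb = split(b, fmt)
    correction = (a_lo @ b_hi) / sa + (a_hi @ b_lo) / sb
    return a_hi @ b_hi + correction


def select_format(*mats, min_exp=-14, max_exp=14):
    """Hypothetical rule: FP16 only if every nonzero |element| lies in
    [2**min_exp, 2**(max_exp + 1)); otherwise fall back to TF32."""
    for m in mats:
        nz = np.abs(m[m != 0])
        if nz.size == 0:
            continue
        _, e = np.frexp(nz)  # |m| = f * 2**e with 0.5 <= f < 1
        if e.min() - 1 < min_exp or e.max() - 1 > max_exp:
            return "tf32"
    return "fp16"


def contract(x, y, fmt=None):
    """Contract the last axis of x with the first axis of y as one GEMM."""
    k = x.shape[-1]
    a = x.reshape(-1, k)  # (prod(x.shape[:-1]), k)
    b = y.reshape(k, -1)  # (k, prod(y.shape[1:]))
    fmt = fmt or select_format(a, b)
    c = emulated_sgemm(a, b, fmt)
    return c.reshape(x.shape[:-1] + y.shape[1:]), fmt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 16)).astype(np.float32)
    y = rng.standard_normal((16, 8)).astype(np.float32)
    ref = np.einsum("ijk,kl->ijl", x.astype(np.float64), y.astype(np.float64))
    out, fmt = contract(x, y)
    print(fmt, np.abs(out - ref).max())
```

Dropping the lo·lo product keeps the cost at three GEMMs instead of four. Each low part is roughly 2^-11 of its high part, so that product falls below FP32 resolution. This emulation rounds to nearest everywhere. It therefore does not model the rounding behaviour inside Tensor Cores that the "special care for the rounding mode" above refers to.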

Notes

  2. The library itself provides optional functionality to restore the scaled input matrices for general-purpose use.

This work was partially supported by JSPS KAKENHI 22H03598, 21J14694, and 20K03766. This work was also partially supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures" in Japan (Project ID: jh220022-NAHI).

Author information

Corresponding author

Correspondence to Hiroyuki Ootomo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ootomo, H., Manabe, H., Harada, K., Yokota, R. (2023). Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32040-8

  • Online ISBN: 978-3-031-32041-5

  • eBook Packages: Computer Science, Computer Science (R0)
