1 Introduction

The semiconductor industry is struggling to address the next generations of complex computing problems, which require a large number of transistors on a chip and therefore demand a very high transistor density with transistor dimensions of a few nanometers. As more and more transistors are placed on a semiconductor chip, heat dissipation is becoming increasingly difficult to handle. Bennett (1982) discussed the thermodynamics of computation to explain how a reversible approach can reduce the power dissipation in circuits. In the reversible approach, no information is lost because there is a one-to-one mapping from the input to the output. Unlike in irreversible circuits, in reversible circuits the inputs can be recovered from the information available at the output. One of the important ways to achieve reversible computing is through quantum computing (QC), owing to its inherent ability to process data faster than conventional irreversible classical computing.

Along with QC (Nielsen and Chuang 2000; Saharia et al. 2019), artificial intelligence (AI) (Brachman and Henig 1988) and machine learning (ML) (Elfadel et al. 2019) are future areas in the Very Large Scale Integration (VLSI) domain. The design of QC algorithms and the quantum circuits for realizing reversible computing need extensive efforts compared to their realization based on classical transistor-based reversible computing. Moreover, it is very challenging to realize the quantum circuits for reversible computing physically. There are myriad ways to realize reversible computing physically, such as transistors (Bruce et al. 2002), Quantum Dot (QD) (Taylor et al. 2007), SuperConducting (SC) (Strauch et al. 2003; DiCarlo et al. 2009), Ion Traps (IT) (Wineland et al. 1998), Rydberg atoms (Saffman and Walker 2005), Linear Photonics (LP) (Franson et al. 2004), Non-Linear Photonics (NP) (Munro et al. 2005), and Spin-Torque (ST) (Cordourier-Maruri et al. 2010; Sutton and Datta 2015; Kulkarni et al. 2018a; Sharma and Tulapurkar 2021; Bhat et al. 2022; Niknam et al. 2022; Selberherr and Sverdlov 2022; Komineas 2023; Glavin et al. 2022; Choi et al. 2022).

The quantum full adder (QFA) is one of the building blocks of reversible Boolean computing. The QFA differs from the conventional full adder in the number of inputs and outputs (Peres 1985; Swaminathan et al. 1990; Maimistov 1009; D. V., Alexis, B. Desoete, A. Adamski, P. Pietrzak, M. Sibiński, and T. Widerski. 2000; Cheng and Tseng 2002; Thapliyal and Srinivas 2005; Navi et al. 2010; Qi et al. 2012; Abedi et al. 2015; Seyedi and Navimipour 2018, 2021; Doshanlou et al. 2022; Lanthaler et al. 2023). Therefore, it is necessary to design and implement the QFA efficiently to reduce the complexity of reversible Boolean computing. In this paper, two different 1-Toffoli gate-based QFAs are decomposed, optimized, and simulated for the spin-torque-based qubit architecture. The contribution is a reduction in the number of elementary operations, which improves the fidelity of the 1-Toffoli gate-based QFAs compared to the 2-Toffoli gate-based QFAs presented in the literature.

This paper consists of five sections, including this introduction. Section 2 highlights the various QFA designs and presents the conventional, reduced, and optimal decompositions of the 1-Toffoli gate-based QFAs. Section 3 presents the spin-torque-based n-qubit architecture. Section 4 investigates the post-optimization performance of the 1-Toffoli QFAs in terms of fidelity, execution time, and the number of electrons required for the realization. Finally, Section 5 concludes with a comparison of the 1-Toffoli gate-based QFAs.

2 Optimization of quantum full adders

2.1 State of the art

Various QC-based QFAs are available in the literature. In Cheng and Tseng (2002), the 1-bit quantum full adder is designed according to the classical truth table and used to realize the n-bit adder. A recursive method of hand synthesis of the reversible quaternary full-adder circuit is proposed in Khan (2008). A quantum realization of a ternary full adder using macro-level ternary Feynman and Toffoli gates is presented in Khan and Perkowski (2007). Muthukrishnan-Stroud, Feynman, Toffoli, and C2 NOT gates are used to design a QFA in Asadi et al. (2020). Moreover, a 2-Toffoli gate-based QFA design is presented in Golubitsky and Maslov (2012). However, these QFA designs have a high quantum cost when realized with the various physical machine descriptions. For the physical realization of the QFA, the quantum cost must be minimal for the particular physical machine description.

2.2 1-Toffoli QFAs

Recently, the 1-Toffoli gate-based QFA1 (Pujar et al. 2019) and QFA2 (Gayathri et al. 2021) were designed with minimum cost. QFA1 is designed with three controlled-NOT (CNOT) gates and one Fredkin gate and is implemented with 40 transistors. However, this implementation is impractical in the era of miniaturization when the 40-transistor design is used as a building block for complex reversible computing. Moreover, although the QFA1 design restores reversibility, its two unutilized garbage outputs make it challenging to design complex computing circuits in which the QFA is a building block. The 40-transistor implementation of the reversible adder also lacks reconfigurability. Reconfigurable spin-qubit architectures on the order of nanometers are discussed in Sutton and Datta (2015), Kulkarni et al. (2018a), and Sharma and Tulapurkar (2020). The optimal decomposition of the Toffoli gate was utilized to implement a 2-Toffoli gate-based QFA in our previous work (Kulkarni et al. 2018b). It is generally more feasible to realize quantum-circuit-based reversible circuits on a quantum platform than on a classical platform, owing to the high speed and parallel processing that a quantum platform provides. Moreover, as no physical machine description supports the direct implementation of the Fredkin gate, it needs to be decomposed further. The decomposition of the Fredkin gate into two CNOT gates and one Toffoli gate is presented in Lin et al. (2013).
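The Fredkin decomposition into two CNOT gates and one Toffoli gate can be checked on classical basis states. The following is a minimal sketch, assuming the common CNOT-Toffoli-CNOT ordering; the function names and the bit ordering (control, a, b) are illustrative choices and are not taken from Lin et al. (2013).

```python
from itertools import product


def cnot(bits, control, target):
    """Flip bits[target] when bits[control] is 1."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)


def toffoli(bits, c1, c2, target):
    """Flip bits[target] when both control bits are 1."""
    out = list(bits)
    out[target] ^= out[c1] & out[c2]
    return tuple(out)


def fredkin(bits, control, a, b):
    """Reference controlled-SWAP: swap bits a and b when the control is 1."""
    out = list(bits)
    if out[control]:
        out[a], out[b] = out[b], out[a]
    return tuple(out)


def fredkin_decomposed(bits, control, a, b):
    """Assumed decomposition: CNOT(b->a), Toffoli(control, a -> b), CNOT(b->a)."""
    bits = cnot(bits, b, a)
    bits = toffoli(bits, control, a, b)
    bits = cnot(bits, b, a)
    return bits


# Compare both versions on all eight classical inputs (control, a, b).
inputs = list(product((0, 1), repeat=3))
matches = all(fredkin(x, 0, 1, 2) == fredkin_decomposed(x, 0, 1, 2) for x in inputs)
outputs = {fredkin_decomposed(x, 0, 1, 2) for x in inputs}
print("decomposition matches Fredkin:", matches)
print("mapping is one-to-one:", len(outputs) == len(inputs))
```

Replacing the Fredkin gate in this way adds two CNOT gates and one Toffoli gate to the three CNOT gates already present in QFA1, which is consistent with the five CNOT gates and one Toffoli gate of the modified design.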

In this paper, QFA1 is modified using the optimal decomposition of the Toffoli gate presented in Kulkarni et al. (2018b) (see Fig. 1), so that it can be decomposed further with the quantum library {Ry, Rz, \(\sqrt{SWAP}\)} for the spin-torque-based n-qubit architecture. After this modification, QFA1 has five CNOT gates and one Toffoli gate. In this paper, QFA1 is presented as a building block for the 4-bit reversible ripple carry adder (see Fig. 2). Unlike the design presented in Pujar et al. (2019), the QFA2 design (Gayathri et al. 2021) does not produce any garbage output and has the inputs ‘A’ and ‘B’ directly available at the output (see Fig. 3). However, as shown in Fig. 3, the design has one extra CNOT gate in comparison to the modified design of QFA1. As reported in Lin et al. (2013), no physical machine description other than LP and NP offers direct implementation of the CNOT gate. As a result, the decomposition of the extra CNOT gate adds to the quantum cost of QFA2 at the elementary level for the other physical machine descriptions, including ST.

Fig. 1 Modification of QFA1 for the spin-torque-based n-qubit architecture

Fig. 2 4-bit reversible ripple carry adder based on QFA1

Fig. 3 Modification of QFA2 for the spin-torque-based n-qubit architecture

It is essential to consider the physical realization at the elementary level when designing quantum circuits. Moreover, the QFA2 design was used to design the 10-qubit (3-bit) ripple carry adder before its performance was investigated with any of the physical realizations (Bruce et al. 2002; Taylor et al. 2007; Strauch et al. 2003; DiCarlo et al. 2009; Wineland et al. 1998; Saffman and Walker 2005; Franson et al. 2004; Munro et al. 2005; Cordourier-Maruri et al. 2010; Sutton and Datta 2015). Therefore, in this paper, the optimal decomposition of the Toffoli gate (Shende and Markov 2009) is utilized for the performance investigation of QFA1 and QFA2 on the spin-torque-based n-qubit architecture. The number of operations required for the elementary quantum gates (Kulkarni et al. 2018a) is given in Table 1. In our earlier work (Kulkarni et al. 2018a), we utilized the quantum library {\(R_{y}\), \(R_{z}\), \(\sqrt{SWAP}\)} to reduce the number of operations needed to realize the universal CNOT gate from 11 to 7, a reduction of about 36%. In the spin-torque-based qubit architecture, the two-qubit entanglement operation is carried out by the same procedure, in terms of the number of electrons required, apart from the gate barriers. Therefore, the two-qubit entanglement is counted as part of the quantum cost along with the single-qubit rotations.
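As a small illustration of the two-qubit element of the library, the sketch below uses the standard textbook matrix of \(\sqrt{SWAP}\) (an assumption; the spin-torque realization in Kulkarni et al. 2018a may differ by phases or basis ordering). It checks that two applications give SWAP, that the gate entangles the state \(|01\rangle\), and that a reduction from 11 to 7 operations is about 36%.

```python
import numpy as np

# Standard SWAP and sqrt(SWAP) matrices in the basis |00>, |01>, |10>, |11>.
SWAP = np.array(
    [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]], dtype=complex)

SQRT_SWAP = np.array(
    [[1, 0, 0, 0],
     [0, (1 + 1j) / 2, (1 - 1j) / 2, 0],
     [0, (1 - 1j) / 2, (1 + 1j) / 2, 0],
     [0, 0, 0, 1]], dtype=complex)

# Applying sqrt(SWAP) twice should reproduce SWAP.
squares_to_swap = np.allclose(SQRT_SWAP @ SQRT_SWAP, SWAP)

# Acting on |01> gives an equal-weight superposition of |01> and |10>.
ket_01 = np.array([0, 1, 0, 0], dtype=complex)
probabilities = np.abs(SQRT_SWAP @ ket_01) ** 2

# Relative saving when the CNOT gate needs 7 operations instead of 11.
cnot_reduction_percent = 100.0 * (11 - 7) / 11

print("sqrt(SWAP)^2 == SWAP:", squares_to_swap)
print("outcome probabilities for input |01>:", probabilities.round(3))
print(f"CNOT operation reduction: {cnot_reduction_percent:.1f}%")
```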

Table 1 Elementary gates operations (Kulkarni et al. 2018a)

The optimized decompositions of the Toffoli gate and the Fredkin gate utilized for QFA1 are presented in Fig. 4a, b, respectively. The optimal decompositions of the 1-Toffoli gate-based QFA1 and QFA2 into single-qubit and two-qubit rotations, using the elementary quantum rotations library {Ry, Rz, \(\sqrt{SWAP}\)} (Glavin et al. 2022), are presented in Fig. 5a, b. The optimization at the elementary level merges consecutive single-qubit rotations that share the same axis of rotation. Moreover, optimization helps to remove redundant operations. A set of optimization rules is utilized to carry out the optimization process. The optimized version of the QFA is obtained from the conventional and reduced decompositions in our previous work (Kulkarni et al. 2018b). Moreover, we demonstrated in our previous work (Kulkarni et al. 2018b) that the optimal decomposition always provides improved fidelity and reduced execution time in comparison to its conventional and reduced counterparts. The execution time is based on the simulation of the quantum gates.
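The merging of consecutive same-axis rotations can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard conventions \(R_y(\theta) = e^{-i\theta\sigma_y/2}\) and \(R_z(\theta) = e^{-i\theta\sigma_z/2}\), the helper names (`merge_rotations`, `unitary`) are hypothetical, and the gate sequence is invented for the example rather than taken from the QFA circuits.

```python
import numpy as np


def ry(theta):
    """Single-qubit rotation about the y-axis (standard convention)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(theta):
    """Single-qubit rotation about the z-axis (standard convention)."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]], dtype=complex)


def merge_rotations(sequence, tol=1e-12):
    """Merge adjacent same-axis rotations and drop those with zero net angle.

    `sequence` is a list of (axis, angle) pairs in order of application.
    """
    merged = []
    for axis, angle in sequence:
        if merged and merged[-1][0] == axis:
            total = merged.pop()[1] + angle
            if abs(total) > tol:
                merged.append((axis, total))
        elif abs(angle) > tol:
            merged.append((axis, angle))
    return merged


def unitary(sequence):
    """Multiply out a rotation sequence (first element is applied first)."""
    u = np.eye(2, dtype=complex)
    for axis, angle in sequence:
        u = (ry(angle) if axis == "y" else rz(angle)) @ u
    return u


# Illustrative sequence: five rotations on one qubit.
original = [("y", np.pi / 4), ("y", np.pi / 4),
            ("z", np.pi / 2), ("z", -np.pi / 2),
            ("y", np.pi / 2)]
optimized = merge_rotations(original)

print("operations before:", len(original))
print("operations after: ", len(optimized))
print("same unitary:", np.allclose(unitary(original), unitary(optimized)))
```

In this example the two z-rotations cancel, which brings the y-rotations on either side next to each other so that they merge as well.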

Fig. 4 Optimized decomposition of (a) Toffoli and (b) Fredkin gates for QFA1

Fig. 5 Optimal decomposition of (a) QFA1 and (b) QFA2

The elementary-level optimization relative to the reduced counterparts is 5.81% and 7.69% for QFA1 and QFA2, respectively, and depends mainly on the design of the quantum circuits for the QFAs.
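These percentages can be reproduced from the quantum costs reported in Section 4 (86 and 81 operations for QFA1, 91 and 84 for QFA2, before and after optimization). The short sketch below assumes the optimization is measured as the relative reduction in the number of elementary operations; the helper name is illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction in the number of elementary operations, in percent."""
    return 100.0 * (before - after) / before


# Quantum costs (before, after) optimization as reported in Section 4.
costs = {"QFA1": (86, 81), "QFA2": (91, 84)}

reductions = {name: percent_reduction(b, a) for name, (b, a) in costs.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}% fewer elementary operations")
```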

3 Reconfigurable ST-based n-qubit architecture

The ST qubit architecture shown in Fig. 6 is discussed in Sutton and Datta (2015). The architecture is discussed further for quantum circuits based on y-axis-directed spin-polarized electrons, for the W state (Sharma and Tulapurkar 2021) and for the CNOT gate (Kulkarni et al. 2018a). It is a reconfigurable architecture in which moving electrons interact with the static qubits to rotate each qubit by a specific angle. The optimal QFA is suitable only for the spin-torque-based qubit architecture, because the decomposition of the universal CNOT gate used in the optimal QFA is based on this architecture. QFA1 will suit any other architecture if the decomposition of the CNOT gate, in particular the entanglement operation, matches the entanglement operation of the spin-torque-based qubit architecture.

Fig. 6 Spin-torque-based n-qubit reconfigurable architecture

The transmission coefficient matrix (Kulkarni et al. 2018a) given in Eq. (1) is used to investigate the performance of the quantum gates. The second-order transmission coefficient matrix is:

$$ t^{(2)} = \frac{1}{(1 + i4\Omega l)\, I + i\tilde{S}\left( \Omega - i4\Omega^{2} l \left( e^{i2kx_{0}} - 1 \right) \right)} $$
(1)

where \(\tilde{S}\) is the standard basis matrix, \(l = \Gamma_{Rfl}/J\), \(\Omega = J/(\hbar v)\), \(v\) is the velocity of the electron, \(\Gamma_{Rfl}\) is the height of the reflection barrier G, \(J\) is the hyperfine or exchange interaction, \(x_{0}\) is the distance of the qubit from the barrier, and \(\hbar\) is the reduced Planck constant.

The reflection matrix is

$$ r^{(2)} = t^{(2)} - I $$
(2)

where \(I\) is the identity matrix.
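Equations (1) and (2) can be evaluated numerically as sketched below for one electron interacting with one qubit. Several ingredients are assumptions made only for illustration: \(\tilde{S}\) is taken to be the Heisenberg-type exchange matrix \(\sigma_x \otimes \sigma_x + \sigma_y \otimes \sigma_y + \sigma_z \otimes \sigma_z\), the fraction in Eq. (1) is read as a matrix inverse, and the values of \(\Omega\), \(l\), \(k\), and \(x_0\) are arbitrary dimensionless placeholders rather than values from Kulkarni et al. (2018a).

```python
import numpy as np

# Pauli matrices.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def exchange_matrix():
    """Assumed form of S-tilde: electron-qubit exchange, sum of sigma_i (x) sigma_i."""
    return sum(np.kron(s, s) for s in (SX, SY, SZ))


def transmission(S, omega, l, k, x0):
    """Second-order transmission matrix of Eq. (1), read as a matrix inverse."""
    identity = np.eye(S.shape[0], dtype=complex)
    phase = np.exp(2j * k * x0)
    denominator = ((1 + 4j * omega * l) * identity
                   + 1j * S * (omega - 1j * 4 * omega**2 * l * (phase - 1)))
    return np.linalg.inv(denominator)


def reflection(t):
    """Reflection matrix of Eq. (2): r = t - I."""
    return t - np.eye(t.shape[0], dtype=complex)


# Placeholder (hypothetical) parameter values.
S = exchange_matrix()
t2 = transmission(S, omega=0.1, l=1.0, k=1.0, x0=0.5)
r2 = reflection(t2)

# Sanity check: with no interaction (omega = 0) the electron is fully transmitted.
t_free = transmission(S, omega=0.0, l=1.0, k=1.0, x0=0.5)
print("t(omega=0) is the identity:", np.allclose(t_free, np.eye(4)))
print("r(omega=0) vanishes:", np.allclose(reflection(t_free), 0))
```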

The generalized reflection matrix \(R_{F}^{(2)}\) based on Eqs. (1) and (2) is provided in Eq. (3) as follows:

$$ R_{F}^{(2)} = r^{(2)} - e^{{i2kx_{0} }} t^{(2)} \left[ {I_{{2^{n + 1} }} + e^{{i2kx_{0} }} I_{{2^{n + 1} }} R_{0} } \right]^{ - 1} I_{{2^{n + 1} }} t^{(2)} $$
(3)

where the matrix \(R_{0}\) represents the reflection barrier and \(I_{2^{n+1}}\) is the identity matrix for the n-qubit architecture.

The reflection matrices for the two-qubit interaction on qubits Q1 and Q2 are given in Eqs. (4) and (5), respectively, and the overall reflection matrix at the injection-side barrier is given in Eq. (6).

$$ R_{F_{1}}^{(2)} = r_{1}^{(2)} - e^{i2kx_{0}}\, t_{1}^{(2)} \left[ I_{2^{n+1}} + e^{i2kx_{0}}\, I_{2^{n+1}} R_{0} \right]^{-1} I_{2^{n+1}}\, t_{1}^{(2)} $$
(4)
$$ R_{F_{2}}^{(2)} = r_{2}^{(2)} - e^{i2kx_{0}}\, t_{2}^{(2)} \left[ I_{2^{n+1}} + e^{i2kx_{0}}\, I_{2^{n+1}} R_{F_{1}}^{(2)} \right]^{-1} I_{2^{n+1}}\, t_{2}^{(2)} $$
(5)
$$ R_{F_{b}}^{(2)} = r_{b}^{(2)} - e^{i2kx_{12}}\, t_{b}^{(2)} \left[ I_{2^{n+1}} + e^{i2kx_{12}}\, I_{2^{n+1}} R_{F_{2}}^{(2)} \right]^{-1} I_{2^{n+1}}\, t_{b}^{(2)} $$
(6)

The subscripts 1 and 2 denote qubits Q1 and Q2, respectively, and the superscript (2) denotes the second-order matrix.
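The structure of Eqs. (4) to (6) is a recursion in which each stage takes the reflection matrix of the previous stage as its input. The sketch below transcribes that recursion directly. It does not model the physics: the per-stage matrices \(r\) and \(t\), the barrier matrix \(R_0\), and the phase values are stand-ins chosen only so that the code runs, and the function names are hypothetical.

```python
import numpy as np


def generalized_reflection(r, t, R_prev, phase):
    """One stage of Eqs. (3)-(6): R_F = r - phase * t [I + phase * I R_prev]^(-1) I t."""
    identity = np.eye(r.shape[0], dtype=complex)
    inner = np.linalg.inv(identity + phase * identity @ R_prev)
    return r - phase * t @ inner @ identity @ t


def cascade(stages, R0):
    """Apply the stages in order; each stage is a tuple (r, t, phase)."""
    R = R0
    for r, t, phase in stages:
        R = generalized_reflection(r, t, R, phase)
    return R


# Stand-in inputs for n = 2 qubits plus one electron: dimension 2^(n+1) = 8.
dim = 2 ** (2 + 1)
identity = np.eye(dim, dtype=complex)
t_stub = 0.8 * identity            # hypothetical transmission matrix
r_stub = t_stub - identity         # reflection matrix from Eq. (2)
R0 = -identity                     # hypothetical fully reflecting barrier
phase_x0 = np.exp(2j * 1.0 * 0.3)  # e^{i 2 k x0} with placeholder k, x0
phase_x12 = np.exp(2j * 1.0 * 0.7) # e^{i 2 k x12} with placeholder k, x12

stages = [
    (r_stub, t_stub, phase_x0),    # Eq. (4): qubit Q1
    (r_stub, t_stub, phase_x0),    # Eq. (5): qubit Q2
    (r_stub, t_stub, phase_x12),   # Eq. (6): injection-side barrier
]
R_total = cascade(stages, R0)
print("overall reflection matrix shape:", R_total.shape)
```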

4 Performance investigation of 1-Toffoli gate-based QFAs

The performance of 1-Toffoli gate-based QFAs is investigated through simulations in terms of execution time, fidelity, and the number of electrons required to realize the entire 1-Toffoli gate-based QFA1 and QFA2. The spin-state evolution of four qubits Q1-Q4 for the realization of QFA1 and QFA2 is shown in Fig. 7a, b, respectively.

Fig. 7 Spin qubit evolution for the input state \(|1110\rangle \): (a) QFA1, (b) QFA2

The fidelities for the optimal decomposition of the QFAs are given in Table 2. The improvement in fidelity over the 2-Toffoli QFA is 0.7% for QFA1 and 0.57% for QFA2, so QFA1 has better fidelity than QFA2. QFA2 requires 9.97% more execution time and 5% more electrons than QFA1, owing to its one extra CNOT gate. The quantum costs of the conventional, reduced, and optimal decompositions are presented in Fig. 8.

Table 2 Fidelity comparison (for second-order optimal)

Fig. 8 Number of elementary operations for conventional, reduced, and optimal decomposition of QFAs

The quantum cost, as defined above, for QFA1 before and after the optimization is 86 and 81, respectively. Similarly, the quantum cost for QFA2 before and after the optimization is 91 and 84, respectively. The evolution of the quantum gate states from the initial state to the final state is represented by the density matrix, which has two components: real and imaginary (Imag).

For a single qubit, the density matrix is a 2 × 2 matrix that is a linear combination of the identity matrix and the Pauli matrices σx, σy, and σz:

$$ \rho = \frac{1}{2}\left( I + \vec{a} \cdot \vec{\sigma} \right) $$
(7)

The coefficient \(\overrightarrow{a}\) is known as the Bloch vector and is equal to

$$ \vec{a} = Tr\left( {\rho \vec{\sigma }} \right) $$
(8)
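Equations (7) and (8) can be checked numerically for a single qubit. The sketch below is illustrative: the function names are hypothetical, and the example state \(|1\rangle\) is chosen only because its Bloch vector, (0, 0, -1), is easy to verify by hand.

```python
import numpy as np

# Pauli matrices stacked as a (3, 2, 2) array: sigma_x, sigma_y, sigma_z.
PAULI = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)


def density_from_bloch(a):
    """Eq. (7): rho = (I + a . sigma) / 2."""
    a = np.asarray(a, dtype=complex)
    return 0.5 * (np.eye(2, dtype=complex) + np.einsum("i,ijk->jk", a, PAULI))


def bloch_from_density(rho):
    """Eq. (8): a_i = Tr(rho sigma_i); real for a Hermitian rho."""
    return np.real(np.einsum("jk,ikj->i", rho, PAULI))


# Density matrix of the basis state |1>.
ket_1 = np.array([0, 1], dtype=complex)
rho_1 = np.outer(ket_1, ket_1.conj())

bloch_1 = bloch_from_density(rho_1)
print("Bloch vector of |1>:", bloch_1)
print("real part of rho:\n", rho_1.real)
print("imaginary part of rho:\n", rho_1.imag)
print("round trip recovers rho:", np.allclose(density_from_bloch(bloch_1), rho_1))
```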

The density matrices for the input state \(|1110\rangle \) and for the corresponding output states of QFA1 and QFA2 are presented in Fig. 9.

Fig. 9 (a) Input density matrix for the state \(|1110\rangle \), (b) output density matrix for the input state \(|1110\rangle \) (QFA1), (c) output density matrix for the input state \(|1110\rangle \) (QFA2)

5 Conclusion

In this work, the optimization of 1-Toffoli gate-based QFAs at the elementary quantum circuit level and their realization on the ST-based qubit architecture are presented. The achieved optimization is 5.81% and 7.69% for QFA1 and QFA2, respectively, compared to their reduced counterparts. Moreover, the average fidelities for QFA1 and QFA2 are ~ 98.34% and ~ 98.21%, respectively. The improvement in fidelity over the 2-Toffoli QFA is 0.7% and 0.57% for QFA1 and QFA2, respectively. The execution time and the number of electrons required to realize QFA2 are higher than those for QFA1 because of one extra CNOT gate.