Introduction

Deep neural networks (DNNs) have made great progress in recent years. They achieve excellent performance in image recognition, speech recognition, machine translation, and related fields, and have been widely applied in artificial intelligence. However, as tasks become more complex, the demand for computing power keeps growing. Traditional computing devices based on the von Neumann architecture suffer from the memory wall problem caused by the separation of computing units and storage units, which hinders further improvement of their computing capability. Memristors enable in-memory and parallel computing, which accelerates the operation of DNNs. Meanwhile, the plasticity of a memristor is very similar to that of a biological synapse. Memristors also offer low power consumption and nanoscale size, and they are compatible with complementary metal-oxide-semiconductor (CMOS) technology. Therefore, memristors are promising elements for building new computing architectures.

The existence of the memristor was theoretically predicted in 1971 [6]. Since the first memristor device was fabricated in 2008 [31], research on memristor-based neural networks has developed rapidly. Memristors laid out in crossbar form have low power consumption and high density, and can perform vector-matrix multiplication in parallel. Various memristor-based neural networks have been proposed, such as single-layer and multi-layer neural networks (SNNs, MNNs) [3, 4, 7, 10,11,12, 19, 27, 30, 41, 42, 45, 49, 50], convolutional neural networks (CNNs) [8, 21, 26, 36, 39, 40, 46, 49], Pavlov associative memory networks [5, 24, 25, 34, 43, 52], long short-term memory networks (LSTMs) [1, 2, 9, 20, 23, 28, 29, 35], pulse coupled neural networks (PCNNs) [38, 53], and hierarchical temporal memory (HTM) [13, 14, 22, 33, 54].

A memristor bridge synapse-based neural network is proposed in [3]. The memristor bridge synapse achieves positive or negative synaptic weight values using four memristors, and a modified chip-in-the-loop learning scheme is put forward to train the network. In [4], a memristor-based SNN is presented and trained with ex-situ and in-situ methods; the signed weight value is achieved by subtracting the conductance of one memristor from that of another. The results show that memristor-based networks are promising implementations of neuromorphic computing systems. In [10], a memristor crossbar-based neural network with on-chip back propagation (BP) training is presented. Memristor-based multi-layer neural networks with online gradient descent training are presented in [30]; each synapse consists of one memristor and two CMOS transistors. Compared with CMOS-based counterparts, the memristor-based MNNs in [30] consume between 2% and 8% of the area and static power. In [48], a sign backpropagation (SBP) method is proposed to train resistive random access memory (RRAM)-based neural networks. In [50], a circuit design for memristor-based MNNs is presented and a modified BP algorithm is adopted to train the networks. In [19], the in-situ learning capability of an MNN based on hafnium oxide memristor crossbars is experimentally demonstrated. In [49], memristor-based quantized neural networks are presented, where the weights are quantized to accelerate the operation of the neural networks.

Besides memristor-based MNNs, there are also works on memristor-based CNNs. In [39], a memristor-based CNN is presented, which is the first memristor-based circuit implementation of a CNN; one memristor crossbar represents all groups of convolution kernels in one convolutional layer and performs the convolutional operation. A highly parallel implementation of a memristor crossbar-based CNN is presented in [40]. It uses a very sparse crossbar that replicates the convolution kernels to implement the convolutional operation, and one feature map is convolved at a time. In [8], convolutional layers are mapped to resistive cross-point arrays, and the impacts of noise and bound limitations on the performance of the CNN are analyzed. In [36], a memristor-based fully convolutional network (MFCN) is put forward for semantic segmentation tasks. A fully hardware-implemented memristor-based CNN is presented in [46], which reports high-yield, high-performance, and uniform memristor crossbars and proposes an effective hybrid-training method that adapts to device imperfections.

In this paper, memristor crossbar architectures for implementing DNNs are presented; each synapse circuit contains few elements, and many activation functions can be approximately realized. In the crossbars, signed weight values are realized as the difference between the conductance of a memristor and that of a reference resistor [32]. Nonlinear activation functions are approximately implemented by circuits. A memristor-based MNN (MMNN) and a memristor-based CNN (MCNN) are built from the presented crossbars, which also substantiates the effectiveness of the crossbars. The networks are trained by two kinds of in-situ update schemes, namely the fixed-voltage update and the approximately linear update, and the performance of the MMNN and the MCNN trained by the two schemes is analyzed. The robustness of the two networks to conductance variations, which are caused by the inherent characteristics of memristors and by errors in the programming voltages, is also explored.

The rest of the paper is organized as follows. Section “Memristor crossbar architectures” introduces the memristor model and the memristor crossbar architectures designed for fully connected (FC) layer, convolutional operation, and average pooling operation. Section “Operation of the memristor-based DNNs” introduces the operation of the DNNs built by the crossbars. Simulations and analyses are conducted in Section “Simulations and analysis”. Section “Conclusions” concludes the paper.

Memristor crossbar architectures

In this section, memristor crossbar architectures for DNNs are presented. Memristor crossbars perform vector-matrix multiplications, which are computationally expensive operations in neural networks, in parallel through Kirchhoff’s law. This section first introduces the memristor model and then presents memristor crossbars for the FC layer, the convolutional operation, and the average pooling operation. These memristor-based crossbars can be used to build DNNs.

Memristor model

A memristor model describes the behavior of a realistic memristor in mathematical form and can be used to explore the characteristics of the memristor. It can also be adopted in simulations to speed up system design. The HP model is [31]

$$\begin{aligned}&v(t)=i(t)R(t), \end{aligned}$$
(1)
$$\begin{aligned}&R(t)=R_{on}x(t)+R_{off}\left( 1-x(t)\right) , \end{aligned}$$
(2)

where R(t) is the resistance of the memristor, \(x\left( t\right) \) is the state variable, \(R_{on}\) and \(R_{off}\) are the internal low and high resistance of the memristor, respectively, and v(t) and i(t) are the voltage and the current, respectively. Furthermore,

$$\begin{aligned}&x(t)=\frac{w\left( t\right) }{D}, \end{aligned}$$
(3)
$$\begin{aligned}&\frac{\mathrm{{d}}w\left( t\right) }{\mathrm{{d}}t}=\mu _v\frac{R_{on}}{D}i\left( t\right) f\left( x\left( t\right) \right) , \end{aligned}$$
(4)

where \(w\left( t\right) \) is the internal state variable, D is the thickness, and \(\mu _v\) is the average ion mobility. D and \(\mu _v\) are constants.

The HP model cannot precisely capture the characteristics of many realistic memristors; therefore, various memristor models have been put forward to describe the behaviors of different memristors [17, 18, 37, 51]. A voltage-controlled threshold model [51] that can fit realistic memristors is adopted in this paper:

$$\begin{aligned} \frac{\mathrm{{d}}x\left( t\right) }{\mathrm{{d}}t}= {\left\{ \begin{array}{ll} \mu _v\frac{R_{on}}{D^2}\frac{i_{off}}{i(t)-i_0}f\left( x(t)\right) , &{}0<V_{on}<v(t) \\ 0, &{}V_{off}\le v(t)\le V_{on} \\ \mu _v\frac{R_{on}}{D^2}\frac{i(t)}{i_{on}}f\left( x(t)\right) , &{}v(t)<V_{off}<0 \end{array}\right. } \end{aligned}$$
(5)

where \(i_0\), \(i_{on}\), and \(i_{off}\) are constants, and \(f\left( x\left( t\right) \right) \) is the window function which is defined as

$$\begin{aligned} f\left( x\left( t\right) \right) =1-\left( 2x\left( t\right) -1\right) ^2. \end{aligned}$$
(6)
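
For intuition, the following minimal Python sketch integrates the threshold model (5) with the window function (6) using forward Euler. It is an illustration added for this presentation, not the circuit-level simulation used in the paper, and the parameter values (\(R_{on}\), \(R_{off}\), the thresholds, and the fitting constants) are placeholders rather than fitted values.

```python
import numpy as np

# Illustrative (not fitted) parameters for the threshold memristor model (5)-(6).
R_on, R_off = 100.0, 16e3            # low / high resistance states (ohm)
D, mu_v = 10e-9, 1e-14               # device thickness (m), ion mobility (m^2 s^-1 V^-1)
V_on, V_off = 0.2, -0.2              # positive / negative threshold voltages (V)
i_0, i_on, i_off = 1e-6, 1e-4, 1e-4  # fitting constants (A), placeholders

def window(x):
    """Window function (6): f(x) = 1 - (2x - 1)^2."""
    return 1.0 - (2.0 * x - 1.0) ** 2

def step(x, v, dt):
    """One forward-Euler step of the state equation (5)."""
    R = R_on * x + R_off * (1.0 - x)   # resistance from (2)
    i = v / R                          # current from (1)
    if v > V_on:                       # above positive threshold: conductance increases
        dxdt = mu_v * R_on / D**2 * i_off / (i - i_0) * window(x)
    elif v < V_off:                    # below negative threshold: conductance decreases
        dxdt = mu_v * R_on / D**2 * i / i_on * window(x)
    else:                              # between thresholds: state unchanged
        dxdt = 0.0
    return float(np.clip(x + dxdt * dt, 0.0, 1.0))

# Example: apply a 1 V writing pulse for 1 us and watch the state variable increase slightly.
x = 0.1
for _ in range(1000):
    x = step(x, 1.0, 1e-9)
print("state after pulse:", x)
```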

Memristor crossbar for FC layer

The FC layer is the basic building block of MNNs and is also an essential part of CNNs. In the FC layer, the inputs are weighted and summed, that is,

$$\begin{aligned} y_j=f\left( \sum ^M_{i=1}W_{ji}x_i\right) , \end{aligned}$$
(7)

where \(x_i\) is the ith input, \(y_j\) is the jth output, \(W_{ji}\) is the weight value between the ith input unit and the jth output unit, M is the number of input units, and \(f(\cdot )\) is the activation function, which could be the binary function, the sigmoid function, the rectified linear unit (ReLU), or the hyperbolic tangent (tanh) function.

Fig. 1
figure 1

The memristor crossbar for the FC layer. In the forward pass, \(V_{x,1}\) to \(V_{x,M}\) are input voltages representing input values of the layer, \(V_{o,1}\) to \(V_{o,N}\) are output voltages representing output values. In the training phase, \(V_{\delta ,1}\) to \(V_{\delta ,N}\) are errors to be back propagated and \(I_{\delta ,1}\) to \(I_{\delta ,M}\), \(I_{\delta b}\) are back propagated errors. TG is the transmission gate, and S and \({{\bar{S}}}\) are signals to switch inference and learning phases

The memristor crossbar for the FC layer is shown in Fig. 1. Through Kirchhoff’s law, the memristor crossbar implements the weighted summation in (7), whose computational complexity is generally \(O(N^2)\), with complexity O(1). In the inference phase, the TGs (transmission gates) in the left column and in the row below the memristor rows are closed, and there is

$$\begin{aligned} V_f=-R_f\left( \sum ^M_{i=1}\frac{V_{x,i}}{R_s}+\frac{V_b}{R_s}\right) . \end{aligned}$$
(8)

\(V_{x,i}=V_r\cdot x_i\), where \(V_r\) is the read voltage and \(x_i\) is the original input value. The current of the jth column is

$$\begin{aligned} I_j=&\sum ^M_{i=1}\frac{V_{x,i}}{R_{i,j}}+\frac{V_b}{R_{M+1,j}}+\frac{V_f}{R_f} \nonumber \\ =&\sum ^M_{i=1}V_{x,i}\cdot \left( G_{i,j}-G_s\right) \nonumber \\&+V_b\cdot \left( G_{M+1,j}-G_s\right) , \end{aligned}$$
(9)

where \(G_s=1/R_s\) and \(G_{i,j}=1/R_{i,j}\ (j=1,2,\dots ,N)\) is the conductance of the memristor in the ith row and the jth column. The output voltage is

$$\begin{aligned} -V'_{o,j}=I_jR_{a}+\frac{V_cR_{a}}{R_c}. \end{aligned}$$
(10)

Through setting different values of \(R_a\) and \(R_c\), various activation functions can be approximately realized. Denote the source voltages of the amplifiers in the dotted box by \(V_s\) and \(V_d\). When the resistance of \(R_a\) is very large, \(V_s=0\), and \(V_d=-1\ \mathrm V\), it can be approximately obtained that

$$\begin{aligned} -V'_{o,j}= {\left\{ \begin{array}{ll} 1, &{}I_j>0 \\ 0. &{}I_j\le 0 \end{array}\right. } \end{aligned}$$
(11)

It is a binary activation function.

Set \(R_{a}={0.25\mathrm {V}}/\left( V_rr_{gw}\right) \) and \(V_cR_a/R_c=0.5\ \mathrm V\), where \(\mathrm V\) denotes volts; then

$$\begin{aligned} -V'_{o,j}=0.25\frac{I_j}{V_rr_{gw}}\mathrm {V}+0.5\mathrm {V}, \end{aligned}$$
(12)

where \(I_j/\left( V_rr_{gw}\right) \) is the numerical value of the output of the jth column and \(r_{gw}\) is the ratio of the conductance value to the weight value. Let \(x=I_j/\left( V_rr_{gw}\right) \), \(y=-V'_{o,j}\), \(V_s=0\ \mathrm V\), and \(V_d=-1\ \mathrm V\). Ignoring the voltage unit, we have

$$\begin{aligned} y= {\left\{ \begin{array}{ll} 1, &{}x>2 \\ 0.25x+0.5, &{}-2\le x\le 2 \\ 0. &{}x<-2 \end{array}\right. } \end{aligned}$$
(13)

This formula is an approximate realization of the sigmoid function [39].

Similarly, set \(R_{a}={1\mathrm {V}}/\left( V_rr_{gw}\right) \), \(V_c=0\), \(V_s=1\ \mathrm V\), and \(V_d=-1\ \mathrm V\); then

$$\begin{aligned} y= {\left\{ \begin{array}{ll} 1, &{}x>1 \\ x, &{}-1\le x\le 1 \\ -1, &{}x<-1 \end{array}\right. } \end{aligned}$$
(14)

It approximately achieves the tanh function [35].

Set \(R_{a}={1\mathrm {V}}/\left( V_rr_{gw}\right) \), \(V_c=0\), \(V_s=0\), and \(V_d=-v_H\), and then

$$\begin{aligned} y= {\left\{ \begin{array}{ll} v_H, &{}x>v_H \\ x, &{}0\le x\le v_H \\ 0. &{}x<0 \end{array}\right. } \end{aligned}$$
(15)

It is an approximate realization of the ReLU function with an upper bound of \(v_H\).

Then

$$\begin{aligned} V_{o,j}=-\frac{R_1}{R_0}V'_{o,j}, \end{aligned}$$
(16)

where the factor \(\frac{R_1}{R_0}\) rescales the amplitude of the output voltage so that it stays within the threshold voltages of the memristors.
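
The mapping in (8)–(16) can be summarized numerically. The sketch below is a behavioral model written for illustration (the function name, the array shapes, and the assumption \(V_b=V_r\) are not taken from the paper): signed weights are stored as conductance offsets from the reference conductance \(G_s\), the column current follows (9), and the activation is the clipped-linear approximation selected by \(R_a\), \(V_c\), and the source voltages.

```python
import numpy as np

def fc_crossbar_forward(x, W, b, V_r=0.1, r_gw=1e-3, act="sigmoid"):
    """Behavioral model of the FC crossbar in Fig. 1.

    x: input vector (dimensionless), W: (N, M) weight matrix, b: (N,) bias,
    V_r: read voltage, r_gw: conductance-per-weight ratio.
    Returns the activated outputs (dimensionless).
    """
    # Eq. (9): G_ij - G_s = W_ji * r_gw, i.e. the conductance offset stores the signed weight.
    V_x = V_r * np.asarray(x)                                  # input voltages
    # Bias voltage assumed equal to the read voltage V_r in this sketch.
    I = r_gw * (np.asarray(W) @ V_x + np.asarray(b) * V_r)     # column currents

    z = I / (V_r * r_gw)                                       # numerical pre-activation (x in (12))
    if act == "binary":                                        # eq. (11)
        return (z > 0).astype(float)
    if act == "sigmoid":                                       # eq. (13): hard sigmoid
        return np.clip(0.25 * z + 0.5, 0.0, 1.0)
    if act == "tanh":                                          # eq. (14): hard tanh
        return np.clip(z, -1.0, 1.0)
    if act == "relu":                                          # eq. (15) with v_H chosen as 1
        return np.clip(z, 0.0, 1.0)
    raise ValueError(act)

# Example: 3 inputs, 2 outputs.
W = np.array([[0.5, -0.2, 0.1],
              [-0.3, 0.4, 0.2]])
b = np.array([0.05, -0.05])
print(fc_crossbar_forward([1.0, 0.5, -1.0], W, b, act="sigmoid"))
```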

Fig. 2
figure 2

The activation function part of the classification layer, which also calculates the error. \(V_{o,j}\) is the jth activated output and \(V_{T,j}\) is the target value for \(V_{o,j}\). \(\varDelta V_j\) is the error between \(V_{o,j}\) and \(V_{T,j}\), and \(\varDelta V_j=V_{o,j}- V_{T,j}\)

For the classification layer, the activation function part of the crossbar is shown in Fig. 2; it also calculates the error between the prediction and the target, which is

$$\begin{aligned} \varDelta V_j=V_{o,j}-V_{T,j}. \end{aligned}$$
(17)

In the training phase, the TGs in the first row and the right column are closed by setting S to low level and \({{\bar{S}}}\) to high level, and the crossbar back propagates errors. \(V_{\delta ,1}\) to \(V_{\delta ,N}\) are the errors to be back propagated, and \(I_{\delta ,1}\) to \(I_{\delta ,M}\) and \(I_{\delta ,b}\) are the back propagated errors.

Memristor crossbar for convolutional operation

The convolutional operation uses several groups of convolution kernels to convolve feature maps, as shown in Fig. 3. The number of kernel groups is equal to the number of output feature maps. The size of each kernel is \(K_1\times K_2\), where \(K_1\) and \(K_2\) are the width and height of the kernel, respectively. The convolutional operation is

$$\begin{aligned} y^p_j=\sum ^M_{i=1}\sum ^{K_1}_{k_1=1}\sum ^{K_2}_{k_2=1}W_{k_1,k_2,i,j}x^p_{k_1,k_2,i}, \end{aligned}$$
(18)
Fig. 3
figure 3

The convolutional operation. There are three input feature maps, two output feature maps, and two groups of convolution kernels. Each kernel group contains three kernels and these kernels convolve three input feature maps and generate one output feature map

where \(y^p_j\ (j=1,2,\dots ,N)\) is the pth value in the jth output feature map, \(x^p_{k_1,k_2,i}\) is the value at the position \((k_1,k_2)\) of the pth receptive field in the ith input feature map, \(W_{k_1,k_2,i,j}\) is the weight value at the position \((k_1,k_2)\) of the ith kernel in the jth kernel group, M is the number of input channels, and N is the number of output channels. Suppose the convolution stride is s and the padding size is P; then the dimension of the output feature map is

$$\begin{aligned} \left( N,\left[ \left( H_1-K_1+2P\right) /s\right] +1,\left[ \left( H_2-K_2+2P\right) /s\right] +1\right) ,\nonumber \\ \end{aligned}$$
(19)

where \(H_1\) and \(H_2\) are the width and the height of the input feature map, respectively, and \(\left[ \cdot \right] \) denotes rounding down to an integer.
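
As a quick check of (19), a small helper (hypothetical, added here for illustration) computes the output shape from the layer hyperparameters; for a \(28\times 28\) input, a \(5\times 5\) kernel, stride 1, and no padding it returns \(24\times 24\).

```python
def conv_output_shape(N, H1, H2, K1, K2, s=1, P=0):
    """Output dimension of a convolutional layer, eq. (19):
    (N, floor((H1 - K1 + 2P)/s) + 1, floor((H2 - K2 + 2P)/s) + 1)."""
    return (N, (H1 - K1 + 2 * P) // s + 1, (H2 - K2 + 2 * P) // s + 1)

print(conv_output_shape(N=6, H1=28, H2=28, K1=5, K2=5, s=1, P=0))  # (6, 24, 24)
```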

Fig. 4
figure 4

The memristor crossbar for convolutional operation. It is considered as a set of sliding windows. Each column contains M convolution kernels, and the number of columns N is the same as the number of output channels. \(V^p_{k,i}\ (k=1,2,\dots ,K_1\times K_2,\ i=1,2,\dots ,M)\) is the kth input value from the pth receptive field in the ith input feature map, and \(K_1\) and \(K_2\) are the width and height of the kernel, respectively. \(V^p_{o,j}\ (j=1,2,\dots ,N)\) represents the pth value of the jth output feature map

There are two methods to implement the convolutional operation with a memristor crossbar. One is to treat a compact memristor crossbar as a set of sliding windows that slide over the input feature maps in turn to obtain the output feature maps [39]. The other is to feed an entire feature map into a sparse crossbar [40], but this method requires many redundant memristors, and it is also challenging to keep the conductance values that represent the same convolution kernel identical. This paper adopts the first method, whose crossbar scale is much smaller.

The memristor crossbar for convolutional operation is shown in Fig. 4. The current of the jth column is

$$\begin{aligned} I^p_j=&\sum ^M_{i=1}\sum ^{K_1\times K_2}_{k=1}\frac{V^p_{k,i}}{R_{k,i,j}}+\frac{V_b}{R_{M\times K_1\times K_2+1,j}}+\frac{V_f}{R_f} \nonumber \\ =&\sum ^M_{i=1}\sum ^{K_1\times K_2}_{k=1}V^p_{k,i}\cdot \left( G_{k,i,j}-G_s\right) \nonumber \\&+V_b\cdot \left( G_{M\times K_1\times K_2+1,j}-G_s\right) , \end{aligned}$$
(20)

where \(G_{k,i,j}\) is the conductance value of the memristor in the jth column that receives \(V^p_{k,i}\) and \(G_{M\times K_1\times K_2+1,j}\) is the conductance of the memristor in the jth column that receives \(V_b\). Then

$$\begin{aligned} V^p_{o,j}=\frac{R_1}{R_0}\left( I^p_jR_{a}+\frac{V_cR_{a}}{R_c}\right) , \end{aligned}$$
(21)

where j is also the index of the jth output feature map. Each column in the crossbar represents one kernel group.
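
Under the sliding-window mapping, each kernel group is flattened into one crossbar column and every receptive field is presented as one input-voltage vector. The sketch below is a software analogy of (20)–(21) under that assumption (the names and scaling constants are illustrative), not the SPICE model.

```python
import numpy as np

def conv_crossbar(inputs, kernels, bias, V_r=0.1, r_gw=1e-3, stride=1):
    """Behavioral model of the convolution crossbar in Fig. 4.

    inputs:  (M, H1, H2) input feature maps (dimensionless values)
    kernels: (N, M, K1, K2) kernel groups; one group maps to one column
    bias:    (N,) bias per output channel
    Returns (N, H1o, H2o) numerical column outputs (before R1/R0 rescaling).
    """
    M, H1, H2 = inputs.shape
    N, _, K1, K2 = kernels.shape
    H1o = (H1 - K1) // stride + 1
    H2o = (H2 - K2) // stride + 1
    cols = kernels.reshape(N, -1)                  # each row = one flattened kernel group
    out = np.zeros((N, H1o, H2o))
    for a in range(H1o):
        for b in range(H2o):
            # Receptive field p -> input voltages V^p_{k,i} of eq. (20)
            patch = inputs[:, a*stride:a*stride+K1, b*stride:b*stride+K2].reshape(-1)
            I_p = r_gw * (cols @ (V_r * patch) + bias * V_r)   # column currents
            out[:, a, b] = I_p / (V_r * r_gw)                  # back to numerical values
    return out

# Example: 3 input maps, 2 kernel groups of size 3x3.
x = np.random.rand(3, 8, 8)
k = np.random.randn(2, 3, 3, 3) * 0.1
print(conv_crossbar(x, k, bias=np.zeros(2)).shape)   # (2, 6, 6)
```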

Fig. 5
figure 5

The memristor array for average pooling operation. \(K_1\times K_2\) is the pooling kernel size and superscript \(p\ (p=1,2,\dots ,P)\) indicates the pth pooling region

Memristor array for average pooling operation

The average pooling operation is

$$\begin{aligned} y^p=\sum ^{K_1}_{i=1}\sum ^{K_2}_{j=1}\frac{x^p_{ij}}{K_1\times K_2}, \end{aligned}$$
(22)

where \(x^p_{ij}\) is the input value at the position \((i,j)\) of the pth receptive field, \(y^p\) is the output value of the pth receptive field, and \(K_1\) and \(K_2\) are the width and the height of the pooling kernel, respectively. This operation can be implemented as a convolutional operation whose stride equals the kernel size and whose weight values are all \(1/\left( K_1\times K_2\right) \).

The memristor array for average pooling operation is shown in Fig. 5. All memristors have resistance values of \(K_1\times K_2\times R_a\), where \(R_a\) represents the resistance of the resistor \(R_a\) in Fig. 5. The output voltage of each column is

$$\begin{aligned} V_{o,p}=\frac{R_aR_1}{R_0}\sum ^{K_1\times K_2}_{i=1}\frac{V^p_i}{K_1\times K_2\times R_a}=\frac{R_1}{R_0}\sum ^{K_1\times K_2}_{i=1}\frac{V^p_i}{K_1\times K_2}, \end{aligned}$$
(23)

where \(p=1,2,\dots ,P\).
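
Since (22) is simply a convolution whose stride equals the kernel size and whose weights are all \(1/(K_1\times K_2)\), it can be checked in a few lines. The snippet is only a numerical analogy; in the circuit of Fig. 5 the division is realized by fixing the memristor resistance to \(K_1\times K_2\times R_a\).

```python
import numpy as np

def avg_pool(fmap, K1=2, K2=2):
    """Average pooling (22): mean over non-overlapping K1 x K2 windows."""
    H1, H2 = fmap.shape
    out = fmap[:H1 - H1 % K1, :H2 - H2 % K2]          # drop rows/cols that do not fit
    return out.reshape(H1 // K1, K1, H2 // K2, K2).mean(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(avg_pool(fmap))   # each entry is the mean of one 2x2 block
```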

Operation of the memristor-based DNNs

The memristor-based DNNs are trained through the error back propagation (BP) algorithm, and the memristors are updated in-situ according to the weight update values. The advantages of in-situ learning are that the learning process can adjust to hardware imperfections [4, 19, 47] and that the memristors can be updated in parallel. In-situ learning also provides a possible path toward fully on-chip learning.

Weight update schemes

Two kinds of weight update schemes are adopted to update the memristors in the crossbars in-situ: the fixed-voltage update and the approximately linear update.

Fixed-voltage update

The fixed-voltage update means that the amplitudes and durations of the writing voltages are fixed. There are two kinds of writing voltages, one to increase the conductance and one to decrease it; they differ in sign and duration, and which one is used depends on the sign of the weight update value. This method is very easy to implement because there is no need to precisely convert weight update values into appropriate writing voltages, which is difficult because the conductance of the memristor changes nonlinearly [47, 48]. If \(\varDelta W\ge \sigma \), the positive writing voltage is applied to the corresponding memristor, and if \(\varDelta W<-\sigma \), the negative writing voltage is applied, where \(\varDelta W\) is the weight update value and \(\sigma \) is a small non-negative constant that filters out small update values. Because of the nonlinearity, the conductance update values of different memristors are not identical. Denote by \(k_r\) and \(k_d\) the absolute values of the rising and falling slopes of the approximately linear region of the conductance-versus-writing-time curve, respectively. The ratio of the duration of the conductance-increasing pulse to that of the conductance-decreasing pulse is set to \(k_d/k_r\).
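
A minimal sketch of the fixed-voltage rule described above, with illustrative values for \(\sigma \), \(k_r\), \(k_d\), and the base pulse duration (these are assumptions, not the simulation parameters of Table 1): the sign of \(\varDelta W\) only selects the pulse polarity, and the fixed durations satisfy \(t^+_0/t^-_0=k_d/k_r\).

```python
import numpy as np

def fixed_voltage_pulses(dW, sigma=1e-3, k_r=2.0, k_d=1.0, t0_minus=1e-6):
    """Per-memristor pulse durations (s) for the fixed-voltage update.

    dW: array of weight update values. Entries above sigma get the positive
    writing pulse of duration t0_plus, entries below -sigma the negative pulse
    of duration t0_minus, with t0_plus / t0_minus = k_d / k_r.
    """
    t0_plus = t0_minus * k_d / k_r                  # compensate the asymmetric slopes
    t_inc = np.where(dW >= sigma, t0_plus, 0.0)     # eq. (30)
    t_dec = np.where(dW < -sigma, t0_minus, 0.0)    # eq. (31)
    return t_inc, t_dec

dW = np.array([0.08, -0.02, 0.0005, -0.3])          # the 0.0005 entry is filtered by sigma
print(fixed_voltage_pulses(dW))
```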

Approximately linear update

The approximately linear update means that the middle, approximately linear region of the conductance change of the memristor is used to represent most weight values [23]. The desired conductance update value is \(\varDelta {G}=\varDelta {W}\cdot r_{gw}\). The required duration of the writing voltage is then approximately \(\varDelta {G}/k_r\) for increasing the conductance or \(|\varDelta {G}|/k_d\) for decreasing it.
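
In the same spirit, a sketch of the approximately linear update (the slope and ratio values are placeholders): the desired conductance change \(\varDelta G=\varDelta W\cdot r_{gw}\) is divided by the slope of the linear region to obtain the pulse duration, with small updates filtered by \(\sigma \).

```python
import numpy as np

def linear_update_pulses(dW, r_gw=1e-3, sigma=1e-3, k_r=5e-3, k_d=2.5e-3):
    """Pulse durations (s) for the approximately linear update.

    dW: weight update values; r_gw: conductance-per-weight ratio;
    k_r, k_d: absolute slopes of the linear region (S/s) for increase / decrease.
    """
    dG = dW * r_gw                                   # desired conductance change
    t_inc = np.where(dG >= r_gw * sigma, dG / k_r, 0.0)    # duration ~ dG / k_r
    t_dec = np.where(dG < -r_gw * sigma, -dG / k_d, 0.0)   # duration ~ |dG| / k_d
    return t_inc, t_dec

print(linear_update_pulses(np.array([0.08, -0.02, 0.0005])))
```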

Fig. 6
figure 6

The row-parallel updating method. \(V^+_w>V_{on}>\frac{1}{2}V^+_w>0\) and \(V^-_w<V_{off}<\frac{1}{2}V^-_w<0\), and voltages are divided into two phases. Memristors in the second row are to be updated. Amplitudes of voltages across the first and the third memristors in the second row are both \(V^+_w\), so their conductance is increased in the first phase of the column voltages. The amplitude of the voltage across the second memristor in the second row is \(V^-_w\), so its conductance is decreased in the second phase. The conductance of the other memristors remains unchanged because their voltages do not exceed the threshold voltages. The amount of conductance change is determined by the duration of the column voltage

Update memristors in the crossbar

Memristors are updated by applying voltages with appropriate durations through a row-parallel updating method. The method is illustrated in Fig. 6, in which the conductance of \(M_1\) and \(M_3\) needs to be increased and that of \(M_2\) needs to be decreased. The voltages are divided into two phases: the first phase increases the conductance and the second phase decreases it. \(V^+_w\) and \(V^-_w\) satisfy \(V^+_w>V_{on}>\frac{1}{2}V^+_w>0\) and \(V^-_w<V_{off}<\frac{1}{2}V^-_w<0\). In the first phase, the amplitude of the row voltage is \(-\frac{1}{2}V^+_w\) and the amplitudes of the column voltages for increasing the conductance are all \(\frac{1}{2}V^+_w\), so only the voltages across \(M_1\) and \(M_3\) exceed the positive threshold voltage of the memristor. In the second phase, the amplitude of the row voltage is \(-\frac{1}{2}V^-_w\) and the amplitudes of the column voltages for decreasing the conductance are all \(\frac{1}{2}V^-_w\), so only the voltage across \(M_2\) falls below the negative threshold voltage. Therefore only the memristors in the second row are updated and the rest remain unchanged. For the fixed-voltage update, the pulse durations of the column voltages do not vary, whereas for the approximately linear update, the pulse durations depend on the weight update values.
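
The two-phase, row-parallel scheme can be emulated numerically to verify which devices cross the thresholds. The sketch below is an illustration under assumed threshold and writing-voltage values (not a circuit simulation); unselected rows and columns are assumed to be held at 0 V in each phase.

```python
import numpy as np

V_on, V_off = 0.6, -0.6        # assumed positive / negative thresholds (V)
Vw_plus, Vw_minus = 1.0, -1.0  # writing voltages: |Vw/2| below threshold, |Vw| above

def phase_voltages(selected_row, col_signs, n_rows):
    """Voltage across each memristor (column voltage minus row voltage)
    in the two update phases; col_signs[j] = +1 to increase, -1 to decrease."""
    across = np.zeros((2, n_rows, len(col_signs)))
    for phase, (Vw, sign) in enumerate([(Vw_plus, +1), (Vw_minus, -1)]):
        v_row = np.zeros(n_rows)
        v_row[selected_row] = -0.5 * Vw                       # selected row driven to -Vw/2
        v_col = np.where(col_signs == sign, 0.5 * Vw, 0.0)    # columns needing this polarity
        across[phase] = v_col[None, :] - v_row[:, None]
    return across

v = phase_voltages(selected_row=1, col_signs=np.array([+1, -1, +1]), n_rows=3)
print("updated in phase 1:", v[0] > V_on)    # only row 1, columns 0 and 2
print("updated in phase 2:", v[1] < V_off)   # only row 1, column 1
```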

The BP training

The BP training of the network is completed through the following steps:

  1. 1.

    Reset all memristors to \(R_{off}\) by applying the reset voltage \(V^-_w\), and then adjust the conductance to the approximately linear region by applying \(V^+_w\) for an appropriate duration.

  2. 2.

    S is set to high level and the TGs in the left column of the memristor crossbar are closed. Feed input voltages to the DNN and obtain the errors through (17). Then the loss is calculated as

    $$\begin{aligned} {\mathcal {L}}=\frac{1}{2}\sum ^C_{j=1}|V_{o,j}-V_{T,j}|^2, \end{aligned}$$
    (24)

    where C is the number of classes. In vector form,

    $$\begin{aligned} {\mathcal {L}}=\frac{1}{2}\Vert {\mathbf {V}}_o-{\mathbf {V}}_T\Vert ^2_2, \end{aligned}$$
    (25)

    where \({\mathbf {V}}_o\) is the final output voltage vector and \({\mathbf {V}}_T\) is the target voltage vector.

  3. 3.

    Back propagate errors from the (\(l+1\))th layer to the lth layer through weights of the (\(l+1\))th layer.

    For the FC layer, the error voltage vector of the lth layer is

    $$\begin{aligned} \varDelta {\mathbf {V}}^{(l)}= {\left\{ \begin{array}{ll} \left( {\mathbf {V}}_o-{\mathbf {V}}_T\right) \odot f'_l\left( {\mathbf {V}}^{(L)}_z\right) , &{}l=L \\ \left( \left( {\mathbf {W}}^{(l+1)}\right) ^\mathrm{T}\varDelta {\mathbf {V}}^{(l+1)}\right) \odot f'_l\left( {\mathbf {V}}^{(l)}_z\right) , &{}l<L \end{array}\right. } \end{aligned}$$
    (26)

    where \({\mathbf {W}}^{(l+1)}\) is the weight matrix of the \((l+1)\)th layer, L is the number of layers, \(f'_l(\cdot )\) is the derivative of the activation function in the lth layer, \({\mathbf {V}}^{(l)}_z\) is the unactivated output voltage vector, and \(\odot \) is the element-wise multiplication. The backpropagation can be implemented through the memristor crossbar with S set to low level; the columns are then fed the error voltages and the rows output the back propagated values.

    Table 1 Simulation parameters

    For the convolution layer, there is

    $$\begin{aligned} \varDelta {\mathbf {V}}^{(l)}=\varDelta {\mathbf {V}}^{(l+1)}\otimes \mathbf {rot180}\left( {\mathbf {W}}^{(l+1)}\right) \odot f'_l\left( {\mathbf {V}}^{(l)}_z\right) ,\nonumber \\ \end{aligned}$$
    (27)

    where \({\mathbf {W}}^{(l+1)}\) is one kernel in the \((l+1)\)th layer, \(\varDelta {\mathbf {V}}^{(l+1)}\) is the corresponding receptive field in the error matrix, \(\otimes \) is the convolution operation, and \(\mathbf {rot180}(\cdot )\) is the function to rotate the matrix 180 degrees. The backpropagation is implemented through weights read out from the crossbar.

  4. 4.

    Determine the weight update values. For the FC layer

    $$\begin{aligned} \varDelta {\mathbf {W}}^{(l)}=\varDelta {\mathbf {V}}^{(l)}\left( {\mathbf {V}}^{(l-1)}_o\right) ^\mathrm{T}, \end{aligned}$$
    (28)

    where \({\mathbf {V}}^{(l-1)}_o\) is the output voltage vector of the \((l-1)\)th layer.

    For the convolution layer

    $$\begin{aligned} \varDelta {\mathbf {W}}^{(l)}={\mathbf {V}}^{(l-1)}_o\otimes \varDelta {\mathbf {V}}^{(l)}. \end{aligned}$$
    (29)
  5. 5.

    Determine desired writing voltages. For the fixed-voltage update, the pulse durations of writing voltages are

    $$\begin{aligned} {\mathbf {t}}_{inc}&=\left( \varDelta {\mathbf {W}}\ge \sigma \right) \cdot t^+_0, \end{aligned}$$
    (30)
    $$\begin{aligned} {\mathbf {t}}_{dec}&=\left( \varDelta {\mathbf {W}}<-\sigma \right) \cdot t^-_0, \end{aligned}$$
    (31)

    where \(t^+_0\) and \(t^-_0\) are pulse durations of writing voltages for increasing and decreasing conductance, respectively, and amplitudes of these two voltages are \(V^+_w\) and \(V^-_w\), respectively.

    For the approximately linear update, conductance update values of memristors are

    $$\begin{aligned} \varDelta {\mathbf {G}}=\varDelta {\mathbf {W}}\cdot \ r_{gw}. \end{aligned}$$
    (32)

    Then desired pulse durations of writing voltages are

    $$\begin{aligned}&{\mathbf {t}}_{inc}=\left[ \varDelta {\mathbf {G}}\ge \left( r_{gw}\cdot \sigma \right) \right] \odot \varDelta {\mathbf {G}}/k_r, \end{aligned}$$
    (33)
    $$\begin{aligned}&{\mathbf {t}}_{dec}=-\left[ \varDelta {\mathbf {G}}<\left( -r_{gw}\cdot \sigma \right) \right] \odot \varDelta {\mathbf {G}}/k_d. \end{aligned}$$
    (34)
  6. 6.

    S is set to low level. Apply the desired writing voltages to the memristors to update their conductance through the introduced weight update schemes.

  7. 7.

    Repeat Step 2 to Step 6 until the loss is smaller than a predefined threshold value.

Because the main purpose of this paper is to evaluate the performance of the memristor-based DNNs constructed from the presented memristor crossbars together with the weight update schemes, the input and the intermediate data of the convolutional operation are processed and stored in peripheral digital circuits, and so is the calculation of the desired conductance update values and the durations of the writing voltages. The BP process can also be implemented by analog circuits [15], and the conductance update value can also be determined by a look-up table (LUT) [49].
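
To tie the training steps together, the following plain-NumPy sketch (with hypothetical names; the paper performs these steps partly in the crossbar and partly in peripheral digital circuits) computes the classification-layer error and weight update of (26) and (28) for the hard-sigmoid approximation (13), and then converts the update into pulse durations with the approximately linear scheme. The learning rate and the negative sign applied to \(\varDelta \mathbf {G}\) are assumptions used to make the example an explicit gradient-descent step.

```python
import numpy as np

def hard_sigmoid(z):                 # forward approximation, eq. (13)
    return np.clip(0.25 * z + 0.5, 0.0, 1.0)

def sigmoid_grad(z):                 # backward pass uses the derivative of the original sigmoid
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# One training example through a single FC classification layer (L = 1).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 784))     # weight matrix of the layer
x = rng.random(784)                           # output of the previous layer
t = np.eye(10)[3]                             # one-hot target

z = W @ x                                     # pre-activation (crossbar column output)
y = hard_sigmoid(z)                           # eq. (13)
delta = (y - t) * sigmoid_grad(z)             # eq. (26), case l = L
dW = np.outer(delta, x)                       # eq. (28)

# Map the update to pulse durations (approximately linear update, assumed learning rate 0.1).
r_gw, sigma, k_r, k_d = 1e-3, 1e-4, 5e-3, 2.5e-3
dG = -0.1 * dW * r_gw                         # descent direction scaled into conductance change
t_inc = np.where(dG >= r_gw * sigma, dG / k_r, 0.0)
t_dec = np.where(dG < -r_gw * sigma, -dG / k_d, 0.0)
print(t_inc.shape, t_dec.shape)               # (10, 784) pulse durations per memristor
```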

Simulations and analysis

MMNN and MCNN are built in this section from the presented memristor crossbars to carry out simulation experiments. The effectiveness of the circuits is substantiated in SPICE. The circuits and the learning process are also evaluated in Matlab under hardware-defined constraints. The parameters of the simulations are listed in Table 1. In the forward pass, the activation functions are the approximate functions (13), (14), and (15); in the backward pass, their derivatives are computed from the original functions.

Results of MMNN

Two-layer neural networks are built based on the memristor crossbar in Fig. 1 for the XOR operation and for digit recognition on the MNIST (Modified National Institute of Standards and Technology) dataset [44], respectively.

Table 2 The truth table of the XOR operation
Fig. 7
figure 7

Variations of input and output voltages with training cycles of the XOR operation. After about 36 training cycles, the network correctly performs the XOR operation (on the right side of the dotted line). a and b Input voltages. c Output voltages

The MMNN for the XOR operation has two input units, three hidden units, and one output unit [50], and it is trained by the approximately linear update scheme. The activation function is the binary function. The truth table of the XOR operation is shown in Table 2, and the variations of the input and output voltages with training cycles are shown in Fig. 7. After about 36 training cycles, the network correctly performs the XOR operation. The power consumption of the crossbars for the XOR operation, measured in SPICE in the inference phase, is 2.18 mW. However, the total consumed energy is very low because the inference time is very short, on the nanosecond scale. If smaller input voltages and memristors with larger resistance are adopted, the power consumption can be reduced further.

Fig. 8
figure 8

Samples in each class of MNIST dataset. 10 samples per row belong to one class

Fig. 9
figure 9

Training accuracy, training loss, test accuracy, and test loss of MMNN trained by the fixed-voltage update scheme

Fig. 10
figure 10

Training accuracy, training loss, test accuracy, and test loss of MMNN trained by the approximately linear update scheme

The MMNN for digit recognition on MNIST has 784 input units, 256 hidden units, and 10 output units. The MNIST dataset contains handwritten digits from 0 to 9, with 60,000 training samples and 10,000 test samples in ten classes; samples of each class are shown in Fig. 8. The input values are first converted to voltages in \([-V_r,V_r]\) through digital-to-analog converters (DACs) and then fed to the memristor crossbars. The curves of training accuracy, training loss, test accuracy, and test loss versus training epochs under the two weight update schemes are shown in Figs. 9 and 10, respectively. The classification accuracy of the fixed-voltage update is 96.42% and that of the approximately linear update is 96.29%.

Results of MCNN

Table 3 The architecture of MCNN

The architecture of the MCNN used in the simulations is shown in Table 3 [39]. Conv1 is the first convolutional layer and Avgpool1 is the first average pooling layer. The notation \(5\times 5\), 6, \(s=1\) means that the kernel size is \(5\times 5\), the number of output channels is 6, and the convolution stride is 1.

Fig. 11
figure 11

Samples in each class of CIFAR-10 dataset. 10 samples per row belong to the same class

The MNIST and CIFAR-10 [16] datasets are adopted to substantiate the effectiveness of the MCNN. CIFAR-10 is a widely used benchmark for image recognition. It contains 50,000 color training images and 10,000 test images in 10 classes; samples of each class are shown in Fig. 11. The classification results of the MCNN trained by the approximately linear update on the two datasets are shown in Figs. 12 and 13, respectively. The final test accuracies on MNIST and CIFAR-10 are about 98.98% and 60.38%, respectively. The MCNN is also trained by the fixed-voltage update scheme on MNIST, and the test accuracy is 97.82%.

Fig. 12
figure 12

Training loss, training accuracy, test loss, and test accuracy of MCNN on MNIST

Fig. 13
figure 13

Training loss, training accuracy, test loss, and test accuracy of MCNN on CIFAR-10

Table 4 Classification results of MMNN
Table 5 Classification results of MCNN

Results analysis

The classification results of the MMNN and the MCNN on MNIST are listed in Tables 4 and 5; they are obtained by running multiple cross-validations. It can be seen that the MCNN achieves better results than the MMNN, and that the approximately linear update performs better than the fixed-voltage update. The confusion matrices of the classification results on MNIST and Fashion-MNIST obtained by the MMNN and the MCNN are shown in Fig. 14.

Fig. 14
figure 14

Confusion matrices of classification results. a The confusion matrix of the classification result of MNIST obtained by MMNN trained by the fixed-voltage update scheme (expressed as MNIST-MMNN-fixed-voltage update). b MNIST-MMNN-approximately linear update. c Fashion-MNIST-MMNN-fixed-voltage update. d Fashion-MNIST-MMNN-approximately linear update. e MNIST-MCNN-fixed-voltage update. f MNIST-MCNN-approximately linear update. g Fashion-MNIST-MCNN-fixed-voltage update. h Fashion-MNIST-MCNN-approximately linear update

Robustness analysis

For the fixed-voltage update, the duration of the writing voltage has an impact on performance. Test errors of the MMNN trained on MNIST with writing voltages of different pulse durations are shown in Fig. 15, where the duration refers to that of the voltage used to decrease the conductance. It can be seen that the test error becomes large when the pulse duration is too long.

Because of the inherent characteristics of memristors, there are cycle-to-cycle (C2C) and device-to-device (D2D) variations in conductance adjustment, and errors in the writing voltages also cause conductance variations. To evaluate the impact of these variations on the performance of the networks, Gaussian noises with zero mean and standard deviations from 0 to \(12\%\) of the conductance value are applied as conductance variations in the approximately linear update. The conductance value after updating is \(G_{new}=\left( G_{old}+\varDelta G\right) \left( 1+s\right) \) [15], where \(G_{old}\) is the conductance before the update and s is the noise level. Test errors of the MMNN and the MCNN under different variation levels are shown in Figs. 16 and 17, respectively. It can be seen that the test error increases as the variation level increases.
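
The variation model is straightforward to reproduce. The snippet below (illustrative conductance values and a 6% noise level) injects the multiplicative Gaussian noise \(G_{new}=(G_{old}+\varDelta G)(1+s)\) into a programming step.

```python
import numpy as np

def program_with_variation(G_old, dG, noise_std=0.06, rng=None):
    """Apply a conductance update with C2C/D2D-style multiplicative noise:
    G_new = (G_old + dG) * (1 + s), where s ~ N(0, noise_std)."""
    rng = rng or np.random.default_rng()
    s = rng.normal(0.0, noise_std, size=np.shape(G_old))
    return (G_old + dG) * (1.0 + s)

G = np.full((4, 4), 5e-4)            # a 4x4 block of memristors at 0.5 mS
G = program_with_variation(G, dG=1e-5, noise_std=0.06)
print(G)
```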

Computing complexity analysis

In the FC layer, the computational complexity of the vector-matrix multiplication is generally \(O(N^2)\). In the memristor crossbar-based FC layer, the vector-matrix multiplication is performed with complexity O(1), and the activation functions are applied at the same time in the circuit. In the convolutional layer, an efficient way to perform the convolutional operation is to convert it into a matrix multiplication, whose complexity is generally \(O(N^3)\). This conversion is also needed for the memristor crossbar-based convolutional operation and is realized outside the crossbar. The advantage of the memristor crossbar-based convolutional operation is that it reduces the complexity of the matrix multiplication to O(N). In the average pooling layer, the average pooling operation is performed with complexity O(1). The weights are stored in the conductance of the memristors and the vector-matrix multiplication is performed in-memory. The intermediate data of the convolutional operation are stored in external storage, which could also be realized by a memristor array.

Fig. 15
figure 15

Test errors of MMNN trained on MNIST by writing voltages with different pulse durations in the fixed-voltage update scheme

Fig. 16
figure 16

Test errors of MMNN trained through the approximately linear update scheme on MNIST under different conductance variations

Fig. 17
figure 17

Test errors of MCNN trained through the approximately linear update scheme on MNIST under different conductance variations

Comparisons

Table 6 Comparisons of MMNN with software-based and other memristor-based MNNs
Table 7 Comparisons of MCNN with software-based and other memristor-based CNNs

Comparisons of the MMNN with a software-based MNN and other memristor-based MNNs [19, 41, 48, 50] are shown in Table 6, and comparisons of the MCNN with a software-based CNN and other memristor-based CNNs [39, 41, 49] are shown in Table 7. The tables show that the MMNN and the MCNN built from the presented memristor-based crossbars and trained in-situ by the two weight update schemes have advantages in circuit functionality and classification results compared with other memristor-based neural network circuits.

Conclusions

This paper presents memristor crossbar architectures for the implementation of DNNs, including architectures for the FC layer, the convolutional operation, and the average pooling operation. An MMNN and an MCNN are built to evaluate the performance of these memristor crossbar architectures. The networks are trained in-situ by two kinds of weight update schemes, the fixed-voltage update and the approximately linear update, and simulation results show that the networks trained by both schemes achieve satisfactory performance. The robustness of the MMNN and the MCNN to conductance variations of memristors is also analyzed. In summary, the memristor-based DNNs constructed from the presented memristor crossbars perform satisfactorily on pattern recognition tasks and exhibit a certain robustness to hardware imperfections.