Memristor crossbar architectures for implementing deep neural networks

The paper presents memristor crossbar architectures for implementing layers in deep neural networks, including the fully connected layer, the convolutional layer, and the pooling layer. The crossbars achieve positive and negative weight values and approximately realize various nonlinear activation functions. The layers constructed from the crossbars are then used to build the memristor-based multi-layer neural network (MMNN) and the memristor-based convolutional neural network (MCNN). Two kinds of in-situ weight update schemes, the fixed-voltage update and the approximately linear update, are used to train the networks. Considering variations resulting from the inherent characteristics of memristors and from errors in the programming voltages, the robustness of the MMNN and the MCNN to these variations is analyzed. The simulation results on standard datasets show that deep neural networks (DNNs) built from the memristor crossbars work satisfactorily in pattern recognition tasks and have certain robustness to memristor variations.

A memristor bridge synapse-based neural network is proposed in [3]. The memristor bridge synapse achieves positive or negative synaptic weight values using four memristors. A modified chip-in-the-loop learning scheme is put forward to train the network. In [4], a memristor-based SNN is presented and trained with ex-situ and in-situ methods. The signed weight value is achieved by subtracting the conductance value of one memristor from that of another memristor. The results show that memristor-based networks are promising implementations of neuromorphic computing systems. In [10], a memristor crossbar-based neural network with on-chip back-propagation (BP) training is presented. Memristor-based multi-layer neural networks with online gradient descent training are presented in [30]. The network uses one memristor and two CMOS transistors to construct one synapse. Compared with CMOS-based counterparts, the memristor-based MNNs [30] consume between 2% and 8% of the area and static power. In [48], a sign backpropagation (SBP) method is proposed to train resistive random access memory (RRAM)-based neural networks. In [50], a circuit design for memristor-based MNNs is presented and a modified BP algorithm is adopted to train the networks. In [19], the in-situ learning capability of an MNN based on a hafnium oxide-based memristor crossbar is experimentally demonstrated. In [49], memristor-based quantized neural networks are presented, in which the weights are quantized to accelerate the operation of the neural networks.
Besides memristor-based MNNs, there are also works on memristor-based CNNs. In [39], a memristor-based CNN is presented, which is the first memristor-based circuit implementation of a CNN. One memristor crossbar represents all groups of convolution kernels in one convolutional layer and performs the convolutional operation. An extremely parallel implementation of a memristor crossbar-based CNN is presented in [40]. It uses a very sparse crossbar that replicates the convolution kernels to implement the convolutional operation, and one feature map is convolved at a time. In [8], convolutional layers are mapped to resistive cross-point arrays, and the impacts of noise and bound limitations on the performance of the CNN are analyzed. In [36], a memristor-based fully convolutional network (MFCN) is put forward for semantic segmentation tasks. A fully hardware-implemented memristor-based CNN is presented in [46]. High-yield, high-performance, and uniform memristor crossbars are reported in [46], and an effective hybrid-training method that can adapt to device imperfections is put forward to train the memristor crossbar-based neural networks.
In this paper, memristor-based crossbar architectures, which use few elements in each synapse circuit while approximately realizing many activation functions, are presented for implementing memristor-based DNNs. In the crossbars, signed weight values are achieved by subtracting the conductance values of memristors from those of reference resistors [32]. Nonlinear activation functions are approximately implemented through circuits. An MMNN and an MCNN are built from the presented crossbars, which also substantiates the effectiveness of the crossbars. The networks are trained by two kinds of in-situ update schemes, the fixed-voltage update and the approximately linear update. The performance of the MMNN and the MCNN trained by the two update schemes is analyzed. The robustness of the two networks to conductance variations, which are caused by the inherent characteristics of memristors and by errors in the programming voltages, is also explored.
The rest of the paper is organized as follows. Section "Memristor crossbar architectures" introduces the memristor model and the memristor crossbar architectures designed for fully connected (FC) layer, convolutional operation, and average pooling operation. Section "Operation of the memristor-based DNNs" introduces the operation of the DNNs built by the crossbars. Simulations and analyses are conducted in Section "Simulations and analysis". Section "Conclusions" concludes the paper.

Memristor crossbar architectures
In this section, memristor crossbar architectures for DNNs are presented. Memristor crossbars perform vector-matrix multiplications, which are the computationally intensive operations in neural networks, in parallel through Kirchhoff's law. This section first introduces the memristor model and then presents memristor crossbars for the FC layer, the convolutional operation, and the average pooling operation. These memristor-based crossbars can be used to build DNNs.

Memristor model
The memristor model describes the behavior of a realistic memristor in mathematical form, and it can be used to explore the characteristics of the memristor. It can also be adopted in simulations to speed up system design. The HP model is [31]

$$R(t) = R_{on}\, x(t) + R_{off}\left(1 - x(t)\right), \qquad v(t) = R(t)\, i(t),$$

where R(t) is the resistance of the memristor, x(t) = w(t)/D is the state variable, R_on and R_off are the internal low and high resistances of the memristor, respectively, and v(t) and i(t) are the voltage and the current, respectively. The state variable evolves as

$$\frac{dw(t)}{dt} = \mu_v \frac{R_{on}}{D}\, i(t),$$

where w(t) is the internal state variable, D is the thickness of the device, and μ_v is the average ion mobility; D and μ_v are constants. The HP model cannot precisely capture the characteristics of many realistic memristors, so various memristor models have been put forward to describe the behaviors of different memristors [17,18,37,51]. A voltage-controlled threshold model [51] that can fit realistic memristors is adopted in this paper, in which i_0, i_on, and i_off are constants and f(x(t)) is the window function.
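To make the HP model concrete, the following is a minimal Python sketch of its dynamics under a sinusoidal drive; the parameter values and the hard window are illustrative assumptions, not the values or the window function used in the paper.

```python
import numpy as np

# Illustrative HP memristor model; parameter values below are assumptions for
# demonstration only, not the values used in the paper.
R_ON, R_OFF = 100.0, 16e3        # low / high resistance (ohm)
D = 10e-9                        # device thickness (m)
MU_V = 1e-14                     # average ion mobility (m^2 s^-1 V^-1)

def simulate_hp(v_of_t, dt, x0=0.1):
    """Integrate R(t) = R_on*x + R_off*(1-x), dx/dt = mu_v*R_on/D^2 * i(t)."""
    x = x0
    currents, resistances = [], []
    for v in v_of_t:
        R = R_ON * x + R_OFF * (1.0 - x)
        i = v / R
        x += MU_V * R_ON / D**2 * i * dt   # state update, x = w(t)/D
        x = min(max(x, 0.0), 1.0)          # hard window keeping x in [0, 1] (assumption)
        currents.append(i)
        resistances.append(R)
    return np.array(currents), np.array(resistances)

# Example: a slow sinusoidal drive traces out a pinched hysteresis loop in i-v.
t = np.linspace(0, 0.2, 2000)
v = np.sin(2 * np.pi * 10 * t)
i, R = simulate_hp(v, dt=t[1] - t[0])
```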

Memristor crossbar for FC layer
The FC layer is the basic unit of the MNN and is also an essential part of the CNN. In the FC layer, the inputs are weighted and summed:

$$y_j = f\left(\sum_{i=1}^{M} W_{ji}\, x_i\right), \tag{7}$$

where x_i is the ith input, y_j is the jth output, W_ji is the weight value between the ith input unit and the jth output unit, M is the number of input units, and f(·) is the activation function, which could be the binary function, the sigmoid function, the rectified linear unit (ReLU), or the hyperbolic tangent (tanh) function. The memristor crossbar for the FC layer is shown in Fig. 1. Through Kirchhoff's law, the memristor crossbar implements the weighted summation in (7), whose computational complexity is generally O(N²), with a complexity of O(1). In the inference phase, the TGs (transmission gates) in the left column and in the row below the memristor rows are closed, and the input voltages are V_i = V_r x_i, where V_r is the read voltage and x_i is the original input value. The current of the jth column is

$$I_j = \sum_{i} G_{ij} V_i,$$

where G_ij is the conductance of the memristor in the ith row and the jth column. The output voltage is obtained from the column current through the amplifier stage. By setting different values of R_a and R_c, various activation functions can be approximately achieved. Denote the source voltages of the amplifiers in the dotted box by V_s and V_d. When the resistance of R_a is very large and V_s = 0 V, V_d = −1 V, the output approximately realizes a binary activation function.
where I_j / (V_r r_gw) is the numerical value of the output of the jth column and r_gw is the ratio of the conductance value to the weight value. With a suitable choice of circuit parameters, the resulting expression is an approximate realization of the sigmoid function [39].
A different choice of parameters approximately achieves the tanh function [35].
Another setting is an approximate realization of the ReLU function with an upper bound of v_H. The ratio R_1/R_0 then rescales the amplitude of the output voltage to lie within the threshold voltages of the memristors.
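To illustrate how the crossbar computes (7) at the behavioral level, the following Python sketch maps signed weights to conductances with a reference conductance, computes column currents by Kirchhoff's current law, and recovers the numerical weighted sum before applying an activation. The constants V_R, R_GW, and G_REF are illustrative assumptions; the actual activation circuits (R_a, R_c, V_s, V_d) are not modeled here.

```python
import numpy as np

# Behavioral sketch of the FC-layer crossbar (not a circuit-level model).
V_R = 0.2            # read voltage (V), assumption
R_GW = 1e-3          # ratio between conductance (S) and weight value, assumption
G_REF = 5e-3         # reference conductance used to realize signed weights, assumption

def weights_to_conductance(W):
    """Signed weight -> memristor conductance, offset by the reference conductance."""
    return G_REF + W * R_GW

def fc_crossbar(x, W, activation=np.tanh):
    """Weighted sum y_j = f(sum_i W_ji x_i) computed as column currents."""
    G = weights_to_conductance(W)             # shape (M, N): one memristor per cross-point
    v_in = V_R * x                            # input values encoded as read voltages
    I = v_in @ G                              # Kirchhoff: I_j = sum_i G_ij * V_i
    I_ref = v_in @ (G_REF * np.ones_like(W))  # current through the reference branch
    y = (I - I_ref) / (V_R * R_GW)            # recover the numerical weighted sum
    return activation(y)

x = np.array([0.5, -0.3, 0.8])
W = np.random.uniform(-1, 1, size=(3, 4))
print(fc_crossbar(x, W))
```

The subtraction of the reference current reproduces the signed-weight scheme described above: I − I_ref = V_r Σ_i x_i (G_ij − G_ref), which equals V_r r_gw Σ_i W_ji x_i.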
For the classification layer, the activation function part of the crossbar is shown in Fig. 2; it also calculates the error ΔV_j between the prediction V_o,j and the target V_T,j. In the training phase, the TGs in the first row and the right column are closed by setting S to low level and S̄ to high level, and the crossbar then back-propagates errors: V_δ,1 to V_δ,N are the errors to be back-propagated, and I_δ,1 to I_δ,M and I_δ,b are the back-propagated errors.

Fig. 2
The activation function part of the classification layer, which also calculates the error. V_o,j is the jth activated output and V_T,j is the target value for V_o,j. ΔV_j is the error between V_o,j and V_T,j.

Memristor crossbar for convolutional operation
The convolutional operation uses several groups of convolution kernels to convolve the feature maps, as shown in Fig. 3. The number of kernel groups is equal to the number of output feature maps. The size of each kernel is K_1 × K_2, where K_1 and K_2 are the width and the height of the kernel, respectively. The convolutional operation is

$$y_j^p = \sum_{i=1}^{M}\sum_{k_1=1}^{K_1}\sum_{k_2=1}^{K_2} W_{k_1,k_2,i,j}\; x^p_{k_1,k_2,i} + b_j,$$

where y_j^p (j = 1, 2, ..., N) is the pth value in the jth output feature map, x^p_{k_1,k_2,i} is the value at position (k_1, k_2) of the pth receptive field in the ith input feature map, W_{k_1,k_2,i,j} is the weight value at position (k_1, k_2) of the ith kernel in the jth kernel group, b_j is the bias of the jth output channel, M is the number of input channels, and N is the number of output channels. With convolution stride s and padding size P, the dimensions of the output feature map are

$$W_{out} = \left\lfloor \frac{H_1 - K_1 + 2P}{s} \right\rfloor + 1, \qquad H_{out} = \left\lfloor \frac{H_2 - K_2 + 2P}{s} \right\rfloor + 1,$$

where H_1 and H_2 are the width and the height of the input feature map, respectively, and ⌊·⌋ is the floor (integer part) function.
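As a quick check of the output-size formula, the following small helper (illustrative, not from the paper) computes the output dimensions with the floor-based rule above.

```python
def conv_output_size(h1, h2, k1, k2, stride, padding):
    """Output feature-map size for a K1 x K2 kernel with stride s and padding P."""
    out_w = (h1 - k1 + 2 * padding) // stride + 1
    out_h = (h2 - k2 + 2 * padding) // stride + 1
    return out_w, out_h

# Example: a 28 x 28 input, 5 x 5 kernel, stride 1, no padding -> 24 x 24 output.
print(conv_output_size(28, 28, 5, 5, 1, 0))
```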
There are two methods to implement the convolutional operation by means of a memristor crossbar. One is to treat a compact memristor crossbar as a set of sliding windows that slide over the input feature maps in turn to obtain the output feature maps [39]. The other is to input an entire feature map to a sparse crossbar [40], but this method needs many redundant memristors, and it is also challenging to keep the conductance values representing the same convolution kernel identical. This paper adopts the first method, whose crossbar scale is much smaller.

Fig. 3 The convolutional operation. There are three input feature maps, two output feature maps, and two groups of convolution kernels. Each kernel group contains three kernels; these kernels convolve the three input feature maps and generate one output feature map.
The memristor crossbar for convolutional operation is shown in Fig. 4. The current of the jth column is

$$I_j = \sum_{i=1}^{M}\sum_{k=1}^{K_1 \times K_2} G_{k,i,j}\, V^p_{k,i} + G_{M \times K_1 \times K_2 + 1,\, j}\, V_b,$$

where G_{k,i,j} is the conductance value of the memristor in the jth column that receives V^p_{k,i}, V^p_{k,i} is the kth input value from the pth receptive field in the ith input feature map, K_1 and K_2 are the width and the height of the kernel, respectively, and G_{M×K_1×K_2+1, j} is the conductance of the memristor in the jth column that receives the bias voltage V_b. The column current is then converted to the jth output value, where j is also the index of the jth output feature map. Each column in the crossbar represents one kernel group.

Fig. 4 The memristor crossbar for convolutional operation. It is considered as a set of sliding windows. Each column contains M convolution kernels, and the number of columns N is the same as the number of output feature maps.
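The following Python sketch shows, at the behavioral level, how one unrolled kernel group per column together with a sliding window reproduces the convolutional operation described above; the crossbar physics (voltages, conductances) is abstracted away and the function name conv_crossbar is an illustrative construct.

```python
import numpy as np

# Behavioral sketch of the sliding-window crossbar for the convolutional layer.
def conv_crossbar(feature_maps, kernels, bias, stride=1):
    """feature_maps: (M, H, W); kernels: (N, M, K1, K2); bias: (N,).
    Each crossbar column stores one unrolled kernel group plus a bias row, so all
    N kernel groups are applied to one receptive field in parallel; the Python
    loop below plays the role of the sliding window."""
    M, H, W = feature_maps.shape
    N, _, K1, K2 = kernels.shape
    H_out = (H - K1) // stride + 1
    W_out = (W - K2) // stride + 1
    # One column per output channel: M*K1*K2 kernel weights plus the bias term.
    cols = np.concatenate([kernels.reshape(N, -1), bias.reshape(N, 1)], axis=1).T
    out = np.zeros((N, H_out, W_out))
    for r in range(H_out):
        for c in range(W_out):
            patch = feature_maps[:, r*stride:r*stride+K1, c*stride:c*stride+K2]
            v = np.concatenate([patch.reshape(-1), [1.0]])   # receptive field + bias input V_b
            out[:, r, c] = v @ cols                          # column "currents" -> y_j^p
    return out

fm = np.random.rand(3, 8, 8)
k = np.random.uniform(-1, 1, size=(2, 3, 5, 5))
print(conv_crossbar(fm, k, np.zeros(2)).shape)   # (2, 4, 4)
```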

Memristor array for average pooling operation
The average pooling operation is

$$y^p = \frac{1}{K_1 \times K_2} \sum_{i=1}^{K_1}\sum_{j=1}^{K_2} x^p_{ij},$$

where x^p_{ij} is the input value at position (i, j) of the pth receptive field, y^p is the output value of the pth receptive field, and K_1 and K_2 are the width and the height of the pooling kernel, respectively. This operation can be implemented by a convolutional operation whose stride is equal to the kernel size and whose weight values are all 1/(K_1 × K_2).
The memristor array for average pooling operation is shown in Fig. 5. All memristors are set to the same resistance value, determined by the resistance of the resistor R_a in Fig. 5 and the pooling kernel size, so that the output voltage of each column equals the average of the input voltages of the pth receptive field, for p = 1, 2, ..., P.
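The equivalence between average pooling and a uniform-weight, stride-K convolution mentioned above can be checked with a short sketch; the function below is illustrative and operates on numerical values rather than voltages.

```python
import numpy as np

# Average pooling as a convolution with uniform weights 1/(K1*K2) and stride = kernel size.
def avg_pool(feature_map, k1, k2):
    H, W = feature_map.shape
    out = np.zeros((H // k1, W // k2))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            window = feature_map[r*k1:(r+1)*k1, c*k2:(c+1)*k2]
            out[r, c] = window.sum() / (k1 * k2)   # y^p = (1/(K1*K2)) * sum over the receptive field
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(avg_pool(x, 2, 2))
```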

Operation of the memristor-based DNNs
The memristor-based DNNs are trained through the error back propagation (BP) algorithm. The memristors are updated in-situ according to the weight update values. The advantages of in-situ learning are that the learning process can adapt to hardware imperfections [4,19,47] and that the memristors can be updated in parallel. In-situ learning also provides a possible path toward completely on-chip learning.

Weight update schemes
Two kinds of weight update schemes are adopted to in-situ update memristors in the crossbar. They are the fixed-voltage update and the approximately linear update.

Fixed-voltage update
The fixed-voltage update means that the amplitudes and durations of the writing voltages are fixed. There are two kinds of writing voltages, the voltage to increase the conductance and the voltage to decrease the conductance; they differ in sign and duration. Which one is used depends on the sign of the weight update value. This method is very easy to implement because there is no need to precisely convert weight update values into appropriate writing voltages, a conversion that is difficult because of the nonlinearity of the conductance change of the memristor [47,48]. If ΔW ≥ σ, the positive writing voltage is applied to the corresponding memristor, and if ΔW < −σ, the negative writing voltage is applied, where ΔW is the weight update value and σ is a small non-negative constant that filters out small update values. Because of the nonlinearity, the conductance update values of the memristors are not all the same. Denote the absolute values of the rising and falling slopes of the approximately linear region of the conductance-versus-writing-time curve by k_r and k_d, respectively. The ratio of the duration of the pulse that increases the conductance to that of the pulse that decreases the conductance is set to k_d/k_r.
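A minimal sketch of this decision rule follows; σ, the pulse duration, and the slopes k_r and k_d are illustrative assumptions, and only the sign logic and the duration ratio come from the description above.

```python
# Sketch of the fixed-voltage update rule (assumed parameter values).
def fixed_voltage_pulse(delta_w, sigma=1e-3, t_plus=1e-6, k_r=1.0, k_d=2.0):
    """Return (voltage_sign, pulse_duration) for one memristor.
    The duration ratio t_plus / t_minus equals k_d / k_r, so a single positive
    pulse and a single negative pulse change the conductance by roughly the
    same magnitude in the approximately linear region."""
    t_minus = t_plus * k_r / k_d
    if delta_w >= sigma:
        return +1, t_plus        # apply the positive writing voltage
    elif delta_w < -sigma:
        return -1, t_minus       # apply the negative writing voltage
    return 0, 0.0                # |delta_w| too small: skip the update
```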

Approximately linear update
The approximately linear update means that the middle, approximately linear region of the conductance change of the memristor is used to represent most weight values [23]. The desired conductance update value is ΔG = ΔW · r_gw. The desired duration of the writing voltage is approximately ΔG/k_r for increasing the conductance or |ΔG|/k_d for decreasing it.
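A short sketch of this conversion from a weight update to a pulse duration is given below; r_gw, k_r, and k_d are illustrative assumptions.

```python
# Sketch of the approximately linear update: weight update -> pulse duration.
def linear_update_pulse(delta_w, r_gw=1e-3, k_r=1e-3, k_d=2e-3):
    """delta_G = delta_w * r_gw; duration = delta_G/k_r (increase) or |delta_G|/k_d (decrease)."""
    delta_g = delta_w * r_gw
    if delta_g >= 0:
        return +1, delta_g / k_r     # positive pulse of duration delta_g / k_r
    return -1, -delta_g / k_d        # negative pulse of duration |delta_g| / k_d
```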

Update memristors in the crossbar
Memristors are updated by applying voltages with appropriate durations through a row-parallel updating method. The method is illustrated in Fig. 6, in which the memristors in the second row are to be updated: the conductance of M_1 and M_3 needs to be increased and that of M_2 needs to be decreased.

Fig. 6 The row-parallel updating method. Memristors in the second row are to be updated; the writing voltages are divided into two phases, the first to increase the conductance and the second to decrease it.

The writing voltages are divided into two phases. In the first phase, the amplitude of the row voltage is −½V_w^+ and the amplitudes of the column voltages for increasing the conductance are all ½V_w^+, so only the voltages across M_1 and M_3 exceed the positive threshold voltage of the memristor and their conductance is increased. In the second phase, the amplitude of the row voltage is −½V_w^− and the amplitudes of the column voltages for decreasing the conductance are all ½V_w^−, so only the voltage across M_2 is below the negative threshold voltage and its conductance is decreased. The conductance of the other memristors remains unchanged because the voltages across them do not exceed the threshold voltages. The amount of conductance change is determined by the duration of the column voltage. Therefore only memristors in the selected row are updated and the rest remain unchanged. For the fixed-voltage update, the pulse durations of the column voltages do not vary, whereas for the approximately linear update, the pulse durations are determined by the weight update values.
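The two-phase half-voltage scheme can be checked with a small behavioral simulation; the threshold voltages, writing amplitudes, slopes, and the sign convention for the voltage across a device are illustrative assumptions.

```python
import numpy as np

# Behavioral sketch of the two-phase row-parallel update (half-voltage scheme).
V_W_PLUS, V_W_MINUS = 2.0, -2.0      # writing voltage amplitudes (assumed)
V_TH_POS, V_TH_NEG = 1.5, -1.5       # memristor threshold voltages (assumed)

def row_parallel_update(G, row, col_signs, k_r=1e-4, k_d=1e-4, durations=None):
    """Update one row of the conductance matrix G in place.
    col_signs[j] = +1 (increase), -1 (decrease) or 0 (no change)."""
    n_rows, n_cols = G.shape
    durations = durations if durations is not None else np.full(n_cols, 1e-6)
    for v_w, slope, sign in [(V_W_PLUS, k_r, +1), (V_W_MINUS, k_d, -1)]:
        v_row = np.where(np.arange(n_rows) == row, -0.5 * v_w, 0.0)   # selected row gets -V_w/2
        v_col = np.where(col_signs == sign, 0.5 * v_w, 0.0)           # matching columns get +V_w/2
        v_cell = v_col[None, :] - v_row[:, None]                      # voltage across each memristor
        changed = (v_cell > V_TH_POS) if sign > 0 else (v_cell < V_TH_NEG)
        G += changed * sign * slope * durations[None, :]              # only full-amplitude cells change
    return G

G = np.full((3, 3), 1e-3)
row_parallel_update(G, row=1, col_signs=np.array([+1, -1, +1]))
print(G)   # only the second row has changed
```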

The BP training
The BP training of the network is completed through the following steps.

1. Reset all memristors to R_off by applying reset voltages V_w^−, and then adjust the conductance into the approximately linear region by applying V_w^+ with appropriate timing.
2. Set S to high level so that the TGs in the left column of the memristor crossbar are closed. Feed the input voltages to the DNN and obtain the errors through (17). The loss is then calculated from the errors over the C classes, or equivalently from the final output voltage vector V_o and the target voltage vector V_T.
3. Back-propagate the errors from the (l+1)th layer to the lth layer through the weights of the (l+1)th layer. For the FC layer, the error voltage vector of the lth layer is

$$\Delta V^{(l)} = \left(\left(W^{(l+1)}\right)^{T} \Delta V^{(l+1)}\right) \odot f_l'\!\left(V_z^{(l)}\right), \quad l = 1, \ldots, L-1,$$

where W^{(l+1)} is the weight matrix of the (l+1)th layer, L is the number of layers, f_l'(·) is the derivative of the activation function in the lth layer, V_z^{(l)} is the unactivated output voltage vector, and ⊙ is the element-wise multiplication. The back-propagation can be implemented through the memristor crossbar with S set to low level; the columns are then fed the error voltages and the rows output the propagated values. For the convolutional layer,

$$\Delta V^{(l)} = \Delta V^{(l+1)} \otimes \mathrm{rot180}\!\left(W^{(l+1)}\right),$$

where W^{(l+1)} is one kernel in the (l+1)th layer, ΔV^{(l+1)} is the corresponding receptive field in the error matrix, ⊗ is the convolution operation, and rot180(·) rotates the matrix by 180 degrees. This back-propagation is implemented using weights read out from the crossbar.

4. Determine the weight update values. For the FC layer,

$$\Delta W^{(l)} = \Delta V^{(l)} \left(V_o^{(l-1)}\right)^{T},$$

where V_o^{(l−1)} is the output voltage vector of the (l−1)th layer. For the convolutional layer, the update of each kernel is obtained by convolving the corresponding input receptive fields with the back-propagated errors.
5. Determine the writing voltages. For the fixed-voltage update, the pulse durations are t_0^+ and t_0^− for increasing and decreasing the conductance, respectively, and the amplitudes of these two voltages are V_w^+ and V_w^−, respectively. For the approximately linear update, the conductance update values of the memristors are ΔG = ΔW · r_gw, and the desired pulse durations of the writing voltages are ΔG/k_r (increase) or |ΔG|/k_d (decrease).
6. Set S to low level. Apply the desired writing voltages to the memristors to update their conductance through the weight update schemes introduced above.
7. Repeat Step 2 to Step 6 until the loss is smaller than a predefined threshold value.

A sketch of one iteration of this training loop is given below.
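The following self-contained Python sketch walks through Steps 2-6 for a single FC layer with a sigmoid activation; the learning rate, the loss form, and the constants r_gw, k_r, k_d are illustrative assumptions, and the conductance update is applied ideally (no device noise).

```python
import numpy as np

# One in-situ training step for a single FC layer (behavioral sketch).
rng = np.random.default_rng(0)
R_GW, K_R, K_D, LR = 1e-3, 1e-3, 2e-3, 0.5   # assumed constants

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.uniform(-0.5, 0.5, size=(4, 3))      # weights stored as memristor conductances
x = rng.uniform(0, 1, size=4)                # input encoded as read voltages (normalized)
target = np.array([0.0, 1.0, 0.0])

# Step 2: forward pass and output error DeltaV (squared-error loss assumed).
z = x @ W
y = sigmoid(z)
delta_v = y - target
loss = 0.5 * np.sum(delta_v ** 2)

# Steps 3-4: error term and weight update values for this layer.
delta = delta_v * y * (1.0 - y)              # error propagated through the sigmoid derivative
delta_w = -LR * np.outer(x, delta)           # weight update from error and previous-layer output

# Steps 5-6: convert DeltaW to conductance changes and writing-pulse durations,
# then apply them (ideal, noise-free in-situ update).
delta_g = delta_w * R_GW
durations = np.where(delta_g >= 0, delta_g / K_R, -delta_g / K_D)
applied = np.where(delta_g >= 0, K_R * durations, -K_D * durations)
W += applied / R_GW
print(loss)
```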
Because the main purpose of the paper is to evaluate the performance of the memristor-based DNNs constructed from the presented memristor crossbars together with the weight update schemes, the input and the intermediate data of the convolutional operation are processed and stored in a peripheral digital circuit, as are the calculations of the desired conductance update values and of the durations of the writing voltages. The BP process can also be realized by analog circuits [15], and the conductance update values can also be determined by a look-up table (LUT) [49]. In the forward pass, the activation functions are the approximate formulas (13), (14), and (15), and in the backward pass they are based on their original formulas.

Results of MMNN
Two-layer neural networks are built based on the memristor-based crossbar in Fig. 1 for the XOR operation and for digit recognition on the MNIST (Modified National Institute of Standards and Technology) dataset [44], respectively. The MMNN for the XOR operation has two input units, three hidden units, and one output unit [50] and is trained by the approximately linear update scheme. The activation function is the binary function. The truth table of the XOR operation is shown in Table 2. The variation of the input and output voltages with training cycles for the XOR operation is shown in Fig. 7. After about 36 training cycles, the network correctly performs the XOR operation. The power consumption of the crossbars for the XOR operation in the inference phase is measured to be 2.18 mW in SPICE. The total consumed energy is nevertheless very low because the inference time is very short, on the nanosecond scale. If smaller input voltages and memristors with larger resistance are adopted, the power consumption can be reduced further.
The MMNN for digit recognition on MNIST has 784 input units, 256 hidden units, and 10 output units. The MNIST dataset contains handwritten digits from 0 to 9. There are 60,000 training samples and 10,000 test samples over the ten classes. Samples of each class are shown in Fig. 8. The input values are first converted to voltages among

Results of MCNN
The architecture of the MCNN in the simulations is shown in Table 3 [39]. Conv1 is the first convolutional layer and Avg-pool1 is the first average pooling layer; "5 × 5, 6, s = 1" means that the kernel size is 5 × 5, the number of output channels is 6, and the convolution stride is 1. The MNIST and CIFAR-10 [16] datasets are adopted to substantiate the effectiveness of the MCNN. CIFAR-10 is a widely used benchmark for image recognition; it contains 50,000 color training images and 10,000 test images of 10 classes, and samples of each class are shown in Fig. 11. The classification results of the MCNN trained by the approximately linear update on the two datasets are shown in Figs. 12 and 13, respectively. The final test accuracies on MNIST and CIFAR-10 are about 98.98% and 60.38%, respectively. The MCNN is also trained by the fixed-voltage update scheme on MNIST, and the test accuracy is 97.82%.

Results analysis
Classification results of the MMNN and the MCNN on MNIST are listed in Tables 4 and 5; they are obtained by running multiple cross-validation. It is seen that the MCNN achieves better results than the MMNN, and that the approximately linear update performs better than the fixed-voltage update. The confusion matrices of the classification results are also provided.

Robustness analysis
For the fixed-voltage update, the duration of the writing voltage has an impact on performance. Test errors of the MMNN trained on MNIST with writing voltages of different pulse durations are shown in Fig. 15; the duration shown is that of the voltage that decreases the conductance. It is seen that the test error becomes large if the pulse duration is too long.
Because of the inherent characteristics of memristors, there are cycle-to-cycle (C2C) and device-to-device (D2D) variations in the conductance adjustment, and errors in the writing voltages also result in conductance variations. To evaluate the impact of these variations on the performance of the networks, Gaussian noise with mean 0 and standard deviation ranging from 0 to 12% of the conductance value is injected as the conductance variation in the approximately linear update. The conductance value after updating is G_new = (G_old + ΔG)(1 + s) [15], where G_old is the conductance before the update and s is the noise. Test errors of the MMNN and the MCNN under different variation levels are shown in Figs. 16 and 17, respectively. It is seen that the test error increases as the variation level increases.
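A minimal sketch of this variation model is given below; the 6% level used in the example is one point of the 0-12% sweep described above, and the array shape is arbitrary.

```python
import numpy as np

# Conductance-variation model used in the robustness analysis:
# G_new = (G_old + DeltaG) * (1 + s), with s ~ N(0, noise_std).
rng = np.random.default_rng(0)

def noisy_update(G_old, delta_G, noise_std=0.06):
    """noise_std is the standard deviation as a fraction of the conductance value."""
    s = rng.normal(0.0, noise_std, size=np.shape(G_old))
    return (G_old + delta_G) * (1.0 + s)

G = np.full((256, 10), 1e-3)
G = noisy_update(G, delta_G=1e-5, noise_std=0.06)
```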

Computing complexity analysis
In the FC layer, the computational complexity of the vector-matrix multiplication is generally O(N²). In the memristor crossbar-based FC layer, the vector-matrix multiplication is performed with complexity O(1), and the activation functions are evaluated at the same time in the circuit. In the convolutional layer, an efficient way to perform the convolutional operation is to convert it to a matrix multiplication, whose complexity is generally O(N³). This conversion is also needed for the memristor crossbar-based convolutional operation, and it is realized outside the crossbar. The advantage of the memristor crossbar-based convolutional operation is that it reduces the complexity of the matrix multiplication to O(N). In the average pooling layer, the average pooling operation is performed with complexity O(1). The weights are stored as the conductance of the memristors and the vector-matrix multiplication is performed in-memory. The intermediate data of the convolutional operation are stored in external storage, which could also be realized by a memristor array.
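The conversion of a convolution into a matrix multiplication mentioned above is commonly done with an im2col-style unrolling; the following sketch (illustrative, performed outside the crossbar as the text describes) builds the patch matrix that the crossbar then multiplies by the unrolled kernel groups.

```python
import numpy as np

# im2col-style conversion: convolution -> one matrix multiplication.
def im2col(feature_maps, k1, k2, stride=1):
    """feature_maps: (M, H, W) -> matrix of shape (num_positions, M*k1*k2)."""
    M, H, W = feature_maps.shape
    h_out = (H - k1) // stride + 1
    w_out = (W - k2) // stride + 1
    cols = np.empty((h_out * w_out, M * k1 * k2))
    idx = 0
    for r in range(h_out):
        for c in range(w_out):
            cols[idx] = feature_maps[:, r*stride:r*stride+k1, c*stride:c*stride+k2].ravel()
            idx += 1
    return cols

fm = np.random.rand(3, 8, 8)
patches = im2col(fm, 5, 5)              # (16, 75): one row per output position
kernels = np.random.rand(75, 2)         # unrolled kernel groups, one per crossbar column
out = patches @ kernels                 # each row of "out" is one output position
print(out.shape)                        # (16, 2)
```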

Comparisons
Comparisons of the MMNN with a software-based MNN and other memristor-based MNNs [19,41,48,50] are shown in Table 6. Comparisons of the MCNN with a software-based CNN and other memristor-based CNNs [39,41,49] are shown in Table 7.

* The memristor crossbar is physically implemented and the activation functions are implemented in software.
* "-" means that the indicator is not applicable or the related information is not provided in that paper.

The comparisons show that the MMNN and the MCNN built from the presented memristor-based crossbars and trained in-situ by the two kinds of weight update schemes have advantages in circuit functions and classification results over other memristor-based neural network circuits. The robustness of the networks to conductance variations of memristors is also analyzed. In summary, the memristor-based DNNs constructed from the presented memristor crossbars perform satisfactorily in pattern recognition tasks and have certain robustness to imperfections of the hardware.

Compliance with ethical standards
Conflict of interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.