Abstract
As the core algorithm of artificial intelligence, deep learning has brought new breakthroughs and opportunities to many industries. This paper summarizes the principles of deep learning algorithms such as the Autoencoder (AE), Boltzmann Machine (BM), Deep Belief Network (DBN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Recursive Neural Network. The characteristics and differences of deep learning frameworks such as TensorFlow, Caffe, Theano and PyTorch are compared and analyzed. Finally, the application and performance of hardware platforms such as the CPU and GPU in deep learning acceleration are introduced. This survey of deep learning algorithms, frameworks and hardware technology can serve as a reference and basis for the selection of deep learning technology.
1 Introduction
The development of deep learning has gone through three upsurges: from the 1940s to the 1960s, the idea of the artificial neural network was born in the field of control; from the 1980s to the 1990s, neural networks were interpreted as connectionism; and after entering the 21st century, the field was revived under the name of deep learning [1]. The concept of deep learning originates from research on deep neural networks and is a core branch of machine learning; the multi-layer perceptron, for example, is a simple deep network structure. Generally speaking, deep learning realizes complex nonlinear mappings by stacking layers of artificial neurons and extracting features layer by layer. In essence, compared with traditional artificial neural networks, deep learning does not add more complex logical structures; it significantly improves the feature extraction and nonlinear approximation capabilities of the model simply by adding hidden layers. Since Hinton formally proposed the concept of “deep learning” [2] in 2006, it has triggered a research upsurge in academia and heavy investment from industry, and many excellent deep learning algorithms have emerged. For example, in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) from 2010 to 2017, CNN demonstrated its powerful image processing capability and confirmed its leading position in computer vision [3]. In 2016, the Go program AlphaGo [4] developed by Google decisively defeated the world Go champion Lee Sedol. The success of AlphaGo marked the arrival of an era of artificial intelligence with deep learning at its core.
After years of development, the rise of deep learning has led to the creation of common programming frameworks such as TensorFlow, Caffe, Theano, MXNet, PyTorch and Keras, and has also promoted the rapid development of AI hardware acceleration platforms and dedicated chips, including the GPU, CPU, FPGA and ASIC. This paper focuses on the current research hotspots and mainstream deep learning algorithms in the field of artificial intelligence. The basic principles and applications of the Autoencoder (AE), Boltzmann Machine (BM), Deep Belief Network (DBN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Recursive Neural Network are summarized, and the performance characteristics and differences of deep learning frameworks, AI hardware acceleration platforms and dedicated chips are compared and analyzed.
2 Deep Learning Algorithms
2.1 Auto-Encoder (AE)
As a special multi-layer perceptron, the Auto-encoder (AE) consists mainly of an encoder and a decoder [5]. As shown in Fig. 1, the basic Auto-encoder can be regarded as a three-layer neural network: the mapping from input ‘x’ to ‘a’ is the encoding process, and the mapping from ‘a’ to ‘y’ is the decoding process. Training an Auto-encoder is the process of reducing the error between the output ‘y’ and the input ‘x’. Because the expected output of the Auto-encoder is its own input, it is generally regarded as an unsupervised learning algorithm, used mainly for dimensionality reduction or feature extraction. In neural network training, the Auto-encoder is often used to determine the initialization parameters of the network: if the encoded data can be restored accurately after decoding, the hidden-layer weights are considered to store the data's information well.
The ability of an Auto-encoder to approximate its input at the output is not simply “the stronger the better”: if the output exactly equals the input, the network has merely copied the original data without extracting the inherent characteristics of the input. Therefore, to make the Auto-encoder learn key features, constraints are usually imposed on it, and a variety of improved Auto-encoders have emerged. The Sparse Auto-encoder (SAE) keeps neurons inactive most of the time by adding a penalty term, and uses fewer hidden nodes than input nodes, so as to represent the input data with fewer feature parameters [6]. The Stacked Auto-encoder extracts deeper data features by stacking multiple Auto-encoders in series to deepen the network [7]. The Denoising Auto-encoder (DAE) improves robustness by adding noise interference during training [8]. The Contractive Auto-encoder (CAE) learns more strongly contractive mappings by adding a regularization term [9]. Other variants include the Deep Auto-encoder, Stacked Denoising Auto-encoder (SDAE) and Sparse Stacked Auto-encoder (SSAE) [10,11,12].
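The encode–decode–reconstruct cycle described above can be illustrated with a minimal linear autoencoder sketch in numpy; the toy data, layer sizes and learning rate below are hypothetical choices for illustration, not part of the surveyed models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that actually lie in a 3-D subspace.
basis = rng.normal(size=(3, 8))
x = rng.normal(size=(200, 3)) @ basis

# Encoder x -> a (8 -> 3) and decoder a -> y (3 -> 8); linear for simplicity.
w_enc = rng.normal(scale=0.1, size=(8, 3))
w_dec = rng.normal(scale=0.1, size=(3, 8))

def reconstruction_error(x):
    y = (x @ w_enc) @ w_dec
    return np.mean((y - x) ** 2)

err_before = reconstruction_error(x)
lr = 0.01
for _ in range(500):
    a = x @ w_enc                     # encode
    y = a @ w_dec                     # decode
    g = 2 * (y - x) / len(x)          # gradient of MSE w.r.t. y
    g_dec = a.T @ g                   # gradient for the decoder weights
    g_enc = x.T @ (g @ w_dec.T)       # gradient for the encoder weights
    w_dec -= lr * g_dec
    w_enc -= lr * g_enc
err_after = reconstruction_error(x)
print(err_before, err_after)
```

Because the hidden code ‘a’ has only 3 units for 8-dimensional inputs, the network cannot simply copy the data; it must learn the 3-D subspace, which is exactly the constraint the text describes.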
2.2 Boltzmann Machine
The Boltzmann Machine (BM) is a generative stochastic neural network proposed by Hinton [13]. A traditional BM has no concept of layers: its neurons are fully connected and divided into visible units and hidden units, both of which are binary variables whose state can only be 0 or 1. Because of the complexity of the fully connected structure of the BM, its variant, the Restricted Boltzmann Machine, is more widely used at present (Fig. 2).
The Restricted Boltzmann Machine (RBM) was first proposed by Smolensky [14] and has been widely used in data dimensionality reduction, feature extraction, classification and collaborative filtering. The RBM is a shallow network similar in structure to the BM; the difference is that the RBM removes the connections within each layer, so that neurons in the same layer do not affect each other, which simplifies the model.
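The bipartite structure of the RBM makes contrastive-divergence (CD-1) training cheap: given the visible units, all hidden units can be sampled in parallel, and vice versa. The sketch below is a minimal numpy illustration with hypothetical sizes and data, not code from the surveyed papers:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: two repeating 6-bit patterns (sizes are hypothetical).
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 20, dtype=float)

n_vis, n_hid = 6, 3
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def reconstruct(v):
    # Deterministic round trip: visible -> hidden -> visible.
    return sigmoid(sigmoid(v @ W + b_hid) @ W.T + b_vis)

err_before = np.mean((reconstruct(data) - data) ** 2)
lr = 0.1
for _ in range(300):
    v0 = data
    p_h0 = sigmoid(v0 @ W + b_hid)                     # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float) # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_vis)                   # one Gibbs step (CD-1)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Contrastive-divergence parameter update.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
err_after = np.mean((reconstruct(data) - data) ** 2)
print(err_before, err_after)
```

The intra-layer independence is what allows `v0 @ W` to compute every hidden unit's activation in one matrix product; a fully connected BM has no such factorization.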
2.3 Deep Boltzmann Machine and Deep Belief Network
The Deep Boltzmann Machine (DBM) is a model composed of multiple stacked Restricted Boltzmann Machines, in which the connections between adjacent layers are bidirectional [15]. Compared with the RBM, the DBM can learn higher-order features from unlabeled data and is more robust, making it suitable for target recognition and speech recognition.
The Deep Belief Network (DBN) is also a deep neural network composed of multiple RBMs; it differs from the DBM in that only the connections in the top layer pair remain bidirectional, while the lower layers are directed [16]. Unlike general neural models, the DBN aims to establish a joint distribution between data and expected output, so that the network generates the expected output as far as possible and extracts and restores data features at a more abstract level. The DBN is a practical deep learning algorithm, and its excellent scalability and compatibility have been demonstrated in feature recognition, data classification, speech recognition and image processing. For example, the combination of the DBN and the Multi-layer Perceptron (MLP) performs well in facial expression recognition [17], and the combination of the DBN and the Support Vector Machine (SVM) performs excellently in text classification [18].
2.4 Convolutional Neural Network
The Convolutional Neural Network (CNN) is a deep learning algorithm originally derived from the discovery of the ‘receptive field’ [19], and it has an excellent ability to extract image features. With the successful application of the LeNet-5 model to handwritten digit recognition, researchers began to study the application of CNN to speech and image problems. In 2012, the AlexNet model proposed by Krizhevsky beat many excellent neural network models in the ImageNet image classification competition, which pushed applied research on CNN to a climax [20] (Fig. 3).
Convolutional neural network [21]
A convolutional neural network is mainly composed of an input layer, convolutional layers, activation layers, pooling layers, fully connected layers and an output layer, among which the convolutional and pooling layers form the core structure of the CNN. Unlike other deep learning algorithms, the CNN uses convolution kernels (filters) for convolution computation and uses pooling layers to reduce inter-layer connections and further condense features. It obtains high-level features through repeated extraction and compression, and then uses the output for classification or regression.
The weight sharing mechanism and the local receptive field are two major features of the CNN. Like pooling, they reduce the risk of overfitting by cutting down inter-layer connections and network parameters. Weight sharing means that a single filter is reused: it slides across the feature map and performs multiple convolution computations [22]. The local receptive field is inspired by the way humans observe the outside world, from the local to the whole; a single filter therefore does not need to perceive the whole input, but only extracts local features, which are summarized at a higher level.
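Weight sharing and pooling can be sketched in a few lines of numpy; the image and the edge-detecting kernel below are illustrative assumptions. The same 1 × 2 kernel is applied at every position (weight sharing), and 2 × 2 max pooling then halves the resolution:

```python
import numpy as np

def conv2d(image, kernel):
    # Valid convolution: the shared-weight kernel slides over the image.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Non-overlapping max pooling compresses the feature map.
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1.0, -1.0]])   # horizontal-gradient detector
fmap = conv2d(image, edge)       # 6x5 feature map
pooled = max_pool(fmap)          # 3x2 after 2x2 pooling
print(fmap.shape, pooled.shape)  # (6, 5) (3, 2)
```

One 2-value kernel produces the whole feature map, which is exactly how weight sharing keeps the parameter count independent of the image size.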
In recent years, CNN has gradually emerged in various industries, such as Alphago, speech recognition, natural language processing, image generation and face recognition, etc. [23,24,25,26]. At the same time, many improved CNN models were born, such as VGG, ResNet, GoogLeNet and MobileNet.
VGG.
In 2014, Simonyan and Zisserman [27] proposed the VGG model, which won first place in the localization task and second place in the classification task of the ImageNet Challenge. To improve fitting ability, the network depth of VGG is increased to 19 layers, and convolution kernels with small receptive fields (3 × 3) replace larger ones (5 × 5 or 7 × 7), increasing the nonlinear expressive ability of the network.
ResNet.
VGG proved that a deep network structure can effectively improve the fitting ability of a model, but deeper networks tend to suffer from vanishing gradients, which prevents the network from converging. In 2015, He et al. [28] proposed ResNet, which effectively alleviated the degradation problem of deep neural networks and won first place in the classification, localization, detection and segmentation tasks of the ILSVRC and COCO competitions by a clear margin. To address the vanishing-gradient problem, ResNet introduces a Residual Block structure, which uses a shortcut connection to implement identity mapping.
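The shortcut idea can be sketched in numpy (weights and sizes below are hypothetical): the block computes a residual mapping F(x) and adds the input back, so even if F(x) contributes nothing the block still passes the signal through:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    # F(x) is the residual mapping; the shortcut adds x back (identity mapping),
    # so the gradient can flow through the addition even when F(x) is ~0.
    fx = relu(x @ w1) @ w2
    return relu(fx + x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# With zero weights the block reduces to relu(x): the identity path dominates,
# which is why stacking many such blocks does not degrade the network.
w_zero = np.zeros((8, 8))
out = residual_block(x, w_zero, w_zero)
print(np.allclose(out, relu(x)))  # True
```

This is the structural reason deep ResNets converge where equally deep plain networks degrade: the worst case for a block is behaving like an identity layer rather than corrupting the signal.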
GoogLeNet.
To solve the problem of excessive parameters in large-scale network models, Google proposed the Inception V1 [29] architecture in 2014 and used it to construct GoogLeNet, which won first place in the ImageNet Challenge classification and detection tasks that year. Inception V1 abandons the fully connected layer and replaces the convolutional layer with a sparse network structure, which significantly reduces the number of network parameters. In 2015, Google proposed the Batch Normalization operation and used it to improve the original GoogLeNet, obtaining a better model, Inception V2 [30]. Inception V3 [31] was born in the same year; its core idea is to decompose large convolution kernels into smaller ones, for example splitting 7 × 7 into 1 × 7 and 7 × 1, to further reduce network parameters. In 2016, Google launched Inception V4, which combines Inception with ResNet and improves both training speed and performance [32]. When the number of filters is too large (more than 1,000), the training of Inception V4 becomes unstable, but this can be alleviated by adding an activation scaling factor.
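The saving from the 7 × 7 → (1 × 7, 7 × 1) factorization mentioned above is easy to check by counting weights (the channel count below is a hypothetical example):

```python
# Parameter count of a conv layer: kh * kw * c_in * c_out (biases ignored).
def conv_params(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out

c = 256  # hypothetical channel count
full = conv_params(7, 7, c, c)                                # one 7x7 convolution
factored = conv_params(1, 7, c, c) + conv_params(7, 1, c, c)  # 1x7 followed by 7x1
print(full, factored, factored / full)  # the factorized pair needs 2/7 of the weights
```

The ratio is (7 + 7)/49 = 2/7 regardless of the channel count, and the intermediate nonlinearity between the two small convolutions adds expressive power on top of the saving.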
MobileNet.
In recent years, to bring neural network models to mobile devices, models have developed in the direction of being lightweight. In 2017, Google designed MobileNet V1 around depthwise convolution [33], allowing users to change the network width and input resolution and thus trade off latency against accuracy. In 2018, Google introduced inverted residuals and linear bottlenecks on the basis of MobileNet V1 and put forward MobileNet V2 [34]. In 2019, Google proposed MobileNet V3, combining depthwise convolution, inverted residuals and linear bottlenecks [35]. MobileNet has been shown to perform excellently in multiple tasks, such as classification, target detection and semantic segmentation.
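The lightweight effect of depthwise convolution also comes down to a parameter count. A depthwise separable layer splits a standard convolution into a per-channel spatial filter plus a 1 × 1 pointwise mix; the layer sizes below are hypothetical:

```python
def conv_params(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out

c_in, c_out, k = 128, 256, 3   # hypothetical layer sizes
standard = conv_params(k, k, c_in, c_out)

# Depthwise separable = depthwise (one kxk filter per input channel)
# followed by a pointwise 1x1 convolution that mixes channels.
depthwise = k * k * c_in
pointwise = conv_params(1, 1, c_in, c_out)
separable = depthwise + pointwise
print(standard, separable, separable / standard)
```

The ratio works out to 1/c_out + 1/k², roughly an 8–9× reduction for a 3 × 3 kernel, which is why MobileNet fits on phones.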
2.5 Recurrent Neural Network
The Recurrent Neural Network (RNN) is a kind of deep learning model that is good at processing time series. The RNN unfolds the neurons of each layer along the time dimension: information is fed into the network sequentially and passed forward, and a ‘long-term memory unit’ stores information so that sequential relations between data can be established.
As shown in Fig. 4, the RNN reduces the computation of the network by sharing the parameters (W, U, V) across time steps. The RNN mainly uses the Back Propagation Through Time algorithm [36] to update the parameters of each node. Its forward propagation can be expressed as:
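The forward-propagation equations referenced above did not survive extraction; with the shared parameters (W, U, V) of Fig. 4, the standard formulation (a reconstruction in the usual notation, not the paper's original typography) is:

```latex
h_t = f\left(U x_t + W h_{t-1} + b\right), \qquad
o_t = g\left(V h_t + c\right)
```

where $h_t$ is the hidden state at step $t$, $f$ is typically tanh or the sigmoid function, and $g$ is the output activation, e.g. softmax for classification.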
Although the RNN can take the correlation between pieces of information into account, a traditional RNN usually finds it difficult to preserve information over long periods. Because of the repeated multiplications through the activation function and the weight matrices, when the network has many layers or the data sequence is long, the gradient may grow or decay exponentially with iteration, resulting in gradient vanishing or gradient explosion [37].
LSTM.
To overcome the shortcomings of the traditional RNN, Hochreiter [38] proposed the LSTM. The LSTM introduces three types of gated units into the RNN to realize the extraction, discarding and long-term storage of information, which not only alleviates the problems of gradient vanishing and gradient explosion, but also improves the RNN's capacity for long-term information storage. Each memory block in the LSTM contains one cell and three gates; a basic structure is shown in Fig. 5. Among the three gating units, the input gate controls the proportion of the current input X(t) that enters the network; the forget gate controls the extent to which the long-term memory unit discards information as it passes through each neuron; and the output gate controls the output of the current neuron and the input to the next neuron.
Three types of gate control units are shown:
The calculation of Cell is shown:
The calculation of long-term memory unit C and hidden layer output h are as follows:
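The gate, cell and output computations referenced above lost their equations in extraction; they follow the standard LSTM formulation. This reconstruction uses the usual notation, with $\sigma$ the sigmoid function and $\odot$ element-wise multiplication:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(cell candidate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(long-term memory unit } C\text{)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(hidden-layer output } h\text{)}
\end{aligned}
```

The additive update of $c_t$ is what lets gradients flow across many time steps without vanishing, as long as the forget gate stays close to 1.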
LSTM memory cell [39]
The LSTM has many excellent variants, among which one of the more successful improvements is the bi-directional LSTM, which uses past and future information simultaneously by propagating data in both directions along the time dimension [40]; on some problems its prediction performance is better than that of a one-way LSTM. Greff [39] studied eight variants of the vanilla LSTM and compared them experimentally on TIMIT speech recognition, handwritten character recognition and polyphonic music modeling. The results showed that none of the eight variants brought significant performance improvements, and that the forget gate and the output gate are the two most important parts of the LSTM model: coupling these two gate units can simplify the LSTM structure without reducing performance.
GRU.
As a simplified version of the LSTM, the GRU uses only two gating units to save and forget information: an update gate, which merges the LSTM's input gate and forget gate, and a reset gate [41]. This simplifies the structure and reduces computation without reducing performance. There is still no definitive conclusion on the relative performance of the LSTM and the GRU, but extensive practice has shown that the two models often perform similarly on common problems [42].
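For comparison with the LSTM, the standard GRU update (again a reconstruction in the usual notation, not taken from the paper) reads:

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}) &&\text{(update gate)}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1}) &&\text{(reset gate)}\\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1})\right)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

A single gate $z_t$ plays the role of both the input and forget gates, and the hidden state $h_t$ doubles as the memory, which is where the GRU's savings in parameters and computation come from.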
2.6 Recursive Neural Network
The Recursive Neural Network is a deep learning model with a tree-like hierarchical structure: information is collected layer by layer from the leaves of the tree and finally reaches the root, establishing connections between pieces of information along the spatial dimension. Compared with the recurrent neural network, the recursive neural network can map words and sentences expressing different semantics into a vector space and use the distance between statements to determine semantics [43], rather than only considering word-order relations. Recursive neural networks have powerful natural language processing capabilities, but constructing such tree-structured networks requires manually annotating sentences or words as parse trees, which is relatively expensive (Fig. 6).
Syntax parse tree and natural scene parse tree [44]
3 Deep Learning Framework
In the early stage of the development of deep learning, in order to simplify the process of model building and avoid repeated work, some researchers or institutions packaged codes that could realize basic functions into frameworks for the public to use. Currently, commonly used deep learning frameworks include Tensorflow, Caffe, Theano, MXNet, PyTorch, Keras, etc.
3.1 Tensorflow
TensorFlow is an open-source framework for machine learning and deep learning developed by Google. It builds models in the form of a data flow graph and provides tf.gradients for quickly computing gradients. TensorFlow is highly flexible and portable: it supports multiple language interfaces such as Python and C++, and it can be deployed on servers with multiple CPUs and GPUs as well as run on mobile phones [48]. TensorFlow is therefore widely used in many fields such as speech and image processing. Although it is not superior to other frameworks in running speed or memory consumption, it is relatively complete in terms of theory, functionality, tutorials and peripheral services, which makes it suitable for most deep learning beginners.
3.2 Caffe
Caffe is an open-source framework for deep learning maintained by the Berkeley Vision and Learning Center (BVLC). Caffe allows network models to be configured flexibly for different requirements and is very well suited to modeling deep convolutional neural networks [49]. Caffe has demonstrated excellent image processing ability in ImageNet competitions and has become one of the most popular frameworks in computer vision. Caffe's models are usually specified in text form, which makes them easy to learn. In addition, Caffe can accelerate training on the GPU through Nvidia's CUDA architecture and the cuDNN library. However, Caffe is not flexible when network layers must be modified or added, and it is not good at language modeling problems.
3.3 Theano
Theano is an efficient and convenient mathematical compiler developed at the University of Montreal, and it was the first framework to use symbolic tensor graphs to build network models. Theano is developed in Python and relies on the NumPy toolkit; it is well suited to designing and modeling large-scale deep learning algorithms, especially language modeling problems [50]. Theano's disadvantages are also obvious: it is slow both to import as a toolkit and during compilation, and the framework is no longer under active development, so it is not recommended as a research tool.
3.4 MXNet
MXNet is a deep learning framework officially used and maintained by Amazon. It has a flexible and efficient programming model that supports both imperative and symbolic styles [51] and can combine the two to provide users with a more comfortable programming environment. MXNet has many advantages: it supports distributed training on multiple CPUs/GPUs, and it is truly portable, from servers and workstations down to smartphones. In addition, MXNet supports JavaScript, Python, Matlab, C++ and other languages, meeting the needs of different users. However, because it is hard to get started with and its tutorials are incomplete, MXNet is not widely used by the community.
3.5 PyTorch
Facebook introduced the Torch framework early on, but it struggled to meet market demand because it lacked a Python interface. Facebook therefore built PyTorch, a deep learning framework designed specifically for Python programming and GPU acceleration [52, 53]. PyTorch builds models with a dynamic computation graph, giving users the flexibility to modify the graph. PyTorch encapsulates code efficiently, runs faster than frameworks such as TensorFlow and Keras, and provides users with a more user-friendly programming environment than other frameworks.
3.6 Keras
Keras is a neural network library derived from Theano. The framework is developed mainly in Python and has a complete function chain for building, debugging, validating and applying deep learning algorithms. Keras is designed for object-oriented programming and encapsulates many functions in a modular manner, simplifying the construction of complex models. Keras is also compatible with the TensorFlow and Theano backends and supports most mainstream algorithms, including convolutional and recurrent neural networks (Table 1).
4 Hardware Platform and Dedicated Chip
4.1 CPU
The CPU is one of the core parts of a computer, usually composed of a control unit, logic units and registers; its main functions are to read and execute computer instructions and to process data. As a general-purpose chip, the CPU was originally designed to be compatible with all kinds of data processing and computation, not as a special-purpose processor for neural network training and acceleration. Training a deep network involves a large number of matrix and vector computations, for which the CPU is not efficient, and upgrading CPUs to improve performance is not cost-effective. The CPU is therefore generally suitable only for small-scale network training.
4.2 GPU
In 1999, NVIDIA launched the GeForce 256, its first commercial GPU, and began developing high-performance GPU technology in the early 2000s. By 2004, GPUs had evolved to the point where they could carry early neural network computation. In 2006, Kumar Chellapilla [54] successfully used a GPU to accelerate a CNN, which is the earliest known attempt to use GPUs for deep learning.
The GPU is a microprocessor specialized for image computation. Unlike the general-purpose CPU, the GPU focuses on complex matrix and geometric computations and is especially good at image problems [55]. For complex deep learning models, the GPU can greatly increase training speed; for example, Coates [56] used GPUs to accelerate training in a target detection system and increased its running speed by nearly 90 times. Currently, companies such as Nvidia and Qualcomm have advanced capabilities in GPU hardware and acceleration technology and support multiple programming languages and frameworks. For example, PyTorch can use the GPU for model training through CUDA and cuDNN, developed by Nvidia, which can significantly reduce network training time.
4.3 ASIC
The ASIC is a special-purpose chip whose design can be customized for the problem at hand to meet different computing-power requirements; when handling deep learning problems, its performance and energy efficiency therefore far exceed those of general-purpose chips such as the CPU and GPU. For example, the TPU [57], launched by Google in 2015, is a representative ASIC: its execution speed and efficiency have been shown to be dozens of times higher than those of the CPU and GPU, and it has been applied in Google's search, maps, browser and translation software. In recent years, Google has released the second and third generations of the TPU and the TPU Pod [58], which not only greatly improve chip performance but also extend its application to the broader field of artificial intelligence. In addition, the Cambricon series of chips [59] developed by the Chinese Academy of Sciences also offers great advantages in speeding up neural networks. The ASIC has broad development prospects and application value, but because of its long development cycle, high investment risk and high technical requirements, only a few companies currently have the ability to develop one.
4.4 FPGA
The FPGA, or field-programmable gate array, is a reconfigurable circuit derived from custom integrated circuit (ASIC) technology. The FPGA operates directly through gate circuits, which gives it high speed and flexibility, and users can meet different needs by changing the wiring between the internal gates [60]. FPGAs generally have lower performance than ASICs, but their development cycle is shorter, the risk is lower, and the cost is also relatively lower; when processing specific tasks, efficiency can be further improved through parallel computing. Although the FPGA has many advantages and can adapt well to rapidly evolving deep learning algorithms, it is not recommended for individual users or small companies because of its cost and development difficulty (Table 2).
5 Conclusion
Focusing on the currently popular research fields of artificial intelligence, this paper has summarized the basic principles and application scenarios of the mainstream deep learning algorithms, and has introduced and compared common deep learning programming frameworks, hardware acceleration platforms and dedicated chips. Deep learning algorithms are clearly in a stage of rapid development and are driving the rise of surrounding industries. However, problems such as limited model variety and insufficient algorithm performance still constrain some industries, so how to innovate and improve algorithms remains the focus of future research. In addition, although intelligent deep learning algorithms bring much convenience to daily life, their application is not yet widespread, which means that promoting and applying deep learning more efficiently still has a long way to go.
References
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2016)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
Fu, M.C.: AlphaGo and Monte Carlo tree search: the simulation optimization perspective. In: Winter Simulation Conference Proceedings, vol. 26, pp. 659–670 (2016)
Hinton, G.E., Zemel, R.S.: Autoencoders, minimum description length and Helmholtz free energy. Adv. Neural Inf. Process. Syst. 6, 3–10 (1993)
Le, Q.V., Ngiam, J., Coates, A., et al.: On optimization methods for deep learning. In: International Conference on Machine Learning. DBLP (2011)
Scholkopf, B., Platt, J., Hofmann, T.: Greedy layer-wise training of deep networks. Adv. Neural Inf. Process. Syst. 19, 153–160 (2007)
Vincent, P., Larochelle, H., Bengio, Y., et al.: Extracting and composing robust features with denoising autoencoders. In: International Conference on Machine Learning. ACM (2008)
Rifai, S., Vincent, P., Muller, X., et al.: Contractive auto-encoders: explicit invariance during feature extraction. In: ICML, vol. 6, pp. 26–46 (2011)
Ma, X., Wang, H., Jie, G.: Spectral-spatial classification of hyperspectral image based on deep auto-encoder. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9(9), 1–13 (2016)
Vincent, P., Larochelle, H., Lajoie, I., et al.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(12), 3371–3408 (2010)
Jiang, X., Zhang, Y., Zhang, W., et al.: A novel sparse auto-encoder for deep unsupervised learning. In: Sixth International Conference on Advanced Computational Intelligence, vol. 26, pp. 256–261. IEEE (2013)
Hinton, G.E.: Learning and relearning in Boltzmann machines. Parallel Distrib. Process.: Explor. Microstruct. Cogn. 1, 2 (1986)
Smolensky, P.: Restricted Boltzmann machine. Stellenbosch Univ. 16(04), 142–167 (2014)
Salakhutdinov, R., Hinton, G.E.: Deep Boltzmann machines. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, vol. 5, pp. 448–455 (2009)
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Shi, X.G., Zhang, S.Q., Zhao, X.M.: Face expression recognition based on deep belief network and multi-layer perceptron. J. Small Micro Comput. Syst. 36(07) (2015). (in Chinese)
Tao, L.: A novel text classification approach based on deep belief network. In: Neural Information Processing Theory & Algorithms-international Conference. DBLP (2010)
Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160(1), 106–154 (1962)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25(2), 22–27 (2012)
Yi, S., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: IEEE Conference on Computer Vision & Pattern Recognition. IEEE (2014)
LeCun, Y., Boser, B., Denker, J., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
Silver, D., Huang, A., Maddison, C.J., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Abdel-Hamid, O., Mohamed, A.-R., et al.: Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 22(10), 1533–1545 (2014)
Donahue, J., Hendricks, L.A., Rohrbach, M., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 677–691. IEEE (2017)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 815–823. IEEE (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2014)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions. IEEE Computer Society (2014)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Bach, F., Blei, D. (eds.) Proceedings of Machine Learning Research, vol. 37, pp. 448–456 (2015)
Szegedy, C., Vanhoucke, V., Ioffe, S., et al.: Rethinking the inception architecture for computer vision. In: Proceedings of IEEE, pp. 2818–2826. IEEE (2016)
Szegedy, C., Ioffe, S., Vanhoucke, V., et al.: Inception-v4, inception-ResNet and the impact of residual connections on learning (2016)
Howard, A.G., Zhu, M., Chen, B., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications (2017)
Sandler, M., Howard, A., Zhu, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2018)
Howard, A., Sandler, M., Chu, G., et al.: Searching for MobileNetV3 (2019)
Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Greff, K., Srivastava, R.K., Koutník, J., et al.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2016)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
Cho, K., van Merriënboer, B., Gulcehre, C., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734 (2014)
Chung, J., Gulcehre, C., Cho, K.H., et al.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
Ying, X., Le, L., Zhou, Y., et al.: Deep learning for natural language processing. Handb. Stat. 56(20), 221–231 (2018)
Socher, R., Lin, C.Y., Ng, A.Y., et al.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28–July 2 (2011)
Abadi, M.: TensorFlow: learning functions at scale. In: ACM SIGPLAN International Conference on Functional Programming, pp. 1–12. ACM (2016)
Jia, Y., Shelhamer, E., Donahue, J., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
Al-Rfou, R., Alain, G., et al.: Theano: a Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688 (2016)
Chen, T., Li, M., Li, Y., et al.: MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015)
Ketkar, N.: Introduction to PyTorch. In: Deep Learning with Python, pp. 195–208. Apress (2017)
Sen, S., Sawant, K.: Face mask detection for covid_19 pandemic using pytorch in deep learning. In: IOP Conference Series: Materials Science and Engineering, vol. 1070, no. 1 (2021)
Chellapilla, K., Puri, S., Simard, P.: High performance convolutional neural networks for document processing. In: Tenth International Workshop on Frontiers in Handwriting Recognition (2006)
Shen, Y.: Radio and Television Information, no. 10, pp. 64–68 (2017)
Coates, A., Baumstarck, P., Le, Q., et al.: Scalable learning for object detection with GPU hardware. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009. IEEE (2009)
David, K.: Google TPU boosts machine learning. Microprocess. Rep. 31(5), 18–21 (2017)
Kumar, S., Bitorff, V., Chen, D., et al.: Scale MLPerf-0.6 models on Google TPU-v3 Pods (2019)
Editorial Department of the Journal: Cambrian released the first cloud artificial intelligence chip in China. Henan Sci. Technol. 647(14), 0–9 (2018)
Wei, J., Lin, J.: Deep learning algorithm, hardware technology and its application in future military. Electron. Packag. (12) (2019)
Zhang, W.: Deep neural network hardware benchmark testing status and development trend. Inf. Commun. Technol. Policy (012), 74–78 (2019)
Acknowledgements
The research work of this paper is supported by the National Natural Science Foundation of China (51978015 & 51578024).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this paper
Ji, J., Hu, Z., Zhang, W., Yang, S. (2022). Development of Deep Learning Algorithms, Frameworks and Hardwares. In: Qian, Z., Jabbar, M., Li, X. (eds) Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications. WCNA 2021. Lecture Notes in Electrical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-19-2456-9_71
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2455-2
Online ISBN: 978-981-19-2456-9
eBook Packages: Engineering (R0)