This chapter first introduces the development frameworks widely used in deep learning and their characteristics, and then illustrates one representative framework, TensorFlow, in detail. It aims to help readers deepen their understanding through practice after learning the concepts at the theoretical level, and to tackle practical problems. The chapter also introduces MindSpore, the framework developed by Huawei, which offers some advantages that many of today’s frameworks cannot match. Readers can decide whether to read the MindSpore section based on their own needs.

4.1 Introduction to Deep Learning Frameworks

4.1.1 Introduction to PyTorch

PyTorch is a deep learning development framework launched by Facebook. It is a machine learning scientific computing package that grew out of Torch, a scientific computing framework that supports a large number of machine learning algorithms and provides a tensor library similar to NumPy. Although Torch is exceptionally flexible, it is based on Lua, a less widely used programming language, which kept it from being widely adopted. The Python-based PyTorch was therefore introduced.

PyTorch has the following characteristics.

  1. Put Python First

    PyTorch is not simply a Python binding on top of a monolithic C++ framework; it supports Python access at a fine-grained level. You can use PyTorch as naturally as NumPy or SciPy, which not only makes PyTorch easier to learn but also keeps the code largely consistent with native Python code.

  2. Dynamic Neural Network

    Some mainstream frameworks, such as TensorFlow 1.x, do not support dynamic neural networks. TensorFlow 1.x requires a static computational graph to be constructed in advance and then run repeatedly by feeding data and calling the run() method. PyTorch is much less complicated: a PyTorch program can construct and adjust computational graphs dynamically at runtime.

  3. Easy to Debug

    PyTorch generates dynamic graphs while it is running. Developers can therefore pause the interpreter in a debugger session and inspect the output of a particular node.

In addition, PyTorch tensors can run on both CPUs and GPUs, which can substantially accelerate computation.

4.1.2 Introduction to MindSpore

Huawei designed the core architecture of the MindSpore framework around user-friendliness, efficiency and flexibility. The architecture consists of four layers: the top layer is MindSpore’s native expression of computational graphs; the second layer is the parallel pipeline execution layer, mainly designed to optimize deep graph computation and operator fusion; the third layer includes an on-demand collaborative distributed architecture, a communication library, and a basic framework for task scheduling and distributed deployment; and the bottom layer is the execution efficiency layer. The core architecture of MindSpore enables automatic differentiation, automatic parallelization and automatic tuning, offering solid support for an all-scenario API in line with Huawei’s design goals of friendly development, efficient operation and flexible deployment.

Automatic differentiation lies at the core of an AI framework and plays a decisive role in its programming paradigm. A deep learning model is trained through forward and backward computation. Consider the mathematical expression shown in Fig. 4.1. The forward computation, indicated by the dark arrow in Fig. 4.1, finds the output f; the backward computation then obtains the derivatives of f with respect to x and y based on the chain rule. When algorithm engineers design a model, they only write the forward computation; the backward computation is carried out by the framework’s automatic differentiation.

Fig. 4.1 Mathematical expression
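To make the forward and backward passes concrete, the following minimal sketch differentiates a small expression by hand, which is the work that automatic differentiation takes over from the engineer. The expression f = (x + y) * (x * y) and the name forward_backward are illustrative assumptions; they are not taken from Fig. 4.1.

```python
def forward_backward(x, y):
    """Evaluate f = (x + y) * (x * y) and its derivatives with the chain rule.

    The expression is a hypothetical example chosen for illustration only.
    """
    # Forward computation: evaluate the intermediate nodes, then the output.
    a = x + y
    b = x * y
    f = a * b

    # Backward computation: start from df/df = 1 and walk the graph in reverse.
    df_da = b          # f = a * b, so df/da = b
    df_db = a          # f = a * b, so df/db = a
    # Both a and b depend on x and on y, so their contributions are summed.
    df_dx = df_da * 1.0 + df_db * y   # da/dx = 1, db/dx = y
    df_dy = df_da * 1.0 + df_db * x   # da/dy = 1, db/dy = x
    return f, df_dx, df_dy


f, df_dx, df_dy = forward_backward(2.0, 3.0)
print(f, df_dx, df_dy)
```

A framework with automatic differentiation records the forward operations and generates the backward part itself, so the user writes only the first three lines of the function body.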

Moreover, as NLP models grow, the memory overhead of training large models such as BERT (340M) and GPT-2 (1542M) has exceeded the capacity of a single card, so the model needs to be distributed across multiple GPUs for execution. At present, the commonly adopted method is hand-engineered model parallelism, which entails designing the model partitioning by hand and being aware of the cluster topology. Developing such a model is very difficult, let alone tuning it and ensuring high performance.

MindSpore adopts automatic graph partitioning, which partitions the graph based on the input and output dimensions of the operators and combines data parallelism with model parallelism. Its scheduling is aware of the cluster topology: it schedules subgraph execution automatically and minimizes communication overhead. It keeps the logic of the standalone code while putting model parallelism in place, improving the development efficiency of model parallelism tenfold compared with hand-engineered parallelism.

Model execution on chips with very high computing power currently faces major challenges, such as the memory wall, high interaction overhead and difficulty in supplying data fast enough. Because part of the operation is executed on the host and part on the device, the host-device interaction overhead can even exceed the execution overhead, which lowers accelerator utilization.

MindSpore offers chip-oriented deep graph optimization, which reduces synchronization waits and maximizes the parallelism of data, computation and communication by sinking the data and the computational graph to the Ascend AI processor.

MindSpore also uses on-device execution for decentralization. Through adaptive graph partitioning optimization driven by gradient data, MindSpore achieves a decentralized, autonomous AllReduce, keeps gradient aggregation in step, and fully pipelines computation and communication.

MindSpore also adopts an on-demand collaborative distributed architecture that coordinates device, edge and cloud. A unified intermediate representation (IR) of the model guarantees a consistent deployment experience, and graph optimization based on software-hardware collaboration hides the differences between scenarios. The device-cloud collaborative federated meta-learning strategy breaks the boundary between device and cloud, and enables real-time updates of the multi-device collaborative model.

4.1.3 Introduction to TensorFlow

TensorFlow is a deep learning framework developed by Google. It is Google’s second-generation open-source software library for numerical computation. The framework supports a wide range of deep learning algorithms and platforms, and offers relatively high system stability.

TensorFlow has the following characteristics.

  1. Support Multiple Platforms

    TensorFlow runs on all platforms that provide a Python environment. To use a supported GPU, however, TensorFlow relies on additional software such as the NVIDIA CUDA Toolkit and cuDNN.

  2. Support GPU

    TensorFlow supports specific NVIDIA GPUs that are compatible with the required version of the CUDA Toolkit and meet specific performance criteria.

  3. Support Distributed Computation

    TensorFlow supports distributed computation. It allows portions of the graph to be computed on different processes, which may be on completely different servers.

  4. Support Multiple Languages

    The major programming language of TensorFlow is Python. Developers can also use C++, Java and Go, but these APIs carry no stability guarantees, and neither do the many third-party bindings for C#, Haskell, Julia, Rust, Ruby, Scala, R and even PHP. Google has also released TensorFlow Lite, a library optimized for mobile devices, so that TensorFlow applications can run on Android.

  5. Flexible and Expandable

    One major advantage of TensorFlow is its modular, extensible and flexible design. Developers can move models across CPU, GPU or TPU processors with only a few code changes. Python developers can use TensorFlow’s raw, low-level API (the core API) to develop their own models, and use the high-level APIs for built-in models. TensorFlow has many built-in and distributed libraries, and a higher-level deep learning framework such as Keras can be layered on top of it to serve as a high-level API.

  6. Strong Computing Performance

    Although TensorFlow performs best on Google’s Tensor Processing Units (TPUs), it also attains high performance on other platforms: not just servers and desktop systems, but also embedded systems and mobile devices.

    TensorFlow’s distributed deployment allows it to run on different computer systems. Models can be trained on systems as small as a smartphone or as large as a cluster of computers. In the single-GPU mode typical of a Windows environment, most deep learning frameworks rely on cuDNN, so as long as there is no obvious difference in hardware computing power or memory allocation, their training speeds do not differ much. For large-scale deep learning, however, the sheer volume of data makes it hard for a single machine to complete training in time, and this is where TensorFlow’s support for distributed training helps.

    TensorFlow is regarded as one of the most user-friendly libraries for deep learning, and it makes deep learning development a much easier task. Because it is open source, anyone can help maintain and update TensorFlow to improve its efficiency.

    Keras, which has the third-highest number of stars on GitHub, is encapsulated as a high-level API in TensorFlow 2.0. Thanks to Keras, TensorFlow 2.0 is more flexible and easier to debug.

    In TensorFlow 1.0, after a tensor is created, its result cannot be returned directly. Instead, you have to create a session, which involves the concept of a graph, and call session.run to execute the operation. This style resembles the hardware description language VHDL. Compared with simpler frameworks such as PyTorch, these extra steps create additional hurdles for developers. TensorFlow 1.0 is often criticized for its complicated debugging experience, its confusing API and its steep learning curve. It remains difficult to use even for experienced developers, so many of them turned to PyTorch.

4.2 TensorFlow 2.0 Basics

4.2.1 Introduction to TensorFlow 2.0

The core feature of TensorFlow 2.0 is eager execution, a dynamic graph mechanism that lets users write and debug models as in ordinary programming, making TensorFlow easier to learn and apply. TensorFlow 2.0 supports more platforms and languages, and improves compatibility across components by standardizing exchange formats and aligning APIs. Version 2.0 cleans up deprecated APIs and reduces duplicated ones to avoid confusing users. It provides compatibility modules for TensorFlow 1.x. The tf.contrib module is no longer used: the modules that are still maintained have been moved elsewhere, and the rest have been deleted.

4.2.2 Introduction to Tensors

The most fundamental data structures in TensorFlow are tensors, which encapsulate all the data.

By definition, a tensor is a multi-dimensional array: a rank 0 tensor is a scalar, a rank 1 tensor is a vector, and a rank 2 tensor is a matrix. In TensorFlow, tensors can be divided into constants and variables.
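The ranks and the constant/variable distinction can be illustrated with a short sketch. It assumes TensorFlow 2.x is installed; the variable names are chosen here for illustration.

```python
import tensorflow as tf

# Constants are immutable tensors.
scalar = tf.constant(3.0)                        # rank 0: a scalar
vector = tf.constant([1.0, 2.0, 3.0])            # rank 1: a vector
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # rank 2: a matrix

ranks = [t.shape.rank for t in (scalar, vector, matrix)]
print(ranks)

# Variables hold state that can be updated in place, e.g. model weights.
weights = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
weights.assign_add(tf.ones((2, 2)))              # add 1 to every element
print(weights.numpy())
```

Constants suit fixed inputs and hyperparameters, while variables are the natural choice for parameters that training has to modify.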

4.2.3 TensorFlow 2.0 Eager Execution

TensorFlow 1.0 adopts a static graph mechanism, which separates the definition of computations from their execution by means of a graph (also known as a computational graph). This is a declarative programming model. Under the static graph mechanism, you first build a graph, then start a session and feed in the data to obtain the execution result.

The static graph mechanism has many advantages in distributed training, performance optimization and deployment, but it is hard to debug, much like calling into a compiled C program whose internals cannot be stepped through. Eager execution, which is based on dynamic graphs, was introduced for this reason.

Eager execution is an imperative programming environment consistent with native Python. The execution results will be immediately returned after an operation is performed.
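A minimal sketch of what immediate execution looks like in practice, assuming TensorFlow 2.x with its default settings: the result of an operation is available as soon as the line runs, with no session involved.

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is on by default.
print(tf.executing_eagerly())

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)      # executed immediately; no graph or session needed

# The concrete values can be inspected right away, e.g. in a debugger.
result = b.numpy()
print(result)
```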

4.2.4 TensorFlow 2.0 AutoGraph

In TensorFlow 2.0, eager execution is enabled by default. For users, eager execution is straightforward and flexible, and it is easier and faster to work with. However, this can come at the cost of performance and deployability.

To obtain the best performance and ensure that the model can be deployed anywhere, we can apply the @tf.function decorator to build graphs from the program, which makes the Python code run more efficiently.

A notable feature of tf.function is AutoGraph, which converts a Python function made up of TensorFlow operations into a graph so that the function executes in graph mode. In this way, the function is encapsulated as a graph of TensorFlow operations.
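As a rough illustration, the sketch below decorates an ordinary Python function with @tf.function. The function name sum_even_squares and its logic are hypothetical examples; the point is that AutoGraph rewrites the tensor-dependent for loop and if statement into graph operations.

```python
import tensorflow as tf


@tf.function
def sum_even_squares(n):
    """Sum i*i over the even integers i in [0, n)."""
    total = tf.constant(0)
    for i in tf.range(n):      # loop over a tensor: becomes a graph loop
        if i % 2 == 0:         # condition on a tensor: becomes a graph branch
            total += i * i
    return total


# The first call traces the function into a graph; later calls with the
# same input signature reuse that graph.
result = sum_even_squares(tf.constant(5))
print(result.numpy())
```

The function body stays plain Python, which keeps it readable, while execution happens in graph mode.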

4.3 Introduction to TensorFlow 2.0 Modules

4.3.1 Introduction to Common Modules

The functions directly under the tf module of TensorFlow 2.0 handle common operations.

For instance, tf.abs (absolute value), tf.add (element-wise addition) and tf.concat (tensor concatenation) perform operations that, for the most part, NumPy can also carry out.

The tf module also includes the following submodules:

  1. tf.errors: exception types for TensorFlow errors.

  2. tf.data: operations on datasets. For example, an input pipeline created with tf.data can read training data. The module also supports convenient input of data from memory (such as NumPy arrays).

  3. tf.io.gfile (tf.gfile in TensorFlow 1.x): operations on files. The functions in this module perform file I/O and can copy and rename files.

  4. tf.image: operations on images. Like OpenCV, the functions in this module process images, covering brightness and saturation adjustment, flipping, cropping, resizing, color space conversion (from RGB to HSV, YUV, YIQ and grayscale), rotation and Sobel edge detection. It amounts to a small OpenCV-style image processing toolkit.

  5. tf.keras: the Python API of Keras. This is a relatively large module that contains all kinds of network operations.

  6. tf.nn: functional support for neural networks. It is the most commonly used module for building classic convolutional networks. In TensorFlow 1.x it also contains the rnn_cell sub-module (tf.compat.v1.nn.rnn_cell in TensorFlow 2.0), which is used for building recurrent neural networks. Frequently used functions in this module include average pooling avg_pool(), batch normalization batch_normalization(), bias addition bias_add(), two-dimensional convolution conv2d(), random dropout of neurons dropout(), the ReLU activation relu(), sigmoid cross entropy computed from logits sigmoid_cross_entropy_with_logits(), and the softmax activation softmax().
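A brief sketch of two of the modules listed above, assuming TensorFlow 2.x and NumPy are installed. The array contents and the batch size are arbitrary illustration values.

```python
import numpy as np
import tensorflow as tf

# tf.data: build an input pipeline from in-memory NumPy arrays.
features = np.arange(12, dtype=np.float32).reshape(6, 2)
labels = np.array([0, 1, 0, 1, 0, 1], dtype=np.int32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

# Six samples in batches of four give one full batch and one partial batch.
batch_shapes = [x.shape.as_list() for x, _ in dataset]
print(batch_shapes)

# tf.nn: functional building blocks for neural networks.
activated = tf.nn.relu(tf.constant([-1.0, 0.0, 2.0]))   # negatives become 0
probs = tf.nn.softmax(tf.constant([[1.0, 1.0, 1.0]]))   # rows sum to 1
print(activated.numpy(), probs.numpy())
```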

4.3.2 Keras Interface

Keras is the interface that TensorFlow 2.0 recommends for building networks. The tf.keras.layers module includes all the popular neural network layers.

Keras is a high-level API designed for building and training deep learning models. It can be used for fast prototyping, advanced research and production. Keras has the following three major advantages.

  1. User-Friendly

    Keras has a simple and consistent interface optimized for common use cases, and it provides clear and actionable feedback on user errors.

  2. Modular and Composable

    You can build Keras models by connecting configurable building blocks together, with almost no restrictions.

  3. Easy to Extend

    You can write custom building blocks to express new research ideas, create new layers and loss functions, and develop advanced models.

The following modules are commonly used in Keras.

  1. tf.keras.layers

    The tf.keras.layers namespace provides interfaces for a wide range of common network layers, such as fully connected layers, activation function layers, pooling layers, convolutional layers and recurrent neural network layers. For these layer classes, forward computation is accomplished by specifying the relevant parameters when creating the layer and then calling its __call__ method. When __call__ is invoked, Keras automatically runs the forward propagation logic of the layer, which is implemented in the call function of the class.

  2. Network Container

    In a typical network, users have to call the instance of each layer manually to complete forward propagation. As the network gets deeper, this part of the code becomes very redundant. The network container Sequential provided by Keras encapsulates multiple network layers into one large network model, so users only need to call the model instance once to pass the data sequentially from the first layer to the last.
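The difference between calling layers one by one and using the Sequential container can be sketched as follows. The layer sizes and the input tensor are arbitrary illustration values, and TensorFlow 2.x is assumed.

```python
import numpy as np
import tensorflow as tf

x = tf.ones((2, 3))   # a batch of two samples with three features each

# Calling a layer instance invokes __call__, which runs the layer's call().
layer1 = tf.keras.layers.Dense(4, activation="relu")
layer2 = tf.keras.layers.Dense(2)

# Manual forward propagation: every layer is called explicitly.
manual_output = layer2(layer1(x))

# Sequential wraps the same layers, so one call runs them in order.
model = tf.keras.Sequential([layer1, layer2])
container_output = model(x)

print(container_output.shape)
print(np.allclose(manual_output.numpy(), container_output.numpy()))
```

Because both paths share the same layer objects and weights, they produce the same output; the container simply removes the per-layer bookkeeping.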

4.4 Get Started with TensorFlow 2.0

4.4.1 Environment Setup

To set up a TensorFlow 2.0 development environment, proceed as follows.

  1. Setup in Windows Environment

    (a) Operating system: Windows 10.

    (b) Python development environment: Anaconda3 (a version for Python 3), which comes with pip.

    Install TensorFlow: open Anaconda Prompt and install TensorFlow directly by running the pip command.

    Fig. 4.2 Installation command

    As shown in Fig. 4.2, type the following command in Anaconda Prompt.

    pip install tensorflow

  2. Linux Environment Setup

    The simplest way to install TensorFlow in a Linux environment is to use pip. If the installation is slow, you can switch to the Tsinghua Open Source Mirror by running the following commands in the terminal.

    pip install pip -U
    pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

    Finally, run the pip installation command.

    pip install tensorflow==2.0.0
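After installation, a quick check such as the following sketch can confirm that TensorFlow imports correctly and show which version is active. It uses tf.config.experimental.list_physical_devices, which is assumed here because it is available in TensorFlow 2.0; the printed values depend on the machine.

```python
import tensorflow as tf

# Version string of the installed package, e.g. "2.0.0".
installed_version = tf.__version__
print("TensorFlow version:", installed_version)

# GPUs that TensorFlow can see; an empty list means CPU-only execution.
gpus = tf.config.experimental.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", len(gpus))
```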

4.4.2 Development Process

The development process of TensorFlow 2.0 includes five steps.

  1. Data preparation, including data exploration and data processing.

  2. Building the network, including defining the network structure, defining the loss function, choosing the optimizer, and defining the model evaluation standards.

  3. Training and validating the model.

  4. Saving the model.

  5. Restoring and calling the model.
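The last two steps, saving and restoring, can be sketched with a tiny untrained model. The HDF5 file name demo_model.h5, the temporary directory and the layer sizes are assumptions made for this illustration; TensorFlow 2.x with h5py is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A deliberately small model; its weights are randomly initialized.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, input_shape=(4,)),
])

# Step 4: save the architecture and weights to a single HDF5 file.
path = os.path.join(tempfile.mkdtemp(), "demo_model.h5")
model.save(path)

# Step 5: restore the model from the file and call it on new data.
restored = tf.keras.models.load_model(path)
x = np.ones((2, 4), dtype=np.float32)
same = np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)
```

The restored model should give the same predictions as the original, since it carries the same weights.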

The process above is illustrated below with a real project: MNIST handwritten digit recognition.

Handwritten digit recognition is a common image recognition task in which a computer must recognize the digits in pictures of handwriting. Unlike printed text, handwriting differs in style from person to person, and the size of the handwritten digits varies, which makes recognition much harder for computers. In this project, deep learning and the TensorFlow framework are used to recognize handwritten digits from the MNIST dataset.

  1. Data Preparation

    Download the MNIST dataset.

    The MNIST dataset is composed of a training set and a test set.

    The training set contains 60,000 images of handwritten digits and the corresponding labels.

    The test set contains 10,000 images of handwritten digits and the corresponding labels.

    Figure 4.3 shows an example from the dataset.

  2. Building the Network

    The model used in this project is Softmax regression. The Softmax function, also known as the normalized exponential function, generalizes the binary Sigmoid function to multiclass classification. Figure 4.4 shows how the Softmax function is computed.

    Building the model is the key to network construction. Figure 4.5 shows the calculation process of the model, which defines how the model is built and how the output is generated from the input.

    The core TensorFlow code implementing the Softmax regression model is presented in Fig. 4.6.

    To create the model, two things must be determined first.

    Loss function: In both machine learning and deep learning, we usually define a loss function as an indicator of how well a model fits, and then minimize it. This indicator is called the cost or the loss. The loss function used in this project is the cross-entropy loss.

    Optimizer: Once the loss function is defined, an optimizer is used to find the parameters that minimize its value. The optimizers most frequently used for finding optimal parameters in machine learning are based on gradient descent.

  3. Training and Validating the Model

    The model can be trained on the data batch by batch, or over repeated passes through the whole dataset. In this project, we train directly with model.fit and iterate over the whole training set five times, as shown in Fig. 4.7. One epoch is one complete pass over the training data.

    As shown in Fig. 4.8, we test and validate the model on the test set by comparing the predicted labels with the actual labels, counting the correct predictions, and estimating the accuracy of the model on the test set.
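The idea behind the gradient descent-based optimizer mentioned in the network-building step can be shown without any framework. The sketch below minimizes a hypothetical one-parameter loss, loss(w) = (w - 3)^2; the function name, learning rate and step count are illustrative assumptions, not the settings used in the project.

```python
def gradient_descent(grad_fn, start, learning_rate=0.1, steps=100):
    """Repeatedly move the parameter against the gradient of the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad_fn(w)
    return w


# Hypothetical loss: loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# Its minimum lies at w = 3.
w_opt = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(w_opt)
```

A framework optimizer applies the same update rule to every parameter of the model, using gradients obtained by automatic differentiation.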

Fig. 4.3 An example from the dataset

Fig. 4.4 Computation of the softmax function
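Since the figure itself is not reproduced here, the standard definition can serve as a reference: softmax(z)_i = exp(z_i) / sum_j exp(z_j). The sketch below implements it in plain Python together with the cross-entropy loss for a single sample. The function names and example scores are illustrative, and subtracting the maximum score is a common numerical-stability step added here rather than something taken from the figure.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)                       # avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probabilities[true_class])


probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, true_class=0)
print(probs, loss)

# With two classes and scores [z, 0], softmax reduces to the Sigmoid of z.
print(softmax([1.0, 0.0])[0], 1.0 / (1.0 + math.exp(-1.0)))
```

The loss is small when the model assigns a high probability to the correct class and grows as that probability approaches zero.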

Fig. 4.5 Calculation process of the model

Fig. 4.6 Code implementing softmax

Fig. 4.7 Training process

Fig. 4.8 Test validation
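Because the code in Figs. 4.6 to 4.8 is not reproduced in the text, the following is only a sketch of how the build, train and validate steps might look with tf.keras; it is not the code from the figures. The helper names build_model and train_and_evaluate, the "sgd" optimizer and the single Dense softmax layer are assumptions chosen to match the description (Softmax regression, cross-entropy loss, gradient descent, five epochs).

```python
import numpy as np
import tensorflow as tf


def build_model():
    """Softmax regression: flatten each 28x28 image, then one softmax layer."""
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="sgd",                          # gradient descent optimizer
        loss="sparse_categorical_crossentropy",   # cross-entropy, integer labels
        metrics=["accuracy"],                     # evaluation standard
    )
    return model


def train_and_evaluate(x_train, y_train, x_test, y_test, epochs=5):
    """Train with model.fit, then report test accuracy and predicted labels."""
    model = build_model()
    model.fit(x_train, y_train, epochs=epochs, verbose=0)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    # The predicted label is the class with the highest probability.
    predicted = np.argmax(model.predict(x_test, verbose=0), axis=1)
    return accuracy, predicted
```

To run this on MNIST, the data can be loaded with tf.keras.datasets.mnist.load_data(), the pixel values scaled to the range 0 to 1 by dividing by 255.0, and the four arrays passed to train_and_evaluate with epochs=5.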

4.5 Chapter Summary

This chapter introduces the development frameworks commonly used in the AI industry and their characteristics, with particular emphasis on the module composition and basic development procedure of the TensorFlow framework. It also presents a project to show how TensorFlow functions and modules are applied in practice. Readers can use this chapter as a guide when setting up the framework environment and running the sample project. We hope these steps give readers a deeper understanding of AI.

4.6 Exercises

  1. As AI is applied more and more widely, what are the most popular AI development frameworks today? What characteristics do they have?

  2. TensorFlow is a representative AI development framework that has attracted many users. The most important change in its history is the transition from TensorFlow 1.0 to TensorFlow 2.0. Please describe the differences between the two versions.

  3. TensorFlow has a variety of modules designed to meet users’ needs. Please describe three common TensorFlow modules.

  4. Compared with other frameworks, Keras is quite special in that it is a frontend framework. Please briefly describe the features of the Keras interface.

  5. Please try to configure an AI development framework according to the guidelines provided in this chapter.