As cloud computing technologies and applications mature, their range of applications continues to expand, driving a huge transformation from client/server models to various forms of cloud services and ushering in a new era of cloud economy. Container technology is a mainstream technology in the cloud economy and IT ecosystem: it effectively divides a single operating system's resources into isolated groups so as to better balance conflicting resource demands among those groups. Container technology, which spans core technology, platform technology, and supporting technology, can significantly improve production efficiency and has attracted widespread attention in the industry.

With its agile features that fit naturally with DevOps, container technology has brought a new and transformative force to the cloud computing market, especially the PaaS market. Container technology represented by Docker has developed rapidly and now has a well-established ecosystem, and Kubernetes is a representative product of this new round of change. Combining theory and practice, this chapter introduces container technology and container orchestration from the aspects of platform architecture, core functions, network, security, and resource management, so that readers can gain a more comprehensive understanding of the Docker and Kubernetes ecosystems.

7.1 Overview of Container Technology

7.1.1 Introduction to Container Technology

Cloud computing solves the problem of elasticity in computing, networking, and storage in computer infrastructure, but it leaves behind two problems: the scalability and migration of applications. In a cloud computing environment, two solutions have been proposed. One is automated scripting, but environments vary widely, and a script that runs correctly in one environment often fails in another. The other is the virtual machine image; however, virtual machine images are large, and copying and downloading them is time-consuming.

In order to solve the above problems, container technology was proposed. Drawing on the container model of traditional transportation, it was suggested that applications be packaged in a container-like manner, i.e., that any application and the dependencies it needs to run be packaged into a lightweight, portable, self-contained container. A form of virtualization technology isolates the different processes running on a host so that containers do not affect one another or the host operating system, enabling applications to run in the same way anywhere. Developers build and test containers on their own computers.

Such containers, created and tested on one computer, can run on virtual machines, physical servers, or public cloud hosts in production systems without any modification.

  1. 1.

    Container and virtual machine

    When it comes to containers, you have to compare them to virtual machines because both provide encapsulation and isolation for your application.

    Traditional virtualization technologies, such as VMware, KVM, and Xen, aim to create complete virtual machines. To run an application, a virtual machine must contain an entire guest operating system in addition to the application itself and its dependencies.

    Containers consist of the app itself, and the IT resources on which the app depends, such as the libraries or other applications that the app requires. Containers run in the host operating system's user space and are isolated from other processes of the operating system, which is significantly different from virtual machines. Figure 7.1 shows the difference between a container and a virtual machine.

    As Figure 7.1 shows, because all containers share the host operating system, a container is much smaller than a virtual machine. In addition, starting a container does not require booting an entire operating system, so container deployment and startup are faster, less expensive, and easier to migrate.

  2. 2.

    The evolution of containers

    Container technology dates back to the chroot command in the UNIX operating system in 1979, originally intended to facilitate switching root directories, providing isolation of file system resources for each process, which is also the origin of the idea of operating system virtualization.

    FreeBSD Jails was released in 2000; it absorbed and improved on the chroot idea in BSD. In addition to file system isolation, FreeBSD Jails adds isolation of user and network resources, and each Jail can be assigned a separate IP address for relatively independent software installation and configuration.

    Linux VServer was released in 2001. Linux VServer continues the idea of FreeBSD Jails, isolating resources such as the file system, CPU time, network addresses, and memory on an operating system. Each partition is called a Security Context, and the internal virtualization system is called a VPS.

    In 2004, Sun released Solaris Containers. Solaris Containers was released as a feature of Solaris 10 and combines system resource controls with the boundary isolation provided by Zones, where a Zone acts as a fully isolated virtual server within a single operating system instance.

    In 2005, SWsoft released OpenVZ. OpenVZ is very similar to Solaris Containers, providing virtualization, isolation, resource management, and checkpointing through a patched Linux kernel. OpenVZ marked the point at which kernel-level virtualization truly entered the mainstream, and the relevant technologies were subsequently added to the kernel.

    In 2006, Google released Process Containers. Process Containers recorded and isolated each process's resources (including CPU, memory, disk I/O, network, etc.); it was renamed Control Groups (Cgroups) and merged into Linux kernel 2.6.24 in 2007.

    In 2008, the first fairly complete container technology, Linux Containers (LXC), became available, implemented on the basis of the Cgroups and Linux Namespaces facilities that had been added to the kernel. LXC runs on any vanilla Linux kernel without requiring kernel patches.

    In 2011, Cloud Foundry released Warden. Unlike LXC, Warden can work on any operating system, runs as a daemon, and provides an API for managing containers.

    In 2013, Google open-sourced its container technology stack, lmctfy. Google started this project to provide high-performance, high-resource-utilization, near-zero-cost virtualization technology through containers. The monitoring tool cAdvisor used in Kubernetes originated from the lmctfy project. In 2015, Google donated the core technology of lmctfy to libcontainer.

    Docker was born in 2013. Docker was originally an internal project of DotCloud, a PaaS company and the predecessor of Docker Inc. Like Warden, Docker initially used LXC and later replaced it with libcontainer. Unlike other container technologies, Docker built a complete ecosystem around containers, including container image standards, the container Registry, REST APIs, a CLI, the container cluster management tool Docker Swarm, and more.

    In 2014, CoreOS created rkt, a container engine rewritten to address Docker's perceived security shortcomings; the surrounding CoreOS toolset includes the service discovery tool etcd and the network tool flannel.

    In 2016, Microsoft released Hyper-V Container, a Windows-based container technology. Hyper-V Container works like container technology under Linux, ensuring that processes running in a container are isolated from the outside world while combining the security of virtual machines with the lightness of containers.

  3. 3.

    Container standardization

    Today, Docker is almost synonymous with containers, and many people think Docker is the container. In fact, this is a misunderstanding: in addition to Docker there are other container technologies, such as CoreOS rkt, so Docker is not alone in the container world. This also makes disagreement easy. Any technology requires a standard to regulate it; otherwise it can easily lead to fragmented implementations, with much conflict and redundancy. As a result, the Open Container Initiative (OCI) was established in 2015 by Google, Docker, CoreOS, IBM, Microsoft, Red Hat, and others, and the first open container standards were published in April 2016. The standards consist primarily of the Runtime Specification (runtime-spec) and the Image Specification (image-spec). The introduction of standards helps bring stability to the growing market, so enterprises can use container technology with confidence: users can freely choose different container runtimes when packaging and deploying applications, and image packaging, building, certification, deployment, and naming can likewise follow unified norms. These two standards mainly contain the following (a brief command-line sketch after these lists illustrates both).

    1. (1)

      Container running standard

      • ① Creating: The container is being created with the create command.

      • ② Created: The container has been created but has not yet run, indicating that there are no errors in the image or configuration and that the container can run on the current platform.

      • ③ Running: The container is running, the process inside it is up, and the user-specified tasks are being performed.

      • ④ Stopped: The container has stopped, either because it ran to completion, because an error occurred, or because the stop command was issued. In this state, the container still has information saved on the platform and has not been completely deleted.

    2. (2)

      Container image standard

      1. (1)

        File system: a file system saved layer by layer, where each layer records the changes relative to the layer beneath it: which files the layer should save, how added, modified, and deleted files are represented, and so on.

      2. (2)

        Config files: the hierarchical information of the file system (the hash values of each layer as well as historical information), plus some information required by the container at runtime (such as environment variables, working directory, command parameters, and mount list), specifying the configuration of the image on a particular platform and system. This is close to what we see when running docker inspect <image_id>.

      3. (3)

        Manifest file: the index of the image's config file and layers; the manifest file holds the information for the image on the current platform.

      4. (4)

        Index file: an optional file that points to the manifest files for different platforms. This file allows one image name to be used across platforms; each platform has a different manifest file, and the index references them all.
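
      The OCI lifecycle states and image artifacts above can be observed with ordinary tooling. The following is a minimal, hedged sketch (not taken from this book's figures): it assumes the reference runtime runc and the Docker CLI are installed, and that a prepared OCI bundle (a rootfs/ directory plus a config.json with "terminal": false) is available for the runtime part.

# Runtime spec: drive a container through Creating/Created/Running/Stopped with runc
runc create demo     # Creating -> Created
runc state demo      # reports "status": "created"
runc start demo      # Created -> Running
runc kill demo TERM  # ask the process to stop (Running -> Stopped)
runc delete demo     # remove the stopped container

# Image spec: look at an image's config and manifest with the Docker CLI
docker image inspect busybox      # config: env, entrypoint, layer digests
docker manifest inspect busybox   # per-platform manifests (may need experimental CLI on older Docker versions)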

  4. 4.

    Container scenarios

    The birth of container technology solves the technical implementation of the PaaS layer. Technologies such as OpenStack and CloudStack are primarily used to solve problems at the IaaS layer. So what are the main scenarios in which container technology is used? There are several mainstream applications at present.

    1. (1)

      Containerized traditional applications

      Containers not only improve the security and portability of existing applications but also save money. Every enterprise environment has a set of older applications that serve customers or automate business processes. Even large-scale monolithic applications benefit from container isolation through enhanced security and portability and reduced cost. Once containerized, these applications can scale out with additional services or transition to a microservices architecture.

    2. (2)

      Continuous integration and continuous deployment

      Accelerate application pipeline automation and application deployment with Docker. Data suggest that using Docker can increase delivery speed by more than 13 times. Modern development processes are fast, continuous, and automated, with the ultimate goal of developing more reliable software. With Continuous Integration (CI) and Continuous Deployment (CD), IT teams can integrate new code every time a developer checks it in and tests it successfully. As the basis of a development and operations (DevOps) approach, CI/CD creates a real-time feedback loop that continuously delivers small iterative changes, accelerating change and improving quality. CI environments are typically fully automated: a git push command triggers the tests, a new image is automatically built when they succeed, and the image is then pushed to the Docker image registry. With subsequent automation and scripting, a container based on the new image can be deployed to a preview environment for further testing (a hedged sketch of such a pipeline appears after this list).

    3. (3)

      Microservices

      Use microservices to accelerate application architecture modernization. Application architecture is moving from monolithic code bases developed with the waterfall model to loosely coupled services that are developed and deployed independently; thousands of such services are connected to form an application. Docker allows developers to choose the tool or technology stack best suited to each service and isolate services to eliminate potential conflicts and avoid the dependency "matrix of hell." These containers can easily be shared, deployed, updated, and scaled instantly, independently of the application's other service components. Docker's end-to-end security features enable teams to build and run a least-privilege microservices model, where the resources a service requires (other applications, secrets, computing resources, and so on) are created and accessed in real time.

    4. (4)

      IT infrastructure optimization

      By making the most of your infrastructure, you can save money. Docker and containers help optimize the utilization and cost of your infrastructure. Optimization is not just about cutting costs; it is also about ensuring that the right resources are used effectively at the right time. Containers are a lightweight way to package and isolate application workloads, so Docker allows multiple workloads to run without conflict on the same physical or virtual server. Enterprises can consolidate data centers, merge acquired IT resources, and gain the mobility to move to the cloud, while reducing the maintenance burden of operating systems and servers.
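
    The CI/CD scenario in (2) above can be made concrete with a short, hedged sketch. The commands below are illustrative only; the registry address registry.example.com, the image name myapp, and the test script run_tests.sh are hypothetical placeholders, and a real pipeline would run these steps inside a CI system triggered by git push.

# Build an image tagged with the current commit
docker build -t registry.example.com/myapp:$(git rev-parse --short HEAD) .
# Run the test suite inside a throwaway container
docker run --rm registry.example.com/myapp:$(git rev-parse --short HEAD) ./run_tests.sh
# On success, push the image so a preview environment can pull and deploy it
docker push registry.example.com/myapp:$(git rev-parse --short HEAD)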

Fig. 7.1 The difference between a container (left) and a virtual machine (right)

7.1.2 Container Imaging

The image is the cornerstone of the container: a container is a running instance of an image, and an image is what is used to launch a container. This section describes container images in terms of the internal structure of an image, image construction, and image management and distribution.

  1. 1.

    The internal structure of the image

    If we want to build our own images, or understand why containers such as Docker containers are lightweight, we need an in-depth understanding of the image's internal structure. For ease of understanding, let's start with a minimal image, hello-world.

    Hello-world is an official image provided by Docker and is often used to verify that Docker was installed successfully. Let's first download the hello-world image from Docker's official repository via docker pull, as shown in Fig. 7.2.

    Run hello-world through docker run, as shown in Fig. 7.3.

    Dockerfile is a description of the image that defines how to build the Docker image. Dockerfile's syntax is simple and readable. Hello-world's Dockerfile is shown in Fig. 7.4.

    As you can see, there are only three instructions in Dockerfile.

    1. (1)

      FROM scratch: the image is built from scratch.

    2. (2)

      COPY hello /: Copy the file “hello” to the image root.

    3. (3)

      CMD ["/hello"]: when the container starts, execute /hello.

      There is only one executable “hello” in the image hello-world, which functions to output information such as “hello from Docker...”. Hello-world is a complete image, but it has no practical use. In general, we want the image to provide a basic operating system environment where users can install and configure software as needed. Such an image is called a base image.

      A base image has two meanings: it does not depend on other images and is built from scratch, and other images can be extended on its basis. Images that qualify as base images are therefore usually the Docker images of various Linux distributions, such as Ubuntu and CentOS.

      The Linux operating system consists of kernel space and user space. Kernel space is the kernel; when Linux first starts, the bootfs file system is loaded and afterwards unmounted. The file system for user space is rootfs, which includes the familiar directories such as /dev and /bin. A base image only needs to add rootfs itself; the underlying layer uses the host's kernel space directly.

      Docker supports extending existing images to build new ones. For example, we need to build a new image with Dockerfile as shown in Fig. 7.5.

      The new image does not need to start from scratch; it is built directly on the Debian base image, then installs emacs and apache2, and finally sets bash to run when the container starts. The construction process of the new image is shown in Fig. 7.6.

      As you can see, the new image is generated from a layer-by-layer overlay of the base image. For each software installed, a layer is added to the existing image. The most significant benefit of Docker's hierarchy is that resources can be shared.

      At this point, someone might ask: if multiple containers share a base image, when one container modifies the contents of the base image, will the contents of the other containers be modified as well? The answer is no; the modification is limited to a single container. This is known as the copy-on-write (COW) characteristic of containers. When a container starts, a new writable layer is added on top of the image. This layer is called the container layer, and the layers below it are called image layers. All changes to the container, whether additions, deletions, or modifications, occur only in the container layer; only the container layer is writable, and all image layers below it are read-only. The container layer therefore holds the changing part on top of the image and does not modify the image itself.
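
      As a hedged illustration of the layering just described (a reconstruction based on the text around Fig. 7.5 and Fig. 7.6, not a copy of those figures), the snippet below writes such a Dockerfile, builds it, and then inspects the result. docker history lists the image layers that each instruction added, and docker diff shows that changes made inside a running container live only in the writable container layer.

cat > Dockerfile <<'EOF'
FROM debian
RUN apt-get update && apt-get install -y emacs
RUN apt-get install -y apache2
CMD ["/bin/bash"]
EOF
docker build -t debian-emacs-apache .
docker history debian-emacs-apache           # one layer per instruction, stacked on the Debian base image
docker run -d --name demo debian-emacs-apache sleep 3600
docker exec demo touch /tmp/newfile          # modify the running container
docker diff demo                             # the change appears only in the container layer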

  2. 2.

    Image construction

    For Docker users, the best situation is that you do not need to create an image yourself. Commonly used databases, middleware, software, etc. have ready-made official Docker images or images created by other people and organizations, and we can use them directly with a little configuration. The benefits of using ready-made images are not only saving the workload of doing images yourself, but more importantly, you can use the experience of predecessors, especially those official images, because Docker engineers know how to run software in containers better. Of course, in some cases we have to build the image ourselves, for example:

    ① Cannot find a ready-made image, such as software developed by oneself.

    ② Specific functions need to be added to the image. For example, the official image does not provide SSH.

    Docker provides two ways to build images: docker commit command and Dockerfile build file.

    The docker commit command is the most intuitive way to build a new image, and its process consists of three steps. The following is an example of installing Vim in the Ubuntu base image and saving it as a new image to illustrate how to build a new image through the docker commit command, as shown in Fig. 7.7.

    ① Run the container. The function of the -it parameter is to enter the container in interactive mode and open the terminal.

    ② Install Vim. First confirm that Vim is not installed, and then execute the installation command.

    ③ Save as a new image, you can use the docker ps command to view the containers running in the current environment in a new window. Silly-Goldberg is the name randomly assigned by Docker for our new container. Execute the docker commit command to save the container as a new image and rename it to ubuntu-with-vim.

    The above steps demonstrate how to build a new image through the docker commit command. However, because manual creation is error-prone, inefficient, and hard to reproduce, and for security reasons, this method is not the approach officially recommended by Docker.
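
    The commit workflow can be summarized as the following hedged sketch. The placeholder silly_goldberg stands in for whatever random name Docker assigns in your environment (rendered as Silly-Goldberg in the text above); check it with docker ps as described.

docker run -it ubuntu                           # step 1: start an interactive container from the base image
# inside the container:
apt-get update && apt-get install -y vim        # step 2: install Vim
exit
docker ps -a                                    # find the container's randomly assigned name
docker commit silly_goldberg ubuntu-with-vim    # step 3: save the container as a new image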

    Dockerfile is another way to build an image. It is a text file that records all the steps of image building. Similarly, we take the ubuntu-with-vim image in the previous article to illustrate how to build a new image through this method.

    To build a new image with Dockerfile, you first need to create a Dockerfile, whose content is shown in Fig. 7.8.

    ① The current directory is /root.

    ② Dockerfile is ready.

    ③ Run the Docker build command, it will name the new image ubuntu-with-vim-dockerfile, and the “.” at the end of the command indicates that the build context is the current directory. Docker will find the Dockerfile from the build context by default, and we can also specify the location of the Dockerfile through the -f parameter.

    ④ Starting from this step is the real construction process of the image. First, Docker sends all the files in the build context to the Docker daemon, and the build context provides the files or directories needed for image building.

    The ADD, COPY, and other commands in the Dockerfile can add files in the build context to the image. In this example, the build context is the current directory/root, and all files and subdirectories in this directory will be sent to the Docker daemon. Therefore, you have to be careful when using the build context, do not put extra files in the build context, and be careful not to use / and /usr as the build context; otherwise, the build process will be quite slow or even fail.

    ⑤ Step 1: Execute FROM and use Ubuntu as the base image. The Ubuntu image ID is f753707788c5.

    ⑥ Step 2: Execute RUN and install Vim, the specific steps are ⑦⑧⑨.

    ⑦ Start the temporary container with ID 9f4d4166f7e3 and install Vim in the container via apt-get.

    ⑧ After the installation is successful, save the container as an image with the ID 350a89798937. The bottom layer of this step uses commands similar to docker commit.

    ⑨ Delete the temporary container with ID 9f4d4166f7e3.

    ⑩ The image is successfully built.

    In addition, it needs to be specially pointed out that Docker will cache the image layer of the existing image when building the image. When building a new image, if a certain image layer already exists, it will be used directly without re-creating it. This is called the caching feature of Docker images.
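
    Putting the steps above together, a hedged reconstruction of this build (based on the description of Fig. 7.8, not copied from it) looks as follows; the image IDs will of course differ on your machine, and a second build immediately afterwards reuses the cached layers instead of re-running apt-get.

cd /root
cat > Dockerfile <<'EOF'
FROM ubuntu
RUN apt-get update && apt-get install -y vim
EOF
docker build -t ubuntu-with-vim-dockerfile .   # "." makes /root the build context
docker build -t ubuntu-with-vim-dockerfile .   # rebuild: existing image layers are taken from the cache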

  3. 3.

    Image management and distribution

    We have learned to build our image, and then we will talk about how to use the image on multiple Docker hosts. There are several methods you can use:

    ① Use the same Dockerfile to build images on other hosts.

    ② Upload the image to a registry, such as Docker Hub, from which other hosts can download and use it directly.

    ③ Build a private repository for the local host to use.

    The first method is to rebuild an image through the Dockerfile described in the previous article. The following focuses on how to distribute images through a public/private registry.

    Regardless of the method used to save and distribute the image, you must first name the image. When we execute the docker build command, we have already given the image a name, for example docker build -t ubuntu-with-vim ., where ubuntu-with-vim is the name of the image.

    The most straightforward way to save and distribute images is to use Docker Hub. Docker Hub is a public registry maintained by Docker. Users can save their images in the free repository of Docker Hub. If you don't want others to access your image, you can also buy a private repository. In addition to Docker Hub, quay.io is another public registry that provides similar services to Docker Hub. The following describes how to use Docker Hub to access the image.

    ① First, you need to register an account on Docker Hub.

    ② Use the command docker login -u xx to log in on the Docker host. xx is the username, you can log in successfully after entering the password.

    ③ Modify the image repository to match the Docker Hub account. In order to distinguish images with the same name from different users, the Docker Hub must include the username in the registry of the image, and the complete format is [username/xxx]:[tag]. We rename the image through the docker tag command.

    Docker’s official image maintained by itself does not have a username, such as httpd.

    ④ Upload the image to Docker Hub via docker push. Docker uploads each layer of the image. If the image is based on an official image, most of the image layers are already on Docker Hub, so very little data is actually uploaded; likewise, if our image is built on a base image, only the newly added image layers are uploaded. If you want to upload all the images in the same repository, just omit the tag part, for example docker push cloudman6/httpd.
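
    As a hedged end-to-end sketch of steps ① to ④ (cloudman6 is the example account name used in this chapter; substitute your own Docker Hub username):

docker login -u cloudman6                                  # log in to Docker Hub
docker tag ubuntu-with-vim cloudman6/ubuntu-with-vim:v1    # prefix the repository with the username
docker push cloudman6/ubuntu-with-vim:v1                   # upload; only layers not already on Docker Hub are sent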

    Although Docker Hub is very convenient, it still has some limitations, such as requiring an Internet connection and having slow upload and download speeds. Anyone can access the images uploaded to Docker Hub; a private repository can be used instead, but it is not free. For security reasons, many organizations do not allow images to be placed on an external network.

    The solution is to build a local registry. Docker has open sourced the registry, and there is also an official image registry on Docker Hub. Next, we will run our registry in Docker.

    1. (1)

      Start the registry container

      The image we start is registry:2, as shown in Fig. 7.9.

      -d:Start the container in the background.

      -p:Map port 5000 of the container to port 5000 of the host, where 5000 is the registry service port. Port mapping will be discussed in detail in Sect. 7.1.3.

      -v:Map the container /var/lib/registry directory to the host's /myregistry to store image data. The use of -v will be discussed in detail in Sect. 7.1.4.

      Use the docker tag command to rename the image to match the registry, as shown in Fig. 7.10.

      We added the name and port of the host running the registry to the front of the image.

      The complete format of the repository is [registry-host]:[port]/[username]/xxx.

      Only images on Docker Hub can omit [registry-host]:[port].

    2. (2)

      Upload image via docker push

      Upload the image to the registry via docker push, as shown in Fig. 7.11.

      Now the image can be downloaded from the local registry through docker pull, as shown in Fig. 7.12.
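
      Combining the steps above into one hedged sketch (localhost and port 5000 are just the defaults used here; replace them with the registry host's name and port when pushing from another machine):

docker run -d -p 5000:5000 -v /myregistry:/var/lib/registry registry:2
docker tag ubuntu-with-vim localhost:5000/cloudman6/ubuntu-with-vim:v1
docker push localhost:5000/cloudman6/ubuntu-with-vim:v1
docker pull localhost:5000/cloudman6/ubuntu-with-vim:v1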

Fig. 7.2 Download the hello-world image from the official Docker repository

Fig. 7.3 Run hello-world

Fig. 7.4 Contents of the Dockerfile

Fig. 7.5 Dockerfile for building a new image

Fig. 7.6 The construction process of the new image

Fig. 7.7 Building a new image through the docker commit command

Fig. 7.8 Contents of the Dockerfile

Fig. 7.9 Start the image registry:2

Fig. 7.10 Rename the image

Fig. 7.11 Upload the image to the registry

Fig. 7.12 Download the image from the local registry

7.1.3 Container Network

In this section, Docker network is used as an example to discuss the container network. We first introduce several native networks provided by Docker and how to create a custom network. Then, we introduce how to communicate between containers and how to communicate with the outside world.

  1. 1.

    Docker network model

    Docker provides a variety of native networks such as None, Host, Bridge, Overlay, and Macvlan. The network coverage can be divided into a container network on a single host and a network across multiple hosts. We mainly discuss the former.

    When Docker is installed, three networks will be automatically created on the host. We can view the networks through the docker network ls command, as shown in Fig. 7.13.

    We discuss them separately below.

    1. (1)

      None network

      As the name implies, the None network is a network with nothing. Containers connected to this network have no network card other than lo. When creating a container, you can specify the None network through --network=none, as shown in Fig. 7.14.

      This is a closed network. Some applications that require high security and do not require networking can use the None network.

    2. (2)

      Host network

      Containers connected to the Host network share the Docker Host network stack, and the network configuration of the container is the same as that of the host. You can specify the Host network through --network=host, as shown in Fig. 7.15.

      You can see all the host's network cards in the container, and even the hostname is also the host. The biggest advantage of using the Docker Host network directly is performance. If the container has higher requirements for network transmission efficiency, you can choose the Host network. Of course, the inconvenience is to sacrifice some flexibility. For example, to consider port conflicts, the ports already used on Docker Host can no longer be used.

    3. (3)

      Bridge network

      When Docker is installed, a Linux bridge named “docker0” is created. If you do not specify --network, newly created containers are attached to docker0 by default, as shown in Fig. 7.16.

      In addition to the three automatically created networks of None, Host, and Bridge, users can also create user-defined networks according to business needs. Docker provides three user-defined network drivers: Bridge, Overlay, and Macvlan. Overlay and Macvlan are used to create a cross-host network. We will not discuss it here.
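
      As a hedged sketch of creating a user-defined network with the Bridge driver (my_net2 is the network name used in the DNS example later in this section; the subnet shown is an arbitrary illustration):

docker network create --driver bridge --subnet 172.22.16.0/24 my_net2
docker network ls              # the new network appears alongside none, host, and bridge
docker network inspect my_net2 # shows the subnet and, later, the containers attached to it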

  2. 2.

    Communication between containers

    There are three ways to communicate between containers via IP address, Docker DNS service or Joined container.

    From the previous example, we can conclude that they must have network cards that belong to the same network for two containers to communicate. After this condition is met, the container can interact through the IP address. The specific method is to specify the corresponding network through --network when creating the container or add the existing container to the specified network through docker network connect.

    Although accessing the container through the IP address satisfies the communication needs, it is still not flexible enough: the IP address may not be known before the application is deployed, and specifying the IP address to access after deployment is troublesome. This problem can be solved through the DNS service that comes with Docker.

    Starting from Docker 1.10, the Docker daemon has implemented an embedded DNS service, allowing containers to communicate directly by “container name.” The method is straightforward: just use --name to name the container at startup. Start two containers, bbox1 and bbox2, as follows:

    docker run -it --network=my_net2 --name=bbox1 busybox
    docker run -it --network=my_net2 --name=bbox2 busybox

    Then, bbox2 can ping bbox1 directly by container name, as shown in Fig. 7.17.

    There is a limitation when using Docker DNS Server: it can only be used in user-defined networks. In other words, the default Bridge network cannot use DNS.

    Joined containers are another way to achieve communication between containers. Joined container is exceptional. It can make two or more containers share a network stack, network card, and configuration information. Joined containers can communicate directly through 127.0.0.1.
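
    A hedged sketch of the joined-container approach (the name web1 and the busybox image are illustrative): the second container reuses the first one's network stack, so the two can talk over 127.0.0.1.

docker run -d --name web1 httpd
docker run -it --network=container:web1 busybox
# inside busybox: wget -qO- 127.0.0.1   # reaches httpd in web1 through the shared network stack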

  3. 3.

    Container communicates with external world

    We have solved the problem of communication between containers. Next, we will discuss how the container communicates with the external world, mainly involving the container's access to the external world and its access to the container.

    In the current experimental environment, Docker Host can access the extranet. Let's see if the container can also access the extranet, as shown in Fig. 7.18.

    It can be seen that the container can access the extranet by default. However, please note that the extranet here refers to the network environment outside the container network, not the Internet.

    Next, we discuss another question, how does the extranet access the container? The answer is port mapping. Docker can map the port that the container provides external services to a certain port of the host, and the extranet accesses the container through this port. The port can be mapped through the -p parameter when the container is started.

    After the container is started, you can view the host's port through the docker ps or docker port command. In addition to mapping dynamic ports, you can also specify the mapping to a specific host port in -p. For example, you can map port 80 to port 8080 of the host, as shown in Fig. 7.19.
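
    For example, a hedged sketch of the two forms of port mapping described above (httpd is used as the service image, and <container> is a placeholder for a container name or ID):

docker run -d -p 80 httpd         # dynamic mapping: Docker picks a host port for container port 80
docker run -d -p 8080:80 httpd    # fixed mapping: host port 8080 -> container port 80
docker ps                         # the PORTS column shows the mappings
docker port <container>           # or query the mapping for a specific container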

Fig. 7.13 View networks

Fig. 7.14 Start the None network

Fig. 7.15 Start the Host network

Fig. 7.16 Linux bridge information

Fig. 7.17 Start a specific image

Fig. 7.18 The container accesses the external network

Fig. 7.19 Port mapping

7.1.4 Container Storage

Docker provides two kinds of data storage resources for containers: the Storage Driver (which manages the image layers and the container layer) and the data volume.

We have learned that the Docker image is a hierarchical structure. It consists of a writable container layer on the top and several read-only image layers. The data of the container is placed in these layers. The biggest characteristic of such layering is COW.

The hierarchical structure makes the creation, sharing, and distribution of images and containers very efficient, and all of this is due to the Storage Driver. The Storage Driver realizes the stacking of multiple layers of data and presents them to the user as a single, merged, unified view. Docker supports various Storage Drivers, including AUFS, Device Mapper, Btrfs, VFS, and ZFS. They can all implement the hierarchical structure, while each has its own characteristics.

When Docker is installed, the default Storage Driver will be selected according to the current system configuration. The default Storage Driver has better stability because the default Storage Driver has been rigorously tested on the release version. Run the docker info command to view the default Storage Driver.

For some containers, it is a good choice to put data directly in the layers maintained by the Storage Driver, for example stateless applications. Stateless means that the container has no data that needs to be persisted and can be rebuilt from the image at any time. However, this method is not suitable for another type of application that needs to persist data: when the container starts, it needs to load existing data, and when the container is destroyed, it expects the generated data to be retained. In other words, this type of container is stateful. This requires Docker's other data storage resource: the data volume.

The data volume is essentially a directory or file in the Docker Host file system that can be mounted directly into the container's file system. It has the following characteristics.

  • Data volumes are directories or files, not unformatted disks or block devices.

  • The container can read/write the data in it.

  • The data in the data volume can be stored permanently, even if the container using it is destroyed.

In terms of specific use, Docker provides two types of Data Volume: Bind Mount and Docker Managed Volume.

  1. 1.

    Bind Mount

    A Bind Mount mounts an existing directory or file on the host into the container, as shown in Fig. 7.20.

    Mount it into the httpd container through -v, as shown in Fig. 7.21.

    Bind Mount allows the host to share data with the container, which is very convenient for management. Even if the container is destroyed, the Bind Mount is still there. In addition, when using a Bind Mount, you can also specify the read/write permission for the data, which is readable and writable by default.

    Bind Mount has many application scenarios. For example, we can mount the source code directory into the container and modify the code on the host to see the application's changes in real time; or we can put the data of a MySQL container in a Bind Mount so that the host can conveniently back up and migrate the data locally.

    The use of Bind Mount is intuitive, efficient, and easy to understand, but it also has shortcomings: Bind Mount needs to specify the specific path of the host file system, limiting the portability of the container. When the container needs to be migrated to another host and that host does not have the data to be mounted or the data is not in the same path, the operation will fail. The more portable way is to use Docker Managed Volume.
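
    A hedged sketch of the httpd example described above (the host path /root/htdocs is an illustrative placeholder; :ro makes the mount read-only, and leaving it off keeps the default read/write behavior):

mkdir -p /root/htdocs && echo 'hello bind mount' > /root/htdocs/index.html
docker run -d -p 80:80 -v /root/htdocs:/usr/local/apache2/htdocs:ro httpd
curl 127.0.0.1     # served from the host directory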

  2. 2.

    Docker Managed Volume

    The biggest difference between Docker Managed Volume and Bind Mount is that you do not need to specify the Mount source, just specify the Mount Point. Here, we will take the httpd container as an example, as shown in Fig. 7.22.

    We use -v to tell Docker that a data volume is needed and mounted to /usr/local/apache2/htdocs.

    Whenever a container applies for Mount Docker Managed Volume, Docker will generate a directory under /var/lib/docker/volumes. This directory is the Mount source.

    Summarize the creation process of Docker Managed Volume.

    ① When the container starts, tell Docker that it needs a Data Volume to store data, and help us Mount to the specified directory.

    ② Docker generates a random directory in /var/lib/docker/volumes as the Mount source.

    ③ If the mount point directory in the container already contains data, that data is copied to the Mount source.

    ④ Mount the Docker Managed Volume to the specified directory.

    In addition to using the Docker inspect command to view Volume, we can also use the docker volume command.
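
    A hedged sketch of the same httpd example with a Docker Managed Volume (no mount source is given, so Docker creates one under /var/lib/docker/volumes; <container> and <volume-name> are placeholders for the values printed by the earlier commands):

docker run -d -p 80:80 -v /usr/local/apache2/htdocs httpd
docker ps                             # note the container ID
docker inspect <container>            # the "Mounts" section shows the generated source directory
docker volume ls                      # the volume also appears here
docker volume inspect <volume-name>   # shows its Mountpoint on the host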

    Then, we discuss sharing data. Sharing data is a key feature of Volume. We will discuss how to share data between containers and hosts and between containers through Volume.

    1. (1)

      Sharing data between the container and the host

      There are two types of data volumes for sharing data between the container and the host. Both of them can share data between the container and the host, but the methods are different. This is very clear for Bind Mount: Mount the shared directory directly to the container. Docker Managed Volume will be more troublesome. Since Volume is located in the directory on the host, it is generated when the container starts, so the shared data needs to be copied to the Volume. Use the docker cp command to copy data between the container and the host. Of course, we can also use the Linux cp command directly.

    2. (2)

      Sharing data between containers

      One method is to put the shared data in Bind Mount, and then mount it to multiple containers. Another method is to use Volume Container. Volume Container is to provide Volume specifically for other containers. The Volume it provides can be Bind Mount or Docker Managed Volume. Next we create a Volume Container, as shown in Fig. 7.23.

      We named the container vc_data (vc is the abbreviation of Volume Container). Note that the docker create command is executed here because Volume Container's role is only to provide data, and it does not need to be running. The container is mounted with two Volumes:

      ① Bind Mount, used to store static files of the Web server.

      ② Docker Managed Volume, used to store some useful tools (of course it is empty now, here is just an example).

      Other containers can use the vc_data Volume Container through --volumes-from.
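
      A hedged reconstruction of the vc_data example described above (the bind-mounted host path ~/htdocs and the busybox base image are illustrative assumptions):

docker create --name vc_data \
    -v ~/htdocs:/usr/local/apache2/htdocs \
    -v /other/useful/tools \
    busybox
docker run -d -p 80:80 --volumes-from vc_data httpd   # the httpd container sees both volumes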

      Finally, we discuss the characteristics of Volume Container.

      ① Compared with Bind Mount, Volume Container does not need to specify each container’s host path. All paths are defined in the Volume Container. The container only needs to be associated with the Volume Container to realize the decoupling of the container and the host.

      ② The Mount Point of the container using Volume Container is consistent, which is conducive to the specification and standardization of the configuration, but it also brings certain limitations. It needs to be considered comprehensively when using it.

Fig. 7.20 File information

Fig. 7.21 Mount to the httpd container

Fig. 7.22 Specify the Mount Point

Fig. 7.23 Create a Volume Container

7.1.5 The Underlying Implementation Technology of the Container

In order to better understand the characteristics of containers, this section will introduce the underlying implementation technologies of containers, namely Cgroup and Namespace. Cgroup realizes resource quota, and Namespace realizes resource isolation.

  1. 1.

    Cgroup

    Linux operating system can set the limit of CPU, memory, and I/O resources used by the process through Cgroup.

    What does Cgroup look like? We can find it in /sys/fs/cgroup. To illustrate with an example, start a container, as shown in Fig. 7.24.

    In /sys/fs/cgroup/cpu/docker, Linux creates a Cgroup directory for each container (named after the container's long ID), which contains all CPU-related Cgroup configuration. The file cpu.shares stores the cpu-shares configuration, whose value here is 512.

    Similarly, /sys/fs/cgroup/memory/docker and /sys/fs/cgroup/blkio/docker save the Cgroup configuration for memory and block I/O.
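
    A hedged sketch of checking this (the limits passed here are arbitrary example values, the paths assume cgroup v1 as in the text, and <long-id> is the container's long ID as printed by docker run -d or shown by docker ps --no-trunc):

docker run -d --cpu-shares 512 -m 200M busybox sleep 3600
cat /sys/fs/cgroup/cpu/docker/<long-id>/cpu.shares                 # 512
cat /sys/fs/cgroup/memory/docker/<long-id>/memory.limit_in_bytes   # roughly 200 MB expressed in bytes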

  2. 2.

    Namespace

    In each container, we can see the file system, network card, and other resources. These resources look like the container itself. Take the network card as an example. Each container will think that it has an independent network card, even if there is only one physical network card on the host. This approach is excellent. It makes the container more like an independent computer.

    The technology that Linux implements this way is Namespace. Namespace manages the globally unique resource in the host and can make each container feel that only it is using it. In other words, Namespace realizes the isolation of resources between containers.

    Linux uses the following Namespaces: Mount, UTS, IPC, PID, Network, and User. The Mount Namespace makes the container appear to have its own entire file system. The UTS Namespace allows the container to have its own hostname. The IPC Namespace allows a container to have its own shared memory and semaphores for inter-process communication. The PID Namespace allows the container to have its own independent set of PIDs. The Network Namespace allows the container to have its own network card, IP address, and routing resources. The User Namespace allows the container to manage its own users.

Fig. 7.24 Cgroup information

7.2 Overview of Kubernetes

Kubernetes is the de facto standard for container orchestration engines. It is another popular technology after big data, cloud computing, and Docker, and it will be trendy for a long time in the future. For the IT industry, this is a valuable technology.

7.2.1 Introduction of Kubernetes

The popularity and standardization of Docker technology have activated the tepid PaaS market, followed by the emergence of various types of Micro-PaaS, of which Kubernetes is one of the most representative. Kubernetes is Google's open source container cluster management system. It is built on Docker technology and provides a complete set of functions for containerized applications, such as resource scheduling, deployment and operation, service discovery, and scaling out and in, and it can essentially be regarded as a Micro-PaaS platform based on container technology.

Google started using container technology in 2004, released Cgroup in 2006, and internally developed powerful cluster resource management platforms Borg and Omega, which have been widely used in various infrastructures of Google products. Moreover, Kubernetes is inspired by Google's internal Borg system, and it has also absorbed the experience and lessons of container managers, including Omega.

Kubernetes means helmsman in ancient Greek and is also the etymology of Cyber. Kubernetes utilizes Google's practical experience and technical accumulation in container technology while absorbing the Docker community's best practices and has become the “helmsman” of cloud computing services.

  1. 1.

    Advantages of Kubernetes

    1. (1)

      Powerful container orchestration capabilities

      Kubernetes can be said to be developed together with Docker. It is deeply integrated with Docker and naturally adapts to the characteristics of containers. It has powerful container orchestration capabilities, such as container composition, label selection, and service discovery, to meet enterprise-level needs.

    2. (2)

      Lightweight

      Kubernetes follows the theory of microservice architecture. The entire system is divided into components with independent functions. The boundaries between the components are clear, the deployment is simple, and it can be easily run in various systems and environments. At the same time, many functions in Kubernetes are plug-in, which can be easily expanded and replaced.

    3. (3)

      Open and open source

      Kubernetes conforms to open and open source trends, attracting many developers and companies to participate in it and work together to build an ecosystem. At the same time, Kubernetes actively cooperates and develops together with open source communities such as OpenStack and Docker. Both enterprises and individuals can participate and benefit from it.

  2. 2.

    The evolution of Kubernetes

    Kubernetes has quickly gained attention since its launch. In July 2015, after a year of effort by more than 400 contributors and as many as 14,000 code commits, Google officially released Kubernetes 1.0, which meant that this open source container orchestration system could officially be used in production environments. At the same time, Google, the Linux Foundation, and other partners jointly established the Cloud Native Computing Foundation (CNCF) and made Kubernetes the first open source project incorporated into the CNCF's management, to help the container technology ecosystem develop. The development history of Kubernetes is shown below.

    • June 2014: Google announced that Kubernetes is open source.

    • July 2014: Microsoft, Red Hat, IBM, Docker, CoreOS, Mesosphere, and SaltStack joined Kubernetes.

    • August 2014: Mesosphere announced the integration of Kubernetes into the Mesosphere ecosystem as a framework for the scheduling, deployment, and management of Docker container clusters.

    • August 2014: VMware joined the Kubernetes community. Google’s product manager Craig McLuckie publicly stated that VMware will help Kubernetes implement a functional model that uses virtualization to ensure physical host security.

    • November 2014: HP joined the Kubernetes community.

    • November 2014: Google’s container engine Alpha was launched. Google announced that GCE supports containers and services and uses Kubernetes as the framework.

    • January 2015: Google, Mirantis and other partners introduced Kubernetes into OpenStack, and developers can deploy and run Kubernetes applications on OpenStack.

    • April 2015: Google and CoreOS jointly released Tectonic, which integrates Kubernetes and CoreOS software stacks.

    • May 2015: Intel joined the Kubernetes community and announced that it would cooperate to accelerate the Tectonic software stack development.

    • June 2015: Google’s container engine entered the beta version.

    • July 2015: Google officially joined the OpenStack Foundation. Google's product manager Craig McLuckie announced that Google would become one of the OpenStack Foundation initiators and would bring its container computing expertise to OpenStack to improve the interoperability of public and private clouds.

    • July 2015: Kubernetes 1.0 was officially released.

    • March 2016: Kubernetes 1.2 was released, and improvements include expansion, simplification of software deployment, and automated cluster management.

    • December 2016: Kubernetes supports OpenAPI, allowing API providers to define their operations and models, and developers can automate their tools.

    • March 2017: Kubernetes 1.6 was released. Specific updates include enabling etcd v3 by default, deleting the direct dependencies of a single container runtime, testing RBAC, and automatically configuring StorageClass objects.

    • December 2017: Kubernetes 1.9 was released. New features include the general availability of apps/v1 Workloads API, Windows support (beta), storage enhancements, etc.

    • March 2018: The first beta version of Kubernetes 1.10 was released. Users can test Kubelet TLS Bootstrapping, API aggregation, and more detailed storage metrics with the production-ready version.

    • June 2018: Kubernetes 1.11 was released; cluster load balancing and the CoreDNS plug-in reached general availability. This version has key functions in networking, opens two main features from SIG-API Machinery and SIG-Node for beta testing, and continues to enhance storage features.

7.2.2 Kubernetes Management Objects

Kubernetes follows the theory of microservice architecture. The entire system is divided into components with independent functions. The boundaries between the components are clear, the deployment is simple, and it can be easily run in various systems and environments.

  1. 1.

    Kubernetes architecture and components

    Kubernetes belongs to a master-slave distributed architecture, and nodes are divided into Master and Node in terms of roles.

    Kubernetes uses etcd as storage middleware. Etcd is a highly available key-value storage system, inspired by ZooKeeper and Doozer, and uses the Raft consensus algorithm to process log replication to ensure strong consistency. Kubernetes uses etcd as the configuration storage center of the system. Important data in Kubernetes is persisted in etcd, making the various components of the Kubernetes architecture stateless, making it easier to implement distributed cluster deployment.

    The Master in Kubernetes refers to the cluster control node. Each Kubernetes cluster needs a Master node that is responsible for managing and controlling the entire cluster. All control commands for Kubernetes are sent to it, and it is responsible for their execution. All the commands we execute later are run on the Master node. The Master node usually occupies an independent server (three servers are recommended for a high-availability deployment), mainly because it is so important: it is the “head” of the entire cluster, and if it goes down or becomes unavailable, the management of the application containers in the cluster fails. The following key components run on the Master node.

    • Kubernetes API Server: as the entrance to the Kubernetes system, it encapsulates the addition, deletion, modification, and query operations of core objects and provides them to external clients and internal components in the form of a REST API. The REST objects it maintains are persisted in etcd.

    • Kubernetes Scheduler: Responsible for the cluster's resource scheduling and allocated machines for the new Pod. This part of the work is separated into a component, which means that it can be easily replaced with other schedulers.

    • Kubernetes Controller Manager: Responsible for executing various controllers. Many controllers have been implemented to ensure the normal operation of Kubernetes.

      In addition to the Master, the other machines in the Kubernetes cluster are called Node nodes (called Minion nodes in earlier versions). Like the Master node, a Node node can be a physical host or a virtual machine. A Node node is a workload node in the Kubernetes cluster. The Master node assigns each Node node some workload (Docker containers); when a Node node goes down, its workload is automatically transferred to other nodes by the Master node.

      The following key components are running on each Node node.

    • kubelet: Responsible for tasks such as the creation and activation of the container corresponding to the Pod. At the same time, it works closely with the Master node to realize the basic functions of cluster management.

    • kube-proxy: A vital component that realizes the communication and load balancing mechanism of Kubernetes Service.

    • Docker Engine: Docker engine, responsible for the creation and management of local containers.

      Node nodes can be dynamically added to the Kubernetes cluster during operation, provided that the key components above have been correctly installed, configured, and started on the node. By default, kubelet registers itself with the Master node, which is also the Node management method recommended by Kubernetes. Once a Node node is included in the scope of cluster management, the kubelet process regularly reports its status to the Master node, such as the operating system, the Docker version, the machine's CPU and memory, and which Pods are currently running. The Master node therefore knows the resource usage of each Node node and can implement an efficient and balanced resource scheduling strategy. When a Node node does not report information for longer than a specified time, the Master node judges it as having “lost connection,” marks the Node's status as NotReady, and then triggers the automatic workload transfer process.
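
      A hedged sketch of observing this from the Master node with kubectl (node names will differ in your cluster; <node-name> is a placeholder):

kubectl get nodes                  # STATUS shows Ready or NotReady for each node
kubectl describe node <node-name>  # capacity, allocated resources, conditions, and running Pods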

  2. 2.

    Basic object concept

    Most of the concepts in Kubernetes, such as Node, Pod, Replication Controller, and Service, can be regarded as “resource objects.” Resource objects can be added, deleted, modified, and queried through the kubectl tool (or API calls) provided by Kubernetes and are saved in persistent storage in etcd. From this perspective, Kubernetes is a highly automated resource control system. It achieves automatic control and automatic error correction by tracking and comparing the difference between the “desired resource state” saved in etcd and the “actual resource state” in the current environment.

    1. (1)

      Pod

      A Pod is a combination of several related containers. The containers contained in a Pod run on the same host. These containers use the same network namespace, IP address, and ports and can discover and communicate with each other through localhost. In addition, these containers can also share a storage volume. The smallest unit of creation, scheduling, and management in Kubernetes is the Pod, not the container; by offering a higher level of abstraction, the Pod provides a more flexible deployment and management model.

    2. (2)

      Replication Controller

      The Replication Controller is used to control and manage Pod replicas (Replicas, or instances). The Replication Controller ensures that a specified number of Pod replicas are running in the Kubernetes cluster at any time. If there are fewer than the specified number of Pod replicas, the Replication Controller starts new Pod replicas; otherwise, it “kills” the excess replicas to keep the number unchanged. In addition, the Replication Controller is at the core of how elastic scaling and rolling upgrades are implemented.

    3. (3)

      Service

      Service is an abstraction of real application services, which defines the Pod logical collection and the strategy for accessing this Pod logical collection. Service presents the proxy Pod as a single access interface to the outside, and the outside does not need to know how the back-end Pod operates, which brings many benefits to expansion and maintenance and provides a simplified service proxy and discovery mechanism.

    4. (4)

      Label

      A Label is a key/value pair used to distinguish Pods, Services, and Replication Controllers. In fact, any API object in Kubernetes can be identified by Labels. Each API object can have multiple Labels, but each Label key can only correspond to one value. Labels are the basis on which Services and Replication Controllers operate: they both associate with Pods through Labels. Compared with a rigid binding model, this is an excellent loosely coupled relationship.

    5. (5)

      Node

      Kubernetes belongs to a master-slave distributed architecture, and Node nodes run and manage containers. As the operating unit of Kubernetes, the Node node is used to assign to the Pod (or container) for binding, and the Pod eventually runs on the Node node. The Node node can be considered as the host of the Pod.

    6. (6)

      Deployment

      Deployment is a higher-level API object that manages ReplicaSets and Pods and provides functions such as declarative updates. The official recommendation is to use Deployments to manage ReplicaSets instead of using ReplicaSets directly, which means you may never need to manipulate ReplicaSet objects directly.
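
      A minimal Deployment sketch, with hypothetical names and an example image, might look as follows.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: web-deployment           # hypothetical name
      spec:
        replicas: 3                    # the underlying ReplicaSet keeps 3 Pods running
        selector:
          matchLabels:
            app: web
        template:
          metadata:
            labels:
              app: web
          spec:
            containers:
            - name: web
              image: nginx:1.21        # changing this field triggers a declarative rolling update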

    7. (7)

      StatefulSet

      StatefulSet is suitable for stateful, long-running software: each replica has a stable, unique network identifier, supports persistent storage, and can be deployed, scaled, deleted, and updated in an ordered and graceful manner.

    8. (8)

      DaemonSet

      DaemonSet ensures that all (or some) nodes each run a copy of the same Pod. When a node joins the Kubernetes cluster, the Pod is scheduled to run on that node; when the node is removed from the Kubernetes cluster, that node's Pod is deleted. When the DaemonSet itself is deleted, all Pods created by it are cleaned up.
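
      The sketch below shows a minimal DaemonSet; the name and the per-node agent image are hypothetical.

      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: node-log-agent           # hypothetical name
      spec:
        selector:
          matchLabels:
            app: node-log-agent
        template:
          metadata:
            labels:
              app: node-log-agent
          spec:
            containers:
            - name: agent
              image: fluentd:v1.14     # example per-node agent image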

    9. (9)

      Job

      Job describes a one-off task: the Pod is destroyed after the task completes, and the container is not restarted. Tasks can also be scheduled to run periodically.
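
      A minimal Job sketch, with a hypothetical name and an example command, is shown below.

      apiVersion: batch/v1
      kind: Job
      metadata:
        name: one-off-task             # hypothetical name
      spec:
        backoffLimit: 2                # retry at most twice on failure
        template:
          spec:
            restartPolicy: Never       # the container is not restarted after it finishes
            containers:
            - name: task
              image: busybox:1.35
              command: ["sh", "-c", "echo processing && sleep 5"]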

    10. (10)

      Namespace

      Namespace is a fundamental concept in the Kubernetes system. A Namespace is used in many cases to implement resource isolation for multi-tenancy. Namespaces “distribute” the resource objects within the cluster to different Namespaces, forming logically grouped projects, groups, or user groups, so that different groups can be managed separately while sharing the resources of the entire cluster. After the Kubernetes cluster is started, a Namespace named “default” is created, which can be viewed through kubectl.
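
      Creating an additional Namespace requires only a very small manifest; the name below is a hypothetical example. Objects created afterwards can be placed into it by setting metadata.namespace accordingly.

      apiVersion: v1
      kind: Namespace
      metadata:
        name: development              # hypothetical Namespace for one project or team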

      The objects described above are the core components of the Kubernetes system, and together they constitute the framework and computing model of Kubernetes. By flexibly combining them, users can quickly and easily configure, create, and manage container clusters. In addition, the Kubernetes system contains many auxiliary resource objects for configuration, such as LimitRange and ResourceQuota, as well as objects used internally by the system, such as Binding and Event; please refer to the Kubernetes API documentation for details.

7.2.3 Kubernetes Service

In order to adapt to rapidly changing business needs, the microservice architecture has gradually become mainstream, and applications built on a microservice architecture need strong service orchestration support. The core Service element in Kubernetes provides a simplified service proxy and discovery mechanism that naturally fits the microservice architecture, so applications can run in Kubernetes without changing their architecture.

  1. 1.

    Service proxy and virtual IP address

    In Kubernetes, the Pod replicas managed by a Replication Controller change over time, for example, when a Pod is migrated (more precisely, rebuilt) or when scaling occurs. This is a burden for clients of the Pods: they need to discover the Pod replicas and track their changes in order to stay up to date.

    Service in Kubernetes is an abstract concept that defines a logical collection of Pods and the strategy for accessing them. The association between a Service and its Pods is also based on Labels. The goal of the Service is to provide a “bridge”: it gives clients a fixed access address and redirects their requests to the corresponding back end, so that applications that are not Kubernetes-native can easily access the back end without writing Kubernetes-specific code.

    Kubernetes assigns a fixed IP address to the Service. This is a virtual IP address (also known as the ClusterIP); it is not a real IP address but is virtualized by Kubernetes. The virtual IP address belongs to the virtual network inside Kubernetes and cannot be reached from the external network. In the Kubernetes system, the Kubernetes Proxy component is responsible for implementing virtual IP routing and forwarding, so a Kubernetes Proxy runs on every Kubernetes Node, thereby implementing a Kubernetes-level virtual forwarding network on top of the container overlay network.
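
    As a sketch of how this looks in practice, the Service below selects its back-end Pods by Label and is given a ClusterIP by Kubernetes; the name, label, and ports are hypothetical.

    apiVersion: v1
    kind: Service
    metadata:
      name: web-svc                  # hypothetical Service name
    spec:
      type: ClusterIP                # default type: a virtual IP reachable only inside the cluster
      selector:
        app: web                     # the Service proxies all Pods labeled app=web
      ports:
      - protocol: TCP
        port: 80                     # port exposed on the ClusterIP
        targetPort: 8080             # port of the back-end container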

  2. 2.

    Service discovery

    The microservice architecture is a popular architecture model. Compared with the traditional monolithic architecture, it advocates dividing an application into a set of small services. However, applying microservices also brings new challenges: the application is divided into multiple distributed components, each of which is clustered and scaled, so mutual discovery and communication between components become complicated, and a service orchestration mechanism is essential.

    Kubernetes provides powerful service orchestration capabilities. A Service abstracts each component of a microservice-oriented application; components only need to access the Service to communicate with each other, without being aware of changes in the component clusters. At the same time, Kubernetes provides service discovery for Services, so components can quickly discover each other.

    In Kubernetes, two modes of service discovery are supported: environment variables and DNS.

    1. (1)

      Environmental variables

      When a Pod runs on a Node node, kubelet will add environment variables for each active Service. There are two types of environment variables.

      • Docker Link environment variables: equivalent to the environment variables set when containers are linked with Docker's --link parameter.

      • Kubernetes Service environment variables: environment variables set by Kubernetes for the Service, including the {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT variables. The names of the environment variables are composed of capital letters and underscores.

        For example, there is a Service named “redis-master” (its IP address is 10.0.0.11, its port number is 6379, and its protocol is TCP), and its environment variables are shown in Fig. 7.25.

        Here, you can see that the IP address, port number, and protocol information of the “redis-master” Service are recorded in the environment variables. Therefore, applications in Pod can discover this service through environment variables. However, the environment variable method has the following limitations:

        ① Environment variables can only be used in the same namespace.

        ② The Service must be created before the Pod is created. Otherwise, the Service variable will not be set to the Pod.

        ③ DNS service discovery mechanism does not have these restrictions.

    2. (2)

      DNS

        DNS service discovery is based on the Cluster DNS. The DNS server monitors for new Services and creates DNS records for each of them for domain name resolution. In a cluster, if DNS is enabled, all Pods can automatically resolve Service names.

        For example, if you have a service named “my-service” under the “my-ns” namespace, a DNS record named “my-service.my-ns” will be created.

      • Under the “my-ns” namespace, Pod will be able to discover this service by the name “my-service”.

      • In other namespaces, Pod must use the name “my-service.my-ns” to discover this service. The result of this name is the Cluster IP.

        Kubernetes also supports DNS SRV (Service) records for ports. If the “my-service.my-ns” service has a TCP port named “http”, the value of the “http” port can be found by the name “_http._tcp.my-service.my-ns”. Kubernetes DNS server is the only way to discover ExternalName type services.

  3. 3.

    Service release

    The Service's virtual IP address belongs to the internal network virtualized by Kubernetes and cannot be reached from the external network, but some Services, such as the Web front end, need to be exposed externally. In this case, a layer of network forwarding must be added, that is, forwarding from the external network to the internal network. Kubernetes provides NodePort Service, LoadBalancer Service, and Ingress to publish Services.

    1. (1)

      NodePort Service

      NodePort Service is a Service of type NodePort. In addition to assigning an internal virtual IP address to the NodePort Service, Kubernetes also exposes the port NodePort on each Node node. The extranet can access the Service through [NodeIP]:[NodePort].
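
      A NodePort Service sketch with hypothetical names and ports follows; NodePort values normally fall in the 30000-32767 range.

      apiVersion: v1
      kind: Service
      metadata:
        name: web-nodeport             # hypothetical name
      spec:
        type: NodePort
        selector:
          app: web
        ports:
        - port: 80                     # ClusterIP port inside the cluster
          targetPort: 8080             # container port
          nodePort: 30080              # opened on every Node; reachable as [NodeIP]:30080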

    2. (2)

      LoadBalancer Service

      LoadBalancer Service is a Service of type LoadBalancer. LoadBalancer Service is built on the NodePort Service cluster. Kubernetes will assign an internal virtual IP address to LoadBalancer Service and expose the NodePort. In addition, Kubernetes requests the underlying cloud platform to create a load balancer with each Node node as the backend, and the load balancer will forward the request to [NodeIP]:[NodePort].

    3. (3)

      Ingress

      Kubernetes provides an HTTP routing and forwarding mechanism called Ingress. The implementation of Ingress requires the support of two components, namely HTTP proxy server and Ingress Controller. The HTTP proxy server will forward external HTTP requests to the Service, and the Ingress Controller needs to monitor the Kubernetes API and update the forwarding rules of the HTTP proxy server in real-time.
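
      The following Ingress sketch forwards HTTP requests for a hypothetical host to a hypothetical back-end Service named web-svc; it assumes the networking.k8s.io/v1 API and a running Ingress Controller.

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: web-ingress              # hypothetical name
      spec:
        rules:
        - host: web.example.com        # HTTP requests for this host...
          http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: web-svc        # ...are forwarded to this Service
                  port:
                    number: 80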

Fig. 7.25

Environment variables

7.2.4 Kubernetes Network

Kubernetes departs from Docker's default network model and defines its own, which is closer to traditional network models, so applications can migrate smoothly from non-container environments to Kubernetes.

  1. 1.

    Communication between containers

    In this case, communication is relatively simple because the containers inside a Pod share the same network namespace, so a container can reach the other containers directly via localhost. In this way, all containers in the Pod are interoperable, and the Pod can be regarded externally as a single, complete network unit, as shown in Fig. 7.26.

    When Kubernetes starts a container, it also starts a Pause container, which implements the communication function between containers. Each Pod runs one such special Pause container, and the other containers are business containers. The business containers share the Pause container's network stack and Volume mounts, so communication and data exchange between them are more efficient. In design, we can make full use of this feature by putting a group of closely related service processes into the same Pod.

  2. 2.

    Communication between Pod

    The Kubernetes network model is a flat network plane. Pod as a network unit is at the same level as the Kubernetes Node network in this network plane. We consider a minimal Kubernetes network topology, as shown in Fig. 7.27. The following conditions are met in this network topology.

    ① Inter-Pod communication: Pod2 and Pod3 (same host), Pod1 and Pod3 (cross-host) can communicate.

    ② Communication between Node nodes and Pods: Node1 can communicate with Pod2/Pod3 (same host) and with Pod1 (cross-host).

    The first question is how to ensure that the Pod's IP address is globally unique. In fact, the method is straightforward: because the Docker bridge assigns the Pod's IP address, you can configure the Docker bridges of different Kubernetes Nodes to use different IP network segments.

    In addition, Pods/containers on the same Kubernetes Node can communicate natively, but how do Pods/containers between Kubernetes Nodes communicate? This requires enhancements to Docker. Create an overlay network in the container cluster to connect all nodes. Currently, overlay networks can be created through third-party network plug-ins, such as Flannel and OVS.

    1. (1)

      Use Flannel to create a Kubernetes overlay network

      Flannel is an overlay network tool designed and developed by the CoreOS team. It creates an overlay network in the cluster, sets a subnet for the host, and encapsulates the communication messages between containers through a tunnel protocol to achieve cross-host communication between containers. Now we use Flannel to connect two Kubernetes Nodes, as shown in Fig. 7.28.

    2. (2)

      Use OVS to create a Kubernetes overlay network

      OVS (Open vSwitch) is a high-quality, multi-layer virtual switch developed by Nicira Networks under the open source Apache 2.0 license. Its purpose is to allow large-scale network automation to be extended through programming while still supporting standard management interfaces and protocols.

      OVS also provides support for the OpenFlow protocol. Users can use any controller that supports the OpenFlow protocol to manage and control OVS remotely. OVS is a critical SDN technology that can flexibly create virtual networks that meet various needs, including overlay networks.

      Next, we use OVS to connect two Kubernetes Nodes, as shown in Fig. 7.29. In order to ensure that container IP addresses do not conflict, the network segments of the Docker bridges on the Kubernetes Nodes must be planned.

  3. 3.

    Communication between service and pod

    Service acts as a service proxy in front of Pods and serves as a single access interface externally, forwarding requests to the Pods. Service network forwarding is a key part of Kubernetes' realization of service orchestration. Kubernetes Proxy, as the key component, is responsible for implementing virtual IP routing and forwarding, so a Kubernetes-level virtual forwarding network is implemented on top of the container overlay network. Kubernetes Proxy has two implementation modes, the Userspace mode and the Iptables mode, which can be specified by the Kubernetes Proxy startup parameter --proxy-mode.

  4. (1)

    Userspace mode

    In Userspace mode, Kubernetes Proxy opens a random port on the host for each Service and listens on it, and creates Iptables rules that redirect requests for the Service's virtual IP address to this port; Kubernetes Proxy then forwards the request to an Endpoint. In this mode, Kubernetes Proxy functions as a reverse proxy and completes the forwarding of requests in user space. Kubernetes Proxy needs to monitor Endpoint changes and refresh the forwarding rules in real time, as shown in Fig. 7.30.

  5. (2)

    Iptables mode

    In the Iptables mode, Kubernetes Proxy directly redirects requests for access to Endpoint's Service virtual IP address by creating Iptables rules. When the Endpoint changes, Kubernetes Proxy will refresh the relevant Iptables rules. In this mode, Kubernetes Proxy is only responsible for monitoring Service and Endpoint, updating Iptables rules, packet forwarding depends on the Linux kernel, and the default load balancing strategy is random, as shown in Fig. 7.31.

Fig. 7.26

Pod network structure

Fig. 7.27

The smallest Kubernetes network topology

Fig. 7.28

Kubernetes Node connection mode 1

Fig. 7.29

Kubernetes Node connection mode 2

Fig. 7.30

Monitoring function of Kubernetes Proxy

Fig. 7.31

Random load balancing strategy

7.2.5 Kubernetes Storage

  1. 1.

    Storage application scenario

    Services running in Kubernetes can be divided into three categories from simple to complex: stateless services, ordinary stateful services, and stateful cluster services.

    1. (1)

      Stateless service: Kubernetes uses ReplicaSet to guarantee the number of instances of a service. If a Pod instance “hangs” or crashes for some reason, ReplicaSet will immediately use this Pod template to create a Pod to replace it. Because it is a stateless service, the new Pod is the same as the old Pod. In addition, Kubernetes provides a stable access interface through service (multiple Pods can be linked behind a Service) to achieve high service availability.

    2. (2)

      Ordinary stateful services: Compared with stateless services, it has more state preservation requirements. Kubernetes provides a storage system based on Volume and Persistent Volume, which can realize service state preservation.

    3. (3)

      Stateful cluster service: Compared with ordinary stateful services, it has more cluster management requirements. There are two problems to be solved to run stateful cluster services: state preservation and cluster management. Kubernetes has developed StatefulSet (previously called PetSet) for this purpose to facilitate the deployment and management of stateful cluster services on Kubernetes.

      Analyzing the above service types, the use of storage in Kubernetes mainly focuses on the following two aspects:

      • Reading the basic configuration files of the service, password key management, etc.

      • Service storage status, data access, etc.

  2. 2.

    Storage system

    In the design and implementation of Docker, the container's data is temporary. That is, when the container is destroyed, the data in it will be lost. If you need to persist data, you need to use the Docker data volume to mount files or directories on the host to the container.

    In the Kubernetes system, when the Pod is rebuilt, the data will be lost. Kubernetes also provides the persistence of the Pod data through the data volume. The Kubernetes data volume is an extension of the Docker data volume. The Kubernetes data volume is at the Pod level and can be used to implement file-sharing of containers in the Pod.

    Kubernetes data volume adapts to various storage systems, providing rich and powerful functions. Kubernetes provides multiple types of data volumes, which are divided into three categories: local data volumes, network data volumes, and information data volumes according to their functions.

    1. (1)

      Local data volume

      Kubernetes provides two types of data volumes that act only on the local file system; we call them local data volumes. The data in a local data volume exists only on one machine, so when the Pod is migrated, the data is lost, which cannot meet real data persistence requirements. However, local data volumes have other uses, such as file sharing between the containers in a Pod or sharing the host's file system.

      ① EmptyDir

      EmptyDir is an empty directory, which is a new directory created when the Pod is created. If the Pod is configured with an EmptyDir data volume, the EmptyDir data volume will exist during the life of the Pod. When the Pod is allocated to the Node node, the EmptyDir data volume will be created on the Node node and mounted to the Pod container. As long as the Pod exists, the EmptyDir data volume will exist (container deletion will not cause the EmptyDir data volume to lose data). However, if the Pod's life cycle ends (Pod is deleted), the EmptyDir data volume will be deleted and lost forever.

      The EmptyDir data volume is very suitable for file-sharing of containers in Pod. Pod's design provides a good container combination model, each of which performs its duties and completes the interaction through shared file directories. For example, a full-time log collection container can be combined in each Pod and business container to complete the logs' collection and summary.
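
      The sketch below wires such a log-collection sidecar to a business container through an EmptyDir volume; the names, images, and paths are hypothetical.

      apiVersion: v1
      kind: Pod
      metadata:
        name: web-with-logging         # hypothetical name
      spec:
        containers:
        - name: web
          image: nginx:1.21
          volumeMounts:
          - name: shared-logs
            mountPath: /var/log/nginx  # the business container writes its logs here
        - name: log-collector
          image: busybox:1.35
          command: ["sh", "-c", "tail -F /logs/access.log"]
          volumeMounts:
          - name: shared-logs
            mountPath: /logs           # the sidecar reads the same files
        volumes:
        - name: shared-logs
          emptyDir: {}                 # created with the Pod and deleted when the Pod ends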

      ② HostPath

      The HostPath data volume allows the file system on the container host to be mounted to the Pod. If the Pod needs to use some files on the host, you can use the HostPath data volume.

    2. (2)

      Network data volume

      Kubernetes provides many types of data volumes to integrate third-party storage systems, including some prevalent distributed file systems and the storage support provided by IaaS platforms. These storage systems are distributed and share file systems over the network, so we call them network data volumes.

      Network data volumes can meet the persistence requirements of data. Pod is configured to use network data volume. Each time a Pod is created, the remote file directory of the storage system will be mounted to the container, and the data in the data volume will be permanently stored. Even if the Pod is deleted, it will only delete the mounted data volume. The data in the data volume is still stored in the storage system, and when a new Pod is created, the same data volume is still mounted.

      ① NFS

      NFS (Network File System) is a file system supported by FreeBSD that allows computers on a network to share resources via TCP/IP. In NFS applications, a local NFS client application can transparently read and write files located on a remote NFS server, just as if it were accessing local files.

      ② iSCSI

      iSCSI, developed by IBM, is a SCSI instruction set for hardware devices that runs on top of the IP layer. It allows the SCSI protocol to run over IP networks, enabling routing over, for example, high-speed Gigabit Ethernet. iSCSI is a new storage technology that combines the existing SCSI interface with Ethernet technology so that servers can exchange data with storage devices over IP networks.

      ③ GlusterFS

      GlusterFS is the core of a scale-out storage solution. It is an open source distributed file system with powerful horizontal scaling capabilities: through expansion, it can support PB-level storage capacity and handle thousands of clients. GlusterFS uses TCP/IP or InfiniBand RDMA networks to aggregate physically distributed storage resources and uses a single global namespace to manage data. GlusterFS is based on a stackable userspace design and can provide excellent performance for various data loads.

      ④ RBD

      Ceph is an open source, distributed network storage system and, at the same time, a file system. Ceph's design goals are excellent performance, reliability, and scalability. Ceph is based on reliable, scalable, distributed object storage, manages metadata through a distributed cluster, and supports POSIX interfaces. RBD (RADOS Block Device) is a Linux block device driver that provides a shared network block device to interact with Ceph. RBD stripes and replicates data across the Ceph object storage cluster to provide reliability, scalability, and access to block devices.

    3. (3)

      Information data volume

      There are some data volumes in Kubernetes that are mainly used to pass configuration information to containers; we call them information data volumes. For example, Secret and the Downward API both save Pod information in the form of files and then mount them into the container as data volumes, and the container obtains the corresponding information by reading the files. In terms of functional design, this deviates somewhat from the original intention of data volumes, which is to persist data or share files. Future versions may restructure this part and place the functions provided by information data volumes somewhere more appropriate.

      ① Secret

      Kubernetes provides Secret to handle sensitive data, such as passwords, tokens, and secret keys. Compared with configuring sensitive data directly in the Pod definition or the image, Secret provides a more secure mechanism to prevent data leakage.

      The creation of the Secret is independent of the Pod, and it is mounted to the Pod in the form of a data volume. The Secret's data will be saved in the form of a file, and the container can obtain the required data by reading the file.
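
      A short sketch of a Secret follows; the names, values, and mount path are hypothetical, and stringData is shown for readability (Kubernetes stores the values base64-encoded).

      apiVersion: v1
      kind: Secret
      metadata:
        name: db-credentials           # hypothetical name
      type: Opaque
      stringData:
        username: admin
        password: s3cr3t

      A Pod can then mount the Secret as a data volume and read the keys as files:

      apiVersion: v1
      kind: Pod
      metadata:
        name: db-client                # hypothetical name
      spec:
        containers:
        - name: app
          image: busybox:1.35
          command: ["sh", "-c", "cat /etc/creds/username && sleep 3600"]
          volumeMounts:
          - name: creds
            mountPath: /etc/creds      # each Secret key appears as a file in this directory
            readOnly: true
        volumes:
        - name: creds
          secret:
            secretName: db-credentials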

      ② Downward API

      The Downward API can expose Pod information to the container through environment variables. In addition, it can also pass values through data volumes: the Pod information is mounted into the container as files via a data volume, and the container obtains the information by reading those files. Currently, the Pod name, Pod Namespace, Pod Labels, and Pod Annotations are supported.

      ③ Git Repo

      Kubernetes supports downloading a Git repository into a Pod. This is currently implemented through the Git Repo data volume: when a Pod is configured with a Git Repo data volume, the Git repository is downloaded into the Pod's data volume and then mounted into the container.

    4. (4)

      Storage resource management

      Understanding each storage system is a complicated matter, especially for ordinary users, who sometimes do not care about the various storage implementations and only hope to store data safely and reliably. Kubernetes provides the Persistent Volume and Persistent Volume Claim mechanisms, which form a storage consumption model. A Persistent Volume is a data volume configured and created by the system administrator; it represents a specific type of storage plug-in implementation, which can be NFS, iSCSI, and so on. Ordinary users can request and obtain a suitable Persistent Volume through a Persistent Volume Claim without needing to be aware of the back-end storage implementation.

      The relationship between Persistent Volume Claim and Persistent Volume is similar to Pod and Node node. Pod consumes the resources of Node node, and Persistent Volume Claim consumes the resources of Persistent Volume. Persistent Volume and Persistent Volume Claim are related to each other and have complete life cycle management.

    5. (1)

      Preparation

      The system administrator plans and creates a series of Persistent Volumes. After the Persistent Volume is successfully created, it is available.

    6. (2)

      Binding

      The user creates a Persistent Volume Claim to declare the storage request, including storage size and access mode. After the Persistent Volume Claim is successfully created, it is in a waiting state. When Kubernetes finds that a new Persistent Volume Claim is created, it will look for the Persistent Volume according to the conditions. When Persistent Volume matches, Persistent Volume Claim and Persistent Volume will be bound, and Persistent Volume and Persistent Volume Claim are both in a bound state.

      Kubernetes only selects Persistent Volumes in the available state and adopts a minimum-satisfaction strategy. When no Persistent Volume can meet the demand, the Persistent Volume Claim remains in the waiting state. For example, if two Persistent Volumes are available, one with a capacity of 50GB and one with a capacity of 60GB, then a Persistent Volume Claim requesting 40GB will be bound to the 50GB Persistent Volume, while a Persistent Volume Claim requesting 100GB will remain in the waiting state until a Persistent Volume of at least 100GB appears (a Persistent Volume may be created or recycled).
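
      The pair of manifests below sketches this binding: an administrator-created 50GB Persistent Volume and a user claim requesting 40GB. The NFS backend, server address, and names are hypothetical.

      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: pv-50g                   # created by the system administrator
      spec:
        capacity:
          storage: 50Gi
        accessModes:
        - ReadWriteOnce
        nfs:                           # example backend; could also be iSCSI, RBD, etc.
          server: 192.168.1.100        # hypothetical NFS server
          path: /exports/data

      The user-side claim only states the size and access mode it needs:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: data-claim               # created by the user
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 40Gi              # bound to pv-50g, the smallest available PV that satisfies it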

    7. (3)

      Use

      When creating a Pod using Persistent Volume Claim, Kubernetes will query its bound Persistent Volume, call the real storage implementation, and then mount the Pod's data volume.

    8. (4)

      Release

      When the user deletes the Persistent Volume Claim bound to a Persistent Volume, the Persistent Volume enters the released state. At this point, the Persistent Volume may still hold the data written through the Persistent Volume Claim, so it is not yet available and needs to be recycled.

    9. (5)

      Recycling

      The released Persistent Volume needs to be recycled before it can be used again. The recycling strategy can be manual processing or automatic cleaning by Kubernetes. If the cleaning fails, the Persistent Volume will be in a failed state.

7.2.6 Kubernetes Service Quality

In order to schedule and allocate resources effectively while improving resource utilization, Kubernetes uses QoS (Quality of Service) to manage the service quality of Pods according to different service quality expectations. For a Pod, quality of service is reflected in two specific indicators: CPU and memory. When memory resources on a node are tight, Kubernetes handles the situation according to the QoS categories set in advance.

  1. 1.

    QoS Classification

    QoS is mainly divided into three categories: Guaranteed, Burstable and Best-Effort, with priority from high to low.

    1. (1)

      Guaranteed

      All containers in the Pod must set limits, and the settings must be consistent. If any container also sets requests, then all containers must set them, and the values must be consistent with the limits. The QoS of such a Pod is the Guaranteed level.

      Note: If a container only sets limits but not requests, the value of requests is equal to the value of limits.

      Guaranteed example: Both requests and limits are set and the values are equal, as shown in Fig. 7.32.
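
      In the spirit of Fig. 7.32, a Guaranteed Pod can be sketched as follows; the name, image, and values are hypothetical.

      apiVersion: v1
      kind: Pod
      metadata:
        name: guaranteed-pod           # hypothetical name
      spec:
        containers:
        - name: foo
          image: busybox:1.35
          resources:
            requests:
              cpu: "500m"              # requests equal limits for every resource...
              memory: "256Mi"
            limits:
              cpu: "500m"              # ...so the Pod is classified as Guaranteed
              memory: "256Mi"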

    2. (2)

      Burstable

      As long as the requests and limits of a container in the Pod are not the same, the QoS of the Pod is the Burstable level.

      Burstable example: set limits for the different resources of the container foo and bar (foo is memory, and bar is cpu), as shown in Fig. 7.33.

    3. (3)

      Best-Effort

      If requests and limits are not set for all resources, the QoS of the Pod is the Best-Effort level.

      Best-Effort example: neither container foo nor container bar has requests and limits set, as shown in Fig. 7.34.

  2. 2.

    Resource recovery strategy

    When the available resources on a node in a Kubernetes cluster are relatively small, Kubernetes provides a resource recovery strategy to ensure the Pod service's normal operation on the node. When the memory or CPU resources on a node are exhausted, the Pod service scheduled to run on the node may become unstable. Kubernetes uses kubelet to control the resource recovery strategy to ensure that the Pod on the node can run stably when the node resources are relatively small.

    Based on resource elasticity, Kubernetes divides resources into compressible resources and incompressible resources. Currently, CPU is the supported compressible resource, while memory and disk are the supported incompressible resources.

    Compressible resources: the CPU is a compressible resource. When a Pod's usage exceeds its set limits, the CPU usage of the Pod's processes is throttled, but the processes are not killed.

    Incompressible resources: When the Node node's memory resources are insufficient, a process will be killed by the kernel.

    The sequence and scenarios of the three QoS Pods being Killed are as follows.

     • Best-Effort type Pod: When the system runs out of all memory, this type of Pod will be killed first.

     • Burstable type Pod: When the system runs out of all memory and no Best-Effort container can be killed, this type of Pod will be killed.

     • Guaranteed type Pod: when the system has used up all memory and no Burstable or Best-Effort container can be killed, this type of Pod will be killed.

    Note: If a Pod's processes exceed the preset limits (rather than the Node node running short of resources), the system tends to restart the container on the machine where it was originally located or to recreate the Pod.

  3. 3.

    QoS implementation recommendations

    If resources are sufficient, you can set the QoS type of Pods to Guaranteed. Trading computing resources for business performance and stability reduces the time and cost of troubleshooting.

    If you want to improve resource utilization, core business services can be set to Guaranteed, and other services can be set to Burstable or Best-Effort according to their importance.

Fig. 7.32

Example of guaranteed configuration file

Fig. 7.33

Burstable configuration file

Fig. 7.34

Best-Effort configuration file

7.2.7 Kubernetes Resource Management

Resource management is a key capability of Kubernetes. Kubernetes not only allocates sufficient resources to applications, but also prevents applications from using resources without restrictions. As the scale of applications increases by orders of magnitude, these issues become critical.

  1. 1.

    Kubernetes resource model

    Virtualization technology is the foundation of cloud platforms. Its goal is to integrate or divide computing resources. This is a key technology in cloud platforms. Virtualization technology provides flexibility in resource allocation for cloud platform resource management, so that the cloud platform can integrate or divide computing resources through the virtualization layer.

    Compared with virtual machines, the emerging container technology uses a series of system-level mechanisms, such as Linux Namespaces for isolation, file system mount points to determine which files a container can access, and Cgroups to determine how many resources each container can use. In addition, containers share the same system kernel, so when multiple containers use the same kernel, memory is used more efficiently.

    Although the two virtualization technologies, containers and virtual machines, are entirely different, their resource requirements and models are similar. Containers like virtual machines require memory, CPU, hard disk space, and network bandwidth. The host system can treat the virtual machine and the container as a whole, allocate and manage the resources it needs for this whole. Of course, the virtual machine provides the security of a dedicated operating system and a firmer logical boundary, while the container is relatively loose on the resource boundary, which brings flexibility and uncertainty.

    Kubernetes is a container cluster management platform. It needs to track the overall platform's resource usage, allocate resources to containers reasonably, and ensure that enough resources are available throughout the container life cycle. Furthermore, if resource allocation is exclusive, a resource that has been allocated to one container will not be allocated to another. It is very wasteful for idle containers to occupy resources (such as CPU) that they do not use, so Kubernetes needs to consider how to improve resource utilization under the premise of priority and fairness.

  2. 2.

    Resource requests and resource limits

    Computing resources are required for Pod or container operation, mainly including the following two.

    • CPU: The unit is Core.

    • Memory: The unit is Byte.

      When creating a Pod, you can specify the resource request and resource limit of each container. The resource request is the minimum resource requirement required by the container, and the resource limit is the upper limit of the resource that the container cannot exceed. Their size relationship must be:

      0<=request<=limit<=infinity

      In the definition of the container, resource requests are set through resources.requests, and resource limits are set through resources.limits. Currently, the only resource types that can be specified are CPU and memory. Resource request and resource limit are optional configurations, and the default value depends on whether LimitRange is set. If the resource request is not specified and there is no default value, then the resource request is equal to the resource limit.

      The Pod defined below contains two containers (see Fig. 7.35): the resource request for the first container is 0.5 core CPU and 255MB memory, and the resource limit is 1 core CPU and 512MB memory; the resource request for the second container is 0.25 core CPU and 128MB memory, the resource limit is 1 core CPU and 512MB memory.

      The resource request/limit of a Pod is the sum of all container resource requests/limits in the Pod. For example, the Pod's resource request is 0.75 core CPU and 383MB memory, and the resource limit is 2 core CPU and 1024MB memory.
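
      Following the values described above (and in the spirit of Fig. 7.35), such a two-container Pod definition can be sketched as follows; the names and images are hypothetical.

      apiVersion: v1
      kind: Pod
      metadata:
        name: two-container-pod        # hypothetical name
      spec:
        containers:
        - name: container-1
          image: nginx:1.21
          resources:
            requests:
              cpu: "500m"              # 0.5 core
              memory: "255Mi"
            limits:
              cpu: "1"                 # 1 core
              memory: "512Mi"
        - name: container-2
          image: busybox:1.35
          command: ["sleep", "3600"]
          resources:
            requests:
              cpu: "250m"              # 0.25 core
              memory: "128Mi"
            limits:
              cpu: "1"
              memory: "512Mi"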

      When the Kubernetes Scheduler schedules a Pod, the Pod's resource request is a key scheduling indicator. Kubernetes obtains the maximum resource capacity of each Kubernetes Node (via the cAdvisor interface) and calculates the resources already used. For example, if a Node node can accommodate 2 core CPUs and 2GB memory, and 4 Pods already running on it request a total of 1.5 core CPU and 1GB memory, then 0.5 core CPU and 1GB memory remain. When the Kubernetes Scheduler schedules a Pod, it checks whether the Node node has enough resources to satisfy the Pod's resource request; if not, the Node node is excluded.

      Resource requests ensure that a Pod has enough resources to run, and resource limits prevent a Pod from using resources without restriction and causing other Pods to crash. This is especially important in public cloud scenarios, where malicious applications may otherwise preempt resources to attack the platform.

      Docker containers use Linux Cgroups to implement resource limits, and the docker run command provides parameters to limit CPU and memory.

    1. (1)

      --memory

      The docker run command sets the memory quota available to a container through the --memory parameter. Cgroups limit the container's memory usage; once the quota is exceeded, the container is terminated. The value of --memory for a Docker container in Kubernetes is taken from resources.limits.memory; for example, if resources.limits.memory=512MB, the value of --memory is 512×1024×1024.

    2. (2)

      --cpu-shares

      The docker run command sets the available CPU quota for a container through the --cpu-shares parameter. It is important to note that this is a relative weight and has nothing to do with the actual processing speed. Each new container has a CPU share of 1024 by default. Taken alone, this value does not mean anything. However, if you start two containers and both want to use 100% of the CPU, the CPU time will be evenly distributed between them because they have the same CPU share. If we set one container's CPU share to 512, compared with another container with a share of 1024, it will get 1/3 of the CPU time, but this does not mean that it can use only 1/3 of the CPU: if the container with the 1024 share is idle, the other container is allowed to use 100% of the CPU. For CPUs, it is difficult to state exactly how much CPU is allocated to which container; it depends on the actual operating conditions.

      The value of --cpu-shares for a Docker container in Kubernetes is derived from resources.requests.cpu or resources.limits.cpu multiplied by 1024. If resources.requests.cpu is specified, --cpu-shares equals resources.requests.cpu multiplied by 1024; if resources.requests.cpu is not specified but resources.limits.cpu is specified, --cpu-shares equals resources.limits.cpu multiplied by 1024; if neither resources.requests.cpu nor resources.limits.cpu is specified, --cpu-shares takes the minimum value.

      LimitRange includes two types of configurations, Container and Pod. The configurations, including constraints and default values, are shown in Tables 7.1 and 7.2.

Fig. 7.35

Setting resource request and resource limit

Table 7.1 LimitRange container configuration
Table 7.2 LimitRange Pod configuration
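
As an illustration of the constraints and defaults summarized in Tables 7.1 and 7.2, a LimitRange sketch might look as follows; the name, namespace, and values are hypothetical.

apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits            # hypothetical name
  namespace: development           # applies within one Namespace
spec:
  limits:
  - type: Container                # per-container constraints and defaults
    max:
      cpu: "2"
      memory: 1Gi
    min:
      cpu: 100m
      memory: 64Mi
    default:                       # default limits injected when a container specifies none
      cpu: 500m
      memory: 256Mi
    defaultRequest:                # default requests injected when a container specifies none
      cpu: 250m
      memory: 128Mi
  - type: Pod                      # aggregate constraints for the whole Pod
    max:
      cpu: "4"
      memory: 2Gi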

Kubernetes has a multi-tenant architecture. When multiple tenants or teams share a Kubernetes system, the system administrator needs to prevent tenants from monopolizing resources and to define resource allocation policies. Kubernetes provides the ResourceQuota API object to implement resource quotas. ResourceQuota can not only act on CPU and memory but also limit the number of Pods created. The computing resource quotas and object resources supported by ResourceQuota are shown in Tables 7.3 and 7.4.

Table 7.3 Computing resource quota
Table 7.4 Kubernetes API object resources
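
A ResourceQuota sketch for one tenant Namespace could look like the following; the name, namespace, and values are hypothetical.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota                 # hypothetical name
  namespace: development           # the quota applies to one tenant's Namespace
spec:
  hard:
    requests.cpu: "10"             # total CPU all Pods in the Namespace may request
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"                     # also limits the number of Pods that may be created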

7.3 Exercise

  1. 1.

    Fill in the blanks

    1. 1.

      The emergence of container technology actually mainly solves the technical realization of the _______layer.

    2. 2.

      Docker provides two ways to build images:___________and___________.

    3. 3.

      Kubernetes uses etcd as storage middleware, etcd is a highly available key-value storage system, inspired by ZooKeeper and Doozer, processing log replication through________to ensure strong consistency.

    4. 4.

      Kubernetes provides powerful __________ capabilities. Each component of a microservice application is abstracted by Service. Components only need to access the Service to communicate with each other without being aware of component cluster changes.

    5. 5.

      Kubernetes's QoS is mainly divided into three categories: _________, __________, and __________.

  2. 2.

    Answer the following questions

    1. 1.

      What is a container? What is the difference between container virtualization and traditional virtualization?

    2. 2.

      How many components does Kubernetes contain? What is the function of each component? How do the components interact?

    3. 3.

      What is the relationship between Kubernetes and Docker?

  3. 3.

    Practice

    Write a Dockerfile that achieves the following: when the container starts, it displays the contents of the “/” directory by default, and the command can be overridden at run time to display the “/mnt” directory instead; the base image may be chosen freely.