To meet user needs and availability requirements, cloud computing environments often need the flexibility to manage and use resources connected across multiple data centers, which in turn requires a robust cloud operating system architecture and software. Many public and private organizations have developed their own management frameworks to manage cloud resources. OpenStack is one of many commercial and open source cloud management software platforms. OpenStack is a framework for building a cloud operating system: an open source project designed to provide software for the construction and management of public and private clouds. It comprises a large number of peripheral components, integrates and manages a wide range of hardware devices, and hosts a variety of upper-level applications and services, resulting in a complete and robust cloud computing system.

By dissecting OpenStack's components and explaining their relationship to the underlying resources (computing, storage, networking, etc.), with a focus on the core components such as Keystone, Nova, Glance, Cinder, and Neutron, this chapter explores how OpenStack works and uses examples, procedures, pictures, and logs to help readers understand how the OpenStack components operate. It enables readers to acquire the skills to implement and manage OpenStack and to configure and adapt it to their needs.

6.1 Overview of OpenStack

OpenStack is the most popular open source cloud operating system framework. Since its first release in 2010, OpenStack has grown and matured through the joint efforts of thousands of developers and tens of thousands of users. At present, OpenStack is powerful and feature-rich, and it is increasingly widely used in private clouds, public clouds, NFV, and other fields.

6.1.1 OpenStack Architecture

Linux on servers and personal computers, Windows, and Android and iOS on phones are all common examples of operating systems. Correspondingly, a complete cloud operating system is a distributed cloud computing system consisting of a large amount of software and hardware, and, as with an ordinary operating system, it needs to be managed. OpenStack is a key component in implementing a cloud operating system: it is primarily used to deploy infrastructure as a service (IaaS), or rather to build the complete framework of a cloud operating system. Figure 6.1 shows the cloud operating system within the cloud computing framework, where you can see OpenStack's position in the complete framework.

Fig. 6.1

Cloud operating system in the cloud computing framework

The cloud operating system framework is not the same as a cloud operating system. Building a complete cloud operating system requires the organic integration of many software components that work together to provide the functionality and services that system administrators and tenants need. On its own, OpenStack does not have all the capabilities required of a complete cloud operating system. For example, OpenStack cannot independently implement resource access and abstraction; it needs to work with underlying virtualization software, software-defined storage, software-defined networking, and other software. OpenStack cannot independently provide comprehensive application life cycle management capabilities; it needs to integrate with various management software platforms at the upper level. OpenStack itself does not have complete system management and maintenance capabilities, so when it is put into use, it must be combined with various management software and maintenance tools. In addition, OpenStack's own human–machine interface is limited in richness and functionality.

It is not hard to see how building a complete cloud operating system based on OpenStack requires integrating OpenStack with other software components to provide capabilities that OpenStack itself can't provide. As a result, OpenStack's precise positioning is a cloud operating system framework. Based on this framework, different components can be integrated to implement cloud operating systems that meet the needs of different scenarios, and on this basis, a complete cloud computing system can be built.

6.1.2 OpenStack Core Components

OpenStack contains many components, including Nova, Swift, Glance, Keystone, Neutron, Cinder, Horizon, MQ, Heat, and Ceilometer; the following seven are its core components.

  1. 1.

    Nova

    Nova is OpenStack's compute controller and handles all the activities required to support the life cycle of instances within the OpenStack cloud. As a management platform, Nova manages computing resources and scaling requirements in the OpenStack cloud. However, Nova does not provide virtualization capabilities itself; instead, it uses the Libvirt API to interact with the supported hypervisors.

  2. 2.

    Swift

    Swift provides an object storage service that allows files to be stored and retrieved, though not by mounting directories as on a file server. Swift provides OpenStack with distributed, eventually consistent virtual object storage. With distributed storage nodes, Swift can store billions of objects. Swift has built-in redundancy, fault management, archiving, and streaming capabilities, and it is highly scalable.

  3. 3.

    Glance

    Glance provides a catalog and storage repository for virtual disk images and supports the storage and retrieval of virtual machine images. These disk images are widely used by the Nova component. Glance can manage images across multiple data centers, including tenants' private images. Although an image service is technically optional, a cloud of any size requires it.

  4. 4.

    Keystone

    Keystone provides authentication and authorization for all services on OpenStack. Authentication and authorization are complex in any system, especially projects as large as OpenStack, and each component requires unified authentication and authorization.

  5. 5.

    Neutron

    Neutron is the core component providing network services in OpenStack. It is based on the idea of software-defined networking, manages network resources in software, makes full use of the various networking technologies in the Linux operating system, and supports third-party plug-ins.

  6. 6.

    Cinder

    Cinder is an essential component of the virtual infrastructure and the basis for storing the disk files and data used by virtual machines. Cinder provides block storage services for instances. The allocation and consumption of storage are determined by block storage drivers, or by multiple drivers in a multi-back-end configuration.

  7. 7.

    Horizon

    Horizon provides a Web-based interface that enables cloud administrators and users to manage a variety of OpenStack resources and services.

6.1.3 Logical Relationship Between OpenStack Components

OpenStack's services are called through a unified REST-style API to achieve loose coupling of the system. The advantage of a loosely coupled architecture is that the developers of an individual component can focus only on their own domain, and modifications to that domain will not affect other developers. On the other hand, this loosely coupled architecture also makes maintaining the whole system more difficult, and operations personnel need to master more system-level knowledge to debug a component that is in trouble. Therefore, both developers and maintenance personnel must understand the interaction relationships between components. Figure 6.2 shows the logical relationship between the various OpenStack components.

Fig. 6.2

The logical relationship between OpenStack components

6.2 OpenStack Operating Interface Management

OpenStack needs to provide a simple, user-friendly interface for end-users and developers to browse and operate the computing resources that belong to them; this is OpenStack's dashboard component, Horizon.

6.2.1 Introduction to OpenStack Operation Interface

Horizon is the portal to the entire OpenStack application architecture. It provides a Web-based graphical interface service portal. Users can access and control their compute, storage, and network resources, such as launching virtual machine instances, assigning IP addresses, setting access control, and more, using the graphical interface provided by Horizon through their browser. Horizon provides a different interface for users in both roles.

  • Cloud administrator: Horizon provides a holistic view of cloud administrators, who can get an overview of the resource size and health of the entire cloud, create end-users and projects, assign projects to end-users, and manage resource quotas that projects can use.

  • End-users: Horizon provides end-users with an autonomous service portal that allows end-users to use compute, storage, and network resources in projects assigned by cloud administrators that do not exceed quota limits.

6.2.2 The Architecture and Functions of the OpenStack Operation Interface

Horizon uses the Django framework, a popular Python-based open source Web application framework, and follows Django's patterns to produce several apps that together provide a complete implementation of the OpenStack interface. Figure 6.3 shows Horizon's interface for creating an instance.

Fig. 6.3

Horizon interface

Horizon consists of three Dashboards (called Apps in Django): the User Dashboard, the System Dashboard, and the Settings Dashboard. These three Dashboards make up Horizon's core application. Figures 6.4, 6.5, and 6.6 show the functional architectures of the User Dashboard, the System Dashboard, and the Settings Dashboard, respectively.

Fig. 6.4

Functional architecture of User Dashboard

Fig. 6.5

The functional architecture of the System Dashboard

Fig. 6.6

Functional architecture of the Settings Dashboard

The User Dashboard is an autonomous self-service portal for end-users, who can freely operate and use the compute, storage, and network resources in the projects assigned to them by cloud administrators, within the quota limits. The System Dashboard and the Settings Dashboard are interfaces for cloud administrators, who can get an overview of the resource scale and health of the entire cloud, create end-users and projects, and assign projects and the resource quotas those projects may use to end-users.

In addition to providing these main Web interface features, Horizon controls other details of the pages through various options in its configuration file local_settings.py, such as setting the logo image shown on the OpenStack home page or specifying the title of the page.
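
As an illustration, the following is a minimal sketch of such a customization. The option names shown (e.g., SITE_BRANDING) are assumptions that should be checked against the local_settings.py shipped with your Horizon release.

```python
# /etc/openstack-dashboard/local_settings.py (excerpt)
# Assumed option names; verify against your Horizon release.

# Text shown in the browser title bar and used as the dashboard brand.
SITE_BRANDING = "Example Cloud Dashboard"

# Optional link target when the brand/logo is clicked.
SITE_BRANDING_LINK = "https://cloud.example.com"

# URL prefix under which the dashboard is served.
WEBROOT = "/dashboard/"
```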

6.3 OpenStack Authentication Management

Security is an unavoidable issue for any software: no software can be designed without considering security, and of course no software can solve every security problem. Even low-cost software must take the security and privacy of end-users into account, and this is especially true of OpenStack, which provides cloud infrastructure services.

6.3.1 Introduction to OpenStack Authentication Service

Keystone is the component of the OpenStack framework that manages authentication, service access rules, and service tokens. User access to resources requires verification of the user's identity and permissions, and service execution also requires permission checks; both are handled through Keystone. Keystone is similar to a service bus, or registry, for the entire OpenStack framework. Each OpenStack service registers its Endpoint (the URL through which the service is accessed) with Keystone, and any call between services must first be authenticated by Keystone and obtain the Endpoint of the target service before the call can be made.

Keystone's main features are as follows:

  • manage users and their permissions

  • maintain the OpenStack Service Endpoint

  • authentication and authorization

Keystone's architecture is shown in Fig. 6.7.

Fig. 6.7

Keystone architecture

Mastering Keystone requires understanding some basic concepts, such as User, Credentials, Token, and more.

  1. 1.

    User

    User refers to any entity that uses OpenStack, which can be a real user or another system or service. When the User requests access to OpenStack, Keystone verifies it.

  2. 2.

    Credentials

    Credentials is the information that the User uses to prove his identity. This can be:

    • username and password

    • Token (a Keystone assigned identity token)

    • username and API Key (key)

    • other advanced ways

  3. 3.

    Authentication

    Authentication is Keystone's process of verifying User's identity. When User accesses OpenStack, it submits a username and password form to Keystone, which, upon verification, issues User with a Token as follow-up access.

  4. 4.

    Token

    A Token is a string of numbers and letters that Keystone assigns to a User after the User is successfully authenticated. The Token is used as a credential for accessing services, and each service verifies the Token's validity with Keystone. A Token also has a scope, indicating what the Token applies to, such as a project scope or a domain scope; a Token can only be used to authenticate the User's operations on resources within the specified scope. By default, a Token is valid for 24 hours.

  5. 5.

    Project

    A Project is used to group and isolate OpenStack's resources (compute, storage, and network resources). A Project can be a department or a project team in a private enterprise cloud, similar to the VPC concept in a public cloud. Ownership of resources belongs to the Project, not the User. Each User (including an administrator) must belong to a Project to access that Project's resources, and a User can belong to more than one Project.

  6. 6.

    Service

    OpenStack's services include Compute (Nova), Block Storage (Cinder), Image Service (Glance), Networking (Neutron), and more. Each service provides several Endpoints through which the User accesses resources and performs operations.

  7. 7.

    Endpoint

    An Endpoint is an address accessible on the network, usually a URL. A Service exposes its own APIs through Endpoints. Keystone is responsible for managing and maintaining the Endpoint of each service.

  8. 8.

    Role

    Security consists of two parts: authentication and authorization. Keystone uses Roles to implement authorization. A Role is global, so its name must be unique within a Keystone deployment. Horizon manages Roles under Identity Management → Roles, where one or more Roles can be assigned to a User.

  9. 9.

    Group

    A Group is a collection of Users within a Domain, designed to make Role assignment easier. Assigning a Role to a Group effectively assigns it to all Users within the Group.

  10. 10.

    Domain

    A Domain represents a collection of Projects, Groups, and Users. It often represents a customer in a public or private cloud, similar to the concept of a virtual data center. A Domain can be thought of as a namespace, like a domain name, and is globally unique. Within a Domain, the names of Projects, Groups, and Users cannot be repeated, but the same names can be reused in two different Domains. Therefore, when identifying these elements, you need to use both their names and their Domain's ID or name.

6.3.2 Principles of OpenStack Authentication Service

As the stand-alone security authentication module in OpenStack, Keystone is responsible for OpenStack's user authentication, token management, the service catalog that provides access to resources, and access control based on user roles. Verifying the username and password when a user accesses the system, issuing Tokens, registering services (Endpoints), and determining whether a user may access a particular resource all depend on the Keystone service.

Based on the core concepts described earlier, Keystone provides services in four areas: Identity (authentication), Token, Catalog (service catalog), and Policy (security policy, or access control).

  1. (1)

    Identity

    The user's Identity is verified, the user's credentials are typically presented as a username and password, and the Identity service provides the extraction of metadata related to that user.

  2. (2)

    Token

    After Identity confirms the user's identity, the user is given a token that attests to that identity and can be used for subsequent resource requests; the Token service verifies and manages these tokens. Keystone issues two types of tokens to users through the Identity service. The first is a token not bound to any Tenant, which allows the user to look up the Tenant list from Keystone and select the Tenant to access; the user can then obtain the second type, a token bound to that Tenant, and only with such a Tenant-bound token can the resources in that Tenant be accessed. Tokens are only valid for a limited period of time and can be revoked if a specific user's access needs to be removed.

  3. (3)

    Catalog

    Catalog provides a service catalog query, or Endpoint list for each service. The service directory contains Endpoint information for all services, and resource access between services begins with the Endpoint information for that resource, usually a list of URLs, before resource access can be made based on that information. From the current version, Keystone provides a service directory that is returned to the user simultaneously as the token.

  4. (4)

    Policy

    Policy is a rule-based authorization engine that uses configuration files to define how actions match user roles. Strictly speaking, this part is no longer maintained within the Keystone project itself: since access control is needed by many different projects, it is developed and maintained as part of the OpenStack common library Oslo (oslo.policy).

    Keystone builds a bridge between users and services: the user obtains a token and the service list from Keystone, presents the token when visiting a service, and the service in turn verifies the validity of the token with Keystone. Figure 6.8 shows the service–user interaction process based on the Keystone mechanism.
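
The interaction in Fig. 6.8 can be sketched in a few lines of Python using the keystoneauth1 library, which most OpenStack clients build on. This is a minimal example only; the endpoint URL, user, and project names are placeholders.

```python
from keystoneauth1.identity import v3
from keystoneauth1 import session

# Credentials presented to Keystone (all values are placeholders).
auth = v3.Password(
    auth_url="http://controller:5000/v3",
    username="demo",
    password="secret",
    project_name="demo",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)

# Keystone verifies the credentials and issues a scoped token.
token = sess.get_token()

# The service catalog returned with the token lets clients discover
# other services' endpoints, e.g. the public endpoint of Nova.
nova_endpoint = sess.get_endpoint(service_type="compute", interface="public")

print(token[:16] + "...", nova_endpoint)
```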

Fig. 6.8

Service user interaction process based on the Keystone mechanism

6.4 OpenStack Image Management

The importance of images for cloud computing and virtualization is self-evident. This section provides an overview of OpenStack image management by describing the Glance component. Glance provides a catalog and storage repository for virtual disk images and supports the storage and retrieval of virtual machine images. These disk images are widely used by OpenStack's Nova-compute component.

6.4.1 Introduction to OpenStack Image Service

As the image service for OpenStack's virtual machines, Glance provides a range of REST APIs to manage and query virtual machine images and supports a variety of back-end storage media: for example, the local file system or Swift can be used as the storage back end. Figure 6.9 shows Glance's relationship with Nova and Swift.

Fig. 6.9

The relationship between Glance, Nova, and Swift

As you can see, Glance connects these three OpenStack components into a whole: Glance provides image lookup for Nova, and Swift provides the actual storage service for Glance. Swift can be seen as a concrete implementation of Glance's storage interface.

6.4.2 Principles of OpenStack Image Service

The OpenStack image service consists of two main parts: the API Server and the Registry Server. Glance is designed to fit as many back-end storage and registration database scenarios as possible. The API Server (running the Glance-API program) acts as a communication hub: the various client programs, the image metadata registry, and the storage systems that hold the virtual machine image data all communicate through it. The API Server forwards the client's requests to the image metadata registry and the back-end storage. The Glance service uses these mechanisms to save virtual machine images.

Glance-API is primarily used to receive the various API call requests and perform the appropriate actions. Glance-registry is used to interact with the MySQL database to store or retrieve image metadata. Note that Swift does not save metadata in its storage service; here, metadata refers to the information about the images stored in the MySQL database, and it belongs to Glance.
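
As a minimal sketch of how a client talks to the image service, the example below uploads and lists images using the openstacksdk library. The cloud name and image file are placeholders, and the exact call names should be checked against the SDK version in use.

```python
import openstack

# Connect using credentials from clouds.yaml (cloud name is a placeholder).
conn = openstack.connect(cloud="mycloud")

# Upload a QCOW2 disk image; Glance stores the bits in its configured
# back end (local file system, Swift, etc.) and records the metadata.
image = conn.image.create_image(
    name="cirros-test",
    filename="cirros-0.6.2-x86_64-disk.img",
    disk_format="qcow2",
    container_format="bare",
)

# List the images visible to this project.
for img in conn.image.images():
    print(img.id, img.name, img.status)
```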

The operation of actually creating an instance is done by the Nova-compute component, which is inextricably linked to Glance. The process for creating an instance is shown in Fig. 6.10.

Fig. 6.10

The process of creating an instance

6.5 OpenStack Computing Management

Nova is the core component of OpenStack, responsible for maintaining and managing computing resources for cloud environments. OpenStack is the cloud operating system for IaaS, and virtual machine life cycle management is achieved through Nova.

6.5.1 Introduction to OpenStack Computing Service

Nova, at the heart of the OpenStack architecture, is the compute controller of the OpenStack cloud and provides large-scale, scalable, on-demand self-service computing resources. All activities in the life cycle of instances in the OpenStack cloud are handled by Nova, making Nova a scalable platform for managing computing resources.

In the first few versions of OpenStack, computing, storage, and networking were all implemented by Nova; storage and networking were gradually split out. Currently, Nova specializes in computing services and relies on Keystone's authentication service, Neutron's network service, and Glance's image service. The Nova architecture is shown in Fig. 6.11.

Fig. 6.11

Nova architecture

6.5.2 Principles of OpenStack Computing Services

The Nova architecture is complex and contains many components that can be divided into the following categories. These components run as subservices (background Daemon processes), and their operating architecture is shown in Fig. 6.12.

  1. 1.

    Nova-API

    Nova-API is the portal to the entire Nova component and is responsible for receiving and responding to clients' API calls. All requests to Nova are handled first by Nova-API. Nova-API exposes several HTTP REST APIs to the outside world. In Keystone, we can query the Endpoint of Nova-API; a client sends requests to the address specified by that Endpoint and asks Nova-API to perform operations. Of course, as end-users we do not send REST API requests directly: the OpenStack CLI, the Dashboard, and other components that need to exchange information with Nova use these APIs.

  2. 2.

    Nova-scheduler

    Nova-scheduler is the virtual machine scheduling service: it decides on which compute node to run a virtual machine. When an instance is created, the user specifies resource requirements, such as how much CPU, memory, and disk it needs. OpenStack defines these requirements in a type template called a Flavor, and the user only needs to specify which Flavor to use.

    Nova-scheduler is configured in /etc/nova/nova.conf, where the scheduler_driver parameter specifies which scheduler Nova-scheduler uses. Filter Scheduler is the default scheduler for Nova-scheduler, and its scheduling process is divided into two steps.

    • Select the eligible compute node (run Nova-compute) through the Filter.

    • Use Weighting to select to create instances on the optimal (most weighted) compute node.

      Nova allows the use of a third-party scheduler, configured via scheduler_driver; this once again reflects OpenStack's openness. The scheduler can apply multiple Filters in turn, and the nodes that pass filtering are then weighted to select the most suitable node.

  3. 3.

    Nova-compute

    Nova-compute is the core service for managing virtual machines running on compute nodes. The life cycle management of instances on the node is achieved by calling the Hypervisor API. OpenStack's operations on the instance are ultimately left to Nova-compute. Nova-compute works with Hypervisor to implement OpenStack's management of the instance life cycle.

    The Hypervisor is the virtualization manager running on the compute node and is the lowest-level program in virtual machine management. Different virtualization technologies provide their own Hypervisors; common ones include KVM, Xen, and VMware. Nova-compute defines a unified interface for these Hypervisors; a Hypervisor only needs to implement this interface in the form of a Driver to be plugged into the OpenStack system and used.

  4. 4.

    Nova-conductor

    Nova-compute often needs to update databases, such as updating and getting the status of virtual machines. For security and scalability reasons, Nova-compute does not access the database directly, but delegates this task to Nova-conductor. There are two benefits to this: greater system security and better system scalability.

  5. 5.

    Message Queue

    We have seen that Nova contains several subservices that need to coordinate and communicate with each other. To decouple the subservices, Nova uses the Message Queue as the information transit hub between them. So in Nova's operating architecture there is no direct connection between the subservices; they communicate through the Message Queue.

    Finally, let us look at how Nova's subservices work together from the virtual machine creation process to understand Nova's specific workflow. Figure 6.13 shows the Nova service process.

    1. (1)

      The customer (who can be an OpenStack end-user or some other program) sends a request to the API (Nova-API): “Help me create a virtual machine.”

    2. (2)

      After the API did some necessary processing of the request, it sent a message to Messaging (RabbitMQ): “Let Scheduler create a virtual machine.”

    3. (3)

      Scheduler gets the message sent to it by the API from Messaging and then executes a scheduling algorithm to select the compute node A from several compute nodes.

    4. (4)

      Scheduler sent Messaging a message: “Create this virtual machine on compute node A.”

    5. (5)

      Nova-compute node A gets the message sent to it by Scheduler from Messaging and starts the virtual machine on a hypervisor of the node.

    6. (6)

      During the creation of a virtual machine, Compute sends a message to Nova-conductor via Messaging if it needs to query or update database information, and Conductor is responsible for database access.

Fig. 6.12

Nova operating architecture

Fig. 6.13

Nova workflow

These are the core steps in creating a virtual machine. They show how the subservices in Nova collaborate and reflect the distributed design philosophy used throughout OpenStack, which is very helpful for understanding OpenStack in depth.
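
The workflow above is what happens behind a single "create server" request. Below is a minimal sketch of issuing such a request with the openstacksdk library; the flavor, image, and network names are placeholders assumed to already exist.

```python
import openstack

conn = openstack.connect(cloud="mycloud")  # credentials from clouds.yaml

# Look up the resources the request refers to (names are placeholders).
image = conn.compute.find_image("cirros-test")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("private")

# Nova-API receives this request, Nova-scheduler picks a compute node,
# and Nova-compute on that node asks its hypervisor to start the VM.
server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)

# Block until the instance reaches ACTIVE (raises on error/timeout).
server = conn.compute.wait_for_server(server)
print(server.name, server.status)
```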

6.6 OpenStack Storage Management

OpenStack offers various types of storage services, and users can choose freely based on their business needs. This section focuses on the block storage service Cinder in OpenStack and briefly describes the object storage service Swift.

6.6.1 Introduction to OpenStack Storage Service

OpenStack's storage services are important because multiple service components use them. Storage is divided into Ephemeral Storage and Persistent Storage, as shown in Fig. 6.14.

Fig. 6.14

Storage classification in OpenStack

If a virtual machine uses ephemeral storage, all data in the virtual machine instance is lost once it is shut down, restarted, or deleted. In an OpenStack project, after deploying the Nova-compute service component, users can use the nova boot command to create virtual machine instances; such instances use ephemeral storage by default, with no guarantee that the data will survive.

Persistent storage includes block storage, file system storage, and object storage. Regardless of whether the virtual machine instance is terminated, their data is continuously available, and the security is relatively high. The three types of storage are in the order of block → file → objects. Files are usually implemented based on blocks, and the underlying or back-end storage of object storage is usually implemented based on the local file system.

Block storage "communicates" with the host just like a hard disk directly attached to it. It is generally used as the host's direct storage space or for database applications (such as MySQL) and comes in the following two forms.

  • DAS: One server has one storage, and multiple machines cannot be shared directly. It requires the use of operating system functions, such as shared folders.

  • SAN: A high-cost storage method involving optical fiber and various high-end equipment, with high reliability and performance. The disadvantage is that the equipment is expensive and the operation and maintenance cost is high.

File system storage is different from lower level block storage. It has risen to the application layer, generally referring to NAS, which is a set of network storage devices, accessed through TCP/IP, and the typical protocol is NFS. Due to the network and the use of upper-layer protocols, the file system storage has a large overhead and the delay is higher than that of block storage. It is generally used for sharing data with multiple cloud servers, such as centralized server log management and office file sharing.

Object storage deals with self-developed applications (such as network disks). It has the characteristics of high-speed block storage and sharing of file system storage. It is more intelligent and has its CPU, memory, network, and disk, which is higher level than block storage and file system storage. Cloud service providers generally provide REST APIs for uploading, downloading, and reading user files to facilitate application integration of such services.

6.6.2 Principles of OpenStack Storage Service

  1. 1.

    Block storage

    Block storage, also known as volume storage, provides users with block-based storage device access, and user access and interaction to block storage devices are achieved by mapping block storage devices to running virtual machine instances that can be read/written, formatted, and so on. The block storage schema is shown in Fig. 6.15.

    Block storage is persistent: the data on the block storage device is unaffected when the mapping between the device and a virtual machine instance is removed, or when the device is remapped to another virtual machine instance. Block storage is provided by the Cinder component in the OpenStack project and currently supports several back-end storage types, depending on the drivers included.

    The Cinder component in the OpenStack project provides block storage devices for virtual machine instances, together with a comprehensive set of methods for managing storage devices, such as volume snapshots and volume types. Block storage types are determined by drivers or back-end device configurations such as NAS, NFS, SAN, iSCSI, and Ceph. The Cinder-API and Cinder-scheduler services of the Cinder component typically run on control nodes, while the Cinder-volume service can run on control nodes, compute nodes, or stand-alone storage nodes.

    Cinder-API exposes the REST API externally, parses operational requests, and calls the corresponding processing methods, such as Create/Delete/List/Show. Cinder-scheduler is responsible for collecting the capacity and capability information reported by the back ends, scheduling volumes to the specified Cinder-volume according to the configured algorithm, and filtering out the appropriate back end through filtering and weighting. Cinder-volume can be deployed on multiple nodes, each using its own configuration file to access different back-end devices; the Driver code supplied by each storage vendor interacts with the device to collect capacity and capability information, perform volume operations, and so on.

    Figure 6.16 shows Cinder's volume-creation process; the Cinder components involved consist of the following service processes (a short code sketch of this flow is given after the figure).

    • Cinder-API: Receives API requests and forwards them to Cinder-volume.

    • Cinder-volume: Interacts directly with the block storage, handles tasks such as those assigned by Cinder-scheduler, and exchanges those tasks through the message queue. It also maintains the state of the block storage by interacting with the various storage types through drivers.

    • Cinder-scheduler: Choose the best storage node to create a volume. (Nova-scheduler has similar functionality.)

    • Cinder-backup: Provides backups of any type of volume.

      Many storage drivers allow virtual machine instances to access the underlying storage directly, without layer-by-layer conversions that consume performance, thereby improving overall I/O performance. Cinder components also support using common file systems as block devices. With the NFS and GlusterFS file systems, a stand-alone file can be created and mapped to the virtual machine instance as a block device. This is similar to the way QEMU-based virtual machine instances are backed by files saved in the /var/lib/nova/instances directory.

  2. 2.

    Object storage

    The Swift component in the OpenStack project provides object data storage and retrieval through the REST API, and it must be used at least in conjunction with the Keystone component to make sure that Keystone is ready before deploying the Swift component. Swift components support multi-tenant use, low cost of investment, high scalability, and the ability to store large amounts of unstructured data.

    The Swift component includes the following sections.

    • Proxy Server: The proxy server is responsible for communication among the rest of the Swift architecture; it receives object storage API and HTTP requests, modifies metadata, and creates containers. For each client request, it queries the location of the account, container, or object in the ring and forwards the request accordingly. The public API is also exposed through the proxy server. It can also provide a list of files or containers in a graphical Web interface and uses Memcached to provide caching, improving performance.

    • Account Server: Account server that manages accounts within object storage.

    • Container Server: The container servers' primary task is to process lists of objects and manage the mapping between object stores and folders. The container server does not know where objects are stored; it knows only which objects are stored in a specified container. This object information is stored as an SQLite database file and is replicated across the cluster like an object. Container servers also track some statistics, such as the total number of objects and container usage.

    • Object Server: The object server manages the actual object data. It is a simple binary large-object storage server that can store, retrieve, and delete objects on a local device. Each object is stored using a path derived from the hash of the object name and an operation timestamp: the last write always wins, ensuring that the latest version of the object is served. A deletion is also treated as a version of the file (a file with a ".ts" extension, where ts stands for tombstone). This ensures that deleted files are replicated correctly and that earlier versions do not "magically reappear" after a failure.

    • All Periodic Processes: Performs day-to-day transactions in which replication services guarantee continuity and effectiveness of data, including auditing services, update services, and deletion services.

    • WSGI Middleware: Handle authentication-related issues and connect to Keystone components.

    • Swift Client: Allows various users with permissions to submit commands and takes action on the client.

    • Swift-Init: A script for initializing and managing the Swift services; it takes the daemon name and an action command as arguments.

    • Swift-Recon: A command-line interface (CLI) tool for retrieving various performance metrics and status information about a cluster.

    • Swift-Ring-Builder: A tool for creating and rebalancing rings.

      Swift object storage relies on its software logic to distribute data evenly, by default saving three replicas of each piece of data. Where the three replicas are stored has a significant impact on the cluster's overall performance; they may be saved on different hard drives in the same server or on different servers within the same rack. Figure 6.17 shows the Swift data model (a brief usage sketch follows the figure). Swift has a three-layer logical structure: Account/Container/Object. There is no limit to the number of nodes at each layer, and each layer can be extended at will.

      In a Swift object storage cluster, when a host node that stores data goes down, the load on the entire cluster becomes very high (because of replication, several times the normal amount of data must be transferred for rebalancing). In practice, technologies such as NIC bonding and solid-state drives should be used as much as possible to improve overall performance.

  3. 3.

    File system storage

    File system storage is a remote file system that can be mounted. It is shared: it can be used by multiple users by mounting it on their virtual machine instances, and it can be mounted and accessed by several users at the same time. A file storage system can perform a series of operations, such as creating a file system of a specified capacity, choosing the file-sharing protocol, distributing files across one or more servers, specifying access rules and security protocols, supporting snapshots, restoring a file system from a snapshot, viewing usage, and so on. In the OpenStack project, the code name of the file system storage program is Manila; it supports multiple back-end storage drivers and provides sharing through multiple storage protocols.

Fig. 6.15

Block storage architecture

Fig. 6.16

Cinder-volume creation process
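
Complementing Fig. 6.16, the following sketch creates a volume and attaches it to an existing instance using the openstacksdk library; the size and names are placeholders, and the calls should be checked against the SDK version in use.

```python
import openstack

conn = openstack.connect(cloud="mycloud")

# Cinder-API accepts the request, Cinder-scheduler picks a back end,
# and Cinder-volume creates the volume on it.
volume = conn.create_volume(size=10, name="data-vol", wait=True)  # 10 GiB

# Attach the volume to a running instance (name is a placeholder);
# inside the guest it appears as a new block device, e.g. /dev/vdb.
server = conn.get_server("demo-vm")
conn.attach_volume(server, volume, wait=True)
```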

Fig. 6.17

Swift data model
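
As a brief usage sketch of the Account/Container/Object model in Fig. 6.17 (the container and object names are placeholders, and the calls should be checked against the openstacksdk version in use):

```python
import openstack

conn = openstack.connect(cloud="mycloud")

# Create a container in this project's (account's) object store.
conn.object_store.create_container(name="logs")

# Upload an object; the proxy server locates it in the ring and the
# object servers store (by default) three replicas of the data.
conn.object_store.upload_object(
    container="logs",
    name="app/2024-01-01.log",
    data=b"hello swift\n",
)

# List the objects in the container.
for obj in conn.object_store.objects("logs"):
    print(obj.name)
```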

6.7 OpenStack Network Management

Like storage, the network is one of the most critical resources managed by OpenStack. Nova provides the abstraction of virtual machines in the OpenStack world, and Swift and Cinder provide virtual machines with a "safe haven" for their data, but without a network, any virtual machine would be just an "island" in that world, unable to deliver its value. Initially, network services in OpenStack were provided by a separate module in Nova, Nova-network; but in order to provide a richer topology, support more network types, and scale better, a dedicated component, Neutron, was created to replace the original Nova-network.

6.7.1 Basics of Linux Network Virtualization

Neutron's central task is to abstract and manage the Layer 2 physical network. In a traditional physical network, a set of physical servers may run a variety of applications, such as Web services and database services. In order to communicate with each other, each physical server has one or more physical network interface cards (NICs) that are connected to physical switching devices, such as switches, as shown in Fig. 6.18.

Fig. 6.18

Traditional Layer 2 switch

With the introduction of virtualization technology, multiple operating systems and applications shown in Fig. 6.18 can share the same physical server as virtual machines generated and managed by Hypervisor or VMM. The network structure shown in Fig. 6.18 has evolved into the virtual machine structure shown in Fig. 6.19.

Fig. 6.19

Virtual machine structure

A virtual machine's network capabilities are provided by virtual network interface cards (vNICs), and the Hypervisor can create one or more vNICs for each virtual machine. From the virtual machine's point of view, these vNICs are equivalent to NICs. In order to achieve the same network structure as a traditional physical network, the switching devices are also virtualized, as virtual switches; each vNIC is connected to a port of a virtual switch, and these virtual switches then reach the external physical network through the physical server's NIC.

Thus, building a virtual Layer 2 network structure mainly requires virtualizing two kinds of network devices: the NIC hardware and the switching devices. Virtualization of network devices in Linux environments takes several forms, and Neutron builds users' private virtual networks on top of these technologies.

  1. 1.

    TAP, TUN, and VETH

    TAP and TUN are virtual network devices implemented by the Linux kernel, with TAP working at Layer 2 and TUN working at Layer 3. The Linux kernel sends data through a TAP or TUN device to the user-space program bound to the device. Conversely, a user-space program can send data through a TAP/TUN device just as it would with a hardware network device.

    Based on the TAP driver, vNIC functionality can be implemented: each vNIC of a virtual machine is connected to a TAP device in the Hypervisor. When a TAP device is created, a corresponding character device file is generated in the Linux device file directory, and a user program can open that file and read/write it as if it were an ordinary file.

    When a user program writes to this TAP device file, it is, from the point of view of the Linux network subsystem, as if the TAP device had received data and asked the kernel to accept it; the kernel then processes the data according to its network configuration, much as it handles data received from the outside world by an ordinary physical NIC. When the user program performs a read operation, it is in effect asking the kernel whether there is data on the TAP device waiting to be sent; if so, the data is handed to the user program, completing the function of sending data out through the TAP device. In this process, the TAP device can be regarded as a native network card, and the application operating the TAP device is equivalent to another computer communicating with the local machine through read/write system calls. A Subnet, in turn, is a Layer 3 concept: it specifies an IPv4 or IPv6 address range and its associated configuration information, is attached to a Layer 2 network, and indicates the range of IP addresses that can be used by the virtual machines belonging to that network.

    VETH devices always appear in pairs: data sent into one end is always received at the other end. Once created and configured correctly, data written to one end is redirected by VETH and injected into the kernel network subsystem, from which it can be read at the other end.

  2. 2.

    Linux Bridge

    The Linux Bridge (see Fig. 6.20) is a Layer 2 virtual network device that functions like a physical switch.

    A bridge can bind other Linux network devices as slave devices and virtualizes those slaves as its ports. When a slave device is bound to a bridge, it is as if a network cable connected to a terminal had been plugged into a port of a real switch. In Fig. 6.20, the bridge device br0 binds the actual device eth0 and the virtual devices Tap0/Tap1; at this point, the upper layers of the Hypervisor's network protocol stack see only br0 and do not care about the details of the bridge. When packets are received from these slave devices, they are submitted to br0, which decides where to forward them based on the mapping between MAC addresses and ports.

    Because the bridge works at Layer 2, the slave devices eth0, Tap0, and Tap1 bound to br0 do not need IP addresses of their own. For the upper-layer routers, they are all on the same subnet, so an IP address is configured on br0 instead (the bridge device works at Layer 2, but it is only an abstraction of a Linux network device, so it can be given an IP address), for example an address in the 10.0.1.0/24 segment. At this point, eth0, Tap0, and Tap1 are all reachable within the 10.0.1.0/24 segment through br0.

    Because it has its own IP address, br0 can be added to the routing table and used to send data, while the actual sending is performed by a slave device. If eth0 had its own IP address, such as 192.168.1.1, that address would become ineffective after eth0 is bound to br0, and user programs would no longer receive data sent to it: Linux accepts only packets whose destination address is br0's IP. (A command-level sketch of this setup is given after Fig. 6.20.)

  3. 3.

    OVS

    OVS is a production-quality virtual switch developed in the C language with portability between different virtualization platforms in mind. It is licensed under Apache 2.0 and is therefore very friendly to commercial use.

    As mentioned earlier, the virtualization of switching devices is a critical part of virtual networks, and virtual switches are responsible for connecting physical and virtual networks. Although Linux bridges are already well placed to perform this role, we also need OVS to do the extra functionality.

    In the traditional data center, the network administrator can control the physical machine's network access by configuring the ports of the switch and completing a series of work such as network isolation, traffic monitoring, packet analysis, and QoS configuration traffic optimization. However, in a cloud environment, network administrators cannot distinguish which virtual machine, operating system, and user the “flowing” packets on the bridged physical network card belong to with the support of a physical switch alone. The introduction of OVS makes it easy to manage virtual networks in cloud environments and monitor network status and traffic.

    For example, just as we would configure a physical switch, we can assign each virtual machine that attaches to OVS (which likewise creates one or more virtual switches on the physical server for the virtual machines to attach to) to a different VLAN to isolate the networks. We can also configure QoS for virtual machines on OVS ports. OVS also supports many standard management interfaces and protocols, including NetFlow and sFlow, through which we can perform traffic monitoring and other work. (A short configuration sketch is given after Fig. 6.21.)

    In addition, OVS provides support for OpenFlow, which Open Flow Controller can manage, as shown in Fig. 6.21. In summary, OVS implements distributed virtual switches on a variety of virtualization platforms in a cloud environment, such as Xen and KVM. Distributed virtual switches are virtual network management methods that manage virtual switches (software-based virtual switches or smart network card virtual switches) on multiple hosts, including hosts' physical and virtual port management.

    To understand distributed virtual switches, consider a chassis switch. A chassis switch has a main control board and interface boards: the main control board is responsible for management-plane work, and the interface boards are responsible for data-plane work. There can be multiple interface boards distributed across multiple slots, and they can even be stacked to extend the interfaces further. In an OVS network, the control plane is the controller's responsibility, while the interface-board work is handed over to the virtual switches; these virtual switches are distributed across many servers in the network and implemented purely in software, hence the concept of a distributed virtual switch. A distributed virtual switch acts as a virtual device spanning hosts, connecting hosts and virtual machines as if the entire network were using one single large virtual switch.

Fig. 6.20

Linux network bridge
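
To make the structure in Fig. 6.20 concrete, the sketch below creates a bridge and a TAP device using the standard iproute2 commands (run as root; the device names and addresses are arbitrary examples). It is wrapped in Python purely for illustration.

```python
import subprocess

def sh(cmd):
    """Run a shell command and fail loudly if it errors."""
    subprocess.run(cmd, shell=True, check=True)

# Create the bridge br0 (the software equivalent of a Layer 2 switch).
sh("ip link add name br0 type bridge")
sh("ip link set br0 up")

# Create a TAP device such as a hypervisor would back a vNIC with.
sh("ip tuntap add dev tap0 mode tap")
sh("ip link set tap0 up")

# "Plug" the physical NIC and the TAP device into the bridge.
sh("ip link set eth0 master br0")
sh("ip link set tap0 master br0")

# Give the bridge itself an address so the host remains reachable.
sh("ip addr add 10.0.1.1/24 dev br0")
```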

Fig. 6.21

OVS
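
A comparable sketch using Open vSwitch's ovs-vsctl tool (again wrapped in Python only for illustration; the bridge name, ports, VLAN tags, and controller address are examples):

```python
import subprocess

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

# Create an OVS bridge and plug in the physical uplink.
sh("ovs-vsctl add-br br-int")
sh("ovs-vsctl add-port br-int eth0")

# Attach two VM TAP devices and isolate them in different VLANs,
# just as ports on a physical switch would be configured.
sh("ovs-vsctl add-port br-int tap0 tag=101")
sh("ovs-vsctl add-port br-int tap1 tag=102")

# Optionally point the bridge at an external OpenFlow controller.
sh("ovs-vsctl set-controller br-int tcp:192.168.0.10:6633")
```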

6.7.2 Introduction and Architecture of OpenStack Network Services

Unlike Nova and Swift, Neutron has only one major service process, Neutron-server. Neutron-server runs on the network control node and provides the REST API as the entry point for accessing Neutron; the user HTTP requests received by Neutron-server are ultimately carried out by the various agents spread across the compute and network nodes.

Neutron offers several API resources that correspond to the various Neutron network abstractions, of which the L2 abstractions Network/Subnet/Port can be considered core resources, while the other abstractions, including routers and numerous high-level services, are extended resources.

To make scaling easier, Neutron organizes code in a Plugin way, each of which supports a set of API resources and completes specific operations that are ultimately performed by Plugin calling the appropriate agent via RPC.

These Plugins are divided into two groups. The Plugins that provide support for the underlying Layer 2 virtual network are called Core Plugins; they must implement at least the three major L2 abstractions, and administrators need to choose one of the available Core Plugin implementations. Plugins other than the Core Plugin are referred to as Service Plugins, such as the firewall plugin that provides firewall services.

As for the L3 abstraction Router, many Core Plugins did not implement it; before the Havana (H) release it was provided as a mixin that incorporated standard router functionality into the Core Plugin to offer L3 services to tenants. In the Havana release, Neutron implemented the L3 Router Service Plugin to provide router services.

Agents are typically the parts that use physical network devices or virtualization technologies to carry out the actual work. For example, the L3 Agent implements the specific routing operations.

The Neutron architecture is shown in Fig. 6.22.

Fig. 6.22

Neutron architecture

Because there is a lot of duplicated code among the various Core Plugin implementations, such as database access operations, Neutron implemented the ML2 Core Plugin in the Havana release. The ML2 Core Plugin has a more flexible structure and supports the existing Core Plugins in the form of Drivers. Therefore, it can be said that the ML2 Core Plugin appeared in order to replace all of the current Core Plugins.

As for the ML2 Core Plugin and the various Service Plugins, although they may be split out of Neutron as stand-alone projects, the way they are implemented will not change much compared with what is described in this chapter.

6.7.3 OpenStack Network Service Principle and Process

The Neutron network architecture consists of Neutron-API (running in Neutron-server) along with Neutron agents, including Linux Bridge Agent or Open vSwitch Agent, DHCP Agent, MetaData Agent, and L3 Agent.

To make Neutron easier to understand, we use the Linux Bridge Agent (abbreviated as Neutron-LB Agent in the figure) as an example to illustrate the collaboration between Neutron-API and the agents, as shown in Fig. 6.23.

Fig. 6.23

Schematic diagram of the collaboration relationship between Neutron-API and each agent

Some of the main components and functions are described below.

  1. (1)

    Neutron-API

    Neutron-API primarily receives network operation instructions from the cloud system and then, through the Neutron-LB Agent, operates the Linux bridge plug-in on the network and compute nodes to create specific interfaces, bridges, and VLANs. When a virtual machine is created and its network operating environment needs to be prepared, Neutron-API receives the network requirements from Nova-API and passes the corresponding network creation requests on to the Neutron-LB Agents on the compute and network nodes; the Agents then operate the underlying Linux bridge plug-in to complete the specific network configuration. (A resource-creation sketch is given after this list.)

  2. (2)

    DHCP Agent

    Simply put, the DHCP Agent component is used to complete the task of assigning IP addresses to virtual machines, setting up gateways, and also referring requests for metadata from virtual machines to MetaData Agent.

  3. (3)

    MetaData Agent

    The MetaData Agent component's primary purpose is to further forward a virtual machine's request for metadata (such as virtual machine name, ID, key, IP address) to the metadata service in the Nova-API server when the virtual machine is created or started, providing the required information.

  4. (4)

    L3 Agent

    The primary role of the L3 Agent component is to provide IP routing and NAT services to the cloud system user network. In the tenant's internal network, IP routing for different segments is done through the L3 Agent service, while external access to the tenant's internal network is done through the NAT service provided by L3 Agent.
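
Bringing these pieces together, the sketch below creates the core Neutron abstractions (network, subnet, router) and a security group rule with the openstacksdk library; the names and CIDR are placeholders, and an external network named "public" is assumed to exist.

```python
import openstack

conn = openstack.connect(cloud="mycloud")

# A tenant (Layer 2) network and an IPv4 subnet on it.
net = conn.network.create_network(name="demo-net")
subnet = conn.network.create_subnet(
    network_id=net.id,
    name="demo-subnet",
    ip_version=4,
    cidr="10.0.1.0/24",
)

# A router handled by the L3 Agent: one interface on the tenant subnet,
# the gateway on a pre-existing external network (assumed name).
ext = conn.network.find_network("public")
router = conn.network.create_router(
    name="demo-router",
    external_gateway_info={"network_id": ext.id},
)
conn.network.add_interface_to_router(router, subnet_id=subnet.id)

# Allow inbound SSH through the project's default security group.
sg = conn.network.find_security_group("default")
conn.network.create_security_group_rule(
    security_group_id=sg.id,
    direction="ingress",
    protocol="tcp",
    port_range_min=22,
    port_range_max=22,
    remote_ip_prefix="0.0.0.0/0",
)
```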

6.7.4 Analysis of Typical Scenarios of OpenStack Network Services

  1. 1.

    Load Balance as a Service

    Load Balance as a Service (LBaaS) is an advanced network service. As the name implies, it lets users dynamically create load balancing devices on their own networks. A load balancer is a relatively basic component of a distributed system: it receives requests sent by the front end and forwards them, according to some balancing strategy, to a processing unit in the back-end resource pool, which in turn enables high availability and horizontal scaling.

    OpenStack Neutron supports LBaaS in the form of an advanced service extension, which is currently implemented by default with the HAProxy software.

  2. 2.

    FireWall as a Service

    Readers familiar with firewalls know that a firewall is typically placed on the gateway to isolate access between subnets. Therefore, Firewall as a Service (FWaaS) is also implemented on the network node, specifically inside the router namespace.

    Currently, implementing a firewall in OpenStack is based on the iptables that come with the Linux operating system, so don't expect too much of its performance and functionality.

    One concept that is easily confused with this is security groups. A security group's object is the virtual network card, and it is implemented by the L2 Agent. For example, neutron_openvswitch_agent and neutron_linuxbridge_agent restrict access to virtual network cards by configuring iptables rules on the compute nodes. A firewall can isolate malicious external traffic before it reaches a security group, but it cannot filter communication between virtual network cards within the same subnet (unless that traffic crosses subnets). Firewalls and security groups can be deployed at the same time for dual protection.

  3. 3.

    Distributed Virtual Router

    OpenStack users may find that, as Neutron was originally designed, all network services are performed on the network node, which means a great deal of traffic and processing and puts much pressure on that node. At the heart of this processing is the router service: any cross-subnet access requires a router. Naturally, one wonders: can router services also run on compute nodes? That design idea is clearly more reasonable, but implementing it requires working out many technical details.

    To reduce the load on the network node while increasing scalability, OpenStack has officially offered the Distributed Virtual Router (DVR) feature since the Juno release (users can choose whether or not to use it), allowing compute nodes to handle large amounts of east–west traffic themselves, as well as the north–south traffic that does not require Source Network Address Translation (SNAT).

    In this way, the network node only needs to handle a portion of the SNAT traffic, greatly reducing the load and the system's dependence on the network node. Naturally, FWaaS can also be placed together on a compute node. DHCP services and VPN services still need to be centralized on network nodes.

6.8 OpenStack Orchestration Management

The growing popularity of cloud computing has led to the proliferation of cloud computing platforms. Which of them will eventually be accepted by the industry and by users depends on which can effectively support the orchestration of complex user applications. Heat's full support for orchestration strongly reinforces OpenStack's leading position in cloud computing, particularly in IaaS. This section describes OpenStack orchestration management: what orchestration is, Heat's place in orchestration, Heat templates, and how Heat and its templates support orchestration from the perspectives of infrastructure, software configuration and deployment, automatic resource scaling, and load balancing; finally, it covers the integration of Heat with configuration management tools and with IBM UCDP/UCD.

6.8.1 Introduction to OpenStack Orchestration Service

Heat is a template-based service for orchestrating composite cloud applications. It currently supports Amazon's CloudFormation template format, as well as Heat's own HOT template format. The use of templates simplifies the definition and deployment of complex infrastructure, services, and applications. Templates support rich resource types. The relationship between Heat and other modules is shown in Fig. 6.24.

Fig. 6.24

The relationship between Heat and other modules

Heat currently supports templates in two formats: JSON-based CFN templates and YAML-based HOT templates. CFN templates are primarily designed to maintain compatibility with AWS. HOT templates are Heat's own format; their resource types are richer and better reflect Heat's characteristics.

A typical HOT template consists of the following elements.

  • Template version: Required; specifies the template version, against which Heat validates the template.

  • List of parameters: Optional, refers to the list of input parameters.

  • List of resources: Required; refers to the various resources contained in the resulting stack. Dependencies between resources can be defined, such as building a port first and then using that port to build a virtual machine.

  • Output list: Optional, refers to the information exposed by the resulting stack that can be used by the user or provided to other Stacks as input.
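To make these elements concrete, the following is a minimal sketch of a HOT template; the resource names are illustrative and the image, flavor, and network values are supplied as parameters at stack-creation time. It builds a port, builds a server that uses the port, and exposes the server's IP address as an output.

heat_template_version: 2016-04-08

description: Minimal example - a port and a server that uses it

parameters:
  image:
    type: string
  flavor:
    type: string
  network:
    type: string

resources:
  server_port:
    type: OS::Neutron::Port
    properties:
      network: { get_param: network }

  server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: { get_param: flavor }
      networks:
        - port: { get_resource: server_port }

outputs:
  server_ip:
    description: First IP address of the server
    value: { get_attr: [server, first_address] }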

6.8.2 OpenStack Orchestration Service Architecture

Heat contains the following important components.

  1. (1)

    The Heat-API component implements OpenStack's native REST API. It processes API requests by forwarding them via AMQP to Heat-Engine.

  2. (2)

    The Heat-API-CFN component provides an API that is compatible with AWS CloudFormation and forwards API requests to Heat-Engine via AMQP.

  3. (3)

    The Heat-Engine component provides Heat's core capability: orchestrating the resources described in templates.

The Heat architecture is shown in Fig. 6.25.

Fig. 6.25
figure 25

Heat architecture

The user submits a request containing a template and parameters through Horizon or the CLI, which converts the request into a REST API call to Heat-API or Heat-API-CFN. Heat-API and Heat-API-CFN verify the template's correctness and then pass the request asynchronously to Heat-Engine via AMQP.

When Heat-Engine receives a request, it resolves the request into various resources, each of which corresponds to the client of another OpenStack service; Heat-Engine then calls that service by sending REST requests. Through this parsing and coordination, the processing of the request is completed.

The Heat-Engine structure (see Fig. 6.26) is organized in three layers; Layer 1 handles Heat-level requests, creating a stack from the template and parameters, where a stack consists of a variety of resources.

Fig. 6.26
figure 26

Heat-Engine structure

6.8.3 Principles of OpenStack Orchestration Service

From the very beginning, OpenStack has provided a CLI and Horizon for users to manage resources. However, typing commands line by line or clicking through the browser is time-consuming and laborious. Even if the command lines are saved as a script, additional scripts must be written to handle input/output and interdependencies, which is hard to maintain and not easy to extend. Writing programs directly against the REST API introduces additional complexity and is likewise hard to maintain and extend. None of this is conducive to using OpenStack for large-scale management, and even less to orchestrating OpenStack resources to support IT applications.

Heat came into being in this situation. Heat uses templates, which are popular in the industry, to design and define orchestrations. A user opens a text editor, writes a template made of key-value pairs, and easily obtains the desired orchestration. To make this easier, Heat provides a number of template examples; most of the time, the user only needs to pick the desired arrangement and complete the template by copying and pasting. Heat supports orchestration in the following four ways.

  1. 1.

    Heat's orchestration of the infrastructure

    OpenStack provides its own infrastructure resources, including computing, networking, and storage. By orchestrating these resources, users can obtain the basic virtual machines they need. It is worth mentioning that users can supply simple scripts so that some simple configuration is applied to the virtual machines during orchestration.

  2. 2.

    Heat's orchestration of software configuration and deployment

    Users can apply complex configurations to virtual machines, such as installing and configuring software, through resources provided by Heat such as Software Configuration and Software Deployment.

  3. 3.

    Heat's orchestration of automatic scaling of resources

    If users have more advanced requirements, such as a group of virtual machines that scales automatically based on load, or a set of load-balanced virtual machines, Heat provides support such as AutoScaling and LoadBalancer. Heat's support for complex applications such as AutoScaling and LoadBalancer is well established, and a wide variety of templates are available for reference.

  4. 4.

    Heat's orchestration of complex applications

    If the user's application is complex, or if parts of it are already deployed with popular configuration management tools, for example as cookbooks based on Chef, these cookbooks can be reused by integrating Chef, saving a great deal of development or migration time.

6.8.4 OpenStack Orchestration Service and Configuration Management Tool Integration

With the popularity of DevOps, many configuration management tools have emerged, such as Chef, Puppet, and Ansible. In addition to providing a platform framework, these tools also provide scripts that can be flexibly configured and referenced for deploying a large number of middleware and software packages. Take Chef as an example: it provides a large number of cookbooks for open source software, and major vendors have also written cookbooks for their middleware; IBM, for example, provides a cookbook for DB2. With these cookbooks, users can deploy complex middleware or software through simple configuration and references.

Heat supports these configuration management tools through a collaborative process based on OS::Heat::SoftwareConfig and OS::Heat::SoftwareDeployment. First, in OS::Heat::SoftwareConfig, the group needs to be set to the corresponding type, such as ansible, puppet, docker-compose, or salt. Then OS::Heat::SoftwareDeployment references the OS::Heat::SoftwareConfig resource. In this way, when the software is deployed, the corresponding hook script (for example, heat-config-ansible) is called to perform the software configuration.
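As an illustration, the following is a hedged HOT fragment in which a SoftwareConfig with group set to ansible is referenced by a SoftwareDeployment; the resource names, the playbook content, and the server resource it targets (my_server) are hypothetical and assume the heat-config-ansible hook is installed in the image.

resources:
  install_nginx_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: ansible
      config: |
        ---
        - hosts: localhost
          tasks:
            - name: Install nginx
              package:
                name: nginx
                state: present

  install_nginx_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: { get_resource: install_nginx_config }
      server: { get_resource: my_server }   # an OS::Nova::Server defined elsewhere in the template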

The integration of Heat and IBM UCDP/UCD is shown in Fig. 6.27. With the gradual rise of cloud computing, various cloud computing-based orchestration tools have begun to appear. From the current point of view, these tools mainly have the characteristics of cross-platform, visualization, and powerful configuration management functions. Among them, IBM's UrbanCode Deploy with Patterns (UCDP) and UrbanCode Deploy (UCD) are powerful platforms with the features as mentioned above.

Fig. 6.27
figure 27

Integration of Heat and IBM UCDP/UCD

UCDP is a full-stack environment management and deployment solution that supports users in designing, deploying, and updating full-stack environments across multiple clouds. The platform can integrate UCD and, based on Heat, automate the management of OpenStack infrastructure and optimize continuous-delivery throughput. It has a visual interface in which cross-cloud platform templates can be created and edited by dragging icons.

UCD orchestrates application, middleware-configuration, and database changes and automatically deploys them to development, test, and production environments. It allows users to deploy on demand or on a schedule through self-service. In UCD, a complex application can be split either by configuration only (Configuration-Only) or by traditional code plus configuration (Code-and-Configuration) and defined step by step, as shown in Fig. 6.27.

With the help of UCDP's powerful pattern-design capabilities, a complex template can be built by dragging. Two types of resources are used: cloud computing resources, such as networks, security groups, and images; and components defined in UCD, such as jke.db, MySQL Server, jke.war, and WebSphere Liberty Profile.

6.9 OpenStack Fault Management

OpenStack is a complex software suite, and both beginners and experienced system administrators face quite a few problems to solve. Although there is no single troubleshooting method, understanding the important information in OpenStack logs and mastering the tools that help track down errors will help solve the problems that may be encountered. However, it cannot be expected that every problem can be solved without external support. Therefore, it is important to collect the information that helps the OpenStack community identify errors and propose corrections, so that bugs or problems can be dealt with quickly and effectively.

6.9.1 OpenStack Troubleshooting

When OpenStack fails, the following methods can be used to diagnose and deal with the fault.

  1. 1.

    Check OpenStack service

    OpenStack uses some basic commands to communicate with the compute and other services; by viewing the running status of these services and combining some general system commands, you can check whether the environment is running as expected. To check whether the compute services are normal, execute the following command:

    sudo nova-manage service list
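    On newer releases, the same information can also be obtained with the Nova client or the unified OpenStack client (assuming they are installed):

    nova service-list
    openstack compute service list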

  2. 2.

    Understand logs

    Logs are critical to all computer systems. The more complex the system, the more it relies on logs to find problems, reducing troubleshooting time. Understanding the OpenStack system log is very important to ensure the health of the OpenStack environment. OpenStack generates a lot of log information to help troubleshoot OpenStack installation problems. The log locations of these services will be described in detail below.

    1. (1)

      OpenStack computing service log

      OpenStack compute service logs are located in /var/log/nova/, and the default owner is the nova user. To read all the information, log in as the root user. Note that not every log file exists on every server; for example, the nova-compute log is generated only on compute nodes.

    2. (2)

      OpenStack object storage log

      OpenStack object storage logs are written to syslog by default. On Ubuntu they can be viewed in /var/log/syslog; on other operating systems they may be located in /var/log/messages. The logs generated by the OpenStack block storage service are placed under /var/log/cinder by default.

    3. (3)

      OpenStack network service log

      The OpenStack network service Neutron, formerly known as Quantum, saves its logs in /var/log/quantum/*.log (or /var/log/neutron/*.log in later releases), with a separate log file for each service.

  3. 3.

    Change log level

    The default log level of each OpenStack service is WARNING. Logs at this level are sufficient for understanding the running system's status or for basic error location, but sometimes it is necessary to make logging more verbose (for example, DEBUG) to help diagnose problems, or less verbose to reduce log noise.

    Since each service's log setting methods are similar, here, we take the OpenStack computing service as an example to set the log level in the OpenStack computing service.

    Log in to the machine running the OpenStack computing service and execute the following command:

    sudo vim /etc/nova/logging.conf

    Modify the log level of a listed service to DEBUG, INFO, or WARNING, as shown below:

    [logger_root]
    level = WARNING
    handlers = null

    [logger_nova]
    level = INFO
    handlers = stderr
    qualname = nova
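    For the new level to take effect, the affected services need to be restarted; a hedged example on an Ubuntu system of this era:

    sudo service nova-api restart
    sudo service nova-scheduler restart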

6.9.2 OpenStack Troubleshooting Tools

OpenStack provides tools to check whether the different components of a service are operating normally. This basic level of troubleshooting helps ensure that the system is operating as expected. Commonly used troubleshooting tools are as follows:

  1. 1.

    Common network troubleshooting tools

    • Use ip a to check the status of network interfaces: On a compute node or a node running Nova-Network, use the ip a (short for ip address) command to view network card information, including the IP address, VLAN, and whether the card is up.

    • Find faults in the network path: Use the ping command to quickly locate faults in the network path. In a virtual machine instance, first check whether an external host can be pinged successfully; if so, there is no network problem. If not, try to ping the IP address of the compute node where the instance is located. If that succeeds, the problem lies beyond the compute node; if it fails, the problem is between the instance and the compute node. Tools such as tcpdump and iptables are powerful network troubleshooting aids that can help locate faults quickly (see the example commands after this list).

    • Use Nova-Network to troubleshoot DHCP: A common network problem is that a virtual machine instance boots successfully but is unreachable because it cannot obtain an IP address from dnsmasq, the DHCP service started by the Nova-Network service. The easiest way to check this kind of problem is to look at the instance's console output; if DHCP fails, you can retrieve the console log with the nova console-log <instance name or uuid> command.

    • DNS troubleshooting: If you can log in to a virtual machine instance with SSH but it takes a long time for the prompt to appear, there may be a DNS problem. A quick way to check whether DNS is working properly is to use the host command inside the instance to resolve a host name.
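    A hedged example of the lower-level tools mentioned above, run on a compute node (the bridge name br-int assumes an Open vSwitch deployment):

      # Watch DHCP requests and replies on the integration bridge
      sudo tcpdump -i br-int -n port 67 or port 68
      # List iptables rules and look for the chains Neutron programs for security groups
      sudo iptables -L -n -v | grep neutron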

  2. 2.

    Commonly used computing and storage troubleshooting tools

    • Compute node failure and maintenance: Sometimes a compute node goes down unexpectedly or needs to be restarted for maintenance. Before restarting, make sure that all virtual machine instances on the node have been migrated away, which can be done with the nova live-migration command (a sketch follows this list). Other faults may also occur on compute nodes, and users need to use the commands provided by the Nova service to gradually locate the fault according to the specific situation. Commonly used commands include lsb_release -a, uname -a, etc.

    • Storage node failure and maintenance: Because object storage is highly redundant, dealing with object storage node problems is much simpler than dealing with compute node problems. If a storage node fails, you can try restarting it directly; alternatively, shutting down the problematic node or replacing the Swift disk also solves storage node problems. Commonly used storage fault-handling commands include df -h, free -m, kpartx, etc.
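    A hedged sketch of the maintenance flow described above; the instance ID and target host are placeholders:

      # Move a running instance off the node before maintenance
      nova live-migration <instance-uuid> <target-host>
      # Basic host checks before and after the node comes back
      df -h
      free -m
      uname -a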

  3. 3.

    View services and logs

    • Reading logs: OpenStack service logs have different levels, and a message appears in the log only if it is at least as serious as the configured level. The DEBUG level records messages of all levels, and specific tracking of the system state can be achieved by adjusting the log level or adding custom log statements.

    • Tracking instance requests: When a virtual machine instance misbehaves, it needs to be tracked through the logs of the various nova-* services, which are distributed across cloud controllers and compute nodes. The general method is to search these service logs for the UUID of the instance.

    • Centralized log management: A cloud system contains many servers, and sometimes the logs of several of them must be examined to piece together an event. A better approach is to send all service logs to a central place. Ubuntu uses rsyslog as the default log service, which can send logs to a remote host; by modifying its configuration file, log aggregation can easily be achieved (see the sketch after this list).

    • Monitoring: Monitoring ensures that all services are operational and monitors the use of resources over time to determine potential bottlenecks and upgrade requirements. Nagios is an open source monitoring service that can check the status of servers and network services by executing arbitrary commands.
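    A hedged sketch of the rsyslog forwarding mentioned above; the log server name is hypothetical, and the line would go in a file such as /etc/rsyslog.d/99-remote.conf on each OpenStack node:

      # '@' forwards via UDP; use '@@' for TCP
      *.* @log.example.com:514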

6.9.3 OpenStack Troubleshooting Cases

This section will introduce troubleshooting from three cases of computing services, identity authentication, and networks.

  1. 1.

    Computing service failure

    OpenStack compute services are very complex, and timely fault diagnosis keeps them running smoothly. Fortunately, OpenStack provides some tools to help with this, and Ubuntu also provides tools that help locate problems.

    Although troubleshooting OpenStack compute services is complex, working through problems in an orderly manner helps users reach a satisfactory answer. When you encounter the corresponding problems, you can try the following solutions.

    Case 1: Unable to ping or SSH to the instance

    • When starting an instance, specify a security group; if none is specified, the default security group is used. This mandatory security group ensures that a security policy is enabled in the cloud environment by default. For this reason, rules that allow pinging and SSH-ing into the instance, basic as those activities are, must be explicitly added, usually to the default security group (a command example is given after this list).

    • Network problems may also prevent users from accessing instances in the cloud. First, check whether the compute node can forward packets from the public interface to the bridge interface. The command is as follows:

      sysctl -A | grep ip_forward

    • net.ipv4.ip_forward should be set to 1. If it is not, check whether the following option is commented out in /etc/sysctl.conf:

      net.ipv4.ip_forward=1

    • Run the following command to perform the update:

      sudo sysctl -p

    • Network issues may also involve routing issues. Check that the client communicates properly with the OpenStack compute node and that any routing records to these instances are correct.

    • In addition, IPv6 conflicts may be encountered. If you do not need IPv6, you can add use_ipv6=false to the /etc/nova/nova.conf file and restart the nova-compute and nova-network services.

    • If OpenStack Neutron is used, check the status of the Neutron services on the host and whether the correct IP namespaces are used.

    • Restart the host.
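    Relating to the first point above, a hedged example of adding ICMP and SSH rules to the default security group using the legacy nova CLI of this era (the unified openstack client offers equivalent commands):

      # Allow ICMP (ping) and SSH from anywhere in the default security group
      nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
      nova secgroup-add-rule default tcp 22 22 0.0.0.0/0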

      Case 2: Error codes such as 40x and 50x appear

      The main OpenStack services are essentially Web services, which means that service responses are clearly defined.

    • 40x: A client-side error in the request that was made. For example, 401 is an authentication failure, which requires checking the credentials used to access the service.

    • 50x: A server-side error, meaning that a linked service is unreachable or that an internal error caused the service to interrupt its response. Usually this type of problem means the underlying service has not started properly, so check its health.

      If none of these attempts solves the problem, you can turn to the community, where many enthusiastic people can help.

  2. 2.

    Authentication failure

    OpenStack's authentication service, Keystone, is a complex service responsible for authentication and authorization across the entire system. Common problems include endpoint misconfiguration, incorrect parameters, and general user authentication issues such as resetting passwords or providing more detailed information to users. Because Keystone troubleshooting requires administrator privileges, first configure the environment so that Keystone-related commands can be executed. When you encounter a problem, you can refer to and follow these steps.

    Case: Authentication issues

    Users have been experiencing a variety of authentication issues, including forgotten passwords, account expirations, and unpredictable authentication failures. Locating these issues allows users to regain access and continue using the OpenStack environment.

    The first thing to look at is the relevant logs, including the /var/log/nova, /var/log/glance (if image-related), and /var/log/keystone logs.

    Account-related issues may include a missing account. Therefore, first list the users with the following command: keystone user-list. If the account exists in the user list, review the user's details further; for example, after obtaining a user's ID, you can use the following command:

    keystone user-get 68ba544e500c40668435aa6201e557e4

    The result information returned is shown in Fig. 6.28.

    This helps you understand if the user has a valid account. If you need to reset a user's password, you can reset the user's password (e.g., by setting it to openstack) using the following command:

    keystone user-password-update \
      --pass openstack \
      68ba544e500c40668435aa6201e557e4

    If your account is deactivated, you can simply re-enable it using the following command:

    keystone user-update --enabled true 68ba544e500c40668435aa6201e557e4
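    On newer releases that use the unified openstack client instead of the keystone CLI, roughly equivalent commands are:

    openstack user list
    openstack user show 68ba544e500c40668435aa6201e557e4
    openstack user set --password openstack 68ba544e500c40668435aa6201e557e4
    openstack user set --enable 68ba544e500c40668435aa6201e557e4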

    Sometimes there is nothing wrong with the account and the problem lies on the client side. Therefore, before digging into authentication issues, make sure the user is operating in the right environment.

  3. 3.

    Network failure

    With the introduction of Neutron, the OpenStack network service became complex, because it allows users to define and create their own networks in the cloud environment. Common problems for network administrators include misconfiguration during Neutron installation, routing failures, and virtual switch plug-in issues; common problems for users include misunderstanding Neutron's functionality and the restrictions set by administrators.

    Troubleshooting a Neutron installation requires administrator privileges, so first make sure that you are logged on to the control, compute, and network nodes as root and that your environment is configured to run the various commands. When you encounter a problem, you can follow these steps to correct the error.

    Case: Cloud-init reports that the connection was denied when accessing metadata

    In the instance's console log (viewed with the command nova console-log INSTANCE_ID), you see two lines of errors, as shown in Fig. 6.29.

    There may be several reasons for the error, but the result is the same, i.e., the user cannot log on to the instance because it cannot be injected with a key.

    First check that the physical network cards on the network nodes and compute nodes are configured for use by the operating system. You should also ensure that the following command was run during installation and configuration:

    ovs-vsctl add-port br-eth1 eth1

    where br-eth1 is the bridge created on the physical network card eth1.

    Then check whether the instance can reach the metadata server 169.254.169.254 via its gateway; if not, create a routing rule to that network. When creating a subnet and specifying a gateway, the gateway should be able to route to 169.254.169.254; otherwise, the error shown in Fig. 6.29 occurs. Use the following options to create a route to 169.254.169.254 for instances at the same time the subnet is created:

    quantum subnet-create demoNet1 \
      10.1.0.0/24 \
      --name snetl \
      --no-gateway \
      --host_routes type=dict list=true \
      destination=0.0.0.0/0,nexthop=10.1.0.1 \
      --allocation-pool start=10.1.0.2,end=10.1.0.254

    With the --no-gateway option, Neutron injects a route to 169.254.169.254 into the instance, and it appears in the instance's routing table. However, because no gateway is set, a default route with destination 0.0.0.0/0 and next hop 10.1.0.1 is specified here so that the instance can still reach other destinations.

Fig. 6.28
figure 28

Result information

Fig. 6.29
figure 29

Console log

6.9.4 OpenStack Troubleshooting-Related Items

  1. 1.

    Vitrage

    Vitrage is the OpenStack component that provides root cause analysis (RCA) services. It is used to organize, analyze, and correlate OpenStack alarms and events, deduce the root cause of problems, and generate derived alarms or set derived states for the system. Its main functions include mapping between physical and virtual entities, deriving alarms and states, root cause analysis of alarms or events, and display in Horizon.

  2. 2.

    Aodh

    Aodh is a component that was split out of Ceilometer. Its main function is to provide resource alarming, supporting log, webhook, and other notification methods. It consists of four modules: the alarm-condition evaluation module, the alarm notification module, the listener module, and the Aodh startup module.

  3. 3.

    Monasca

    Monasca is a high-performance, scalable, highly available monitoring-as-a-service solution built on open source technology. It uses a REST API to store and query historical data; unlike monitoring tools that use special protocols and transports, such as Nagios's NSCA, Monasca uses only HTTP. For multi-tenant authentication, metrics are submitted and authenticated through the Keystone component, and metrics are defined using key-value pairs. Real-time thresholds and alarms can be set on system metrics, and composite alarms, which are simple to use and consist of sub-alarm expressions and logical operators, can also be defined. The monitoring agent supports checks of built-in systems and services as well as Nagios checks and statsd.

  4. 4.

    Mistral

    Mistral is a relatively new project in the OpenStack ecosystem. It is a workflow component contributed by Mirantis to the OpenStack community, providing Workflow as a Service, similar to AWS SWF (Simple Workflow Service) and the Oozie service in the Hadoop ecosystem.

  5. 5.

    Freezer

    Freezer is open source backup software that helps users automate data backup and restoration. It has been officially introduced into OpenStack for data backup and is an official project in the OpenStack community that aims to provide backup solutions for OpenStack environments. Freezer has been supported since the OpenStack Liberty release and requires only minor modifications to use.

6.10 Exercise

  1. (1)

    Multiple choices

    1. 1.

      The OpenStack component does not include ( ).

      1. A.

        Nova

      2. B.

        Swift

      3. C.

        Keystone

      4. D.

        EC2

    2. 2.

      In the OpenStack platform, the ( ) component is responsible for supporting all activities of instances and managing the life cycle of all instances.

      1. A.

        Glance

      2. B.

        Neutron

      3. C.

        Swift

      4. D.

        Nova

    3. 3.

      In the OpenStack platform, ( ) is used to define a collection of resources that can be accessed.

      1. A.

        User

      2. B.

        Project

      3. C.

        Role

      4. D.

        Domain

    4. 4.

      In the OpenStack platform, ( ) can filter network traffic packets on the router to enhance network security.

      1. A.

        Security Group

      2. B.

        ML2

      3. C.

        FWaaS

      4. D.

        LBaaS

    5. 5.

      The basic features not provided by the Cinder component in OpenStack are ( ).

      1. A.

        Provides basic block storage management capabilities

      2. B.

        Virtualize SAN management with iSCSI, FC, or NFS

      3. C.

        Provides long-lasting storage media and can be passed between virtual machines

      4. D.

        Provides high-performance file systems

    6. 6.

      ( ) is not Swift's design principle.

      1. A.

        Persistence of data

      2. B.

        Complex algorithms to improve storage efficiency

      3. C.

        Symmetrical System Architecture

      4. D.

        No single point of failure

  2. (2)

    Fill in the blanks.

  3. 1.

    OpenStack is the framework for building __________, managing all kinds of hardware devices through integration and hosting all kinds of upper-level applications and services, resulting in a complete system.

  4. 2.

    OpenStack is a free and open source platform that is primarily used to deploy ___________.

  5. 3.

    OpenStack’s key components are _________, __________, _________, __________, __________

  6. 4.

    OpenStack's image service supports a variety of virtual machine image formats, including _________, __________, _________, __________.

  7. 5.

    As a separate security authentication module in OpenStack, __________ is responsible for the authentication of OpenStack users, token management, the service directory that provides access to resources, and access control based on the user role.

  8. 6.

    Initially, network services in OpenStack were provided by a separate module _______ in Nova.

  9. (3)

    Answer the questions

  10. 1.

    What is OpenStack?

  11. 2.

    Summarize the main components of OpenStack and its features.

  12. 3.

    Summarize how OpenStack works together across service modules.

  13. 4.

    What services in OpenStack typically run on the control node?

  14. 5.

    What is a Neutron agent? How do I display all Neutron agents?