During the operation of a network system, operation and maintenance personnel need to manage its hardware and software resources according to service requirements. At the same time, they monitor and regularly maintain the switches, routers, wireless ACs/APs, firewalls, servers and other equipment in the system, so that they can quickly and effectively collect fault information, analyze the causes of faults and restore equipment in time.

This chapter first introduces resource management in network operation and maintenance, covering both hardware and software resources. Hardware resource management covers the electronic labels, CPU, memory, boards and other resources of network equipment, while software resource management covers licenses, system software, configuration files and other resources. The chapter then introduces routine maintenance and fault handling. Routine maintenance aims to find and eliminate potential hazards in network equipment, while fault handling aims to quickly analyze, locate and repair a fault after it occurs so that service can be restored.

By the end of this chapter, you will

(1) Understand the management of network systems

(2) Understand the maintenance of network systems

(3) Master the management of hardware resources

(4) Master the management of software resources

(5) Be familiar with routine maintenance of the equipment room

(6) Be able to handle common faults

6.1 Network System Resource Management

Before managing and maintaining a network system, operation and maintenance personnel should first collect the planning data of the whole network, including the network topology, data planning, and the user names and passwords for remote login, so that this information can be queried, compared and maintained at any time later.

Resource management of a network system covers the management and maintenance of the hardware and software resources of all equipment in the network. Hardware resource management mainly covers equipment system resources (CPU and memory), cables, boards, fans, etc., while software resource management covers equipment license management, system software and patch management, backup and restoration of configuration files, user information management, etc.

6.1.1 Hardware Resource Management and Maintenance

Hardware resource management refers to operating and managing the hardware resources of equipment through the command line, for example resetting boards, backing up electronic labels, and powering boards on or off. While the equipment is running, managing hardware resources this way reduces the need to physically insert, remove, install or uninstall hardware. This is convenient and fast, and it improves the reliability of the hardware. Common hardware resource management tasks are described in detail below.

  1.

    Electronic label backup

    Electronic labels, also known as radio frequency tags and commonly referred to as equipment serial numbers, play an important role in handling network faults and replacing hardware in batches.

    When a network fault occurs, the electronic labels make it easy to obtain accurate information about the related hardware, which improves maintenance efficiency. Statistical analysis of the electronic label information of faulty hardware also helps identify hardware defects more accurately and efficiently. In addition, when hardware is replaced in batches, the electronic label information stored in the file system of the customer's equipment shows exactly how that hardware is distributed across the whole network. This makes it easier to evaluate the impact of the replacement and formulate a corresponding strategy, and thus improves the efficiency of batch replacement.

    Huawei's network equipment supports backing up electronic labels to a file server or to the device's storage medium. When backing up to a file server, make sure that the device and the file server have reachable routes to each other. The supported file servers are FTP servers and TFTP servers.

    The [backup elabel] command can back up the electronic label in three ways.

    (a)

      Execute the [backup elabel filename [slot-id]] command to back up the electronic label to the device storage medium.

    (b)

      Execute the [backup elabel ftp ftp-server-address filename username password [slot-id]] command to back up the electronic label to an FTP server.

    (c)

      Execute the [backup elabel tftp tftp-server-address filename [slot-id]] command to back up the electronic label to a TFTP server.
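To make the three argument orders above easier to compare, here is a minimal Python sketch that only formats the command text. The function name `build_backup_elabel_command` and the `method` values "local", "ftp" and "tftp" are hypothetical; the helper does not connect to any device, and it assumes the optional slot-id is appended last, as in the syntax above.

```python
from typing import Optional


def build_backup_elabel_command(
    filename: str,
    method: str = "local",
    server: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    slot_id: Optional[int] = None,
) -> str:
    """Build a [backup elabel] command line for one of the three methods.

    method: "local" (device storage medium), "ftp" or "tftp" (hypothetical labels).
    The helper only formats text; it does not talk to a device.
    """
    if method == "local":
        parts = ["backup elabel", filename]
    elif method == "ftp":
        if not (server and username and password):
            raise ValueError("FTP backup needs server, username and password")
        parts = ["backup elabel ftp", server, filename, username, password]
    elif method == "tftp":
        if not server:
            raise ValueError("TFTP backup needs a server address")
        parts = ["backup elabel tftp", server, filename]
    else:
        raise ValueError(f"unknown method: {method}")
    if slot_id is not None:
        parts.append(str(slot_id))  # optional trailing [slot-id]
    return " ".join(parts)


if __name__ == "__main__":
    # The three commands used in Example 6.1.
    print(build_backup_elabel_command("ar3260_elabel"))
    print(build_backup_elabel_command("ar3260_elabel", "ftp", "192.168.0.11", "user1", "pass1"))
    print(build_backup_elabel_command("ar3260_elabel", "tftp", "192.168.0.11"))
```

Validating the required arguments per method up front mirrors the fact that the FTP form needs credentials while the TFTP form does not.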

    The following example uses an AR3260 router to show the process of backing up electronic labels.

    [Example 6.1]

    Backup of electronic label

    Method (1): Back up to the device storage medium, which is the simplest method. Assuming that the electronic label file is named “ar3260_elabel”, run the [backup elabel ar3260_elabel] command directly, as shown below.

    <Huawei>backup elabel ar3260_elabel
    It is executing, please wait...
    Backup elabel successfully!

    Method (2): Back up to the FTP server. The network topology is shown in Fig. 6.1. The FTP server’s IP address is 192.168.0.11, the user name is “user1”, and the password is “pass1”. Ensure that the user has the permission to upload files. The command and execution results are as follows.

    <Huawei>backup elabel ftp 192.168.0.11 ar3260_elabel user1 pass1
    It is executing, please wait...
    Backup elabel successfully!

    After the above operation, the backup file “ar3260_elabel” can be found in the root directory of the FTP server, indicating that the electronic label was successfully backed up.

    Method (3): Back up to the TFTP server. The network topology is shown in Fig. 6.2. The command and execution results are as follows.

    <Huawei>backup elabel tftp 192.168.0.11 ar3260_elabel
    It is executing, please wait...
    Info: Transfer file in binary mode.
    Uploading the file to the remote TFTP server. Please wait...
    TFTP: Uploading the file successfully.
    915 bytes send in 1 second.

    After this operation, the file transfer can be viewed on the TFTP server, and the backup file “ar3260_elabel” can be found in the root directory of the TFTP server, indicating that the electronic label was successfully backed up (Fig. 6.3).

  2.

    Configuration of the alarm of CPU usage threshold

    The CPU is the core component of the equipment. When the system holds a large amount of routing information, it consumes a lot of CPU resources, which can greatly degrade system performance and cause data processing delays or a high packet loss rate. If an alarm is raised promptly when CPU usage is high, CPU usage can be monitored more effectively and system performance optimized, keeping the system in a healthy operating state.

    The CPU usage alarm uses two thresholds: the usage threshold and the restore threshold. Configuring them takes the following three steps.

    (a)

      Execute the [display cpu-usage configuration] command to view the CPU usage configuration of the equipment.

    (b)

      Execute the [system-view] command to enter the system view.

    (c)

      Execute the [set cpu-usage threshold threshold-value [restore restore-threshold-value] [slot slot-id]] command to configure the CPU usage alarm threshold and restore threshold.

    By default, the CPU usage threshold is 80%, and the restore threshold is 75%.
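Having two thresholds gives the alarm hysteresis: it is raised above the usage threshold and cleared only below the lower restore threshold. The Python sketch below models that behavior with the default values (80% and 75%). It assumes the alarm fires when usage strictly exceeds the threshold and clears when usage drops strictly below the restore threshold; the exact comparison used by the device is not stated here, and `ThresholdAlarm` and `replay` are hypothetical names.

```python
from typing import List, Optional


class ThresholdAlarm:
    """Usage alarm with a separate, lower restore threshold (hysteresis)."""

    def __init__(self, threshold: int = 80, restore: int = 75) -> None:
        # Assumption: restore must be below threshold, both given in percent.
        if not 0 <= restore < threshold <= 100:
            raise ValueError("expected 0 <= restore < threshold <= 100")
        self.threshold = threshold
        self.restore = restore
        self.active = False

    def update(self, usage: int) -> Optional[str]:
        """Feed one usage sample; return 'alarm', 'restore' or None."""
        if not self.active and usage > self.threshold:
            self.active = True
            return "alarm"
        if self.active and usage < self.restore:
            self.active = False
            return "restore"
        return None  # no state change between the two thresholds


def replay(samples: List[int], threshold: int = 80, restore: int = 75) -> List[str]:
    """Return 'event@index' strings for a series of usage samples."""
    alarm = ThresholdAlarm(threshold, restore)
    events = []
    for index, usage in enumerate(samples):
        event = alarm.update(usage)
        if event:
            events.append(f"{event}@{index}")
    return events


if __name__ == "__main__":
    print(replay([50, 85, 78, 76, 74, 90]))  # ['alarm@1', 'restore@4', 'alarm@5']
```

Samples of 78% and 76% neither clear nor re-raise the alarm, which is what keeps it from flapping when usage hovers near a single threshold.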

  3.

    Configuration of the alarm of memory usage threshold

    Memory usage is one of the important indicators of equipment performance. While network equipment is running, excessively high memory usage can cause services to behave abnormally. If an alarm is raised promptly when memory usage is high, memory usage can be monitored more effectively and system performance optimized, keeping the system in a healthy operating state. The memory usage threshold alarm is configured as follows.

    (a)

      Execute the [display memory-usage threshold] command to view the memory usage threshold configuration of the device.

    (b)

      Execute the [system-view] command to enter the system view.

    (c)

      Execute the [set memory-usage threshold threshold-value] command to configure the memory usage alarm threshold.

  4.

    Board management

    Chassis (frame) equipment has many slots, which can hold many redundant boards, including system bus, power supply and security modules. Board management refers to managing the board in a single slot. While the equipment is running, board management allows it to be maintained or repaired with as little service disruption as possible. Huawei network equipment supports managing individual boards, including resetting a board, powering a board on or off, and performing an active/standby switchover of the master control boards.

    (a)

      Board reset

      In actual operation and maintenance, a board may need to be upgraded to provide better services, and the board may fail during the upgrade. In this case, the failure can be repaired by resetting the board. Execute the [reset slot slot-id] command to reset the board in the slot specified by “slot-id”. The following example uses an AR3260 router to show how to reset a board.

      [Example 6.2]

      Reset a board

      (i)

        Execute the [display device] command to view the status of the boards. The execution results are as follows.

        [Huawei]display device
        AR3260's Device status:
        Slot  Sub  Type      Online   Power    Register    Alarm   Primary
        ------------------------------------------------------------------
        2     -    2E1/T1-F  Present  PowerOn  Registered  Normal  NA
        3     -    2E1/T1-F  Present  PowerOn  Registered  Normal  NA
        4     -    1GEC      Present  PowerOn  Registered  Normal  NA
        6     -    8FE1GE    Present  PowerOn  Registered  Normal  NA
        15    -    SRU80     Present  PowerOn  Registered  Normal  Master
        16    -    FAN       Present  PowerOn  Registered  Normal  NA

      (ii)

        Reset the board, for example, the “2E1/T1-F” board in slot 2. The command and execution results are as follows.

        <Huawei>reset slot 2
        Are you sure you want to reset board in slot 2 ? [y/n]:y
        Feb 7 2020 14:56:40-08:00 Huawei %%01DEV/4/ENTRESET(l)[0]:Board[2] is reset, The reason is: Reset by user command.
        INFO: Resetting board[2] succeeded.

      In addition, on equipment with dual master control boards, the standby master control board can be reset without affecting the normal operation of the equipment. To do so, execute the [slave restart] command in the system view.

    (b)

      Power-on and power-off of the board

      A real network usually has some service redundancy, including network-level, equipment-level and board-level redundancy, so some boards on the equipment may be idle. A designated idle board can therefore be powered off without interfering with services, which helps the system run stably and saves energy. When the board is needed later for service expansion, it can be powered on at any time without impeding the expansion. The following examples use an AR3260 router to show how to power a board off and on.

      (i)

        Power-off of the board

        Execute the [power off] command to power off the board. Example 6.3 shows the process of powering off an idle board.

        [Example 6.3]

        Power-off of the board

        Taking router AR3260 as an example, the operation steps of powering off the board are as follows.

        • Execute the [display device] command to view the status of the boards.

          <Huawei>disp device
          AR3260's Device status:
          Slot  Sub  Type      Online   Power    Register    Alarm   Primary
          ------------------------------------------------------------------
          2     -    2E1/T1-F  Present  PowerOn  Registered  Normal  NA
          3     -    2E1/T1-F  Present  PowerOn  Registered  Normal  NA
          4     -    1GEC      Present  PowerOn  Registered  Normal  NA
          6     -    8FE1GE    Present  PowerOn  Registered  Normal  NA
          15    -    SRU80     Present  PowerOn  Registered  Normal  Master
          16    -    FAN       Present  PowerOn  Registered  Normal  NA

        • Assume that the board in slot 3 is idle and carries no services. In the user view, execute the [power off slot 3] command to power off the board.

          <Huawei>power off slot 3
          Feb 7 2020 15:56:02-08:00 Huawei %%01DEV/4/ENTPOWEROFF(l)[0]:Board[3] is power off, The reason is: Power off by user command.

      (ii)

        Power-on of the board

        Just as a board can be powered off, it can also be powered on. When a powered-off board is needed later for service expansion, execute the [power on] command to power it on.

        [Example 6.4]

        Power-on of the board

        Also taking router AR3260 as an example, the operation steps of powering on the board are as follows.

        • Execute the [disp device] command to view the status information of the board.

          <Huawei>disp device
          AR3260's Device status:
          Slot  Sub  Type      Online   Power     Register    Alarm   Primary
          -------------------------------------------------------------------
          2     -    2E1/T1-F  Present  PowerOn   Registered  Normal  NA
          3     -    2E1/T1-F  Present  PowerOff  Registered  Normal  NA
          4     -    1GEC      Present  PowerOn   Registered  Normal  NA
          6     -    8FE1GE    Present  PowerOn   Registered  Normal  NA
          15    -    SRU80     Present  PowerOn   Registered  Normal  Master
          16    -    FAN       Present  PowerOn   Registered  Normal  NA

        • Enter the user view and execute the command [power on slot 3] to power on the board in Slot 3.

          <Huawei>power on slot 3
          Info: Power on slot [3] successfully.
          Feb 7 2020 16:02:42-08:00 Huawei %%01DEV/4/ENTPOWERON(l)[8]:Board[3] is power on.
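Examples 6.2 to 6.4 all start by reading the [display device] table. The Python sketch below, assuming the whitespace-separated layout shown in those examples (exactly one token per column), turns that output into records so that, for instance, powered-off slots can be listed before running [power on]. `parse_display_device`, `slots_where` and the trimmed `SAMPLE` text are illustrative; real output may differ between device models and software versions.

```python
from typing import Dict, List

# Column order of the [display device] table in Examples 6.2 to 6.4.
FIELDS = ["Slot", "Sub", "Type", "Online", "Power", "Register", "Alarm", "Primary"]


def parse_display_device(output: str) -> List[Dict[str, str]]:
    """Turn [display device] text into one dict per board row."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Board rows have one token per column and start with a numeric slot;
        # the title, header and separator lines are skipped.
        if len(tokens) == len(FIELDS) and tokens[0].isdigit():
            rows.append(dict(zip(FIELDS, tokens)))
    return rows


def slots_where(rows: List[Dict[str, str]], column: str, value: str) -> List[int]:
    """Return the slot numbers whose `column` equals `value`."""
    return [int(row["Slot"]) for row in rows if row[column] == value]


# Trimmed excerpt of the output in Example 6.4.
SAMPLE = """\
AR3260's Device status:
Slot  Sub  Type      Online   Power     Register    Alarm   Primary
-------------------------------------------------------------------
2     -    2E1/T1-F  Present  PowerOn   Registered  Normal  NA
3     -    2E1/T1-F  Present  PowerOff  Registered  Normal  NA
15    -    SRU80     Present  PowerOn   Registered  Normal  Master
"""

if __name__ == "__main__":
    rows = parse_display_device(SAMPLE)
    print(slots_where(rows, "Power", "PowerOff"))  # [3]
```

Filtering on the "Primary" column in the same way identifies the active master control board, which should never be the target of a casual reset or power-off.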

    (c)

      Active/Standby switchover of the master control board

      On some equipment that supports hot backup with dual master control boards, operation and maintenance personnel can manually switch between the active and standby master control boards during a software upgrade or system maintenance. This operation is called an active/standby switchover. After the switchover, the current active master control board restarts and becomes the standby master control board, and the current standby master control board becomes the active one.

      Note that during an active/standby switchover, do not insert, remove or reset any active or standby master control board, service interface board, power module or fan module; otherwise, the whole equipment may restart or fail.

      The active/standby switchover of the master control board is only applicable to some equipment supporting hot backup by dual master control boards. The specific operation steps are as follows.

      (i)

        Execute the [display switchover state] command to check whether the active and standby master control boards meet the conditions for switchover. The switchover can be performed only when the master control boards are in the real-time backup stage.

      (ii)

        Execute the [system-view] command to enter the system view.

      (iii)

        Execute the [switchover enable] command to enable the active/standby switchover. By default, the active/standby switchover function is enabled.

      (iv)

        Execute the [slave switchover] command to perform the switchover.
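The preconditions in steps (i) to (iv) can be expressed as a small guard that refuses to produce a switchover plan unless the boards are ready. This Python sketch is hypothetical: `SwitchoverState` stands for what an operator reads from [display switchover state] and the running configuration, and the sketch does not parse any real device output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwitchoverState:
    """Hypothetical summary of the checks made before a switchover."""
    dual_master: bool         # device supports hot backup by dual master control boards
    realtime_backup: bool     # boards are in the real-time backup stage (step i)
    switchover_enabled: bool  # [switchover enable] in effect (enabled by default)


def plan_switchover(state: SwitchoverState) -> List[str]:
    """Return the commands for steps (ii) to (iv), or raise if step (i) fails."""
    if not state.dual_master:
        raise RuntimeError("switchover needs dual master control boards with hot backup")
    if not state.realtime_backup:
        raise RuntimeError("master control boards are not in the real-time backup stage")
    commands = ["system-view"]
    if not state.switchover_enabled:
        commands.append("switchover enable")  # only needed if it was disabled
    commands.append("slave switchover")
    return commands


if __name__ == "__main__":
    ready = SwitchoverState(dual_master=True, realtime_backup=True, switchover_enabled=True)
    print(plan_switchover(ready))  # ['system-view', 'slave switchover']
```

Raising instead of returning a partial plan reflects the rule that the switchover must not be attempted at all outside the real-time backup stage.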

  5.

    Interface management

    The interfaces on equipment include management interfaces, physical interfaces and logical interfaces. A management interface mainly provides configuration management support: the user can log in to the device through it to perform configuration and management operations. Management interfaces, such as the Console port, the MiniUSB port and the MEth interface, do not carry service traffic. A physical interface is an interface that physically exists on the device and carries service traffic, such as an Ethernet, Gigabit Ethernet or Serial interface. A logical interface can exchange data but does not physically exist; it must be created through configuration. Examples include Loopback, Eth-Trunk, VLANIF and Tunnel interfaces.

    Interface management includes basic parameter configuration, physical interface configuration, logical interface configuration, etc. The basic parameter configuration refers to setting interface description information, bandwidth, traffic statistics, etc. The physical interface configuration refers to the configuration of real Layer 2 and Layer 3 interfaces, including VLAN configuration and IP address configuration. The logical interface configuration refers to the configuration of Null0, Loopback, Tunnel and other interfaces, mainly for configuring IP addresses. The following mainly introduces the basic parameter configuration. For the configuration of VLAN and IP address, please refer to Chaps. 4 and 5.

    (a)

      Configuration of interface description information

      To facilitate the management and maintenance of equipment, interface descriptions can be configured during operation and maintenance to record the equipment to which the interface belongs, the interface type, the peer network element and other information.

      [Example 6.5]

      Configuration of interface description information

      Assuming that the current interface is connected to interface GE0/0/1 of Device B, the description can be configured as “To_DeviceB_GE0/0/1” as follows.

      [Huawei-GigabitEthernet0/0/0]description To_DeviceB_GE0/0/1
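When many interfaces need descriptions, generating them from a fixed pattern keeps them consistent. This Python sketch follows the “To_<peer device>_<peer interface>” pattern of Example 6.5; the abbreviation table and the function names are hypothetical choices for illustration, not a Huawei naming rule.

```python
# Hypothetical abbreviation table; extend it to match your own naming rules.
# Longer prefixes are listed first so "XGigabitEthernet" is not cut to "XGE" wrongly.
ABBREVIATIONS = {
    "XGigabitEthernet": "XGE",
    "GigabitEthernet": "GE",
}


def short_interface_name(name: str) -> str:
    """Shorten e.g. 'GigabitEthernet0/0/1' to 'GE0/0/1'; unknown names pass through."""
    for long_name, short_name in ABBREVIATIONS.items():
        if name.startswith(long_name):
            return short_name + name[len(long_name):]
    return name


def description_command(peer_device: str, peer_interface: str) -> str:
    """Build a [description] line following the To_<device>_<interface> pattern."""
    return f"description To_{peer_device}_{short_interface_name(peer_interface)}"


if __name__ == "__main__":
    print(description_command("DeviceB", "GigabitEthernet0/0/1"))
    # description To_DeviceB_GE0/0/1
```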

    (b)

      Configuration of interface bandwidth and network management bandwidth

      An Ethernet interface supports both rate setting and network management bandwidth setting.

      The [speed] command configures the rate of an Ethernet interface in non-auto-negotiation mode. Because the command applies only in non-auto-negotiation mode, auto-negotiation must be disabled before the [speed] command is used to change the rate. By default, an Ethernet interface working in non-auto-negotiation mode runs at the maximum rate it supports.

      The [bandwidth] command sets the interface bandwidth that the network management device obtains through the MIB. By default, this value depends on the interface type. Configuring it does not change the actual bandwidth of the interface. For example, the actual bandwidth of a GE interface is 1000 Mbit/s, but after [bandwidth 10] is executed in the GE interface view, the network management device obtains a bandwidth of 10 Mbit/s for that interface.

      [Example 6.6]

      Configuration of interface bandwidth

      The following example changes the rate of a GE interface to 10 Mbit/s.

      <Huawei>system-view
      [Huawei]interface GigabitEthernet0/0/0
      [Huawei-GigabitEthernet0/0/0]undo negotiation auto
      [Huawei-GigabitEthernet0/0/0]speed 10

    (c)

      Setting of the interval of interface traffic statistics

      By setting the interval of interface traffic statistics, the user can collect and analyze statistics on packets of interest. Checking interface traffic statistics in advance and taking timely traffic control measures also helps avoid network congestion and service interruption. By default, the interval is 300 s. When users detect network congestion, they can set the interval to less than 300 s (for example, 30 s when the congestion worsens) to observe the traffic distribution over a short period, and then apply traffic control to the packets causing the congestion. When network bandwidth is sufficient and services run normally, the interval can be set to more than 300 s. Once abnormal traffic parameters are found, the interval should be adjusted promptly so that the trend of the traffic parameters can be observed in near real time.

      Huawei network equipment supports the [set flow-stat interval interval-time] command in both the system view and the interface view. The setting in the system view applies to all interfaces that use the default interval, whereas the setting in an interface view applies only to that interface and takes precedence over the system-view setting.

      [Example 6.7]

      Setting of the interval of interface traffic statistics

      The following example sets the traffic statistics interval of interface GE0/0/0 to 100 s and that of all other interfaces to 200 s.

      <Huawei>system-view
      [Huawei]set flow-stat interval 200
      [Huawei]interface GigabitEthernet0/0/0
      [Huawei-GigabitEthernet0/0/0]set flow-stat interval 100
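The precedence rule above (interface-view setting, then system-view setting, then the 300 s default) can be captured in a few lines. This Python sketch models the configuration state after Example 6.7; the function and parameter names are hypothetical and the sketch only computes which interval applies.

```python
from typing import Dict, Optional

DEFAULT_INTERVAL_S = 300  # default interval stated in the text


def effective_flow_stat_interval(
    interface: str,
    system_interval: Optional[int],
    interface_intervals: Dict[str, int],
) -> int:
    """Interface-view setting > system-view setting > 300 s default."""
    if interface in interface_intervals:
        return interface_intervals[interface]
    if system_interval is not None:
        return system_interval
    return DEFAULT_INTERVAL_S


if __name__ == "__main__":
    # State after Example 6.7: 200 s globally, 100 s on GE0/0/0.
    per_interface = {"GigabitEthernet0/0/0": 100}
    for name in ("GigabitEthernet0/0/0", "GigabitEthernet0/0/1"):
        print(name, effective_flow_stat_interval(name, 200, per_interface))
```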

    (d)

      Opening or closing an interface

      When the working parameters of an interface are modified but the new configuration does not take effect immediately, execute the [shutdown] and [undo shutdown] commands in turn, or execute the [restart] command, to shut down and restart the interface so that the new configuration takes effect.

      By default, all interfaces are open. When an interface is idle (that is, no cable or optical fiber is connected), it is best to shut it down with the [shutdown] command so that interference cannot make it behave abnormally. Note that some logical interfaces, such as Null0 and Loopback interfaces, remain open once created and cannot be shut down or opened by command.

    (e)

      Clearing interface statistics

      To collect statistics on an interface's traffic over a given period, clear the existing statistics before the period starts. The [reset counters interface] command clears the statistics of the specified interfaces. Its format is [reset counters interface [ interface-type [ interface-number ] ]], where “interface-type” is the interface type and “interface-number” is the interface number. If no interface type is specified, the statistics of all interfaces are cleared. If an interface type is specified without an interface number, the statistics of all interfaces of that type are cleared.

      The [reset counters interface] command clears the input and output packet statistics of the interface, and cleared statistics cannot be recovered. Because the packet statistics of each interface are the basis of traffic-based charging, clearing them affects charging results. Therefore, do not clear interface statistics casually in a normal production environment.
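Because cleared counters cannot be recovered, it helps to know exactly which interfaces a given [reset counters interface] form will affect. This Python sketch applies the scoping rules described above to a list of interface names. It assumes names are written as a type followed directly by a number (for example "GigabitEthernet0/0/1"); `interfaces_cleared` and `split_interface_name` are hypothetical helpers that do not touch any device.

```python
import re
from typing import List, Optional, Tuple


def split_interface_name(name: str) -> Tuple[str, str]:
    """Split 'GigabitEthernet0/0/1' into ('GigabitEthernet', '0/0/1')."""
    match = re.match(r"([A-Za-z-]+)(.*)$", name)
    if match is None:
        raise ValueError(f"unrecognized interface name: {name}")
    return match.group(1), match.group(2)


def interfaces_cleared(
    interfaces: List[str],
    interface_type: Optional[str] = None,
    interface_number: Optional[str] = None,
) -> List[str]:
    """Which interfaces [reset counters interface ...] would clear."""
    if interface_type is None:
        if interface_number is not None:
            raise ValueError("an interface number needs an interface type")
        return list(interfaces)  # no type given: every interface
    selected = []
    for name in interfaces:
        if_type, if_number = split_interface_name(name)
        if if_type != interface_type:
            continue
        # Type only: all interfaces of that type; type + number: just that one.
        if interface_number is None or if_number == interface_number:
            selected.append(name)
    return selected


if __name__ == "__main__":
    ifs = ["GigabitEthernet0/0/0", "GigabitEthernet0/0/1", "Eth-Trunk1", "LoopBack0"]
    print(interfaces_cleared(ifs, "GigabitEthernet"))
```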

  6.

    Optical module alarm management

    Huawei's network equipment supports both Huawei-certified and non-Huawei-certified optical modules. Note, however, that non-Huawei-certified optical modules may not work properly on the device, and the system will generate a large number of alarms reminding the user to replace them with Huawei-certified modules for management and maintenance. In addition, some early optical modules produced by Huawei may not record manufacturer information and can therefore also trigger non-Huawei-certified module alarms.

    On the device, execute the [display transceiver] command to check the general, manufacturing and alarm information of optical modules. For Huawei-certified optical modules, configure the optical module alarm function to choose the most suitable alarm method. Non-Huawei-certified modules can remain in service to make full use of resources, but it is recommended to turn off their alarms by command.

    For Huawei-certified optical modules, the operation steps of alarm management are as follows.

    (a)

      Execute the [display transceiver] command to check the general, manufacturing and alarm information of the optical module on the device interface.

    (b)

      Configure the optical module alarm switch. In the system view, execute the [set transceiver-monitoring enable] command to turn on the power alarm switch of the optical module, or the [set transceiver-monitoring disable] command to turn it off. By default, the power alarm switch is on.

    (c)

      Configure the transmit power alarm thresholds of the optical module. In the view of the optical interface to be configured, execute the [set transceiver transmit-power upper-threshold upper-value] and [set transceiver transmit-power lower-threshold lower-value] commands to set the upper and lower thresholds of the transmit power, respectively. An alarm is generated when the transmit power falls outside the range defined by these thresholds.

    (d)

      Configure the receive power alarm thresholds of the optical module. In the view of the optical interface to be configured, execute the [set transceiver receive-power upper-threshold upper-value] and [set transceiver receive-power lower-threshold lower-value] commands to set the upper and lower thresholds of the receive power, respectively. An alarm is generated when the receive power falls outside the range defined by these thresholds.
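Steps (b) to (d) combine into one check: when monitoring is on, each power reading is compared with its own lower and upper thresholds. The Python sketch below assumes readings and thresholds share one unit (optical power is commonly expressed in dBm) and that values exactly on a threshold do not alarm; the threshold numbers in the usage line are hypothetical, not recommended settings.

```python
from typing import Dict, Optional, Tuple


def power_alarm(value: float, lower: float, upper: float) -> Optional[str]:
    """Return an alarm string if `value` falls outside [lower, upper]."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if value > upper:
        return "above upper threshold"
    if value < lower:
        return "below lower threshold"
    return None  # within range (boundary values treated as normal: an assumption)


def check_transceiver(
    tx_power: float,
    rx_power: float,
    tx_range: Tuple[float, float],
    rx_range: Tuple[float, float],
    monitoring_enabled: bool = True,
) -> Dict[str, str]:
    """Evaluate both readings; no alarms when the power alarm switch is off."""
    if not monitoring_enabled:  # models [set transceiver-monitoring disable]
        return {}
    alarms = {}
    for name, value, (lower, upper) in (
        ("transmit", tx_power, tx_range),
        ("receive", rx_power, rx_range),
    ):
        result = power_alarm(value, lower, upper)
        if result:
            alarms[name] = result
    return alarms


if __name__ == "__main__":
    # Hypothetical readings and thresholds in dBm.
    print(check_transceiver(-2.0, -15.0, tx_range=(-9.0, -3.0), rx_range=(-20.0, -1.0)))
```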

    By default, the alarm function of non-Huawei-certified optical modules is enabled. In order to make these modules work normally on the device without generating a large number of alarms, it is recommended to disable the alarm function. To disable the function, execute the command [transceiver phony-alarm-disable] in the system view.

  7.

    Energy saving management

    As networks continue to grow, equipment energy consumption accounts for an increasing share of operating costs, and “green” operation and “energy saving” have become major concerns in network construction and operation. Network equipment supports a variety of energy-saving technologies to reduce energy consumption.

    The energy-saving technologies supported by Huawei network equipment include automatic fan speed regulation, Automatic Laser Shutdown (ALS) and Energy Efficient Ethernet (EEE). Their characteristics are described below.

    (a)

      Automatic fan speed regulation

      With automatic speed regulation, the fan monitors the temperature of key components of the device. When the temperature of a sensitive component inside the equipment rises above the set value, the fan speed increases; when it falls below the set value, the fan speed decreases. The equipment is thus kept at a stable temperature, which saves energy and reduces noise.

    (b)

      ALS

      ALS controls the light emission of the optical module laser by detecting loss of signal (LoS) at the optical port. It protects users and reduces energy consumption. If the device does not support or enable ALS, then when the fiber is not connected to the interface or the fiber link fails, data communication is interrupted but the optical interface is not turned off, and the laser of the optical module keeps emitting light. This continuous emission not only wastes energy but is also hazardous, because laser light that accidentally enters the human eye can cause harm. If ALS is enabled, then when the fiber is not connected or the fiber link fails, the system detects LoS at the optical port, determines that the service has been interrupted, and automatically turns off the laser. When the fiber is reconnected or the fiber link recovers, the system detects that LoS has cleared and automatically turns the laser back on, restoring the service.

    (c)

      EEE

      EEE is an energy-saving method that dynamically adjusts the power of electrical interfaces according to network traffic. Without power self-adjustment configured on the electrical interfaces, the system supplies each interface with constant power, so even an idle interface consumes the same energy. With power self-adjustment configured, the system automatically reduces the power supplied to an idle interface, lowering the overall energy consumption of the system; when the interface starts transmitting data again, normal power is restored without affecting services.
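The ALS behavior described in (b) is essentially a two-state machine driven by the LoS signal. The Python sketch below models one optical port under that description; `AlsPort` and its flags are hypothetical, and the sketch deliberately omits how a real device detects link recovery while the laser is off (the pulse parameters reported by the ALS display command are not modeled).

```python
class AlsPort:
    """Simplified ALS behavior of one optical port (pulse mechanism omitted)."""

    def __init__(self, als_enabled: bool = True, auto_restart: bool = True) -> None:
        self.als_enabled = als_enabled
        self.auto_restart = auto_restart
        self.laser_on = True

    def observe(self, los: bool) -> bool:
        """Feed the current LoS state and return whether the laser is emitting."""
        if not self.als_enabled:
            return self.laser_on  # without ALS the laser keeps emitting on LoS
        if los:
            self.laser_on = False  # fiber missing or link failed: shut the laser
        elif self.auto_restart:
            self.laser_on = True  # LoS cleared: turn the laser back on
        # With auto_restart False this sketch simply leaves the laser off.
        return self.laser_on


if __name__ == "__main__":
    port = AlsPort()
    print([port.observe(los) for los in (False, True, True, False)])
    # [True, False, False, True]
```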

    The configuration processes related to these three energy-saving management technologies are introduced below.

    (a)

      Configuration of automatic fan speed regulation

      The fan speed affects the device temperature, and adjusting it reasonably helps keep the device at a stable temperature and in a stable state. By default, automatic fan speed regulation is enabled, that is, the system adjusts the fan speed automatically according to the device state. Normally, running the fan in automatic mode keeps noise low, saves energy and does not affect system functions. Before configuring the fan speed manually, check the current device temperature and fan state, and then adjust accordingly: if the temperature is too high, increase the fan speed; otherwise, the fan speed can be reduced. The steps are as follows.

      (i)

        Execute the [display temperature all] command to check the device temperature.

      (ii)

        Execute the [display fan] command to check the current state of the fans.

      (iii)

        Execute the [set fan-speed fan slot-id percent percent] command in the system view to adjust the fan speed. For example, [set fan-speed fan 0 percent 100] sets the fan speed in slot 0 to 100%, that is, the maximum speed. If there are multiple fans on the board, all of them are set to the maximum speed.
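The automatic regulation that these manual steps override can be pictured as a simple up/down controller around a temperature set value. This Python sketch is illustrative only: the step size, the 30% floor and the set value are hypothetical parameters, and a real device uses its own control policy.

```python
def next_fan_speed(
    current_percent: int,
    temperature_c: float,
    setpoint_c: float,
    step: int = 10,
    min_percent: int = 30,
    max_percent: int = 100,
) -> int:
    """One step of a simple up/down fan controller (hypothetical parameters)."""
    if temperature_c > setpoint_c:
        target = current_percent + step  # too hot: speed up
    elif temperature_c < setpoint_c:
        target = current_percent - step  # cool enough: slow down to save energy
    else:
        target = current_percent
    # Keep the result within the allowed range.
    return max(min_percent, min(max_percent, target))


if __name__ == "__main__":
    speed = 50
    for temp in (55.0, 55.0, 48.0, 40.0, 40.0):
        speed = next_fan_speed(speed, temp, setpoint_c=45.0)
        print(temp, speed)
```

Clamping the output mirrors the manual command's percentage range, where 100% means the maximum speed.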

    2. (b)

      ALS configuration

      ALS applies only to optical ports; electrical ports do not support it. The following example shows how to configure ALS.

      [Example 6.8]

      ALS configuration

      In the topology shown in Fig. 6.4, interface GE1/0/0 of R1 and interface GE1/0/0 of R2 are interconnected by optical fiber. To save energy, the user wants the laser of the optical module on each port to stop emitting light automatically when the link fails and to resume emitting once the link recovers. To meet this requirement, enable ALS on the interconnected interfaces of both routers so that the laser shuts down automatically when the link fails. Also set the laser restart mode to automatic so that the laser resumes emitting light when the link is restored.

      1. (i)

        Enable the ALS function of interface GE1/0/0 of R1, and configure the restart mode of the laser as automatic restart. The specific commands are as follows.

        <Huawei> system-view
        [Huawei] sysname R1
        [R1] interface GigabitEthernet1/0/0
        [R1-GigabitEthernet1/0/0] als
        [R1-GigabitEthernet1/0/0] als restart mode automatic
        [R1-GigabitEthernet1/0/0] return

      2. (ii)

        Enable the ALS function of interface GE1/0/0 of R2, and configure the restart mode of the laser as automatic restart. The specific commands are as follows.

        <Huawei> system-view
        [Huawei] sysname R2
        [R2] interface GigabitEthernet1/0/0
        [R2-GigabitEthernet1/0/0] als
        [R2-GigabitEthernet1/0/0] als restart mode automatic
        [R2-GigabitEthernet1/0/0] return

      3. (iii)

        Verify the configuration results by checking the ALS configuration of the interfaces on R1 and R2. The commands and their output are as follows.

        <R1> display als interface GigabitEthernet1/0/0
        Interface              Mode   Pulse Interval  Pulse Width
        GigabitEthernet1/0/0   AUTO   100             2
        <R2> display als interface gigabitethernet 1/0/0
        Interface              Mode   Pulse Interval  Pulse Width
        GigabitEthernet1/0/0   AUTO   100             2
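Step (iii) is a visual check that the Mode column reads AUTO. When many interfaces have to be checked, the same check can be scripted. The Python sketch below parses output in the column layout shown above. It assumes each data row has exactly four whitespace-separated columns. The sample text is copied from R1's output in this example rather than captured live.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AlsStatus:
    interface: str
    mode: str
    pulse_interval: int
    pulse_width: int


def parse_display_als(output: str) -> list[AlsStatus]:
    """Parse the table printed by `display als interface ...`.

    Lines that are not data rows (the command echo and the column header)
    are skipped: a data row must have four fields, the last two numeric.
    """
    rows: list[AlsStatus] = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        name, mode, interval, width = fields
        rows.append(AlsStatus(name, mode, int(interval), int(width)))
    return rows


SAMPLE = """<R1> display als interface GigabitEthernet1/0/0
Interface              Mode   Pulse Interval  Pulse Width
GigabitEthernet1/0/0   AUTO   100             2
"""

statuses = parse_display_als(SAMPLE)
# Automatic restart is confirmed when at least one row exists and all report AUTO.
print(bool(statuses) and all(s.mode == "AUTO" for s in statuses))  # True
```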

    3. (c)

      EEE configuration

      By default, the device supplies constant power to each interface, so an idle interface consumes as much energy as a busy one. After EEE is enabled on an electrical interface, its power is adjusted dynamically according to the network traffic. When the interface is idle, the system reduces its power supply and the interface enters the low-power (sleep) mode, which lowers the overall energy consumption of the system. When the interface starts transmitting data normally again, normal power supply is restored. EEE can be configured only on electrical ports working at 100 Mbit/s or above. It is not supported on optical ports, on combo ports (optical/electrical multiplexed), or on electrical ports that negotiate a rate of 10 Mbit/s. EEE is disabled on electrical ports by default. To enable it, perform the following steps.

      1. (i)

        Execute the [system-view] command to enter the system view.

      2. (ii)

        Execute the command [interface interface-type interface-number] to enter the interface view.

      3. (iii)

        Execute the command [energy-efficient-ethernet enable] to enable the EEE of the electrical port.
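The applicability rules and the three commands above can be combined into a small pre-check, sketched below in Python. The `medium` labels and the interface name GigabitEthernet0/0/1 are hypothetical. The function only reproduces the restrictions stated in the text: electrical ports only, no optical or combo ports, and no 10 Mbit/s negotiation. It does not query the device.

```python
from __future__ import annotations


def eee_commands(interface: str, medium: str, negotiated_mbps: int) -> list[str]:
    """Return the CLI sequence that enables EEE, or raise ValueError.

    medium is a hypothetical label: "electrical", "optical" or "combo".
    """
    if medium != "electrical":
        # Optical ports and optical/electrical combo ports do not support EEE.
        raise ValueError(f"EEE is not supported on {medium} ports")
    if negotiated_mbps < 100:
        # Electrical ports negotiated at 10 Mbit/s do not support EEE.
        raise ValueError(f"EEE is not supported at {negotiated_mbps} Mbit/s")
    return [
        "system-view",
        f"interface {interface}",
        "energy-efficient-ethernet enable",
    ]


for cmd in eee_commands("GigabitEthernet0/0/1", "electrical", 1000):
    print(cmd)
```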

Fig. 6.1 Network topology of backup to the FTP server

Fig. 6.2 Network topology of backup to the TFTP server

Fig. 6.3 File transfer process in the TFTP server

Fig. 6.4 Topology of ALS configuration

6.1.2 Software Resource Management and Maintenance

In the process of operation and maintenance, personnel must manage not only hardware resources but also software resources. This includes license management, configuration of startup files, system software upgrades, software patch management, file management, interface management, and so on.

  1. 1.

    License management

    A license is a contractual authorization through which a supplier grants a customer the right to use a product within a specified scope and term. Through the license, the customer obtains the services the supplier has promised. After purchasing a device, the user can use its basic functions. When value-added features or additional resources are needed because of service expansion, the user must purchase licenses for those functions or resources. Because functions and resources are controlled by licenses, users can choose licenses flexibly as needed and use customized value-added features without purchasing additional devices, which effectively reduces their costs.

    By usage, licenses are divided into COMM (commercial) and DEMO types. Licenses purchased under a contract are normally COMM licenses. Most of them are permanent, but some have a fixed term. Temporary licenses used for special purposes such as testing and trials are DEMO licenses, which generally have a strict time limit.

    A license physically takes the form of a license authorization certificate and a license file. Licenses offer convenience, security, and disaster tolerance.

    Convenience means that installing a license is an uninterrupted process. It requires no device restart and does not affect other running services.

    Security comes from binding the license file to the equipment serial number (ESN), so each license file corresponds uniquely to one device. If the content of the license file is modified manually, the file becomes invalid immediately, which prevents the license from being stolen.

    Disaster tolerance applies in unexpected emergencies, such as earthquakes or rescue operations. In such cases, a license activated under the traditional license mechanism can be switched into the disaster-tolerant state. In this state, resource-based licenses no longer limit the corresponding dynamic resources. Instead, the maximum resources the product supports are opened up, so the product works at full capacity and services are maintained to the greatest extent.
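The security property described above can be illustrated with a keyed digest: a license is valid only on the device whose ESN it names, and it becomes invalid once edited. The Python sketch below uses HMAC-SHA256 purely as a stand-in, because the text does not say which cryptographic mechanism Huawei license files use. The key, the ESN values, and the license content are hypothetical.

```python
import hashlib
import hmac

# Hypothetical signing key; the real vendor mechanism is not described in the text.
VENDOR_KEY = b"example-vendor-key"


def sign_license(esn: str, content: str) -> str:
    """Return a tag that binds the license content to one device ESN."""
    message = f"{esn}\n{content}".encode("utf-8")
    return hmac.new(VENDOR_KEY, message, hashlib.sha256).hexdigest()


def license_is_valid(device_esn: str, content: str, tag: str) -> bool:
    """Valid only on the bound device and only if the content is unmodified."""
    expected = sign_license(device_esn, content)
    return hmac.compare_digest(expected, tag)


tag = sign_license("ESN-EXAMPLE-001", "feature-x=1")
print(license_is_valid("ESN-EXAMPLE-001", "feature-x=1", tag))  # True
print(license_is_valid("ESN-EXAMPLE-002", "feature-x=1", tag))  # False: other device
print(license_is_valid("ESN-EXAMPLE-001", "feature-x=9", tag))  # False: edited file
```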

    In license management, the following concepts should be noted.

    1. (a)

      License file

      The license file is an authorization file that controls the capacity, functions, and validity period of the software version. It is generated by a dedicated encryption tool according to the contract information and is generally distributed as an electronic document.

      A device can load only one license file. When the features or resources in the currently loaded license file are insufficient, they must be increased; this is called license expansion. Huawei's Electronic Software Delivery Platform (ESDP) automatically merges all license items of the same device into a final license file. The device then reloads the merged file to complete the expansion.

    2. (b)

      License authorization certificate

      The license authorization certificate, also called the license certificate, records the product name, authorization ID, customer name, and validity period of the license. It is sent to the customer by mail or delivered with the product, either on paper (A4 size) or on CD. Only COMM licenses come with a license authorization certificate.

    3. (c)

      ESN

      The ESN is a character string that uniquely identifies a device. It is the key to ensuring that a license is granted to the designated device, and is also known as the “device fingerprint”.

    4. (d)

      License serial number

      The license serial number (LSN) uniquely identifies the license file.

    5. (e)

      Revoke code

      The revoke code is a character string obtained by executing the revoke command on a network element (NE). After logging in to the license website, it serves as the credential for self-service ESN changes and adjustments. Once the revoke command is executed on the NE, the license file on it becomes invalid immediately.

    6. (f)

      Expired period

      A fixed-term license enters a trial period after its deadline. This trial period is called the expired period and is generally 60 days. During the expired period, the features in the license file continue to run normally. After it ends, they can no longer be used.
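The expired-period rule can be expressed as a date calculation. The Python sketch below classifies a fixed-term license on a given day. It assumes that the deadline day itself is still within the normal term. It uses the typical 60-day expired period from the text as a default, since the actual length can differ.

```python
from datetime import date, timedelta

EXPIRED_PERIOD_DAYS = 60  # "generally 60 days" according to the text


def license_state(deadline: date, today: date,
                  expired_period_days: int = EXPIRED_PERIOD_DAYS) -> str:
    """Return 'normal', 'expired-period' or 'unusable' for a fixed-term license."""
    if today <= deadline:
        return "normal"  # assumption: the deadline day is still in the term
    if today <= deadline + timedelta(days=expired_period_days):
        return "expired-period"  # features keep running normally
    return "unusable"  # features can no longer be used


deadline = date(2024, 1, 31)
print(license_state(deadline, date(2024, 3, 31)))  # expired-period (day 60)
print(license_state(deadline, date(2024, 4, 1)))   # unusable (day 61)
```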

    License management generally covers applying for, installing, viewing, uninstalling, upgrading or downgrading, and restoring licenses. The following examples illustrate the common operations.

    1. (a)

      Apply for the license.

      Licenses can be applied for as COMM licenses or as temporary licenses. Temporary licenses are intended for temporary tests, such as proof-of-concept (POC) tests during market expansion, brand exhibitions, and R&D tests before product launch. To apply for one, contact Huawei technical support. The following mainly describes how to apply for a COMM license.

      A COMM license can be applied for in two ways: authorization activation and password activation. With authorization activation, you enter query conditions (such as the contract number, order number, or authorization ID) to find the authorization, and then select it from the query results for activation. With password activation, you obtain the activation password from the license certificate and activate the license with it. Currently, only enterprise network users can use password activation.

      [Example 6.9]

      Application for COMM license

      The steps for applying for a COMM license through password activation are as follows.

      1. (i)

        Obtain the authorization ID or activation password from the license certificate. An example of a license certificate is shown in Fig. 6.5.

      2. (ii)

        Log in to the device and execute the command [display esn] in any view to obtain the ESN of the device.

      3. (iii)

        Log in to the ESDP website.

      4. (iv)

        Activate the license.

        • Select the option “License activation” -> “Password activation” in the left tree navigation bar, enter the authorization ID or activation password in the text box “Activation password”, check the “I have learned the above information” check box after confirmation, and click the “Next” button, as shown in Fig. 6.6.

        • Bind the ESN of the device. Enter the ESN directly, or select the added device (network element) to obtain the ESN, and click “Next”, as shown in Fig. 6.7.

        • Enter the activation confirmation interface and confirm the activation information, as shown in Fig. 6.8. If it is correct, click the “Confirm activation” button to enter the next step; otherwise, click the “Back” button to modify it.

        • After successful activation, enter the license download screen, as shown in Fig. 6.9, and click the “Download” button to download the license file locally.

    2. (b)

      Install the license.

      After the license application is successful, it needs to be installed on the device before it can be used. Taking router AR3260 as an example, the operation steps of license installation are as follows.

      [Example 6.10]

      License installation.

      After the application is completed, the license file can be downloaded to the local host. Assume that the license file is named “LICAR3200all_201404110L1Q50.dat” (note that license file names cannot contain spaces). The steps are as follows.

      1. (i)

        Upload the license file to the device by FTP or TFTP.

      2. (ii)

        Execute the command [license active file-name] to activate the license file and obtain the corresponding authorization. The command and execution results are as follows.

        <Huawei> license active LICAR3200all_201404110L1Q50.dat
        Info: The License is being activated. Please wait for a moment.
        GTL Verify License passed with minor errors on MASTER board:
         This item LAR0CM00 License File value more than maximum value.
         This item LAR0CT00 License File value more than maximum value.
        Warning: If this operation is performed, the trial license may replace the current license, and resources and functions in the current license may reduce. Continue? (y/n)[n]:y
        Info: Succeeded in activating the License file on the master board.
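When activation is scripted, the file-name rule from this example and the final Info line of the output can be checked automatically. The Python sketch below is a heuristic based only on the sample output above. Other software versions may word their messages differently, so treat the matched string as an assumption.

```python
def license_active_command(file_name: str) -> str:
    """Build the activation command; license file names must not contain spaces."""
    if not file_name or any(ch.isspace() for ch in file_name):
        raise ValueError(f"invalid license file name: {file_name!r}")
    return f"license active {file_name}"


def activation_succeeded(output: str) -> bool:
    """Heuristic: look for the success line shown in the sample output."""
    return "Succeeded in activating the License file" in output


print(license_active_command("LICAR3200all_201404110L1Q50.dat"))
```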

    3. (c)

      View the license.

      After installing the activated license file, execute the [display license] command to view the detailed information of the activated license file in the current system, including the name, storage path, status and revoke code of the license file.

      If you only need to check the license status of the master control board, you can execute the command [display license state]. The output information and description of this command are shown in Table 6.1.

      In addition, you can check the usage of the resource items defined in the license file by executing the command [display license resource usage]. The output information and description of this command are shown in Table 6.2.

    4. (d)

      Uninstall the license.

      For the redundant license files installed on the device, you can uninstall and delete them to save the storage space. The specific operation steps are as follows.

      1. (i)

        Execute the [license revoke] command in the user view to put the license to be uninstalled into the trial state.

      2. (ii)

        Upload and activate a new license file. For specific operation steps, please refer to the operations of installing license.

      3. (iii)

        Execute the [delete filename] command in the user view to delete the license file to be uninstalled, where “filename” is the name of the license file to be deleted.

    5. (e)

      Merge the license.

      During operation and maintenance, some devices may need to be taken out of service temporarily (for example, for maintenance). Their licenses can then be merged with the licenses of other devices, so that existing license resources are fully used and service capability is not affected. The steps are as follows.

      1. (i)

        Obtain the revoke codes of the disabled device and the target device.

        • Execute the [license revoke] command in the user view to change the current license into trial state and obtain the license’s revoke code.

        • Or execute the command [display license revoke-ticket] to obtain the license’s revoke code after changing the current license into trial state.

      2. (ii)

        Provide the revoke code to Huawei technical support personnel, who will perform the license merging operation.

  2. 2.

    System management

    System management refers to the management of device software, configuration files and system patches.

    The device software includes BootROM software and system software. After the device is powered on, it first runs the BootROM software, which initializes the hardware and displays the hardware parameters, and then runs the system software. The system software drives and adapts the hardware on the one hand and implements service features on the other. Both are necessary for starting and running the device and provide support, management, service, and other functions for the whole device.

    A configuration file is a collection of command lines. The user saves the current configuration in the configuration file, so that these configurations can continue to take effect after the device restarts. In addition, through the configuration file, the user can conveniently consult the configuration information, and can also upload the configuration file to other devices to realize batch configuration of the device.

    A patch is software that is compatible with the system software and is used to fix problems that must be solved urgently. During device operation, the system software sometimes needs to be modified for adaptation and debugging, for example, to correct defects or to optimize a specific function to meet service requirements. Patches are usually released as patch files. A patch file may contain one or more patches, each providing a different function. When the user loads a patch file from the storage medium into the memory patch area, each patch in the file is assigned a unique unit serial number in that area. This number is used to identify, manage, and operate the patch.
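The unit serial numbers can be pictured with a toy model of the memory patch area. In the Python sketch below, which is hypothetical and not the device's implementation, each patch loaded from a patch file receives the next unused number. That number is then the handle for looking the patch up. The starting value of 1 and the patch names are assumptions.

```python
from __future__ import annotations

from itertools import count


class PatchArea:
    """Toy model of the memory patch area (illustration only)."""

    def __init__(self) -> None:
        self._numbers = count(1)  # assumption: numbering starts at 1
        self.units: dict[int, str] = {}

    def load_patch_file(self, patches: list[str]) -> list[int]:
        """Load every patch of one patch file and return their unit numbers."""
        assigned = []
        for patch in patches:
            unit = next(self._numbers)  # unique within this patch area
            self.units[unit] = patch
            assigned.append(unit)
        return assigned


area = PatchArea()
print(area.load_patch_file(["fix-a", "fix-b"]))  # [1, 2]
print(area.load_patch_file(["fix-c"]))           # [3]
```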

    In the process of operation and maintenance, for security reasons, the operation and maintenance personnel need to back up the configuration files of network device. If some new features need to be deployed on the device, the operation and maintenance personnel also need to upgrade the version of system software or install new system patches. The following will introduce the methods of software upgrade, patch management and configuration file backup and recovery.

    1. (a)

      Software upgrade.

      During device operation, new features may need to be added or existing features optimized to meet user requirements. In that case, the current software version must be upgraded. An upgrade can optimize device performance, add new features, and fix problems in the outdated version.

      In order to ensure the smooth upgrade, the following preparations should be made before upgrading the software.

      1. (i)

        Prepare the required hardware resources, for example, free up storage space on the device for the supporting files of the new version.

      2. (ii)

        Confirm whether a new GTL license file is required. If so, obtain it through Huawei's official channels.

      3. (iii)

        Obtain the required upgrade software, namely the new system software (*.cc) and its supporting files, through Huawei's official channels.

      4. (iv)

        In the user view, execute the [display version] command to check the current software version. If the current version is the same as or later than the target version, no upgrade is needed.

      5. (v)

        Check the running status of the device through a series of commands.

        • In the user view, execute the command [display memory-usage] to check the memory usage rate of the master control board of the device, so as to ensure the normal operation of the master control board.

        • In the user view, execute the command [display health] and record the displayed information. If there is any problem that cannot be located during the upgrade, send the information to Huawei technical support engineers for fault location.

      6. (vi)

        Set up the upgrade environment using either the Web or the CLI. With the CLI, files can be transferred through FTP, TFTP, XModem, or other methods.

      7. (vii)

        Back up the important data in the storage medium of the equipment to be upgraded.

      8. (viii)

        Check the remaining space in the storage medium of the device to be upgraded to ensure that there is enough space to store the software and supporting files to be uploaded and upgraded.
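Steps (iv) and (viii) are simple comparisons that can be scripted once the values have been read from [display version] and [dir]. The Python sketch below assumes version strings in the V200R009C00SPC500 style and orders them field by field. That ordering rule and the 10% space reserve are assumptions, not Huawei rules. The example size of 181886978 bytes is the file size reported in the FTP transfer of Example 6.12.

```python
from __future__ import annotations

import re

# Assumed pattern: V<n>R<n>C<n> with an optional SPC<n> suffix.
_VERSION_RE = re.compile(r"V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?")


def version_key(version: str) -> tuple[int, int, int, int]:
    """Turn e.g. 'V200R009C00SPC500' into (200, 9, 0, 500) for comparison."""
    match = _VERSION_RE.search(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    v, r, c, spc = match.groups()
    return int(v), int(r), int(c), int(spc or 0)


def upgrade_needed(current: str, target: str) -> bool:
    """Step (iv): no upgrade if the current version is the same or later."""
    return version_key(current) < version_key(target)


def has_enough_space(free_bytes: int, file_sizes: list[int],
                     reserve_ratio: float = 0.1) -> bool:
    """Step (viii): all files must fit, plus a hypothetical reserve."""
    return free_bytes >= sum(file_sizes) * (1 + reserve_ratio)


print(upgrade_needed("V200R009C00SPC500", "V200R010C00SPC600"))  # True
print(has_enough_space(250_000_000, [181_886_978]))              # True
```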

      Taking the AR2220 as an example (its topology is shown in Fig. 6.10), the following describes how to upgrade the system software through the Web, FTP, and TFTP.

      [Example 6.11]

      System software upgrade (Web mode)

      In Web mode, the user logs in to the device through HTTP or HTTPS. The device acts as the server and provides a graphical interface through its built-in Web server, so the user can manage and maintain the device intuitively and conveniently. The steps for upgrading software in Web mode are as follows.

      1. (i)

        Log in to the device by Web (refer to the corresponding parts of Sects. 5.2 and 5.3 for details).

      2. (ii)

        Select “System management” -> “Upgrade maintenance”, and then “System software” tab to enter the interface of upgrade maintenance for system software, as shown in Fig. 6.11.

      3. (iii)

        Click the “Browse” button to select the system software to be uploaded, which is the new version of system software (*.cc) obtained from formal channels in the preparation stage.

      4. (iv)

        Click the “Load” button to upload the system software to the device, and designate the uploaded system software as the system software to be used when the device starts up next time.

      After restarting the device, the specified system software takes effect and the upgrade process is completed.

      [Example 6.12]

      System software upgrade (FTP mode)

      In FTP mode, the user logs in to the device by Telnet or STelnet, and the system software is transferred between the terminal and the device through FTP. The device can act as either the FTP client or the FTP server.

      1. (i)

        The device acts as the FTP client. The steps are as follows.

        Start an FTP server on the maintenance terminal (PC) and place the new system software (*.cc) obtained through official channels in the root directory of the FTP server. The IP address of the maintenance terminal, which is the FTP server, is 192.168.0.11, as shown in Fig. 6.12.

        Log in to the device by Telnet or STelnet (refer to the corresponding parts of Sects. 5.2 and 5.3 for details), and perform the following operations.

        • In the user view, execute the [ftp host [port-number]] command to log in to the FTP server on the PC, where “host” is the IP address of the maintenance terminal and “port-number” is the port of the FTP server. If the server uses the default port 21, this parameter can be omitted. After you enter the correct user name and password, you are logged in to the FTP server.

          <Huawei>ftp 192.168.0.11
          Trying 192.168.0.11 ...
          Press CTRL+K to abort
          Connected to 192.168.0.11.
          220 Welcome to Slyar FTPserver!
          User(192.168.0.11:(none)):user1
          331 Please specify the password.
          Enter password:
          230 Login successful.

        • Execute the [binary] command to set the file transfer mode to binary mode.

          [Huawei-ftp]binary

        • Execute the command [get remote-filename [local-filename]] to download the system file from the FTP server. Here, “remote-filename” is the name of the new system software file on the FTP server, and “local-filename” is the name under which the file is saved on the device. “local-filename” can be omitted if the name does not need to change.

          [Huawei-ftp]get ar2200new.cc
          200 Port command successful.
          150 Opening BINARY mode data connection for file transfer.
          1%_ 2%_ 3%_ 4%_ 5%_ 6%_ 7%_ 8%_ 9%_10%_11%_12%_13%_14%_15%_16%_17%_18%_19%_20%_
          21%_22%_23%_24%_25%_26%_27%_28%_29%_30%_31%_32%_33%_34%_35%_36%_37%_38%_39%_40%_
          41%_42%_43%_44%_45%_46%_47%_48%_49%_50%_51%_52%_53%_54%_55%_56%_57%_58%_59%_60%_
          61%_62%_63%_64%_65%_66%_67%_68%_69%_70%_71%_72%_73%_74%_75%_76%_77%_78%_79%_80%_
          81%_82%_83%_84%_85%_86%_87%_88%_89%_90%_91%_92%_93%_94%_95%_96%_97%_98%_99%_100%
          Transfer complete
          FTP: 181886978 byte(s) received in 297.370 second(s) 611.66Kbyte(s)/sec.

        • After downloading the system file successfully, execute the [bye] or [quit] command to terminate the connection with the server.

          [Huawei-ftp]bye

        • Execute the [dir] command in the user view to check that the new-version system file exists in the current storage directory of the router.

        • In the user view, execute the command [startup system-software filename] to set the system software to be loaded at the next startup, where “filename” is the file name of the new-version system software on the device.

          <Huawei>startup system-software ar2200new.cc

        • Execute the [reboot] command in the user view to restart the device. After that, the upgrade is completed.

          <Huawei>reboot

      2. (ii)

        The device acts as the FTP server. The steps are as follows.

        • Log in to the device by Telnet or STelnet (refer to the corresponding parts of Sects. 5.2 and 5.3 for details).

        • In the system view, execute the [ftp server enable] command to enable the FTP server.

          [Huawei]ftp server enable

        • Execute the [aaa] command to enter the AAA view.

          [Huawei]aaa

        • Create an FTP user with the user name “huawei” and the password “Huawei@123”, as follows.

        • Execute the command [local-user user-name password cipher password] to configure the local user name and password, where “user-name” and “password” are the user name and password chosen by the user.

        • Execute the command [local-user user-name privilege level level-number] to set the user level, where “user-name” is the user name created by the user, which is “huawei” here, and “level-number” is the number of user level, which can be set to 3 here.

        • Execute the command [local-user user-name service-type ftp] and configure the service type of local user to FTP, where “user-name” is the user name created by the user, which is “huawei” here.

          [Huawei-aaa]local-user huawei password cipher Huawei@123
          [Huawei-aaa]local-user huawei privilege level 3
          [Huawei-aaa]local-user huawei service-type ftp

        • Execute the command [local-user user-name ftp-directory directory] to configure the authorized directory of the FTP user. Here, “user-name” is the user created above (“huawei” here), and “directory” is the root directory of the FTP server on the device. Setting it to “flash:” makes the root directory of the flash card the FTP root directory.

          [Huawei-aaa]local-user huawei ftp-directory flash:

        • Execute the command [display ftp-server] to view the configuration information of the FTP server.

          [Huawei]display ftp-server
           FTP server is running
           Max user number                    5
           User count                         1
           Timeout value(in minute)           30
           Listening port                     21
           Acl number                         0
           FTP server's source address        0.0.0.0

        • Open the FTP client on the PC and log in to the FTP server on the device, as shown in Fig. 6.13. Execute the [binary] command to set the file transfer mode to binary. Then execute the [put local-filename] command to upload the new system software to the device, where “local-filename” is the storage path and file name of the new system software on the PC.

        • In the user view, execute the command [startup system-software filename] to set the system software to be loaded at the next startup, where “filename” is the file name of the new-version system software on the device.

          <Huawei>startup system-software ar2200new.cc

        • Execute the [reboot] command in the user view to restart the device. After that, the upgrade is completed.

          <Huawei>reboot
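On the PC side of procedure (ii), the upload can also be scripted instead of using an interactive FTP client. The Python sketch below uses the standard ftplib module. `storbinary` sends TYPE I (binary) before the STOR command, which corresponds to the [binary] step. The file is stored under its own base name in the FTP root directory configured above.

```python
from ftplib import FTP
from pathlib import Path


def upload_system_software(ftp: FTP, local_path: str) -> str:
    """Upload one *.cc image to the FTP root directory and return the reply.

    ftp must already be connected and logged in. storbinary switches the
    session to binary mode (TYPE I) before sending STOR.
    """
    remote_name = Path(local_path).name  # keep the same file name on the device
    with open(local_path, "rb") as image:
        return ftp.storbinary(f"STOR {remote_name}", image)
```

A possible call sequence on the PC is `ftp = FTP("192.168.0.1")`, `ftp.login("huawei", "Huawei@123")`, `upload_system_software(ftp, "ar2200new.cc")`, and `ftp.quit()`. Here 192.168.0.1 is a hypothetical placeholder for the device's address, and the credentials are those of the FTP user created above. The [startup system-software] and [reboot] steps are then performed on the device as described.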

      [Example 6.13]

      System software upgrade (TFTP mode)

      When TFTP is used to transfer files, the device can serve only as a TFTP client. The steps are as follows.

      1. (i)

        Open TFTP server software on the PC and set the TFTP server root directory as the directory where the new-version system software is located, as shown in Fig. 6.14.

      2. (ii)

        Log in to the device by Telnet or STelnet (refer to the corresponding parts of Sects. 5.2 and 5.3 for details).

      3. (iii)

        In the user view, execute the command [tftp tftp-server get source-filename [destination-filename]] to download the system file from the PC. Here, “tftp-server” is the IP address of the TFTP server, “source-filename” is the directory and file name of the new system software to be downloaded, and “destination-filename” is the name of the file saved on the device. “destination-filename” can be omitted if the name does not need to change.

        <Huawei>tftp 192.168.0.11 get ar2200new.cc

      4. (iv)

        In the user view, execute the command [startup system-software filename] to set the system software to be loaded at the next startup, where “filename” is the file name of the new-version system software on the device.

        <Huawei>startup system-software ar2200new.cc

      5. (v)

        Execute the [reboot] command in the user view to restart the device. After that, the upgrade is completed.

        <Huawei>reboot
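The [tftp ... get] command relies on TFTP (RFC 1350), which runs over UDP. The client sends a read request (RRQ) to port 69, and the server answers with DATA packets carrying 512-byte blocks numbered from 1. The client acknowledges each block with an ACK, and a block shorter than 512 bytes ends the transfer. The Python sketch below only builds and parses these packets to make the exchange concrete. It opens no sockets, assumes octet (binary) mode, and is not the device's implementation.

```python
from __future__ import annotations

import struct

OP_RRQ, OP_DATA, OP_ACK = 1, 3, 4  # TFTP opcodes from RFC 1350
BLOCK_SIZE = 512


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: opcode, file name, 0 byte, transfer mode, 0 byte."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def parse_data(packet: bytes) -> tuple[int, bytes]:
    """Split a DATA packet into (block number, payload)."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError(f"not a DATA packet (opcode {opcode})")
    return block, packet[4:]


def build_ack(block: int) -> bytes:
    """Acknowledge one DATA block."""
    return struct.pack("!HH", OP_ACK, block)


def is_last_block(payload: bytes) -> bool:
    """A payload shorter than 512 bytes marks the end of the file."""
    return len(payload) < BLOCK_SIZE


print(build_rrq("ar2200new.cc"))  # b'\x00\x01ar2200new.cc\x00octet\x00'
```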

    2. (b)

      Patch management

      Patch management includes installing and uninstalling system patches. Installing a patch upgrades the system without interrupting services. If a patch file does not need to take effect immediately, it can be specified to take effect at the next startup. Uninstalling patches deactivates patches that do not meet system requirements or deletes patch files that the system no longer needs. This releases memory space in the patch area of the master control board.

      1. (i)

        Install the patch.

        Because only one patch file can run in the system at a time, execute the [display patch-information] command before installing a patch to check all current patch information, including the running patch file. If a running patch file is displayed, delete it first to uninstall it.

        Before loading a patch, obtain the patch file from the Huawei support website and upload it to the device, as described in the software upgrade section. To install and activate the new patch file immediately, execute the [patch load patch-name all run] command in the user view, where “patch-name” is the file name of the new patch. To load the new patch file at the next startup instead, execute the [startup patch patch-name] command in the user view.

      2. (ii)

        Uninstall the patch.

        If the patch fails to meet the system requirements, or the storage space for the patches is insufficient, the user can uninstall the patch. In the user view, execute the [patch delete all] command to delete all patches in the system.
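The installation rules above can be summarized as a small command planner: only one patch file may run at a time, and a patch can be loaded either immediately or at the next startup. The Python sketch below is hypothetical. It assumes the names of running patch files have already been read from [display patch-information], and the patch file names are placeholders.

```python
from __future__ import annotations


def plan_patch_install(running_patch_files: list[str], new_patch: str,
                       activate_now: bool) -> list[str]:
    """Return the user-view commands for installing one patch file."""
    commands = []
    if running_patch_files:
        # Only one patch file can run at a time, so remove the current one first.
        commands.append("patch delete all")
    if activate_now:
        commands.append(f"patch load {new_patch} all run")  # install and activate now
    else:
        commands.append(f"startup patch {new_patch}")  # load at the next startup
    return commands


print(plan_patch_install(["old_patch.pat"], "new_patch.pat", activate_now=True))
# ['patch delete all', 'patch load new_patch.pat all run']
```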

    3. (c)

      Backup and recovery of configuration files

      In the process of device operation, abnormal operation may occur for various reasons, which may affect the service. In order to ensure quick repair of faults and service recovery, backup of configuration files is required during routine maintenance.

      For the device running normally, there are many ways to back up the configuration files, among which the following are common.

      1. (i)

        Copy the configuration directly from the screen. The user logs in to the device through the CLI, executes the [display current-configuration] command, copies all displayed information into a text file, and saves the text file on the hard disk of the maintenance terminal.

      2. (ii)

        Back up the configuration file to the flash memory or another storage medium. The following commands save the current configuration to a file in flash memory and then copy it to a backup file.

        <HUAWEI> save config.cfg
        <HUAWEI> copy config.cfg backup.cfg

      3. (iii)

        Back up via FTP or TFTP. The user can transfer the configuration file to the hard disk of the maintenance terminal through FTP or TFTP, in the same way as files are transferred during a software upgrade.
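As a rough sketch of the FTP method, the standard-library ftplib client can pull a configuration file from the device to the maintenance terminal. This assumes the FTP server function is enabled on the device and that the saved configuration file is named vrpcfg.zip (the name that appears in the flash:/ listing in Example 6.14); the address, credentials, and naming scheme are hypothetical.

```python
import ftplib
from datetime import datetime


def backup_name(device, source="vrpcfg.zip", when=None):
    """Build a timestamped local file name for a backup of source."""
    when = when or datetime.now()
    return f"{device}_{when:%Y%m%d_%H%M%S}_{source}"


def ftp_backup(ftp: ftplib.FTP, source="vrpcfg.zip", local_path=None):
    """Download one configuration file from the device over FTP.

    ftp must be a connected, logged-in ftplib.FTP object (or any object
    with the same retrbinary method).
    """
    local_path = local_path or source
    with open(local_path, "wb") as out:
        # RETR in binary mode copies the file byte for byte.
        ftp.retrbinary(f"RETR {source}", out.write)
    return local_path


# Usage (hypothetical address and credentials):
#   with ftplib.FTP("192.168.1.1") as ftp:
#       ftp.login("admin", "password")
#       ftp_backup(ftp, local_path=backup_name("AR3260"))
```

Timestamping the local file name keeps successive backups from overwriting each other, so an older known-good configuration remains available for recovery.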

      If the device fails, transfer the previously backed-up configuration file to the device, execute the [startup saved-configuration] command to specify that file as the configuration file for the next startup, and then execute the [reboot] command to restart the device. The device then starts with the backed-up configuration, which restores the configuration and repairs the failure.

Fig. 6.5 An example of a License certificate

Fig. 6.6 Password activation: enter the authorization ID or activation password

Fig. 6.7 Password activation: enter the ESN of the device

Fig. 6.8 Password activation: confirm activation

Fig. 6.9 Password activation: download the license

Table 6.1 Output information and description of the command [display license state]

Table 6.2 Output information and description of the command [display license resource usage]

Fig. 6.10 Software upgrade topology

Fig. 6.11 Interface of upgrade maintenance for system software

Fig. 6.12 Open the FTP Server on PC

Fig. 6.13 Upload the system software with the PC as the FTP client

Fig. 6.14 Set the TFTP server root directory

6.2 Routine Maintenance and Troubleshooting

Network maintenance refers to the coordinated management and technical actions taken to keep the network system running normally and to improve its stability and security. It mainly includes routine maintenance and troubleshooting: routine maintenance is the regular inspection and maintenance performed while the network is running normally, to eliminate hidden risks in equipment operation, while troubleshooting is the emergency handling performed when the network fails.

6.2.1 Maintenance Overview

In network maintenance, operation and maintenance personnel must be familiar with the use of routers, switches, and other network equipment, which constitute the core components of the network, and understand their performance, data configuration, commands, and functions. They also need to track the configuration of the main network equipment and changes to the corresponding parameters, and take appropriate technical measures to repair faults and restore services promptly when faults occur.

Operation and maintenance personnel must observe the following precautions when carrying out network maintenance.

  1. 1.

    When a fault occurs, evaluate whether it is an emergency fault. If it is, use the pre-established emergency troubleshooting method to recover the faulty module as soon as possible, and then restore the service.

  2. 2.

    Strictly abide by the operation regulations and industry safety regulations to ensure personal safety and equipment safety.

  3. 3.

    When replacing or maintaining equipment parts, take ESD precautions and wear an ESD wrist strap.

  4. 4.

    During troubleshooting, record all original information about any problem encountered in detail.

  5. 5.

    All major operations, such as restarting equipment and erasing the database, shall be recorded, and their feasibility shall be carefully confirmed beforehand. They may be performed by qualified operators only after the corresponding backup, emergency, and safety measures have been taken.

6.2.2 Routine Maintenance

The stable operation of a network system depends on sound network planning on the one hand and, on the other, on discovering and eliminating hidden risks in equipment operation through routine maintenance. Routine maintenance of a network system mainly includes equipment environment checks, basic equipment information checks, equipment running status checks, interface content checks, and service checks.

A normal operating environment is the prerequisite for normal equipment operation. During routine maintenance, the temperature, humidity, air conditioning status, and power supply status of the equipment room should be checked regularly. The checklist for the equipment operation environment is shown in Table 6.3.

Table 6.3 Checklist for equipment operation environment

The basic information check mainly verifies that the running software version, patch information, and system time of the equipment are correct. The checklist for basic equipment information is shown in Table 6.4.

Table 6.4 Checklist for basic information of equipment

During equipment operation, its running status, such as board status, equipment reset status, and equipment temperature, also needs to be checked. The checklist for equipment operation status is shown in Table 6.5.

Table 6.5 Checklist for equipment operation status

During routine maintenance, the interface contents and some basic services of network equipment also need to be checked. Common interface items include interface configuration and interface status, while service checks verify that services such as IP, multicast, and routing work normally. The items of the interface content check and IP service check are shown in Table 6.6; the recommended maintenance cycle is one week.

Table 6.6 Items of interface content check and IP service check
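To keep the periodic checks above on schedule, a small helper can report which checks are due. This is a hypothetical sketch: only the one-week cycle for interface content and IP service checks comes from the text; the other cycle lengths and all dictionary keys are placeholders to be replaced with the site's own maintenance plan.

```python
from datetime import date, timedelta

# Maintenance cycles in days. Only the 7-day interface/IP service cycle is
# taken from the text; the other values are hypothetical placeholders.
CYCLES = {
    "environment": 1,               # hypothetical
    "basic_information": 30,        # hypothetical
    "operation_status": 7,          # hypothetical
    "interface_and_ip_service": 7,  # recommended cycle: 1 week
}


def due_checks(last_run, today, cycles=CYCLES):
    """Return the sorted names of checks whose cycle has elapsed.

    last_run maps a check name to the date it was last performed; a check
    that has never been run is always due.
    """
    due = []
    for name, days in cycles.items():
        last = last_run.get(name)
        if last is None or today - last >= timedelta(days=days):
            due.append(name)
    return sorted(due)


print(due_checks({"interface_and_ip_service": date(2020, 2, 5)}, date(2020, 2, 12)))
```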

6.2.3 Troubleshooting

Maintaining the network correctly to prevent failures, and locating and eliminating problems quickly and accurately when failures do occur, is a challenge for network maintenance and management personnel. It requires not only a deep understanding of network protocols and technologies, but also a systematic troubleshooting approach applied rationally in practice, so that a complex problem can be isolated, decomposed, or narrowed down and network faults repaired in time.

  1. 1.

    Basic steps of troubleshooting

    The basic steps of network troubleshooting are observing the symptoms, collecting information, analyzing and judging, and identifying the cause. The basic idea is to systematically narrow down or isolate all possible causes into several small subsets, so that the complexity of the problem decreases rapidly. Generally, the troubleshooting process can be divided into three stages: fault information collection, fault location and diagnosis, and fault repair. The operations required in each stage are introduced below.

    1. (a)

      Fault information collection stage

      When a service fails, first collect the failure-related information, which includes the following.

      1. (i)

        The time of the failure; the network topology around the failure point (the upstream and downstream equipment connected to the failed equipment, and its network location); the operation that caused the failure; the measures taken after the failure and their results; and the failure symptoms and scope of affected services (the ports and services that became abnormal due to the failure).

      2. (ii)

        Name, version, current configuration and interface information of the failed equipment.

      3. (iii)

        Log information generated in case of failure.

      Failure information is generally obtained in two ways: through [display] commands, and by viewing the equipment logs and alarm information. The [display] commands are important tools for network maintenance and troubleshooting: they show the current status of the equipment, detect neighboring equipment, monitor the network as a whole, and help locate network faults. The device provides a number of [display] commands for checking the status of hardware, interfaces, and software; the commonly used ones are listed in Table 6.7. Analyzing this status information helps locate network faults.

      In addition, the command [display diagnostic-information [file-name]] collects the diagnostic information of the device in one step, including its startup configuration, current configuration, interfaces, time, and system version. If the “file-name” parameter is not specified, the diagnostic information is displayed on the terminal; if it is specified, the information is saved directly to the specified TXT file, which is the recommended approach. By default, the TXT file is saved in flash:/, and executing the [dir] command in the user view confirms whether the file was generated correctly.

      [Example 6.14]

      Get diagnostic information in one step and output it as a TXT file

      The following example obtains the diagnostic information of the device in one step and saves it to a TXT file. The commands and their output are as follows.

      <Huawei>display diagnostic-information t0212.txt
      This operation will take several minutes, please wait....
      ..................
      Info: The diagnostic information was saved to the device successfully.
      <Huawei>dir
      Directory of flash:/

        Idx  Attr     Size(Byte)  Date        Time(LMT)  FileName
          0  drw-              -  Feb 12 2020 05:31:54   dhcp
          1  -rw-        121,802  May 26 2014 09:20:58   portalpage.zip
          2  -rw-          2,263  Feb 12 2020 05:31:49   statemach.efs
          3  -rw-        828,482  May 26 2014 09:20:58   sslvpn.zip
          4  -rw-        135,168  Feb 12 2020 07:39:14   t0212.txt
          5  -rw-            724  Feb 12 2020 05:31:47   vrpcfg.zip

      1,090,732 KB total (784,328 KB free)
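When [dir] output is captured from a terminal session, the check that the diagnostic file was generated can be automated. The parser below is a sketch whose row pattern is inferred from the listing in Example 6.14 only; other device models or versions may format the listing differently.

```python
import re

# One file row of [dir] output, e.g.
#   "4  -rw-  135,168  Feb 12 2020 07:39:14   t0212.txt"
ROW = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<attr>\S+)\s+(?P<size>[\d,]+|-)\s+"
    r"(?P<date>\w{3} \d{1,2} \d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<name>\S+)\s*$"
)


def parse_dir(output):
    """Return {file name: size in bytes, or None for directories}."""
    files = {}
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            size = m.group("size")
            files[m.group("name")] = None if size == "-" else int(size.replace(",", ""))
    return files


def diagnostic_file_ok(output, name):
    """True if the named file is listed with a non-zero size."""
    return bool(parse_dir(output).get(name))
```

Requiring a non-zero size, rather than mere presence, also catches a file that was created but left empty.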

      Fault information can also be obtained by viewing the logs and alarm information of the device. When the device fails, the system automatically generates system logs and alarms. Collecting and analyzing this information helps the user understand what happened during device operation and locate the failure point. The steps for obtaining the log and alarm information in the log files are as follows.

      1. (i)

        In the user view, execute the [save logfile] command to manually save the information in the log file buffer to the log file.

      2. (ii)

        Transfer all files in the directories “flash:/syslogfile/” (“flash:/logfile/” for V200R005C00 and later versions) and “flash:/resetinfo/” to the terminal by FTP or TFTP.
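Because the log directory depends on the software version, a collection script must pick the right one. A minimal sketch, assuming the version string follows the VxxxRxxxCxx pattern shown in this chapter; run [save logfile] on the device before transferring the files.

```python
import re


def log_directories(version):
    """Return the flash directories to transfer for a VRP version string.

    version can be a bare string such as "V200R005C00" or a longer line
    that contains one, such as "AR3200 V200R003C01SPC300".
    """
    m = re.search(r"V(\d{3})R(\d{3})C(\d{2})", version, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    v, r, c = (int(x) for x in m.groups())
    # V200R005C00 and later keep logs in flash:/logfile/; earlier versions
    # use flash:/syslogfile/. Reset information is always in flash:/resetinfo/.
    log_dir = "flash:/logfile/" if (v, r, c) >= (200, 5, 0) else "flash:/syslogfile/"
    return [log_dir, "flash:/resetinfo/"]


print(log_directories("AR3200 V200R003C01SPC300"))
```

Comparing (V, R, C) as a tuple orders versions numerically field by field, so V200R010C00 correctly sorts after V200R005C00.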

    2. (b)

      Fault location and diagnosis stage

      The purpose of fault location is to identify the cause of the fault; it is the core activity of troubleshooting and depends on the fault information collected previously. The more completely and accurately that information is collected, the more accurately and quickly the fault can be located.

      Network failures have many causes. For a newly configured network, the causes may include the following.

      1. (i)

        Incorrect or incomplete configuration;

      2. (ii)

        Overly strict access rules in the configuration;

      3. (iii)

        Equipment/protocol compatibility issues.

      For faults on a live network, the common causes are as follows.

      1. (i)

        Equipment changes, such as configuration modification, version upgrade, and board addition and deletion;

      2. (ii)

        Link failures in the network, or configuration changes on peripheral equipment;

      3. (iii)

        Abnormal traffic, such as a sudden burst of high traffic;

      4. (iv)

        Hardware failure.

      When a network fault occurs, network management and maintenance personnel can analyze and locate its likely causes based on the information collected in the fault information collection stage, using appropriate network diagnostic tools. This lays a solid foundation for repairing the fault.

    3. (c)

      Fault repair stage

      The purpose of fault repair is to eliminate the fault symptoms and restore normal network operation without causing other faults. Generally, the following three steps should be followed.

      1. (i)

        List the possible causes based on the collected fault symptoms. This step usually requires fault handlers with strong technical skills and experience.

      2. (ii)

        Develop a troubleshooting plan. When formulating the plan, operation and maintenance personnel should consider various factors according to the network conditions and fault severity, including prioritizing the troubleshooting steps, selecting troubleshooting methods and tools, estimating the troubleshooting time, and deciding what to do once the fault cause is identified.

      3. (iii)

        Troubleshoot according to the plan formulated in step (ii). Before moving on to the next scheme, restore the network to the state it was in before the previous scheme was implemented. If the changes made by the previous scheme are left in place, they may interfere with locating the fault cause and may even introduce new faults.
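The rule in step (iii), try one scheme at a time and undo it before trying the next, can be expressed as a short loop. This is an abstract sketch: the Scheme structure and its callables are hypothetical stand-ins for whatever manual or scripted actions the plan contains.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scheme:
    """One troubleshooting scheme from the plan (hypothetical structure)."""
    name: str
    apply: Callable[[], None]          # make the change on the network
    rollback: Callable[[], None]       # undo exactly that change
    fault_cleared: Callable[[], bool]  # re-test the fault symptom


def run_plan(schemes: List[Scheme]) -> Optional[str]:
    """Try schemes in priority order; undo each one that does not clear the fault.

    Returns the name of the scheme that cleared the fault, or None.
    """
    for scheme in schemes:
        scheme.apply()
        if scheme.fault_cleared():
            return scheme.name  # keep this change: it removed the fault
        # Restore the previous state before the next scheme so that leftover
        # changes neither mask the real cause nor introduce new faults.
        scheme.rollback()
    return None
```

Because every failed scheme is rolled back, the scheme that finally clears the fault is the only change left on the network, which points directly at the fault cause.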

  2. 2.

    Common fault cases

    1. (a)

      Power module failure

      Power module failures generally take two forms. In the first, the device cannot be powered on, and neither the system indicator nor the power indicator lights up. In the second, the power indicator stays red.

      If neither the “SYS” indicator nor the power indicator of the device is on, the cause may be one of the following three.

      1. (i)

        The power switch of the device is not turned on.

      2. (ii)

        The power cable of the device is not firmly inserted.

      3. (iii)

        The power supply module of the device is faulty; this may be a pluggable power supply module, an external power adapter, or a built-in power supply module.

      The corresponding handling consists of the following four steps.

      1. (i)

        Confirm whether the power switch of the device is turned on.

      2. (ii)

        Confirm whether the power cable of the device is firmly plugged in.

      3. (iii)

        Confirm whether the power module of the device is faulty. For a pluggable power module, replace it with a known-good pluggable power module; if the device then powers up normally, the original module is faulty. For an external power adapter, replace it with a known-good adapter; if the device then powers up normally, the original adapter is faulty. In either case, collect the fault information and contact technical support to replace the faulty part.

      4. (iv)

        If the device still cannot be powered on after the above three steps, the device itself is faulty. Collect the fault information and contact technical support to replace the device.

      If the power indicator stays red, the cause may be one of the following three.

      1. (i)

        The power module of the device is not firmly plugged in.

      2. (ii)

        The pluggable power module of the device is faulty.

      3. (iii)

        The external power supply module of the device is faulty.

      The following three steps can be taken to repair the fault, corresponding to the three causes above.

      1. (i)

        Reinsert the power module firmly.

      2. (ii)

        Replace the pluggable power module of the equipment.

      3. (iii)

        Replace the external power module of the equipment.
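The two power fault patterns above can be summarized as a lookup from indicator observations to ordered handling steps. This is a simplified sketch that models each indicator as a boolean, which is an assumption; the step texts paraphrase the lists above.

```python
def power_fault_actions(sys_led_on, power_led_on, power_led_red):
    """Map indicator observations to the ordered handling steps in the text."""
    if not sys_led_on and not power_led_on:
        # Device cannot be powered on at all.
        return [
            "Check that the power switch is turned on",
            "Check that the power cable is firmly plugged in",
            "Swap in a known-good power module or power adapter",
            "Collect fault information; contact technical support to replace the device",
        ]
    if power_led_red:
        # Power indicator stays red.
        return [
            "Reinsert the power module firmly",
            "Replace the pluggable power module",
            "Replace the external power module",
        ]
    return []  # neither symptom described in this section


for step in power_fault_actions(sys_led_on=True, power_led_on=True, power_led_red=True):
    print(step)
```

The steps are ordered from cheapest to most disruptive, so each one is tried only after the simpler explanations have been ruled out.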

    2. (b)

      Fan module failure

      Common symptoms of a fan module fault are that the fan runs at full speed, makes loud noise, and the “STATUS” indicator blinks red. There are four possible causes.

      1. (i)

        The fan module is not fully inserted into the fan slot.

      2. (ii)

        The fan blades are blocked by foreign matter, which stalls the fan.

      3. (iii)

        The fan software is not upgraded to the latest version.

      4. (iv)

        The fan module is faulty.

      Generally, the following steps can be taken to deal with fan module failures.

      1. (i)

        Remove and reinsert the fan module to ensure that it is securely connected to the equipment backplane, and tighten any loose screws on the fan module panel.

      2. (ii)

        Pull out the fan module, remove any foreign matter blocking the fan blades, and reinsert the fan module.

      3. (iii)

        Check whether the device software version that controls the fan is outdated. If so, upgrade the fan software.

      4. (iv)

        Replace the fan module with a known-good one of the same model. If the fault disappears, the original fan module is faulty and should be replaced with a new one.

      [Example 6.15]

      Fan software version upgrade

      One possible cause of a fan module fault is an outdated fan software version. Taking the AR3260 as an example, the steps for upgrading the fan software are as follows.

      1. (i)

        When the fan runs at full speed, execute the [display fan] command to check the status of the fan module. If the “Speed” field shows “NA”, the fan is abnormal.

        <Huawei> display fan
        FanId  FanNum  Present  Register  Speed  Mode
        16     [1-3]   YES      YES       NA     MANUAL

      2. (ii)

        Reinsert the fan module, and execute the [display version] command to check the software version of the device. If the version is older than V200R003C01SPC300, the fan software needs to be upgraded.

        <Huawei> display version
        Huawei Versatile Routing Platform Software
        VRP (R) software, Version 5.120 (AR3200 V200R003C01SPC300)
        Copyright (C) 2011-2013 HUAWEI TECH CO., LTD
        Huawei AR3260 Router uptime is 1 week, 5 days, 2 hours, 40 minutes
        BKP 0 version information:
        1. PCB Version : AR01BAK3A VER.B
        2. If Supporting PoE : No
        3. Board Type : AR3260
        4. MPU Slot Quantity : 2
        5. LPU Slot Quantity : 10

      3. (iii)

        Collect fault information and contact technical support to obtain the corresponding software version.

      4. (iv)

        Referring to the software upgrade steps in Sect. 6.1.2, upload the software package to the storage medium of the device by FTP or TFTP.

      5. (v)

        In the diagnostic view, execute the [upgrade fan-software startup] command to upgrade the fan software.

        <Huawei> system-view
        Enter system view, return user view with Ctrl+Z
        [Huawei] diagnose
        Now you enter a diagnostic command view for developer's testing,some commands may affect operation by wrong use,please carefully use it with HUAWEI engineer's direction
        [Huawei-diagnose] upgrade fan-software startup
        Info: Now Loading the upgrade file to fan-board, please wait a moment
        Info: Upgrade the fan-board successfully.The new version is 108, while the old version is 103

      6. (vi)

        If the fan module is removed or inserted during the fan software upgrade, or the upgrade fails, the following messages appear. In this case, reinsert the fan module and return to step (v) to upgrade the fan software again.

        [Huawei-diagnose] upgrade fan-software startup
        Info: Now Loading the upgrade file to fan-board, please wait a moment
        Load app get response fail! Index = 0xaa
        Load Tx fail!
        Error: Load the upgrade file to fan-board fail
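Steps (i) and (ii) of this example can be checked from captured command output. The sketch below assumes the column layout shown for [display fan] above and treats the SPC field as optional when comparing versions; it implements the "older than V200R003C01SPC300" rule exactly as stated in step (ii).

```python
import re

# VRP version strings such as "V200R003C01SPC300"; the SPC part is optional.
VERSION = re.compile(r"V(\d+)R(\d+)C(\d+)(?:SPC(\d+))?")


def abnormal_fans(display_fan_output):
    """Return the FanId of every row whose Speed column reads NA."""
    rows = [line.split() for line in display_fan_output.splitlines() if line.strip()]
    header = next((r for r in rows if "FanId" in r and "Speed" in r), None)
    if header is None:
        return []
    id_col, speed_col = header.index("FanId"), header.index("Speed")
    return [r[id_col] for r in rows
            if r is not header and len(r) == len(header) and r[speed_col] == "NA"]


def version_key(text):
    """Turn the first VRP version string found in text into a comparable tuple."""
    m = VERSION.search(text)
    if m is None:
        raise ValueError("no VRP version string found")
    v, r, c, spc = m.groups()
    return (int(v), int(r), int(c), int(spc or 0))


def fan_upgrade_needed(display_version_output, baseline="V200R003C01SPC300"):
    """True if the device version is older than the baseline in step (ii)."""
    return version_key(display_version_output) < version_key(baseline)
```

Rows whose column count differs from the header, such as the echoed command line, are skipped, which keeps the parser from misreading prompts as data.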

    3. (c)

      Board failure

      During device operation, a board may fail to power on or register, or may reset abnormally.

      A board may fail to power on for two possible reasons.

      1. (i)

        The board is not firmly inserted.

      2. (ii)

        The software version is incompatible with the board.

      Generally, the following steps can be taken to handle this failure.

      1. (i)

        Check whether the board is firmly inserted.

      2. (ii)

        Execute the [display version] command to check the software version.

      3. (iii)

        Submit the version information displayed in step (ii) to technical support to check whether the board supports this software version.

      During a system software upgrade, a board that previously registered normally may fail to register. If the [display device] command is executed, the “Register” status of the board shows “Unregistered”, meaning that registration failed. Such failures generally have two causes.

      1. (i)

        The board is not plugged firmly.

      2. (ii)

        During a device software upgrade, the system software is upgraded before the board software. If the device loses power while the board software is being upgraded after the system software upgrade, the board software will be corrupted.

      The corresponding troubleshooting steps are as follows.

      1. (i)

        Reinsert the board and check whether any pin of the backplane connector in the device chassis is bent. If so, straighten the bent pin before reinserting the board, so that the board is reliably connected to the backplane.

      2. (ii)

        Collect fault information and contact technical support to restore the corresponding software version.

      In addition, a board may reset abnormally during device operation. There are generally four causes.

      1. (i)

        The system power supply is not connected reliably.

      2. (ii)

        The board is not firmly plugged in the device backplane.

      3. (iii)

        The power grid voltage is unstable.

      4. (iv)

        Lightning strikes during a thunderstorm.

      To repair abnormal board resets, take the following steps.

      1. (i)

        Turn off the power switch of the device, reconnect the power cable and power module, and power the device on again.

      2. (ii)

        Reinsert the board so that it is reliably connected to the device backplane.

      3. (iii)

        Check whether incandescent lamps nearby flicker to judge whether the grid voltage is stable. If the voltage is unstable, use a voltage stabilizer or an uninterruptible power supply (UPS).

      4. (iv)

        Connect the grounding point of the device to the indoor equipotential bonding terminal to effectively reduce the risk of abnormal board resets caused by thunderstorms.

    4. (d)

      Port failure

      A port failure generally means that the port cannot go “UP” and the indicator of the corresponding port on the device is abnormal. Common ports include Ethernet ports, optical ports, and E1 interfaces. Taking Ethernet and optical ports as examples, the troubleshooting steps for a port that cannot go “UP” are introduced below.

      When an Ethernet port cannot go “UP”, the port indicator is off and neither the physical layer nor the protocol layer of the port is “UP”. The possible causes include the following four.

      1. (i)

        There is a problem with the network cable.

      2. (ii)

        There is a problem with the configuration of the network port.

      3. (iii)

        There is a problem with the auto-negotiation compatibility.

      4. (iv)

        The board is faulty.

      The corresponding troubleshooting steps are as follows.

      1. (i)

        Replace the network cable with a known-good one.

      2. (ii)

        Check whether the configuration parameters (port speed, duplex mode, auto-negotiation, etc.) of the devices at both ends of the network cable are consistent; if not, modify them to be consistent.

      3. (iii)

        If the parameters at both ends are consistent and both ends use auto-negotiation, but the fault persists, try setting the ports at both ends to non-auto-negotiation (forced) mode, because auto-negotiation may fail between some non-Huawei devices and Huawei devices.

      4. (iv)

        Connect the port to another port on the same board with a known-good network cable to perform a loopback test. If the loopback test succeeds, the problem may lie with the peer device; otherwise, switch to another port for the next test.

      5. (v)

        If the port still fails to go “UP” after the port replacement and the loopback test, the board is judged to be faulty.

      6. (vi)

        Test with a port on another board. If the fault is cleared, replace the original faulty board; otherwise, collect fault information and contact technical support.
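Steps (ii) to (iv) above form a small decision procedure. The sketch below models each end's settings as a simple record; the field names and action strings are hypothetical, and it deliberately simplifies real behavior (for example, it compares the configured speed even when both ends auto-negotiate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSettings:
    speed_mbps: int         # e.g. 100 or 1000
    duplex: str             # "full" or "half"
    auto_negotiation: bool


def mismatches(local, remote):
    """Names of settings that differ between the two ends (step (ii))."""
    fields = ("speed_mbps", "duplex", "auto_negotiation")
    return [f for f in fields if getattr(local, f) != getattr(remote, f)]


def next_action(local, remote, port_up):
    """Suggest the next step for an Ethernet port that will not go UP."""
    if port_up:
        return "none"
    diff = mismatches(local, remote)
    if diff:
        return "make consistent: " + ", ".join(diff)        # step (ii)
    if local.auto_negotiation:
        return "force speed and duplex on both ends"        # step (iii)
    return "loopback test on the same board"                # step (iv)


a = PortSettings(1000, "full", True)
print(next_action(a, PortSettings(100, "full", True), port_up=False))
```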

      If an optical port fails to go “UP” after the optical fiber is connected, the “LINK” indicator of the optical port is generally off. Possible causes include the following four.

      1. (i)

        There is a problem with the optical fiber.

      2. (ii)

        There is a problem with the optical module.

      3. (iii)

        An inappropriate optical attenuation is used.

      4. (iv)

        For an interface that multiplexes an electrical port and an optical port, the interface may not be configured to work as an optical port.

      For such failure, the general handling steps are as follows.

      1. (i)

        Use a known-good optical fiber or optical module to verify whether the optical fiber or optical module is faulty.

      2. (ii)

        Confirm whether the optical module used by the port is Huawei-certified.

      3. (iii)

        Confirm whether the rate of the optical module matches that of the optical port.

      4. (iv)

        Confirm whether the working wavelength of the optical module matches that of the optical module at the opposite end.

      5. (v)

        Confirm whether the actual transmission distance matches the nominal transmission distance of the optical module.

      6. (vi)

        For an interface that multiplexes an electrical port and an optical port, execute the [display this] command in the interface view to check whether the interface is configured as an optical port.

      7. (vii)

        Execute the [display transceiver verbose] command to check the optical module information and any alarms, and take actions accordingly. For example, if the received optical power is reported as too high, appropriately increase the optical attenuation on the receive side.

      8. (viii)

        If the fault persists after the above steps, collect fault information and contact technical support.
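The module checks in steps (ii) to (v) can be collected into one function that lists every problem found. This is a sketch with a hypothetical OpticalModule record; the values would be read from module labels or [display transceiver verbose]. It assumes ordinary (non-BiDi) modules, for which both ends use the same wavelength, and only flags links longer than the nominal distance; a link much shorter than nominal shows up instead as excessive received power, which step (vii) handles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalModule:
    rate_gbps: float
    wavelength_nm: int
    nominal_distance_km: float
    huawei_certified: bool


def optical_checks(local, remote, port_rate_gbps, link_distance_km):
    """Return the problems found for the local module, in step order."""
    problems = []
    if not local.huawei_certified:
        problems.append("module is not Huawei-certified")                 # step (ii)
    if local.rate_gbps != port_rate_gbps:
        problems.append("module rate does not match port rate")           # step (iii)
    if local.wavelength_nm != remote.wavelength_nm:
        problems.append("wavelength differs from the opposite module")    # step (iv)
    if link_distance_km > local.nominal_distance_km:
        problems.append("link longer than the module's nominal distance") # step (v)
    return problems


sr = OpticalModule(10, 1310, 10, True)
print(optical_checks(sr, sr, port_rate_gbps=10, link_distance_km=2))
```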

    5. (e)

      Storage failure

      Common storage failures include memory usage alarms and failures to use an SD card or USB flash drive.

      The memory usage rate is the proportion of total memory occupied by programs and is an important indicator of equipment performance. By default, an alarm is generated when the memory usage rate exceeds 95%. If the usage rate then continues to rise, the system eventually resets automatically, interrupting services. During equipment operation, some applications may occupy memory for a long time without releasing it, so memory usage accumulates until system memory is exhausted. This failure is called a memory leak.

      In case of a memory leak, collect the total memory usage rate of the device, the size of Zone 2, the specified block, the memory usage of each PID, and that of the specified PID, and deliver this information to technical support.

      Another common storage failure is the inability to read from or write to an SD card or USB flash drive, which may be caused by damage to or poor contact of the card or drive. Such a failure can usually be repaired by replacing or reinserting the SD card or USB flash drive. If the failure persists, collect fault information and contact technical support.
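The usage rate and the 95% default alarm threshold can be applied to periodic memory readings. A minimal sketch, assuming readings are taken at regular intervals in the same unit as the total; the "possible leak" flag (usage rising at every sample across the window) is a heuristic assumption, not a device rule.

```python
ALARM_THRESHOLD = 95.0  # default alarm threshold from the text, in percent


def usage_rate(used, total):
    """Memory usage rate in percent: memory occupied by programs / total memory."""
    return 100.0 * used / total


def check_memory(samples, total):
    """samples: non-empty list of memory-in-use readings, oldest first."""
    rates = [usage_rate(u, total) for u in samples]
    alarm = rates[-1] > ALARM_THRESHOLD
    # Heuristic: usage that rises at every sample may indicate a memory leak.
    rising = len(rates) >= 3 and all(b > a for a, b in zip(rates, rates[1:]))
    return {"rate": round(rates[-1], 1), "alarm": alarm, "possible_leak": rising}


print(check_memory([500, 600, 700], total=1000))
```

A steadily rising trend well below 95% is worth reporting early, since the text notes that a leak eventually exhausts memory and resets the system.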

Table 6.7 Commonly used [display] commands

6.3 Summary

This chapter introduced network system resource management and maintenance. Resource management covers hardware and software resources, while maintenance covers routine maintenance and troubleshooting.

Through this chapter, readers should understand the main aspects of network resource management and network system maintenance, master common resource management methods and procedures, become familiar with the routine maintenance of network systems, and acquire the basic ability to handle failures.

6.4 Exercise

  1. 1.

    On a chassis-type (frame) device, the board in slot 1 can be reset by executing the () command in the user view.

    1. A.

      reset system

    2. B.

      reset slot 1

    3. C.

      reset slot 2

    4. D.

      reboot

  2. 2.

    [Multiple choices] The energy-saving management technologies supported by Huawei network equipment include ().

    1. A.

      EEE

    2. B.

      Automatic fan speed regulation

    3. C.

      Automatic laser turn-off

    4. D.

      Frequency conversion

  3. 3.

    [Multiple choices] License can be divided into () by purpose.

    1. A.

      COMM

    2. B.

      DEMO

    3. C.

      Temporary License

    4. D.

      Permanent License

  4. 4.

    [Multiple choices] Electronic labels can be backed up by ().

    1. A.

      Backup to storage medium

    2. B.

      Backup to FTP server

    3. C.

      Copy and paste

    4. D.

      Backup to the TFTP server

  5. 5.

    [Multiple choices] Troubleshooting can be divided into ().

    1. A.

      Fault information collection stage

    2. B.

      Fault location and diagnosis stage

    3. C.

      Service recovery stage

    4. D.

      Fault repair stage