1 Introduction

The Internet of Things (IoT) is well known to involve computing, control, and wireless networking technologies across a wide range of application scenarios. The complexity of IoT systems engineering makes efficient realization of IoT systems difficult. While control of intelligent systems such as robots is widely implemented through artificial intelligence (AI) computing, the holistic integration of computing and networking is emerging as a critical technology for future IoT. This article offers a particular perspective on this new technological paradigm. We first review the applications of machine learning (ML) in the wireless networking of IoT systems (Sect. 2), and then turn to a particular scenario of applying AI and ML in networked smart factories (Sect. 3). By investigating how ML facilitates wireless networks and what multi-agent AI systems require of wireless networking, this paper takes an initial look at the holistic integration of AI computing and wireless networking in IoT, particularly for systems involving mobile agents such as autonomous vehicles, mobile robots, and smart factories.

2 Applications of ML in IoT Wireless Networking

Currently, a variety of remarkable developments in networking, notably in wireless networks, are breaking the boundary between the virtual and the real and extending the narrow cyber world towards the Internet of Things (IoT). The terminals of the Internet are no longer limited to PCs and mobile phones: TVs, lamps, mirrors, water heaters, air conditioners, taxis, elevators, workstations, and so on are all connected to the Internet. The interconnection of all things greatly facilitates people’s work and life, and wireless sensor networks (WSN), a main part of IoT, extend the limits of human physiological perception. These vast changes also pose profound challenges to theoretical research.

Compared with traditional wireless communication systems and their series of “excellent” features, IoT has had a significant impact on wireless communication technologies. For example, the number of network nodes has increased dramatically, the network topology has become more dynamic, links and interference have become denser, and transmission demand fluctuates sharply. Recently, machine learning, particularly reinforcement learning, has been used as an emerging tool to address these problems and challenges. Unlike classic convex optimization and other conventional optimization methods, ML has shown promising performance on problems with complicated mathematical structure and on decision-making under unstable conditions.

According to our survey of published articles, current research on applying ML in IoT can be roughly divided into three categories: MAC protocol design, caching and offloading in MEC scenarios, and network security. In addition, there are some explorations of unique scenarios and particular problems that are not included in this list. The framework is shown in Fig. 1.

Fig. 1. The application of machine learning for IoT wireless networks

2.1 MAC Protocol: Wireless Access, Routing and Others

Modern networks, particularly IoT, are becoming more decentralized and ad hoc, with more dynamic topology and routing. In IoT, entities such as sensors and mobile users need to make independent decisions, e.g., route selection and channel selection, to achieve their own goals, e.g., throughput maximization. However, this is challenging because the network status is dynamic and uncertain. Reinforcement learning, represented by Q-learning and the Deep Q-Network (DQN), adapts naturally to the design requirements of such MAC protocols. Therefore, extensive research has been carried out in this direction, covering dynamic wireless access schemes, dynamic routing, multi-point cooperative communication, and so on.

Focusing on multiple access for heterogeneous wireless networks, [54] proposed and investigated a MAC protocol based on deep reinforcement learning, referred to as Deep-reinforcement Learning Multiple Access (DLMA). A salient feature of DLMA is that it can learn to achieve an overall objective (e.g., an \(\alpha\)-fairness objective) from a series of state-action-reward observations while operating in the heterogeneous environment. In particular, it can achieve near-optimal performance with respect to the objective without knowing the detailed operating mechanisms of the coexisting MACs. For dynamic routing, [49] devises the off-policy \(Q_{ssp}\) algorithm and the on-policy \(SARSA_{ssp}\) algorithm to solve the routing problem in wireless sensor networks. Specifically, the authors tackle the stochastic shortest path problem with reinforcement learning by modeling the path search procedure as an appropriate discounted Markov decision process. [41] observes that multihop routing plays an important role in the exploration and monitoring of deep-sea environments, and proposes a Q-learning-based routing algorithm, QL-EDR, for 3D underwater WSNs. Combining defined distance and energy paths, the researchers derive the iterative formula of the Q-table. Compared with the conventional protocol, QL-EDR can extend the network lifetime and improve the efficiency of data collection. In addition, the authors define a regulatory factor to adjust the network performance: depending on practical demands, appropriate values of the factor can be chosen to improve the network throughput, to reduce the average end-to-end delay, or to prolong the network lifetime.
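To make the idea of Q-learning-driven next-hop selection more concrete, a minimal tabular sketch is given below. It is not the QL-EDR algorithm of [41] nor the \(Q_{ssp}\) algorithm of [49]: the five-node topology, the residual-energy values, the reward shape, and the trade-off weight `beta` (a stand-in for a regulatory factor) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical topology: node -> {neighbor: link distance}. Node 4 is the sink.
GRAPH = {
    0: {1: 1.0, 2: 4.0},
    1: {0: 1.0, 2: 1.0, 3: 5.0},
    2: {0: 4.0, 1: 1.0, 3: 1.0},
    3: {1: 5.0, 2: 1.0, 4: 1.0},
    4: {},
}
SINK = 4
# Hypothetical residual energy of each node, normalized to [0, 1].
ENERGY = {0: 1.0, 1: 0.9, 2: 0.8, 3: 0.7, 4: 1.0}


def reward(node, nxt, beta):
    """Assumed reward: penalize link distance, favor energy-rich next hops.

    beta in [0, 1] trades distance against energy; reaching the sink
    earns a fixed bonus.
    """
    r = -beta * GRAPH[node][nxt] + (1.0 - beta) * ENERGY[nxt]
    if nxt == SINK:
        r += 10.0
    return r


def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, beta=0.8, seed=0):
    """Tabular Q-learning over (node, next hop) pairs."""
    rng = random.Random(seed)
    q = {n: {m: 0.0 for m in nbrs} for n, nbrs in GRAPH.items()}
    for _ in range(episodes):
        node = rng.choice([n for n in GRAPH if n != SINK])
        for _ in range(20):  # hop limit per episode
            if rng.random() < eps:
                nxt = rng.choice(list(q[node]))          # explore
            else:
                nxt = max(q[node], key=q[node].get)      # exploit
            future = max(q[nxt].values()) if q[nxt] else 0.0
            target = reward(node, nxt, beta) + gamma * future
            q[node][nxt] += alpha * (target - q[node][nxt])
            node = nxt
            if node == SINK:
                break
    return q


def greedy_route(q, src, max_hops=10):
    """Follow the highest-valued next hop from src until the sink is reached."""
    path = [src]
    while path[-1] != SINK and len(path) <= max_hops:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path
```

Calling `greedy_route(train(), 0)` returns the learned hop sequence from node 0 to the sink. Changing `beta` shifts the learned routes between short links and energy-rich relays, which is the kind of trade-off a regulatory factor is meant to expose.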

[62] works on improving packet transmission efficiency using cognitive networks. A Q-learning-based transmission scheduling mechanism using deep learning is proposed for the cognitive radio-based IoT, addressing the problem of finding an appropriate strategy for transmitting packets from different buffers through multiple channels so as to maximize the system throughput. [23, 43] have also applied machine learning to IoT congestion control and have gained some valuable experience. [57] proposed a cooperative spectrum sensing algorithm for cognitive radio networks. By implementing a DQN based on Hoeffding-style upper confidence bounds to improve exploration efficiency, the proposed algorithm achieves better reward performance and faster convergence than conventional algorithms based on Q-learning with \(\varepsilon\)-greedy exploration.
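The difference between the two exploration rules mentioned above can be sketched on a plain table of action values. The exact bonus used in [57] is not reproduced here; the sketch assumes the standard UCB1-style Hoeffding bonus \(c\sqrt{\ln t / n_a}\), and the function names and the constant `c` are illustrative.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb_hoeffding(q_values, counts, t, c=1.0):
    """Pick the action maximizing value plus a Hoeffding-style confidence bonus.

    counts[a] is how often action a has been tried; t is the total number
    of decisions so far. Untried actions are selected first.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + c * math.sqrt(math.log(t) / counts[a]),
    )
```

Under this rule, exploration is directed at actions whose value estimates are still uncertain, rather than spread uniformly as with \(\varepsilon\)-greedy, which is one plausible reason for the faster convergence reported.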

We note that most of the research in this direction models the network state of IoT as a Markov decision process (MDP). In addition, the DQN method receives more attention than other algorithms. Future networks will involve multiple network slices with multiple conflicting goals, which poses many challenges to traditional resource management mechanisms and to the formulation of network standards that are worthy of in-depth study.

2.2 MEC: Caching and Offloading

MEC is one of the key scenarios of IoT, and in-network caching can effectively reduce duplicate content transmission. Research on wireless caching shows that caching content in wireless devices can significantly reduce access latency, energy consumption, and overall traffic. This direction has also attracted many studies. As each node’s storage, computing, and energy resources are limited, how to coordinate collaboration between nodes, for example in deciding which content to cache, has become a focus of attention.

Most recent studies focus on “attention” tagging of content and tasks, and on allocating caching and computation offloading according to importance. However, there are still many attempts to bring new ideas to this research. [25] is concerned that scarce wireless network resources can hardly meet the demand from the influx of a huge number of terminal devices. Specifically, the authors use two recurrent neural network approaches, the echo state network (ESN) and the long short-term memory (LSTM) network, to predict user mobility and content popularity, and then use the DQN algorithm to make caching decisions based on the prediction results. [61] formulates the cache replacement problem as an MDP and proposes a deep reinforcement learning (DRL)-based caching policy. In the model, the state, action, and reward are defined as the information about cached/arrived data items, the caching action selected by the edge node, and the sum utility of all requested data items, respectively. [44] tries to simultaneously tackle content caching strategy, computation offloading policy, and radio resource allocation in fog computing. Because wireless signals and service requests are stochastic, the authors use the actor-critic reinforcement learning framework to solve the joint decision-making problem with the objective of minimizing the average end-to-end delay. A deep neural network (DNN) is employed as the function approximator to estimate the value functions in the critic part, owing to the extremely large state and action space of the problem. The actor part uses another DNN to represent a parameterized stochastic policy and improves the policy with the help of the critic.
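The state/action/reward structure of a cache replacement MDP, as described for [61], can be illustrated with a toy environment. This is only a sketch under simplifying assumptions and not the model of [61]: the class name `CacheEnv`, the per-item utility table, and the action encoding (`-1` to skip caching, otherwise the slot index to overwrite) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEnv:
    """Toy edge cache: a learning agent would choose `action` given `state()`."""

    capacity: int
    utility: dict                       # item -> utility when served from cache (assumed)
    cache: list = field(default_factory=list)

    def state(self, arrived):
        """State: information about the cached items and the newly arrived item."""
        return (tuple(sorted(self.cache)), arrived)

    def step(self, arrived, action, requests):
        """Apply a caching action, then score the requests in this slot.

        action == -1 : do not cache the arrived item.
        action >= 0  : cache it, overwriting slot `action` if the cache is
                       full (action must then be < capacity).
        Reward: sum utility of all requested items found in the cache.
        """
        if action >= 0 and arrived not in self.cache:
            if len(self.cache) < self.capacity:
                self.cache.append(arrived)
            else:
                self.cache[action] = arrived
        return sum(self.utility[r] for r in requests if r in self.cache)
```

A DRL agent would map `state(arrived)` to an action and be trained on the rewards returned by `step`; the environment itself contains no learning.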

At the same time, some scholars have also considered the connectivity of MEC devices in this scenario. [29] focuses on connectivity solutions, especially those covering wide remote areas on the scale of square kilometers. Although many low-power wide-area network (LPWAN) technologies are designed to support long-range, low-power wireless communication, their underlying star topology limits network scalability because of the need for a central hub. To provide connectivity over a wider area, the authors propose to build a mesh topology on top of these LPWAN technologies and present a distributed, energy-efficient, reinforcement learning-based routing algorithm for wide-area wireless mesh IoT networks. The goal of [16] is to obtain an online algorithm that optimally adapts task offloading decisions and wireless resource allocations to time-varying wireless channel conditions. This requires quickly solving hard combinatorial optimization problems within the channel coherence time, which is hardly achievable with conventional numerical optimization methods. The authors propose a Deep Reinforcement learning-based Online Offloading (DROO) framework that implements a deep neural network as a scalable solution that learns binary offloading decisions from experience. It eliminates the need to solve combinatorial optimization problems and thus greatly reduces the computational complexity, especially in large networks.

MEC scenarios involve very complicated system analysis, which calls for a unified study of caching, offloading, networking, and transmission control. Strong couplings among mobile users with heterogeneous application demands, QoS provisioning, mobility patterns, radio access interfaces, and wireless resources add to this complexity. A model-free reinforcement learning approach is therefore a promising candidate for managing the huge state space and the large number of optimization variables.

2.3 IoT Security and Reliability

In IoT, physical devices, sensors, appliances, and other objects can communicate with each other without human intervention, and IoT has many critical and non-critical applications. The security of IoT has therefore become a crucial problem. Future networks are becoming more decentralized and ad hoc in nature, which makes them vulnerable to various attacks such as Denial-of-Service (DoS) and cyber-physical attacks. Recently, deep Q-learning (DQL) has been used as an effective solution to avoid and prevent such attacks [31].

In [52], the Markov game framework is employed to model and analyze the anti-jamming defense problem. Based on Q-learning, the authors develop a collaborative multi-agent anti-jamming algorithm. As machine learning and artificial intelligence can be used to protect devices by analyzing traffic or device behavior, [48] develops a model that increases the security of the wireless environment for IoT appliances by creating a fingerprint with a machine learning algorithm. The experimental results show that the model is able to detect anomalous flooding traffic in Wi-Fi networks based on characteristic patterns that separate normal traffic from malicious activity.

The wide variety of low-cost radio technologies used to enable wireless communication in IoT brings a security problem: it is very easy for a malicious user to perform passive wireless signal scanning on these networks and use this information to launch identity-based attacks. In [34], the authors propose a learning-based strategy to detect spoofing attacks in wireless sensor networks. Based on detailed analytical models of the mobile radio channel, the proposed algorithm combines two classifiers to process and analyze instantaneous samples of received signal strength to detect attacks. In [11], a watermarking algorithm is proposed for dynamic authentication of IoT signals to detect cyber-attacks. The proposed watermarking algorithm, based on a deep learning long short-term memory structure, enables IoT devices to extract a set of stochastic features from their generated signal and dynamically watermark these features into the signal. This method enables the IoT gateway, which collects signals from the IoT devices, to effectively authenticate the reliability of the signals.

The sharp increase in interference caused by dense networks is also an aspect that needs to be explored in depth. [14] treats the optimization of cache-enabled opportunistic interference alignment (IA) networks as a highly complex problem. The results demonstrate that the performance of cache-enabled opportunistic IA networks can be significantly improved by using the proposed deep reinforcement learning approach.

We found that the research on reinforcement learning applied to network security is mainly focused on anomaly detection and identity authentication, and research on interference in network transmission needs to be further promoted.

2.4 Low-Power Operation and Sensor Networks

In addition to the above directions on which studies are concentrated, scholars have also explored other applications of machine learning in wireless networks, such as power supply problems in low-power networks and network structure update problems.

Sensing devices operating in the upcoming IoT are likely to rely on the radio frequency (RF) transmissions of a hybrid access point (HAP) for energy [51]. The HAP is also responsible for setting the sampling or monitoring time of these devices according to their harvested energy. A challenging issue is setting the HAP’s charging time and the sampling time of each device with imperfect channel gain information. [51] proposes a scheme, based on an improved actor-critic algorithm, to minimize the sampling time of the devices. Wireless sensor networks are characterized by scattered network requirements and uneven information. [46] studies the WSN-based field sensing and reconstruction problem. The authors establish a two-layer learning framework based on reinforcement learning and present the detailed design of an adaptive sampling policy that can actively determine the most informative sensing location and thus significantly reduce the communication cost.

3 Machine Learning in Smart Factory

A smart factory is an IoT system of particular interest. Factories, especially manufacturing factories, are embracing the notion of integrating cyber resources, such as computation and networking, with physical processes to drive the development of the smart factory, which is a Cyber-Physical System (CPS) [38, 42]. Several technologies are believed to bring evolutionary changes to traditional factories: the Internet of Things (IoT), Wireless Sensor Networks (WSN), cloud services, and artificial intelligence [38, 42, 47]. The main goal is to accommodate product variants and variation in production volume so as to fulfill demands ranging from major customers down to individual customers. This requires real-time collection of relevant information, fast reintegration of resources within the factory, and an optimized re-setup or reconfiguration of physical entities within the factory [40]. In addition, the call for sustainability requires the smart factory to improve the utilization of raw materials and energy [40], and even to achieve higher efficiency in the supply chain, product packing, and logistics among factories.

The integration of cyber resources and physical entities happens on top of factories equipped with automated manufacturing. If each physical entity is regarded as a system after integration, the whole CPS is actually a system of systems. The integration of cyber resources and physical entities is considered in two ways: vertical integration, which emphasizes real-time information collection and control, and horizontal integration, which emphasizes cooperation among physical entities [40, 59].

Fig. 2. A horizontal integration towards smart factory.

3.1 Computing and Networking Systems in Smart Factories

Since each integrated physical entity is itself a system, the CPS as a whole is a system of systems. As shown in Fig. 2, the important systems in smart factories include product design, materials supply (raw materials), the energy used to drive manufacturing, the entities in the manufacturing process, product test (inspection), maintenance of the factory, logistics (local logistics within the factory dealing with semi-finished products, and packing and shipping of finished products), local fog nodes (including storage, computing, etc.), cloud services (including cloud storage, cloud computing, etc.), and the backbone network (with Internet access) [38]. Papers [4, 38] expound data-powered “smart design” in future industry. The sensing devices integrated into the traditional factory bring a tremendous amount of data from every stage of manufacturing. After proper data processing and visualization, designers can refine the manufacturing processes, part design, etc., towards more flexible and energy-saving manufacturing. Papers [4, 17, 38] discuss the supply chain in the smart factory. Since production requirements are dynamic in the smart factory age, a short-term supply system lays a solid foundation for smart manufacturing. Paper [17] proposes a multi-objective, multi-stage flexible flow-shop scheduling model for fast-response supply chains and manufacturing agent collaboration. Resource and energy efficiency is another important indicator for next-generation industry, since it is directly related to profit and the environment. Smart energy supply feeds energy consumption data back to designers and management to improve plant organization and production design [19, 42, 59]. Many papers in the literature focus on Multi-agent Systems (MAS) in manufacturing. Papers [17, 39] propose algorithms for scheduling in MAS that consider efficiency and dynamics.
Papers [10, 27, 36] adopt Machine Learning (ML), Reinforcement Learning (RL), and Deep Learning (DL) to solve MAS task allocation and scheduling with load balancing and efficiency in mind. Paper [24] proposes a deep learning-based inspection system with high accuracy that can find possibly defective products. Paper [35] gives a good vision of smart factory maintenance, considering task offloading, path planning, and access point selection in mobile scenarios. Besides, paper [45] introduces ML-based prediction of mechanical tool wear, which is a good addition to smart maintenance. Local logistics is also an important part of the smart factory. Papers [38, 40] discuss raw material distribution and the collection and delivery of (semi-)products within the smart factory. In the framework shown in Fig. 2, the edge devices, local fog nodes, and cloud all have computing capability. Papers [24] and [27] give a possible solution that utilizes edge and fog computation. The advantage is that edge/fog computing delivers lower latency than cloud computing in practical applications. Of course, data-intensive and complicated deep learning algorithms may still be better executed on the cloud, but edge and fog computing are more in line with the smart factory’s need to respond in real time to changes in environment and requirements.

Fig. 3. A vertical integration towards smart factory.

3.2 Vertical Integration

As shown in Fig. 3, the seven important elements of a traditional factory can each be integrated with WSN, actuators, computing, and AI to become a system. Each of the four technologies is a layer placed on top of the traditional element, hence the name vertical integration. Once equipped with WSN, the traditional element, together with the backbone network and the fog or cloud, becomes an IoT system. Once equipped with an actuator (either software or a physical part that can control the entity), the traditional element becomes a cloud-controlled IoT system. After the integration of computing capability and AI algorithms (software), the traditional element becomes a complete local AI agent. For example, the energy meter in the energy supply system keeps monitoring energy consumption and uploads the data to cloud storage. The relevant designers can use these data to further refine the product design for the purpose of green manufacturing. As another example, after collecting maintenance data from the cloud service, the AI on the cloud sends an instruction that one of the manufacturing robots needs maintenance. The other robots therefore take on more tasks to compensate for that change and meet the overall goal. In this example, the actuator is the software running on the manufacturing robots. As just mentioned, IoT integration brings data that are hard to access in a traditional factory. This makes AI integration based on machine learning, deep learning, and reinforcement learning possible, facilitating smart manufacturing.

3.3 Horizontal Integration

As shown in Fig. 2, after vertical integration, the seven systems are connected to the backbone network of the smart factory and become part of the smart factory network. The connections indicate the physical interactions that can happen in the smart factory, along with the accompanying data exchange. For example, when the AI agent within the manufacturing system finds that materials are running out, it can send a material request to the material supply system. The material supply system then prepares the materials and sends a request to the local logistics system, which then initiates a material distribution. As another example, all the machine-related systems (manufacturing, product test, logistics, etc.) may need daily maintenance. They send their daily running statistics, and the AI agent in the maintenance system analyzes those data and produces a cost-optimized and energy-efficient maintenance schedule. This kind of integration emphasizes the interaction among systems in the smart factory, making higher-quality, higher-efficiency automation possible. Horizontal integration is vital especially when humans are involved in the overall workflow or when vertical integration is incomplete for cost reasons. The reason is that, in either case, some data are not available to the cloud service. For example, suppose that in a factory the local logistics has to be done by humans. The output of the human part, that is, the delivery of raw materials, is dynamic because working efficiency varies. At the same time, for privacy reasons, no measurement of, for example, working efficiency will be acquired directly from the humans. Instead, such measurements can be acquired through interaction from the next, fully integrated process of manufacturing: the arrival of raw materials at the manufacturing robot serves as a variable in the overall manufacturing process.
Therefore, horizontal integration requires sensing and data exchange among systems in the smart factory, which needs extra sensors or related parts. Other reasons, such as the latency of centralized cloud-based control and failure of the data collection system, also make horizontal integration necessary. Paper [3] introduces a Parallel Reinforcement Learning (PRL)-based IoT system that reduces the learning time of Reinforcement Learning (RL) while taking the communication overhead into account. Its simulation, based on a multi-agent system in a smart factory, gives a good vision of horizontal integration.

4 Future Networking and Computing Architecture

The holistic networking and computing architecture can be facilitated from two aspects: (1) machine learning for communications and networks, and (2) networking for AI agents to form a networked multi-agent system. These are detailed in the following two subsections.

4.1 State-of-the-Art Applications of Machine Learning to Future Wireless Network Architecture

Future wireless network architecture accommodating machine learning (ML) is emerging as an important technology for the coming decades, and ITU-T formed a focus group (FG) to study it from 2018 to 2020. When incorporating ML functionalities into the network architecture, there are two mechanisms for executing ML algorithms: online ML and offline ML. Online ML computing means that the ML functionality is embedded into networking algorithms or protocols and thus must be implemented in the corresponding network entities. On the other hand, if the ML functionality is executed separately and then used to assist network functionalities, it is known as offline ML computing, which can be executed in a co-located computing facility connected to the corresponding network entities. Offline ML can also be computed in a remote computing facility, with the learned model then transferred to the target network entity. As shown in Fig. 4, ML computing can be executed at, and co-located with, the user equipment (UE) or agents, the radio access network (RAN), or the core network (CN), in addition to the cloud. The emerging edge computing or edge artificial intelligence (AI) [33] can be considered as co-located with the RAN.

Fig. 4. Alternatives for implementing machine learning or AI in the wireless network architecture: agent computing with the UE, edge computing co-located with the RAN, and cloud computing through the CN

Generally speaking, ML can be applied in a few possible networking and communication scenarios:

  • Channel State Information (CSI): CSI is critical to air-interface technology for networking algorithms and physical layer communication. It has been proposed to infer or estimate CSI with the aid of deep learning [18, 53], or to calibrate the channel models for preferred CSI [1].

  • User Behavior: User behavior such as human/vehicular mobility patterns can be useful to network management and mobility management functionalities [7], and autonomous system operation [8], through big data analysis by ML or reinforcement learning [28].

  • Traffic Prediction: Deep packet inspection, network intelligence, and user mobility patterns can be used to predict wireless network traffic for more efficient network/radio resource allocation [20, 55].

  • Cybersecurity: ML might be one of the most attractive tools to enhance network security and to detect attacks on and intrusions into networks [13, 56].

  • Anticipatory Networking Mechanism: Apart from the use of reinforcement learning or multi-armed bandit mechanisms for radio resource or network resource allocation [32, 58], existing applications of ML to wireless networking generally use offline learning to assist or enhance existing solutions. However, another advantage of ML is that predictive networking mechanisms can be developed via online learning, so that ML enables networking functionalities that were not possible before. One such rare example is anticipatory mobility management using Naive Bayesian inference and recursive belief update [6, 26] as online learning to enable proactive communication and virtual cells for ultra-low-latency wireless networking; anticipation is itself a widely adopted concept in AI.
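The recursive belief update mentioned in the last item can be sketched in a few lines. The sketch below is a generic Bayes filter step combined with a naive Bayes likelihood, not the specific scheme of [6, 26]; the hypothesis labels (candidate access points), the feature names, and the probability tables are hypothetical.

```python
def naive_bayes_likelihood(features, cond_prob):
    """Likelihood of the observed features under each hypothesis.

    Naive Bayes assumption: features are conditionally independent given
    the hypothesis, so their probabilities multiply.
    cond_prob[hypothesis][feature_name][feature_value] -> probability.
    """
    out = {}
    for hypothesis, table in cond_prob.items():
        p = 1.0
        for name, value in features.items():
            p *= table[name][value]
        out[hypothesis] = p
    return out


def update_belief(belief, likelihoods):
    """One recursive step: posterior(x) is proportional to likelihood(x) * prior(x)."""
    unnorm = {x: belief[x] * likelihoods[x] for x in belief}
    total = sum(unnorm.values())
    if total == 0.0:
        return dict(belief)  # uninformative observation: keep the prior
    return {x: v / total for x, v in unnorm.items()}
```

Starting from a uniform belief over the candidate access points and calling `update_belief` once per new observation, the belief concentrates on the access point most consistent with the observed mobility features, which is what allows resources to be prepared proactively before the vehicle arrives.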

4.2 Networked Multi-agent Systems

In addition to applying ML to wireless networks, another question about AI computing and wireless networking arises: what is the desirable wireless networking for agents using ML? More precisely, how should a wireless network be designed for agents with machine intelligence, say a multi-robot system (MRS) or a multi-agent system (MAS)?

Legg and Hutter gave an informal definition of machine intelligence in [22]: intelligence measures an agent’s ability to achieve goals in a wide range of environments. Distributed artificial intelligence (DAI) has received attention in AI research for well over three decades [2], and has two common sub-disciplines: distributed problem solving (DPS) and multi-agent systems (MAS). DPS typically decomposes a task into several not completely independent sub-problems that can be executed on different processors, and then synthesizes a solution. MAS, on the other hand, regards an agent as an intelligent entity, which can be a robot or an autonomous vehicle (AV), with goals and actions in an operating environment. In state-of-the-art CPS/IoT that are highly parallel in computing, a MAS typically represents a complex system of multiple agents together with the mechanism for coordinating the agents’ behaviors. Recall that Demazeau defined a MAS as consisting of four major aspects: agents, environments, interactions, and organization [9]. While RL deals with an agent’s actions and environment, communication for decisions was brought into MAS of RL agents by modeling the problem as a partially observable MDP [37, 50]. Though communication or exchange of actions among agents has been studied in MAS and DAI for a long time, the features of wireless networks have hardly been considered in the literature. In [37], a finite number of communication channels with so-called fast communication was considered for information sharing among a team of cooperative agents. [12, 30, 60] considered communication in MAS indirectly. However, realistic wireless communications and networking have not been well taken into consideration, nor have their impacts on ML mechanisms.

An interesting study looks into the collective behavior of autonomous vehicles moving across a region of Manhattan streets, treating each autonomous vehicle as an agent using reinforcement learning. It is shown that wireless networking reduces the average delay [21]. In such wireless networking, unlike human-to-human personal communication, the reward map and policy of another autonomous vehicle within the interaction range are useful information to exchange. The age of such information is critical, and thus ultra-low-latency wireless networking is highly preferred, for which real-time ALOHA has been considered as the multiple access scheme. For collaborative robots that each execute their own machine learning algorithms (such as for moving actions and planning) toward a common goal, wireless networking would be extremely beneficial to collective efficiency [5]. Many issues remain open in such networked MAS, such as network topology [15] and innovative machine learning for networked MAS.

5 Conclusions

The holistic integration and interaction of AI computing and wireless networking for IoT systems still has a long way to go. This paper provides an initial literature survey and in-depth discussions toward this ultimate goal. Many open issues still require significant technological innovation in the future.