# Big Data in 5G

**DOI:** https://doi.org/10.1007/978-3-319-32903-1_58-1

## Synonyms

## Definition

The fifth-generation wireless systems, abbreviated as 5G (Andrews et al. 2014), are proposed as the next wireless and mobile communications standards beyond the current 4G standards. 5G networks not only aim at providing higher data rate, lower latency, larger capacity, and better customer experience than 4G but also commit to fulfilling the Internet of things (IoT) with reliable and secure services at low costs (Atzori et al. 2010). To this end, 5G networks call for and rely on seamless operations of distinctive wireless technologies and solutions, including cognitive radio (CR) (Akyildiz et al. 2006), massive multiple-input multiple-output (maMIMO) (Larsson et al. 2014), millimeter wave (mmWave) communications (Rappaport et al. 2013), heterogeneous network (HetNet) architecture, cloud-based radio access, edge computing and caching (Hu et al. 2015), device and interference management (Asadi et al. 2014), etc. These revolutionary technologies deal with very wide radio spectrum, extremely high-frequency bands, dynamic spectrum access, large-scale antenna arrays, a huge number of devices, massive connectivity, context-aware computing, and so on. All these issues signify signal and data processing in regimes of exceedingly large volume, size, and dimension. As 5G services and technologies both support and contribute to an unprecedented amount of data traffic, the field of wireless communications has entered a Big Data era with imminent challenges and opportunities.

In the era of data deluge, Big Data refer to the kinds of data that are extremely large in terms of the size of the available dataset and/or extremely complicated in terms of the processing complexity required to separate, classify, and analyze the data. Thus, management of Big Data goes beyond the capability of conventional signal processing techniques and data analysis tools. Further, Big Data in 5G present unique challenges due to the characteristics of wireless and mobile data traffic and service expectations. Multi-domain resources, e.g., time, frequency, space, energy, and code, are intertwined in complex ways, and real-time content delivery is often expected by lightweight mobile devices. As a result, traditional processing and analysis techniques become inadequate to deal with such a huge amount of complex data within a tolerable elapsed time at affordable costs. Fortunately, recent advances and developments in machine learning and sparse signal processing illuminate pathways to embrace the challenges and opportunities from Big Data in 5G.

## Historical Background

With the pervasive connectivity offered by 5G technologies, it is becoming a reality that we live in an immersive smart world with smart cities, smart networks, smart cells, smart grids, smart homes, smart vehicles, smart industries, smart devices, smartphones, smart sensors, and so on. The rapidly growing 5G and IoT applications generate an unprecedentedly vast amount of diverse data, which create and expand the regime of Big Data in 5G. On the one hand, the huge volume and high complexity of Big Data cannot be adequately and efficiently supported by legacy wireless technologies in spectrum access, resource allocation, and network management, all of which need to be revamped to address the Big Data challenges. On the other hand, Big Data, as new fuel for learning and predicting channel and network traffic behavior, offer great opportunities for improved system performance and much enhanced quality of experience in 5G.

In the regime of Big Data in 5G, although the dimension of signals and the quantity of data grow unprecedentedly large, there exist unique opportunities because of some useful properties, structures, and features exhibited in the underlying signals and data at both the signal and network levels. Harnessing such properties, structures, and features not only allows data to be acquired efficiently and signals to be processed smartly but also enables prediction, control, management, and optimization of 5G networks in a proactive manner. Along this line, recent advances in powerful machine learning and intelligent signal processing techniques play key roles in not only supporting Big Data applications but also developing revolutionary 5G technologies in the Big Data era.

At the network level, powerful machine learning techniques have been employed to learn data traffic patterns in 5G networks for efficient network management and control. Machine learning, as a promising technology in artificial intelligence, gives machines the ability to learn without following strictly static program instructions by making predictions or decisions (Alpaydin 2004). Generally speaking, machine learning enables a machine to learn the execution of some class of tasks *T* from experience *E* with respect to a certain performance measure *P*, where the performance achieved at tasks in *T* is measured by *P* and can be improved with experience *E*. Machine learning approaches are typically classified into three broad categories, depending on the nature of the dataset for learning or the feedback available to a learning system. They are supervised, unsupervised, and reinforcement learning, where supervised and unsupervised learning indicate whether or not there are labeled samples in the input dataset and reinforcement learning provides feedback in terms of rewards and punishments to navigate the learning process. Recently, both the increase in available data volume and the improvement in hardware computational capability have contributed to the fast development of deep learning (LeCun et al. 2015), which makes use of multiple hidden layers in an artificial neural network. This technique aims to mimic the way that the human brain processes light and sound into vision and hearing (LeCun et al. 2015).

At the signal level, intelligent signal processing techniques have been developed to exploit useful structural features of the complex channels and spectra in 5G systems for effective data acquisition and transmissions. Facing the large size and high dimension of sensing problems arising in wireless communications, compressive sensing (CS) (Candes and Wakin 2008), a.k.a. compressed sensing or compressive sampling, provides a new paradigm of simultaneous data acquisition and compression to effectively reduce the sampling costs of high-dimensional signals by utilizing the fact that typical signals of interest are often sparse in a certain domain via projection onto certain known basis. It is the sparsity of signals that enables CS techniques to reconstruct signals from far fewer samples than those required by the Nyquist sampling theory. As Big Data applications continue to grow in size and number, it is critical to deal only with measurements of data that are informative for specific inference tasks of interest, in order to limit the required sensing cost, as well as the related costs of storing or communicating Big Data. Alternative to CS for deterministic signals, a novel statistical signal processing framework, called compressive statistics sensing (CSS), is developed to leverage the statistical structure of random processes to enable compression (Romero et al. 2016). Capitalizing on parsimonious representations, CSS technology allows compression and reconstruction tasks to be addressed in broader applications, such as wideband spectrum sensing, frequency detection, direction-of-arrival estimation, power spectrum estimation, incoherent imaging, etc.

## Foundations and Applications

When entering the era of wireless Big Data, it is of crucial importance to demonstrate how advances in artificial intelligence and signal processing can be used to overcome challenges and seek opportunities in future 5G wireless networks. Specific exemplary scenarios below describe some situations and applications in which different kinds of machine learning and sparse signal processing techniques play key roles in utilizing Big Data for wireless networks.

### Supervised Learning

Supervised learning refers to the machine learning task of inferring an input-output mapping function from labeled training data, where the dataset contains observations of both the input objects and the resulting output values. Specific techniques include regression analysis, the *k*-nearest neighbors (KNN) algorithm, the support vector machine (SVM), and Bayesian learning (Alpaydin 2004). Regression analysis relies on a set of statistical processes for estimating the relationships among variables, with the aim of predicting the value of one or more estimation targets. The estimation target is a function of the independent variables. The KNN and SVM algorithms are mainly used for classification. KNN classifies an object into a category based on a majority vote of the object’s neighbors. That is, the object is assigned to the class that is most common among its *k* nearest neighbors. In contrast, the SVM algorithm resorts to nonlinear mapping, which transforms the original training data into a higher-dimensional space where the data become separable. Then, SVM searches for the optimal linear hyperplane that can separate one class from another in that space. Bayesian learning computes and maximizes the a posteriori probability distribution of a target variable conditioned on both the input signals and all of the training instances. Some generative models and associated algorithms that can be learned or applied with the aid of Bayesian techniques include the Gaussian mixture model, expectation maximization, and the hidden Markov model.
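The majority-vote rule behind KNN can be illustrated with a minimal sketch. The function name `knn_predict`, the Euclidean distance metric, and the toy data are assumptions made for illustration only; they are not taken from the cited works.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training sample (assumed metric).
    dists = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # The most common label among those neighbors wins the vote.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per sample, two class labels.
train_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["low", "low", "low", "high", "high", "high"]

label_a = knn_predict(train_x, train_y, [1.1, 1.0])
label_b = knn_predict(train_x, train_y, [5.0, 5.05])
```

In practice the choice of *k* and of the distance metric would depend on the feature space; the values here are arbitrary.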

In wireless communications, supervised learning can be used for training-based estimation and prediction of radio parameters. For example, in channel estimation for nonlinear MIMO systems where nonlinearities are present at either the transmitter or the receiver side, a multivariate regression framework is modeled based on the SVM technique to utilize the MIMO channel multidimensionality (Sanchez-Fernandez et al. 2004). The use of multidimensional regression makes it possible to exploit the dependencies in the channel and thus makes each estimate less vulnerable to the added noise. In solving such a nonlinear MIMO channel estimation problem, an iterative reweighted least-squares algorithm is developed for the regression of multiple variables. Thanks to the benefits of SVM in solving nonlinear problems, this SVM-based channel estimation method can take advantage of the MIMO spatial diversity and is able to discover the dependencies between the transmitted and received signals.

For the link adaptation problem in MIMO-OFDM WiFi systems, supervised machine learning has also been successfully applied. In Daniels et al. (2010), the feature space is first defined to characterize the link quality by ordering the post-processing signal-to-noise ratio (SNR) and selecting those with high values as the most relevant link-level performance indicators. Then, the second step is to identify a mapping function between the defined feature set and the link configuration to fulfill the design objective. The mapping task is treated as a classification problem to determine the relationship between the channel state and the configuration for adaptive modulation and coding, where KNN can be adopted as a classifier.

Another fruitful application is context-aware energy enhancement for mobile devices in IoT applications (Donohoo et al. 2014). KNN techniques have been applied to learn and utilize the spatiotemporal and device context to predict the device wireless data and location interface configurations that can optimize energy consumption in mobile devices (Donohoo et al. 2014). The experimental results in Donohoo et al. (2014) show that the success rate of energy demand prediction exceeds 90% with the use of KNN algorithms.

Furthermore, Bayesian learning techniques can be used for learning and predicting spectral characteristics in cognitive radio systems and for channel estimation in massive MIMO. For example, to solve the pilot contamination problem in massive MIMO systems, Wen et al. (2015) suggest estimating both the channel parameters of the desired links in a target cell and those of the interfering links in adjacent cells, with channel estimation carried out with the aid of Bayesian learning techniques. For cognitive radio networks, a cooperative wideband spectrum sensing approach is proposed to detect primary users via the expectation maximization algorithm (Assra et al. 2016). In Choi and Hossain (2013), a two-state hidden Markov process is designed to learn the spectrum occupancy of the primary user, whose presence and absence are mapped to a two-state observation space for Bayesian learning.
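To make the two-state hidden Markov idea concrete, the following sketch runs a standard Bayesian forward filter over a sequence of binary sensing outcomes. It covers only inference with known parameters, not the learning procedure of Choi and Hossain (2013). The transition and emission probabilities, the state encoding (0 = idle, 1 = busy), and the function name are hypothetical.

```python
def forward_filter(observations, transition, emission, prior):
    """Return P(state | observations so far) after each observation.

    transition[i][j] = P(next state j | current state i)
    emission[i][o]   = P(observation o | state i)
    prior            = state distribution before the first transition
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for obs in observations:
        # Predict: propagate the belief through the state transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: weight by the likelihood of the new observation, then normalize.
        unnorm = [predicted[j] * emission[j][obs] for j in range(n)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        history.append(belief)
    return history

# Hypothetical parameters: state 0 = channel idle, state 1 = primary user busy.
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.9, 0.1], [0.2, 0.8]]   # imperfect detector
prior = [0.5, 0.5]

busy_run = forward_filter([1, 1, 1], transition, emission, prior)
idle_run = forward_filter([0, 0, 0], transition, emission, prior)
```

Each entry of the returned list is the posterior over the two states after one more sensing outcome; repeated "busy" detections push the belief toward state 1.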

### Unsupervised Learning

Unsupervised learning fulfills the learning task of inferring a function that describes the underlying structure or distribution from unlabeled data, which means the desired output values (e.g., classification or categorization outcomes) are not included in the observations (Alpaydin 2004). Since the data given to the learner are unlabeled, there is no direct evaluation of the accuracy of the structure produced by the learning algorithm. As an example of unsupervised learning, *k*-means clustering is used for cluster analysis in data mining, which aims to partition observations into *k* clusters where each observation belongs to the cluster with the nearest mean. The *k*-means clustering algorithm runs in an iterative manner. In each iteration, every object is assigned to the cluster whose centroid is nearest to it, and then each cluster centroid is updated by minimizing the in-cluster differences. The iterations terminate once convergence is reached.
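The assign-then-update iteration described above can be sketched as follows. This is a minimal illustration of the standard Lloyd-style procedure; the random initialization from data points, the iteration cap, and the toy data are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points (one common choice).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
```

Because *k*-means only finds a local optimum, results on less separated data can depend on the initialization.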

For wireless heterogeneous networks (HetNets) with densely deployed small cells, clustering is a common yet challenging task given the diverse types of operating networks/protocols and varied cell sizes. For example, small cells have to be carefully clustered to mitigate intercell interference, and terminal users also have to be clustered dynamically to avoid frequent cell handovers and achieve optimal offloading. In fact, the handover performance and intercell interference coordination critically determine the network efficiency and user experience. For cell and user clustering, several unsupervised learning methods can be used, such as *k*-means clustering and principal component analysis (PCA). PCA can be viewed as a relaxed solution to *k*-means clustering, where the PCA subspace spanned by the principal directions is identical to the cluster centroid subspace. PCA transforms a set of potentially correlated variables into a set of uncorrelated variables, a.k.a. the principal components.
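The decorrelating property of PCA can be checked numerically with a short sketch. It computes PCA through the singular value decomposition of the centered data matrix; the function name `pca` and the synthetic correlated data are assumptions for illustration.

```python
import numpy as np

def pca(data, n_components):
    """PCA via SVD of the centered data matrix (rows are observations)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project the centered data onto the principal directions.
    scores = centered @ components.T
    explained_variance = singular_values[:n_components] ** 2 / (len(data) - 1)
    return scores, components, explained_variance

# Hypothetical data: 200 observations of two correlated variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])
scores, components, explained_variance = pca(data, 2)
cov_scores = np.cov(scores.T)
```

The off-diagonal entries of `cov_scores` are zero up to floating-point error, which is the sense in which the principal components are uncorrelated.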

In contrast to PCA, independent component analysis (ICA) is a statistical technique for revealing hidden factors that underlie sets of data variables. Based on these underlying factors, ICA decomposes the data variables into a set of independent additive components. This is done by assuming that the subcomponents are non-Gaussian and mutually independent. ICA is useful for blind source separation. In Qiu et al. (2011), ICA is used in a CR-based smart grid system to recover the simultaneous wireless transmissions of smart utility meters, where the power utility station applies ICA to blindly separate the signals received from all the smart meters before the signals can be decoded, by utilizing the statistical properties of the signals.

### Reinforcement Learning

Reinforcement learning is inspired by behaviorist psychology and deals with learning tasks in which feedback on the instantaneous reward is available. It is concerned with how learning agents ought to take actions in an environment based on feedback, in order to maximize the cumulative reward over time (Otterlo and Wiering 2012). The environment in reinforcement learning is typically formulated as a Markov decision process (MDP). The main differences between the classical inference techniques and reinforcement learning algorithms include: (1) the reinforcement learning techniques do not require any prior knowledge about the MDP, and (2) they target large-scale MDPs where classical methods become infeasible. Reinforcement learning focuses on online performance, which requires striking a balance between exploration of uncharted territory and exploitation of obtained knowledge. Such a tradeoff can be studied via the multiarmed bandit (MAB) and finite MDPs (Kaelbling et al. 1996). An MDP describes a discrete-time stochastic control process and provides a framework for modeling decision-making situations in which the outcomes are partly random and partly under the control of a decision-maker. At each step, the MDP stays in a certain state, and the decision-maker may choose any legitimate action that is available in the current state. At the following step, the MDP randomly moves into a new state and gives the decision-maker a corresponding reward. The probability that the MDP moves into its new state is influenced by both the selected action and the system’s inherent transitions. In this sense, the new state depends on the current state and the action selected by the decision-maker. Meanwhile, given the current state and action, the state transition is conditionally independent of all previous states and actions, which makes the MDP satisfy the Markov property. An MDP usually assumes that the state is known when an action is to be taken. However, if this assumption is not guaranteed, then the MDP problem becomes partially observable, a.k.a. a POMDP. The POMDP models an agent decision process where the system dynamics are determined by an MDP, although the agent is unable to directly observe the underlying state and only has partial knowledge. Thus, the agent must maintain a probability distribution over the possible states based on a set of observations, the observation probabilities, and the underlying MDP. In the learning task for the MAB problem, an agent seeks to simultaneously perform exploration and exploitation: the former is to acquire new knowledge, and the latter is to optimize the decisions based on existing knowledge.

For decision-making in 5G wireless systems, the terminal users in a network can be regarded as agents, and the network constitutes the environment of the MDP/POMDP models. The applications of reinforcement learning include transmit power control in energy-constrained systems, channel selection in device-to-device (D2D) communications, dynamic spectrum access in CR networks, and so on. For example, future IoT systems may feature wireless energy-harvesting sensors that operate using energy harvested from environmental sources such as the sun, wind, vibrations, etc. For these sensors, energy management is crucial to ensure continuous and reliable operation with minimized power outage. In Aprem et al. (2013), the problem of transmit power control with packet retransmissions is formulated as a sequential decision problem whose goal is to achieve optimal outage probability performance. Through acknowledgement (ACK) and negative-acknowledgement (NACK) feedback messages, the channel information is implicitly provided to the sensors, which can be exploited in deciding the transmit power level for subsequent transmission attempts. Since the ACK and NACK messages only reveal partial channel information to the sensors, this problem is cast as a POMDP formulation. Accordingly, computationally efficient solutions have been developed based on the maximum likelihood criterion and the voting heuristic policy (Aprem et al. 2013).

In the context of D2D communications, distributed and autonomous channel selection is considered as an underlay to a cellular network. D2D users directly communicate with each other by exploiting the cellular spectrum, while their individual decisions are not governed by any centralized controller. Self-interested D2D users competing for access to optimize their own performance form a distributed system, where the transmission performance actually relies on channel availability and quality. However, such information is hard to acquire for autonomous D2D users. Further, the adverse impact of D2D communications on cellular systems should be minimized. Under these limitations, a network-assisted distributed channel selection approach is proposed by modeling a multiplayer (MP) MAB game with broadcast side information indicating channels selected by other D2D users (Maghsudi and Stanczak 2015). In this MP-MAB game, each D2D user is viewed as a player, and the channels are regarded as multiple arms, where selecting a channel corresponds to pulling an arm. Then, a distributed solution is developed by combining no-regret learning and calibrated forecasting for multiplayer stochastic learning problems. Similarly, in CR scenarios, the problem of dynamic spectrum access of decentralized secondary CR users can also be modeled as an MP-MAB game (Liu and Zhao 2010), where each CR user independently searches for spectrum opportunities without exchanging sensing information with others.
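The arm-pulling view of channel selection can be illustrated with a deliberately simplified, single-player sketch. It uses the classical UCB1 index policy, which is a swapped-in textbook technique and not the multiplayer no-regret and calibrated-forecasting scheme of Maghsudi and Stanczak (2015). The per-channel success probabilities are hypothetical and serve only to simulate feedback.

```python
import math
import random

def ucb1_channel_selection(success_probs, n_rounds=5000, seed=0):
    """Single-player UCB1 over channels with Bernoulli (success/failure) rewards.

    success_probs are hypothetical per-channel success probabilities,
    unknown to the learner and used only to simulate transmission outcomes.
    """
    rng = random.Random(seed)
    n_channels = len(success_probs)
    counts = [0] * n_channels     # times each channel was selected
    means = [0.0] * n_channels    # empirical mean reward per channel
    for t in range(1, n_rounds + 1):
        if t <= n_channels:
            arm = t - 1           # try every channel once to initialize
        else:
            # Exploitation term (empirical mean) plus exploration bonus.
            arm = max(range(n_channels),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means

counts, means = ucb1_channel_selection([0.2, 0.5, 0.9])
```

Over time the learner concentrates its selections on the channel with the highest success probability while still occasionally probing the others, which is the exploration-exploitation balance discussed above.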

### Deep Learning

Deep learning refers to the learning techniques that allow computational models to take on multiple processing layers in order to learn representations of data with multiple levels of abstraction (LeCun et al. 2015). Deep learning has led a new trend in artificial intelligence by providing a powerful set of techniques for learning via deep neural networks (DNNs). In a DNN, connections and interactions of neurons from layer to layer contribute to feature extraction and transformation, where each layer uses the output from the previous layer as its input. Such a topology allows a biologically inspired programming paradigm where an agent is able to learn from observational data by representing and utilizing the abstract features among data (Schmidhuber 2015). By continuously learning multilevel features of the data, higher-level features are learned and formulated from lower-level features to form a hierarchical representation. In this way, deep learning translates the features of data into compact intermediate representations over layered structures, which can be used to remove redundancy and detect saliency. In the era of 5G, wireless traffic and user data are experiencing tremendous growth due to pervasive devices, ubiquitous connections, and diverse services and applications. Different from the centralized and static architecture of conventional cellular networks, future wireless networks and systems are moving toward decentralized and flexible network architectures. In this context, deep learning provides useful tools to acquire underlying trends, predict upcoming events, and harvest correlations and statistical probabilities, given a huge amount of available data for training. Such efforts enable 5G wireless networks to make proactive decisions such as proactive resource allocation by taking advantage of context awareness and edge computing and caching (Wang et al. 2014), which thereby improves network performance and efficiency. In return, these 5G techniques offer pervasive network connectivity for data collection and transmission, which makes it possible to enhance Big Data analytic tools and deep learning techniques within 5G networks for accurate content popularity estimation and prediction. After traffic content learning, popular contents are cached in the intermediate servers so that demands from users for the same content can be accommodated easily without duplicate transmissions from remote servers (Bastug et al. 2014). As a result, redundant traffic can be significantly reduced to save precious wireless network resources.

In addition, given the large volume of user and network data available, Big Data analytics are inherently synergistic with the trend of software-defined networking (SDN) in 5G (Haleplidis et al. 2015), which refers to network programmability by initializing, controlling, managing, and updating network behavior dynamically via open interfaces. While SDN has emerged as a promising networking solution, opening up interfaces gives rise to the risk of security threats to both user and network data. To address such problems, neural networks can be adopted for anomaly detection in SDN environments. In Jadidi et al. (2013), to protect a network from attacks, a flow-based anomaly detection algorithm is developed based on a multilayer perceptron and a gravitational search scheme, in which a neural network uses a large volume of traffic data to classify benign and malicious flows with high accuracy.

### Wideband Spectrum Sensing

At the signal level, Big Data challenges arise primarily from the large size and high dimension of transmitted and received signals that 5G systems have to cope with. For instance, cognitive radio (CR) has been acknowledged as an important technology for dynamic spectrum access in future 5G, which aims to overcome the dilemma between radio frequency (RF) scarcity and spectrum underutilization. The first and key task in CR is to perform accurate and fast spectrum sensing in order to determine the spectrum occupancy of existing (primary) users and identify potential transmission opportunities for (secondary) CR users. To harvest the benefits of open spectrum access and dynamic spectrum sharing, spectrum sensing is usually conducted over a very wide range of frequency bands. In the wideband regime, a major challenge stems from the high costs in RF signal acquisition, which hinders the implementation of fast and accurate spectrum detection. On the other hand, the spectrum underutilization over most assigned frequency bands induces widespread sparsity in the frequency domain. Capitalizing on such sparseness of the underutilized spectrum in open-access networks, CS techniques have been tailored and applied for wideband spectrum sensing using a much smaller number of samples than that required by the Nyquist sampling rate (Tian and Giannakis 2007; Polo et al. 2009). This is because the sampling rate in CS is dictated by the sparsity order of the signals, rather than the large dimension of the original signals. Since the average spectrum utilization of most frequency bands is below 20% at a given time and location (Akyildiz et al. 2006), the CS approach allows for sampling at a small fraction of the Nyquist rate without loss of sensing accuracy.
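The claim that a sparse spectrum can be recovered from far fewer measurements than its dimension can be illustrated with a small numerical sketch. It uses orthogonal matching pursuit (OMP), a generic greedy recovery method swapped in here for illustration; it is not the specific algorithm of the cited works. The real-valued signal model, the Gaussian sensing matrix, the problem sizes, and the noiseless measurements are all assumptions.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: recover a `sparsity`-sparse x from y = A @ x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.conj().T @ residual)
        if support:
            correlations[support] = 0.0   # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1], dtype=coef.dtype)
    x_hat[support] = coef
    return x_hat

# Hypothetical setup: 128 frequency bins, only 3 occupied, 64 compressive measurements.
rng = np.random.default_rng(1)
n_bins, n_meas = 128, 64
x_true = np.zeros(n_bins)
x_true[[10, 47, 90]] = [2.0, -1.5, 1.0]
A = rng.normal(size=(n_meas, n_bins))
A /= np.linalg.norm(A, axis=0)            # unit-norm columns
y = A @ x_true                            # noiseless measurements (assumption)
x_hat = omp(A, y, sparsity=3)
```

Here the sparsity order is passed in as a known parameter; when it is unknown, it would first have to be estimated, and noisy measurements would call for a residual-based stopping rule instead.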

Considerable research efforts have been devoted to realizing the potential of CS-based sensing and sparse signal processing in wireless CR systems under practical operating scenarios. For instance, in practical wireless environments, the actual sparsity order of wideband spectrum is usually unknown a priori or even dynamically changing over time. To circumvent this problem, a concept of sparsity order estimation is proposed in Wang et al. (2012a), which can be applied to design a two-step compressive spectrum sensing solution for wideband CR (Wang et al. 2010). Wireless fading constitutes a major performance-degrading factor to sensing techniques in practice. To cope with the fading effects, collaborative spectrum sensing among distributed CR users provides an effective way to improve the sensing performance (Zeng et al. 2010). For wideband collaborative CR systems, to jointly collect both performance gain and complexity gain, a cooperative spectrum sensing solution is designed based on matrix rank minimization techniques (Wang et al. 2011). Therein the spectrum measurements of all collaborative CR users are modeled to possess a decent low-rank property. Accordingly, a nuclear norm minimization problem is formulated to jointly identify the nonzero support and hence the overall wideband spectrum occupancy, in which the low-rank property enables efficient utilization and desired tradeoff between detection diversity and sampling costs (Wang et al. 2012b).

Under the conventional centralized network architecture for spectrum sensing, although the sensing performance can be globally optimal, the energy costs in reporting local information to the fusion center and conveying global decisions back to the CR users can be undesirably high. Alternatively, decentralized cooperative spectrum sensing becomes attractive for its low communication overhead and robustness to node and link failure. Decentralized schemes have been developed based on consensus techniques (Bazerque and Giannakis 2010; Tian 2008). These algorithms adopt a consensus averaging technique, where the average value of all the local spectrum decisions is computed in a decentralized manner and taken as the global decision.
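The consensus averaging idea can be sketched with a simple iterative update in which each node repeatedly nudges its value toward those of its one-hop neighbors. The ring topology, the step size, and the local statistics below are hypothetical; the sketch assumes an undirected connected graph and a step size smaller than the reciprocal of the maximum node degree.

```python
def consensus_average(values, neighbors, step=0.3, n_iter=200):
    """Decentralized averaging: every node moves toward its neighbors' values.

    neighbors[i] lists the nodes that node i can exchange messages with.
    """
    x = list(values)
    for _ in range(n_iter):
        # Synchronous update using only one-hop neighbor information.
        x = [x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
             for i in range(len(x))]
    return x

# Hypothetical local sensing statistics at five CR users connected in a ring.
local_stats = [0.9, 0.1, 0.8, 0.7, 0.2]
ring = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
consensus = consensus_average(local_stats, ring)
global_mean = sum(local_stats) / len(local_stats)
```

After enough iterations every node holds approximately the network-wide average without any fusion center, and each node could then compare that value against a detection threshold locally.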

The framework of consensus-based collaborative sensing has been extended to address the technical hindrance in practical multi-hop CR networks. In a CR network adopting dynamic access, there is no guarantee on stringent synchronization to ensure that all CR users stay silent during the sensing stage. Further, due to the multi-hop nature, a CR user during the sensing stage may be subject to distinct spectral emissions from sources within its local area, such as other coexisting CR users in the transmission mode, mistimed cooperative CR users in the sensing mode, or local interference. These individually received spectral components, termed as spectral innovations, are sparse but complicate the cooperative task of detecting the common spectrum support of primary users. To deal with this complex but practical problem, decentralized multi-hop cooperative spectrum sensing solutions have been developed in Zeng et al. (2010) and Zeng et al. (2011), in which each CR user estimates the common spectrum of primary users and its own local spectral innovation in an alternating manner and exchanges proper information with neighboring CRs to reach global fusion and consensus on the estimated primary users’ spectrum occupancy.

Besides the wide bandwidth and wireless fading, noise uncertainty is another critical issue that degrades the performance of spectrum sensing. Cyclic feature detection, a statistical signal processing technique, is robust to noise uncertainty and low SNR (Gardner 1991) but requires high-rate sampling, which is very costly, especially in the wideband regime. To solve this problem, a compressive cyclic feature detection technique has been developed for wideband spectrum sensing by exploiting the unique sparsity property of the two-dimensional cyclic spectra of cyclostationary signals (Tian 2011). Along this line, a new compressed covariance sensing framework is proposed for extracting useful second-order statistics of wideband random signals from digital samples taken at sub-Nyquist rates (Romero et al. 2016; Tian et al. 2012).

### Sparse Channel Estimation

To meet the growing demands for high-rate and large-capacity wireless services, massive multiple-input multiple-output (MIMO) technology and millimeter wave (mmWave) communications have emerged as a natural pair to provide a promising option for 5G networks. In MIMO communications, channel state information (CSI), obtained via channel estimation, is required for transmitter design and receiver demodulation. However, the large number of antennas in massive MIMO systems results in a large channel matrix that is costly to estimate, because traditional channel estimation techniques would consume heavy resources in terms of training symbols and CSI feedback. To reduce this training overhead, CS has been advocated for CSI estimation in mmWave MIMO systems (Bajwa et al. 2010; Schniter and Sayeed 2014; Alkhateeb et al. 2014; Gao et al. 2016; Wang et al. 2016a, b). By exploiting the sparse structure of mmWave MIMO channels, which arises from limited multipath scattering, the CS-based approach can estimate the CSI from a relatively small set of compressively collected training symbols. In Bajwa et al. (2010) and Schniter and Sayeed (2014), the task of estimating sparse multipath channels is formulated as a sparse vector recovery problem by vectorizing the sparse channel matrix and representing it within a three-dimensional angle-delay-Doppler space. Subsequently, sparse signal recovery techniques can be applied to acquire the vectorized sparse channel from compressive measurements within a short sensing time (Schniter and Sayeed 2014; Gao et al. 2016). In Alkhateeb et al. (2014), an adaptive CS-based algorithm is proposed to estimate the sparse channel with a hybrid analog/digital hardware architecture. In Wang et al. (2016a), a diagonal-search greedy pursuit algorithm is developed to estimate the second-order statistics of the sparse MIMO channel. To reduce the sensing time and computational complexity caused by the vectorization operation, a fast channel estimation solution is developed in Wang et al. (2016b). It directly decouples the original channel estimation problem into three reduced-size subproblems, i.e., angle-of-arrival (AoA) estimation, angle-of-departure (AoD) estimation, and fading coefficient estimation, which can be solved sequentially.
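
The CS-based idea can be sketched in a few lines under strong simplifying assumptions that are not taken from the cited papers: a single-antenna transmitter, a receive uniform linear array with `N` antennas, noiseless training, path angles lying exactly on an `N`-point grid, and random phase-shifter combining over `M < N` training slots. Orthogonal matching pursuit (OMP) is used here only as a generic stand-in for the recovery algorithms in the cited works. All names and parameter values are hypothetical.

```python
import numpy as np

def omp(S, y, k):
    """Orthogonal matching pursuit: find a k-sparse x such that y ≈ S @ x."""
    col_norms = np.linalg.norm(S, axis=0)
    residual = y.copy()
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(k):
        corr = np.abs(S.conj().T @ residual) / col_norms
        if support:
            corr[support] = 0.0          # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the current support, then update the residual
        coeffs, *_ = np.linalg.lstsq(S[:, support], y, rcond=None)
        residual = y - S[:, support] @ coeffs
    x = np.zeros(S.shape[1], dtype=complex)
    x[support] = coeffs
    return x

rng = np.random.default_rng(0)
N, M, K = 128, 48, 3                      # antennas, training slots, paths (assumed)

# Dictionary of array steering vectors on a uniform spatial-frequency grid
grid = np.arange(N) / N
A = np.exp(2j * np.pi * np.outer(np.arange(N), grid)) / np.sqrt(N)

# Sparse angular-domain channel: K paths with hypothetical gains
x_true = np.zeros(N, dtype=complex)
x_true[[5, 40, 97]] = [1.0, 0.8j, -0.6]
h = A @ x_true                            # antenna-domain channel vector

# Compressive training: M random phase-shifter combinations of the N antennas
W = np.exp(2j * np.pi * rng.random((M, N))) / np.sqrt(M)
y = W @ h

x_hat = omp(W @ A, y, K)
h_hat = A @ x_hat
print("recovered path indices:", np.flatnonzero(x_hat))
print("channel estimation error:", np.linalg.norm(h_hat - h))
```

In this toy setup the channel is recovered from 48 measurements instead of 128, which mirrors the training-overhead reduction described above. With noise or off-grid angles, the recovery would no longer be exact.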

The CS-based channel estimation techniques for mmWave MIMO systems implicitly rely on the assumption that the AoD/AoA values of the sparse paths fall on some known grid. In practice, however, the AoD/AoA can be continuous-valued and lie off the grid, in which case the CS-based methods may suffer from power leakage caused by the basis mismatch of the on-grid assumption (Chi et al. 2011). To circumvent the on-grid assumption, gridless channel estimation solutions can exploit the two-level Toeplitz structure of the mmWave massive MIMO channel by applying two-dimensional (2D) atomic norm minimization (ANM) to estimate the continuous-valued channel state information with super resolution. The 2D ANM is a two-level structure-based optimization approach in which the Vandermonde structure of the transmit/receive array manifold is captured by a two-level Toeplitz matrix, which can be constructed from the received array data via semidefinite programming (SDP). Such a gridless channel estimation approach attains super-resolution AoD/AoA estimation accuracy within a very short sensing time, possibly from measurements collected in a single snapshot.
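
The power leakage effect can be seen in a toy experiment: a single complex sinusoid, standing in for one path observed on a uniform linear array, is perfectly sparse in the DFT basis when its spatial frequency sits on the grid, but it spreads over many bins when the frequency falls halfway between grid points. The array size and frequencies below are assumptions chosen for illustration.

```python
import numpy as np

def peak_energy_fraction(freq, n=64):
    """Fraction of the signal energy captured by the strongest DFT bin."""
    t = np.arange(n)
    x = np.exp(2j * np.pi * freq * t)       # one path with spatial frequency `freq`
    spectrum = np.fft.fft(x) / np.sqrt(n)   # unitary DFT, so energy is preserved
    energy = np.abs(spectrum) ** 2
    return energy.max() / energy.sum()

n = 64
on_grid = peak_energy_fraction(10 / n, n)        # exactly on grid point 10
off_grid = peak_energy_fraction(10.5 / n, n)     # halfway between grid points

print(f"on-grid:  {on_grid:.3f} of the energy in one bin")
print(f"off-grid: {off_grid:.3f} of the energy in one bin")
```

For a half-bin offset, the strongest bin holds roughly 4/π² (about 40%) of the energy, and the rest leaks into neighboring bins. A grid-based recovery therefore sees a channel that is far less sparse than the physical one-path model suggests.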

Notably, the computational complexity of the SDP-based ANM formulation is determined by the numbers of transmit and receive antennas used for MIMO channel estimation. As a result, the complexity grows quickly as the number of antennas increases and may become intractable for terminal devices. To reduce this complexity, an efficient truncated 2D gridless channel estimation solution is proposed for mmWave massive MIMO systems (Wang et al. 2017). It reformulates the original full-size ANM as a truncated ANM (T-ANM) by activating only a small subset of the transmit/receive antennas. The T-ANM approach is motivated by the fact that the mmWave MIMO channel exhibits not only the two-level Toeplitz structure but also a salient low-rank property, owing to the limited scattering at mmWave frequencies. Further complexity reduction is accomplished by an alternative form of ANM, termed decoupled ANM (D-ANM) (Tian et al. 2017). D-ANM is another optimal ANM formulation for the gridless 2D harmonic retrieval problem. It adopts a new matrix-form set of atoms that naturally decouples the joint observations in the AoA and AoD dimensions without loss of optimality. The D-ANM strategy reformulates the original large-scale 2D problem in terms of two one-level Toeplitz matrices, so that the angles can be obtained by simple 1D estimation with automatic pairing. D-ANM therefore dramatically reduces the constraint size in the SDP-based ANM formulation, which greatly reduces the overall computational complexity.
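
To give a feel for the reduction in constraint size, the sketch below compares the side length of the positive-semidefinite block in the two formulations for a single snapshot. The block layouts in the comments follow the SDP forms commonly used for vectorized 2D ANM and for D-ANM. They are stated here as assumptions, since this article does not spell them out, and the function name is hypothetical.

```python
def sdp_block_sizes(n_rx, n_tx):
    """Side length of the PSD constraint matrix in each formulation (single snapshot)."""
    # Vectorized 2D ANM: [[T_two_level(U), vec(H)], [vec(H)^H, t]]
    vectorized_2d = n_rx * n_tx + 1
    # Decoupled ANM: [[T(u_tx), H^H], [H, T(u_rx)]] with H of size n_rx x n_tx
    decoupled = n_rx + n_tx
    return vectorized_2d, decoupled

for n in (8, 16, 32, 64):
    full, dec = sdp_block_sizes(n, n)
    print(f"{n}x{n} antennas: vectorized {full}, decoupled {dec}, ratio {full / dec:.1f}")
```

Because the cost of general-purpose SDP solvers grows polynomially with the size of this block, shrinking it from the product of the antenna counts to their sum is what makes the decoupled form attractive for large arrays.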

## Summary and Outlook

This article reviews emerging technologies and recent advances in machine learning and sparse signal processing, with a focus on data analytic tools that can be used in future 5G wireless networks. To demonstrate their broad applicability, selected examples are illustrated across various wireless systems, including CR, HetNet, mmWave, massive MIMO, and software-defined networking.

There is still a range of open questions in Big Data analytics for 5G wireless networks. For instance, it is quite challenging to apply machine learning to gain full awareness of the RF environment over space, time, frequency, and location, because of its dynamic nature and the lack of labeled data. In computer vision and speech recognition, the output of machine cognition can be readily compared and verified against human visual and auditory perception; no such option is available for radio signals.

## References

- Akyildiz I, Lee W, Vuran M, Mohanty S (2006) NeXt generation/dynamic spectrum access/cognitive radio wireless networks: a survey. Comput Netw 50(13):2127–2159
- Alkhateeb A, El Ayach O, Leus G, Heath RW (2014) Channel estimation and hybrid precoding for millimeter wave cellular systems. IEEE J Sel Topics Signal Process 8(5):831–846
- Alpaydin E (2004) Introduction to machine learning. MIT Press, Cambridge
- Andrews G et al (2014) What will 5G be? IEEE J Sel Areas Commun 32(6):1065–1082
- Aprem A, Murthy CR, Mehta NB (2013) Transmit power control policies for energy harvesting sensors with retransmissions. IEEE J Sel Topics Signal Process 7(5):895–906
- Asadi A, Wang Q, Mancuso V (2014) A survey on device-to-device communication in cellular networks. IEEE Commun Surv Tutorials 16(4):1801–1819
- Assra A, Yang J, Champagne B (2016) An EM approach for cooperative spectrum sensing in multiantenna CR networks. IEEE Trans Veh Technol 65(3):1229–1243
- Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805
- Bajwa WU, Haupt J, Sayeed AM, Nowak R (2010) Compressed channel sensing: a new approach to estimating sparse multipath channels. Proc IEEE 98(6):1058–1076
- Bastug E, Bennis M, Debbah M (2014) Living on the edge: the role of proactive caching in 5G wireless networks. IEEE Commun Mag 52(8):82–89
- Bazerque JA, Giannakis GB (2010) Distributed spectrum sensing for cognitive radio networks by exploiting sparsity. IEEE Trans Signal Process 58(3):1847–1862
- Candes EJ, Wakin MB (2008) An introduction to compressive sampling. IEEE Signal Process Mag 25(2):21–30
- Chi Y, Scharf LL, Pezeshki A, Calderbank R (2011) Sensitivity to basis mismatch in compressed sensing. IEEE Trans Signal Process 59(5):2182–2195
- Choi KW, Hossain E (2013) Estimation of primary user parameters in cognitive radio systems via hidden Markov model. IEEE Trans Signal Process 61(3):782–795
- Daniels RC, Caramanis CM, Heath RW (2010) Adaptation in convolutionally coded MIMO-OFDM wireless systems through supervised learning and SNR ordering. IEEE Trans Veh Technol 59(1):114–126
- Donohoo BK et al (2014) Context-aware energy enhancements for smart mobile devices. IEEE Trans Mob Comput 13(8):1720–1732
- Fanzi Z, Zhi T, Chen L (2010) Distributed compressive wideband spectrum sensing in cooperative multi-hop cognitive networks. In: IEEE ICC conference, Cape Town, 23–27 May 2010
- Gao Z, Hu C, Dai L, Wang Z (2016) Channel estimation for millimeter-wave massive MIMO with hybrid precoding over frequency-selective fading channels. IEEE Commun Lett 20(6):1259–1262
- Gardner W (1991) Exploitation of spectral redundancy in cyclostationary signals. IEEE Signal Process Mag 8(2):14–36
- Haleplidis E et al (2015) Software-defined networking (SDN): layers and architecture terminology. IRTF
- Hu Y et al (2015) Mobile edge computing: a key technology towards 5G, ETSI white paper
- Jadidi Z, Muthukkumarasamy V, Sithirasenan E, Sheikhan M (2013) Flow-based anomaly detection using neural network optimized with GSA algorithm. In: IEEE 33rd international conference on distributed computing systems workshops, Philadelphia, 8–11
- Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285
- Larsson EG, Edfors O, Tufvesson F, Marzetta TL (2014) Massive MIMO for next generation wireless systems. IEEE Commun Mag 52(2):186–195
- LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
- Liu K, Zhao Q (2010) Distributed learning in cognitive radio networks: multi-armed bandit with distributed multiple players. In: IEEE ICASSP conference, Dallas, 14–19 Mar 2010
- Maghsudi S, Stanczak S (2015) Channel selection for network-assisted D2D communication via no-regret bandit learning with calibrated forecasting. IEEE Trans Wirel Commun 14(3):1309–1322
- Otterlo M, Wiering M (2012) Reinforcement learning and Markov decision processes. In: Reinforcement learning. Springer, Berlin/Heidelberg, pp 3–42
- Polo Y, Wang Y, Pandharipande A, Leus G (2009) Compressive wide-band spectrum sensing. In: IEEE ICASSP conference, Taipei, 19–24 Apr 2009
- Qiu RC et al (2011) Cognitive radio network for the smart grid: experimental system architecture, control algorithms, security, and microgrid testbed. IEEE Trans Smart Grid 2(4):724–740
- Rappaport TS et al (2013) Millimeter wave mobile communications for 5G cellular: it will work. IEEE Access 1(1):335–349
- Romero D, Ariananda D, Tian Z, Leus G (2016) Compressive covariance sensing: structure-based compressive sensing beyond sparsity. IEEE Signal Process Mag 33(1):78–93
- Sanchez-Fernandez M, de-Prado-Cumplido M, Arenas-Garcia J, Perez-Cruz F (2004) SVM multiregression for nonlinear channel estimation in multiple-input multiple-output systems. IEEE Trans Signal Process 52(8):2298–2307
- Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
- Schniter P, Sayeed AM (2014) Channel estimation and precoder design for millimeter-wave communications: the sparse way. In: Asilomar conference on signals, systems, and computers, Pacific Grove, 2–5 Nov 2014
- Tian Z (2008) Compressed wideband sensing in cooperative cognitive radio networks. In: IEEE GLOBECOM conference, New Orleans, 30 Nov–4 Dec 2008
- Tian Z (2011) Cyclic feature based wideband spectrum sensing using compressive sampling. In: IEEE ICC conference, Kyoto, 5–9 June 2011
- Tian Z, Giannakis GB (2007) Compressed sensing for wideband cognitive radios. In: IEEE ICASSP conference, Honolulu, 15–20 Apr 2007
- Tian Z, Tafesse Y, Sadler BM (2012) Cyclic feature detection from sub-Nyquist samples for wideband spectrum sensing. IEEE J Sel Topics Signal Process 6(1):58–69
- Tian Z, Zhang Z, Wang Y (2017) Low-complexity optimization for two dimensional direction-of-arrival estimation via decoupled atomic norm minimization. In: IEEE ICASSP conference, New Orleans, 5–9 Mar 2017
- Wang Y, Tian Z, Feng C (2010) A two-step compressed spectrum sensing scheme for wideband cognitive radios. In: IEEE GLOBECOM conference, Miami, 6–10 Dec 2010
- Wang Y, Tian Z, Feng C (2011) Cooperative spectrum sensing based on matrix rank minimization. In: IEEE ICASSP conference, Prague, 22–27 May 2011
- Wang Y, Tian Z, Feng C (2012a) Sparsity order estimation and its application in compressed spectrum sensing for cognitive radios. IEEE Trans Wirel Commun 11(6):2116–2125
- Wang Y, Tian Z, Feng C (2012b) Collecting detection diversity and complexity gain in cooperative spectrum sensing. IEEE Trans Wirel Commun 11(8):2876–2883
- Wang X et al (2014) Cache in the air: exploiting content caching and delivery techniques for 5G systems. IEEE Commun Mag 52(2):131–139
- Wang Y, Tian Z, Feng S, Zhang P (2016a) Efficient channel statistics estimation for millimeter-wave MIMO systems. In: IEEE ICASSP conference, Shanghai, 20–25 Mar 2016
- Wang Y, Tian Z, Feng S, Zhang P (2016b) A fast channel estimation approach for millimeter-wave massive MIMO systems. In: IEEE GlobalSIP conference, Washington, 7–9 Dec 2016
- Wang Y, Xu P, Tian Z (2017) Efficient channel estimation for massive MIMO systems via truncated two-dimensional atomic norm minimization. In: IEEE ICC conference, Paris, 21–25 May 2017
- Wen C et al (2015) Channel estimation for massive MIMO using Gaussian-mixture Bayesian learning. IEEE Trans Wirel Commun 14(3):1356–1368
- Zeng YH, Liang YC, Hoang AT, Zhang R (2010) A review on spectrum sensing for cognitive radio: challenges and solutions. EURASIP J Adv Signal Process 2010:1–15
- Zeng F, Li C, Tian Z (2011) Distributed compressive spectrum sensing in cooperative multi-hop wideband cognitive networks. IEEE J Sel Topics Signal Process 5(1):37–48