1 Review

A space information network (SIN) is an integrated network system of various space information platforms (e.g., satellites, stratospheric airships, and manned or unmanned aerial vehicles) that enables real-time sensing, collection, transmission, and processing of space information and provides both global and local information services [1–4]. In the past few years, SIN has received increasing research attention and is envisioned to play a crucial role in future social and economic activities. Existing research efforts have focused on various aspects of SIN, including network system architecture, real-time information sensing, on-orbit information processing, high-speed wireless transmission, storage, security and privacy protection, and new applications, to name just a few [5–12].

Spectrum usage and management is becoming an increasingly serious issue in SIN, mainly for the following three reasons. The first is the paradox between spectrum shortage and spectrum under-utilization [13]. On the one hand, there are quite limited spectrum bands that could be used to support new SIN platforms or new SIN services, since most of the favorable spectrum bands have already been licensed to various existing services [14]. On the other hand, one promising function of SIN is to broaden the observation area and realize continuous information acquisition for Earth observation, which will produce massive Earth observation data and require a large amount of spectrum to transmit the data back to the satellite ground station. Cognitive radio-based dynamic spectrum sharing has been widely recognized as a promising technique to resolve this paradox and improve spectrum utilization by allowing a secondary system to opportunistically use the under-utilized spectrum of a primary system [15–22]. For example, dynamic spectrum sharing can take place between a secondary satellite system and a primary satellite system, or between a secondary satellite system and a primary terrestrial communication system [23–27].

The second is the complex electromagnetic spectrum environment, with a tremendous number of spectrum devices and ubiquitous spectrum interference. SIN will be an integrated and heterogeneous system with various networks and entities, including high-altitude satellites, low-altitude stratospheric airships and unmanned aerial vehicles, and ground mobile communication base stations and terminals. The electromagnetic power emission of so many spectrum devices will cause ubiquitous spectrum interference, including intra-band, inter-band, intra-system, and inter-system interference. In particular, several satellite communication systems and terrestrial communication systems have been allocated overlapping spectrum bands, and it is inevitable that different systems in future SIN will share common spectrum bands in the time, space, and/or beam domains [28–31]. Consequently, real-time spectrum monitoring is a prerequisite for the harmonious coexistence of spectrum devices.

The third is spectrum disorder and spectrum attack. On the one hand, software-defined devices and software-defined networks will be one trend in the design of future SIN. The reconfigurability of software-defined devices and networks, along with the openness of the wireless spectrum, will pose various spectrum threats to the development and operation of SIN [32–34]. In particular, spectrum disorder, in which devices violate spectrum policies and regulations and use the spectrum illegally, mainly for commercial purposes, is becoming increasingly serious. On the other hand, the electromagnetic spectrum has recently been recognized as the sixth dimension of war, and electromagnetic spectrum warfare has received increasing attention. Various kinds of malicious spectrum attacks (such as jamming, cheating, and eavesdropping) are emerging [35–38]. Thus, intelligent spectrum control and management is vital to maintaining spectrum security.

In a nutshell, dynamic spectrum sharing, real-time spectrum monitoring, and intelligent spectrum control are the dominant requirements for spectrum usage and management in SIN. Motivated by these observations, the main contributions of this tutorial article are summarized as follows:

  • Identify critical spectrum issues in developing SIN and highlight that spectrum data analytics is the key solution to address these issues via spectrum sensing, spectrum data statistical inference and knowledge discovery, spectrum data-driven decision optimization, and spectrum experiment validation and evaluation.

  • Introduce the concept of big spectrum data in SIN and analyze its characteristics, including volume, variety, velocity, veracity, viability, and value.

  • Present several emerging use cases and highlight some research frontiers to guide the design of practical algorithms.

2 Spectrum issue in space information networks

As shown in Fig. 1, SIN is an integrated and heterogeneous network system consisting of high-altitude satellite communication subsystems, low-altitude stratospheric airship and unmanned aerial vehicle (UAV) communication subsystems, and ground mobile/cellular communication subsystems. Each of these subsystems has its own merits and limitations. Several key points are summarized as follows:

Fig. 1 Illustration of space information networks

  • Satellite communication subsystems are good at global wide-area coverage via line-of-sight wireless transmission; however, they do not work well for indoor communications or in urban environments with dense high-rise buildings. Moreover, the transmission rate or throughput is relatively low due to the very long distance between a satellite and a satellite ground station or ground user.

  • In contrast, ground mobile/cellular communication subsystems can provide higher system throughput and support more communication connections, and they are especially suitable for urban environments and hotspot areas. However, their communication coverage is limited in rural and ocean areas due to the lack of infrastructure such as base stations or access points.

  • Low-altitude stratospheric airship and unmanned aerial vehicle communication subsystems have recently received increasing attention for their flexibility, mobility, and quick deployment [39–41]. Owing to their small size and their capability to fly without an on-board pilot, these subsystems also have a wide range of viable applications, such as agriculture, photography, surveillance, and numerous public services.

Currently, most of the above subsystems are designed in isolation and deployed in a chimney-type (stovepipe) manner. From the spectrum usage perspective, most of the existing subsystems have been allocated exclusive spectrum bands. However, there are quite limited spectrum bands that could be used to support new platforms or new services, since most of the favorable spectrum bands have already been licensed to various existing services. In contrast, SIN is well recognized as a promising paradigm to bridge and unify the various subsystems into an integrated and heterogeneous network system, where spectrum can be efficiently utilized via various dynamic spectrum sharing techniques. Dynamic spectrum sharing among subsystems will be one trend to improve spectrum utilization and support more emerging services.

As shown in Fig. 2, dynamic spectrum sharing can be employed in various scenarios, such as

  • Spectrum sharing between a primary satellite system and a secondary satellite system, as shown in Fig. 2a. Spectrum resources are quite precious, and most of the favorable spectrum for satellite systems has been licensed but is not necessarily sufficiently utilized. Meanwhile, new satellite systems are emerging to provide new services but lack spectrum resources. Licensed shared access between various satellite systems, especially those belonging to the same operator, is a promising paradigm to accomplish win-win spectrum usage.

    Fig. 2 Typical spectrum sharing scenarios in space information networks. (a) Spectrum sharing between a primary satellite system and a secondary satellite system. (b) Spectrum sharing between a primary terrestrial communication system and a secondary satellite system. (c) Spectrum sharing between a primary satellite system and a secondary terrestrial communication system

  • Spectrum sharing between a primary terrestrial communication system and a secondary satellite system, as shown in Fig. 2b. It is well known that microwave spectrum bands are widely used in both terrestrial and satellite communication systems. In many cases, their working bands are adjacent to or even overlap with each other. At the same time, the geographical separation in 3D space between satellite and terrestrial communication systems provides more freedom for spatial reuse of the same spectrum, especially when multi-antenna beamforming techniques are adopted.

  • Spectrum sharing between a primary satellite system and a secondary terrestrial communication system, as shown in Fig. 2c. The motivation for this kind of spectrum sharing is similar to that in Fig. 2b. One unique feature is that the terrestrial communication system can opportunistically use the uplink spectrum of a primary satellite system, since the secondary terrestrial transmission, generally with a transmit power on the order of several milliwatts, is unlikely to cause harmful interference at the distant satellite receiver.
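As a rough plausibility check of the last point, the following sketch evaluates the free-space path loss between a terrestrial transmitter and a satellite receiver. The geostationary altitude (35,786 km), the 2 GHz carrier, the 5 mW transmit power, and the isotropic (0 dBi) antennas are illustrative assumptions rather than values from this article, and the function names are hypothetical. Aggregate interference from many terminals and the satellite antenna gain are ignored.

```python
import math


def mw_to_dbm(power_mw):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(power_mw)


def free_space_path_loss_db(distance_km, freq_mhz):
    """Free-space path loss in dB for a distance in km and a frequency in MHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.45


def received_power_dbm(tx_power_mw, distance_km, freq_mhz,
                       tx_gain_dbi=0.0, rx_gain_dbi=0.0):
    """Received power under free-space propagation (simple link budget)."""
    loss_db = free_space_path_loss_db(distance_km, freq_mhz)
    return mw_to_dbm(tx_power_mw) + tx_gain_dbi + rx_gain_dbi - loss_db


# Assumed scenario: 5 mW terrestrial transmitter, geostationary satellite, 2 GHz.
GEO_ALTITUDE_KM = 35_786
loss = free_space_path_loss_db(GEO_ALTITUDE_KM, 2000.0)
rx = received_power_dbm(5.0, GEO_ALTITUDE_KM, 2000.0)
print(f"path loss: {loss:.1f} dB, received power: {rx:.1f} dBm")
```

Under these assumptions the path loss is close to 190 dB, so a single milliwatt-level transmission arrives at a very low level; whether that level is harmless in practice depends on the satellite receiver's antenna gain and noise floor and on how many secondary terminals transmit at once.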

Moreover, the power emission of a tremendous number of spectrum devices in SIN will create a complex electromagnetic spectrum environment and ubiquitous spectrum interference. Real-time spectrum monitoring is a prerequisite for the harmonious coexistence of spectrum devices. The reconfigurability of software-defined devices and networks, along with the openness of the wireless spectrum, will pose various spectrum threats (such as spectrum disorder and spectrum attack) to the development and operation of SIN. Intelligent spectrum control and management is vital to maintaining spectrum security [35–38]. To enable dynamic spectrum sharing, real-time spectrum monitoring, and intelligent spectrum control in SIN, spectrum data analytics-based spectrum awareness is the key.

3 Big spectrum data in space information networks

In this section, we propose to empower SIN with big spectrum data analytics for dynamic spectrum sharing, real-time spectrum monitoring, and intelligent spectrum control.

3.1 Big spectrum data: key characteristics

Spectrum data, in a narrow sense, refers to the data that indicate the state of a given frequency band as measured by a spectrum analyzer or sensor. For example, a spectrum data sample can be characterized as idle or busy if a binary state is of interest, or alternatively, as a signal strength level if a soft or continuous state is of interest.

On the other hand, in a broad sense, spectrum data include all the data that are directly or indirectly related to spectrum environment awareness, such as

  • Spectrum state data, such as idle or busy status and signal strength levels.

  • User or device data, e.g., user profile, device identification, device configuration, user spectrum demand, user spectrum actions, and user feedback on spectrum usage.

  • Network state data, such as network topology, network traffic, and network congestion.

  • Environment side information, e.g., signal propagation environment-related data, terrain data, and meteorological and hydrographic data.

Notably, spectrum data in SIN not only has the common features as listed above, but it also has several unique characteristics since SIN is an integrated network system of various space information platforms (e.g., satellites, stratospheric airships, manned or unmanned aerial vehicles).

The concept of big spectrum data, first coined in a technical report of our previous work [42] several years ago, is a specific form of Big Data in the wireless or spectrum domain. Briefly, big spectrum data refers to massive and complex spectrum data that cannot be analyzed using traditional platforms, systems, and/or tools. As illustrated in Fig. 3, we can characterize big spectrum data via six keywords: volume, variety, velocity, veracity, viability, and value. The first four keywords are borrowed from what IBM refers to as (general) “Big Data” [43]. In this paper, we extend the concept with two other features: value and viability.

Fig. 3 Key characteristics of big spectrum data

3.1.1 Volume

Volume is the best-known characteristic associated with big spectrum data. The sheer volume of spectrum data being measured, stored, and processed is growing at an unprecedented rate, driven by the demand for a full understanding of wideband spectrum dynamics over wide areas [44]. For SIN, the area of interest can be as wide as the whole Earth in a three-dimensional ground-air-space integrated region. As shown in Fig. 4, taking only the spectrum state data as an example, if we use 1 byte (B) to denote the spectrum data (e.g., the signal energy level) in a geospatial grid cell of 100 m × 100 m, a frequency resolution band of 100 kHz, and a time slot of 100 ms, the accumulated data size for a duration of 1 week, a spectrum band of 5 GHz, and a ground area of 100 km × 100 km can be as large as [45]:

$$\begin{aligned} &7~\text{days/week} \times 24~\text{h/day} \times 3600~\text{s/h} \times \frac{1~\text{s}}{100~\text{ms}} \times \frac{5~\text{GHz}}{100~\text{kHz}} \times \frac{100~\text{km} \times 100~\text{km}}{100~\text{m} \times 100~\text{m}} \times 1~\text{B} \\ &= 3.024 \times 10^{17}~\text{B} \\ &= 3.024 \times 10^{5}~\text{terabytes (TB)} \\ &= 3.024 \times 10^{2}~\text{petabytes (PB)}. \end{aligned} \tag{1}$$

Fig. 4 Spectrum data in multi-dimensional space

The volume of spectrum data grows multiplicatively with the resolution and the span in the time, frequency, and space dimensions: refining the resolution or extending the span by a given factor in any one dimension scales the total volume by the same factor. Moreover, if we further consider indirectly related data such as user data, network data, terrain data, and meteorological and hydrographic data, the volume of spectrum data becomes much larger.
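The arithmetic in Eq. (1) can be reproduced with a few lines of code. The sketch below is only an illustration: the function name and its parameters are hypothetical, and it assumes a square area, a square grid cell, and one stored sample per (time slot, frequency bin, grid cell). Integer units (milliseconds, hertz, meters) are used to avoid floating-point rounding.

```python
def spectrum_data_volume_bytes(duration_ms, time_slot_ms,
                               bandwidth_hz, freq_resolution_hz,
                               area_side_m, grid_side_m,
                               bytes_per_sample=1):
    """Bytes needed for one sample per (time slot, frequency bin, grid cell)."""
    time_slots = duration_ms // time_slot_ms
    freq_bins = bandwidth_hz // freq_resolution_hz
    grid_cells = (area_side_m // grid_side_m) ** 2   # square area, square cells
    return time_slots * freq_bins * grid_cells * bytes_per_sample


WEEK_MS = 7 * 24 * 3600 * 1000

# Settings from the text: 1 week, 100 ms slots, 5 GHz span at 100 kHz
# resolution, 100 km x 100 km area in 100 m x 100 m cells, 1 B per sample.
volume = spectrum_data_volume_bytes(
    duration_ms=WEEK_MS, time_slot_ms=100,
    bandwidth_hz=5_000_000_000, freq_resolution_hz=100_000,
    area_side_m=100_000, grid_side_m=100,
)
print(f"{volume:.4g} B = {volume / 10**12:.4g} TB = {volume / 10**15:.4g} PB")
```

Changing a single argument shows how the total scales: halving the time slot doubles the volume, while halving the grid side quadruples it because the grid is two-dimensional.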

3.1.2 Value

It is the potential value of big spectrum data that motivates us to accept the additional cost resulting from its storage and processing. Although the exploration of the value of big spectrum data is still in its infancy, it is widely believed that big spectrum data will bring a more comprehensive picture of spectrum usage and a much deeper understanding of the patterns of spectrum state evolution [17]. Among many others, several representative applications of big spectrum data include (i) comprehensive and accurate modeling of spectrum usage patterns in the time, space, and frequency dimensions [46]; (ii) deep and long-term prediction or inference of spectrum evolution to enable efficient spectrum usage by looking into the future [47–50]; (iii) informative spectrum recommendation: just as Amazon successfully recommends products (e.g., books) to its consumers, big spectrum data can be exploited to recommend preferable spectrum bands to users or operators in a dynamic spectrum market according to their historical behaviors and spectrum usage preferences; and (iv) effective and fine-grained spectrum management that uses both real-time and statistical information on spectrum usage to enable dynamic spectrum assignment and smart interference management.

More specifically, for SIN, big spectrum data can also help address the critical issues mentioned in Section 1, such as dynamic spectrum sharing, real-time spectrum monitoring, and intelligent spectrum control.

3.1.3 Variety

Spectrum data generally comes from various sources and is of various types. Since SIN is an integrated network system of various platforms, including satellites, stratospheric airships, and manned or unmanned aerial vehicles, its sources of spectrum data are even more diverse. Spectrum measurement and spectrum sensing are two well-known sources. Spectrum measurements are generally collected by governmental agencies, mobile operators, or academic research bodies. Governmental spectrum measurement usually aims to survey spectrum usage in order to validate the appropriateness of current spectrum policy, or to monitor the actual usage of spectrum and find outliers. This type of measurement generally relies on professional, expensive spectrum devices, and the resulting data are often of high quality and wide coverage. Mobile operators, on the other hand, regularly collect spectrum data that reflect the communication interactions between users and infrastructure such as base stations, in order to improve their services by, for example, optimizing base station deployment. During the past decade, we have also witnessed an increasing interest among academic institutions worldwide in performing spectrum measurements for scientific research, mainly motivated by the renewed spectrum policy initiated by the FCC, which allows unlicensed spectrum access to tackle the critical spectrum shortage by improving spectrum utilization.

Spectrum sensing is another dominant source of spectrum data, which are obtained from various spectrum sensors: either specialized spectrum measurement equipment such as spectrum analyzers, or crowd mobile devices such as smartphones and tablets. These data are then fused to identify spectrum holes via hypothesis testing.
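A minimal sketch of this sense-then-fuse idea is given below. It assumes energy detection as the local binary hypothesis test (idle versus busy) and a k-out-of-N rule for fusing the local decisions; neither choice is prescribed by this article, and the function names and threshold value are hypothetical.

```python
def energy_detect(samples, threshold):
    """Local test: report busy (True) if the average sample energy exceeds the threshold."""
    energy = sum(abs(s) ** 2 for s in samples) / len(samples)
    return energy > threshold


def fuse_decisions(local_decisions, k):
    """k-out-of-N fusion: declare the band busy if at least k sensors report busy."""
    return sum(local_decisions) >= k


# Toy readings from three sensors observing the same band (assumed values).
noise_only = [0.1, -0.1, 0.1, -0.1]      # average energy 0.01
with_signal = [1.0, -1.0, 1.0, -1.0]     # average energy 1.0
threshold = 0.5                          # hypothetical detection threshold

decisions = [energy_detect(s, threshold)
             for s in (with_signal, with_signal, noise_only)]
band_is_busy = fuse_decisions(decisions, k=2)   # majority of three sensors
print(decisions, band_is_busy)
```

Setting k = 1 gives the OR rule (most protective of the primary user) and k = N gives the AND rule (most aggressive in finding spectrum holes); in practice the threshold would be derived from the noise level and a target false-alarm probability.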

The geolocation spectrum database has recently been recognized as a promising approach to providing location-based spectrum services for users. Its core function is to indicate the spectrum white space at a user’s location by jointly utilizing sophisticated radio signal propagation modeling [51], high-resolution terrain data, GPS localization data, and the parameters (e.g., transmission power and antenna height) of working primary transmitters such as TV towers or wireless microphones [52–54].
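The core lookup of such a database can be sketched as follows. This is a deliberately simplified illustration, not the method of any deployed database: it replaces the sophisticated propagation and terrain modeling with an assumed log-distance path loss model on a flat plane, and all names, the reference loss, the path loss exponent, and the protection threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PrimaryTransmitter:
    x_km: float
    y_km: float
    channel: int
    tx_power_dbm: float


def predicted_power_dbm(tx, x_km, y_km, ref_loss_db=100.0, path_loss_exponent=3.5):
    """Assumed log-distance model: ref_loss_db at 1 km plus 10*n*log10(d_km)."""
    d_km = max(math.hypot(tx.x_km - x_km, tx.y_km - y_km), 0.01)
    loss_db = ref_loss_db + 10 * path_loss_exponent * math.log10(d_km)
    return tx.tx_power_dbm - loss_db


def white_space_channels(transmitters, x_km, y_km, all_channels, protect_dbm=-114.0):
    """Channels whose predicted primary signal at (x, y) is below the protection level."""
    occupied = {tx.channel for tx in transmitters
                if predicted_power_dbm(tx, x_km, y_km) >= protect_dbm}
    return sorted(set(all_channels) - occupied)


# One assumed TV tower on channel 21 with 80 dBm (100 kW) transmit power.
towers = [PrimaryTransmitter(x_km=0.0, y_km=0.0, channel=21, tx_power_dbm=80.0)]
print(white_space_channels(towers, 1.0, 0.0, [21, 22]))      # user close to the tower
print(white_space_channels(towers, 1000.0, 0.0, [21, 22]))   # user far from the tower
```

A real database would substitute terrain-aware propagation predictions and regulator-defined protection criteria for the assumed model and threshold, but the query structure (user location in, list of available channels out) stays the same.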

3.1.4 Veracity

Veracity here refers to the quality of spectrum data or, more specifically, the uncertainty on spectrum data quality. There are many factors that contribute to the veracity, such as

  • Ubiquitous random noise, wireless fading, and shadowing;

  • Measurement bias of low-end hardware of crowd spectrum sensors;

  • Corrupted data from malicious contributors;

  • Positioning errors due to GPS bouncing;

  • Ever-changing climate or weather conditions and imperfect terrain data.

3.1.5 Viability

In SIN, there are generally a great many attributes and factors that potentially affect the status of spectrum usage. Viability guides the selection of those that have a great impact on the value of big spectrum data. For example, to build an air-to-ground radio propagation model by analyzing big spectrum data, it is necessary to strike a good balance between under-fitting and over-fitting by carefully selecting the dominant factors, e.g., via principal component analysis, rather than using too many variables to build an over-sophisticated model with much lower generalization performance.
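As a hedged illustration of how principal component analysis (PCA) can reveal that a few directions carry most of the variation in a set of candidate factors, consider the sketch below. It uses NumPy, synthetic data, and hypothetical function names; the 95% variance target is an assumed value. Note that PCA yields linear combinations of the original variables, so mapping the dominant components back to individual factors is a separate step.

```python
import numpy as np


def pca_explained_variance(features):
    """Return (explained variance ratios, components) for an (n_samples, n_features) array."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)                       # center each feature
    cov = np.cov(x, rowvar=False)                # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None) # guard against tiny negative values
    return eigvals / eigvals.sum(), eigvecs[:, order].T


def n_components_for(ratio, target=0.95):
    """Smallest number of leading components whose cumulative ratio reaches the target."""
    count = int(np.searchsorted(np.cumsum(ratio), target)) + 1
    return min(count, len(ratio))


# Synthetic example: the second factor duplicates the first (scaled by 2),
# and the third is weak independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
data = np.column_stack([a, 2.0 * a, 0.01 * rng.normal(size=200)])

ratio, components = pca_explained_variance(data)
print(ratio, n_components_for(ratio))
```

When the candidate factors have different units (e.g., meters, dB, and degrees Celsius), they would normally be standardized before this step so that no factor dominates merely because of its scale.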

The viability of big spectrum data guides us in tackling the fundamental tradeoff between the comprehensiveness and the complexity of spectrum modeling. With so many candidate variables, we should validate each variable’s relevance and discover the hidden connections among these variables, beginning with questions such as the following:

  • Is three-dimensional location good enough for air-to-ground spectrum modeling? Is it necessary to further utilize three-dimensional terrain data?

  • To what extent do meteorological and hydrographic conditions affect radio signal propagation?

3.1.6 Velocity

Velocity concerns the latency between the instant at which spectrum data are collected and the instant at which decisions are made based on them. In SIN, with the rapid advances in spectrum measurement techniques and devices, spectrum data may be collected continually at a speed that current algorithms or computing platforms can hardly keep up with. Some tasks, such as statistical spectrum modeling, are delay-tolerant and can be performed off-line using batch algorithms. However, other tasks, such as spectrum prediction, are generally time-sensitive and require online or incremental processing techniques; otherwise, the outdated results will be useless.
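The difference between batch and online processing can be illustrated with a channel-occupancy estimate. In the sketch below, the batch version needs the full history, whereas the online version updates its estimate from each new observation in constant time and memory using exponential forgetting. The class name, function name, and forgetting factor are hypothetical choices for illustration.

```python
def batch_occupancy(history):
    """Batch estimate: fraction of busy observations over the stored history."""
    return sum(1 for busy in history if busy) / len(history)


class OnlineOccupancy:
    """Incremental estimate with exponential forgetting; stores no history."""

    def __init__(self, forgetting=0.1):
        self.forgetting = forgetting   # weight given to the newest observation
        self.estimate = None           # no estimate until the first observation

    def update(self, busy):
        x = 1.0 if busy else 0.0
        if self.estimate is None:
            self.estimate = x
        else:
            self.estimate += self.forgetting * (x - self.estimate)
        return self.estimate


observations = [True, False, False]
tracker = OnlineOccupancy(forgetting=0.5)
online_trace = [tracker.update(busy) for busy in observations]
print(online_trace, batch_occupancy(observations))
```

A larger forgetting factor tracks changes in spectrum usage faster but yields a noisier estimate; the batch estimate is smoother but requires storing and rescanning the data.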

As shown in Fig. 4, if we model spectrum data from an image or video perspective, we can treat the evolution of the spectrum state over time as a 3D video in which each spectrum data sample corresponds to a pixel in an image. The time resolution determines the velocity. Consequently, we can view the velocity of big spectrum data from a new perspective named spectrum data in motion: the speed at which spectrum data is streaming. Recording such spectrum video is useful for checking spectrum misuse behaviors and for enhancing spectrum policies [55].

4 Emerging use cases

To address the critical spectrum issues in SIN, there is a promising vision named spectrum without bounds and networks without borders. To enable this vision, several use cases have emerged so far that exploit big spectrum data analytics to improve spectrum environment awareness. One is the development of geolocation spectrum databases (see, e.g., [52]), which provide location-based spectrum services for terrestrial users with localization capability [56].

Another promising use case is to integrate the ideas of mobile crowd sensing and the geolocation spectrum database [57–59], where crowd devices such as tablets and in-vehicle sensors are employed to collect large-scale spectrum measurements for calibrating propagation models and improving the performance of spectrum prediction.

Similarly, 2D or 3D Radio Environment Maps [60] are good use cases in SIN: they visualize big spectrum data to facilitate decision-making by spectrum regulators and/or telecommunication operators, for example, in interference management and resource scheduling.

In a general sense, there is a clear trend to develop various use cases of big spectrum data analytics in SIN from a multi-disciplinary perspective. One typical case is bridging research and development between communication and control [61–63], where communication generally takes charge of information or data transmission while control introduces an effective closed-loop adaptive feedback mechanism between the input and the output. Another case is to improve communication performance by integrating various machine learning techniques to address issues such as device identification, information fusion, and signal modulation classification, to name just a few [64–68].

5 Frontiers in big spectrum data analytics for space information networks

To fully exploit big spectrum data analytics for space information networks, there are many research opportunities and critical challenges.

  • One clear trend to improve the computational power for addressing the volume of big spectrum data is to develop parallel and distributed computing algorithms and platforms. Based on [69], the emerging issues include software and hardware parallelism, data stream management systems, cloud and edge computing systems, and parallel and distributed databases, to name just a few.

  • In terms of the velocity of big spectrum data, one frontier is to design effective algorithms for real-time acquisition, transmission, and storage of massive time-aware spectrum data, while another is to develop hybrid online and batch algorithms for spectrum data processing, representation, and inference. An online streaming system might provide real-time alerting, while historical (statistical) analyses are performed on a batch-oriented system [70].

  • For variety in big spectrum data analytics, hybrid computer/human processing is receiving increasing interest. For one thing, the crowdsourced spectrum database [57, 71] has been recognized as a new paradigm that, thanks to the convergence of communication, sensing, and computational power on personal mobile devices, harnesses the activity of a large population to accomplish large-scale spectrum measurements. For another, since computers and people are good at very different types of tasks, it is interesting to exploit the integrated efforts of both artificial and human intelligence.

  • In terms of viability, one key challenge is to identify the most important variables by jointly considering the constraints on resources, limitations, and trade-offs [72, 73].

  • Veracity in big spectrum data analytics is closely related to uncertainty, integrity, and security [74], each of which corresponds to many open issues. For example, since big spectrum data may involve many sources of contamination [75], the robust design of spectrum models is of great interest.

  • In terms of value of big spectrum data, more promising use cases, especially killer applications in SIN, are vital to attract more research attention and promote the practical development.

6 Conclusions

This article presented a tutorial on how to exploit big spectrum data as a new kind of resource for space information networks (SIN) to improve radio spectrum utilization. We first identified critical spectrum issues in developing SIN and highlighted that spectrum data analytics is the key solution to address these issues. Then, we introduced the concept of big spectrum data in SIN and analyzed its characteristics, including volume, value, variety, veracity, viability, and velocity. Next, we discussed several emerging use cases and highlighted research frontiers to provide some guidelines for the design of practical algorithms.

We firmly believe that this important area will be a fruitful research direction and that we have only touched the tip of the iceberg. We hope this article will stimulate much more interest among researchers in multiple disciplines, such as statistics, machine learning, data mining, and spectrum management.