Highlights

  • The Sustainable Development Goals and their universality can only be attained through readily available data from affordable sources such as satellite images.

  • Earth Observation is an innovative and accurate approach to address the indicators associated with the Sustainable Development Goals.

  • There is an increased need for new methods and techniques to process an ever-growing amount of Earth Observation data.

  • Machine Learning techniques are crucial in handling Earth Observation data given the enormous quantity of sources and formats.

Background

The concept of Sustainable Development (SD) emerged in the 1960s, when it became evident that environmental problems can be caused by economic and industrial development. In 1972, a first report concerning SD was published and presented at the UN. This report, known as the Meadows Report [1], was strongly criticised at the time since it advocated non-growth for developing countries [2]. Later, in 1987, the Brundtland Report (BR) [3] defined SD as development that meets the essential needs of the present without compromising the ability of future generations to meet their own essential needs. In 2000, the Millennium Development Goals (MDGs) established eight objectives to tackle poverty and hunger, achieve gender equality and improve the health sector [4]. Until 2015, the MDGs [5] drove the progress of SD, including improvements in health and education services, reduced hunger and equity gaps, and higher levels of coverage in interventions with major investments [6, 7]. However, this progress remained incomplete, and in 2012 work began on new objectives, designated as Sustainable Development Goals (SDGs) [5], which define 17 unique objectives and represent an urgent call to shift the world onto a more sustainable path [8, 9].

Earth Observation (EO) plays a major role in supporting progress towards many of the SDGs [10, 11]. According to the United Nations [12], it is advantageous to use EO data, such as satellite images, to produce and support official statistics that complement traditional sources of socio-economic and environmental data. Satellite imagery may be the only cost-effective technology able to provide data at a global scale [13, 14]. Such globally available data are essential to understanding the progress and contribution of underdeveloped countries concerning SD, since these countries lack the resources to collect relevant information. The considerable amount of data provided by EO sources needs to be effectively analysed and processed with appropriate methods and tools to provide robust indicators concerning SD.

The growth of the Machine Learning (ML) field, which is constantly creating new opportunities for monitoring and evaluating humanitarian efforts, plays an essential part in the analysis of satellite images applied to SDGs. In fact, the majority of methods used for processing EO data are based on ML [11, 15], given on the one hand their ability to process enormous amounts of data and, on the other, their unique capabilities for classification, modelling and forecasting.

The main purpose of this article is to explore and comprehend the relation between SD, EO and ML, to understand the relevance and role EO and ML play in attaining the SDGs. Figure 1 depicts the layout of this review as well as major aspects pertaining to the treatment of EO data related to the identification of SDGs.

Fig. 1 Relation between SDG, EO and ML

This review highlights major methodologies and ML methods that have been successfully applied to EO data in pursuit of SD. The paper is structured as follows: Sect. “Materials and methods” describes how the research was conducted; Sect. “Overview on sustainable development” presents the meaning of SD, its history, concepts and goals; and Sect. “Overview on earth observation for sustainable goals development” gives a brief explanation of the EO system and how it presently contributes to SDGs. Afterwards, Sect. “Earth observation using machine learning techniques” reviews the importance of ML for EO and its contribution to SDGs, highlighted by case studies of different ML categories applied to EO data. In addition, further considerations are addressed and discussed concerning SD, EO, ML, their relation, and new paths and approaches to overcome limitations.

Materials and methods

A systematic search and analysis of articles published in peer-reviewed journals was conducted using ScienceDirect and Google Scholar. The search was performed using the following search topics: sustainable development or sustainable goals, earth observation and machine learning. To ensure the identification of relevant case studies for each ML category of data and image analysis, the following terms were used in combination with earth observation data or sustainable development goals: classification techniques, clustering techniques, regression techniques, dimension reduction techniques, empirical and semi-empirical modelling, supervised techniques, unsupervised techniques and object-based techniques. The search was refined to favour relevant, state-of-the-art results, considering the latest research and case studies while retaining historical reports and agreements.

Overview on sustainable development

The environmental problems derived from economic development became evident during the 1960s and a number of solutions were proposed [1, 3, 8]. The Limits to Growth, also known as the Meadows Report [1], was published by the Club of Rome in 1972. It presented a computer model developed by MIT called World3, which allowed Meadows et al. [1] to explore the relationship between five subsystems of the world economy: population, food and industrial production, pollution and consumption of non-renewable natural resources [16]. The key finding was that unlimited growth in the economy and population would lead to a collapse of the global system by the mid to late twenty-first century [1, 17, 18, 19]. Moreover, the sooner the world starts striving to change the growth trends, the better the chance of achieving sustainable ecological and economic stability [1, 18, 19]. Thus, the report advocated non-growth in developing countries as a response to environmental decline and resource scarcity [1, 20].

This premise became very popular among non-orthodox economists since it was interpreted as an attack on the capitalist economic system. On the other hand, it was also criticised by economists who maintain that unbounded development is crucial to capitalism. As a result, in 1974, the Club of Rome issued another report in which it defended organic growth (a division of the world into different regions, each with a definite function within the world system) [2]. Since the publication of The Limits to Growth [1], a considerable number of concepts integrating ecological and economic concerns have been introduced and developed, though without consensus, until the publication of the Brundtland Report (BR) in 1987 (further detailed in Sect. “Brundtland report [3]”). Table 1 presents some of the most important milestones on the path to SD to date.

Table 1 Important milestones on the path to SD

Brundtland report [3]

The Brundtland Report’s (BR) concept of SD follows a generic definition of development that meets the essential needs of the present without compromising the ability of future generations to meet their own essential needs [3]; however, it included crucial features such as environmental preservation and meeting the basic human needs at a global scale. For those reasons, it was widely accepted as a reference for SD definition [20]. Even so, the ambiguity in the BR’s concept of SD along with differing worldviews, ideologies, backgrounds, beliefs and interests has contributed to the proliferation of several explanatory definitions [23]. In an attempt to clarify and simplify the BR’s concept, it became important to describe and explain the following key concepts:

  • Needs: necessary or basic needs (especially referring to developing countries’ needs);

  • Technological Limitation: insufficient technological development;

  • Social Organisation Problems: problems that give rise to an unequal allocation of income.

Later, the BR also clarified the meaning of technological growth, arguing that such progress cannot exceed the limited availability of resources [3, 20].

Millennium development goals (MDGs)

In 2000, the Millennium Development Goals (MDGs) started a global effort to tackle the indignity of poverty. The MDGs [24] established eight objectives covering: tackling poverty and hunger; achieving primary education for all children; achieving gender equality; improving maternal and child health; preventing and combatting deadly diseases; ensuring environmental sustainability; and fostering a global partnership for development.

Until 2015, the MDGs allowed progress in several important areas, such as: reducing poverty and child mortality; providing access to water and sanitation; improving maternal health and combatting several diseases such as HIV/AIDS, malaria and tuberculosis.

The most notable accomplishments were: the reduction of child mortality and of the number of children out of school by more than half; more than 1 billion people lifted out of extreme poverty; and a reduction in HIV/AIDS infections of almost 40%. The legacy and achievements of the MDGs provided valuable lessons and experience, and paved the way for new goals [8].

Sustainable development goals

The Sustainable Development Goals (SDGs) were conceived in 2012, during the UN Conference on SD held in Rio de Janeiro, as the successors to the MDGs. As a result of climate change and other serious environmental problems, there was a need to enhance environmental performance [25]. Hence, the main objective was to create new goals that would address the urgent environmental, political and economic challenges affecting the world [26]. As an urgent appeal to change the world’s course towards a more sustainable direction, the SDGs [27] represent a strong commitment to continue the work of the MDGs and tackle some of the world’s most significant challenges [28].

The success of each of the 17 goals positively affects all the others: No Poverty; Zero Hunger; Good Health and Well-Being; Quality Education; Gender Equality; Clean Water and Sanitation; Affordable and Clean Energy; Decent Work and Economic Growth; Industry, Innovation and Infrastructure; Reduced Inequalities; Sustainable Cities and Communities; Responsible Consumption and Production; Climate Action; Life Below Water; Life on Land; Peace, Justice and Strong Institutions; Partnerships for the Goals [8]. The 2030 Agenda [5], which coincided with another historical agreement achieved at the COP21 Paris Climate Conference [29], sets specific objectives and attainable targets for the reduction of carbon emissions and the management of climate change and natural disaster risks.

Overall, the SDGs are special because they address issues that affect the entire world and reaffirm the determination to eradicate poverty, improve the health system and reduce inequalities. Better yet, they involve all nations in building a more sustainable, safer, more prosperous planet for humanity [8, 28]. EO has become vital to monitoring and achieving the SDGs since it provides numerous benefits [10, 30, 31], namely: data at different scales (local, regional, national or even global) and over different periods of time; consistency; a wide variety of parameters; and cost-effective data acquisition.

Overview on earth observation for sustainable goals development

Earth Observation (EO) covers different approaches, including the use of drones, aircraft and satellites. The era of satellite-based EO began in 1959 with the launch of Explorer 7 and continues today [32]. In fact, there are more than 2000 active EO satellites operated by space agencies, governmental institutions and commercial operators [11, 33], resulting in an increased availability of information concerning the Earth's condition and properties [34].

EO data are an example of a big data source that can be acquired at low cost and over long periods of time. They are used to comprehend the entire Earth system while addressing scientific challenges [35] such as climate change and global warming [36], ecological change, and reducing the impacts of habitat and biodiversity deterioration [37]. They are also used to produce statistics and indicators that enable the quantification of SD [11, 12]. The United Nations report [12] has demonstrated the viability of using EO data to produce official statistics, including SDG statistics such as agricultural [38], urban and land planning [39] or food security indicators [40].

EO satellite imagery can be classified into two groups, based on the sensor used to capture images: passive sensors receive radiation emitted or reflected by the Earth’s surface, whereas active sensors emit radiation and receive the echoes reflected or refracted by the Earth’s surface [11]. Overall, the data provided by EO sensors are characterised by four types of resolution: spectral, spatial, radiometric and temporal. The spectral resolution is the ability to distinguish wavelength ranges of radiation; hence, different spectral bands provide a spectral signature for specific land cover types [11] such as soil [41], water [42] or buildings [43]. The spatial resolution refers to the area that each pixel represents on the surface, the radiometric resolution indicates the number of light-intensity levels the sensor is able to distinguish [44], and the temporal resolution is related to the revisit time, namely the frequency with which a sensor passes over a specific area on Earth. Besides the differences related to the type of EO sensor, the data provided by satellites can also be distinguished by orbit. In a geostationary orbit, satellites remain over the same area, while in a Low Earth orbit, satellites track the surface as they orbit [11].

EO images can be used to identify characteristics of interest based on how images are presented and their inherent properties, such as in agriculture [45], forests [46], water [47] and urban areas [48]. Identifying such characteristics has often been treated as a classification problem, which requires techniques to classify or group pixels, according to their spectral characteristics, as belonging to a class [48]. The study of the Group on Earth Observations [10] has identified SDGs that are measurable, at some level, using EO data. Figure 2 presents SDGs that can already be measured and analysed based on EO data, such as SDG 2 (Zero Hunger), SDG 6 (Clean Water and Sanitation), SDG 13 (Climate Action) and SDG 14 (Life Below Water).

Fig. 2 SDGs measurable by EO data (adapted from: Group on Earth Observations [10])

Taking advantage of emerging developments within the EO domain represents an accurate and reliable way to address the SDG indicators and targets, and thus to bridge the gap between developed and developing countries in the quantity and quality of data [20]. The data from EO sources have been advocated by several international organisations and researchers, such as Holloway et al. [38] and Murthy et al. [14], as a means of minimising costs compared to the conventional acquisition and monitoring of different environmental parameters over relevant scales, areas and time periods [11].

From Fig. 2, it can be seen that EO can provide a large number of indicators for the SDG framework, such as data on the condition of the atmosphere [49], oceans [50], crops [51], forests [52], climate [53], natural disasters [54], natural resources [55], urbanisation [56], biodiversity [57] and human conditions [58]. The two most important indicators are population distribution (I-1) and cities/infrastructure mapping (I-2), since they contribute to all the SDGs. On the other hand, the SDGs which benefit from all the EO indicators are zero hunger (SDG 2), clean water and sanitation (SDG 6), climate action (SDG 13), life below water (SDG 14) and partnership for the goals (SDG 17). This view is supported by the Global Working Group on Big Data [59] and the United Nations [12], which state that satellite imagery has significant potential to provide more timely information, minimising the number of surveys and offering more disaggregated data for informed decision making. The quantity of data generated by EO sources creates the need for methods to process and analyse it, with the purpose of transforming EO data into valuable information.

Earth observation using machine learning techniques

In the last decade, ML has made major contributions to a wide range of Earth Science applications, from analysing gases, soil, vegetation and climate to, more recently, the ocean [60, 61]. Recent advances in the Machine Learning (ML) field are creating unprecedented opportunities to evaluate and monitor policy decisions as well as humanitarian initiatives [62, 63]. Despite their advantages, ML techniques may require greater computational resources as well as an expert to interpret results. ML techniques can be classified into four groups: supervised, unsupervised, semi-supervised and reinforcement learning schemes. The major difference between supervised and unsupervised learning lies in the fact that the former requires output values (labels) in the training dataset [64], and its problems can be posed as either classification or regression. In contrast, unsupervised learning techniques require only the input values in the training dataset, since their purpose is to find hidden patterns in data; such problems can be handled by clustering or dimension reduction techniques [65]. Semi-supervised learning combines aspects of supervised and unsupervised learning and requires a combination of labelled and unlabelled data [66]. Reinforcement learning aims to build systems that can learn from interaction with the environment, using rules of reward and punishment [67, 68].

The following sub-sections give an overview of the different techniques and methods pertaining to the use of ML in the scope of SD supported by EO data, highlighting major findings and applications. This summary outlines the boundaries of research concerning the application of ML algorithms as well as their importance, relevance and potential to support further research towards the development of robust methodologies for universal applications. This overview takes into consideration the most recent research results as well as their relevance.

SDGs tackled with machine learning

ML is a subdomain of Artificial Intelligence which, according to Samuel [69], aims to give machines the ability to learn from data without being explicitly programmed. The study and development of algorithms plays a major role in ML, as it aims to build a model between inputs and outputs, based on the data and algorithms provided, that learns how to make decisions on unseen information [70, 71]. ML is widely popular and increasingly applied across different subdomains, including Statistical Learning methods, Data Mining, Image Recognition, Natural Language Processing and Deep Learning [72].

A substantial number of ML algorithms have been used and described in the literature, performing a wide range of tasks in a variety of domains such as Agriculture [73], Renewable Energies [74], Disasters [54], Climate [75], Construction [76], Human Living Conditions [58] and Health System [77]. Figure 3 presents the most relevant techniques applied to remotely sensed data, grouped according to the four categories of supervised and unsupervised methods: classification, clustering, regression and dimension reduction.

Fig. 3 Categories of ML problems and examples of methods (adapted from: Holloway and Mengersen [78])

Classification

A classification method belongs to the supervised learning category, and it is applicable in cases where the overall aim is to accurately assign a datapoint to a class [78, 79, 80]. There is a broad range of classification methods in the scope of SD, as presented in Table 2, which clearly shows the impact and potential use of these techniques in conjunction with EO data.

Table 2 Examples of application of classification methods towards SDGs using EO data
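To make the idea of assigning a datapoint to a class concrete, the following is a minimal sketch of a nearest-centroid pixel classifier. It is not a method taken from the studies reviewed here; the function names and the two-band reflectance values are hypothetical and chosen only for illustration.

```python
from math import dist

def fit_centroids(samples, labels):
    """Average the training spectra of each class into one centroid per class."""
    sums, counts = {}, {}
    for spectrum, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(spectrum))
        for i, value in enumerate(spectrum):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(spectrum, centroids):
    """Assign a pixel to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(spectrum, centroids[label]))

# Hypothetical (red, near-infrared) reflectances, invented for illustration only.
train_pixels = [(0.05, 0.45), (0.07, 0.50), (0.04, 0.02), (0.03, 0.03)]
train_labels = ["vegetation", "vegetation", "water", "water"]

centroids = fit_centroids(train_pixels, train_labels)
predicted = classify((0.06, 0.40), centroids)  # label for an unseen pixel
```

The labelled training pixels are what makes this a supervised approach: without `train_labels` the centroids could not be computed.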

Clustering

The clustering method belongs to the unsupervised learning category, and it is appropriate when the purpose is to divide datapoints into clusters [78, 89]. Table 3 synthesises the findings within the scope of clustering methods used in combination with EO data to aid in the development of SDGs.

Table 3 Examples of application of clustering methods towards SDGs using EO data
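As an illustration of dividing datapoints into clusters without labels, the sketch below implements plain k-means. It is an assumed, simplified example rather than a method from the reviewed studies: the first k points are used as initial centres (real implementations usually initialise more carefully), and the pixel values are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centres (a simplification)."""
    centres = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        assignment = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return assignment, centres

# Hypothetical (red, near-infrared) reflectances, invented for illustration only.
pixels = [(0.05, 0.45), (0.04, 0.02), (0.07, 0.50), (0.03, 0.03), (0.06, 0.48)]
labels, centres = kmeans(pixels, k=2)
```

The resulting cluster indices carry no semantic meaning by themselves; an analyst still has to decide which cluster corresponds to which land cover.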

Regression

A regression method belongs to the same category as the classification method, supervised learning, and it is applicable when the aim is to predict/estimate a continuous output variable of a given datapoint [78, 99]. There are several approaches, as presented in Table 4, in the scope of SD, that clearly show the impact and potential use of these techniques in conjunction with EO data.

Table 4 Examples of application of regression methods towards SDGs using EO data
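The prediction of a continuous output can be sketched with ordinary least squares on a single predictor. This is only an assumed minimal example; the variable names and the numbers (a band ratio paired with a measured concentration) are hypothetical and do not come from any study cited here.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(x_new, slope, intercept):
    """Estimate the continuous output for a new datapoint."""
    return slope * x_new + intercept

# Hypothetical training pairs: EO-derived band ratio vs. field measurement.
band_ratio = [0.2, 0.4, 0.6, 0.8]
measured = [1.0, 2.0, 3.0, 4.0]

slope, intercept = fit_line(band_ratio, measured)
estimate = predict(0.5, slope, intercept)
```

In contrast to classification, the output here is a number on a continuous scale rather than a class label.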

Dimension reduction

Dimension reduction, like the clustering method, belongs to the unsupervised learning category and typically follows one of two main approaches: Feature Selection (FS), applicable when fewer characteristics need to be selected [111, 112]; and Feature Extraction, applicable when the information needs to be synthesised through transformation. The aim is to create a small set of features covering much of the detail in the initial dataset [79, 113, 114]. These features/characteristics can then be fed into other algorithms or otherwise used as an end result [78]. Table 5 synthesises the findings within the scope of dimension reduction methods used in combination with EO data to aid in the development of SDGs.

Table 5 Examples of application of dimension reduction methods towards SDGs using EO data
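As a sketch of the Feature Selection approach only, the example below keeps the spectral bands with the largest variance across pixels. This variance filter is an assumption made for illustration, not a technique attributed to the reviewed studies, and the three-band pixel values are hypothetical. Feature Extraction (for example through a transformation such as principal component analysis) is not shown.

```python
from statistics import pvariance

def select_bands(pixels, keep):
    """Return the indices of the `keep` bands with the largest variance across pixels."""
    bands = list(zip(*pixels))  # transpose: one tuple of values per band
    ranked = sorted(range(len(bands)), key=lambda i: pvariance(bands[i]), reverse=True)
    return sorted(ranked[:keep])

def project(pixels, band_indices):
    """Keep only the selected bands of every pixel."""
    return [tuple(p[i] for i in band_indices) for p in pixels]

# Hypothetical three-band pixels; band 0 barely varies and carries little information.
pixels = [(0.10, 0.45, 0.30), (0.11, 0.02, 0.31), (0.10, 0.50, 0.29), (0.12, 0.03, 0.90)]

selected = select_bands(pixels, keep=2)
reduced = project(pixels, selected)
```

The reduced pixels can then be passed to a classifier, a clustering routine or a regression model in place of the full spectra.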

Methodologies and techniques for EO imagery analysis

Pre-processing, post-processing and the occasional incorporation of qualitative information play a major role in the success of any data analysis approach and are found to vary significantly among researchers. As mentioned above, the majority of methods for processing EO data are based on ML algorithms, whether supervised or unsupervised [11]. However, besides the general problem category, the techniques can also be classified according to the approach used for image analysis and feature extraction: Sub-Pixel-Based (Sub-PB), Pixel-Based (PB), Super-Pixel-Based (Super-PB) and Object-Based (OB) [129, 130]. In Sub-PB, each pixel can have multiple classes [131, 132]; in PB, it is only possible to have one class per pixel [133, 134]; in Super-PB, the pixels are grouped based on homogeneity [135, 136]; while in OB, the aim is to delineate readily usable objects from imagery or to partition an image into objects [137, 138]. Figure 4 illustrates the Sub-PB, PB, Super-PB and OB techniques.

Fig. 4 i) Sub-Pixel-Based; ii) Pixel-Based; iii) Super-Pixel-Based and iv) Object-Based Technique
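The difference between Sub-PB and PB labelling can be sketched as follows. The example assumes a two-class linear mixing model, which is one common way of obtaining sub-pixel fractions but is not stated in the text above; the endmember spectra and the mixed pixel are hypothetical.

```python
def unmix(pixel, endmember_a, endmember_b):
    """Least-squares fraction of endmember_a in a two-endmember linear mixture,
    clamped to the physically meaningful range [0, 1]."""
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    numerator = sum((p - b) * d for p, b, d in zip(pixel, endmember_b, diff))
    denominator = sum(d * d for d in diff)
    return min(1.0, max(0.0, numerator / denominator))

# Hypothetical (red, near-infrared) reference spectra, invented for illustration only.
vegetation = (0.05, 0.50)
soil = (0.25, 0.30)
mixed_pixel = (0.10, 0.45)

# Sub-PB view: the pixel holds a fraction of each class.
veg_fraction = unmix(mixed_pixel, vegetation, soil)
sub_pixel_result = {"vegetation": veg_fraction, "soil": 1.0 - veg_fraction}

# PB view: the same pixel receives exactly one class.
pixel_result = "vegetation" if veg_fraction >= 0.5 else "soil"
```

Super-PB and OB approaches would instead operate on groups of neighbouring pixels, which requires an image grid and is not covered by this sketch.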

In addition to those techniques, there are visual interpretation techniques conducted through direct operator analysis of characteristics from raw satellite images. Such techniques are used to extract visual characteristics including colour, form, size, pattern, texture and shadow from images [11]. The human abilities, however, should be explored/emulated to further enhance and automate ML algorithm-based image interpretation. Overall, several approaches are being used by different researchers that combine ML algorithms and pre-processing of data giving rise to different methodologies.

Empirical and semi-empirical modelling

Empirical and Semi-Empirical models are created based on data acquired from observations or experience, which means that few or no assumptions are made in the data analysis. There are many examples of the application of empirical and semi-empirical modelling, such as those in Table 6.

Table 6 Examples of application of empirical and semi-empirical models towards SDGs using EO data

Supervised classification techniques

Supervised Classification requires a set of classified samples (sub-pixels, pixels or super-pixels) to train the models to understand each class’s patterns. After training, models should be able to categorise new samples or place those samples into classes [143]. Some applications of these approaches are presented in Table 7.

Table 7 Examples of application of supervised classification techniques towards SDGs using EO data

Unsupervised classification techniques

Unsupervised Classification techniques do not require any training data or prior knowledge, and their main goal is to group image pixels or sub-pixels into unlabelled classes [11]. Table 8 lists some recent examples regarding the application of unsupervised classification techniques.

Table 8 Examples of application of unsupervised classification techniques towards SDGs using EO data

Image segmentation object-based classification

Image segmentation OB classification is used to identify objects based on their properties or features. These techniques were developed to emulate human visual interpretation. Some applications of OB techniques are presented in Table 9.

Table 9 Examples of application of image segmentation object-based classification towards SDGs using EO data

The success cases presented in Tables 2, 3, 4 and 5 and in Sect. “Methodologies and techniques for EO imagery analysis” demonstrate that the contribution of ML is crucial to the analysis of data provided by EO sources. The synergy between EO and ML can be viewed as an important tool to support a wide variety of SDGs and fields at a global scale and to enhance their level of implementation, effectiveness and efficiency. The SDGs presented in this paper that most commonly benefit from the EO-ML synergy are SDG 11, 15 and 9, and the most common fields are Agriculture, Land Cover and Pollution.

Conclusions

Sustainability is an unavoidable aspect for the development of societies and countries; it leads to the development of SDGs and, hence, is crucial to the future of the planet. SDGs are unique as they cover issues that affect all communities and reaffirm the international commitment to eradicate poverty, hunger and inequalities to build a more sustainable, prosperous and safer planet for all humanity.

This paper highlights the importance of monitoring SD by means of EO and ML and emphasises their fundamental role in pursuing those goals. The monitoring of aspects related to SD, such as poverty, nutrition, health conditions and inequalities, has benefited from EO data collection methods. EO is possibly the most cost-effective technology, as it is able to provide data at a global level, thereby enabling a global perspective of the SDGs. EO data play a critical role in promoting equity among developed and developing countries, since they grant worldwide data access regardless of a country's level of development. EO data analysis, which often involves identifying features of interest within large amounts of information (Classification, Clustering, Regression or Dimension Reduction problems), becomes even more powerful through the application of ML methods using different methodologies such as Empirical and Semi-Empirical modelling, Sub-PB, PB, Super-PB or even OB techniques.

This extensive review looked at different ML categories for handling EO data to tackle different SDGs. It can be concluded that all ML categories can contribute to a wide variety of SDGs and fields: the Classification category covers SDGs 2, 6, 8, 11, 13, 14 and 15, and fields such as Agriculture, Land Use and Forests; the Clustering category covers SDGs 2, 7, 9, 11, 13, 14, 15 and 17, and fields such as Construction, Natural Disasters and Renewable Energy; the Regression category covers SDGs 2, 3, 6, 7, 9, 11, 13, 14 and 15, and the fields Water Quality, Pollution and Freshwater; and the Dimension Reduction category covers SDGs 3, 6, 7, 9, 11, 13 and 15, and the fields Land Cover, Electricity and Software.

Thus, the overall findings confirm the significance of EO and ML in pursuing the goals of SD providing an overview of methods and techniques that sustain the achievement of SDGs. Lastly, the applicability and efficiency of specific ML methods used to analyse EO data, such as Random Forest (RF), Support Vector Machine (SVM) and Neural Network (NN), should be further explored to sustain a more consensual and reliable development/improvement of tools to support SDGs.