1 Introduction

Soil organic carbon (SOC) plays a pivotal role in the Earth's life support system. As one of the most crucial soil constituents, SOC is not only a key component of the global carbon (C) cycle (Vitousek et al. 1997; Korner 2000) but also a cornerstone of soil quality (Mathers and Xu 2003; Srivastava et al. 2015; Guo et al. 2018). The dynamic balance of SOC directly affects soil fertility and structure (Jiang and Zhang 2016; Dong et al. 2020), thereby influencing plant growth (Mayer et al. 2020) and C sequestration (Scialabba and Muller-Lindenlauf 2010; Minasny et al. 2017). Moreover, SOC plays an irreplaceable role in mitigating global warming (Paustian et al. 2016; Soussana et al. 2019) and enhancing biodiversity (Burle et al. 1997; Scialabba and Muller-Lindenlauf 2010).

Accurately estimating SOC and its spatial and temporal variability has become a critical issue in soil science and global C cycle research (Angelopoulou et al. 2019; Li et al. 2023). Traditional SOC estimation methods rely on extensive field sampling and laboratory analyses (Huang et al. 2023); although highly accurate, they are time-consuming, costly, and labor-intensive, and they are difficult to apply in large-scale studies (Stevens et al. 2008; Guo et al. 2019). Remote sensing (RS) provides spatially continuous surface information, including vegetation cover, land use type, and surface temperature, while machine learning (ML) can process large datasets and decipher complex relationships between SOC and environmental factors (Angelopoulou et al. 2019; Li et al. 2023). By analyzing variables such as surface albedo (Denis et al. 2014), vegetation indices (e.g., the Normalized Difference Vegetation Index, NDVI, and the Enhanced Vegetation Index, EVI) (Dvorakova et al. 2020), and land use type (McSherry and Ritchie 2013; Bao et al. 2021) with advanced data processing algorithms (Li et al. 2023), these techniques can provide rapid SOC estimates of comparable accuracy at large spatial scales (Angelopoulou et al. 2019; Odebiri et al. 2021), significantly advancing SOC research. Their greatest advantage lies in large-scale studies and in regions where direct observation is challenging.
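To make the vegetation indices mentioned above concrete, the following minimal Python sketch computes NDVI and EVI from surface-reflectance bands. It assumes reflectances scaled to 0–1 and uses the widely cited MODIS EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1). The function names are illustrative, and mapping the inputs to a particular sensor (for Sentinel-2, commonly bands B8, B4, and B2 for NIR, red, and blue) is left to the user.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Inputs are surface reflectances (0-1), as scalars or arrays of equal shape.
    Pixels where NIR + Red == 0 give inf/NaN and should be masked beforehand.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients:
    EVI = G * (NIR - Red) / (NIR + C1 * Red - C2 * Blue + L).
    """
    nir, red, blue = (np.asarray(b, dtype=float) for b in (nir, red, blue))
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


# Example: a vegetated pixel and a sparsely vegetated (bare-soil) pixel.
nir = np.array([0.40, 0.30])
red = np.array([0.10, 0.25])
blue = np.array([0.05, 0.12])
print(ndvi(nir, red))  # approximately [0.6, 0.091]
print(evi(nir, red, blue))
```

Either index can then be stacked with other covariates, such as land use or terrain, as predictors in an ML model of SOC.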

The integration of RS and ML for SOC estimation has attracted significant research attention (Li et al. 2018, 2023; Angelopoulou et al. 2019; Pricope et al. 2019; Xiao et al. 2019; Duan et al. 2020; Yuzugullu et al. 2020; Zhu et al. 2020; Wang et al. 2022; Mashala et al. 2023; Zhou et al. 2023). These studies involve various RS data sources, ML models, and application scenarios, and they have substantially improved the efficiency and accuracy of SOC estimation. However, the rapid growth of such studies has led to information overload, underscoring the urgent need to systematically organize and analyze the existing literature. Among the available approaches, bibliometric analysis stands out as a powerful tool for quantitatively analyzing large bodies of literature. It can also track the application and effectiveness of RS and ML in SOC estimation and capture the evolution of their integration (Liu et al. 2020; Li et al. 2021, 2022a).

This research employs a bibliometric analysis methodology to systematically review and synthesize information on the integration of RS and ML in SOC estimation. The objective was to establish a comprehensive research framework that delineates the scientific issues, research advances, and key technologies in the field. Our analysis encompasses the evolution of research themes, the co-occurrence network of keywords, and the prominent research institutions and scholars in the field, with the aim of elucidating the knowledge structure and academic collaboration network of SOC estimation research. Furthermore, we assess the applicability and accuracy of RS and ML techniques for SOC estimation across diverse environmental conditions and evaluate their potential for application in different regions of the world. We aim to provide a valuable resource for researchers and policymakers in the field of SOC estimation, to promote academic dialogue and technological innovation, and to provide a scientific foundation for investigations into the global C cycle and ecological environmental protection.

2 Materials and methodology

2.1 Data source and search strategies

A systematic approach was adopted to comprehensively explore the field of SOC estimation using RS and ML. Data were retrieved from the Web of Science Core Collection for the period 1980 to 2023, covering English-language publications up to the latest relevant research. The search focused on original research articles and review papers. Keywords such as "soil organic carbon", "remote sensing", and "machine learning" were used to retrieve literature on the application of RS and ML to SOC. Detailed information on the selected literature was then exported in plain-text format for subsequent data analysis and reference management (Fig. 1).
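The exported plain-text records can be parsed and screened programmatically. The Python sketch below is illustrative rather than the pipeline used in this study: it assumes the usual Web of Science tagged layout (two-letter field tags such as TI, LA, DT, DE, and PY; three-space continuation lines; ER and EF terminators), and `within_scope` is a hypothetical filter that mirrors the stated criteria (1980–2023, English, articles and reviews).

```python
def parse_wos_plaintext(text):
    """Parse a Web of Science plain-text ("tagged") export into record dicts.

    Assumed layout: each field starts with a two-letter tag and a space
    (e.g. "TI", "PY", "DE"); continuation lines start with three spaces;
    "ER" ends a record and "EF" ends the file. Values are lists of lines.
    """
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("   "):
            if tag is not None:  # continuation of the previous field
                current[tag].append(line.strip())
            continue
        head, _, value = line.partition(" ")
        if head == "ER":  # end of record
            records.append(current)
            current, tag = {}, None
        elif head in ("FN", "VR", "EF"):  # file header/footer, no record data
            tag = None
        else:
            tag = head
            current.setdefault(tag, []).append(value.strip())
    return records


def author_keywords(record):
    """Lower-cased author keywords (DE field), which are separated by ';'."""
    joined = " ".join(record.get("DE", []))
    return [kw.strip().lower() for kw in joined.split(";") if kw.strip()]


def within_scope(record, first_year=1980, last_year=2023):
    """Hypothetical scope filter: English articles/reviews from 1980-2023."""
    year = int(record.get("PY", ["0"])[0] or 0)
    language = record.get("LA", [""])[0]
    doc_types = {t.strip() for t in record.get("DT", [""])[0].split(";")}
    return (first_year <= year <= last_year
            and language == "English"
            and bool(doc_types & {"Article", "Review"}))
```

In R, the bibliometrix function `convert2df(file, dbsource = "wos", format = "plaintext")` performs a comparable conversion into a data frame.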

Fig. 1

Methodological framework of the study

2.2 Data processing and visualization methods

A combination of tools was used for data analysis and visualization (Fig. 1). First, R (via RStudio) was employed for bibliometric analysis (Verma et al. 2023), including publication counts, keyword analysis, and collaboration network exploration. This approach provided quantitative insights into research patterns and scholarly connections (Li et al. 2022a, 2022b). In addition, VOSviewer was used to construct keyword co-occurrence networks, which reveal and visualize the relationships between keywords (van Eck and Waltman 2010) and thus help identify prevalent research themes and their interconnections. Keyword co-occurrence analysis also provides insights into the historical development and emerging trends in the field (Bin et al. 2021; Li et al. 2021).
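As a rough illustration of what the co-occurrence network encodes, the sketch below counts keyword occurrences, pairwise co-occurrences, and total link strength. It follows our understanding of VOSviewer's full-counting convention, in which the link strength between two keywords is the number of publications containing both and a keyword's total link strength is the sum of its link strengths; it is not VOSviewer's code, and the example keywords are made up. Lower-casing merges simple spelling variants, whereas synonyms (e.g., "Sentinel-2" and "Sentinel-2a") would need a thesaurus.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Keyword occurrences and pairwise co-occurrence counts (full counting).

    docs: iterable of author-keyword lists, one list per publication.
    Returns (occurrences, links); links[(a, b)] with a < b is the number of
    publications in which keywords a and b both appear.
    """
    occurrences, links = Counter(), Counter()
    for keywords in docs:
        unique = sorted({k.strip().lower() for k in keywords})
        occurrences.update(unique)
        for a, b in combinations(unique, 2):
            links[(a, b)] += 1
    return occurrences, links


def total_link_strength(links):
    """Total link strength of each keyword: the sum of its link weights."""
    tls = Counter()
    for (a, b), weight in links.items():
        tls[a] += weight
        tls[b] += weight
    return tls


docs = [
    ["digital soil mapping", "random forest", "Sentinel-2"],
    ["digital soil mapping", "climate change"],
    ["Digital soil mapping", "random forest"],
]
occurrences, links = cooccurrence(docs)
tls = total_link_strength(links)
print(occurrences.most_common(2))  # [('digital soil mapping', 3), ('random forest', 2)]
print(tls["digital soil mapping"])  # 4
```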

2.3 Soil data acquisition and pre-processing

The data acquisition process involved identifying and extracting relevant publications from the Web of Science database using the specified keywords. Studies from diverse geographical locations and environmental conditions were included to ensure comprehensive data collection. Publications were filtered for relevance to SOC estimation using RS and ML, and duplicates were removed. The data were standardized to maintain consistency across studies, including conversion to common units and normalization. Missing data were addressed using imputation techniques to obtain a complete dataset for analysis.
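The exact cleaning rules are not detailed here, so the following Python sketch only illustrates the three steps named above under explicit assumptions: duplicates are detected by DOI or by normalized title, a hypothetical SOC field reported in % or mg/g is converted to g/kg (1 % = 10 g/kg), and remaining gaps are filled with the median of the observed values. The field names (`doi`, `title`, `soc`, `soc_unit`) are hypothetical.

```python
import re
import statistics

# Hypothetical conversion factors to g/kg (1 % = 10 g/kg; 1 mg/g = 1 g/kg).
TO_G_PER_KG = {"g/kg": 1.0, "mg/g": 1.0, "%": 10.0}


def normalize_title(title):
    """Lower-case a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Drop repeated publications, matching on DOI or on normalized title."""
    seen, unique = set(), []
    for rec in records:
        keys = set()
        title = normalize_title(rec.get("title", ""))
        if title:
            keys.add(("title", title))
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        if keys & seen:  # already kept a record with this DOI or title
            continue
        seen |= keys
        unique.append(rec)
    return unique


def standardize_soc(records, field="soc", unit_field="soc_unit"):
    """Convert a reported SOC value to g/kg; missing values stay None."""
    out = []
    for rec in records:
        rec = dict(rec)
        if rec.get(field) is not None:
            rec[field] = rec[field] * TO_G_PER_KG[rec[unit_field]]
        rec[unit_field] = "g/kg"
        out.append(rec)
    return out


def impute_median(records, field="soc"):
    """Fill missing values of `field` with the median of observed values."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = statistics.median(observed)
    return [dict(r, **{field: fill}) if r.get(field) is None else r for r in records]
```

Median imputation is only the simplest option; model-based imputation may be preferable when many values are missing.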

2.4 Consideration of different study areas and terrain

Although our analysis did not involve primary data collection, we considered variations in environmental and geomorphological factors by including studies from regions with diverse topographic features. This approach allowed us to account for how these factors affect SOC variability. We also categorized environmental factors such as climatic zones, land use types, and vegetation cover to analyze their impact on SOC estimates. The selection of RS data sources in these studies was tailored to specific requirements, with satellite data for large-scale patterns and high-resolution airborne and UAV data for detailed local studies.

3 Results

3.1 General statistical information

The annual publication trend shows a yearly increase in SOC research output. Since 1991, the number of studies evaluating SOC using RS has increased steadily, reaching 203 publications in 2023. Studies combining RS with ML remain fewer, although their number rose to 99 in 2023. This trend suggests that combining RS and ML opens promising new directions for SOC research, offering additional analytical options and enabling SOC estimation at large scales (Fig. 2).

Fig. 2

Temporal trend of SOC research using remote sensing (RS) and machine learning (ML) techniques (status November 2023)

3.2 Summary of RS and ML techniques for SOC estimation

Tables 1 and 2 summarize the diverse RS and ML techniques applied to SOC estimation; many of these methods show potential for the future, although this has yet to be demonstrated. This overview underscores the evolving nature of SOC research and the increasing importance of interdisciplinary approaches combining RS and ML.

Table 1 Remote sensing (RS) techniques used for SOC estimation
Table 2 Machine learning (ML) methods used for SOC estimation with RS techniques

3.3 Estimation of SOC using RS techniques

3.3.1 Basic bibliometric information

Research on SOC estimation with RS spans more than 120 countries, with North America, China, and Europe as the predominant contributors. Among the top 10 institutions by publication volume, the output of the United States Department of Agriculture (USDA) grew from 1 to 66 articles between 1992 and 2023, and that of the University of Chinese Academy of Sciences (UCAS) rose from 1 to 117 articles between 2007 and 2023. Similarly, the Centre National de la Recherche Scientifique (CNRS) and IRD showed consistent increases, reaching 59 and 60 articles, respectively, in 2023. Among Web of Science categories, Environmental Sciences leads with 718 articles, followed by Geosciences Multidisciplinary (395 articles) and Soil Science (363 articles), emphasizing the role of RS and ML in SOC studies. Remote Sensing and Imaging Science & Photographic Technology also stand out with 325 and 264 records, respectively, alongside Ecology (165 articles) and Water Resources (116 articles). Other significant categories include Environmental Studies, Agronomy, and Meteorology & Atmospheric Sciences, with 86, 83, and 89 articles, respectively, underlining the interdisciplinary nature of SOC research. Among the top 10 journals, Remote Sensing is at the forefront with 165 articles, followed by Geoderma (124 articles) and Science of the Total Environment (63 articles). Notably, the journal distribution was highly skewed: output was concentrated in a small set of core journals, whereas 66.7% of journals published only a single SOC-related article, reflecting a focused core with a diverse long tail (Fig. 3).
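The 66.7% figure is a singleton share: the fraction of journals that contributed exactly one article. The short Python sketch below, using made-up counts rather than the study's data, shows how such a share and the share of articles held by the top journals can be computed; together, the two numbers describe a skewed distribution with a concentrated core and a long tail.

```python
from collections import Counter


def journal_concentration(sources, top_k=3):
    """Summarize how publications are spread across journals.

    sources: one journal name per publication.
    Returns (share of journals with exactly one article,
             share of all articles published by the top_k journals).
    """
    counts = Counter(sources)
    singletons = sum(1 for n in counts.values() if n == 1)
    top_articles = sum(n for _, n in counts.most_common(top_k))
    return singletons / len(counts), top_articles / sum(counts.values())


# Toy example (made-up counts, not the study's data).
sources = (["Remote Sensing"] * 4 + ["Geoderma"] * 3
           + ["Catena", "Sustainability", "Land", "Soil Systems"])
single_share, top_share = journal_concentration(sources, top_k=2)
print(f"{single_share:.1%} of journals have one article; "
      f"top 2 journals hold {top_share:.1%} of articles")
```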

Fig. 3

Inter-country cooperation networks of research articles (a); trends in the dynamics of the top 10 research organizations (b); distribution of the top 10 disciplinary research categories (c); and distribution of the top research journals (d)

3.3.2 Author keyword co-occurrence of an integrated SOC estimation study using different RS techniques

The keyword co-occurrence analysis of satellite RS research for SOC estimation reveals a network of interconnected themes, weighted by keyword occurrence and link strength. Central to this network are keywords such as "carbon sequestration", "climate change", and "digital soil mapping", which combine high occurrence frequencies with high link strengths, reflecting their foundational role in the field. "Carbon sequestration" has a link strength of 33, signifying its importance in the broader context of climate change mitigation. "Climate change", with a link strength of 65, underscores the close connection between satellite RS studies and the global dialogue on environmental sustainability. This is further reinforced by the high link strengths and occurrences of related terms such as "carbon cycle" and "carbon stocks", which highlight the detailed exploration of C dynamics enabled by satellite data. Newer platforms such as "Sentinel-2" and "Sentinel-2a" show a recent increase in research engagement, with link strengths of 36 and 16, respectively, reflecting advances in satellite imagery. The integration of "deep learning" (link strength 15) into satellite-based SOC estimation exemplifies the field's progression towards more complex analytical frameworks for interpreting satellite observations. Moreover, the link strengths of "GIS" (20) and "Google Earth Engine" (33) indicate that geospatial technologies are widely used to synthesize and analyze satellite data, underscoring their importance in contemporary SOC studies.
The emphasis on "digital soil mapping", which has the highest link strength in the dataset (108), points to a methodological shift in which traditional soil mapping converges with advanced satellite data analytics. The inclusion of ecosystem-specific keywords such as "agriculture", "biodiversity", and "peatland", with substantial link strengths, shows the breadth of satellite RS applications in SOC research, from agricultural landscapes to conservation areas. This diversity of ecological contexts in the keyword network underscores the contribution of satellite RS to understanding and managing SOC across ecosystems (Fig. 4a).

Fig. 4

Author keywords co-occurrence of three different RS techniques. Satellite-based (a), Airborne (b), Unmanned Aerial Vehicle (c)

The keyword co-occurrence analysis also underscores the substantial role of aircraft-based RS in SOC estimation. Among the six clusters identified in the network, aircraft-related keywords such as "airborne hyperspectral imagery" and "airborne" show notable link strengths and occurrence frequencies, highlighting the specialized yet prominent position of aircraft-based sensing in SOC estimation research. The occurrence and centrality of these keywords indicate robust use of airborne platforms for SOC mapping, demonstrating the technology's capacity for detailed and accurate SOC analysis. For instance, "airborne hyperspectral imaging" (link strength 6) and "airborne" (link strength 9), together with recent average publication years (circa 2019), suggest strong research interest in these methods for in-depth soil analysis. Moreover, the links between these keywords and technical terms such as "hyperspectral data" and analytical methods such as "feature selection" reflect the integration of complex data processing and ML techniques in recent studies. The progression towards high-resolution imaging and sophisticated algorithms indicates a growing effort to improve SOC estimation from aircraft RS data (Fig. 4b).

The keyword co-occurrence analysis for the use of Unmanned Aerial Vehicles (UAVs) in SOC estimation reveals a complex network of research themes and the pivotal role of UAV technology in this domain. "UAV" has a total link strength of 9 and 11 occurrences, identifying UAVs as a critical tool for high-resolution data collection in precision agriculture, and its recent average publication year (2021) indicates growing interest. The co-occurrence of "UAV" with terms such as "hyperspectral imagery" and "machine learning" underscores the trend of combining UAV-collected data with advanced analytical techniques to improve SOC estimation. The presence of "machine learning algorithms" alongside UAV-related keywords points to an increasing reliance on sophisticated methods to interpret the complex datasets gathered by UAVs. Moreover, "drone", with a total link strength of 5 and an average publication year of 2021, reinforces the growing adoption of drones in SOC studies. The term "precision agriculture" aligns with this trend, emphasizing UAVs' contribution to more accurate, management-oriented research in agricultural contexts. Keywords such as "Sentinel-2" and "vegetation indices" are associated with more recent studies, with average publication years of 2022 and 2021, respectively. This suggests that UAVs are used together with satellite imagery and established RS indices to provide a multi-scale perspective on soil characteristics (Fig. 4c).

3.4 Integrated application of RS and ML in SOC estimation

3.4.1 Comprehensive bibliometric analysis

Research on SOC estimation using RS and ML techniques is geographically diverse but skewed toward a limited number of countries. The USA and China are the leading contributors, with 200 and 784 articles, respectively, by 2023. Germany, Australia, Iran, Brazil, France, Canada, the Czech Republic, and India also contribute, with research volumes increasing over the years (Fig. 5a). The analysis reveals a global increase in SOC research, with notable variation among institutions. While Eberhard Karls University of Tubingen has shown limited publication activity, organizations such as the Helmholtz Association and INRAE have substantially increased their output, from 2 articles in 2016 to 22 in 2023 and from 9 articles in 2019 to 29 in 2023, respectively. Similarly, Chinese institutions, notably the Institute of Geographic Sciences and Natural Resources Research, CAS, increased their output from 15 to 37 articles between 2020 and 2023. The United States Department of Agriculture (USDA) and Ohio State University have also contributed consistently. These trends reflect a diverse and intensifying global commitment to the field, underscoring the expanding scope and significance of this area of study (Fig. 5b).

Fig. 5

The countries distribution of articles published (a), trends in the dynamics of the top 10 research organizations (b); distribution of the top 10 disciplinary research categories (c); distribution of the top research journals (d)

The research spans many Web of Science categories, reflecting its interdisciplinary reach. Environmental Sciences (201 articles) leads, followed by Soil Science (135) and Remote Sensing (132), reflecting a blend of fundamental and advanced observational research. Geosciences, Imaging Technology, Water Resources, Environmental Studies, and Agronomy also contribute notably, signifying the field's broad relevance to resource management and sustainability. The diversity of these categories highlights the field's evolving focus on key environmental and agricultural challenges (Fig. 5c). The top journals for SOC estimation using RS and ML are Remote Sensing (76 articles), Geoderma (45), and Science of the Total Environment (30), followed by Catena and Spectroscopy and Spectral Analysis with substantial contributions. Other key journals, such as Sustainability and Geoderma Regional, each published 11 articles, reflecting the field's dynamic research scope (Fig. 5d).

3.4.2 Map of research themes and analysis of theme evolution with changes in integrated research on SOC estimation incorporating RS and ML techniques

The thematic map of author keywords in SOC and RS research places topics in four quadrants according to their degree of development (density) and relevance (centrality) (Fig. 6a). "Niche Themes", such as partial least squares regression, geographically weighted regression, and hyperspectral imaging, are well developed but less central, indicating specialization within the field. "Motor Themes", such as climate change, C, and NDVI, are both highly developed and central, suggesting that they are key drivers of the research area. "Emerging or Declining Themes" include soil organic matter, permafrost, and organic C, which are both weakly developed and peripheral, potentially indicating new areas of research or declining interest. Lastly, "Basic Themes" are highly central but less developed and encompass fundamental topics such as land use, C, land degradation, and data from Sentinel-2, Landsat, and MODIS. These themes form the foundation that ties the field together and may signal areas ripe for further investigation.
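The quadrant logic can be sketched as follows. Thematic maps produced by tools such as bibliometrix are based on Callon's centrality (the strength of a cluster's links to other clusters) and density (the strength of links within a cluster). The Python version below simplifies both to raw co-occurrence weights and splits the quadrants at the medians; these are assumptions rather than any package's exact formulas. The toy weights are invented so that each quadrant is populated with themes named in the text.

```python
from collections import defaultdict
from statistics import median


def theme_metrics(links, cluster_of):
    """Simplified Callon-style centrality and density per keyword cluster.

    links: dict (kw_a, kw_b) -> co-occurrence weight.
    cluster_of: dict keyword -> cluster label.
    centrality = total weight of links from the cluster to other clusters;
    density    = total weight of links inside the cluster / cluster size.
    """
    size = defaultdict(int)
    for label in cluster_of.values():
        size[label] += 1
    external, internal = defaultdict(float), defaultdict(float)
    for (a, b), weight in links.items():
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            internal[ca] += weight
        else:
            external[ca] += weight
            external[cb] += weight
    return {c: (external[c], internal[c] / size[c]) for c in size}


def quadrant(centrality, density, c_split, d_split):
    """Map a theme to a thematic-map quadrant."""
    central, developed = centrality >= c_split, density >= d_split
    if central and developed:
        return "Motor"
    if developed:
        return "Niche"
    if central:
        return "Basic"
    return "Emerging or Declining"


def thematic_map(links, cluster_of):
    """Assign each cluster to a quadrant using median splits (an assumption)."""
    metrics = theme_metrics(links, cluster_of)
    c_split = median(cen for cen, _ in metrics.values())
    d_split = median(den for _, den in metrics.values())
    return {c: quadrant(cen, den, c_split, d_split) for c, (cen, den) in metrics.items()}


# Toy clusters and weights (invented for illustration).
cluster_of = {
    "climate change": "climate change", "ndvi": "climate change",
    "plsr": "hyperspectral imaging", "hyperspectral imaging": "hyperspectral imaging",
    "land use": "land use", "landsat": "land use",
    "permafrost": "permafrost", "organic carbon": "permafrost",
}
links = {
    ("climate change", "ndvi"): 10,
    ("hyperspectral imaging", "plsr"): 8,
    ("land use", "landsat"): 2,
    ("organic carbon", "permafrost"): 1,
    ("climate change", "land use"): 6,
    ("hyperspectral imaging", "permafrost"): 1,
}
print(thematic_map(links, cluster_of))
```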

Fig. 6

Relevance and development degree for various research (a); thematic evolution of research topics in SOC and RS from 1991 to 2023 (b)

The thematic evolution analysis from 1991 to 2023 indicates a dynamic shift in research focus within the SOC and RS field (Fig. 6b). The first period, 1991 to 2014, represents the foundational phase, in which the emphasis was on establishing the basics of RS, as reflected by prevalent keywords such as "remote sensing" and "digital soil mapping". This phase laid the groundwork for later advances. The period from 2015 to 2018 marked a significant shift, with attention turning towards "carbon sequestration" and the integration of "big data". This transition indicates a deeper analytical approach in SOC studies that draws on complex datasets and analytical methods. In 2019 and 2020, themes such as "SOC stocks", "carbon sink", and "erosion" gained prominence, suggesting growing concern about the impact of environmental change on C dynamics. The analysis for 2021 to 2022 shows a focus on "remote sensing" and "organic carbon", alongside advanced modeling techniques such as "random forest models", highlighting technological advances in data processing and predictive modeling. Lastly, the themes for 2023 reveal a continued emphasis on advanced RS technologies, such as "Sentinel-2A", and on modeling methods. This indicates a trend towards more precise and nuanced studies of SOC sequestration, driven by technological innovation and a refined understanding of SOC dynamics.
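Thematic evolution links themes in consecutive time slices by their keyword overlap. The Python sketch below assumes the cut years implied by the periods discussed above (2014, 2018, 2020, and 2022) and uses the inclusion index, |A ∩ B| / min(|A|, |B|), which is one of the overlap measures available in tools such as bibliometrix. The theme contents and the 0.1 threshold are hypothetical.

```python
def period_label(year, cuts=(2014, 2018, 2020, 2022), start=1991, end=2023):
    """Assign a publication year to a time slice defined by cut years."""
    lower = start
    for cut in cuts:
        if year <= cut:
            return f"{lower}-{cut}"
        lower = cut + 1
    return str(end) if lower == end else f"{lower}-{end}"


def inclusion_index(theme_a, theme_b):
    """|A intersect B| / min(|A|, |B|) for two themes given as keyword collections."""
    a, b = set(theme_a), set(theme_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def evolution_links(earlier, later, threshold=0.1):
    """Link themes in consecutive periods whose inclusion index reaches threshold.

    earlier, later: dicts mapping theme name -> list of keywords in that theme.
    """
    links = []
    for name_a, kws_a in earlier.items():
        for name_b, kws_b in later.items():
            score = inclusion_index(kws_a, kws_b)
            if score >= threshold:
                links.append((name_a, name_b, score))
    return links


# Hypothetical themes for two consecutive periods.
period_1991_2014 = {
    "digital soil mapping": ["remote sensing", "digital soil mapping", "soil organic carbon"],
}
period_2015_2018 = {
    "carbon sequestration": ["carbon sequestration", "big data", "remote sensing"],
    "erosion": ["erosion", "soil loss"],
}
print(period_label(2016))  # 2015-2018
print(evolution_links(period_1991_2014, period_2015_2018))
```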

4 Discussion

4.1 Relationship between SOC assessment and RS

4.1.1 Development trends and contributions of sensors

The growth in publications over the years reflects an increasing demand for, and exploration of, new technologies. Climate change puts pressure on policymakers and farmers to find mitigation options and to cope with the impacts of a changing climate on production systems. This demand calls for fast and affordable solutions. While field measurements potentially provide the most accurate results, RS provides faster, cheaper, and large-scale solutions for estimating SOC (Angelopoulou et al. 2019; Li et al. 2023). Consequently, funding and research increasingly focus on RS, which is reflected in the growing number of publications.

Sensor development expands the options for estimating SOC by using more wavelengths, which can potentially increase accuracy. However, RS methods require calibration based on large, spatially distributed, and consistent datasets. Such datasets are available for the relatively older infrared (IR) methods, but not yet for the more recently developed spectral analysis sensors (Chabrillat et al. 2019). This is reflected in the number of publications on IR and spectral analysis.

Furthermore, the fleet of satellites available for RS is changing (Zhao et al. 2019). New sensors are available, but their calibration and general application take time. This is reflected in the increasing number of publications based on airborne RS missions. Airborne campaigns are costly and require good infrastructure to link the campaign with a robust network of ground-based measurements. Such approaches are often used in research to develop methods that can later be up-scaled to satellite measurements. This suggests that future approaches will most likely use multi- and hyperspectral measurements, whereas current methods are still based on infrared (Angelopoulou et al. 2019; Odebiri et al. 2021).

4.1.2 Advantages and disadvantages of different RS techniques

The three platforms (satellite, airborne, and UAV) operate at different scales, and the author keywords associated with each indicate the main purpose of work at that scale. As discussed above, satellite-based RS still relies largely on older methods, as indicated by the stronger representation of Landsat than of Sentinel-2. The large scale of satellite RS is also more relevant to climate change mitigation options and large-scale assessments, which is likewise reflected in the author keywords. Some author keywords overlap across platforms, but in the airborne-based publications, Sentinel-2, machine learning, and hyperspectral are more strongly represented, reflecting the development status of these approaches described above. In addition, small-scale key variables such as soil moisture are more strongly represented. For UAVs, the author keywords precision farming and rangeland indicate the main applications, while the presence of Sentinel-2 points to studies that use UAV data to develop approaches for later up-scaling.

Table 3 summarizes the advantages and disadvantages of the three RS techniques. Specifically, satellite-based RS provides extensive spatial coverage, long-term data consistency, and non-invasive data collection, making it invaluable for large-scale SOC estimation across diverse landscapes and ecosystems. It enables monitoring of SOC changes over time and reduces the cost and time of labor-intensive ground-based data collection (Xiao et al. 2019; Zhao et al. 2019; McCarty et al. 2020; Fernández-Guisuraga et al. 2021; Venter et al. 2021; Palmer et al. 2022; Sothe et al. 2022; Abdulraheem et al. 2023). However, satellite RS faces challenges such as limited spatial and temporal resolution, the need for rigorous calibration and validation with ground-based measurements, and variations in data availability depending on satellite missions and sensor specifications (Angelopoulou et al. 2019; Hauser et al. 2021; Mandal et al. 2022).

Table 3 An overview of the advantages and limitations of utilizing RS auxiliary platforms data for SOC estimates

Airborne RS offers high spatial and spectral resolution, essential for detailed SOC assessments (Angelopoulou et al. 2019). Manned aircraft can carry various sensors, including hyperspectral cameras, LiDAR, and thermal imaging devices, facilitating fine-scale analysis of SOC content across various land covers and soil types (Lu et al. 2020). These sensors enhance SOC assessments by analyzing soil properties and vegetation health (Khan et al. 2018; Mondejar et al. 2021). The rapid data collection capability of airborne RS is beneficial for large-scale and time-sensitive research, with some aircraft providing real-time data transmission (Uno et al. 2005). However, the high-quality data comes with significant costs for aircraft operation, sensor equipment, and data processing (Stevens et al. 2012) which must be balanced against budget constraints.

UAVs, or drones, offer several advantages for SOC mapping in precision agriculture, including high-resolution imagery for detailed mapping, flexibility in flight plans and sensor setups, and efficient coverage of large areas, which supports eco-friendly practices (Sott et al. 2021; Heil et al. 2022). They can carry diverse sensors, such as multispectral cameras, LiDAR, and thermal sensors, providing various data types for SOC assessment (Zhang et al. 2021; Zhou et al. 2023). UAVs can perform repeated flights to monitor SOC changes over time and offer a cost-effective alternative to traditional soil sampling by reducing labor costs (Heil et al. 2022). However, UAVs also have disadvantages, including limited flight range and coverage area compared with satellites and manned aircraft, regulatory constraints that can affect data collection, and the need for expertise in operation and data analysis (Fig. 7; Table 3).

Fig. 7

Different platforms used for SOC estimates at multiple scales and spatial resolutions

4.1.3 The temporal analysis of relevant topics

The temporal analysis of topics associated with the RS publications (Fig. 6) shows an increasing demand for SOC estimation options. Net-zero commitments by several states, combined with the prospect of achieving negative emissions by increasing SOC, have made SOC an attractive option. However, SOC estimation is expensive and labor-intensive, so cheaper options are required. This explains the recent push to use RS for estimating SOC and SOC changes. However, the increase in studies does not necessarily reflect an improvement of the method. With or without ML, the correlation between the measured wavelengths and SOC values is empirical and requires data covering all possible environmental conditions. Therefore, most publications do not improve the methodology but extend it to different conditions (Angelopoulou et al. 2019). Improving the methodology requires more standardized ground measurements, which are scarce.
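The point that empirical wavelength–SOC relations transfer poorly to unseen conditions can be illustrated with a deliberately simple simulation. In the Python sketch below, the reflectance model, its coefficients, and the moisture levels are all assumptions: soils darken with both SOC and moisture, a linear calibration is fitted on dry soils only, and it is then applied to dry and wet test soils. Because the model attributes the moisture-induced darkening to SOC, its error on wet soils is several times larger.

```python
import numpy as np


def simulate(n, moisture, rng):
    """Hypothetical soil reflectance: darker with more SOC and more moisture."""
    soc = rng.uniform(5, 40, n)  # g/kg
    reflectance = 0.45 - 0.006 * soc - 0.25 * moisture + rng.normal(0, 0.01, n)
    return reflectance, soc


def fit_linear(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return lambda x_new: slope * np.asarray(x_new) + intercept


def rmse(predicted, observed):
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


rng = np.random.default_rng(42)
r_train, soc_train = simulate(200, moisture=0.05, rng=rng)  # calibration: dry soils
r_dry, soc_dry = simulate(100, moisture=0.05, rng=rng)      # test: same conditions
r_wet, soc_wet = simulate(100, moisture=0.30, rng=rng)      # test: wetter soils

model = fit_linear(r_train, soc_train)
rmse_dry = rmse(model(r_dry), soc_dry)
rmse_wet = rmse(model(r_wet), soc_wet)
print(f"RMSE dry: {rmse_dry:.2f} g/kg, RMSE wet: {rmse_wet:.2f} g/kg")
```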

4.2 Relationship between SOC and ML and RS

4.2.1 Bibliometric trends in RS and ML techniques for SOC estimation

The bibliometric analysis demonstrates a significant growth trajectory in SOC research, propelled by advances in RS and ML. The United States and China have emerged as prolific contributors, owing in part to their practice of science-based policy (Minasny et al. 2017; Zhao et al. 2018). The rapid increase in publications from these countries, alongside notable contributions from Germany, Australia, Iran, Brazil, France, Canada, the Czech Republic, and India, underscores a global surge of interest and activity in SOC estimation. This trend signifies not only a growing recognition of the importance of SOC for understanding environmental dynamics but also a collective effort to address global challenges such as climate change and agricultural sustainability (Cao et al. 2017; Minasny et al. 2017). The geographical expansion of research output and the increasing number of publications thus suggest a maturing field that is gaining momentum across scientific communities.

The emergence of themes such as "SOC stocks", "carbon sink", and "erosion" around 2019 and 2020 marked an increased awareness of SOC's role in the environment, suggesting potential new avenues for research on environmental impact mitigation. The focus on advanced modeling techniques such as "random forest models" in 2021 and 2022 demonstrated a leap in technological application within the field, moving towards precise, predictive analyses. The anticipated continued emphasis on advanced RS technologies such as "Sentinel-2A" in 2023 indicates that the field is likely to pursue even more refined studies of SOC sequestration. This trend reflects technological progress and a strategic pivot towards leveraging these advances for detailed and accurate SOC assessments, which may have significant implications for environmental management and policymaking. In addition, soil mapping studies using RS platforms predominantly focus on 'static' spatial predictions of soil properties (Fig. 4). For large-scale SOC monitoring, however, dynamic information on SOC over extensive areas is crucial, including spatial distribution maps of SOC over time, which are essential for understanding temporal changes and trends (Li et al. 2024). Incorporating space–time estimates of SOC based on RS data and ML techniques is therefore highly important, as it provides insights that static models cannot offer and enhances the accuracy and applicability of SOC assessments.

4.2.2 Summary of spectral data analysis and ML methods

Satellite-based RS, with its extensive spatial and temporal coverage, provides a robust dataset for SOC estimation when combined with ML algorithms such as random forests and support vector machines (Fig. 7). These algorithms handle large datasets efficiently and can model complex relationships between SOC and spectral data, although satellite data are limited in spatial resolution and require ground-truth calibration (Xiao et al. 2019; Zhao et al. 2019; Abdulraheem et al. 2023). Airborne platforms equipped with hyperspectral sensors offer high-resolution data that are critical for fine-scale SOC mapping. ML techniques, including neural networks and deep learning, enhance the analysis by capturing intricate patterns in the spectral data and provide detailed insights into SOC distribution (Lu et al. 2020; Mondejar et al. 2021). However, the cost and complexity of data processing remain significant challenges. Unmanned aerial vehicles (UAVs) provide very high-resolution data and flexible data acquisition, making them well suited to precision agriculture. ML models, particularly convolutional neural networks, are adept at processing UAV data to reveal fine-scale SOC variability and temporal changes. The main limitations relate to flight range, regulatory constraints, and the need for expert operation and analysis (Zhang et al. 2021; Heil et al. 2022; Zhou et al. 2023), as shown in Fig. 7.
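As a minimal sketch of the satellite-plus-ML workflow summarized above, the Python example below derives NDVI from red and near-infrared reflectance and calibrates a random forest regressor against ground-truth SOC samples with 5-fold cross-validation, using scikit-learn's `RandomForestRegressor` and `cross_val_score`. The band values, the SWIR covariate, and the SOC response are synthetic and hypothetical, and the helper names `make_synthetic_samples` and `evaluate_rf` are illustrative only; a real study would replace them with sensor reflectance extracted at georeferenced sampling points and laboratory-measured SOC.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def make_synthetic_samples(n=300, seed=0):
    """Hypothetical ground-truth samples: band reflectances plus SOC (g/kg).

    The SOC response below is invented purely for illustration; it is not
    an empirical SOC model.
    """
    rng = np.random.default_rng(seed)
    red = rng.uniform(0.03, 0.15, n)
    nir = rng.uniform(0.20, 0.50, n)
    swir = rng.uniform(0.10, 0.35, n)
    v = ndvi(nir, red)
    # Nonlinear, made-up relationship plus simulated measurement noise.
    soc = 5.0 + 25.0 * v**2 - 10.0 * swir + rng.normal(0.0, 0.5, n)
    # Feature matrix: raw bands plus the derived spectral index.
    X = np.column_stack([red, nir, swir, v])
    return X, soc


def evaluate_rf(X, y, seed=0):
    """Mean 5-fold cross-validated R^2 of a random forest SOC model."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    return scores.mean()


if __name__ == "__main__":
    X, y = make_synthetic_samples()
    print(f"5-fold CV R^2: {evaluate_rf(X, y):.2f}")
```

Cross-validated R² is used here because satellite-based models depend on calibration against ground-truth samples. With real field data, spatial cross-validation (holding out whole sites or spatial blocks) would usually give a more realistic error estimate than the non-spatial folds shown.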

4.3 Enhancing SOC estimation: Trends, challenges, and policy directions

Propelled by advancements in RS and ML, there is a clear shift toward more sophisticated and precise methodologies. Soil is a complex natural entity: SOC content is unevenly distributed and its spatial variability is uncertain, which leads to spatial differences in crop yields across agricultural fields. At larger monitoring scales, spectral information is more strongly influenced by the complexity and diversity of natural environmental and topographic factors. Future research must not only continue to develop and refine these techniques but also extend its focus to underrepresented regions such as Africa and Latin America. First, this requires partnerships that facilitate data sharing and technology exchange, together with interdisciplinary collaboration: by integrating expertise from soil science, ecology, RS, and data science, researchers can build robust models that combine RS data with ground measurements. This collaborative approach could yield more precise SOC predictions and a deeper understanding of the environmental factors affecting soil. Second, local sampling schemes should be designed according to local ecological characteristics and available resources. Such expansion is crucial for achieving a comprehensive global understanding of SOC dynamics and for tailoring SOC estimation techniques to diverse ecological contexts. A limitation of the current research landscape, however, is potential bias arising from the overrepresentation of certain regions or institutions. As RS and ML technologies continue to evolve, they offer new opportunities for large-scale SOC management, providing a crucial connection between scientific research and practical, policy-driven solutions for environmental management.
In essence, the integration of new technological approaches with traditional soil science is paving the way for a future where detailed, accurate, fast, low-cost, and inclusive data not only drive scientific discovery but also inform policies for sustainable environmental stewardship.

5 Conclusion

In conclusion, this study uses bibliometrics to conduct a comprehensive and systematic analysis of RS and ML in SOC estimation and finds that the integration of RS and ML is key to precise, large-scale SOC estimation. Notable geographical imbalances in research output underscore the need for more inclusive research, especially in underrepresented areas, to enhance our global understanding of SOC estimation. Addressing these challenges through interdisciplinary collaboration and informed policy direction could drive advancements in sustainable land management and make significant contributions to global efforts in climate change mitigation and ecosystem management. The synergistic application of RS and ML technologies offers a promising direction for future research and could make SOC estimation a cornerstone of soil science and policy.