Introduction

One of the primary challenges of our time is ensuring an adequate food supply for the growing global population, despite diminishing arable land availability, scarcities in agricultural resources, and the looming threats posed by increasingly variable and extreme climatic conditions (Tao et al. 2006; Ombadi et al. 2023). Over the past six decades, monoculture intensification has gained widespread adoption worldwide, contributing significantly to meeting global food demands by focusing on four major crops: wheat, rice, soybean, and maize (Ryan 2021). However, this intensive monoculture approach has raised significant environmental concerns, including flora and fauna biodiversity losses (Foley et al. 2011), water and air pollution, as well as soil degradation and nutrient depletion (Tilman et al. 2001; Robertson and Swinton 2005).

Intercropping, the simultaneous cultivation of multiple crops in the same field, offers a promising opportunity to address these challenges. In comparison to monoculture, intercropping has demonstrated several advantages, including enhanced biodiversity, improved pollination, more effective pest and disease control, better nutrient cycling, increased soil fertility and soil health, and improved regulation of soil water (Ahmed et al. 2020a, 2021; Blessing et al. 2022; Chen et al. 2017, 2022; Li et al. 2007, 2021b; Wu et al. 2021). Notably, intercropping has been observed to yield higher levels of soil macro-aggregates (> 2 mm), organic matter, and total nitrogen compared to those of monoculture (Li et al. 2021a). Furthermore, positive effects on crop yield have been consistently observed in intercropping systems compared to monoculture (Zhao et al. 2022). Various types of intercropping techniques have been developed, including mixed intercropping, row intercropping, and strip intercropping (Fig. 1). Mixed intercropping involves cultivating multiple crops together in a single field without adhering to any specific row arrangement (Fig. 1, left). Row intercropping entails the regular cultivation of two or more crops arranged in rows (Fig. 1, centre). Strip intercropping involves growing two or more crops simultaneously in separate and distinct strips (Fig. 1, right). However, it is crucial to acknowledge that traditional intercropping practices have primarily been embraced by smallholder farmers or confined to specific regions with an emphasis on preserving soil health and environmental sustainability, rather than primarily focusing on augmenting crop yield to meet the demands of human consumption.

Fig. 1 Schematic of intercropping types used in current agriculture

Fortunately, a study by Li et al. (2020b) reported markedly high yield gains of 2.1 tons per hectare under intercropping compared to monoculture. This finding is particularly impressive considering that intercropping achieved an approximately 30-fold increase in crop yield compared to that observed under traditional monoculture (Ray et al. 2012). According to Li et al. (2020b), this intensive intercropping practice, predominantly observed in China with strip intercropping (Fig. 1 right), involves maize as the main crop planted with soybean or other crop species. Moreover, this finding presents a significant and promising strategy for addressing the global deficiency of legume crops, which have experienced a significant decline in cultivation regions over recent decades due to their inherent low and unstable yields (Blessing et al. 2022). This decline is particularly challenging in China, where meeting the protein needs of hundreds of millions of people has historically relied heavily on soybean imports, posing a significant risk to national food security in the event of disruptions in supplier countries' exports (Ren et al. 2021). Therefore, the adoption of intensive intercropping may help sustain current cereal production while providing additional soybean yield with reduced environmental impacts, benefiting not only China but also other countries facing challenges related to an increasing population and limited arable land. To address this issue, in this study, we further restrict this strategy to soybean and maize, both being food crops, and redefine intensive soybean/maize intercropping as strip intercropping, where maize and soybean species are cultivated together using large inputs of fertilizers and pesticides, with a mechanized approach replacing manual labour.

Indeed, previous studies have extensively demonstrated the yield advantage of soybean/maize intercropping (Wang et al. 2022; Li et al. 2022; Xu et al. 2020; Yu et al. 2016; Zhu et al. 2023; Tao et al. 2006). Furthermore, mechanisms, such as temporal niche differentiation, species complementarity, competitive dynamics, and additional biologically fixed nitrogen, have been proposed to explain these crop yield advantages in soybean/maize intercropping systems (Yu et al. 2015; Li et al. 2020a; Rodriguez et al. 2020). Factors such as fertilizer inputs, spatial arrangement (row ratios and distance), cropping sequences, strip intercropping patterns, planting time, and density have been tested to determine their influence on intercropped yield gain (Ahmed et al. 2020a; Blessing et al. 2022; Zeng et al. 2022; Zhang et al. 2015). However, the commonly used evaluation parameter, the land equivalent ratio (LER), is a dimensionless indicator that quantifies the ratio of the area under sole cropping to the area under intercropping required to achieve the same yield. While the LER provides relative information on the yield benefits, it falls short of offering absolute information, making it limited in its ability to accurately guide specific field intercropping practices. Thus, we assessed the absolute yield gain under intensive soybean/maize intercropping using the net effect (NE) of intercropped species production, which is defined as the overyielding under intercropping compared to monoculture of the crops per unit area (Li et al. 2020a). Additionally, we explored the same mechanisms driving LER crop yield gain within the NE framework.

To accurately guide local farmers' selection and optimize the utilization of the aforementioned intensive soybean/maize intercropping practice in China, where it is widely applied and has significant benefits, particularly in terms of soybean yield, we conducted a comprehensive evaluation of the NE and LER under intensive soybean/maize intercropping in China. This evaluation involved a rigorous meta-analysis of peer-reviewed literature, incorporating meta-analysis protocols. Furthermore, we estimated the potential NE and LER for each cell of Chinese cropland, thereby mapping the spatial distribution of potential yield gains from intensive soybean/maize intercropping in China. To accomplish this, we utilized a random forest (RF) model, a statistical and machine learning method known for its ability to accommodate highly correlated variables, complex nonlinear interactions, and predictors of varying types and ranges (Ransom et al. 2019; Krause et al. 2020).

In summary, our primary objectives were to (i) gain a quantitative overview of the absolute yield gain under intensive soybean/maize intercropping in China, (ii) analyse the mechanisms underlying such crop yield gains, and (iii) simulate the potential crop yield advantage of this system in a per-cell map of Chinese cropland while incorporating current natural resource and soil fertility information.

Materials and methods

Data collection

A comprehensive literature search was conducted to investigate the effects of intensive soybean and maize intercropping on crop yields in China. Databases such as Web of Science, Google Scholar, Springer Link, and China Knowledge Resource Integrated Database were utilized to search for relevant publications from 1980 to November 2022. The search terms included ‘Intercrop*’, ‘Maize or Corn’, ‘Soybean’, ‘Legume’, ‘Maize/soybean or soybean/maize intercropping’, and ‘Grain or yield’ in the article title, abstract, or keywords. Doctoral theses, master’s theses, and conference proceedings were excluded from the analysis. To be included, each study had to meet specific criteria: (1) the paper had to be published in English or Chinese; (2) only studies involving field experiments were considered, excluding those with experiments conducted in pots or greenhouses; (3) the study had to report at least one pair of soybean and maize crops under both intercropping and monoculture, with other crops (e.g., peanut, oilseed) allowed as long as they were mentioned in the same article; (4) soybean and maize had to be arranged in strip intercropping patterns with large inputs of fertilizer and pesticides, potentially with the use of machinery to represent typical intensive intercropping practices in China; (5) the experiment had to be replicated at least three times and conducted in mainland China; and (6) the management practices, authors, sampling years, geographic location (GPS coordinates), and experimental treatments had to be clearly described and easily accessible.

After completing the search and selection process, a total of 135 publications were included, comprising 88 in Chinese and 47 in English. From these publications, we collected 460 crop yield means under intercropping and monoculture systems, along with their standard deviation (SD) and replication numbers. The SD was calculated using \(SD = SE \times \sqrt{n}\) if only the standard error (SE) was reported. If only means were mentioned, the SD was estimated using the linear-regression method of Bracken (1992) in the “metagear” package of R software (SD values were missing for 35% of our dataset). Other relevant information, such as annual accumulated sunshine hours, accumulated temperature (> 10 ℃, Accum.Temp), the annual mean precipitation (Precip), and air temperature (Temp), was compiled from the original papers. The annual mean aridity index and humidity index (H.index) of the experimental sites were obtained from a third-party database using the site’s geographic coordinates. Soil properties, including total nitrogen (TN), total phosphorus (TP), total potassium (TK), available nitrogen (a.N), available phosphorus (a.P), available potassium (a.K), soil organic matter (SOM), bulk density (BD), pH, texture (Sand, Clay, Silt), and soil type, were extracted from original articles when available. Intercrop management practices, such as the delay in soybean sowing after maize (S.delay.days), the row ratio of maize and soybean (Row.ratio), row distances between maize and maize (MM.row.distance), maize and soybean (MS.row.distance), and soybean and soybean (SS.row.distance), and the seeding density of soybean (S.Inter.density) or maize (M.Inter.density) under intercropping or monoculture were also recorded. For clarity, all information and data from the published papers were compiled into a dataset called "Dataset1".
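The SE-to-SD conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `sd_from_se` and the example numbers are hypothetical, and the regression-based imputation used for studies that reported means only is not reproduced here.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert a reported standard error to a standard deviation: SD = SE * sqrt(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of replicates")
    return se * math.sqrt(n)


# Hypothetical study reporting SE = 0.5 Mg ha-1 with 4 replicates.
example_sd = sd_from_se(0.5, 4)
print(example_sd)
```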

In addition, basic data on China dryland soil (Dataset2) were collected from various sources to predict the intercropping benefit in specific fields. This dataset included natural factors of both local environments and soil properties. The China cropland map of 2018 at a resolution of 250 m2 served as the basic spatial GIS database, integrating other data such as sunshine hours, the aridity index, the humidity index, precipitation, air temperature, and plant available-accumulated temperature (> 10℃) as an average annual mean value from 2006 to 2015. Additional soil texture (Sand and Clay), total and available nutrient (TN, TP, TK, a.N, a.P, a.K), SOM, pH, and soil bulk density data from the second national soil census of China were also incorporated into the cropland map at the same resolution. All original databases used in Dataset2 were provided by the “Institute of Soil Science, Chinese Academy of Sciences” and processed using computer software such as ArcGIS 10.7 and Python 3.7.0.

Data analysis

The net effect (NE), which represents the absolute yield gains of soybean and maize under intercropping compared with monoculture of the crops, was calculated using the following equation (Li et al. 2020b):

$$\mathrm{NE}=\left({\mathrm{Y}}_{\mathrm{S}}+{\mathrm{Y}}_{\mathrm{M}}\right)-({\mathrm{EY}}_{\mathrm{S}}+{\mathrm{EY}}_{\mathrm{M}})$$
(1)

where \(\mathrm{Y}_{\mathrm{S}}\) and \(\mathrm{Y}_{\mathrm{M}}\) denote the average yields of soybean and maize under intercropping, respectively, while \(\mathrm{EY}_{\mathrm{S}}\) and \(\mathrm{EY}_{\mathrm{M}}\) represent the expected yields of these two crops. The expected yields can be calculated as follows:

$${\mathrm{EY}}_{\mathrm{S}}={\mathrm{M}}_{\mathrm{S}}\times {\mathrm{LS}}_{\mathrm{S}}$$
(2)
$${\mathrm{EY}}_{\mathrm{M}}={\mathrm{M}}_{\mathrm{M}}\times {\mathrm{LS}}_{\mathrm{M}}$$
(3)

where \(\mathrm{M}_{\mathrm{S}}\) and \(\mathrm{M}_{\mathrm{M}}\) refer to the yields per unit area of soybean and maize under monoculture, respectively. \(\mathrm{LS}_{\mathrm{S}}\) and \(\mathrm{LS}_{\mathrm{M}}\) denote the land shares of soybean and maize in the intercropping system. The land share is determined based on the densities of soybean and maize in both the intercropping and monoculture systems or based on the spatial arrangement of rows or plants.

Furthermore, land use efficiency, expressed as the land equivalent ratio (LER), is defined as the sum of the partial LERs (relative yields) of soybean (\(\mathrm{pLER}_{\mathrm{S}}\)) and maize (\(\mathrm{pLER}_{\mathrm{M}}\)):

$$\mathrm{LER}={\mathrm{pLER}}_{\mathrm{S}}+{\mathrm{pLER}}_{\mathrm{M}}=\frac{{Y}_{S}}{{M}_{S}}+\frac{{Y}_{M}}{{M}_{M}}$$
(4)

where \(Y_S\) and \(Y_M\) represent the yields per unit of total area in the intercrop, while \(M_S\) and \(M_M\) represent the yields of soybean and maize under monoculture (as mentioned above).
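As a worked illustration of Eqs. 1 to 4, the following Python sketch computes the NE and LER for a single soybean/maize comparison. The function names and all yield and land-share values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def net_effect(y_s, y_m, m_s, m_m, ls_s, ls_m):
    """Eq. 1: observed intercrop yield minus the expected yield (Eqs. 2 and 3)."""
    expected_s = m_s * ls_s  # Eq. 2: monoculture soybean yield x soybean land share
    expected_m = m_m * ls_m  # Eq. 3: monoculture maize yield x maize land share
    return (y_s + y_m) - (expected_s + expected_m)


def land_equivalent_ratio(y_s, y_m, m_s, m_m):
    """Eq. 4: sum of the partial LERs (relative yields) of soybean and maize."""
    return y_s / m_s + y_m / m_m


# Hypothetical yields in Mg ha-1 and equal land shares, for illustration only.
ne = net_effect(y_s=1.5, y_m=8.0, m_s=2.5, m_m=10.0, ls_s=0.5, ls_m=0.5)
ler = land_equivalent_ratio(y_s=1.5, y_m=8.0, m_s=2.5, m_m=10.0)
print(f"NE = {ne:.2f} Mg ha-1, LER = {ler:.2f}")
```

In this example both indicators point to an intercropping advantage (NE > 0 and LER > 1), but only the NE carries yield units, which is the distinction between absolute and relative information drawn in the Introduction.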

Meta-analysis

A meta-analysis was conducted to assess the overall effect of intercropping on the NE and LER at both the national scale and in major grain production regions. The effect size for NE was determined by calculating the raw mean difference (MD) between the actual yield and the expected yield (Viechtbauer 2010; Taylor 1997):

$$MD=Ma-Me=\left(Ys+Ym\right)-(EYs+EYm)$$
(5)
$${V}_{D}=\frac{{SD}_{a}^{2}}{{n}_{a}}+\frac{{SD}_{e}^{2}}{{n}_{e}}$$
(6)
$${SD}_{a}={SD}_{Ys}+{SD}_{Ym}$$
(7)
$${SD}_{e}={LS}_{s}\bullet {SD}_{EYs}+{LS}_{m}\bullet {SD}_{EYm}$$
(8)

where \(M_a\) and \(M_e\) represent the total actual yield and expected yield under intercropping, respectively. \(Y_s\), \(Y_m\), \(EY_s\), and \(EY_m\) denote the average yields of soybean and maize under intercropping and their average expected yields (as calculated in Eqs. 2 and 3). \(V_D\) refers to the variance of MD. \({SD}_{a}\) and \({SD}_{e}\) refer to the standard deviations of \(M_a\) and \(M_e\), respectively. \({SD}_{Ys}\) and \({SD}_{Ym}\) represent the standard deviations of soybean and maize yield under the intercropping system, while \({SD}_{EYs}\) and \({SD}_{EYm}\) represent the standard deviations of these yields under monoculture (Taylor 1997). \({LS}_{s}\) and \({LS}_{m}\) denote the land shares of soybean and maize in the intercropping system.
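The sampling variance of the mean difference can be traced step by step with a short Python sketch that transcribes Eqs. 6 to 8 as printed. The function name `md_variance` and the input values are hypothetical.

```python
def md_variance(sd_ys, sd_ym, sd_eys, sd_eym, ls_s, ls_m, n_a, n_e):
    """Variance of the raw mean difference, following Eqs. 6-8 as written."""
    sd_a = sd_ys + sd_ym                  # Eq. 7: SD of the actual intercrop yield
    sd_e = ls_s * sd_eys + ls_m * sd_eym  # Eq. 8: SD of the expected yield
    return sd_a ** 2 / n_a + sd_e ** 2 / n_e  # Eq. 6


# Hypothetical SDs (Mg ha-1), equal land shares, and 4 replicates per system.
v_d = md_variance(sd_ys=0.2, sd_ym=0.6, sd_eys=0.4, sd_eym=1.0,
                  ls_s=0.5, ls_m=0.5, n_a=4, n_e=4)
print(round(v_d, 4))
```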

To estimate the effect size of the LER, the natural logarithm response ratio (ln(LER)) was calculated as (Viechtbauer 2010; Taylor 1997):

$$\mathit{ln}\left(LER\right)=\mathit{ln}\left(\frac{{Y}_{s}}{{M}_{s}}+\frac{{Y}_{m}}{{M}_{m}}\right)$$
(9)
$$V\left(\mathit{ln}\left(LER\right)\right)=\frac{{pLER}_{s}^{2}}{p{LER}_{s}+p{LER}_{m}}{\left(\frac{SD\left({Y}_{s}\right)}{{Y}_{s}}+\frac{SD({M}_{s})}{{M}_{s}}\right)}^{2}+\frac{{pLER}_{m}^{2}}{p{LER}_{s}+p{LER}_{m}}{\left(\frac{SD\left({Y}_{m}\right)}{{Y}_{m}}+\frac{SD({M}_{m})}{{M}_{m}}\right)}^{2}$$
(10)

where \(Y_s\), \(Y_m\), \(M_s\), and \(M_m\) represent the average yields of soybean and maize in the intercropping system and under monoculture, respectively (as defined for Eqs. 2, 3, and 5). \(V(\ln(LER))\) represents the variance of ln(LER). \(pLER_s\) and \(pLER_m\) represent the partial LERs (relative yields) of soybean and maize, respectively. \(SD(Y_s)\), \(SD(Y_m)\), \(SD(M_s)\), and \(SD(M_m)\) represent the standard deviations of \(Y_s\), \(Y_m\), \(M_s\), and \(M_m\), respectively.
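For readers who want to follow Eqs. 9 and 10 numerically, the sketch below transcribes them term by term in Python. It is a literal reading of the printed formulas, not a re-derivation; the function name and the example values are hypothetical.

```python
import math


def ln_ler_and_variance(y_s, y_m, m_s, m_m, sd_ys, sd_ym, sd_ms, sd_mm):
    """Return (ln(LER), V(ln(LER))) following Eqs. 9 and 10 as printed."""
    pler_s = y_s / m_s  # partial LER of soybean
    pler_m = y_m / m_m  # partial LER of maize
    ler = pler_s + pler_m
    # Summed relative SDs of the intercrop and monoculture yields for each crop.
    rel_s = sd_ys / y_s + sd_ms / m_s
    rel_m = sd_ym / y_m + sd_mm / m_m
    variance = (pler_s ** 2 / ler) * rel_s ** 2 + (pler_m ** 2 / ler) * rel_m ** 2
    return math.log(ler), variance


# Hypothetical yields and SDs (Mg ha-1); every relative SD is 10%.
effect, variance = ln_ler_and_variance(
    y_s=1.5, y_m=8.0, m_s=2.5, m_m=10.0,
    sd_ys=0.15, sd_ym=0.8, sd_ms=0.25, sd_mm=1.0,
)
print(round(effect, 4), round(variance, 4))
```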

The comprehensive effect (mean effect size: R+) and its confidence interval were calculated using a random-effects model in R software (version 3.6.1) with the "metafor" package, following these equations (Viechtbauer 2010):

$${R}_{+}=\frac{{\sum }_{i=1}^{k}{W}_{i}{R}_{i}}{{\sum }_{i=1}^{k}{W}_{i}}$$
(11)
$${W}_{i}=\frac{1}{{\tau }^{2}+{V}_{Di}}\quad \mathrm{or}\quad {W}_{i}=\frac{1}{{\tau }^{2}+{V\left(\mathit{ln}\left(LER\right)\right)}_{i}}$$
(12)
$$\mathrm{S}=\sqrt{\frac{1}{{\sum }_{i=1}^{k}{W}_{i}}}$$
(13)
$$95\mathrm{\% CI}={R}_{+}\pm 1.96 S$$
(14)

In these equations, k represents the total number of studies, i denotes the ith study, \(R_i\) is the effect size (MD or ln(LER)) of the ith study, \(W_i\) denotes the weight of the ith study, \({\tau }^{2}\) is the between-study variance, S represents the standard deviation of \(R_+\), and the 95% CI denotes the confidence interval of \(R_+\). For easier interpretation, ln(LER) and its 95% CI were transformed into a relative percentage change (%) using the formula \((e^{R_+} - 1) \times 100\%\).
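The pooling in Eqs. 11 to 14 and the back-transformation of ln(LER) can be sketched as follows. This is a minimal Python illustration, not the "metafor" implementation: the between-study variance is passed in as a known value (the text does not state how it was estimated), and the function names and inputs are hypothetical.

```python
import math


def pool_random_effects(effects, variances, tau2):
    """Weighted mean effect, its S, and the 95% CI (Eqs. 11-14)."""
    weights = [1.0 / (tau2 + v) for v in variances]  # Eq. 12
    total = sum(weights)
    r_plus = sum(w * r for w, r in zip(weights, effects)) / total  # Eq. 11
    s = math.sqrt(1.0 / total)  # Eq. 13
    return r_plus, s, (r_plus - 1.96 * s, r_plus + 1.96 * s)  # Eq. 14


def percent_change(r_plus):
    """Back-transform a pooled ln(LER) into a relative percentage change."""
    return (math.exp(r_plus) - 1.0) * 100.0


# Two hypothetical studies with equal sampling variances and an assumed tau2.
r_plus, s, ci = pool_random_effects([0.2, 0.4], [0.01, 0.01], tau2=0.01)
print(round(r_plus, 3), round(s, 3), [round(x, 3) for x in ci])
print(round(percent_change(r_plus), 1))
```

Because the same back-transformation is applied to both CI bounds, the resulting percentage interval is not symmetric around the transformed mean.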

Random forest regression model

A random forest (RF) regression model was developed to assess the importance of influential factors and predict the distribution of the NE and LER in a per cell map of China for intercropping (Liu et al. 2019). The model-building and prediction process consisted of two major steps, which are described below.

First, Dataset1 was used as the training dataset (S) to create the RF models. The dataset included a set of M-dimensional vectors (X) representing the influence parameters and the corresponding target parameters (NE or LER) denoted as Y. The RF models were generated by randomly selecting ntree subsets (Sk) using the bootstrap resampling method to create regression tree models. At each node of the tree, mtry features were randomly selected, and split points for these features were explored to minimize the sum of squared errors between the estimated and real values. The optimal split variable (j) and split point (s) were determined by solving the following equation:

$$\underset{j,s}{\mathrm{min}}\left[\underset{{c}_{1}}{\mathrm{min}}\sum_{{y}_{i}\in {R}_{1}\left(j,s\right)}{\left({y}_{i}-{c}_{1}\right)}^{2}+\underset{{c}_{2}}{\mathrm{min}}\sum_{{y}_{i}\in {R}_{2}\left(j,s\right)}{\left({y}_{i}-{c}_{2}\right)}^{2}\right]$$
(15)

Here, R1(j, s) and R2(j, s) represent the resulting half-planes based on the split, and c1 and c2 are the average output values for datasets R1 and R2, respectively. This splitting process was repeated for each sub-node until a minimum node size (number of observations at a terminal node) was reached. The ensemble of all regression trees (hi(X)) yielded the final prediction model (RF) as the average of the individual tree outputs.
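The split criterion in Eq. 15 can be illustrated with a small exhaustive search in Python. This is a sketch under simplifying assumptions: it scans every feature passed to it (in a random forest only the mtry features sampled at the node would be scanned), it uses midpoints between consecutive distinct values as candidate split points, and the names `best_split` and `sum_squared_error` are hypothetical.

```python
def sum_squared_error(values):
    """Squared error around the mean, i.e. the inner minimum over c in Eq. 15."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (j, s, sse) minimising Eq. 15 over the features in X.

    X is a list of rows (lists of floats) and y the list of targets.
    Returns None if no feature has at least two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        distinct = sorted(set(row[j] for row in X))
        # Candidate split points: midpoints between consecutive distinct values.
        for low, high in zip(distinct, distinct[1:]):
            s = (low + high) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            sse = sum_squared_error(left) + sum_squared_error(right)
            if best is None or sse < best[2]:
                best = (j, s, sse)
    return best


# Toy data: the first feature separates the two target levels perfectly.
X_toy = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.0], [4.0, 6.0]]
y_toy = [1.0, 1.0, 3.0, 3.0]
print(best_split(X_toy, y_toy))
```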

The general errors of prediction based on the out-of-bag (OOB) data were calculated as the mean square error (MSE_OOB). The OOB data represent approximately 37% of the training data that were not selected in each bootstrap sample. The importance of each variable was determined by measuring the increase in the OOB error (IncMSE) when the variable was randomly permuted while keeping all other variables unchanged. The relative importance (RI) of each variable, expressed as a percentage, was calculated as follows:

$${RI}_{i}=\frac{{IncMSE}_{i}}{{\sum }_{j=1}^{M}{IncMSE}_{j}}\times 100\mathrm{\%}$$
(16)

where IncMSE_i represents the increase in mean square error when the ith variable is permuted, and M represents the total number of variables.
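The normalisation in Eq. 16 amounts to dividing each variable's IncMSE by the total and expressing the result as a percentage. A minimal Python sketch follows; the function name and the IncMSE values are hypothetical, and the variable names are borrowed from Dataset1 only for readability.

```python
def relative_importance(inc_mse):
    """Scale IncMSE values so that they sum to 100% (Eq. 16)."""
    total = sum(inc_mse.values())
    if total == 0:
        raise ValueError("IncMSE values sum to zero; RI is undefined")
    return {name: 100.0 * value / total for name, value in inc_mse.items()}


# Hypothetical IncMSE values for three predictors.
ri = relative_importance({"Temp": 2.0, "Sunshine": 1.0, "pH": 1.0})
print(ri)
```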

Second, after creating the RF model, Dataset2 was used to predict the NE and LER for each target field across mainland China's dryland soil. The "party" package in R software was employed to calculate the RF model and variable importance using the "cforest" and "varimp" functions, respectively. To evaluate variable importance, all 30 variables from Dataset1 were used to train the models, and the resulting RI values for the NE and LER were exported. For prediction, a subset of variables from Dataset1 including natural environmental parameters (Sunshine, Precip, Temp, Aridity, H.index, Accum.Temp) and initial soil properties (BD, SOM, TN, TP, TK, a.N, a.P, a.K, pH, Sand, Clay) was selected. The NE and LER were then predicted based on the corresponding variables from Dataset2. The RF models were created with a set number of trees (1,000) in each forest, a minimum node size of 2, and a randomly selected number of features (mtry) set at 3 for node splitting. The models were executed by using the high-performance computing resources provided by the Beijing Super Cloud Computing Center.

Model accuracy and validation

The performance of the RF model was evaluated using the "leave-one-out cross-validation" framework. This involved excluding one observation at a time from the dataset, using the remaining n − 1 observations to train the RF model, and then generating an estimated value for the excluded observation. This process was repeated until each observation had both a real value (from the original data) and an estimated value (from the prediction data). Furthermore, the MAE, RMSE, and R2 measures were calculated to evaluate the accuracy of the RF model as follows:

$$MAE=\frac{{\sum }_{i=1}^{n}\left|{p}_{i}-{o}_{i}\right|}{n}$$
(17)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({p}_{i}-{o}_{i}\right)}^{2}}$$
(18)
$${R}^{2}=\frac{{\sum }_{i=1}^{n}{\left({p}_{i}-\overline{o}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({o}_{i}-\overline{o}\right)}^{2}}$$
(19)

where MAE represents the mean absolute error, RMSE represents the root mean square error, \(R^2\) denotes the coefficient of determination, \(p_i\) denotes the estimated value, \(o_i\) is the real value, \(\overline{o}\) is the average of the real values, and n is the number of observations.

Finally, the linear regression relationship between the estimated and real values was assessed to validate the RF model.
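The validation procedure can be sketched in Python as a generic leave-one-out loop followed by the three accuracy measures. Several assumptions are made here: `fit_predict` is a hypothetical callable standing in for training an RF model and predicting one held-out observation; MAE and RMSE compare each estimate with its paired real value; and R2 is computed as the ratio of the variation of the estimates around the observed mean to the variation of the real values around that mean, which is one reading of Eq. 19.

```python
import math


def leave_one_out(X, y, fit_predict):
    """Predict each observation from a model trained on the other n - 1."""
    predictions = []
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predictions.append(fit_predict(X_train, y_train, X[i]))
    return predictions


def validation_metrics(predicted, observed):
    """Return (MAE, RMSE, R2) for paired estimated and real values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    r2 = (sum((p - mean_obs) ** 2 for p in predicted)
          / sum((o - mean_obs) ** 2 for o in observed))
    return mae, rmse, r2


def mean_model(X_train, y_train, x_new):
    """Hypothetical stand-in model: predicts the mean of the training targets."""
    return sum(y_train) / len(y_train)


loo_predictions = leave_one_out([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0], mean_model)
mae, rmse, r2 = validation_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(loo_predictions, round(mae, 3), round(rmse, 3), round(r2, 3))
```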

Results

Overall impact of intensive intercropping on crop yield

In the intensive soybean/maize intercropping system in China, the average absolute yield gain (NE) was 3.3 ± 0.09 Mg ha−1 (mean ± s.e.m.), while the land use efficiency (LER) was 1.4 ± 0.02 (Figs. 2, 3a, b). Approximately 94% of the individual NE values were greater than 0, indicating a positive yield gain, and the same percentage of LER values exceeded 1, demonstrating the efficiency of intercropping (Fig. 3a, b). When examining regional variations within China's major food production areas, the NE values showed an increasing trend from the Northeast Plain (NEP) to the North China Plain (NCP), the northwest semi-arid (NWSA), and the Yangtze River Basin (YRB). The NE was significantly lower in South China (SC), with an average of 1.9 Mg ha−1 (Fig. 3c). A similar pattern of increase was observed for the LER across the regions, except for NWSA, where a slight decrease was observed compared to NEP and NCP (Fig. 3c).

Fig. 2 The geographical distribution of the field experimental sites across Chinese dryland soils included in this analysis

Fig. 3 Frequency distributions of the NE (a), the LER (b) and their regional differences (c) in intercropping systems in China. An LER > 1 indicates better performance of intercropping than monoculture. n indicates the number of samples with three or more replications. The categorized regions included the Northeast Plain (NEP), the North China Plain (NCP), northwest semi-arid region (NWSA), the Yangtze River basin (YRB) and South China (SC)

Potential drivers of intensive intercropping-induced changes in crop yield

Overall, the multiple random-forest models revealed that temperature and soybean delay days were the most important factors influencing the intensive intercropping-induced changes in crop yield, contributing 10% and 14% to the NE and LER, respectively (Fig. 4). Specifically, the NE was strongly influenced by maize plant density (9%), soybean delay days (8%), annual accumulated temperature, and row distance between soybean and maize, each accounting for 5% of the variance (Fig. 4a). Similarly, the LER was influenced by sunshine hours (11%), maize plant density (10%), row distance, total potassium, temperature, and available phosphorus, with contributions of 6%, 6%, 5% and 5%, respectively (Fig. 4b). Notably, other factors, such as the aridity index, the humidity index, soil total nitrogen and phosphorus, available nitrogen, soil organic matter, bulk density, nitrogen and phosphorus inputs, soil texture and pH, showed no significant effect on intensive intercropping yield in China, with relative contributions of less than 4% each (Fig. 4a, b).

Fig. 4 The relative importance (%) of variables for the NE (a) and LER (b) under intercropping as affected by different factors based on the random forest regression model. The considered variables included the annual mean of accumulated sunshine (Sun.), annual mean precipitation (Precip.), annual mean temperature (Temp.), annual mean aridity (Aridity), annual mean humidity index (H.index), annual mean plant available accumulated temperature (Accum.temp., > 10 ℃), soil total and available nutrients (TN, TP, TK, a.N, a.P, a.K), soil organic matter (SOM), pH, soil bulk density (BD), and soil texture (Sand, Clay, Silt). They also included soybean sowing delay time (Delay), the row ratio of maize and soybean (Row.ratio), the row distance of maize and maize (MM.row.dis.), of soybean and soybean (SS.row.dis.), and of soybean and maize (M.S.row.dis.), the seeding density of maize (M.inter.den.) or soybean (S.inter.den.) under intercropping, and the experiment duration years (Dur.)

In more detail, the NE showed a slight decline while the LER exhibited a rapid decline with increasing average sunshine hours under the intensive intercropping compared to sole crops (Fig. 5a, k). Similarly, the NE and LER declined with both the annual average temperature and the accumulation of temperature above 10 ℃ (Fig. 5b, c, l, m). The relationships between intercrop yield and soil properties were more complex, with the NE and LER showing rapid decreases in response to soil available phosphorus. Specifically, the NE was influenced by phosphorus inputs, and the LER was affected by soil pH (Fig. 5d, e, p, q). However, the LER initially increased from approximately 1.0 to 1.7 or 1.5 within the first 18 g kg−1 of soil total potassium and 185 mg kg−1 of soil available potassium, respectively, and then drastically decreased with increasing potassium levels (Fig. 5n, o). Field management also had significant effects on crop yield in the intensive intercropping system (Fig. 4). The LER reached its highest value when soybean was sown approximately 50 days after maize (Fig. 5r), while the change in the NE was minimal (Fig. 5f). The NE decreased with increasing row distance between soybean and maize (Fig. 5g), whereas the LER increased and reached its highest value at approximately 70 cm (Fig. 5s). Both the NE and the LER initially increased with increasing maize plant density in specific intercropping systems, followed by a rapid decline if the maize seed density exceeded approximately 60 thousand per hectare (Fig. 5i, t).

Fig. 5 The relationships between the NE (a-j) and the LER (k-t) and influencing factors under intercropping. Only variables with large effects are shown. For the NE, according to Fig. 4a, these are accumulated sunshine (Sunshine, hour), annual mean temperature (Temp, ℃), annual mean plant available accumulated temperature (Accum.Temp, > 10 ℃), available P (mg kg−1), the amount of input P fertilizer (Input.P, kg ha−1), soybean sowing delay time (S.delay.days), the row distance of soybean and maize (M.S.row.distance, cm), the seeding density of soybean (S.inter.density, \(10^{4}\) plants ha−1) and maize (M.inter.density, \(10^{4}\) plants ha−1) under intercropping and the experiment duration (Duration, year). For the LER, these are the annual mean accumulated sunshine (Sunshine, hour), annual mean precipitation (Precip, mm), annual mean temperature (Temp, ℃), soil total K (TK, g kg−1), available K (a.K, mg kg−1), available P (a.P, mg kg−1), pH, soybean sowing delay time (S.delay.days), row distance of soybean–maize (M.S.row.distance, cm) and seeding density of maize (M.inter.density, \(10^{4}\) plants ha−1) under intercropping

Potential crop yield gain under intensive intercropping as simulated by the RF model

The probabilities of the NE (Fig. 6a) and LER (Fig. 6b) were estimated using random forest regression models by incorporating local natural factors (Sunshine, Precip, Temp, Aridity, H.index, Accum.Temp) and specific soil fertility information (TN, TP, TK, a.N, a.P, a.K, SOM, BD, pH, Sand, Clay) under intensive intercropping in China. Overall, the models explained 26% and 37% of the national NE and LER, respectively, with acceptable accuracy indicated by a 1.26 MAE and 1.62 RMSE for the NE, and a 0.14 MAE and 0.18 RMSE for the LER in large-scale estimation (Table 1).

Fig. 6 Estimates of the NE (a) and LER (b) under intercropping based on a random-forest regression model incorporating local natural factors of the environment and soil properties in dryland areas of China. Only the cropping regions are presented. The colours indicate ranges of predicted NE and LER. NA indicates the non-cropping region of mainland China

Table 1 The model performance of RF-NE and RF-LER as evaluated by the correlation between the simulated and measured data

According to the model predictions, the average NE and LER probabilities for China were estimated to be 2.8 Mg ha−1 and 1.4, respectively (Fig. 6 and Table 2). The predicted yield gain increased gradually from north to south, with no regions showing negative effects, and reached its peak in the YRB and SC (Fig. 6). The NE values of the NEP, NCP, and NWSA regions were above zero, and the LER values of these three regions exceeded 1, although they were slightly lower than the national averages (Fig. 6). At the provincial level (Table 2), the potential NE ranged from 3.6 Mg ha−1 in Guizhou to 2.5 Mg ha−1 in Liaoning, while the LER ranged from 1.5 (Guizhou, Chongqing, Hunan, Sichuan, Guangxi, and Hubei) to 1.35 (Ningxia and Shangxi). The ten provinces with the highest NE values were Guizhou, Chongqing, Hunan, Sichuan, Zhejiang, Guangxi, Hubei, Jiangxi, Shanghai, and Fujian, most of which also exhibited high LER values.

Table 2 Estimates of the NE (a) and LER (b) for each province of China under intercropping based on random forest regression models

Discussion

Overall impact of intensive intercropping on crop yield

Our meta-analysis revealed that the average net effect (NE) and land equivalent ratio (LER) of intensive soybean/maize intercropping in China were 3.2 Mg ha−1 and 1.4, respectively, surpassing the values reported in several global meta-analyses. Li et al. (2020a, b) synthesized a global NE of 2.1 Mg ha−1, while Yu et al. (2015), Martin et al. (2018), and Xu et al. (2020) reported global LER values of 1.22, 1.3, and 1.32, respectively. Moreover, our estimates exceeded the China-scale LER of 1.35 reported by Xu et al. (2020). One possible explanation for the weaker intercropping yield advantages observed in previous global meta-analyses is the inclusion of row intercropping and mixed intercropping patterns, which have consistently demonstrated significantly lower yield advantages compared to strip intercropping, as indicated by Li et al. (2020b).

Crop yield gain under intensive intercropping as affected by climatic traits

Sunshine hours emerged as a crucial climatic factor strongly impacting the crop yield gain under intensive soybean/maize intercropping in China, as indicated by our RF model. It accounted for 4% and 11% of the variance in the NE and LER, respectively (Fig. 4). Specifically, we observed a negative correlation between sunshine hours and both the NE and the LER across China (Fig. 5a, k). This finding suggests that regions with less sunshine exhibit higher absolute yield gains and land use efficiency under intensive intercropping. In essence, the intercropping system demonstrates a greater potential for efficient light utilization in areas with limited sunlight. This result provides empirical support for the superior light capture ability of intensive intercropping compared to sole crops, aligning with the objectives of agricultural strategies aimed at optimizing light resource utilization. The observed yield advantage of intercropping can be attributed in part to temporal niche differentiation (Yu et al. 2015). Maize typically has a longer growth period than soybean but exhibits slower growth during the seedling stage. In contrast, soybean has a shorter growth period but displays rapid growth within the same timeframe. As a result, while soybean reaches maturity and is harvested relatively quickly, maize continues to grow and cover the bare land, thereby significantly increasing the overall light capture. This enhanced light resource utilization is particularly pronounced in regions with fewer sunshine hours (Dong et al. 2018; Hanming et al. 2012; Li et al. 2021a). Furthermore, in strip intercropping systems where soybean partially replaces maize, the border rows of maize receive and absorb more light due to their dominant position and lower planting density compared to those of sole crops (Zhang et al. 2008).

Environmental temperature is another crucial variable influencing the yield of intensive soybean/maize intercropping in China. Our dataset revealed that the annual mean air temperature contributed 10% to the NE and 5% to the LER, while the annual mean accumulated temperature (> 10 ℃) accounted for 5% of the NE and less than 4% of the LER (Fig. 4). Moreover, our analysis reveals that the NE declines gradually with rising temperature and accumulated temperature, whereas the LER initially increases with temperature before declining at higher temperatures. Notably, our findings illustrate that in cooler areas with an annual air temperature below approximately 13 ℃, the LER is lower than in regions with an annual air temperature exceeding 13 ℃ (Fig. 4m). This observation may be attributed to a decrease in soybean yield under cooler conditions: when intercropped with tall maize, the shorter soybean is shaded, which creates suboptimal conditions for its growth within the intensive soybean/maize system (Liu et al. 2017).

Crop yield gain of intensive intercropping as affected by soil properties

In China, the yield of intensive intercropping is predominantly influenced by soil available nutrients, as opposed to factors such as soil texture, bulk density, total nutrients, and soil types (Fig. 4). However, it is important to interpret these findings cautiously, considering that China has experienced long-term excessive fertilizer inputs, even in regions with already high soil nutrient contents (Li et al. 2011). Our results do not imply that the aforementioned factors are unimportant, but rather indicate that, under current conditions, they do not significantly restrict intercrop yield gain in China (Li et al. 2016). Among the available nutrients, nitrogen made a relatively small contribution of only 3% (sum of the NE and LER), whereas soil available phosphorus accounted for 9% and potassium accounted for 7% under intensive intercropping (Fig. 4). This minimal impact of nitrogen may be attributed to the inherent ability of soybean to effectively utilize atmospheric nitrogen (N2) and provide additional nitrogen sources to adjacent maize, thereby ensuring ample nitrogen availability for subsequent crops (Duchene et al. 2017; Zhao et al. 2022). Furthermore, both the NE and the LER tended to decrease with higher initial soil available phosphorus and potassium levels, as well as with increased phosphorus inputs (Fig. 5). This suggests that intensive intercropping results in greater nutrient utilization efficiency in regions characterized by poor soil phosphorus and potassium fertility. Recent studies have highlighted the role of root interactions in intercropping systems as a major mechanism for accessing limited or unavailable nutrients, particularly under adverse environmental conditions such as phosphorus deficiency (Zhao et al. 2022). Overall, intensive soybean/maize intercropping in China achieves yield advantages in both low-fertility and high-fertility areas.

Crop yield gain of intensive intercropping as affected by field management

Field management practices play a significant role in determining crop yield gains under intensive intercropping, accounting for 33% and 38% of the contributions to the NE and the LER, respectively (Fig. 4a, b). Among the various field management factors considered, soybean delay time, maize plant density, and maize-soybean row distance emerged as more influential variables than the row ratio, row distance between maize plants, row distance between soybean plants, and soybean plant density (Fig. 4). This observation aligns with the findings of Ahmed et al. (2020b), who demonstrated an increase in the LER with greater sowing delay of soybean, attributed to the reduction of size-asymmetric competition (Fig. 5r). Implementing intensive intercropping with soybeans requires careful field practices and mechanical interventions to avoid damage to maize seedlings (Ahmed et al. 2020a). To optimize the intercropping system, we recommend simultaneous planting of soybeans and maize using a single machine. The choice of row distance and planting density strongly influences the competitive interactions for light, heat, water, and nutrients among intercropped species, ultimately impacting their yields (Hu et al. 2020; Raza et al. 2020; Ren et al. 2016). Based on our field dataset, we propose an optimal row distance of 70 cm and approximately 60,000 maize plants per hectare for intensive soybean/maize intercropping in China, as these values corresponded to the highest LER (Fig. 5s) and the highest NE and LER (Fig. 5h, t), respectively. However, it is important to note that our conclusions are based solely on yield gains and land use efficiency, disregarding potential environmental impacts.

Notably, our analysis revealed no significant effect of fertilizer inputs (nitrogen, phosphorus and potassium) on intensive soybean/maize intercropping in China, except for a negative relationship observed between phosphorus inputs and the NE (Fig. 5e). This finding aligns with several global meta-analyses reporting a non-significant relationship between fertilizer inputs and the LER in intercropping systems (Yu et al. 2015; Wang et al. 2022; Martin et al. 2018; Himmelstein et al. 2016; Pelzer et al. 2014). Furthermore, Zhu et al. (2023) found that high fertilizer inputs, including nitrogen, phosphorus and potassium inputs, weakened intercrop overyielding at both field and global scales. Although we agree that higher intercrop yield gains are often achieved with greater fertilizer inputs (Li et al. 2020b), this relationship was not evident in our analysis due to the influence of other more significant factors, as mentioned earlier.

Potential crop yield gain of intensive intercropping and its spatial distribution

According to our model predictions, the average net effect (NE) and land equivalent ratio (LER) of intensive soybean/maize intercropping in China were estimated to be 2.8 Mg ha−1 and 1.4, respectively (Fig. 6 and Table 2). This implies an overyielding of 2,800 kg per hectare compared to current sole crop cultivation, or the need for approximately 1.4 times as much cropland to achieve equivalent yields in monoculture systems, considering current available soil and environmental resources. Although our models explained only 26% to 37% of the variance in the NE and LER, this estimate still underscores the substantial theoretical yield potential of intensive intercropping in China. Moreover, our simulations revealed a gradual increase in predicted yields from north to south, with the highest levels observed in the Yangtze River Basin (YRB) and the South China (SC) region (Fig. 6). This trend aligns with the results of our meta-analysis, although SC exhibited the lowest NE and LER in the meta-analysis (Fig. 3). The failure to simulate values in SC accurately may be attributed to the challenge of incorporating complex field management practices into the RF models, as these practices are difficult to represent as the continuous variables required by the RF model. Additionally, regions with high NE and LER probabilities corresponded well to areas characterized by higher temperatures, fewer sunshine hours, and lower soil fertility (please see the resource maps in Dai et al. 2015; Xu et al. 2015; Zhao et al. 2013). The higher absolute yield gains and land use efficiency observed in regions with limited agricultural resources can be attributed to the efficient utilization of light, heat, radiation, and environmental resources under intensive soybean/maize intercropping (Fig. 5).
Finally, our results highlight Guizhou, Chongqing, Hunan, Sichuan, Zhejiang, Guangxi, Hubei, Jiangxi, Shanghai, and Fujian provinces as the top ten recommended areas for the implementation of intensive intercropping, although other regions also demonstrated positive effects (NE > 0, Fig. 6) on intercrop yield.
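
The land-use interpretation of the predicted LER and NE given above can be checked with simple arithmetic. The sketch below assumes the usual reading of the LER as the sole-crop area required to match the output of one unit of intercropped area. The helper names and the 100 ha example area are hypothetical; the LER of 1.4 and the NE of 2.8 Mg ha−1 are the predicted national averages reported above.

```python
def monoculture_area_needed(intercrop_area_ha, ler):
    """Sole-crop area (ha) needed to match the output of the intercropped area."""
    return intercrop_area_ha * ler


def relative_land_saving(ler):
    """Fraction of land saved by intercropping relative to sole cropping."""
    return 1.0 - 1.0 / ler


def mg_to_kg_per_ha(value_mg_per_ha):
    """Convert a yield from Mg per hectare to kg per hectare."""
    return value_mg_per_ha * 1000.0


predicted_ler = 1.4  # predicted national average LER
predicted_ne = 2.8   # predicted national average NE, Mg/ha

area = monoculture_area_needed(100.0, predicted_ler)  # 100 ha is hypothetical
saving = relative_land_saving(predicted_ler)
print(f"Sole-crop area matching 100 ha of intercrop: {area:.0f} ha")
print(f"Relative land saving: {saving:.1%}")
print(f"NE: {mg_to_kg_per_ha(predicted_ne):.0f} kg/ha")
```

Under this reading, an LER of 1.4 means that sole crops need 40% more land than the intercrop to produce the same yields, or equivalently that intercropping saves roughly 29% of the land that sole cropping would require.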

Conclusions

Our study demonstrates that intensive soybean/maize intercropping with large inputs of fertilizers, pesticides, and machinery yields significant benefits compared to sole crop cultivation in China. The higher yields achieved in intercropping systems are primarily driven by factors such as light (radiation), air temperature, soybean delay days, and maize density. Furthermore, this intensive intercropping maintains efficient acquisition of resources, including light, temperature (heat), accumulated temperature, and soil nutrients, particularly in regions characterized by low soil fertility and limited agricultural resources. Additionally, our study provides national potential distribution maps for the net effect (NE) and land equivalent ratio (LER) in China, revealing that the Yangtze River Basin may be the most suitable area for implementing intensive intercropping.