1 Introduction

Just as industry has gone through successive technological changes, agriculture needs to follow the same path to survive. Agricultural production is mainly challenged by weather conditions, crop diseases, and pest infestations, which expose it to high risks. Increased demand, changing consumer patterns, food waste, climate change, and food scarcity are other issues that intensify the need for a drastic change in the agricultural sector (Clercq et al., 2018). To meet these challenges, agricultural actors, especially small and medium-sized enterprises (SMEs), are shifting towards digitalization. This includes robotics (Lottes et al., 2018), artificial intelligence (AI) (Liu et al., 2020), blockchain (Kamble et al., 2020), internet of things (IoT) (Ray, 2017), and big data analytics (BDA) (Liu et al., 2020). Industry 4.0 (I4.0) holds promising opportunities to reshape the agriculture industry and transform it into what is called “Agriculture 4.0” (Clercq et al., 2018; Liu et al., 2020). However, this transformation comes with high investment costs that hinder SMEs' ability to adopt I4.0.

On the other hand, supply chain finance (SCF) has been growing as an innovative method to support SMEs' operations and investments (Yan et al., 2020). Agricultural SMEs can use SCF to fund their investments in agriculture 4.0, lower financing costs, and improve financing efficiency and effectiveness (Xu et al., 2018). They do so by applying for loans, which the financial service providers (FSPs) evaluate on the basis of a credit risk assessment. At this level, the leading enterprises (LEs) with which the SMEs are involved act as guarantors between agricultural SMEs and FSPs (Zhang et al., 2015). However, the small number of agricultural SMEs that have cooperated in SCF over the past decades, together with the fast-changing and uncertain technological context of the agricultural sector, generates high credit risk in SCF for investments in agriculture 4.0. This hinders the wide implementation of agriculture 4.0 and creates a further risk of propagation through the supply chain (SC) (Wang et al., 2020a, 2020b). Consequently, it is crucial for FSPs to make correct credit loan decisions in SCF for investment, especially in agriculture 4.0.

For this purpose, machine learning (ML) approaches emerge as a strong forecasting method to predict SMEs' credit risk in SCF for financing both operations and investments (Abedin et al., 2020; Zhu et al., 2019). ML can significantly reduce variance and improve accuracy (Zhang & Ma, 2012). Ensemble machine learning (EML) in particular has been acknowledged to provide better forecasting performance than individual ML and to generate better results with smaller data requirements (Bi et al., 2019; Shajalal et al., 2021). EML uses multiple learning algorithms to obtain better predictive performance than any of its constituent algorithms alone (Opitz & Maclin, 1999). Nevertheless, very few studies have focused on the use of EML for SMEs’ credit-risk forecasting in SCF for specific investments such as agriculture 4.0. Besides, no comparison between two or more hybrid methods has yet been conducted. This limited use of EML restricts the accuracy and diversity of the proposed models, which can seriously affect the forecasting of credit risk in SCF, increase credit risk uncertainty, and hinder further investment in agriculture 4.0. Therefore, there is an urgent need to develop a new model to forecast credit risk, as none of the models developed in the literature can deal with this special context.

Thus, this paper aims to fill this research void by addressing the following research objectives.

  • To show that the hybrid EML approach performs markedly better than individual ML methods in predicting SMEs' credit risk, helping SMEs, FSPs and LEs to accurately assess and mitigate the credit risk associated with the I4.0 investments of SMEs.

  • To provide the optimal set of variables that SMEs, FSPs and LEs must focus on to alleviate credit risks, thus providing the conditions under which SMEs would be seen as creditworthy in the eyes of their LEs and FSPs.

We therefore propose a hybrid EML approach that investigates the combination of three different sets of techniques, i.e., the Gamma Test (GT), the Rotation Forest algorithm (RotF) and the Logit Boosting algorithm (LB). The proposed approach has been tested using data collected from 216 agricultural SMEs, 195 LEs and 104 FSPs operating in the African agricultural market. The experimental study includes a meticulous performance comparison of the proposed EML approach with selected hybrid and simple classifiers to confirm its performance.

The remainder of our paper is structured as follows: Sect. 2 reviews the literature on agricultural SMEs' credit risk and the methods of forecasting SMEs’ credit risk for investments in agriculture 4.0 through SCF. Section 3 explains the construction and theory of the proposed approach. Section 4 presents the numerical application. Section 5 discusses the results of the experiment. Implications of the study are provided in Sect. 6. Finally, conclusions and future research directions are given in Sects. 7 and 8.

2 Theoretical underpinning

2.1 The concept of agriculture 4.0

Agriculture 4.0 integrates the farming operations’ internal and external interacting networks to provide digital, autonomous, and real-time information and allow communication between all parties in the agriculture industry (Wolfert et al., 2017). The technologies that enable agriculture 4.0 include robotics, AI, blockchain, BDA, IoT, and 3D food printing. Robots are used to collect vegetables and harvest the good ones through computer vision (Lottes et al., 2018). AI is applied in agriculture in applications such as decision support systems, mobile expert systems, and intelligent animal health monitoring (Liu et al., 2020). Blockchain ensures the traceability and tracking of product movements, which enhances the recall process for contaminated food. BDA analyzes historical data about weather, seeds, and soil quality to provide farmers with insights for better decision-making (Clercq et al., 2018). Monitoring the soil’s efficiency, temperature and humidity has become easier with the IoT (Prathibha et al., 2017). In a nutshell, I4.0 can deliver important improvements that turn the agriculture industry's challenges into opportunities.

2.2 Supply chain finance for investment in agriculture 4.0

Lipper et al. (2014) suggested that adapting agricultural systems will require increased upfront investment. Therefore, it is important to identify and credit the “co-benefits” generated through smart agriculture and to get access to the appropriate funds to absorb the financial risks generated by innovative activities. The research and development phases may be co-financed with private partners, governments, or even universities (Clercq et al., 2018). The role of governments in supporting this shift is strongly emphasized. This assistance can be achieved by offering financial incentives, identifying potential partners and deals to invest in, and providing affordable infrastructure (Clercq et al., 2018). Despite the importance of governments’ support, the food SC needs to consider other alternatives. In this context, SCF emerges as an effective strategy to lower financing costs and improve financing efficiency and effectiveness (Xu et al., 2018).

Hofmann (2005) defines SCF as an approach for two or more organizations in a supply chain, including external service providers, to jointly create value by planning, steering, and controlling the flow of financial resources on an inter-organizational level. While preserving their legal and economic independence, the collaboration partners commit to sharing relational resources, capabilities, information, and risk on a medium- to long-term contractual basis. Suppliers and buyers can collaborate with financial institutions and high-technology firms to meet capital requirements, benefiting from both lower interest rates and flexible payment methods (Wuttke et al., 2016). The use of SCF techniques such as the cash-to-cash cycle, the weighted average cost of capital (WACC), payables and receivables has the potential to markedly reduce operational risks and improve data visibility and the availability and delivery of cash (Lam et al., 2019).

SCF allows financial institutions to improve their risk assessment process when dealing with SMEs. Deakins and Hussain (1994) consider that this assessment is accompanied by high uncertainty due to information distortion and asymmetry, which affects FSPs’ willingness to approve credit loans. In this context, the SCF process improves data availability and accuracy, which helps FSPs to estimate credit risk and profitability accurately. Weak food SC players like agricultural SMEs can benefit from SCF to invest in I4.0, solving not only liquidity-profitability problems in the short term, but also financial and credit risks in the long term (Lam et al., 2019).

2.3 Agriculture 4.0 investment’s credit risk of agricultural SMEs in SCF

Financing investments in agriculture 4.0 in SCF involves three main actors: agricultural SMEs, LEs involved in the SC with the SMEs, and funders represented by banks or other FSPs. Figure 1 depicts the interactions between those actors.

Fig. 1 Interactions between the actors of SCF for agriculture 4.0 investments

When the agricultural SME is involved with an LE of a high credit level, FSPs are more willing to consider the SME's request to finance its investment, because a large-scale LE with stable profitability can be relied on to settle the accounts receivable. Therefore, LEs are considered guarantors between agricultural SMEs and FSPs (Chi et al., 2017b; Zhang et al., 2015).

Deciding whether to finance an SME’s investment in agriculture 4.0 in an SC, and which SME should get priority, is based on evaluating credit risks in SCF. SCF can reduce credit risk; however, it cannot eliminate it entirely (Zhang et al., 2015). Credit risk therefore remains a barrier to supporting agriculture 4.0 investments through SCF, as it can be ‘contagious’ and could propagate through the whole SC (Wang et al., 2020a, 2020b). Credit risk associated with investments in agriculture 4.0 of agricultural SMEs in SCF depends on five factors:

  • Agriculture development status in the country: The level of development of the agricultural sector influences the operating status of the agri-food SC. Good development prospects create a large margin in both profitability and debt-paying ability for agricultural SMEs and LEs (Wang et al., 2020a, 2020b).

  • Quality and credit conditions of agricultural SMEs: SMEs are usually regarded as riskier than larger companies since they face a more uncertain and competitive environment (Stiglitz & Weiss, 1981). Thus, it is important for FSPs to distinguish between ‘high’ and ‘low’ quality SMEs (Song et al., 2016). This quality is based on the governance structure, staff skills, financial performance and management expertise (Zhang et al., 2015). Agricultural SMEs of 'high' quality are more likely to repay their loans on time.

  • Credit conditions of LEs: LEs have close contacts with FSPs such as banks and financial institutions; therefore, their credit records can be easily verified. If the LE has a good financial status and conforms to the FSP's requirements of profitability and solvency, it can repurchase the contract or act as guarantor between the agricultural SME and the FSP. Hence, credit risk can be reduced further (Zhang et al., 2015).

  • The profitability of agriculture 4.0 investment: Agriculture 4.0 can improve productivity, cost savings, and profitability. I4.0 would enable farmers to achieve a drastic advance in their financial performance and could help them maximize profits. Consequently, the profitability gained from agriculture 4.0 would lower credit risk, as agricultural SMEs would have enough resources to repay their loans.

  • Status of the cooperative relationship in the SC: Under good cooperation between SMEs and LEs, profitability and commitment to repay debts can be improved (Xingli & Liao, 2020). SC collaboration is one of the most widespread formative elements required to build SC visibility and mitigate various risks (Belhadi et al., 2021). Zhang et al. (2015) suggested that trade relationships between the two parties have a significant impact on credit risk. Therefore, it is crucial to maintain stable and durable relationships across the SC.

2.4 Machine learning approaches for forecasting credit risk of SMEs in SCF

ML emerges as a strong forecasting method to predict SMEs' credit risk in SCF for financing both operations and investments (Zhu et al., 2019). Several studies used individual ML to predict credit risk for SMEs in SCF. For instance, Jackson and Wood (2013) constructed a default prediction model using logit regression. Their model was highly efficient compared to the traditional methods used by FSPs, such as the Z″-Score model. Afterwards, models such as Generalized Extreme Value Regression (Calabrese & Osmetti, 2013) and a nonparametric approach based on Random Survival Forests (Fantazzini & Figini, 2009) have been used for loan defaults in SMEs and performed much better than the classical logit model. Further, Chen et al. (2010) proposed the Key Mediating Variable model. These models generally require large datasets, fewer variables, and highly accurate input data (Zhu et al., 2019). In addition, they are inadequate when dealing with nonlinear pattern classification, and they need to assume a certain data distribution (Huang et al., 2004). These conditions are not met here, since few SMEs have cooperated in SCF over the past decades and the technological context of agriculture is fast-changing and uncertain. This limits the efficiency of individual ML techniques in dealing with credit risk forecasting problems for SMEs in SCF.

On the other hand, EML comes in the form of a concrete finite set of individual ML approaches, which generally allows a much more flexible structure than any single approach. It uses multiple learning algorithms to obtain better predictive performance than that obtained from any of its constituent learning algorithms alone (Opitz & Maclin, 1999). The number of use cases of EML techniques for credit risk forecasting in the context of SMEs is very limited. Even though EML is acknowledged to provide better forecasting performance than individual ML, very few studies have focused on using EML approaches for SMEs’ credit-risk forecasting in SCF for specific investments such as agriculture 4.0 (Zhu et al., 2016, 2017). Among the few, Zhu et al. (2019) predicted the credit risk in SCF by applying a two-stage hybrid model combining Random Subspace and MultiBoosting. Decision Tree (DT) was taken as the base method for the proposed model. However, this hybrid EML approach has several limitations. First, DT is highly sensitive to small changes in data, which can cause instability (Dev & Eden, 2019). Therefore, it cannot be applied in the context of agriculture 4.0 investments, since the available data in this context is dynamic and unstable. Second, the specific C4.5 algorithm used generally overfits the training examples when the data is noisy (Singh & Gupta, 2014). Third, limited datasets were used (46 SMEs and seven CEs), which cannot accurately verify the performance of the proposed models. Finally, the study compares the hybrid approach to classical ML methods, with no comparison between two or more hybrid approaches. The previous models considered only SME-oriented and/or general financial influencing factors, and did not consider specific factors such as profitability in the context of agriculture 4.0 investment.
Therefore, to accurately predict credit risk for agricultural SMEs in SCF, both diversity and error reduction need to be achieved, especially in the context of agriculture 4.0 projects, which pose a unique and novel challenge given the lack of complete datasets. Additionally, some of these models do not perform feature selection and work with the original set of independent variables, which is less powerful in forecasting than the optimal set of independent variables. Therefore, there is an urgent need to develop a new model to forecast credit risk in this special context.

To our knowledge, this study is among the first to deal with forecasting credit risk in SCF of agricultural SMEs for investment in agriculture 4.0 by making use of the power of EML and its strong forecasting attributes. The special context of African countries in this study is one of the main reasons why we used EML. Since SCF is a novel solution that is not widely applied in Africa, it is practically complicated to obtain a complete dataset, especially for SMEs. In this vein, EML emerges as an efficient solution: Zhu et al. (2019) confirm that EML achieves high forecasting performance when dealing with small datasets, since it is able to extract knowledge by training the model.

Hybrid EML approaches in particular perform better and are more accurate, more robust and more diverse than individual ML or plain EML (Zhu et al., 2016, 2017), which responds to the needs of our study. Again, SCF for the agriculture 4.0 investments of agricultural SMEs is in its infancy; therefore, LEs and FSPs are less willing to finance such risky operations if credit risk is not accurately assessed.

Building on the previous discussion, we propose a hybrid EML approach that investigates the combination of three different sets of techniques, i.e., GT, the RotF algorithm and LB. We use 22 variables that capture the main factors influencing the credit risk of agricultural SMEs in SCF for investment in agriculture 4.0. For each factor we define the suitable set of variables, their definition and their measurement. In addition, we use a large dataset, which further supports the performance of our model. We also compare two individual classification models, i.e., RotF and LB, alongside two hybrid classification models, Adaboost (AB)-RotF and Random Forest (RF)-LB, to evaluate the classification performance.

2.5 Proposed ensemble learning model

EML is a group learning approach that makes use of a set of individual, independent and complementary ML methods to solve the same problem in order to yield better accuracy through collective intelligence (Ilk et al., 2020). The common feature of EML approaches is the repeated application of the base ML algorithm to a sample drawn from the training dataset. After each run of the algorithm, a new classifier is created and appended to the ensemble (Webb & Zheng, 2004). Each classifier of the ensemble classifies in isolation from the other classifiers, and the resulting votes are then combined into a single final classification (Shang & Goes, 2020).

In this study, we propose a hybrid EML approach to estimate the credit risk of agricultural SMEs in SCF for investment in agriculture 4.0, illustrated in Fig. 2. The approach includes two main methods: the instance partitioning method, which aims to mitigate the effect of noisy data, and the attribute partitioning method, which enhances the accuracy of the results and encourages diversity among trees. Combining these two features allows the proposed approach to benefit from the diversity needed to reduce test error (Webb & Zheng, 2004). It is also expected to yield better computational efficiency and considerable improvements in performance. Further, the proposed model is well suited to small datasets since it uses the RotF technique, which was found to be more robust and stable than extreme learning machines, SVM, and neural networks, especially with small training sets (Chi et al., 2017a; Han et al., 2018; Sagi & Rokach, 2018).

Fig. 2 Flowchart of the proposed RotF-LB ensemble ML approach

As shown in the figure, the first stage is the pre-processing of the raw data through nominal data encoding, numerical data scaling, and outlier removal, followed by feature selection using non-linear modeling through GT in order to determine the variables with the greatest impact on forecasting performance. Afterwards, the dataset is split into several sub-datasets using bootstrap sampling with replacement in LB. Then, the RotF technique is used to derive new sub-datasets from the original sub-datasets. Next, the LB technique is trained on these new sub-datasets to provide a decision from each of them. Finally, a majority vote aggregates the set of decisions into the final decision. The overall pseudo-code of the proposed RotF-LB is given in the Appendix. In the following subsections, we describe the three main techniques involved in our model, i.e., feature selection using GT, RotF and LB.
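The sampling and aggregation stages of this pipeline can be sketched in a few lines of Python. This is a minimal illustration and not the RotF-LB pseudo-code of the Appendix: the rotation and boosting stages are replaced by a hypothetical `fit` callable, and all function names are assumptions introduced here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the label predicted by the largest number of base classifiers."""
    return Counter(votes).most_common(1)[0][0]


def train_ensemble(rows, fit, n_models, seed=0):
    """Fit one base classifier per bootstrap sample.

    `fit` is a hypothetical placeholder for the RotF and LB stages: it takes
    a list of rows and returns a function mapping one input to a class label.
    """
    rng = random.Random(seed)
    return [fit(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def predict(ensemble, x):
    """Aggregate the individual decisions into the final decision."""
    return majority_vote([model(x) for model in ensemble])
```

Each base classifier sees a different resampled view of the data, and only the final vote combines them, which is the property the ensemble relies on for diversity.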

2.6 Feature selection using GT

Selecting appropriate independent variables for ML models is important for improving forecasting accuracy and reducing computation time and overfitting (Gouvêa & Gonçalves, 2007). It is commonly performed to increase the predictive model’s interpretability and possibly reduce its cost (Zhu et al., 2019). Feature selection approaches aim to select a set of features by optimizing a criterion over different combinations of inputs. Afterwards, the dependencies between each combination of input features and the corresponding output are computed (Gouvêa & Gonçalves, 2007).

GT has been widely used in feature selection problems. It is a technique for estimating the noise variance, or the minimum mean square error (MSE), that can be achieved without overfitting by any continuous nonlinear model (Sharafati et al., 2020). GT efficiently estimates the part of the variance of the model’s output that cannot be accounted for by any smooth model based on the inputs. Also, GT leads to more accurate input selection than other methods such as mutual information and is time-efficient (Noori et al., 2010). The results of the GT are based on the calculation of a statistic called gamma. The calculation is as follows:

Consider \({X}_{i}\) the vector of independent variables (the input) and \({y}_{i}\) the dependent variable (the output).

  1. Compute the delta function of \({X}_{i}\) and the gamma function of \({y}_{i}\) using Eqs. (1) and (2).

    $$ \delta \left( k \right) = \frac{1}{m}\sum\nolimits_{i = 1}^{m} {\left| {X_{i,k} - X_{i} } \right|^{2} ,\quad k = 1, \ldots ,p} $$
    (1)

    where \(X_{i,k}\) is the kth nearest neighbor of each \(X_{i}\) (1 ≤ i ≤ m).

    $$ \gamma \left( k \right) = \frac{1}{2m}\sum\nolimits_{i = 1}^{m} {\left| {y_{i,k} - y_{i} } \right|^{2} ,\quad k = 1, \ldots ,p} $$
    (2)

    where \(y_{i,k}\) indicates the value of the output corresponding to each \(X_{i,k}\).

  2. Compute the gamma statistic (Δ) using the regression line \(\gamma = A\delta + \Delta\), where A, δ, and γ are the slope of the regression line, the delta function, and the gamma function, respectively.

  3. Calculate the intercept of the regression line, which is equal to the gamma statistic (Δ), and derive the \(\vartheta_{ratio}\):

    $$ \vartheta_{ratio} = \frac{\Delta }{{\sigma^{2} \left( y \right)}} $$
    (3)

    where \(\sigma^{2} \left( y \right)\) is the variance of the dependent variable y.

The \(\vartheta_{ratio}\) lies within the range [0, 1], where a value of zero corresponds to the best prediction performance.
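The three steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: neighbors are found by brute force with squared Euclidean distance, the regression line is fitted by ordinary least squares, and the function name `gamma_test` and the default p = 10 are choices made here, not taken from the study.

```python
def gamma_test(X, y, p=10):
    """Return (Delta, A, v_ratio) for inputs X (list of vectors) and outputs y.

    Assumes len(X) > p and that the delta values are not all identical.
    """
    m = len(X)
    deltas = [0.0] * p
    gammas = [0.0] * p
    for i in range(m):
        # Squared distances from X[i] to every other point, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(m) if j != i
        )
        for k in range(p):
            d2, j = dists[k]
            deltas[k] += d2 / m                        # Eq. (1)
            gammas[k] += (y[j] - y[i]) ** 2 / (2 * m)  # Eq. (2)

    # Least-squares line gamma = A * delta + Delta through the p points.
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    sxx = sum((d - mean_d) ** 2 for d in deltas)
    sxy = sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
    A = sxy / sxx
    Delta = mean_g - A * mean_d  # intercept = gamma statistic

    mean_y = sum(y) / m
    var_y = sum((v - mean_y) ** 2 for v in y) / m
    return Delta, A, Delta / var_y                     # Eq. (3)
```

For feature selection, the function would be called once per candidate subset of input columns, and the subset with the smallest ratio retained. For a noise-free linear output such as y = 2x on one input, γ(k) = 2δ(k) for every k, so the intercept Δ and the ratio are zero up to rounding.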

2.7 Rotation forest algorithm

RotF is a popular ensemble classifier generation technique in which the training set for each base classifier is formed by applying a Principal Component Analysis (PCA) transformation to rotate the original attribute axes. RotF is built from independent decision trees, each trained on the complete dataset in a rotated feature space. The feature set is randomly divided into Z subsets, and PCA is applied to each subset to determine the principal components that preserve variability and generate the training samples of the base classifier. The axis rotations form the new features for each base classifier. Diversity is ensured by performing a separate feature extraction for each base classifier, and accuracy by keeping all principal components and using the whole dataset to train each base classifier (Rodríguez et al., 2006).

Let L be a training set, L = [X Y], where X is the N × n matrix of input attribute values and Y is the N-dimensional vector of class labels. Further, let:

  • Z be the number of attribute subsets

  • x be the data point to be classified, described by n features

  • T be the number of iterations

  • F be the feature set

For t = 1, 2,…, T

  1. Split the attribute set F into Z subsets \({F}_{t,z}\) (z = 1, 2,…, Z)

  2. For z = 1, 2,…, Z

    i. Select the columns of X that correspond to the attributes in \({F}_{t,z}\) and compose the submatrix \({X}_{t,z}\)

    ii. From \({X}_{t,z}\) draw a bootstrap sample of objects \({X}_{t,z}^{^{\prime}}\)

    iii. Run PCA on \({X}_{t,z}^{^{\prime}}\) and obtain the matrix \({D}_{t,z}\), whose ith column contains the coefficients of the ith principal component.

  3. Organize the resulting coefficient vectors in the rotation matrix \({R}_{t}\).
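
For concreteness, the rotation matrix of step 3 can be written out explicitly. The following is a sketch of the standard Rotation Forest construction described by Rodríguez et al. (2006), with t indexing the iteration; the symbol \(R_{t}^{a}\) for the rearranged matrix is notation introduced here for illustration.

$$ R_{t} = \begin{bmatrix} D_{t,1} & 0 & \cdots & 0 \\ 0 & D_{t,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & D_{t,Z} \end{bmatrix} $$

The columns of \(R_{t}\) are then rearranged to match the original order of the attributes in F, giving \(R_{t}^{a}\), and the tth base classifier is trained on the rotated data \(X R_{t}^{a}\). Because the blocks are computed on different random attribute subsets and bootstrap samples in each iteration, every base classifier sees a different rotation of the same data.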

2.8 Logitboosting algorithm

Boosting is a stagewise additive method that works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced (Friedman, 2001). However, boosting algorithms are prone to overfitting when dealing with noisy data. Therefore, Friedman (2001) proposed an advanced variant of the AB algorithm, namely LB, which handles random label noise better. LB can greatly reduce training errors and hence results in better generalization. The potential function used by the LB algorithm is convex; therefore, it allows better computational efficiency and results in marked improvements in performance.

The details of the LB algorithm are presented below:

Let E be an ensemble of learners and Z the number of boost samples.

  1. Start with equal weights \({\omega }_{i}\) = \(\frac{1}{N}\), i = 1,…,N, function F(x) = 0 and sample probability estimates p(\({x}_{i}\)) = 0.5

  2. Repeat for m = 1 to Z

    i. Compute the working responses and weights

      $$ z_{i} = \frac{{y_{i} - p\left( {x_{i} } \right)}}{{p\left( {x_{i} } \right)\left( {1 - p\left( {x_{i} } \right)} \right)}}\,{\text{and}}\,\omega_{i} = p\left( {x_{i} } \right)\left( {1 - p\left( {x_{i} } \right)} \right) $$
      (4)

    ii. Fit the decision function \(f_{m}\)(x) by a weighted least-squares regression of \(z_{i}\) to \({x}_{i}\) using weights \({\omega }_{i}\).

    iii. Update both F(x) and p(x)

      $$ F\left( x \right) = F\left( x \right) + 0.5f_{m} \left( x \right)\,{\text{and}}\,p\left( x \right) = \frac{{e^{F\left( x \right)} }}{{e^{F\left( x \right)} + e^{ - F\left( x \right)} }} = (1 + e^{ - 2F\left( x \right)} )^{ - 1} $$
      (5)

  3. Output classifier = the most often predicted class of E.
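
The boosting loop can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: the base learner is a one-split regression stump, the weights are the standard LogitBoost weights p(x_i)(1 − p(x_i)), the working responses are clipped to [−4, 4] as a practical safeguard, and the names `fit_stump` and `logitboost` are introduced here.

```python
import math


def fit_stump(X, z, w):
    """Weighted least-squares regression stump (one split on one feature).

    Assumes at least one feature takes two or more distinct values.
    """
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            # Weighted means of the working response on each side.
            cl = sum(w[i] * z[i] for i in left) / sum(w[i] for i in left)
            cr = sum(w[i] * z[i] for i in right) / sum(w[i] for i in right)
            err = (sum(w[i] * (z[i] - cl) ** 2 for i in left)
                   + sum(w[i] * (z[i] - cr) ** 2 for i in right))
            if best is None or err < best[0]:
                best = (err, f, thr, cl, cr)
    _, f, thr, cl, cr = best
    return lambda x: cl if x[f] <= thr else cr


def logitboost(X, y, rounds=20):
    """Two-class LogitBoost with labels y in {0, 1}; returns p(x)."""
    F = [0.0] * len(X)
    p = [0.5] * len(X)
    stumps = []
    for _ in range(rounds):
        # Working weights and responses (floor and clipping are safeguards).
        w = [max(pi * (1 - pi), 1e-10) for pi in p]
        z = [max(-4.0, min(4.0, (yi - pi) / wi))
             for yi, pi, wi in zip(y, p, w)]
        f_m = fit_stump(X, z, w)
        stumps.append(f_m)
        # Update F(x) and p(x) as in Eq. (5).
        F = [Fi + 0.5 * f_m(x) for Fi, x in zip(F, X)]
        p = [1.0 / (1.0 + math.exp(-2.0 * Fi)) for Fi in F]

    def predict_proba(x):
        Fx = sum(0.5 * s(x) for s in stumps)
        return 1.0 / (1.0 + math.exp(-2.0 * Fx))

    return predict_proba
```

A sample is assigned to class 1 when the returned probability exceeds 0.5, which is equivalent to F(x) > 0.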

3 Empirical study: a case study of agricultural SMEs in Africa

3.1 Context and variables

Financing I4.0 investments for agricultural SMEs in Africa is not well developed. Numerous institutional and policy-related factors increase credit risk uncertainty, thus deterring private investment and even the use of public funds to develop agriculture 4.0 capabilities in Africa (Wang et al., 2020a, 2020b). In view of that, exploring a large set of requests for funding agriculture 4.0 initiatives in the African exchange market should involve numerous indicators. The dependent variable is the riskiness of an agriculture 4.0 initiative; it is taken directly from the FSPs and assigned the value 0 or 1, indicating whether the initiative is likely to be risky or not. Furthermore, we select, from the literature, 22 independent variables related to the different factors presented in Sect. 2.3. Table 1 defines the initial set of independent variables and their factors.

Table 1 Original set of independent variables

3.2 Data collection procedure

Since SCF is considered a novel solution to overcome financial issues and is not yet widely applied in Africa, only a few African SMEs cooperate with LEs and FSPs in making use of SCF for agriculture 4.0 investments. This makes it practically complicated to obtain a complete SCF dataset for investment in agriculture 4.0, especially for SMEs (Belhadi et al., 2019). Therefore, a survey-based study seems to be the best alternative to obtain the initial dataset to evaluate our proposed EML approach (Belhadi et al., 2020). The selection of the initial sample followed three criteria. First, we targeted SMEs listed on the Small and Medium Enterprise Board of the African Securities Exchanges Association (ASEA) with International Standard Industrial Classification (ISIC) codes between 01–03 and 10–11, which relate to agricultural activities. Second, we ensured that these SMEs had engaged in SCF for an agriculture 4.0 investment by reviewing their official websites and newspaper coverage. Third, the LEs were selected to be direct partners (buyers or suppliers) of the selected SMEs that have some supremacy in leading the operations of the SC, enjoy strong creditworthiness and have high financial power. Thus, when SMEs have financing requirements, SCF can meet them with the involvement of the LEs (Wuttke et al., 2019).

Based on the above sample selection criteria, we selected 618 listed agricultural SMEs, 469 LEs from their direct SC circles, and 551 FSPs operating in the agricultural sector in Africa. The companies were then contacted via an initial email containing a link to an electronic copy of the survey and a request to participate in the project. Thanks to multiple reminders and phone follow-ups with the targeted companies, we obtained a relatively high response rate. Of the initial sample, 216 questionnaires were returned by agricultural SMEs (a response rate of 34.95%), 195 by LEs (41.57%), and 104 by FSPs (18.87%). Data were collected between January and June 2020. The online survey was designed so that responses to all questions were compulsory; respondents could not submit the questionnaire otherwise. This eliminated missing values and incomplete responses, making all the questionnaires usable for analysis. To check for non-response bias between early and late submissions, we followed a two-step procedure. In the first step, we tested for bias through a pair-wise comparison of the demographic characteristics of the samples using paired-sample t-tests and observed no significant difference. In the second step, we compared the first 30% of respondents (early respondents) with the last 30% using ANOVA on all variables and again found no significant difference (p > 0.2). Since neither test revealed a significant difference between samples, we conclude that non-response bias is not a concern in this study. It is noteworthy that the respondents were senior leaders of the enterprises, with titles such as financial manager, sales manager, etc.
The collected data were triangulated with official announcements of the sample companies as well as related articles published in newspapers and business magazines.

3.3 Performance measurement

The performance of the RotF-LB model is evaluated using interpretable performance measures (Fayyaz et al., 2020). For comparison purposes, two individual classification models, i.e., RotF (Ren et al., 2017) and LB (Friedman, 2001), alongside two hybrid classification models, i.e., AB-RotF (Rodríguez et al., 2006) and RF-LB (Kotsiantis, 2013), are also run in the experiments. Five measures are used to evaluate and compare classification performance: accuracy, precision, recall, F-measure, and the area under the curve (AUC). Table 2 provides calculation details.

Table 2 Evaluation measures

A good forecasting approach should have high classification accuracy, precision, recall, F-measure, and AUC. High classification accuracy means that the model correctly identifies both positive and negative samples, thereby helping FSPs separate credit risky agriculture 4.0 investments from non-credit risky ones (Bekhet & Eletter, 2014). Higher precision implies that the investments the model labels as positive (non-credit risky) are indeed non-credit risky (Zhu et al., 2019). Recall quantifies the ability to pinpoint non-credit risky agriculture 4.0 investments among all the samples that should have been labeled as non-risky (Ying Liu & Huang, 2020). The F-measure is the harmonic mean of precision (i.e., positive predictive value) and recall (i.e., sensitivity); the higher the F-measure, the better the forecasting performance of the approach. Finally, the AUC quantifies the approach's ability to avoid false classification (Zhu et al., 2019).
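
For illustration, the five measures can be sketched in Python as follows. This is a minimal sketch that assumes the usual confusion-matrix definitions and treats the non-credit risky class as the positive class; the function names and the counts are hypothetical and are not taken from the study.

```python
def classification_measures(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts.

    Assumption: the positive class is 'non-credit risky investment'.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only (not the study's data).
measures = classification_measures(tp=40, fp=10, tn=45, fn=5)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(measures, auc)
```

Unlike the other four measures, the AUC is computed from predicted scores rather than from hard class labels, which is why it takes a separate input here.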

3.4 Computational experiments

We conduct two sets of computational experiments. The first set aims to select the most important (optimistic) independent variables \( {\rm A}^{*} = \left\{ {\alpha_{1}^{*} ,\alpha_{2}^{*} ,\alpha_{3}^{*} , \ldots ,\alpha_{m}^{*} } \right\}\) from the original set of variables \( {\rm A} = \left\{ {\alpha_{1} ,\alpha_{2} ,\alpha_{3} , \ldots ,\alpha_{n} } \right\}\). Specifically, we apply GT to the original set of variables to select the best set of independent variables. Besides its positive impact on model accuracy and computational time (Sharafati et al., 2020), selecting appropriate independent variables informs us and FSPs about which factors are prominent in forecasting the credit risk related to agriculture 4.0 investment in SCF (Gouvêa & Gonçalves, 2007). Additionally, agricultural SMEs’ managers learn which factors to focus on to enhance their financing ability and creditworthiness when seeking SCF for their agriculture 4.0 investments (Zhang et al., 2015). In the second set, we tune the main parameters of the models used. We grid-search approximately 1000 parameter settings for each classifier and use 10-fold cross-validation to improve forecasting performance and lessen the effect of the training set's variability.
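
The tuning step can be illustrated with a short scikit-learn sketch. It assumes synthetic data and uses gradient boosting as a stand-in classifier, since the study's RotF and LB implementations and its actual parameter grids are not given here; the grid below is deliberately tiny and purely hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data: 200 samples, 11 features (not the survey data).
X, y = make_classification(n_samples=200, n_features=11, random_state=0)

# Tiny illustrative grid; the study reports roughly 1000 settings per classifier.
param_grid = {"n_estimators": [20, 40], "max_depth": [1, 2]}

# 10-fold stratified cross-validation keeps class proportions in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in, not RotF-LB
    param_grid,
    scoring="f1",  # any of the evaluation measures could be used here
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each parameter setting is scored as the mean over the ten held-out folds, which is what dampens the effect of any single train/test split.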

The experimental study is conducted on a PC with a 2.11 GHz Intel i7-8650U CPU and 16 GB of RAM, running the Windows 10 operating system. The data mining toolkit used is Anaconda (https://www.anaconda.com/), a conditionally free and open-source distribution of the Python programming language for data science and machine learning applications. The complete set of experiments took about four hours to run. In the following section, we discuss the experimental procedures and some important experimental results.

4 Results and discussion

4.1 Discussion on feature selection

According to Ransom et al. (2017), the optimistic set of independent variables A* often has more forecasting power than the original set A. Using nonlinear modeling, GT is applied to the original set A in order to reduce the number of features and obtain the most powerful optimistic variables for forecasting the credit risk associated with SMEs' investments in agriculture 4.0. Table 3 reports the ϑ-ratio values of the optimistic set A*, which includes 11 variables.

Table 3 Set of optimistic independent variables according to GT

The GT findings indicate that the three variables ranked highest in relative importance by the ϑ-ratio are the return on investment and the abnormal return of the agriculture 4.0 investment, and the credit rating of the LE. Surprisingly, these variables are found to be more important than the credit conditions variables of the agricultural SME. This highlights the role of SCF-related variables in helping agricultural SMEs access financing to support their agriculture 4.0 investment. Further, the findings show that the profitability of the investment itself is the main factor in assessing SMEs’ credit risk in SCF for agriculture 4.0 investments. Several factors related to the credit conditions of the LE and the agricultural SME are also important, such as the credit rating of the LE and the current ratio of the agricultural SME (Zhang et al., 2015). Importantly, the intensity of the contract between the agricultural SME and the LE is found to be critical in assessing the credit risk and promoting SMEs' access to SCF to support their agriculture 4.0 investment. In contrast, no variable belonging to the status of agriculture development in the country is determinant in assessing credit risk in SCF (Wuttke et al., 2019).

4.2 Discussion on model forecasting performance

The performance of the five classification models, i.e., RotF, LB, AB-RotF, RF-LB, and RotF-LB, is evaluated using the measures presented in Table 2. Two rounds of evaluation were carried out: one using the original set of independent variables A and one using the optimistic set of independent variables A*. The results are reported in Table 4. For clarity, the comparative results are also presented in Figs. 3 and 4.

Table 4 Performance evaluation of RotF-LB and other classification models using the optimistic set A* and original set A of independent variables

Fig. 3 Comparison of the performance measures of the considered models using the original set of independent variables A

The performance evaluation results provide prima facie evidence that the optimistic set of independent variables A* enhances the performance of all five classification models. Furthermore, the comparisons depicted in Figs. 3 and 4 indicate that the proposed hybrid EML approach, i.e., RotF-LB, shows the best forecasting performance with both the original set A and the optimistic set A* of independent variables. Notably, using RotF, either alone or combined with AB, enhances precision. This is critical for FSPs in SCF to ensure that the investments identified as non-credit risky are actually so. However, RotF, alone or combined with AB, does not ensure high accuracy. This could mislead FSPs, who might miss several non-risky, and even profitable, agriculture 4.0 investments because of the inability to distinguish risky from non-risky investments. On the other hand, using LB, either alone or to boost another classifier, enhances accuracy and recall. This is of utmost importance for FSPs to draw a clear distinction between risky and non-risky agriculture 4.0 investments. However, LB shows weak precision. Overall, neither RotF nor LB is sufficiently robust by itself, since neither achieves a high F-measure, which combines precision and recall.

Further, the complementarity between RotF and LB alleviates the individual weaknesses of each approach and makes their combination the strongest of the five forecasting approaches on all performance measures (see Fig. 3). This combination gets stronger with the optimistic set of independent variables (see Fig. 4).

Fig. 4 Comparison of the performance measures of the considered models using the optimistic set of independent variables A*

4.3 Discussion on the accumulated local effect (ALE) plots

Identifying the set of the most important (optimistic) independent variables A* influencing the credit risk of agriculture 4.0 investments in agricultural SMEs from the output of GT paves the way for FSPs to focus on the most important factors and variables when managing and evaluating SMEs' agriculture 4.0 investments through SCF. Besides, we have confirmed the robustness and the high relative forecasting power of our proposed hybrid RotF-LB EML approach compared to other approaches. In this section, we examine the optimistic set of independent variables A* further by exploring the individual effect of each variable on the credit risk of agriculture 4.0 investments. This information is of utmost importance for decision-makers in FSPs to manage credit risk, for agricultural SMEs to enhance creditworthiness, and for LEs to reduce the credit risk of joint liability. To overcome the weaknesses of earlier techniques, such as the partial dependence plot and the individual conditional expectation, the concept of Accumulated Local Effect (ALE) was introduced (Apley, 2016) as a more reliable technique for one-dimensional plots when variables are highly correlated. ALE is used to visualize the marginal effect of each variable across its distribution. To produce the ALE plots, we used the “alibi” library in Python. Figures 5, 6 and 7 show the ALE plots visualizing the impact of the optimistic set of variables on the probability of non-risky agriculture 4.0 investments. Dashed horizontal lines mark ALE = 0 for each variable, indicating values that do not influence predictions significantly. The distribution of each variable is shown on the x-axis, where ticks represent individual data points.
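
To make the idea behind ALE concrete, the following minimal sketch computes a first-order ALE curve with NumPy only. It is not the alibi implementation used in the study; the function name `ale_1d`, the quantile binning, and the toy linear scoring function are assumptions made for illustration.

```python
import numpy as np


def ale_1d(predict, X, feature, n_bins=10):
    """First-order ALE of one feature for a fitted prediction function."""
    x = X[:, feature]
    # Quantile-based bin edges, so each interval holds a similar number of points.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Interval index of every observation (the maximum goes to the last interval).
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)
    local = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, feature] = edges[k]
        upper[:, feature] = edges[k + 1]
        # Average change in prediction when the feature moves across the interval.
        local[k] = np.mean(predict(upper) - predict(lower))
        counts[k] = len(rows)
    # Accumulate the local effects from the lowest edge upward.
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Center so that the count-weighted mean effect is zero (the ALE = 0 line).
    midpoints = 0.5 * (ale[:-1] + ale[1:])
    ale = ale - np.sum(midpoints * counts) / counts.sum()
    return edges, ale


# Toy check with a hypothetical linear scoring function (not the study's model).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
edges, ale = ale_1d(lambda X: 2.0 * X[:, 0] + X[:, 1], X_demo, feature=0)
```

Because only observations that actually fall in an interval are used to measure the local change in prediction, the curve avoids evaluating the model on unrealistic combinations of correlated variables, which is the main weakness of partial dependence plots.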

Fig. 5 ALE plots of the effect of optimistic variables related to the quality and credit conditions of agricultural SME on the probability of non-risky agriculture 4.0 investments

Fig. 6 ALE plots of the effect of optimistic variables related to credit conditions of LE on the probability of non-risky agriculture 4.0 investments

4.4 The impact of quality and credit conditions of agricultural SMEs

The variables related to agricultural SMEs' quality and credit conditions represent the classical measures used to assess the credit risk of an investment without involving SCF. Figure 5 depicts the ALE plots indicating the impact of these variables on the probability of non-risky agriculture 4.0 investments. According to Fig. 5a, the probability of non-risky agriculture 4.0 investments increases significantly with higher values of the current ratio of the agricultural SME until it exceeds approximately 3.8; beyond that point, the probability decreases as the current ratio rises further. Interestingly, agriculture 4.0 investments become non-risky (ALE > 0) once the current ratio exceeds 2.0 but become risky again (ALE < 0) when it is above 3.8. This pattern is fully consistent with the financial meaning of this variable. The financial literature indicates that a ratio of at least two to one between current assets and current liabilities is required to ensure a high ability to repay short-term debt (Custódio et al., 2013), which explains the high probability of non-risky investment from this point on. On the other hand, very high values of the current ratio might indicate excess liquidity or inventory within the agricultural SME, which signals inefficient liquidity management (Bhunia et al., 2011). Hence, financing investments in this kind of SME is risky. We conclude that agricultural SMEs should maintain their current ratio between 2.0 and 3.8 to show high creditworthiness in the eyes of FSPs. Moreover, Fig. 5b shows that the probability of non-risky agriculture 4.0 investment decreases with higher financial leverage of the agricultural SME. Agriculture 4.0 investments made by SMEs with financial leverage exceeding 0.18 are classified as risky. 
Indeed, a high level of long-term debt means that the SME is accumulating external financing rather than mobilizing internal capital to finance its long-term investments. According to the pecking order theory (Frank & Goyal, 2003), long-term profitability decreases with a high level of long-term debt accumulation. Therefore, agricultural SMEs should keep their financial leverage below 0.18. Furthermore, according to Fig. 5c, the profit margin on sales of the agricultural SME is, in general, positively correlated with the probability of non-risky agriculture 4.0 investment. Higher profitability of the agricultural SME signifies a high financial ability, which may mitigate credit risks and enhance trust between FSPs and the SME. Therefore, agricultural SMEs should optimize their operations and reduce their production costs prior to engaging in agriculture 4.0 investment. Finally, Fig. 5d indicates an interesting trend. In principle, a high growth rate of the agricultural SME should involve faster expansion of its asset scale and thereby a low risk in financing long-term investments such as agriculture 4.0. Nonetheless, we notice that beyond a certain value, the probability of non-risky agriculture 4.0 investments decreases. This could be explained by the fact that smaller companies often rush to expand their assets in a short period through long-term debt, which generates financial and operational risks (Chen et al., 2020). Hence, agricultural SMEs are advised to rely on internal capital to expand their assets, even if this takes longer.

4.5 The impact of the credit conditions of the LE and the status of the cooperative relationship

The variables associated with the credit conditions of the LE, alongside the status of the cooperative relationship between the LE and the agricultural SME, constitute the novel features that SCF brings to investment financing. Figure 6 depicts the ALE plots indicating the impact of variables related to the credit conditions of the LE on the probability of non-risky agriculture 4.0 investments. According to Fig. 6a, the LE's credit rating is positively correlated with the probability of non-risky investment. Generally, from the 'fair' credit rating level of the LE upward, agriculture 4.0 investments are seen as non-risky (ALE > 0). This finding suggests that agricultural SMEs should work with LEs having at least a 'fair' credit rating to improve their creditworthiness in the SCF context. Also, the effect of the financial leverage of the LE depicted in Fig. 6b is similar to that of the financial leverage of the agricultural SME (Fig. 5b). However, the threshold value of the leverage for the LE is found to be approximately 0.28, which is greater than the acceptable value for the agricultural SME. This can be explained by agency theory (Shapiro, 2005), which suggests that larger firms tend to be more mature and diversified and can absorb long-term debt better than smaller companies. Another interesting feature, shown in Fig. 6c, is that the probability of non-risky agriculture 4.0 investment increases significantly with a higher profit margin of the LE. However, once a certain level has been reached, the probability of non-risky investment levels off regardless of any further increase in the LE's profit margin on sales. Using this insight, agricultural SMEs willing to invest in agriculture 4.0 could select an LE with at least the minimum required level of profit margin on sales when applying for SCF. In addition, FSPs could use this finding to support their financing decisions. 
Concerning the intensity of the contract between the agricultural SME and the LE, short relationships (< 5 years) are found to reduce the probability of non-risky agriculture 4.0 investments. Therefore, agricultural SMEs need to build long-term relationships with the LEs in their SCs.

4.6 The impact of the profitability of the agriculture 4.0 investment

The overall picture of credit-risk assessment would not be complete without exploring the impact of the variables associated with the profitability of the agriculture 4.0 investment itself (Fig. 7). First, Fig. 7a depicts the impact of the return on investment of agriculture 4.0. A higher return indicates high profitability of the investment and a high probability that it is non-risky. We notice that the minimum acceptable rate of return is approximately 40%; above this value, the agriculture 4.0 investment becomes non-risky. This extends several studies in the literature (e.g., Horváth & Szabó, 2019; Moeuf et al., 2018). This finding implies that SMEs should engage in large-scale I4.0 programs whose return on investment exceeds 40%. FSPs should request strong evidence for agriculture 4.0 projects presenting a return on investment below 40% and weigh whether these investments are worth the associated risks. Second, the abnormal return of agriculture 4.0 investments, depicted in Fig. 7b, has a linear effect on the probability of non-risky investment. To be acceptable, the investment should at least achieve what is expected from it (abnormal return = 0). Finally, Fig. 7c shows that the profitability index follows a similar trend: around a value of 1.0, the agriculture 4.0 investment switches from the risky zone to the non-risky zone. Thus, the case for financing SMEs' agriculture 4.0 investments might be questioned if the capital investment cost is too high or the present value of the cash collected from the investment is too low. Therefore, to minimize the financing credit risk, FSPs' decision-makers need to pay closer attention to how the capital invested compares with the present value of the cash collected from the agriculture 4.0 investment.
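
The comparison between capital invested and the present value of cash collected can be made concrete with a small sketch. It assumes the textbook definition of the profitability index (present value of future cash flows divided by the initial investment), which may differ in detail from the one used in the study; the cash flows and the discount rate are hypothetical.

```python
def present_value(cash_flows, rate):
    """Discount end-of-period cash flows (periods 1, 2, ...) at a constant rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def profitability_index(cash_flows, rate, initial_investment):
    """Textbook profitability index: PV of future cash flows per unit invested."""
    return present_value(cash_flows, rate) / initial_investment


# Hypothetical projects, for illustration only: same outlay, different cash flows.
pi_high = profitability_index([60.0, 60.0], rate=0.10, initial_investment=100.0)
pi_low = profitability_index([50.0, 50.0], rate=0.10, initial_investment=100.0)
print(round(pi_high, 3), round(pi_low, 3))
```

Under this definition, an index above 1.0 means the discounted cash collected exceeds the capital invested, which matches the switch from the risky to the non-risky zone described above.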

Fig. 7 ALE plots of the effect of optimistic variables related to the profitability on the probability of non-risky agriculture 4.0 investments

5 Managerial implications

The extant literature on I4.0-related investments lacks mechanisms that highlight the factors affecting the assessment of credit risk in SCF and thereby promote financing, especially for smaller companies implementing such technologies (Zhu et al., 2019). Accordingly, the present study could offer a solution for SMEs and startups, which mostly face tremendous challenges in accessing funding for their I4.0 implementation initiatives. Through the practical example of agriculture 4.0 investments of African SMEs in SCF, our study provides numerous implications for SMEs, FSPs, and LEs to accurately assess and mitigate the credit risk associated with SMEs' I4.0 investments.

First, the findings of our study extend the traditional lending system, which is mainly based on the credit conditions of the SME, to include other crucial SCF factors such as the credit conditions of the LE and the status of the cooperative relationship between the LE and the agricultural SME, alongside the profitability of the agriculture 4.0 investment itself. Moreover, SMEs, FSPs, and LEs are provided with the optimistic set of the most important variables to focus on in order to accurately assess and alleviate the credit risks associated with agriculture 4.0 investments, including the current ratio, financial leverage, profit margin on sales, and growth rate of the agricultural SME. Furthermore, our model is well suited to small datasets. RotF has been found to be more robust and stable than extreme ML, SVM, and neural networks, especially with small training sets (Abedin et al., 2019; Han et al., 2018). Thus, our model would be particularly useful for SMEs, which typically deal with small datasets.

Second, our study clearly highlights the conditions under which SMEs would be seen as creditworthy in the eyes of their LEs and FSPs. Accordingly, agricultural SMEs should keep their current ratio between 2.0 and 3.8 and their financial leverage below 0.18. Agricultural SMEs are also advised to increase their profitability and growth through internal capital and to minimize long-term debt usage. These insights could redirect the efforts of agricultural SMEs in Africa toward improving their credit conditions and accessing funding to support their agriculture 4.0 transformation.

Third, the study's findings imply that agricultural SMEs should select strong, creditworthy LEs for their SCF collaboration (Wuttke et al., 2019). This includes LEs with high credit ratings, low financial leverage, and strong long-term profitability. In addition, agricultural SMEs should establish long-term cooperation with their LEs in order to improve their credit conditions in the SCF context and finance their agriculture 4.0 initiatives (Wuttke et al., 2019). Thus, our findings provide a useful guideline for agricultural SMEs that struggle to invest in I4.0 to support their agricultural operations.

FSPs should be aware of the usefulness and high performance of the proposed EML approach in accurately evaluating the credit risk associated with agriculture 4.0 investments of agricultural SMEs in SCF, which helps improve SC cash flow, reduce potential risks along the entire chain, and make correct loan decisions. Even a slight improvement in forecasting accuracy, such as that achieved by our model, can translate into future savings. This would also enable FSPs to change their way of doing things, push them to consider more advanced tools, and encourage them to build a culture supportive of innovation. The study also implicitly supports the trend of refining popular algorithms to make them more efficient and suitable for different cases, and of transforming EML into simpler and more comprehensive models while preserving the predictive accuracy of the methods they were derived from. We believe that EML approaches are bound to move financial services toward greater efficiency, accuracy, and dynamism, not only in loan decision-making but also in other applications.

Finally, in the special context of the coronavirus pandemic, economic sectors all over the world have been deeply affected, including the agricultural sector. Many agricultural SMEs currently face issues such as high demand uncertainty and changing customer patterns, since many customers' businesses and local markets were closed. Agricultural SMEs therefore risk being penalized by excess stocks and supply disruptions. As a result, FSPs are more hesitant and uncertain about financially supporting agriculture 4.0 projects, and it is more challenging to make correct credit loan decisions in SCF for investment, especially in agriculture 4.0. Our findings are therefore particularly significant in this context, as they help mitigate credit risk and its propagation.

Recent evidence shows that the COVID-19 crisis has made digitalization more crucial than ever. Therefore, more I4.0 projects are expected to emerge and are likely to persist in the long term. Hence, our findings also prepare agricultural SMEs, LEs, and FSPs to confidently face the challenges that will accompany these new projects, mainly credit risk forecasting. This research would enable these parties to engage in digitalization projects and to create a culture of learning, innovation, resilience, and trust.

6 Conclusion, limitations and future research

The present study addresses the forecasting of credit risk associated with agriculture 4.0 investments of SMEs in the SCF context. To this end, an initial set of 22 independent variables affecting the credit risk of agriculture 4.0 investments has been established based on five prominent factors, i.e., the status of agriculture development in the country, the quality and credit conditions of the agricultural SME, the credit conditions of the LE, the status of the cooperative relationship, and the profitability of the agriculture 4.0 investment. This list has been prioritized using the GT algorithm to establish an optimistic set of 11 variables. These variables have been used as input for a novel hybrid EML approach based on the RotF and LB algorithms. Using a dataset collected from 216 agricultural SMEs, 195 LEs and 104 FSPs operating in Africa's agricultural market, the performance of the proposed RotF-LB ensemble approach has been tested. The results showed good performance of the proposed model and suggested meaningful implications to help agricultural SMEs, LEs and FSPs promote the wide transformation towards agriculture 4.0.

In this study, we recognize certain limitations that open avenues for future research on forecasting the credit risk associated with SMEs' I4.0 investments in SCF. First, sample selection and data collection could generate biases that might affect our findings. Hence, we recommend that future research conduct further experiments to confirm our proposed approach using large datasets from different contexts, particularly with primary data on SMEs in SCF. Second, although comparing the forecasting performance of the RotF-LB approach with some individual and EML approaches is beneficial, it may not be sufficient. Future studies are encouraged to conduct further comparisons with advanced EML techniques to confirm its performance. In addition, the use of LB can lessen the risk of the curse of dimensionality but cannot eliminate it entirely. Thus, non-linear approaches such as autoencoders and representation learning need to be investigated, with the aim of reducing dimensionality and optimizing accuracy.