1 Introduction

The Logistics Performance Index (LPI) of the World Bank is a well-known practical instrument for measuring a country's logistics performance that is available to policymakers (World Bank, 2018). The LPI, which measures national logistics performance on a biannual basis since 2007, is arguably the most important instrument to emerge from the trade facilitation domain. It primarily focuses on (international) trade logistics and assesses national logistical connection across six performance dimensions. These LPI indicators can be mapped into two main categories. First, are-as for policy instruction, demonstrating key inputs to the supply chain (consistent with LPI indicators of (i) infrastructure, (ii) customs, and (iii) service quality). Second, the supply chain performance outcomes (consistent with LPI indicators of (iv) time, cost, and reliability—timeliness, (v) international shipments, and (vi) tracking and tracing) (Kinra et al., 2020). The LPI score, which is related to trade facilitation, provides a macroeconomic perspective on how policymakers can positively influence the global supply chain capabilities and performance of organizations. Unfortunately, the most recent LPI data is from the year 2018. The predicted information of the overall LPI score based on the economics attribute may provide policymakers with important decision information. Researchers explored the relationships between LPI scores and other associated information for designing country logistics policy (Göçer et al., 2021). Policy reforms at the national level could result in significant performance gains when upgrading the firms' supply chain capabilities (Mann, 2012). In other words, a country's logistics development policy can help to improve the country's microeconomics. This is because the efficiency with which supply chains connect firms to domestic and international opportunities influences logistics performance (World Bank, 2018). Consequently, both data of macroeconomics and microeconomics may exert an impact on the LPI score of a country. That is related to the concept of economics and finance and the performance of the system, and one of the major challenges for policymakers is to understand the economics underlying the entire system (Kayal & Rohilla, 2021).

The logistics performance related to LPI score and dynamics of economics time series data, if provided in a shorter time frame, may provide more benefits to policymakers in terms of immediately improving or resharpening their policies. In logistics and supply chain management, big data and predictive analytics are gaining traction. So close to a real-world problem in the field of supply chain, modeling and solving are required (Pooya et al., 2021). Many smart technology applications i.e., artificial intelligence (AI) and data science technologies using big data now show promise in improving the efficiency and effectiveness of different logistical operations and transportation networks (Chung, 2021). The use of big data could have a major influence on supply chain capability and performance (Govindan et al., 2018), necessitating new ways to performance monitoring (Kamble & Gunasekaran, 2020). Researchers have already proposed using AI or machine learning (ML) techniques to evaluate a country's logistics performance (Kinra et al., 2020). The use of ML results in a flexible mathematical structure capable of identifying linear, non-linear and complicated connections between the important parameters. ML models are among the most explored of the most recent approaches owing to their ability to recognize complicated patterns in a variety of applications. The application of predictive ML is advantageous to a range of data formats such as time series data, which an ML model is required to automatically learn such interdependencies from data to obtain accurate prediction (Shih et al., 2019). Hence, there is a high level of ML applicability in prediction (Henrique et al., 2019). Various time series data, however, exhibit both nonlinear and linear properties (Xu et al., 2019). Numerous time series prediction approaches exist that employ nonlinear and linear algorithms separately or in combination (Büyükşahin & Ertekin, 2019).

There are a few publications that investigate sustainable logistics and supply chain management from the perspective of supply chain network partners, such as suppliers, manufacturers, and customers, as well as the perspective of the innovative and intelligent supply chain, such as internet of things (IoT), big data, AI, and blockchain technology. Furthermore, rather than using real data from businesses, many publications use the quantitative (Likert scale) or qualitative scale (Fuzzy) method to collect data (Wang et al., 2022a, 2022b). In this study, we develop a hybrid model in which we use the data envelopment analysis (DEA) technique to enhance the predictive power of different ML of linear and nonlinear techniques (linear regression and artificial neural network (ANN)) when dealing with few data in the domain of logistics and economics (e.g., LPI, microeconomic and macroeconomic data which biannually arrangement). Being able to extrapolate the LPI values should improve the operational performance and resource allocation for a country, which is essential for achieving Sustainable Development Goals (SDGs), as the United Nations (UN) has identified logistics and transportation as a critical component of sustainable development. In which the agenda for sustainable development incorporates the SDGs, sustainable transportation is integrated into several SDGs and targets, most notably those connected to economic growth, infrastructure development, and city development (United Nations, 2015).

The primary goal of this work is to construct a prediction procedure based on DEA combined to ML to estimate the LPI score (together with the trend) using time series data of national economic parameters. This study investigates two key research questions: first, can the microeconomic data of financial efficiency derived from the DEA technique, provide precise prediction performance when combined with macroeconomic data? Second, how do the different ML prediction of nonlinear and linear algorithms findings help policy reforms at the country level, which could lead to considerable performance gains when upgrading national supply chain capabilities? The outcomes of the study would provide policymakers with quick access to anticipated LPI scores with the aim of reforming short-term logistics policy to improve the national logistics and supply chain capabilities and performance.

In summary, this work presents LPI score prediction procedures that use macroeconomic parameters (gross domestic product or GDP per capita, import and export amounts) and microeconomic factors (firm and supply chain financial efficiency) as input to ML algorithms. Nonlinear and linear models were considered. On the one hand, the ANN model was selected since it belongs to a significant class of non-linear prediction models. When data is highly volatile and multicollinear, ANN outperform linear models (Büyükşahin & Ertekin, 2019). On the other hand, the Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), and Elastic-net regression models were chosen for the linear regression model because they are rapidly becoming essential linearity tools for prediction (Cui & Gong, 2018). The goal is to confirm the correlation between the LPI score and the specified input factors. The prediction procedure was applied on six countries of Association of Southeast Asian Nations (ASEAN), namely Indonesia, Malaysia, the Philippines, Singapore, Thailand, and Vietnam. These six ASEAN countries have the greatest level of export activities. These nations are distinguished by their rapid economic growth and strong participations in the global supply chain (Nguyen & Almodóvar, 2018).

The following summarizes the main contributions of this paper:

  • We create the ML to estimate the performance in the context of logistics and supply chain which ML is an important methodological in logistics and supply chain management towards a country's sustainable development.

  • We propose a hybrid model which the DEA approach is combined with ML algorithms of linear regression and ANN that the efficiency information obtained by DEA can be used to fine-tune the performance of the prediction procedure of ML.

  • We introduce the collective instance to maximize the number of instances due to the typically high accuracy even with large datasets. The multivariate pattern of combination of time series datasets may be raising or lowering the complexity and linearity interdependencies. Which may be occurred accurate prediction results utilizing different nonlinear and linear ML algorithms.

Furthermore, the benefit of prediction results may provide decision information to policymakers in six ASEAN countries to improve their logistics performance, thereby driving sustainable logistics and supply chain management. In addition, other policymakers can use the information to build a global trade collaboration network with the ASEAN region, where economic growth is based on the countries' logistics and supply chain operations capabilities.

The remainder of this paper is organized as follows. Section 2 examines related works. Sections 3 and 4 detail the methods and application, as well as the case study in detail. Then, in Sect. 5, we describe the result and provide a discussion along with the policy implications. Finally, in Sect. 6, we reach a conclusion that also addresses the study's limitations.

2 Literature Review

2.1 ML and Its Applicability in Policy Decision Making

ML is a rapidly expanding subject of computing algorithms that seek to replicate human intelligence by learning from their surroundings. ML methods are concerned with how computers execute and mimic human learning behaviours in order to acquire new information and enhance prediction performance over time. ML models, a subfield of AI, incorporate a range of concepts by exploiting the rapid increase of data (Zhu et al., 2021). Predictive or classification analytics is a crucial ML function. The fundamental idea behind ML is to use computer algorithms to understand and learn from data. When presented with new data, ML algorithms generalize previously learned information and create predictions making it simpler to make judgments in new situations (Ray & Chaudhuri, 2021).

ML is one of the essential computational techniques to the decision-making process in assisting in policy making, and offering enhanced organization operations at both firm and national level (Souza et al., 2019) that provides the correlations between data inputs and decision outputs (Coyle & Weller, 2020). Furthermore, the use of ML in policy making is beneficial in the commercial and economic domains. For example, a procurement policy that makes decisions based on goals, values, risk and certainty (Mulligan & Bamberger, 2019). Perboli et al. (2021) demonstrate how ML can assist public decision-makers in developing and executing regional policies to encourage the growth of small and medium-sized companies. Additionally, Baştuğ and Yercan (2021) explored the establishment of transport and supply chain policies by logistics services to fulfill their aims within the existing operating constraints owing to the COVID-19 pandemic. According to Ranjan et al. (2022), several years of trading and the increasing popularity of Bitcoin have piqued the interest of society at large, particularly economic policymakers, that ML algorithms can be highly effective and useful in Bitcoin price prediction. This is because the ML system design generates policies not just once but over time as they adopt and evolve (Mulligan & Bamberger, 2019). Even though prediction is essential in economics, economists may play a critical role in solving prediction policy challenges using ML (Mullainathan & Spiess, 2017). The use of dynamic economic big data as inputs to anticipate decision outputs to help policy making across every economic and commercial sector will gain appeal. As a result, additional research in these areas is required.

2.2 Logistics and Supply Chain and the Application of ML

Over the last few years, the use of ML in logistics and supply chain management has grown in popularity (Feizabadi, 2022). In various contexts, researchers have used a variety of ML tools. Table 1 shows examples of studies involving the application of ML algorithms and their significant findings.

Table 1 The state-of-the-art ML research in logistics and supply chain domain

As shown in Table 1, various ML algorithms tend to concentrate on prediction. Moreover, several papers in the logistics and supply chain domain propose a review of some of the most significant works providing an exhaustive overview of recent ML-based algorithms demonstrating that those methods outperform traditional prediction approaches in terms of prediction accuracy (Garre et al., 2020; Gonçalves et al., 2021; Hosseinnia & Ebrahimi, 2022). The outcomes of models created by leveraging ML's predictive power can assist supply chain managers in making more accurate decisions. Moreover, the development of the DEA model and the ML approach is limited in the area of logistics and supply chain management.

2.3 Predictive ML

One of the most used ML methods is supervised learning which trains the system using a collection of known or unknown patterns. It can be used in classification and prediction (or regression) applications (Zhu et al., 2021). In terms of prediction, ML algorithms which incorporate AI systems attempt to extract patterns acquired from past data—a process known as training or learning—in order to generate predictions about new data. As more data is acquired, their predictive capability improves, increasing prediction accuracy over time. This is because prediction accuracy is such an important criterion for predictive ability (Zhang et al., 2018). The general predictive ML framework for logistics performance prediction based on supervised learning algorithm is depicted in Fig. 1. When estimating the logistic performance predictor variable, it is necessary to consider the externally related variable. On the one hand, if no externally related variables exist, the training data of the predictor is built through time windows. On the other hand, training data is constructed using externally relevant variables. In common, accurate and fine-grained multivariate data can enhance the accuracy of prediction (Zeng et al., 2021). External variables that are highly connected with the performance of logistics systems have also been investigated in the past to develop multivariate predictions (D’Aleo & Sergi, 2017; Wong & Tang, 2018). The prediction efficacy of such approaches is highly dependent on the external variables chosen. Modelers will need to experiment with different combinations of external variables for model training and prediction to determine the contribution of each variable to the prediction results. Generally, the higher the prediction accuracy, the more interpretable these variables are (Zeng et al., 2021).

Fig. 1
figure 1

Source: Modified from D’Aleo and Sergi (2017)

General predictive ML framework for logistics performance prediction.

One of the fundamental approaches utilized in literature is the ANN (Xu et al., 2019). ANN models rely solely on a nonlinear data structure (Tealab et al., 2017). Since the different macroeconomic and microeconomic aspects are more complicated, a more advanced algorithm is necessary for a particularly nonlinear assignment. For example, the relationship between GDP production, product market competitiveness and business investment is controversial. On the one hand, the relationship might be beneficial but it could also be detrimental (Kordanuli et al., 2017). When applied to the aim of prediction utilizing macroeconomic and microeconomic variables, it was discovered that ANN resulted in satisfactory model performance (Yakub et al., 2020).

Moreover, several economic indices may be related. World Bank (2018), for example, claimed that LPI score has a substantial beneficial influence on raising GDP per capita as well as import and export volume (D’Aleo & Sergi, 2017; Takele, 2019). If the underlying model is linear, the most commonly used statistical approach for analyzing economic data is Ordinary Least Squares (OLS) regression. However, due to the freedom of OLS assumptions, penalized regression models or shrinkage methods such as Ridge, LASSO and Elastic-net regression are becoming an increasingly significant tool in statistics particularly in time series research (Uyen et al., 2021). Correlations between explanatory variables are a crucial element in these linear regression methods (McDonald, 2009). Ridge, LASSO and Elastic-net work well because they avoid overfitting and minimize model complexity by penalizing the size of coefficients. Indeed, based on its shrinkage approach, this penalized regression produces efficient and quick results particularly in the case of highly dimensional and correlated explanatory variables (Uyen et al., 2021).

Furthermore, several research have evaluated the performance of ANN and linear regression models, among others are samples drawn from diverse sectors or domains (Kim et al., 2020). Since the study contexts and situations are varied, the overall results could not clearly single out a better model. As a consequence, all findings are simply a comparison of the results of accuracy indicators assessed in each research to determine which model has the lowest inaccuracy. Finally, each research result suggests that the models referred to in their studies may be utilized to predict future output in a similar context.

3 Methods and Application

3.1 Methodological Framework

As aforementioned, typically, the factors include the economic component, which shows a substantial correlation between the logistics performance of a nation of LPI score and an economic factor, such as the country's economic development. For an example, GDP per capita (World Bank, 2018), export and import volume that LPI components have a significant positive effect towards increasing international trade for both import and export (Takele, 2019). The reason that these factors include the fundamental economic component, which shows a substantial correlation between the logistics performance of a nation and an economic factor in which the logistics performance improvement policy can be generated based on the predictor correlation parameters. In this study, we emphasize supply chain efficiency in terms of financial efficiency of the supply network by limiting it to the six ASEAN nations. Because the performance of logistics is influenced by how efficiently supply chains connect firms to domestic and international opportunities (World Bank, 2018). From Fig. 10, in which the training data from externally relevant economics variables both macroeconomic and microeconomic features is built. Then the prediction procedure is proposed and shown in Fig. 2, to which the financial efficiency of an individual node in the supply chain determined by the traditional DEA approach is altered in terms of microeconomic features. And, we apply the regression technique, such as Ridge, LASSO or Elastic-net along with ANN approach to investigate the results of LPI prediction.

Fig. 2
figure 2

A methodological framework of LPI prediction procedure

3.2 ML of ANN Approach

ANN is a useful ML model. Three critical aspects influence ANN: the unit's input and activation functions, network design, and the weight of each input connection (Osisanwo et al., 2017). It is composed of three layers of nodes (neurons), namely the input, hidden, and output layers. The data sample are accepted by the input layer, and the target category is returned by the output layer, as shown in Fig. 3a. The neuron, the fundamental unit of these networks, mimics the human counterpart, having dendrites for taking input variables and emitting an output value that may be used as input for other neurons (Laboissiere et al., 2015). The neural network's layers of basic processing units are interconnected, with weights assigned to each connection, which are changed during the network's learning process. Finally, the neural network's final layer is in charge of combining all of the signals from the preceding layer into a single output signal—the network's reaction to specific input data (Xu et al., 2019).

Fig. 3
figure 3

Architecture of neural network

In Fig. 3b, a simple ANN structure is shown, covering the neuron connections, biases allocated to neurons, and weights designated to the connections, depicting a multi-layer model (Zhu et al., 2020). A neuron \(k\) can be identified by two equations, as follows Eqs. (1, 2) (Fath et al., 2020):

$$ y_{k} = f\left( {u_{k} + b_{k} } \right) $$
(1)
$$ u_{k} = \mathop \sum \limits_{i = 1}^{N} w_{ki} x_{i} $$
(2)

where \({x}_{1}\), \({x}_{2}\), …, \({x}_{n}\) are the inputs, \({w}_{k1}\), \({w}_{k2}\), …, \({w}_{kn}\) are the neuron weights, \({u}_{k}\) is the computation outcome of weighted inputs, \({b}_{k}\) is the bias term, \(f(.)\) is the activation function, and \({y}_{k}\) is the output. There are several algorithms with which a network can be trained (Osisanwo et al., 2017).

The main advantage of ANN is better at identifying very complex patterns and making accurate predictions. However, the main disadvantage is related to the model networks are used to approximate or estimate functions that generally need a considerable quantity of training data (Syam & Sharma, 2018).

3.3 ML of Penalized Linear Regression Technique

Regression is one of the highly common and necessary tasks of ML that falls under the area of supervised learning. Regression analysis is essential in statistical modelling and, as a result, in performing ML tasks. Several alternatives, including Ridge regression, LASSO, and Elastic-net, have been established in the literature during the last few decades. Because of their resilience and interpretability to overfitting in high-dimensional datasets, ML linear regression algorithms were widely employed in the task of prediction (Cui & Gong, 2018). When the data is noisy, however, traditional linear regression methods, such as OLS regression tends to overfit, which means that the obtained model works well when predicting the training samples but fails when predicting a new/unseen sample. Ridge regression, LASSO regression, and Elastic-net regression, on the other hand, use a variety of regularization approaches to maximize the generalizability of predicting unseen samples in noisy data.

  1. (a)

    Ridge regression:

Ridge regression creates a model that minimizes the sum of the squared prediction errors in the training data of dataset \(N\) (obtained from the linear regression function of input variable \({x}_{i}\) and its corresponding output \({y}_{i}\)) as well as an L2-norm regularization, i.e. the sum of the squares of regression \(\beta \) coefficients that applied from \({\beta }_{1}\) to \({\beta }_{p}\). The following Eq. (3) is the objective function:

$$ \mathop {\min }\limits_{\beta } \mathop \sum \limits_{i = 1}^{N} \left( {f\left( {x_{i} } \right) - y_{i} } \right)^{2} + \lambda \mathop \sum \limits_{j = 1}^{p} \beta_{j}^{2} $$
(3)

This method can shrink the size of the regression coefficients, resulting in greater generalizability for predicting unseen data. A regularization parameter \(\lambda \) or the penalty factor employed in this algorithm to regulate the trade-off between the prediction error of the training data and L2-norm regularization, i.e., a penalty trade-off between bias and variance (Zou & Hastie, 2005).

The advantages of the model are that it can deal with strongly correlated environmental factors, and it is effective when the amount of data is modest. In contrast, the disadvantages may concern to the estimations of model are biased (Ahmadi‐Nedushan et al., 2006).

  1. (b)

    LASSO regression:

The L1-norm regularization is applied to the OLS loss function in LASSO regression, with the goal of minimizing the sum of the absolute values of the regression coefficients (Tibshirani, 1996). The objective function is written as follows Eq. (4):

$$ \mathop {\min }\limits_{\beta } \mathop \sum \limits_{i = 1}^{N} \left( {f\left( {x_{i} } \right) - y_{i} } \right)^{2} + \lambda \mathop \sum \limits_{j = 1}^{p} \left| {\beta_{j} } \right| $$
(4)

Most coefficients are generally set to zero in this L1-norm regularization, while one random feature is retained among the correlated ones (Zou & Hastie, 2005). As a result, LASSO regression produces a highly sparse predictive model, which enables predictor tuning and decreases model complexity. This can be an issue for a regression with a small number of samples but a high number of features (Efron et al., 2004).

This model provides an advantages of interpretable model and selects a subset of predictors having the greatest influence on the response variable. And when less data is available, it might be utilized for feature selection. For the disadvantage, the model selects one covariate at random from a set of highly collinear variables to incorporate in the model and discards the others (Boucher et al., 2015).

  1. (c)

    Elastic-net regression:

Elastic-net regression seeks to overcome the limitations of the LASSO technique (Zou & Hastie, 2005). The objective function is written as follows Eq. (5):

$$ \mathop {\min }\limits_{\beta } \mathop \sum \limits_{i = 1}^{N} \left( {f\left( {x_{i} } \right) - y_{i} } \right)^{2} + \lambda \mathop \sum \limits_{j = 1}^{p} \left( {\alpha \left| {\beta_{j} } \right| + \frac{1}{2}\left( {1 - \alpha } \right)\beta_{j}^{2} } \right) $$
(5)

As a result, Elastic-net regression is essentially a mixture of LASSO regression and ridge regression, allowing the number of selected features to be greater than the sample size while still attaining a sparse model (Zou & Hastie, 2005). A mixing parameter \(\alpha \) is utilized to adjust the weighting of the L1-norm and L2-norm contributions. The values for \(\alpha \) of Elastic-net lie between Ridge \((\alpha =0)\) and LASSO \((\alpha =1)\).

The advantage of this model such as it performs well when the number of parameters is greater than the number of samples. And, the model provides a more stable and interpretable model than the LASSO. However, the disadvantage of this model cannot be utilized when there is a limited amount of data available because it overwhelms the data with too many model variables (Boucher et al., 2015).

In summary, the Elastic-net is a regularized regression method that linearly combines both penalties i.e. L1-norm and L2-norm of the LASSO and Ridge regression methods, and it proves particularly useful when there are multiple correlated features.

3.4 Standard DEA Model and the Application of DEA Panel Data Method

Charnes et al. (1978) proposed the DEA method (called Charnes, Cooper and Rhodes (CCR) model), which is a non-parametric approach for determining the relative performance of a set of similar decision-making units (DMUs) using sets of inputs and outputs. In other words, it assesses how effectively an organization or other unit uses available resources to produce a set of products or services, which is the input and output data, in comparison to other units in the data set. DEA provides a benchmark (frontier) against which competitors can identify areas of "best-practice" associated with the highest performance measures. A DMU can operate on or near the border, with the distance to the border reflecting inefficiency (Mantri, 2008).

In the traditional DEA method, a set of DMU \(j\) is formed, utilizing quantities of inputs \(X\in {x}^{m}\) to deliver quantities of outputs \(Y\in {y}^{s}\), where \(m\) and \(s\) indicate the numbers of the inputs and outputs. Specifically, \({x}_{ij}\) denotes the amount of the \(i\) th input used and \({y}_{rj}\) the amount of the \(r\) th output produced. The efficiency score of each DMU, \(\theta \), is measured as Eq. (6):

$$ \theta = \frac{{\mathop \sum \nolimits_{r = 1}^{s} \mu_{r} y_{r} }}{{\mathop \sum \nolimits_{i = 1}^{m} \upsilon_{i} x_{i} }} $$
(6)

where \({\mu }_{r}\) and \({\upsilon }_{i}\) are the output and input weights respectively. The focus is to optimize the ratio of outputs to inputs (Charnes et al., 1978).

We can call the envelopment DEA models as radial efficiency measures, because these models optimize all inputs or outputs of a DMU at a certain proportion. Färe and Lovell (1978) introduce a non-radial measure which allows nonproportional reductions in positive inputs or augmentations in positive outputs (Zhu, 2009).

Focusing on envelopment DEA model, depending on the interest of the analysis, two alternative approaches are available in DEA that can be identified as an input-oriented or output-oriented model. For the input-oriented model where the inputs are minimized and the outputs are kept at their current levels (Banker et al., 1984). An objective of the input-oriented DEA model is to maximize the ratio of virtual output to virtual input while keeping the ratios for all the DMUs not more than one. In contrast, the output-oriented DEA models consider the possible (proportional) output augmentations while keeping the current levels of inputs that the ratios for all the DMUs not less than one (Zhu, 2009).

The CCR models assume constant returns to scale (CRS) which means that if there is an increase in the inputs, the results in a proportion increase in the output level as well, for example, reducing input while remains output unchanged. Since DEA techniques have been introduced, various extensions of the CCR models have been proposed. Banker et al. (1984) have introduced an extension of the original CCR models which called Banker, Charnes and Cooper (BCC) model. The BCC models assume Variable Returns to Scale (VRS). The VRS models reflect the fact that production system may exhibit increasing, constant and decreasing returns to scale (Soheilirad et al., 2018). Because it does not require any assumptions and can be used to measure the efficiency of DMUs with multiple inputs and multiple outputs, DEA is more popular in the literature (Yeşilyurt et al., 2021). Standard linear programming (LP) techniques can be used to solve the majority of popular DEA models (Chen & Cho, 2009).

DEA has been widely used since the pioneering work. It includes the DEA panel data method. It is used to compute the long-term scale efficiency of DMUs by estimating the long-term efficiency scores. The DEA panel data method has the advantage of allowing us to estimate a single coefficient of efficiency for the period of analysis while taking into account the data panel structure. As an example of the window analysis, by this approach, the technical efficiency is analyzed sequentially with a specific window width (for example, the number of years in a window) utilizing a panel data of DMU (Řepková, 2014). We can compute the time-invariant scale efficiency representative of the study period by extending the DEA panel data approach to estimations with CRS and VRS (Pérez-López et al., 2018).

In this study, to apply the DEA panel data of window analysis, the envelopment formulation of an input-oriented mechanism to illustrate the constant returns of scale (CRS) situation (Zhu, 2022) which apply to window analysis observed in total \(T\) period (Řepková, 2014) is utilized and shown in Eq. (7):

$$ {\text{min}}\theta - \varepsilon \left( {\sum\limits_{{i = 1}}^{m} {s_{i}^{ - } } + \sum\limits_{{r = 1}}^{s} {s_{r}^{ + } } } \right) $$
(7)
$$ s.t.\quad \sum\limits_{(j = 1)}^{(n \times T)} {\lambda_{j} x_{ij}^{t} + s_{i}^{ - } = \theta x_{i0} ,\quad i = 1, \ldots ,m,} $$
$$ \mathop \sum \limits_{j = 1}^{n \times T} \lambda_{j} y_{rj}^{t} - s_{r}^{ + } = y_{r0} ,\quad r = 1, \ldots ,s\quad j = 1, \ldots ,n \times T, $$
$$ \lambda_{j} ,s_{i}^{ - } ,s_{r}^{ + } \ge 0, $$

where \(n\) is the number of members in a set of DMUs iperiod \(t\) \(\left(t=1,\dots , T\right)\) in which the subscript \(_{i0}\) or \(_{r0}\) represents the evaluating DMU, \(\lambda_{j}\) is a nonnegative scalar, \(\varepsilon\) is non-Archimedean infinitesimal, and \(s_{i}^{ - }\) and \(s_{r}^{ + }\) are the slacks of the input \(x_{ij}^{t}\) and output \(y_{rj}^{t}\) respectively. Equation (7) is an input-oriented DEA window analysis model where the objective function and its constraints are minimizing the inputs while maintaining the outputs at their current levels (Zhu, 2022). With an optimality result (\(\theta =1\)) and all slacks are zero, the DMU is said to be CRS-efficient and is operating on the CRS frontier. Otherwise, the DMU is CRS-inefficient, and an improvement is required by decreasing the input and/or increasing the output (Charnes et al., 1978).

3.5 Model Performance Evaluation

The accuracy of ML prediction results is determined by the model structure and associated training algorithm. Regarding the previous study, we chose a popular performance criterion that produces consistent results (Salehi et al., 2020). This section presents information on selected performance criteria that are used to confirm the accuracy of the results and validate the performance of different prediction models for decision-making.

The mean absolute error (MAE) (8), the root mean square error (RMSE) (9), and Nash − Sutcliffe efficiency coefficient (NSE) (10) (Başakın et al., 2021) along with the Kruskal‐Wallis test at 95% confidence interval will be used to measure the performance of model.

$$ MAE = \frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left| {y\left( k \right) - \hat{y}\left( k \right)} \right| $$
(8)
$$ RMSE = \sqrt {\frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left( {y\left( k \right) - \hat{y}\left( {\text{k}} \right)} \right)^{2} } $$
(9)
$$ NSE = 1 - \frac{{\mathop \sum \nolimits_{k = 1}^{N} \left( {y\left( k \right) - \hat{y}\left( k \right)} \right)^{2} }}{{\mathop \sum \nolimits_{k = 1}^{N} \left( {y\left( k \right) - \overline{y\left( k \right)} } \right)^{2} }} $$
(10)

where \(N\) is the amount of validation data, \(y\) is the target or observed value of LPI score which \(\overline{y }\) is its average value and \(\widehat{y}\) is the prediction or estimation value of LPI score which \(\overline{\widehat{y} }\) is the average value of its. The smaller values for MAE and RMSE indicate a higher accuracy, while the NSE varies between -∞ and 1 and NSE value close to 1 denotes the good prediction performance.

To compute the Kruskal–Wallis statistic \((H)\), let \(n\) denotes the total number of observed and predicted data set, \(R\) denote the rank for each numbers in \(n\), and \({R}_{i}\) and \({n}_{i}\) are the rank and the number of DMU of each dataset \(i\), respectively. Then, \(H\) is as follows Eq. (11):

$$ H = \frac{12}{{n\left( {n + 1} \right)}}\sum \frac{{R_{i}^{2} }}{{n_{i} }} - 3\left( {n + 1} \right) $$
(11)

The statistic \(H\) follows the \({\chi }^{2}\) distribution with a degree of freedom \((df=i-1)\) (Guo et al., 2013).

3.6 Analytical Tool

This experiment made use of the MATLAB 2020b Neural network toolbox. The ANN was formed using the default parameters (initial value of momentum = 0.001, epochs = 1,000 and maximum fail = 6). For prediction problems, a feed-forward ANN with backpropagation learning was constructed (Aguinaga et al., 2017). TRAINML is chosen for a network training function that uses the Levenberg–Marquardt optimization approach to alter the weight and bias variables. TRAINLM is a fast algorithm, however, it uses more memory than other algorithms (Amin et al., 2013). To minimize errors, gradient descent with momentum weight and bias learning function, or LEARNGDM, is utilized. This function subtracts the weight change associated with a particular neuron, taking into account the neuron's input and error terms, learning rate, weight and bias, and momentum term, and is equivalent to gradient descent with momentum backpropagation (Baruah et al., 2017). TANSIG, or tangent sigmoid, is utilized as a transfer function in future calculations for the input variable \(x\) as shown in Eq. (12) (Moayedi & Rezaei, 2019):

$$ TANSIG\left( x \right) = \frac{2}{{1 + e^{ - 2x} }} - 1 $$
(12)

TANSIG is employed in both the output and hidden layers. They calculate the output based on the net input. The values returned by this activation function range from -1 to + 1 (Narvekar et al., 2017). For ANN, the network is built upon 1 × 10 (one hidden layer with ten nodes).

Moreover, the penalized linear regression analysis was carried out using RStudio in which the “glmnet” package is employed. In addition, the datasets were centered and scaled, and a tenfold cross-validation was performed to produce internally valid performance metrics. To fit generalized linear and related models, the “glmnet” package employs penalized maximum likelihood. The regularization path for the ridge, LASSO, or Elastic-net penalty is calculated using a grid of values (on the log scale) for the regularization parameter lambda. The technique is highly fast and can exploit sparsity in the input matrix \(x\). It works with linear, logistic, and multinomial regression models, together with the Poisson and Cox regression models. Fitting multi-response linear regression models, generalized linear models for custom families, and relaxed LASSO regression models are also possible. Prediction and graphing methods, as well as cross-validation procedures, are included in the package (Fonti & Belitser, 2017).

4 Case Study

4.1 Application Domain and Scope

ASEAN countries are undergoing a rapid mechanical revolution. In 2020, the services sector led ASEAN's economy, accounting for 50.6% of the bloc's GDP, followed by manufacturing (35.8%) and agriculture (10.5%). Travel is the most important contributor to ASEAN's exports and imports in terms of services trade. Other business services and transportation accounted for the majority of ASEAN trade. In the manufacturing domain, electrical and machinery (with equipment and parts) account for 29.7% and 28.1% of total manufacturing goods exports and imports, respectively (HKTDC Research, 2022). The agricultural domains, provide the world with a diverse array of agricultural-based food products (Fan et al., 2021). These are the main engines driving ASEAN's growing trade volume.

Singapore, Vietnam, Malaysia, Thailand, Indonesia, and the Philippines account for six of the ten ASEAN member countries. As illustrated in Fig. 4, these nations have the greatest levels of exports and imports among ASEAN members (ASEAN Stats, 2021). Six of ten ASEAN countries as aforementioned are selected to evaluate the performance of their logistics service supply chain as a case study.

Fig. 4
figure 4

Exports and imports of goods of ASEAN countries (ASEAN Stats, 2021)

4.2 Dataset and the Preparation

The S&P Global Market Intelligence platform is a database of global organizations in a variety of industries that is accessible via a license. It contains worldwide logistics and transportation and manufacturing firms’ data and enables categorization by country, firm, development stage, and commodity. In this study, the data relating to the S&P Global Market Intelligence source is divided into two parts: the microeconomic dataset and the macroeconomic dataset. First, a microeconomic dataset relating to data used to analyze the behavior of particular companies in an attempt to comprehend company decision-making processes. Second, a macroeconomic dataset is a collection of data that is primarily used to examine the dynamics of aggregate data in a country. In other words, microeconomic data refers to firm-level data while macroeconomic data refers to country-level data. The firm's financial data and the country's economic and demographic statistics are the source of microeconomic and macroeconomic dataset in S&P Global Market Intelligence, respectively. These datasets are derived from real-world data and may contain both linear and non-linear interdependence, with each country exhibiting a unique pattern.

Dataset spanning 12 fiscal years from 2009 to 2020 (six periods of window width) is used to analyze the efficiency of a logistics service supply chain in six ASEAN countries. Firstly, the dataset as of year 2009 to 2016 are used as training, testing and simulation sets to simulate and verify the LPI score of 2018. The model with the highest performance will be chosen to predict the 2020 LPI score. Datasets from 2009 to 2018 are used as training, testing and validation sets (for the 2020 LPI score prediction). The panel of dataset related to the predictor and dependent parameters is shown in Fig. 5.

Fig. 5
figure 5

Panel of the dataset

Based on S&P Global Market Intelligence's industry classification, we created the logistics service supply chain depicted in Fig. 6 based-on the physical and support supply chain of (Carter et al., 2015), in which the node of logistics service providers (air freight, maritime and logistics services) serves as the supplier of focal companies of manufacturing. In addition, the first node of functional service providers serves as the supplier of the logistical service providers node (airport and marine port services).

Fig. 6
figure 6

Evaluation of node financial performance in a supply network with DEA

To summarize, the number of individual businesses pertaining to logistics service supply chain is indicated in Fig. 6, for which the S&P Global Market Intelligence data source is accessible throughout the research period. Per each country, there are two companies of functional service provider node, three companies of logistics service provider node, and five companies of focal firms of manufacturers node.

The selection of sufficient DMU should be considered in order to maintain the power of DEA discrimination. The DMU of six ASEAN countries may be appropriate under the economics and continent geography factors. Also, by developing a logistics and supply chain collaboration framework, these ASEAN member states can improve the speed and reliability of their supply chains. This includes reducing the time and costs required for products to cross the border, whether by land, sea, or air, as well as identifying and resolving major trade and investment bottlenecks (ASEAN Secretariat, 2018). Besides that, this prediction procedure may be used in other regions that the selected DMU countries should consider with comparable features, such as all countries with similar economic scales or those that are coastal states and can offer water freight transportation network collaboration. The sufficient and appropriate selection of the country to apply the collective instance perspective may affect the prediction results that are related to the level of complexity and linearity or non-linearity of the dataset.

Moreover, the DEA concept is also formed into Fig. 6, where we selected the cost of operation and total asset as the direct input and the revenue as the direct output for each node. In the case of a single input (output) and multiple outputs (input), the stochastic frontier analysis (SFA) method (typically using maximum likelihood estimation) and functional form of input–output-relation is specified (e.g. linear, semi-log, double-log) may be appropriate (Lampe & Hilgers, 2015). However, in this study, due to dataset constraints (single output and multiple inputs) and a limited number of DMU, we chose the DEA approach because DEA is preferred over SFA when the sample size is small or medium (Banker et al., 1993). In addition, the DEA input-oriented approach is chosen, which has an advantage over the output-oriented approach when dealing with multiple inputs (Embaye & Bergtold, 2017).

5 Findings and Discussion

5.1 The Simulation Results of LPI

We begin by simulating the LPI score using ML of ANN and regression techniques such as Ridge, LASSO, and Elastic-net. Only the macroeconomic statistics of GDP per capita, exports, and imports are utilized as inputs to generate the simulation results of LPI output. The dataset had 508 instances from 127 countries (inclusive of 6 ASEAN countries), covering four periods of LPI announcement (2010, 2012, 2014, and 2016). For the dataset of 508 instances, 70% (356 instances) of the data is used as a training set, while the remaining 30% (152 instances) is used to test and validate model performance (50:50 ratio of test and validate). Reason for estimating a sample of 127 nations rather than only 6 ASEAN countries in this initial step, is that it relates to the concept of leveraging large data to improve the accuracy of ML predictions. The instance must be maximized due to the typically high accuracy even with big datasets (Bouktif et al., 2018).

Then, as a simulation set, the 127 cases of country in the year 2018 are employed. The findings in Table 2 show the MAE, RMSE, and NSE based on the selection approaches, as calculated by average across ten runs. To ensure accurate results of ML, the values given are the average of at least ten runs performed under the exact identical conditions (Papandrianos et al., 2022). In terms of time consumption, previous research has consistently found that a run time of ten is appropriate (Salehi et al., 2020). Furthermore, the standard deviation (SD) of average value is used to check the statistical property of accuracy for all ML algorithms for a robust and reliable algorithm consideration (Wang et al., 2022a, 2022b). To report the results, Fig. 10 in Appendix A depicts the repetition of two to twenty run tasks and computation of the average SD across all tasks. The results show that the average SD remains constant as the number of runs increases, indicating that the ML approach is robust and reliable.

Table 2 Results of performance value of simulation set of 2018 LPI

According to the simulation findings in Table 2, the ANN outperformed the regression approach in terms of accuracy (the result of 127 counties utilizing by ANN is shown in Table 5 in appendix). Since ANN model relies solely on a nonlinear data structure (Tealab et al., 2017), it is likely that the LPI score of some countries is not strongly associated with GDP per capita, exports, and imports. Furthermore, according to Table 2, the MAE, RMSE, and NSE when considering only six ASEAN countries derived using the ANN method is *0.1333 (lowest), *0.1581 (lowest), and *0.782 (highest), respectively. The next subsection will illustrate our proposed LPI prediction process, which refers to the study goal of precising the LPI prediction findings that integrate microeconomic data to macroeconomic data as prediction parameters. Hence, in this investigation, the procedure's goal MAE and RMSE should be less than 0.1333 and 0.1581 with NSE should be more than 0.782.

5.2 DEA of Financial Efficiency Evaluation

The efficiency score generated using the DEA model is used to represent the microeconomic data in this study. The standard DEA model is used to assess the financial efficiency of each node based-on the supply chain structure of a case study. According to Fig. 6 with the input and output data as present in Table 6 in appendix, the results of financial efficiency of functional service provider (FSP) (airport and marine services), logistics service provider (LSP), and focal manufacturing company (MFG) are assessed using the conventional DEA Eq. (7) as shown in Table 7 in appendix.

Table 3 Result of verification of 2018 LPI

The growth or reduction of efficiency scores in comparable rows may reflect the trend of efficiency in the same business over the research period from 2009 to 2020, which is combined by every couple of years, as shown in Tables 7. The notion of annual or time-by-time benchmarking using the DEA technique to analyze panel data is one of the concepts of the DEA application, which some research name it as DEA panel data (Pérez-López et al., 2018). The primary goal of this technique is to create a trend of efficiency in study time.

5.3 ML Approach Verification

We verify the country LPI of year 2018 by utilizing the microeconomic and macroeconomic data of 2009 to 2016 as the training, test and validation set. There are 720 cases in this dataset (2 FSPs \(\times\) 3 LSPs \(\times\) 5 MFGs \(\times\) 6 countries \(\times\) 4 periods \(=\) 720) that it might entail mapping the microeconomic data of individual financial efficiency scores at each stage to the macroeconomic data on each country. For the dataset of 720 instances, 70% (504 instances) of the data is used as a training set, while the remaining 30% (216 instances) is used to test and validate model performance (50:50 ratio of test and validate).

Based on a dataset from 2009 to 2016 that was utilized as a training set, the LPI score of 2018 was verified. The verification employs ML from the ANN method, as well as ML from regression techniques such as Ridge, LASSO, or Elastic-net. The outcomes of re-simulation are given in Table 3, by averaging across ten runs. In Table 3, the ANN technique offers the best value of MAE, RMSE and NSE as 0.0949**, 0.1274**, and 0.8585**, respectively. The results of ANN which nearly matches the result of the Ridge regression as 0.1101***, 0.1282***, and 0.8567*** for MAE, RMSE and NSE, respectively. Based-on these performance indicators, however, the LASSO and Elastic-net regression methods may not work well with this feature dataset when compared to each other.

In summary, when considering only the ANN and the Ridge regression method, our prediction strategy can achieve the goal of the model having an MAE and RMSE less than 0.1333 and 0.1581, and NSE more than 0.782 as well. However, because the features of the two methods differ, one of them may not be appropriate for all six ASEAN countries. Singapore, for example, is a developed country, whereas the others are developing. To offer the best results, we emphasize the difference (gap) between the actual and predicted LPI per country using both techniques. For ANN, it may be fitted to MYS, PHL, and SGP, which have gaps of 0*, − 0.03*, and − 0.01*, respectively. Ridge regression, on the other hand, may be fitted to IDN, THA, and VNM, with gaps of + 0.01*, − 0.11*, and − 0.05*, respectively.

Moreover, the statistical significance of the acquired data was examined using the Kruskal–Wallis test in this study, additionally to an analysis of whether the predicted and observed or LPI distributions, were consistent (Başakın et al., 2021; Citakoglu, 2021). Wherewith H0 denotes a hypothesis based on the statistically significant difference between mean predicted and observed efficiency score. Table 4 reveals that the H0 hypothesis was rejected (P-value \(\ge \) 0.05) for all simulation models; in other words, there is no significant difference between predicted and observed averages. This suggests that the ML approach of ANN, Ridge, LASSO, and Elastic-net had a statistically significant beneficial influence on prediction.

Table 4 Result of Kruskal–Wallis test

It is extremely difficult, if not impossible, to create a prediction model that produces 100% accurate results. Aside from the closest findings, one of the advantages of a useful prediction model is the accuracy of the predicted trend. Table 3 also involves a trend prediction from 2016 to 2018 to demonstrate the value of the prediction technique. The 2018 LPI verification findings based on the ANN and Ridge regression technique discovered that the predicted LPI score is accurate in trend prediction for all countries linked to their fitting process. Based on the ANN method, the model predicts that the LPI score for MYS will fall from 3.42 in 2016 to 3.22 in 2018, with the actual score falling to 3.22. For the Philippines, the model predicts that the LPI score will rise from 2.86 in 2016 to 2.87, then rising to 2.90. In Singapore, the model predicts that the LPI score would fall from 4.14 in 2016 to 3.99 in 2018, then falling to 4.00. Based on the Ridge regression approach, the model predicts that the LPI score in Indonesia will rise from 2.98 in 2016 to 3.16 in 2018, with the actual value rising to 3.15. For Thailand, the model predicts that the LPI score will rise from 3.26 in 2016 to 3.3 in 2018, and indeed the actual score rises to 3.41. In Vietnam, the model predicts that the LPI score will rise from 2.98 in 2016 to 3.22 in 2018, with the actual score rising to 3.27.

Consequently, we use an ANN method to estimate the 2020 LPI score for MYS, PHL, and SGP and the Ridge regression approach to predict the 2020 LPI score for IDN, THA, and VNM. The results are presented in the next section.

5.4 ML Approach Prediction

The prediction using ML of ANN and Ridge methods were carried out using the dataset from 2009 to 2018, with 70% and 30% as training and test set, respectively. Figure 7 illustrates the outcomes of prediction after averaging over ten runs.

Fig. 7
figure 7

The prediction result of 2020 LPI based on the ANN and Ridge regression method a IDN, b MYS, c PHL, d SGP, e THA, f VNM

In Fig. 7, under the ANN approach, the 2020 LPI of MYS is higher than the 2018 LPI (increase from 3.22 to 3.3; + 0.08*), the 2020 LPI of PHL is lower than the 2018 LPI (decrease from 2.9 to 2.89; − 0.01*), and the 2020 LPI of SGP is higher than the 2018 LPI (increase from 4 to 4.11; + 0.11*). Furthermore, under the Ridge regression approach, the 2020 LPI of IDN is similar to the 2018 LPI (invariable as 3.15; 0*), the 2020 LPI of THA is lower than the 2018 LPI (decrease from 3.41 to 3.33; − 0.08*), and the 2020 LPI of VNM is higher than the 2018 LPI (increase from 3.27 to 3.34; + 0.07*). Figure 7 also depicts the forecast result of the 2020 LPI based on the ANN and Ridge regression methods for easier comparison.

Figures 8 and 9 showcase the findings on the macroeconomic and microeconomic metrics compared to its LPI score for six ASEAN countries. For macroeconomic factor of GDP per capita, exports amount, and imports amount, we have re-scaled by setting the data of 2010 as 1.

Fig. 8
figure 8

The macroeconomic parameters compared to LPI score a IDN, b MYS, c PHL, d SGP, e THA, f VNM

Fig. 9
figure 9

The financial efficiency parameters compared to LPI score a IDN, b MYS, c PHL, d SGP, e THA, f VNM

In Fig. 8, the research characteristics, including GDP per capita, exports amount, and imports amount, are almost in accordance with the LPI score of IDN, THA, and VNM from 2010 to 2018 (except in year 2010 for THA and 2016 for VNM). This indicates that the input and output parameters of ML may be highly linear, as such Ridge regression is more appropriate and indeed as shown in the simulation results earlier, the Ridge regression technique's performance is more superior in this case. Hence, applying Ridge regression approach to the 2020 prediction may be acceptable for these three countries (IDN, THA and VNM). For MYS, all research macroeconomic characteristics have similar trend as LPI score from 2010 to 2016 but it moves in an opposite direction in 2018. The impact of the year 2018 may cause the input and output parameters of ML to be nonlinear, resulting in ANN outperforming Ridge regression in simulation. Hence, using the ANN technique in 2020 prediction may be fair to MYS. For PHL and SGP, all study macroeconomic are mostly not in similar trend as the LPI score. Considering that the primary input and output parameters of ML in macroeconomics exhibit non-linearity, the ANN technique therefore is more acceptable to be used for prediction in 2020 for PHL and SGP.

Figure 9 depicts the microeconomic data that correlate to the financial efficiency score of supply network nodes that are used to compare the trend of LPI score. For MYS, THA, and VNM, the financial efficiency of the LSP node moves in line with the LPI trend. For PHL, the LPI score trend is dependent on FSP's financial efficiency. When the graphical technique is used, however, the significance of the association between the trend of LPI score and the financial efficiency of supply network members is not discovered for SGP.

6 Highlight of the Findings

When ML is used to estimate the performance in the context of logistics and supply chain the ML model is required to automatically learn such interdependencies from data to obtain an accurate prediction. To predict the logistics performance represented by the LPI score which applies the time series data of economics features as the input data of the models. These such datasets are contained both linear and non-linear interdependence, with each country exhibiting a unique pattern. When the proposed collective instance of 127 countries, ANN outperforms linear models that reason from the overall dataset is highly volatile and multicollinear. According to the benefits of ANN, it is better at identifying very complex patterns and making accurate predictions (Syam & Sharma, 2018), which is similar to previous work by Wang and Zhang (2020), Han and Zhang, (2021), Kosasih and Brintrup (2021), Wu et al. (2021), and Feizabadi (2022) that ANN-based models perform well when applied to the complex supply chain system.

Moreover, when the approach is used to the collective instance of six ASEAN countries and DEA is used to fine-tune the prediction procedure's performance, the ANN consistently provides an acceptable result that is close to that of the ridge regression model. Ridge regression has the advantage of being able to deal with highly correlated environmental variables and is useful when there is a small amount of data (Ahmadi-Nedushan et al., 2006). However, when the results for each country are considered, it is obvious that the ML of the ANN prediction procedure is appropriate for the SGP, MYS, and the PHL datasets. Ridge regression is fitted to the IDN, THA, and VNM datasets. And one reason that ANN cannot fit all countries may be related to the quantity of training data since the limited amount of dataset inherently is a disadvantage to the ANN model (Syam & Sharma, 2018). As a result of the multivariate pattern of combining time series in distinct sets of data, the complexity and linearity interdependencies may be increased or decreased. It is possible to obtain accurate prediction results by utilizing different nonlinear and linear ML algorithms. Policymakers must grasp this concept while predicting LPI. It is critical to evaluate the relationship between economic parameter input characteristics and logistics performance. According to the findings of Khan et al. (2017), the factor of economic growth of per capita income influences logistics performance, and the mediation of sustainable factors, such as energy demand and greenhouse gas emissions, influences economic growth to improve logistics performance. Time series prediction approaches that use nonlinear and linear algorithms separately or in combination with initial or mediation variables may offer distinct advantages.

6.1 Policy Implications

In terms of policy implications, the outcomes of the procedure can be utilized to assist policymakers. This is because the procedure is often predicted using up-to-date dynamic economics big data such as the 2020 predicted LPI. The prediction findings can be utilized to monitor, enhance or reform a country's short-term logistics and supply chain policies particularly when focused on the prediction trend that provide more accurate information.

It is critical for policymakers to understand this notion when predicting logistics performance at the national level. It is important to consider how economic parameter input features correspond with logistical performance. If the data does not have a linear correlation for example, and the prediction technique is anticipated to generate findings with an acceptable level of accuracy, the ANN approach is required. The Ridge regression approach on the other hand is appropriate for data that is linearly associated. Another advantage of utilizing linear data in predicting is that policymakers may decide logistics performance improvement policy based on input characteristics. In this study, countries such as IDN, THA and VNM where macroeconomic factors including GDP per capita, import and export volumes would have a direct impact on logistics performance. As a result, strategies to encourage imports and exports or to boost GDP per capita are implemented. Nations could formulate policies to encourage the performance of the logistical system. One supporting factor could be these nations depend primarily on imports and exports. In this regard, the LPI indicators of the efficiency of customs and border management clearance and the ease of arranging competitively priced international shipments that support the increase of import and export volumes are required. This is in tandem with the study by Yeo and Deng (2020) who highlighted that there is a relationship between trade facilitation and international trade. In that, the development of import and export processes not only reduces direct and indirect logistics costs through the simplification and harmonization of procedures and documentation but also increases the efficiency and effectiveness of a trade.

The microeconomic aspect that was discovered which was associated with their LPI scores was the financial efficiency of LSP. It is native to MYS, THA and VNM. As a result, increasing the operational efficiency of LSP is another strategy that could assist in strengthening the country's logistics efficiency. LSP operational efficiency improvement is related to the LPI indicators of the competence and quality of logistics services, the frequency with which shipments reach consignees within the scheduled or expected delivery time, and the ability to track and trace consignments of LSP. For PHL, it has not driven its business through significant imports and exports. It was discovered that enhancing the operational efficiency of infrastructure providers may be a significant strategy that would improve the country's logistics performance. It is concerned to quality of transportation-related infrastructure that is an important primary LPI indicator. SGP, which already is a world-leading logistics performer and with no linear associations to the economic parameters of this study may need to develop policies that fulfill the evaluation LPI criteria. Its primary function is to maintain a continuously high level of performance related to all LPI indicators. This outcome is consistent with Intal et al.’s (2021) observation that SGP logistics performance should continue to drive innovation in order to create new supply chain solutions. For an example, the development of Singapore’s national single window, and innovations in Singapore’s customs which may be a model of modern logistics management that it can also link to other ASEAN countries. Such policies development of each country will the country sustainable development including economic growth, infrastructure and cities development.

7 Conclusion

In conclusion, the present study proposed the LPI prediction procedure using the ML approach and up-to-date dynamic economics big data as input parameters. The ML of ANN which was used in the study is a prominent method of predicting nonlinear datasets. Furthermore, ML of linear regression techniques such as Ridge, LASSO and Elastic-net produce prediction outcomes for highly correlated variables’ datasets. It helps in the prediction and trade-off of linearity and nonlinearity features of macroeconomic variables and LPI scores which have been found to have a positive association in previous studies. However, when the data is created to group prediction methods using a case of six ASEAN countries with their comparable or dissimilar economic context, this contributed to raising or lowering the linearity of the combined dataset. The dataset was determined by the microeconomic data of company financial efficiency collected using the DEA method.

According to the simulation and prediction of LPI score in the findings section, attending to the study objective is to apply the prediction procedure employing ML technique to estimate the LPI score based on the country’s economic parameters. From the complex pattern of a collective instance of these six countries, the non-linear algorithm of ANN performed best, followed by the penalized linear of Ridge regression method. The goal of the study is achieved through the use of these two methods, as measured by the deviation of MAE, RMSE, and NSE. And from the Kruskal–Wallis test, it do not significance difference between mean predicted and observed LPI score. Then the accuracy of predicting trends in the simulation data of the 2018 LPI are all correct for each country when applied the proposed models. The findings also show the ML of ANN prediction procedure is suitable for the economic data of SGP, MYS and PHL. Ridge regression ML is fitted to economic data from IDN, THA and VNM.

Furthermore, the findings could address the first study question of whether microeconomic data on the financial efficiency of individual firms and their supply networks could be used to fine-tune the performance of the prediction procedure. In the managerial implications section, the findings addressed the second question, "How do the different ML prediction of nonlinear and linear algorithms findings help policy reforms at the country level which may lead to significant performance gains in upgrading nations' logistics and supply chain capabilities?" The prediction results show that macroeconomic factors are influencing the rising logistics performance index in Vietnam in 2020. This outcome is consistent with the study of Nguyen et al. (2021) presented that there is an influence of logistics on Vietnam's economy that can apply to macro policy development. Moreover, as in Malaysia, logistics performance can be influenced by the financial efficiency of the logistics business in terms of microeconomic factors. This outcome related to the research of Ayesh et al. (2021) which suggested that Malaysian LSPs should develop more effective value-based strategies to develop financial confidence and maintain long-term dedicated relationships in order to achieve business objectives. Furthermore, the results of each country may be used to make logistics and supply chain policy decisions, which may result in significant performance gains in upgrading nations' logistics and supply chain capabilities, as well as support a global trade collaboration network for the nation's and region's sustainable development.

Nevertheless, there are certain limitations in this study due to the microeconomic data source used by S&P Global Market Intelligence. The source only offers data for public companies and does not provide data for private companies which resulted in the limitation of DMU in the evaluation of financial supply chain efficiency using the DEA technique. Moreover, a review of the literature was used to determine the macroeconomic factors that influenced the LPI. Other feature economic growth factors of initial and mediation of sustainable factors such as fuel and renewable energy prices and consumption rates, institutions, population and education, labor market, level of road safety, and technological readiness may have been overlooked. This could be an extension of research into countries' logistics performance. Aside from that, the established feature selection stage of the prediction procedure uses the ML approach to select the set of factors that have a strong correlation to logistics performance. Furthermore, this study identified only two forms of predictive ML, applying more diverse ML methods to a subset of other cases could be the focus of future research when other relevant attributes are being studied.