Introduction

Since the birth of the first vertical business-to-business (B2B) commercial website "ChinaChemNet" in the 1990s, China has gradually entered a new economic era characterized by the development of e-commerce [1, 2]. E-commerce plays an important role in creating consumer and investment demand, opening up employment channels, stimulating innovative activities, and driving regional economic development [3, 4]. Due to the continuous improvement of upstream, midstream and downstream industrial supply chains, the development of e-commerce is gradually accelerating [5]. Meanwhile, the closely related logistics sector has changed greatly. For example, Inoue et al. [6] showed that an ecosystem strategy based on the e-commerce market can significantly improve the performance of logistics companies. However, some scholars (e.g. Hsiao et al. [7]; Kim et al. [8]) argue that, from the perspective of consumers, a logistics system can only play its role by integrating the whole e-commerce supply chain.

Relevant data indicate that the development of e-commerce has boosted the demand for regional logistics. According to data released by the China Federation of Logistics and Purchasing (2019), the transaction scale of China's e-commerce market reached 28.4 trillion yuan in 2018 [9]. The rapid development of e-commerce has brought about a huge demand for logistics transportation and distribution. In 2018, the same-city business volume of China's courier industry totaled 11.41 billion pieces, an increase of 23.1% year-on-year. In addition, off-site business volume totaled 38.19 billion pieces, an increase of 27.5% year-on-year, and international/Hong Kong, Macao and Taiwan business volume totaled 1.11 billion pieces, up 34% year-on-year [9]. E-commerce deliveries account for over 60% of the national express total. In the business of major private courier companies, e-commerce orders account for more than 80% [9].

However, the rapid development of e-commerce also poses the following new requirements and challenges for logistics and distribution:

  1. Last-mile delivery to e-commerce customers has always been one of the biggest challenges for logistics providers. Last-mile product delivery is also an expensive option for retailers [10].

  2. Logistics and distribution providers are often unable to meet the delivery needs of customers during peak e-commerce shopping periods [11]. Because e-commerce has developed rapidly and its networks cover wide areas, the logistics and distribution capabilities of providers lag behind the development speed of e-commerce [12]. Much of the reason for such problems lies in the uneven development speed of e-commerce and logistics.

Recently, some scholars have studied the problem of insufficient logistics distribution capacity from different perspectives. For example, Mladenow et al. [13] and Huang et al. [14] propose that the introduction of crowdsourcing logistics will effectively address the last mile of urban logistics distribution, as well as mitigate peak-period problems such as inadequate logistics and distribution capacity. Ishfaq and Sox [15], Hsiao and Hansen [16], Hsu and Wang [17], Yu et al. [18], and others use different forecasting models to forecast regional logistics demand to provide decision support for local economic development. However, in the era of e-commerce, the impact of e-commerce on logistics and distribution cannot be ignored, and the indicators related to the development of e-commerce need to be taken into account when making regional logistics forecasts. Therefore, based on the literature review, this paper constructs an indicator system for forecasting logistics demand in Guangdong province and applies the GM (1,1) model and the BP neural network model to simulate and forecast the logistics demand. After comparing the two methods, the BP neural network model, which has less prediction error and more stable results, was chosen to forecast the logistics demand of Guangdong province for 2020–2022. The contributions of this paper are:

  1. The indicator system for logistics demand forecasting, built from an e-commerce perspective, can better reflect the actual scale of logistics demand.

  2. A comparison of the two methods shows that the BP neural network has the smaller error in predicting logistics demand and has good application prospects.

The remainder of the paper is arranged as follows: the second part presents the literature review. The third part describes the research methodology. The fourth part covers data analysis and regional logistics demand forecasting. The fifth part discusses the theoretical and practical contributions of this paper. Finally, the sixth part presents the conclusion, limitations and the further work that needs to be carried out.

Literature review

The impact of e-commerce on logistics

In a discussion of the relationship between e-commerce and logistics, Delfmann et al. [19] and He et al. [20] earlier proposed that logistics is an important part of e-commerce, which guarantees the realization of e-commerce and the smooth progress of production. In other words, logistics serves e-commerce. However, with the rapid development of e-commerce, the relationship between e-commerce and logistics is also changing.

During the early stages of e-commerce development, Sink and Langley [21] believed that the market was mainly dominated by third-party logistics companies and self-run logistics companies. In this stage, logistics companies mainly served enterprises with distribution demand. With the rapid development of e-commerce and the further enhancement of its influence, the logistics alliance mode [22] and the crowdsourcing logistics mode appeared [13, 14]. The logistics alliance mode refers to long-term cooperation in the form of a contract between two or more enterprises or organizations to achieve certain logistics goals [22]. For example, China’s “Cainiao Post” is a typical logistics alliance. Crowdsourcing logistics is one of the third-party logistics models, in which companies (crowdsourcers) outsource delivery orders via the Internet to unspecified individuals (crowdsourcees) [23]. Typical crowdsourcing logistics companies include “Meituan crowdsourcing” in China and "MyWays" in Sweden. E-commerce has not only brought about changes in logistics patterns, but also expanded the customer base of logistics service providers. The customers of logistics service providers are no longer only enterprises. Under the influence of e-commerce, consumers have become the larger customer group of logistics service providers [24]. Especially with the increasing popularity of online shopping, more and more parcels in residential areas are delivered to consumers' homes by logistics practitioners [25].

Therefore, in the era of e-commerce, logistics is no longer limited to providing services for e-commerce. The development of e-commerce not only promotes the birth of new logistics modes, but also expands the customer base of logistics service providers. In view of this, the impact of e-commerce will be fully considered when constructing the indicator system for logistics demand forecasting.

Indicators of logistics demand forecasting

Many scholars have built diversified logistics demand forecasting indicator systems for different situations (see summary in Table 1). For example, Nguyen [26], from the perspective of logistics development in Southeast Asia, took GDP, growth rate of total regional logistics, attractiveness of logistics regions, distribution of regional logistics, and regional distance as variables in logistics demand prediction. Fan and Wu [27] selected GDP, total cost of social logistics, total output value of the first, second and third industries, freight volume, goods turnover, total import and export volume, express volume, postal outlets, total retail sales of social consumer goods, and so on as indicators of logistics demand forecasting. Han et al. [28] took GDP, total cost of social logistics, investment in social fixed assets, and import and export volume as indicators of logistics demand calculation and prediction. Du and Chen [29] used GDP, post and telecommunications services, total retail sales of social consumer goods, and residents' consumption level as indicators for logistics demand forecasting. These studies provide a useful reference, which this paper draws on to construct the indicator system of logistics demand prediction for e-commerce development in Guangdong province.

Table 1 Logistics demand forecasting indicators under different circumstances

However, considering the impact of e-commerce on logistics, it is necessary to reflect the role of e-commerce in logistics demand forecasting indicators. E-commerce logistics often involves both consumer and enterprise customers. Since sales channels and distribution channels are separated, there are various ways of distribution [24]. In addition, e-commerce demands give rise to a series of logistics activities, for example, locker points of delivery (unattended) and service points of delivery (attended) [25]. Therefore, this paper integrates the relevant indicators of e-commerce into the prediction of Guangdong logistics demand, so as to reflect the scale of logistics demand more realistically and to expand the indicator system of logistics demand.

Methods used in logistics demand forecasting

Domestic and foreign logistics demand prediction is mainly based on quantitative methods including support vector machine, artificial neural network, linear regression, genetic algorithm, GM (1, 1) model, and other single and combined prediction models [30]. For example, Nguyen [26] used the L-OD method to forecast the logistics demand in Southeast Asia. Yan et al. [31] established a logistics demand combination prediction model (grey model and exponential smoothing model) using cargo throughput. Fan and Wu [27] predicted logistics demand using the composite kernel model, and found that this method could improve the prediction accuracy and had good robustness. Han et al. [28] proposed a logistics demand forecasting model based on fuzzy cognitive map. Cao et al. [32] used genetic algorithm and support vector regression machine to predict regional logistics demand. Compared with the above methods, BP neural network has the capacity for non-linear mapping, self-learning and self-adaptation, generalization and fault tolerance [33]. The logistics demand forecasting indicators studied in this paper come from various sources and have non-linear data characteristics among them. Therefore, BP neural network is used to forecast the logistics demand. Meanwhile, to verify the accuracy of the prediction method, the GM (1, 1) is used as the control model.

Through the above analysis, this paper finds that the existing research has provided a wealth of literature on logistics demand forecasting indicators and forecasting models. However, there are relatively few studies on logistics demand forecasting under the influence of e-commerce. Furthermore, in terms of forecasting methods, more combined models are needed to compare the prediction accuracy.

Methodology

The purpose of this study is to predict the logistics demand scale of Guangdong province in the era of e-commerce. On the basis of literature review, the research process is developed as shown in Fig. 1. The research framework includes the following seven steps:

  1. Determination of the forecast object: the logistics demand scale of Guangdong province in the e-commerce era.

  2. Target setting: forecasting logistics demand for the next 3 years.

  3. Literature review: preliminary screening of relevant indicators and collection of indicator data.

  4. Application of the factor analysis (FA) model: correlation analysis, reliability and validity tests on the preliminarily selected indicators.

  5. Construction of an indicator system: indicators that pass the correlation analysis, reliability and validity tests are used to construct the indicator system.

  6. Prediction: the GM (1, 1) model and the BP neural network model are used to predict logistics demand, and the prediction errors of the two methods are compared.

  7. Prediction results: the method with the minimum error is selected to predict the logistics demand of Guangdong for the next 3 years.

Fig. 1 Research process

To give readers a clear understanding of the research methods used in this paper, the principles and applications of FA model, GM (1, 1) model and BP neural network are briefly introduced below.

Factor analysis model

The factor analysis method was first proposed by the psychologist Charles Spearman and became well known in academic circles in the 1930s [34]. The starting point of factor analysis is to represent most of the information in the original variables with a smaller number of independent factor variables, which can be expressed by the following mathematical model [35]:

$$ \begin{gathered} x_{1} = a_{11} F_{1} + a_{12} F_{2} + \cdots + a_{1m} F_{m} , \hfill \\ x_{2} = a_{21} F_{1} + a_{22} F_{2} + \cdots + a_{2m} F_{m} , \hfill \\ \cdots \hfill \\ x_{p} = a_{p1} F_{1} + a_{p2} F_{2} + \cdots + a_{pm} F_{m} , \hfill \\ \end{gathered} $$
(1)

where \(x_{1} ,x_{2} , \cdots ,x_{p}\) represent p original variables, which are standardized variables with a mean value of 0 and a standard deviation of 1. \(F_{1} ,F_{2} , \cdots ,F_{m}\) represent m factor variables, and m is less than p. The model is expressed in matrix form as follows [31]:

$$ X = AF + \alpha \varepsilon , $$
(2)

where F is the common factor, which can be understood as m coordinate axes perpendicular to each other in the high-dimensional space; A is the factor loading matrix, and \(a_{ij}\) is the load of the ith original variable on the jth factor variable.

GM (1, 1) model

The GM (1,1) model was proposed by Deng [36]. In this method, an approximately exponential pattern is generated by accumulating the original data, and the model is then built on the accumulated series. This model has been applied by many scholars in various fields. For example, Liu et al. [37], Shen et al. [38], and Wang et al. [39] have applied this prediction model to research on the tourism industry, power systems, and the logistics supply chain, respectively. The basic principle of the GM (1,1) model is as follows [36]:

The GM (1,1) model is based on the grey system theory. The differential fitting method is applied to process the various factors in the system as grey data, to establish the prediction model [39]. Its expression formula is [36]

$$ \hat{y}^{(1)} (k + 1) = \left( {y^{(0)} (1) - \frac{b}{a}} \right)e^{ - ak} + \frac{b}{a}, $$
(3)

where \(y^{(0)} (1)\) is the first value of the original data series of each factor in the regional logistics demand scale system; \(\hat{y}^{(1)} (k + 1)\) is the predicted value of the once-accumulated series of the original data for each factor in the regional logistics demand scale system; k is the time index; a is the development grey number; b is the endogenous control grey number.
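As an illustration of formula (3), the following is a minimal Python sketch of the standard GM (1,1) procedure, not the software used in this paper. It assumes the usual least-squares estimation of a and b from the background values (means of consecutive accumulated values) and the usual restoration of the original series by differencing the accumulated predictions. The function names `gm11_fit` and `gm11_restore` and the input series are hypothetical.

```python
import math

def gm11_fit(y0):
    """Estimate (a, b) by least squares on y0(k) + a * z1(k) = b."""
    # One-time accumulation (1-AGO) of the original series.
    y1, total = [], 0.0
    for value in y0:
        total += value
        y1.append(total)
    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (y1[k] + y1[k - 1]) for k in range(1, len(y1))]
    y = list(y0[1:])
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    # Simple linear regression y = slope * z + intercept, so a = -slope, b = intercept.
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    intercept = (sy - slope * sz) / n
    return -slope, intercept

def gm11_restore(y0_first, a, b, n):
    """Apply formula (3), then difference to return to the original scale."""
    acc = [(y0_first - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    return [acc[0]] + [acc[k] - acc[k - 1] for k in range(1, n)]

# Synthetic series growing 10% per step (not data from this paper).
series = [100.0 * 1.1 ** k for k in range(8)]
a, b = gm11_fit(series)
fitted = gm11_restore(series[0], a, b, len(series))
```

On a nearly exponential series such as this synthetic one, the fitted values track the input closely; series that deviate from exponential growth are fitted less well.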

BP neural network model

In the mid-1980s, Rumelhart [40] proposed the famous Error Back Propagation (BP), which solved the learning problem of multi-layer neural networks. BP neural network is generally multi-layered. The layers of BP neural network model are input layer, hidden layer and output layer [41]. The hidden layer of BP neural network can be one or more layers. The topological structure of a BP neural network containing two hidden layers is shown in Fig. 2 [41, 42].

Fig. 2 The structure of BP neural network

BP neural network has the following characteristics [42]:

  1. The network is composed of multiple layers, with full connections between layers and no connections between neurons of the same layer.

  2. The transfer function of a BP network must be differentiable. In BP networks, a Sigmoid function or a linear function is generally used as the transfer function. Sigmoid functions can be divided into the log-Sigmoid function and the Tan-Sigmoid function, depending on whether or not the output value contains negative values. A simple log-Sigmoid function can be calculated using formula (4):

    $$ f(x) = \frac{1}{{1 + e^{ - x} }}, $$
    (4)

    where the range of x covers all real numbers and the function value lies between 0 and 1. In specific applications, parameters can be added to control the position and shape of the curve.

  3. The error BP algorithm is used for learning.
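The transfer functions described above can be sketched in a few lines of Python. The log-Sigmoid follows formula (4); the Tan-Sigmoid is written here as the hyperbolic tangent, which is an assumption about the intended form, since the text gives no formula for it. The derivative of the log-Sigmoid is included because the back-propagation step relies on it. All function names are illustrative.

```python
import math

def log_sigmoid(x):
    """Formula (4): output lies in (0, 1). Branching avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def log_sigmoid_derivative(x):
    """f'(x) = f(x) * (1 - f(x)), used when propagating errors backwards."""
    s = log_sigmoid(x)
    return s * (1.0 - s)

def tan_sigmoid(x):
    """Assumed tanh form: output lies in (-1, 1), so it can be negative."""
    return math.tanh(x)
```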

BP neural network has been applied to multi-disciplinary prediction. For example, Qin et al. [43], Zhang et al. [44], Hu [45], respectively, used this method to simulate and predict behavioral recognition, job-shop scheduling problem, and optimization of intelligent logistics distribution center. In view of the extensiveness and reliability of the model in the field of prediction, this study will use this method to predict the logistics demand scale of Guangdong province.

Empirical results

Construction of indicator system

As the basic, strategic and leading industry of national economic development, logistics industry plays a vital role in the development of regional economy. Based on literature review, this paper follows the following principles when constructing the logistics demand forecasting indicator system:

  1. Reflect the development level of the logistics industry as much as possible.

  2. Meet the requirements of measuring logistics demand.

  3. Comprehensively consider the availability and reliability of indicator data.

  4. Reflect the characteristics of the era of e-commerce.

In addition, drawing on the indicators used by domestic and foreign scholars such as Ishfaq and Sox [15], Hsiao and Hansen [16], Hsu and Wang [17], Fan and Wu [27], Han et al. [28], Du and Chen [29], etc., this study takes the prediction of Guangdong logistics demand as the target layer. Taking logistics demand environment, commercial trade environment, basic support environment and e-commerce information environment as latent variables, the indicator system takes 13 indicators, such as logistics demand scale, as observation variables, as summarized in Table 2.

Table 2 Logistics demand forecasting indicator system

Data source

The logistics demand scale is used to represent the logistics demand environment. According to the availability and authority of indicators, this paper selects the whole society's cargo transport volume in Statistical Yearbook of Guangdong Province (2000–2019) to calculate the scale of regional logistics demand. Other indicators are also available in the statistical yearbook. The data for this paper were collected in August and September of 2020 when the Guangdong Statistical Yearbook for 2020 had not been published yet. Therefore, relevant data are collected up to 2019.

Since the indicators are measured in different units, this paper applies dimensionless processing to the original data of each indicator to obtain more accurate prediction results. All indicator data are converted into the range [0, 1] using the formula \(x^{\prime}_{ij} = [x_{ij} - \min (x_{j} )]/[\max (x_{j} ) - \min (x_{j} )]\) [49, 50]. SPSS was used for this process, and the resulting data are shown in Table 3.
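The normalization formula above can be written as a short Python function. This is an illustrative sketch only (the paper used SPSS); the function name and the sample values are hypothetical and are not taken from Table 3.

```python
def min_max_normalize(column):
    """Rescale one indicator column to [0, 1]: (x - min) / (max - min)."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        raise ValueError("a constant column cannot be min-max normalized")
    return [(value - lowest) / (highest - lowest) for value in column]

# Hypothetical indicator values for three years.
example = min_max_normalize([2.0, 4.0, 6.0])
```

Each indicator is normalized separately, so the smallest value of every column maps to 0 and the largest to 1.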

Table 3 Normalized data of Guangdong logistics demand indicators from 2000 to 2019

Correlation degree

Based on the data in Table 3, the correlation degree between the logistics demand scale and each of the other 12 indicators was calculated. The results are shown in Table 4. The indicators rank by correlation degree with the logistics demand scale as X3 > X1 > X2 > X5 > X10 > X11 > X6 > X12 > X4 > X8 > X7 > X9. Since the 12 correlation degree values are all greater than 0.7 and reach the three-level accuracy [35], the 12 indicators selected in this study are suitable for predicting the logistics demand scale.

Table 4 Correlation degree between logistics demand scale and other indicators

Multi-factor coupling analysis

Logistics demand system is a non-linear complex system. In this paper, a variety of influencing factors, such as commercial trade, basic support and e-commerce information, etc. are considered comprehensively. The relationship and restriction among these factors are verified using the multi-factor coupling method.

Commercial trading environment

The commercial trade environment, which includes regional GDP, per capita disposable income, total retail sales of consumer goods, total import and export trade and other indicators, provides an important demand driving force for the development of logistics. The coupling analysis of logistics demand scale and commercial trade environment by curve fitting shows that logistics demand scale is significantly positively correlated with commercial trade environment. Meanwhile, all \(R^{2}\) values of the fitted curves passed the significance test, indicating a good fit, as shown in Fig. 3.

Fig. 3 Coupling analysis of logistics demand scale and commercial trade environment

Foundation supporting environment

The basic supporting environment, which includes logistics fixed assets investment, the number of employees in the logistics industry, the financial expenditure on logistics and transportation, traffic mileage, and other indicators, provides an important guarantee for the development of logistics. The coupling analysis of logistics demand scale and basic supporting environment by curve fitting shows that the two also present a high positive correlation. All \(R^{2}\) values of the fitted curves passed the significance test with good results, as shown in Fig. 4.

Fig. 4 Coupling analysis of logistics demand scale and basic supporting environment

E-commerce information environment

The e-commerce information environment, which includes the total revenue of post and telecommunications business, internet access users, number of mobile phone users, and investment in information transmission and internet, provides an important driving force for the development of logistics. The coupling analysis of logistics demand scale and e-commerce information environment by curve fitting shows that logistics demand scale is significantly positively correlated with e-commerce information environment. In addition, the \(R^{2}\) values of the fitted curves passed the significance test, indicating a good fit, as shown in Fig. 5.

Fig. 5 Coupling analysis of logistics demand scale and e-commerce information environment

Principal component coupling analysis

The curve fitting method is adopted to conduct a coupling analysis of the logistics demand scale with the commercial trade environment, the basic supporting environment, and the e-commerce information environment. The logistics demand scale has a significant non-linear positive correlation with the indicators of these three aspects. However, the \(R^{2}\) values of posts and telecommunications revenue, investment in information transmission and internet, and some other indicators are not high. Therefore, factor analysis is used to reduce the dimension of these indicators.

The specific procedure is as follows: the data are imported into SPSS for factor analysis. The KMO value is greater than 0.6, and the significance of the Bartlett sphericity test is 0.000, which means that these indicators are suitable for factor analysis. The cumulative contribution rate of the three factors extracted by dimension reduction reaches 96.23%, indicating that the three factors reflect most of the information in the original indicators [35].

According to formula (5) and formula (6), the scores of F1, F2 and F3 can be calculated. The formula is as follows [35]:

$$ F_{i} = C_{i} *X, $$
(5)
$$ \begin{gathered} F_{i} = \left[ {\begin{array}{c} {F_{1} } \\ {F_{2} } \\ {F_{3} } \\ \end{array} } \right],\;\;C_{i} = \left[ {\begin{array}{c} {C_{1} } \\ {C_{2} } \\ {C_{3} } \\ \end{array} } \right] = \left[ {\begin{array}{ccccc} {0.299} & {0.304} & {0.282} & \cdots & {0.134} \\ {0.266} & {0.264} & {0.295} & \cdots & {0.435} \\ {0.316} & {0.311} & {0.304} & \cdots & {0.196} \\ \end{array} } \right], \hfill \\ X = \left[ {\begin{array}{c} {X_{01} } \\ {X_{02} } \\ \cdots \\ {X_{34} } \\ \end{array} } \right], \hfill \\ \end{gathered} $$
$$ F = \frac{{\lambda_{1} }}{{\lambda_{1} + \lambda_{2} + \lambda_{3} }}*F_{1} + \frac{{\lambda_{2} }}{{\lambda_{1} + \lambda_{2} + \lambda_{3} }}*F_{2} + \frac{{\lambda_{3} }}{{\lambda_{1} + \lambda_{2} + \lambda_{3} }}*F_{3} , $$
(6)

where \(\lambda_{{1}}\),\(\lambda_{{2}}\) and \(\lambda_{{3}}\) are the corresponding eigenvalues of each principal component, which are, respectively, 4.040, 3.876 and 3.632. The principal component scores and the comprehensive scores are shown in Table 5.
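The weighting in formula (6) can be checked with a short Python sketch. Only the three eigenvalues are taken from the text; the function names and the score triple passed in the example are hypothetical, and the values in Table 5 are not reproduced here.

```python
EIGENVALUES = (4.040, 3.876, 3.632)  # lambda_1, lambda_2, lambda_3 as reported

def composite_weights(eigenvalues=EIGENVALUES):
    """Each weight is lambda_i divided by the sum of the eigenvalues."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

def composite_score(factor_scores, eigenvalues=EIGENVALUES):
    """Formula (6): eigenvalue-weighted average of F1, F2 and F3."""
    weights = composite_weights(eigenvalues)
    return sum(w * f for w, f in zip(weights, factor_scores))

weights = composite_weights()
# Hypothetical factor scores for one year.
score = composite_score((1.0, 2.0, 3.0))
```

With these eigenvalues the three weights are close to each other (roughly 0.35, 0.34 and 0.31), so the comprehensive score is near the simple average of the three factor scores.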

Table 5 Principal component scores and factor synthesis scores

A further coupling analysis of logistics demand scale with F1, F2 and F3 by curve fitting shows that logistics demand scale presents a significant non-linear positive correlation with these three principal components. The \(R^{2}\) values of the fitted curves passed the significance test and were all greater than 0.95, indicating a good fit [35], as shown in Fig. 6.

Fig. 6 Coupling analysis of logistics demand scale and principal component

The formula of curve fitting is as follows:

$$ y = 52.728F_{1}^{5} - 1427.7F_{1}^{4} + 13168F_{1}^{3} - 44869F_{1}^{2} + 66787F_{1} + 117663\;\;(R^{2} = 0.9972), $$
$$ y = 99.921F_{2}^{5} - 2311.2F_{2}^{4} + 18331F_{2}^{3} - 55377F_{2}^{2} + 80253F_{2} + 115592\;\;(R^{2} = 0.9887), $$
$$ y = 72.221F_{3}^{5} - 1787.6F_{3}^{4} + 14601F_{3}^{3} - 42265F_{3}^{2} + 59853F_{3} + 119690\;\;(R^{2} = 0.9949). $$
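The three fitted curves are fifth-degree polynomials, so they can be evaluated with Horner's rule. The sketch below uses only the coefficients printed above; the dictionary and function names are illustrative. At a factor score of zero each curve returns its constant term, which is a quick sanity check on the transcription of the coefficients.

```python
# Coefficients from highest degree (F^5) down to the constant term.
FITTED_CURVES = {
    "F1": (52.728, -1427.7, 13168.0, -44869.0, 66787.0, 117663.0),
    "F2": (99.921, -2311.2, 18331.0, -55377.0, 80253.0, 115592.0),
    "F3": (72.221, -1787.6, 14601.0, -42265.0, 59853.0, 119690.0),
}

def evaluate_curve(name, score):
    """Evaluate the fitted polynomial for one principal component (Horner's rule)."""
    result = 0.0
    for coefficient in FITTED_CURVES[name]:
        result = result * score + coefficient
    return result

intercepts = {name: evaluate_curve(name, 0.0) for name in FITTED_CURVES}
```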

Data prediction and results

Predictions of the GM (1, 1) model

The GM (1, 1) model simulation fitting sequence G and simulation error sequence E were calculated using GTMS7.0 software:

G = (119,216.00, 131,621.00, 137,032.00, 143,964.00, 156,094.00, 158,470.00, 145,911.00, 165,426.00, 176,279.00, 179,722.00, 205,034.00, 234,978.00, 266,359.08, 305,833.00, 352,926.00, 376,434.00, 377,645.00, 400,601.00, 424,996.00, 446,050.00);

E = (0.000, −19,792.391, −15,982.116, −12,932.464, −14,257.735, −4938.058, 20,281.032, 14,470.060, 18,451.109, 31,065.359, 23,134.674, 12,005.234, 990.140, −16,438.427, −39,668.240, −37,345.319, −10,595.406, −3284.864, 5082.427, 19,492.263).

The grey error sequence E is converted to non-negative values, and Eviews8.0 is then used for the ADF test. At the significance level of 0.05, the null hypothesis that a unit root exists is rejected, which indicates that the error sequence E is a stationary sequence [51]. The test results are shown in Table 6.

Table 6 ADF test for prediction error

The GM (1,1) method was used to model and predict the logistics demand scale of Guangdong province from 2000 to 2019, as shown in Table 7.

Table 7 Prediction results of GM (1, 1) model

The grey model development coefficient a is − 0.079 and the grey action coefficient b is 98010.607. Since − a = 0.079 < 0.3, the usual criterion indicates that the prediction accuracy is high and that the established GM(1, 1) model can be used for medium- and long-term prediction [51].
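To show how these two coefficients enter formula (3), the following Python sketch computes a fitted series from the reported values of a and b and the first value of sequence G. It is an illustration only: the coefficients are rounded as printed, so the output will not match Table 7 digit for digit, and restoring the original scale by differencing consecutive accumulated values is the standard GM (1,1) step, assumed here rather than quoted from the text. The names are hypothetical.

```python
import math

A = -0.079          # development coefficient a, as reported (rounded)
B = 98010.607       # grey action coefficient b, as reported
Y_FIRST = 119216.0  # first value of the series (year 2000)

def accumulated_fit(k):
    """Formula (3): fitted value of the once-accumulated series at step k."""
    return (Y_FIRST - B / A) * math.exp(-A * k) + B / A

def restored_fit(n_years):
    """Difference consecutive accumulated values to return to the original scale."""
    values = [Y_FIRST]
    for k in range(1, n_years):
        values.append(accumulated_fit(k) - accumulated_fit(k - 1))
    return values

fitted = restored_fit(20)  # 2000 to 2019
```

Because the restored values after the first grow by a constant factor of \(e^{ - a}\) per year, the fitted curve is a smooth exponential and cannot follow year-to-year fluctuations in the actual series.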

However, plotting the actual values of the logistics demand scale against the values predicted by the GM (1,1) model shows that there is a large error between the predicted and actual values, as shown in Fig. 7.

Fig. 7 Comparison between GM (1, 1) model prediction results and actual values

Prediction of the BP neural network model

A three-layer BP neural network is used for modeling and prediction. The network has 3 input neurons (namely the 3 principal component scores calculated above), and the logistics demand scale of Guangdong province is taken as the output of the network. According to Kolmogorov theorem, the number of neurons in the hidden layer has the following functional relationship with the number of neurons in the input layer and output layer [42]:

$$ K = \sqrt {m + n} + a, $$
(7)

where n and m are the numbers of input and output neurons, respectively, and a is a constant whose value is between 1 and 10.

According to the empirical formula and repeated training test, when the number of hidden layer neurons is determined as 9, the prediction effect is the best. The BP neural network prediction model is obtained as shown in Fig. 8.
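For the network described here (3 inputs, 1 output), formula (7) gives a range of candidate hidden-layer sizes that can then be tested by repeated training. The short Python sketch below enumerates that range; restricting a to integers and rounding the result are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def hidden_neuron_candidates(n_inputs, n_outputs, a_min=1, a_max=10):
    """Formula (7): K = sqrt(m + n) + a, for integer a between a_min and a_max."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in range(a_min, a_max + 1)]

candidates = hidden_neuron_candidates(n_inputs=3, n_outputs=1)
```

With 3 inputs and 1 output the square-root term equals 2, so the candidates run from 3 to 12 neurons, and the chosen value of 9 lies inside this range.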

Fig. 8 BP neural network prediction model

Matlab was used for BP neural network training. The error trend, training state and regression fitting results are shown in Figs. 9, 10 and 11, respectively.

Fig. 9 Results and trends of mean square error

Fig. 10 Training status of BP neural network

Fig. 11 Regression fitting results

Then the input layer matrix B and the output layer matrix P are constructed:

$$ B = \left[ {\begin{array}{ccc} {0.000} & {0.000} & {0.000} \\ {0.2254} & {0.2303} & {0.2639} \\ \vdots & \vdots & \vdots \\ {8.5245} & {7.9414} & {8.2573} \\ {9.6410} & {9.0015} & {9.7187} \\ \end{array} } \right],\;\;P = \left[ {\begin{array}{c} {119216.00} \\ {131621.00} \\ \vdots \\ {424996.00} \\ {446050.00} \\ \end{array} } \right]. $$

The trained network was used to predict each group of data. The comparison between the actual value and the predicted value is shown in Fig. 12. It is found that the error between the BP neural network model predicted value and the actual value is small, indicating that the BP neural network prediction model has a high prediction accuracy.
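To make the 3-9-1 structure concrete, the following is a minimal pure-Python sketch of a BP network with one hidden layer. It is not the Matlab model used in this paper: the log-Sigmoid hidden layer, the linear output neuron, plain per-sample gradient descent, the learning rate, and the synthetic training data are all assumptions chosen for illustration, and every name in the code is hypothetical.

```python
import math
import random

def sigmoid(x):
    """Log-Sigmoid transfer function, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

class BPNetwork:
    """3 inputs -> 9 log-Sigmoid hidden neurons -> 1 linear output."""

    def __init__(self, n_in=3, n_hidden=9, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        """One back-propagation update for the loss 0.5 * (output - target)^2."""
        hidden, output = self.forward(x)
        err = output - target
        for j, h in enumerate(hidden):
            # Error propagated back through the hidden neuron's Sigmoid.
            grad_hidden = err * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_hidden * xi
            self.b1[j] -= lr * grad_hidden
        self.b2 -= lr * err

def mean_squared_error(net, samples):
    return sum((net.predict(x) - t) ** 2 for x, t in samples) / len(samples)

# Synthetic data: 20 samples with inputs and targets already scaled to [0, 1].
rng = random.Random(1)
samples = []
for _ in range(20):
    x = [rng.random() for _ in range(3)]
    samples.append((x, 0.2 + 0.5 * x[0] * x[1] + 0.3 * x[2] ** 2))

net = BPNetwork()
mse_before = mean_squared_error(net, samples)
for _ in range(500):
    for x, t in samples:
        net.train_step(x, t, lr=0.1)
mse_after = mean_squared_error(net, samples)
```

In practice the inputs and the target would be scaled to a comparable range before training, as in this synthetic example, and the fitted outputs converted back to the original units afterwards.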

Fig. 12 Comparison of actual and BP neural network model predicted values

The actual value, BP neural network model predicted value, and BP neural network prediction error value are shown in Table 8.

Table 8 Prediction results of BP neural network model

Comparison of prediction errors

To further verify the accuracy and validity of the GM (1, 1) model and the BP neural network model, three error measures, namely Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE), were used to evaluate the accuracy of the two prediction methods. The three error measures are expressed, respectively, as [52, 53, 54]:

$$ {\text{MAE}} = \frac{1}{t}\sum\limits_{j = 1}^{t} {\left| {\hat{Y}_{j} - Y_{j} } \right|} , $$
(8)
$$ {\text{RMSE}} = \sqrt {\frac{1}{t}\sum\limits_{j = 1}^{t} {\left( {\hat{Y}_{j} - Y_{j} } \right)^{2} } } , $$
(9)
$$ {\text{MAPE}} = \frac{1}{t}\sum\limits_{j = 1}^{t} {\left| {\frac{{\hat{Y}_{j} - Y_{j} }}{{Y_{j} }}} \right|} \times 100\% , $$
(10)

where \({\hat{Y}}_{j}\) is the predicted value of logistics demand scale, \({ {Y}}_{j}\) is the actual value of logistics demand scale, \(t\) is the total number of training samples and inspection samples. According to statistics, the prediction accuracy evaluation results of GM (1, 1) model and BP neural network model are shown in Table 9.
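Formulas (8) to (10) translate directly into code. The sketch below is a plain Python illustration with hypothetical function names and made-up sample values; it does not reproduce the figures in Table 9.

```python
import math

def mae(actual, predicted):
    """Formula (8): mean absolute error."""
    return sum(abs(p - y) for y, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Formula (9): root mean square error."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Formula (10): mean absolute percentage error, in percent."""
    return sum(abs((p - y) / y) for y, p in zip(actual, predicted)) / len(actual) * 100.0

# Made-up values, not data from this paper.
actual_values = [100.0, 200.0]
predicted_values = [110.0, 180.0]
```

MAE and RMSE are expressed in the units of the series, whereas MAPE is scale-free, which is why it is convenient for comparing models on a series that grows over time.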

Table 9 Comparison of prediction errors between GM (1, 1) model and BP neural network model
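As a minimal sketch of how Eqs. (8)–(10) can be evaluated, the following functions compute the three error measures from paired actual and predicted values. The function names and the three sample values are hypothetical and serve only to illustrate the arithmetic; they are not taken from Table 9.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error, Eq. (8)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def rmse(y_true, y_pred):
    """Root Mean Square Error, Eq. (9)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent, Eq. (10); requires nonzero y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)

# Hypothetical actual and predicted values (not the study's data).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mae(actual, predicted))   # (10 + 10 + 0) / 3
print(rmse(actual, predicted))  # sqrt((100 + 100 + 0) / 3)
print(mape(actual, predicted))  # (10% + 5% + 0%) / 3
```

Because MAPE divides by the actual value, it is scale-free and suits comparison across series of different magnitudes, whereas MAE and RMSE are expressed in the units of the data.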

Table 9 shows that the BP neural network model has the smaller MAE, RMSE and MAPE of the two prediction methods. Its prediction error is generally below 0.05%, and its mean absolute percentage error is 0.008%. This indicates that the BP neural network model is more accurate than the GM (1, 1) model and is better suited to practical application.

Prediction of logistics demand scale

The established BP neural network prediction model was applied to the F1, F2 and F3 prediction scores under the time series (see Fig. 13 for details) to predict the logistics demand scale of Guangdong province in 2020–2022. The predicted results are 476,128,600 tons, 483,276,440 tons and 4,845,766,100 tons, respectively, as shown in Table 10. The predicted data show that from 2000 to 2022, the scale of logistics demand in Guangdong province will increase from 119,216 to 4,845,766,100 tons, with an average annual growth rate of 6.74%.

Fig. 13 Predicted scores of F1, F2 and F3 under the time series

Table 10 Logistics demand scale of Guangdong Province from 2020 to 2022

The logistics industry is a vital basic industry supporting economic development. Understanding the significance of each variable helps identify the key factors driving the development of logistics demand. The rotated principal component matrix was sorted, and entries less than 0.58 were eliminated to obtain the importance of the principal component variables, as shown in Table 11.

Table 11 Importance degree of principal component variables
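The filtering step used to build Table 11 can be sketched as follows. The loading matrix and the variable names `x1` to `x4` are hypothetical placeholders, and applying the 0.58 cutoff to absolute values is an assumption made here so that strongly negative loadings would also be kept; the actual rotated matrix of this study is not reproduced.

```python
import numpy as np

THRESHOLD = 0.58  # cutoff stated in the text

# Hypothetical rotated component matrix: rows are variables, columns are F1-F3.
variables = ["x1", "x2", "x3", "x4"]
loadings = np.array([
    [0.91, 0.20, 0.15],
    [0.88, 0.31, 0.10],
    [0.25, 0.83, 0.30],
    [0.40, 0.35, 0.79],
])

# Blank out entries whose magnitude falls below the cutoff.
masked = np.where(np.abs(loadings) >= THRESHOLD, loadings, np.nan)

# For each principal component, list the variables that survive the cutoff.
retained = {
    f"F{k + 1}": [variables[i] for i in range(len(variables))
                  if not np.isnan(masked[i, k])]
    for k in range(loadings.shape[1])
}
print(retained)
```

Each variable then appears only under the component on which it loads strongly, which is what allows the components to be interpreted in terms of their dominant indicators.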

Table 11 shows that the importance of the three principal component variables to the regional logistics demand scale is ranked as F1 > F2 > F3. In F1, traffic mileage and total import and export trade are the most important influencing factors. In F2, financial expenditure on logistics and transportation and investment in information transmission and internet are the crucial variables. In F3, GDP and logistics fixed assets investment are the most significant influencing factors.

Discussion

Building on previous studies and considering the impact of e-commerce on logistics demand, this paper forecasts the logistics demand of Guangdong from 2020 to 2022. Establishing an appropriate indicator system is an essential step before forecasting logistics demand. So far, many researchers (e.g., Nguyen [26]; Fan and Wu [27]; Du and Chen [29]) have paid attention to indicators such as regional GDP, per capita disposable income, total retail sales of social consumer goods, total import and export trade, fixed asset investment in the logistics industry, number of employees in the logistics industry, financial expenditure in logistics and transportation, and mileage of vehicles. However, indicators related to e-commerce have received less attention. As its first novel contribution, this paper incorporates e-commerce information environment factors, such as total revenue of post and telecommunications services, internet access users, number of mobile phone users, and information transmission and internet investment, into the Guangdong logistics demand indicator system. This reflects the historical background and current reality of logistics demand. As its second novel contribution, the comparison between the GM (1, 1) model and the BP neural network model identifies the more accurate option for predicting Guangdong logistics demand (cf. Fan and Wu [27], Han et al. [28], Wang and Yan [30], Cao et al. [32]). These findings enrich the research foundation of related fields. Based on these findings, this paper proposes the theoretical and practical implications below.

Theoretical implications

From the perspective of e-commerce, the logistics demand prediction indicator system of Guangdong was constructed, and the GM (1, 1) model and the BP neural network model were used to make the prediction. This study makes three theoretical contributions.

First, this paper constructs the Guangdong logistics demand forecasting indicator system by considering the development background of e-commerce. Logistics and e-commerce are no longer related simply as service provider and served party [19, 20]. Moreover, previous literature on regional logistics demand forecasting has rarely considered indicators related to e-commerce (e.g., Nguyen [26]; Fan and Wu [27]; Han et al. [28]). Based on the historical background of e-commerce, this paper considers the e-commerce indicators that drive logistics demand and provides a new perspective for the establishment of regional logistics demand indicator systems.

Second, this paper enriches the literature on regional logistics demand forecasting. With the rapid development of mobile e-commerce, online shopping has become a habit for the vast majority of residents. As a result, problems such as the accumulation of logistics orders and insufficient regional logistics and distribution capacity occur frequently [13, 14]. By forecasting the logistics demand in Guangdong, this paper provides ideas and a reference for solving these problems, and demonstrates the value of regional logistics demand forecasting.

Third, this paper expands the application of the GM (1, 1) model and the BP neural network model in regional logistics demand forecasting. The results show that both models have good application prospects in regional logistics demand prediction, and that the BP neural network model has the smaller prediction error and the better prediction effect. The findings of this study are also consistent with those of Du and Chen [29] and Wang et al. [55], which helps promote the application of the BP neural network model to the prediction of regional logistics demand, last-kilometer logistics demand, crowdsourcing logistics demand and related areas.

Practical implications

From a practical point of view, this study offers recommendations for logistics enterprises and relevant e-commerce platforms. It has three practical implications.

First, relevant e-commerce platforms should pay attention to the prediction of regional logistics demand, especially in peak periods of logistics delivery such as shopping carnivals. During these periods, the number of orders often exceeds what the logistics system can handle [13, 14]. Forecasting the likely logistics demand in advance can alleviate the problem and inform the implementation of relevant measures.

Second, e-commerce platforms and logistics enterprises should choose scientific forecasting methods when they make regional logistics demand predictions. The BP neural network model has a better predictive effect than the GM (1, 1) model. This finding is consistent with the results of Du and Chen [29] and Wang et al. [55]. Scientific and accurate prediction results can provide a reasonable reference for e-commerce platforms and logistics enterprises to make effective decisions. Therefore, it is necessary for relevant enterprises to further study the function and role of the BP neural network model in logistics demand prediction.

Finally, logistics companies should encourage experimentation with new distribution modes (e.g., crowdsourcing logistics and logistics alliances). In addition to the prediction of regional logistics demand, the implementation of new distribution modes by logistics enterprises also helps alleviate problems such as insufficient capacity and delivery delays during peak periods [13, 22]. Meanwhile, the new distribution modes can integrate better with e-commerce and create a mutually reinforcing cycle between logistics and e-commerce.

Conclusions

The development of e-commerce is both an opportunity and a challenge for logistics. On the one hand, e-commerce continually drives change in logistics and has expanded the scale of regional logistics demand. On the other hand, the rapid growth of e-commerce orders has also put great pressure on logistics distribution. To alleviate the mismatch between regional logistics demand and the growth rate of e-commerce, a logistics demand forecasting indicator system was built from the perspective of e-commerce. This indicator system was then combined with the GM (1, 1) model and the BP neural network model to predict regional logistics demand. The results show that this indicator system can better reflect the current scale of regional logistics demand. This study also found that the BP neural network model performs well in predicting regional logistics demand. The findings help to reassess the relationship between e-commerce and logistics, and to draw attention to the key factors driving regional logistics demand.

However, there are some limitations in this study. First, the indicator system of regional logistics demand prediction is based on the perspective of e-commerce, which may differ from other viewpoints. Logistics demand is not only affected by e-commerce. Other factors, such as the distribution mode of logistics enterprises and the structure of consumer groups, cannot be ignored. Therefore, it is necessary to expand the regional logistics prediction indicator system to other perspectives in future studies. Second, this study only compared the prediction results of the GM (1, 1) model and the BP neural network model. The Kalman filter prediction model, combination prediction models and regression prediction methods are also viable choices. In the future, comparative studies between these forecasting methods and the BP neural network model should be added. Finally, how to combine new distribution modes with the logistics demand forecasting model to improve the distribution efficiency of logistics enterprises still needs to be discussed in depth and solved. Regional logistics demand prediction plays an early warning and guiding role for e-commerce platforms and logistics enterprises, so the related research deserves continued refinement.