1 Introduction

Online purchasing has surged in recent years. Advances in information and communication technology, together with broad internet access, have made online purchases increasingly popular. With the proliferation of IT platforms such as smartphones, tablets, and laptops, e-commerce has become very popular over the past decade (O'Sullivan, 2017). According to the European Commission, 60% of German consumers and 44% of French consumers purchased both real and virtual goods (Consoli, 2016). Online sales in the United Kingdom were predicted to increase from 14.5% in 2015 to 19.3% in 2019 (Cárdenas et al., 2017; Clement, 2019). About 45% of European shoppers shop online. Similarly, China has 516 million online shopping users, and home delivery is the leading e-commerce parcel distribution service (Bureau, 2017). China's online shopping market transactions increased by 30.1% in the first half of 2018, accounting for 22.7% of total consumer goods retail sales in 2018, up 10.2% from 2017 (Lei, 2018).

The rise of e-commerce has increased overall parcel deliveries (Bowes, 2016; Boyer et al., 2009; Iwan et al., 2016) and, as a result, home deliveries (Gonzalez-Feliu et al., 2012; Kämäräinen et al., 2001; McKinnon & Tallam, 2003; Van Duin et al., 2016; Visser et al., 2014). However, e-consumers cannot always be at home when packages are delivered, which leads to numerous delivery failures (Dell’Amico & Hadjidimitriou, 2012; Kedia et al., 2017; Song et al., 2009). A failed delivery is a problem for both the buyer and the carrier. Moreover, re-delivering products costs logistics companies extra money and contributes to traffic congestion (Taniguchi & Kakimoto, 2004), pollutant emissions (Song et al., 2013), and other environmental impacts (J. Edwards et al., 2010a, 2010b; J. B. Edwards et al., 2010a, 2010b; F. N. McLeod & Cherrett, 2009). Alternative delivery methods must therefore be considered. Collection and delivery points (CDPs) offer customers and businesses a more convenient and flexible delivery option, as well as a solution to the "last mile" delivery problem (Iwan et al., 2016; F. McLeod et al., 2006; Morganti et al., 2014; Punakivi et al., 2001; J.-J. Xu & Hong, 2013; J. Xu et al., 2014; Yang, 2014; Kumar et al., 2021). This solution may also help to reduce the number of failed deliveries, freight and delivery volume (by aggregating several deliveries at the same location), the number of trips, congestion, and pollutant emissions (Allen et al., 2007; F. McLeod et al., 2006), as well as the labor and fuel consumed by stakeholders. CDPs thus improve the efficiency of the parcel delivery process. As a result, this alternative benefits the long-term viability of urban freight transport and urban mobility (Ahvenniemi & Huovila, 2021; Kedia et al., 2017; Lemke et al., 2016; Van Duin et al., 2016).

As an alternative to home delivery, a CDP system is characterized in this article as a network of facilities (China Post Stations and Cainiao Stations) where carriers deliver orders and e-customers pick up online purchases. CDP locations must enhance accessibility for potential users to increase efficiency and maintain network reliability. Moreover, CDPs are sited along the road network and in easily accessible locations to optimize urban freight transportation (through commodity consolidation in optimal trips) and the sustainability of urban mobility (through e-consumers picking up goods by non-motorized modes). Therefore, the success of a CDP network is typically determined by its accessibility and by socio-demographic characteristics.

Although previous research has identified parcel locker location preferences, there has not been extensive research assessing existing CDP locations in Nanjing, China, that focuses on accessibility and applies clustering methods to population, GDP, and road network data. Cluster analysis has been used in transportation research to investigate freight (Cidell, 2010), transit (Diaz-Varela et al., 2011), crashes (du Preez et al., 2019), environmental justice (Haustein et al., 2016), and mobility (Schweitzer, 2006), but it has not yet been used to investigate the distribution of CDPs.

This study examines the distribution of 1224 CDPs (424 China Post Stations and 800 Cainiao Stations) in Nanjing to address two research questions: (a) How are CDPs distributed in terms of accessibility factors such as population and service coverage area? and (b) How does the present distribution of CDPs affect equity? Different datasets are investigated using geographic methods and cluster analysis to address these questions.

2 Data collection and methods

2.1 Study area

Nanjing is the capital of Jiangsu Province and a large city along the Yangtze River in eastern China (32° 2′ 59.9964'' N and 118° 46′ 0.0120'' E). As illustrated in Fig. 1, Nanjing comprises 11 administrative districts, including six central districts (Xuanwu, Gulou, Qinhuai, Jianye, Qixia, and Yuhuatai) and five suburban districts (Pukou, Luhe, Gaochun, Jiangning, and Lishui). These 11 districts are divided into 108 subdistricts (known as Jiedao), with a total area of 6,600 square kilometers and a population of 8.28 million people. Along with Beijing, Luoyang, and Xi'an, Nanjing is one of China's four ancient capitals. Like most Chinese cities, Nanjing underwent rapid economic expansion and urbanization following Deng Xiaoping's economic reforms in 1979. The city has traits of both modern and traditional Chinese cities, and its development resembles that of other eastern Chinese cities in terms of technological advancement.

Fig. 1 Study area

2.2 Data sources

Point of interest (POI) data, the road network, administrative boundaries, population, and GDP of the study area were the main secondary data used in this research. The POI data were gathered from the web platform “Metro Data Tech (MDT)” (https://www.metrodata.cn/) using the “POI Query” function. Each POI record included a name, address, longitude, and latitude. After cleaning, converting, and sorting the data, 800 Cainiao Stations and 424 China Post Stations remained as reliable records. A spatial database of collection and delivery points (CDPs) was built in ArcMap. The road network data were obtained from OpenStreetMap (http://www.openstreetmap.org/). Nanjing boundary data were collected from the China National Geographic Information Center, while population and GDP data were obtained from the National Bureau of Statistics of China.

  • China Post is a state-owned corporation that operates as China's official postal service. The corporation formally shares its office with the State Post Office, a branch-level state entity that theoretically regulates the postal industry throughout the country. The company and its branches at all levels provide common postal services nationwide, as well as special services such as special communications, government newspapers, ordinary letters from conscripts, reading material for the blind, and free delivery of revolutionary martyrs’ relics. Over many years of continuous development, the postal group has evolved from a traditional postal enterprise into a modern corporate enterprise that unites postal services, logistics services, and e-business, delivered through domestic and international delivery services.

  • Cainiao Network is an enterprise launched on May 28, 2013, by Alibaba Group, based on Alibaba Group's logistics ambition of fulfilling consumer orders within 24 hours in China and within 72 hours worldwide. The goal is to use big data and other technologies to improve efficiency when delivering products in China and in international markets. For this purpose, Cainiao Network and its logistics partners have created and operate a global sales network. In addition to processing packages from e-commerce sites and merchants, Cainiao also covers daily logistics, including fresh groceries, in its vertical industry.

3 Methods

3.1 Average nearest neighbor analysis

Average nearest neighbor (ANN) analysis is the primary method used to check the uniformity of the geographical distribution of CDPs. The analysis calculates the observed average distance between each CDP and its nearest neighboring CDP (\({\overline{D} }_{O}\)) and the expected average distance under an assumed random distribution (\({\overline{D} }_{E}\)). The ratio of \({\overline{D} }_{O}\) to \({\overline{D} }_{E}\) is the ANN ratio, which indicates whether the geographical distribution is dispersed or clustered. If the ANN ratio is less than 1, the observed mean distance is smaller than the mean distance expected under a random distribution, and the distribution is considered clustered. Otherwise, it is considered dispersed.

The Z-score indicates the statistical significance of the distribution type identified by the ANN analysis. The null hypothesis is complete spatial randomness (CSR), and the Z-score indicates whether the null hypothesis can be rejected at a given confidence level.

$$ANN=\frac{{\overline{D} }_{O}}{{\overline{D} }_{E}}$$
(1)
$${z}=\frac{{\overline{D} }_{O}-{\overline{D} }_{E}}{SE}$$
(2)
$$SE=\frac{0.26136}{\sqrt{{n}^{2}/A}}$$
(3)

where \({\overline{D} }_{O}\) is the observed average distance between each CDP and its nearest neighbor, \({\overline{D} }_{E}\) is the expected average distance for a random pattern, n is the number of CDPs, A is the area of the minimum enclosing rectangle around all features or of the specified study region, SE is the standard error, and the Z-score is expressed in standard deviations.
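Equations (1) to (3) can be illustrated with a short sketch. It assumes projected coordinates in metres and the usual complete-spatial-randomness expectation \({\overline{D} }_{E}=0.5/\sqrt{n/A}\), which the text above does not state explicitly. The function name, the synthetic points, and the study area value are hypothetical and serve only as an illustration.

```python
import numpy as np

def average_nearest_neighbor(points, area):
    """Return (ANN ratio, z-score) for an (n, 2) array of x/y coordinates in metres."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances; the diagonal is set to infinity so that
    # a point is never counted as its own nearest neighbour.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    d_obs = dist.min(axis=1).mean()        # observed mean nearest-neighbour distance
    d_exp = 0.5 / np.sqrt(n / area)        # assumed CSR expectation (not given in the text)
    se = 0.26136 / np.sqrt(n ** 2 / area)  # Eq. (3)
    return d_obs / d_exp, (d_obs - d_exp) / se  # Eq. (1), Eq. (2)

# Hypothetical data: 200 points packed into one corner of a 10 km x 10 km study area.
rng = np.random.default_rng(0)
clustered = rng.uniform(0, 2_000, size=(200, 2))
ann_ratio, z_score = average_nearest_neighbor(clustered, area=10_000 ** 2)
print(f"ANN ratio = {ann_ratio:.2f}, z = {z_score:.1f}")
```

For points concentrated in a small part of the study area, the ratio falls below 1 and the Z-score is strongly negative, which is the clustered case described above.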

3.2 Kernel density estimation

Kernel density estimation uses a kernel function to fit a smoothly tapered surface to point features by calculating a magnitude per unit area. Density surfaces highlight where point features concentrate, making it easier to identify hotspots where events took place (Wu et al., 2018). The method has been used to investigate the characteristics of spatial agglomeration in criminal, commercial, and traffic activities (Chen et al., 2016; Jia et al., 2018; Liu et al., 2017). In this study, kernel density is used to identify the spatial agglomeration properties of the CDP distribution. The denser the point distribution (that is, the lower the dispersion), the higher the kernel density value (Wenhao et al., 2015). The equation is as follows.

$$f\left(\boldsymbol{s}\right)=\sum\nolimits_{i=1}^{n}\frac{1}{\pi {r}^{2}}\,k\left(\frac{{d}_{is}}{r}\right)$$
(4)

where \(f\left(\boldsymbol{s}\right)\) represents the density at position s; r is the search radius of the kernel density estimate, which equals 500 m; \({d}_{is}\) is the distance between positions i and s; and k is the kernel function that weights \({d}_{is}\).
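A minimal sketch of Eq. (4) at a single location is given below. The text does not name the kernel function, so a quartic kernel, \(k(u)=3(1-u^{2})^{2}\) for \(u<1\) and 0 otherwise, is assumed here. The function name and the coordinates are hypothetical.

```python
import numpy as np

def kernel_density(location, points, r=500.0):
    """Density at one location s, following the form of Eq. (4)."""
    pts = np.asarray(points, dtype=float)
    d = np.hypot(pts[:, 0] - location[0], pts[:, 1] - location[1])  # d_is
    u = d / r
    # Assumed quartic kernel: 3 * (1 - u^2)^2 inside the search radius, 0 outside.
    k = np.where(u < 1.0, 3.0 * (1.0 - u ** 2) ** 2, 0.0)
    return k.sum() / (np.pi * r ** 2)

# Hypothetical CDP coordinates in metres.
cdps = [(0.0, 0.0), (100.0, 0.0), (0.0, 200.0), (3_000.0, 3_000.0)]
dense = kernel_density((50.0, 50.0), cdps)        # three CDPs fall inside the 500 m radius
sparse = kernel_density((3_000.0, 2_000.0), cdps)  # the nearest CDP is 1,000 m away
print(dense, sparse)
```

Evaluating the function on a regular grid of locations would produce a density surface of the kind described above. CDPs farther than r from a location contribute nothing to its density.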

3.3 Spatial correlation index

The spatial characteristics of the population density distribution (global and local correlation) were studied through spatial correlation index (SCI) analysis (Anselin & Getis, 1992; Ord & Getis, 1995, 2001). The global relationship among population density attributes was calculated using Getis-Ord General G, and the local relationship using Getis-Ord Gi*. The Getis-Ord General G formula is as follows:

$$\mathrm{G}\left(\mathrm{d}\right)=\frac{\sum \sum {\mathrm{w}}_{\mathrm{ij}}\left(\mathrm{d}\right){\mathrm{x}}_{\mathrm{i}}{\mathrm{x}}_{\mathrm{j}}}{\sum \sum {\mathrm{x}}_{\mathrm{i}}{\mathrm{x}}_{\mathrm{j}}}$$
(5)

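where d is the distance between roads in the inner city, \({\mathrm{w}}_{\mathrm{ij}}(\mathrm{d})\) is the spatial weight between i and j in the study area, and \({\mathrm{x}}_{\mathrm{i}}\) and \({\mathrm{x}}_{\mathrm{j}}\) are the population density indices for roads i and j, respectively.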

If there is no spatial agglomeration, the expected value of G(d) is E(G).

$$\mathrm{E}\left(\mathrm{G}\right)=\frac{\mathrm{W}}{\mathrm{n}\left(\mathrm{n}-1\right)},\quad \mathrm{W}=\sum \sum {\mathrm{w}}_{\mathrm{ij}}\left(\mathrm{d}\right)$$
(6)

Assuming a normal distribution, the standardized test statistic for G(d) is Z(G).

$$\mathrm{Z}\left(\mathrm{G}\right)=\frac{\mathrm{G}-\mathrm{E}\left(\mathrm{G}\right)}{\sqrt{\mathrm{Var}\left(\mathrm{G}\right)}},\quad \mathrm{E}\left(\mathrm{G}\right)=\frac{\mathrm{W}}{\mathrm{n}(\mathrm{n}-1)}$$
(7)

The study area displays clusters of high values when G(d) is greater than E(G) and Z(G) is significant. If G(d) approaches E(G), the variables in the study area are randomly distributed.
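The comparison between G(d) and E(G) in Eqs. (5) and (6) can be sketched as follows. Binary distance-band weights (1 if two distinct units lie within d of each other, 0 otherwise) are assumed here, since the weighting scheme is not specified in the text. The variance term of Eq. (7) is omitted, and the data are hypothetical.

```python
import numpy as np

def general_g(values, coords, d):
    """Return (G, E[G]) with assumed binary weights: w_ij = 1 if i != j and dist(i, j) <= d."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = len(x)
    dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    off_diag = ~np.eye(n, dtype=bool)          # exclude i == j pairs
    w = ((dist <= d) & off_diag).astype(float)
    cross = np.outer(x, x)                     # x_i * x_j for all pairs
    g = (w * cross).sum() / cross[off_diag].sum()  # Eq. (5)
    expected = w.sum() / (n * (n - 1))             # Eq. (6)
    return g, expected

# Hypothetical example: six units along a line, high densities grouped at one end.
coords = [(float(i), 0.0) for i in range(6)]
density = [10, 10, 10, 1, 1, 1]
g, expected = general_g(density, coords, d=1.0)
print(f"G = {g:.3f}, E(G) = {expected:.3f}")
```

Because the high values sit next to each other in this example, G exceeds E(G), which is the high-value clustering case described above.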

The Getis-Ord Gi* formula is:

$${\mathrm{G}}_{\mathrm{i}}^{*}\left(\mathrm{d}\right)=\frac{{\sum }_{\mathrm{j}}^{\mathrm{n}}{\mathrm{w}}_{\mathrm{ij}}\left(\mathrm{d}\right){\mathrm{x}}_{\mathrm{j}}}{{\sum }_{\mathrm{j}}^{\mathrm{n}}{\mathrm{x}}_{\mathrm{j}}}$$
(8)

Standardizing \({\mathrm{G}}_{\mathrm{i}}^{*}\)(d) gives:

$$\mathrm{Z}({\mathrm{G}}_{\mathrm{i}}^{*})=\frac{{\mathrm{G}}_{\mathrm{i}}^{*}-\mathrm{E}({\mathrm{G}}_{\mathrm{i}}^{*})}{\sqrt{\mathrm{Var}({\mathrm{G}}_{\mathrm{i}}^{*})}}$$
(9)

Here \({\mathrm{w}}_{\mathrm{ij}}\)(d) is the spatial weight, while E(\({\mathrm{G}}_{\mathrm{i}}^{*}\)) and Var(\({\mathrm{G}}_{\mathrm{i}}^{*}\)) are the mean and variance of \({\mathrm{G}}_{\mathrm{i}}^{*}\), respectively.

If Z(\({\mathrm{G}}_{\mathrm{i}}^{*}\)) is positive and significant, the values around position i are comparatively high (above average), and the location is classified as a high-value spatial cluster. If Z(\({\mathrm{G}}_{\mathrm{i}}^{*}\)) is negative and significant, the values around position i are relatively low (below average), indicating a low-value spatial cluster.
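The local statistic can be sketched in the same way. The block below uses the closed-form Z-score commonly given for Gi* in the Getis-Ord literature, which folds Eqs. (8) and (9) into one expression. Treating that closed form, and the binary distance-band weights, as the method used here is an assumption. The data are hypothetical.

```python
import numpy as np

def gi_star_z(values, coords, d):
    """Z-scores of Getis-Ord Gi*; binary weights that include the focal unit itself."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = len(x)
    dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    w = (dist <= d).astype(float)              # w_ii = 1, which is what the star denotes
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)   # population standard deviation of x
    w_sum = w.sum(axis=1)
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * (w ** 2).sum(axis=1) - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical example: six units along a line, high densities grouped at one end.
coords = [(float(i), 0.0) for i in range(6)]
density = [10, 10, 10, 1, 1, 1]
z = gi_star_z(density, coords, d=1.0)
print(np.round(z, 2))
```

Units in the middle of the high-density group receive positive Z-scores (hot spots), and units in the middle of the low-density group receive negative ones (cold spots). The denominator is zero when a unit's neighbourhood covers the whole dataset, so d should be smaller than the extent of the study area.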

3.4 Accessibility analysis

To assess the accessibility of CDPs, we estimated network service areas. A network service area is the region around a given location that encompasses all roads reachable within a given impedance (time or distance). Once the service areas have been identified, the population within them can be estimated. The service areas were created with the Network Analyst extension in ArcMap 10.3.

We estimated the population within three distance ranges, each corresponding to the greatest distance e-consumers are willing to travel to collect their items by a given mode of transportation.
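The idea of a network service area can be sketched without GIS software as a shortest-path search that stops at the impedance limit. The block below is not the ArcMap Network Analyst procedure used in the study. It is a simplified illustration on a hypothetical road graph, with hypothetical node populations, using Dijkstra's algorithm with a distance cutoff.

```python
import heapq

def service_area(graph, source, max_cost):
    """Return {node: network distance} for every node reachable within max_cost."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, length in graph.get(node, []):
            new_cost = cost + length
            if new_cost <= max_cost and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best

# Hypothetical road graph: node -> [(neighbour, segment length in metres)].
roads = {
    "cdp": [("a", 400.0), ("b", 900.0)],
    "a": [("cdp", 400.0), ("c", 500.0)],
    "b": [("cdp", 900.0), ("d", 1_500.0)],
    "c": [("a", 500.0), ("d", 1_000.0)],
    "d": [("b", 1_500.0), ("c", 1_000.0)],
}
# Hypothetical number of residents assigned to each node.
population = {"cdp": 120, "a": 300, "b": 250, "c": 180, "d": 400}

walk = service_area(roads, "cdp", 1_000.0)   # walking range
cycle = service_area(roads, "cdp", 2_000.0)  # cycling range
walk_pop = sum(population[node] for node in walk)
cycle_pop = sum(population[node] for node in cycle)
print(walk_pop, cycle_pop)
```

Summing the population of the nodes inside each service area gives the population covered at that distance, which is the quantity estimated for each range.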

4 Results and discussion

4.1 Spatial pattern of CDPs

Figure 2 shows the geographic locations of all CDPs in Nanjing. A statistical analysis of their geographic distribution was a critical first step in identifying spatial differences, so ANN analysis was used to quantify whether CDPs were evenly distributed in space.

Fig. 2 The geographical distribution of CDPs and population density in Nanjing

Assuming that the distributions of the 800 Cainiao Stations and 424 China Post Stations in Nanjing were entirely random, the expected average distance between each station and its nearest neighboring station was 680.92 m for Cainiao Stations and 647.85 m for China Post Stations. The observed mean distance, however, was 363.30 m for Cainiao Stations and 391.74 m for China Post Stations. Therefore, the ANN ratio of Cainiao Stations was 0.54 and that of China Post Stations was 0.60, both less than 1. This indicated that the distributions of Cainiao Stations and China Post Stations in Nanjing were more concentrated than a random distribution. Both distributions were clustered at the 99% confidence level, as shown in Fig. 3 and by the Z-values and p-values in Table 1.

Fig. 3 Average nearest neighbor analysis for CDPs

Table 1 The results of ANN analysis

Thus, the geographical distribution of the Cainiao and China Post Stations was concentrated rather than uniform across Nanjing, indicating spatial differences in the distribution of CDPs in the city. Cainiao and China Post Stations were more prevalent in Gulou, Xuanwu, and Qinhuai, and less frequent in Gaochun, Lishui, Luhe, and Pukou. Because of the high population density, greater demand for services, and favorable economic conditions in the central districts, the two logistics companies usually located their CDPs in the central areas first.

4.2 Spatial correlation of CDPs

The spatial structure of CDPs in Nanjing was analyzed with respect to areas of high population density and high gross domestic product (GDP), that is, areas of statistically significant clustering (Fig. 4). The results showed that the high concentration of population in the city urban center (CUC) of Nanjing largely coincided with the high GDP density area. The high GDP density area was 17.3 km2 and the highly populated area was 42.4 km2. The area where the two regions overlapped was 11.2 km2. As a result, 65% of the GDP density cluster area (11.2 of 17.3 km2) overlapped with the highly populated area. This overlapping region therefore concentrated the most potential demand (residences and workplaces) and might be more suitable for CDP locations.

Fig. 4 Getis-Ord Gi* cluster analysis (high population and GDP density areas in Nanjing)

The superimposed cluster analysis aimed to integrate the spatial characteristics of the high population and high GDP density areas with the spatial concentration of China Post and Cainiao Stations. The Getis-Ord Gi* cluster analysis results showed that China Post Stations in Nanjing were linearly distributed along highways, which led to a substantial spatial overlap with areas of high population concentration and GDP density (Fig. 5a). Similarly, Cainiao Stations were situated along Nanjing's main corridors, showing a dense linear distribution and a correlation with the population. These results demonstrated that, from the investor's point of view, the location of the Cainiao Stations had a more appropriate spatial structure because of their correlation with areas of higher population density and GDP (Fig. 5b). Figure 5 shows the density of China Post and Cainiao Stations within 500 m of Nanjing’s main corridors.

Fig. 5 Kernel density of CDPs and Getis-Ord Gi* cluster analysis (high population and GDP density areas with main roads 500-m buffer)

We quantified the number of CDPs within (i) the densely populated areas, (ii) the high GDP density areas, and (iii) a 500-m buffer around the main corridors, in order to identify how many CDPs were close to the roads and to gauge their likely accessibility. The results presented in Table 2 showed that China Post Stations (51.1%) and Cainiao Stations (63.2%) in Nanjing were concentrated in populated areas. The areas where Cainiao Stations were highly concentrated intersected with the clustered regions of GDP density. Finally, all CDPs were located close to the main corridors in Nanjing. In general, all CDPs overlapped with the population and GDP clusters and showed spatial structures near the main corridors.
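Counting CDPs inside a corridor buffer amounts to a point-to-line distance test. The following sketch uses plain coordinate geometry rather than the GIS buffer tools presumably used for Table 2. The road polyline, station coordinates, and function names are hypothetical, and projected coordinates in metres are assumed.

```python
import numpy as np

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (2-D, metres)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    denom = ab @ ab
    # Position of the perpendicular foot along the segment, clamped to [0, 1].
    t = 0.0 if denom == 0 else np.clip((p - a) @ ab / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def share_within_buffer(points, polyline, buffer_m=500.0):
    """Fraction of points lying within buffer_m of any segment of the polyline."""
    segments = list(zip(polyline[:-1], polyline[1:]))
    inside = [
        min(distance_to_segment(p, a, b) for a, b in segments) <= buffer_m
        for p in points
    ]
    return sum(inside) / len(points)

# Hypothetical main corridor and station locations.
road = [(0.0, 0.0), (5_000.0, 0.0), (5_000.0, 5_000.0)]
stations = [(1_000.0, 300.0), (2_500.0, 800.0), (5_400.0, 2_000.0), (4_000.0, 4_000.0)]
share = share_within_buffer(stations, road)
print(f"{share:.0%} of stations lie within the 500-m buffer")
```

The same test applied to cluster polygons instead of a road buffer would give the shares of CDPs inside the high population and high GDP areas.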

Table 2 Quantitative results of CDPs within the cluster areas of high population, high GDP, and main corridor buffer

Finally, Table 2 illustrates the spatial patterns for each category. Cainiao Stations were concentrated in specific areas of Nanjing and aligned more closely with population size and GDP density, indicating that their location decisions could be based on population density. China Post Stations, by contrast, showed a clustered spatial structure, indicating that spatial coverage could have been considered in choosing their locations.

Table 3 showed that Cainiao Stations were concentrated in densely populated areas with high GDP. There were far more Cainiao Stations (800) than China Post Stations, and they provided a large number of services to the population of Nanjing. Cainiao Stations were mainly located around city streets and towns, indicating that consumers were more likely to collect packages in these places. Therefore, private companies should be encouraged to install CDPs in these locations through policies that support consumer preferences and choices. The China Post Stations, on the other hand, were primarily located along the main corridors of Nanjing, which means that China Post Station drivers could more easily receive goods at these places.

Table 3 Results of Getis-Ord Gi* cluster analysis

Additionally, China Post Stations were moderately concentrated in areas of high population density and high GDP density, but were highly concentrated along main roads and covered a high number of services. Cainiao Stations, on the other hand, were highly concentrated in high population and high GDP areas, moderately concentrated along the main roads, and covered a medium to low number of services.

4.3 Accessibility to CDPs

Generally, most consumers collect their packages by walking or cycling. According to the Nanjing Residents Travel Survey, the average speed of a pedestrian is 1.4 m s−1 and that of a vehicle is 10 m s−1 (Feng & Yang, 2015), as shown in Table 4.

Table 4 Travel time and accessibility coverage distance on foot, by bicycle, and by car

For the study area, the accessibility thresholds in Table 4 were partially modified to determine an appropriate coverage distance for CDP users for each mode: walking (1,000 m), cycling (2,000 m), and driving (5,000 m). The frequency graph in Fig. 6 shows the detailed distribution of accessibility for these three modes. In addition, the maps in Fig. 7 show the accessibility estimates for the different modes across Nanjing. The accessibility indicator was divided into three color classes, with dark blue representing high accessibility and light blue representing low accessibility to CDPs.
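As a rough check on these coverage distances, the travel time implied by the walking and driving speeds reported above can be worked out, assuming a constant speed along the whole route. No cycling speed is stated in the text, so that mode is not computed here.

$$t_{\text{walk}}=\frac{1000\ \text{m}}{1.4\ \text{m s}^{-1}}\approx 714\ \text{s}\approx 11.9\ \text{min},\qquad t_{\text{drive}}=\frac{5000\ \text{m}}{10\ \text{m s}^{-1}}=500\ \text{s}\approx 8.3\ \text{min}$$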

Fig. 6 Frequency graph of various accessibility modes according to service coverage area

Fig. 7 Various accessibility modes in terms of service coverage area

The distribution of accessibility depended on the accessibility mode. Figure 6 showed a slight disparity in walking accessibility: access to China Post Stations and Cainiao Stations was 13.8% and 9.2%, respectively. Most CDPs were therefore not accessible on foot, and this mode allowed consumers to reach only a few CDPs.

The cycling mode differed slightly from walking. Figure 6 showed that the range of cycling accessibility for Cainiao Stations and China Post Stations was 25.3% and 28.9%, respectively. In other words, it was easier for consumers to reach a CDP by bicycle than on foot. In the central region, cycling accessibility was more balanced, and almost all consumers had moderate access to CDPs by cycling. The cycling mode therefore offered more consumers easy access to CDPs, as shown by the less dark blue areas in Fig. 7.

Figures 6 and 7 demonstrate the accessibility of the different travel modes. According to Fig. 6, the range of access for driving was much higher than for walking and cycling, so inequality decreased if consumers drove to CDPs. The distribution of driving access was entirely distinct from that of the other two modes. CDPs in the city center had relatively high accessibility, while CDPs away from the city center had comparatively low accessibility. Therefore, for consumers who preferred to drive to CDPs, distance was the only factor that affected access.

Based on the above analyses, accessibility varied to different degrees regardless of how consumers reached the CDPs. Consumers visiting CDPs on foot or by bicycle had the highest accessibility within the city and in the centers close to CDPs, whereas those who came by car should be closer to the centers of the city's suburbs. This showed that the network could be optimized to reach more e-consumers with fewer locations. However, to stimulate sustainable mobility, agencies might encourage non-motorized travel for collecting parcels from CDPs.

5 Conclusions

The results suggested that the geographical distribution of the Cainiao and China Post Stations in Nanjing was concentrated. Both station types were prevalent in the Gulou, Xuanwu, and Qinhuai areas, which have high population density, great demand for services, and good economic conditions, but were rare in the Gaochun, Lishui, Luhe, and Pukou areas on the outskirts of the city, which indicated the presence of spatial differences in the geographical distribution of CDPs in the city. In terms of spatial correlation, Cainiao Stations were densely located in populated areas with high GDP. China Post Stations were fewer than Cainiao Stations but were primarily located along the main corridors of Nanjing because of coverage and road access factors. The accessibility of CDPs in Nanjing via walking, cycling, and driving showed differences in spatial distribution. Among accessibility modes, walking was the most suitable way for accessing the parcels at CDPs rather than cycling and driving due to the long distance from the community population.

An accessibility study for different transport modes helps to improve the efficiency of CDP-related travel. In particular, the spatial correlation analysis assessed the spatial differences concerning the most appropriate modes of transportation. It makes an important contribution to promoting the sustainable development of the logistics industry by transforming modes of transport. Exploring CDP accessibility is essential to optimize logistics, improve customer satisfaction, and promote the logistics industry. The findings could help researchers, city planners, and other practitioners to understand the relationship between spatial indices, accessibility modes, and the service coverage of state-owned and private logistics enterprises, for the benefit of the e-commerce industry, and consequently help to develop effective strategies for planning CDP locations.