1 Introduction

To satisfy the increasing travel demand caused by fast urbanization, Chinese mega-cities have resorted to new metro systems and large-scale urban redevelopment. Metro stations generate large pedestrian flows in the streets surrounding the stations (Yang, Chen, Le, & Zhang, 2016; Zacharias, 2001). However, roads in metropolitan regions were designed mainly to serve vehicles rather than pedestrians. Therefore, new metro stations have often led to a conflict between the original built environment and the large number of pedestrians they generate. The inconvenience for pedestrians is caused by larger block sizes, narrow sidewalks, lower land use mix (Clifton, Smith, & Rodriguez, 2007), fewer crossing facilities at intersections, and wider vehicle roads associated with unsafe crossings (Li, Fisher, Brownson, & Bosworth, 2005; Lin, 2015). This conflict has triggered urban renewal processes in these areas aimed at improving the streetscape.

To improve the street-level built environment around metro stations for pedestrians who travel from/to the metro station, it is necessary to understand the relationship between street-level built environment characteristics and pedestrian route choice. Unfortunately, only a few studies have examined the quantitative relationship between the street-scale built environment around metro stations and pedestrian route choice decisions (e.g., Guo & Loo, 2013; Liu, Yang, Timmermans, & de Vries, 2020; Lue & Miller, 2019; Nuworsoo, Cooper, Cushing, & Jud, 2013). They found a significant impact of built environment characteristics on pedestrian route choice. On average, pedestrians tend to prefer wide sidewalks, low FAR, more retail frontage, fence appearance, and signalized intersections. Moreover, there is evidence that density, number of walkway crossings, and road width influence shopping pedestrian route choice behavior (e.g., Borgers & Timmermans, 2014; Borgers & Timmermans, 2015; Tilahun & Li, 2015; Zhu & Timmermans, 2011). As the relationship between the built environment and pedestrian route choice varies by context (e.g., Moran, Rodriguez, & Corburn, 2018; Rodriguez et al., 2015; Tribby, Miller, Brown, Werner, & Smith, 2017), it is necessary to allow for heterogeneous, context-dependent preferences of pedestrians.

Different pedestrian route choice models have been applied. Several researchers have used macroscopic models to investigate the impact of different built environment design scenarios on the distribution of pedestrian flows. For example, Schelhorn, O’Sullivan, Haklay, and Thurstain-Goodwin (1999) applied an agent-based simulation model to predict pedestrian movements in urban centers considering road configuration and the spatial location of the attractions. Dijkstra, Jessurun, Timmermans, and de Vries (2011) simulated the movements of shopping pedestrians, considering a variety of shops, number of stores, road network characteristics, and pedestrianized streets. Rose, Ligtenberg, and van der Sperk (2014) included retail, drinking and dining facilities, and cultural attractions. Examples of microscopic pedestrian flow models include Daamen and Hoogendoorn (2003a), Daamen and Hoogendoorn (2003b), Ukkusuri, Miranda-Moreno, Ramadurai, and Isa-Tavarez (2012), and Chen (2017). The latter built a simulation model of pedestrian movement in a subway station.

This paper concentrates on the simulation of pedestrian flows from/to a metro station. It aims to provide feedback to urban planners about the consequences of different renewal scenarios for the distribution of pedestrian flows. Simulated flows can be used to elaborate on particular scenarios or to combine different scenarios in light of a set of different and possibly conflicting objectives. The paper contributes to the knowledge of pedestrian-friendly street-scale (micro) built environment design surrounding metro stations.

The paper is organized as follows. First, we will discuss the data collected for this study. It is followed by an outline of the simulation approach. Next, we discuss the results of the simulations, which include the results of uncertainty analysis. The paper is concluded with a discussion of the main results.

2 Study area and data

This study is conducted in the area surrounding Yingkoudao (YKD) metro station, located in the city center of Tianjin, China. The study area measures 2000 m × 2000 m. YKD station has a large passenger flow and is the transfer station of metro lines No.1 and No.3. The area is the biggest commercial and business center in Tianjin. It includes various land uses (e.g., shopping malls, business, residential, primary and middle schools, restaurants, entertainment, and open spaces). The case study area includes not only streets with wide sidewalks, wide roads, very high-rise buildings, and well-maintained crossing facilities, but also streets with narrow sidewalks, low buildings, and few crossing facilities.

The core of the simulation is a pedestrian route choice model. It requires data on observed pedestrian flows and street-level characteristics of the built environment. Data on pedestrian volume at the metro station were collected in March 2016 by five trained Bachelor students from Tianjin University. They collected data on the number of pedestrians who entered or exited the Yingkoudao metro station during three different time periods: morning (8:00–9:00), noon (12:00–13:00), and afternoon (17:00–18:00). Phone cameras were used to capture pedestrian volumes at every exit/entrance of the station. Observations were made on two randomly selected weekdays and two randomly selected weekend days (excluding national holidays). In each time period, observations were made for a 2-min interval that included arrivals/departures of subway trains.

The number of pedestrians who entered/exited the metro station in each video was counted by the investigators. To avoid significant error, each video was processed twice by two different investigators. If the difference in the number of counted pedestrians was bigger than 2, the video was processed again; otherwise, the average was used as the final count. Based on the counted number of pedestrians in the 2-min videos, the total number of pedestrians for each exit/entrance for each time period was extrapolated. The peak count was observed between 17:00 and 18:00. The lowest number of passengers was observed for the 8:00–9:00 time period. Estimated counts suggest that 5040 pedestrians/hour enter/exit the station in the morning (8:00–9:00), 6810 at noon (12:00–13:00), and 11,220 in the afternoon (17:00–18:00).
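The double-counting rule and the extrapolation step can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used in the study: the linear scaling of a 2-min count to a full hour, the function names, and the example counts are all assumptions.

```python
from typing import List, Optional

MAX_DIFF = 2               # recount threshold stated in the text
CLIP_MINUTES = 2           # length of each video clip
SCALE = 60 / CLIP_MINUTES  # ASSUMPTION: linear scaling of a clip to one hour


def reconcile(count_a: int, count_b: int) -> Optional[float]:
    """Average two investigators' counts, or return None if a recount is needed."""
    if abs(count_a - count_b) > MAX_DIFF:
        return None  # difference bigger than 2: the video is processed again
    return (count_a + count_b) / 2


def hourly_estimate(clip_counts: List[float]) -> float:
    """Sum the reconciled 2-min counts of all exits/entrances and scale to one hour."""
    return sum(clip_counts) * SCALE


# Hypothetical counts for two exits in one time period
exit_counts = [reconcile(40, 42), reconcile(44, 42)]
print(hourly_estimate(exit_counts))
```

Under this assumed scaling, each counted pedestrian in a clip stands for 30 pedestrians per hour, so small counting errors are amplified, which is one reason for the double-counting rule.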

The second data collection, a paper-based questionnaire, was conducted in September 2018. The data collection targeted pedestrians who used the YKD metro station to enter/leave the study area. The questionnaire asked for details of the egress or access trips, such as origin and destination (OD), start time, end time, the exact route involved, and trip purpose. The trip purposes include work/school, leaving/returning home, shopping, visiting others, recreation (restaurants, entertainment, parks, public facilities, etc.), services (banks, government, post office, etc.), and other. The routes were drawn on a map. In addition to trip information, respondents were invited to report their age, gender, whether they work close to the station and for how many years, and whether they live close to the station and for how many years. The data comprised 515 trips of 302 randomly selected respondents.

The third data collection, conducted in November 2018, concerned the street-level built environment and road network. The street-level built environment data were mainly collected from topographic maps from the Tianjin Government, the 2018 Baidu satellite map, and the 2018 Baidu street view map. The data included building area, land use, length of building lot coverage (BLC), block area, sidewalk length, sidewalk width, number of lamps, road width, and the number of vehicle lanes on each road. Fence presence (physical separation between pedestrians and vehicles/cyclists) and traffic lights at intersections were collected from the 2018 Baidu street view map. Street greenery areas were collected from the 2018 Baidu satellite map. The selection of these variables is based on the literature and the main considered street-level elements in the practice of urban planning and design.

3 Simulation approach

To explore the impact of the street-scale built environment on the distribution of pedestrian flows, a NetLogo simulation was developed and applied. The simulation concentrated on pedestrian access/egress trips to the metro station. The route choice of each pedestrian was simulated using a latent class route choice model.

The simulation involves five steps: 1) setting up the road network and associated built environment attributes in the study area; 2) generating the simulated pedestrians and OD pairs for each pedestrian; 3) defining scenarios to explore their effect on pedestrian flows; 4) predicting the route choice of each pedestrian based on the latent class route choice model; 5) conducting uncertainty analysis. Figure 1 shows these steps.

Fig. 1 Simulation framework

3.1 Setting up the road network and built environment attributes

The pedestrian road network was drawn manually in an ArcGIS shapefile using a sidewalk and vehicle road layer. Two-sided sidewalks on roads with more than two vehicle lanes were drawn as two separate links, while two-sided sidewalks on roads with two or fewer vehicle lanes were drawn as a single link. For the latter roads, we assume that pedestrians are influenced by the built environment on both sides, as pedestrians can easily cross the road.

The links were spatially attached to four shapefile layers (building layer, block layer, sidewalk layer, and vehicle road layer). For two-sided sidewalks represented by a single link, the properties of the building and block layers on both sides of the link were summed as properties of the link. The number of lamps and the street greenery area in the sidewalk layer were likewise summed. Sidewalk length, sidewalk width, fence presence, and traffic lights at intersections in the sidewalk layer, and road width in the vehicle road layer, were averaged over both sides. For roads with more than two vehicle lanes, represented by two links, all properties in the building, block, and sidewalk layers on each side of the road were spatially attached to the link on the corresponding side.
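The sum-versus-average rule for single-link roads can be illustrated with a short Python sketch. The attribute names and the example values below are hypothetical and only mirror the grouping described in the text; the actual processing was done on shapefile layers.

```python
# ASSUMPTION: hypothetical attribute names grouped as described in the text
SUMMED = ("building_area", "block_area", "lamps", "greenery_area")
AVERAGED = ("sidewalk_length", "sidewalk_width", "fence",
            "traffic_lights", "road_width")


def merge_sides(left: dict, right: dict) -> dict:
    """Combine the attributes of both street sides into one link record."""
    merged = {name: left[name] + right[name] for name in SUMMED}
    merged.update({name: (left[name] + right[name]) / 2 for name in AVERAGED})
    return merged


# Hypothetical two-lane street: fence on one side only
left = {"building_area": 1000.0, "block_area": 5000.0, "lamps": 4,
        "greenery_area": 20.0, "sidewalk_length": 100.0,
        "sidewalk_width": 3.0, "fence": 1, "traffic_lights": 1,
        "road_width": 8.0}
right = {"building_area": 2000.0, "block_area": 7000.0, "lamps": 6,
         "greenery_area": 10.0, "sidewalk_length": 100.0,
         "sidewalk_width": 2.0, "fence": 0, "traffic_lights": 1,
         "road_width": 8.0}
link = merge_sides(left, right)
print(link["lamps"], link["sidewalk_width"], link["fence"])
```

Note that averaging a binary attribute such as fence presence yields a fractional value (0.5 here) when only one side has a fence.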

The built environment attributes of each link are shown in Table 1. The link-level data were converted to route-level data for pedestrian route choice modeling. The route-level built environment attributes are listed in Table 2. Land use mix covers both the vertical and horizontal dimensions and was calculated as defined in Eq. (1) (Cerin, Macfarlane, Ko, & Chan, 2007; Lau, Giridharan, & Ganesan, 2005; Song, Merlin, & Rodriguez, 2013).

$$ \mathrm{Land}\;\mathrm{use}\;\mathrm{mix}=-\left[\sum_{h=1}^{H}\left(\frac{P_h}{P}\right)\ln \left(\frac{P_h}{P}\right)\right]/\ln (H),\kern0.5em h=1,2,\dots, H;\; H=10 $$
(1)

where h is a land use category, H is the number of land use categories, \(P_h\) is the gross floor area of land use h in m², and P is the gross floor area of all studied land uses in m².
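As a quick check of Eq. (1), the entropy-based land use mix can be computed as in the following Python sketch. The function name and the example floor areas are hypothetical, and treating categories with zero floor area as contributing nothing (the usual convention 0 · ln 0 = 0) is an assumption.

```python
import math
from typing import Sequence


def land_use_mix(floor_areas: Sequence[float], n_categories: int = 10) -> float:
    """Normalized entropy of floor-area shares, following Eq. (1)."""
    total = sum(floor_areas)
    if total <= 0 or n_categories < 2:
        return 0.0
    entropy = 0.0
    for area in floor_areas:
        if area > 0:  # ASSUMPTION: empty categories contribute 0
            share = area / total
            entropy += share * math.log(share)
    return -entropy / math.log(n_categories)


# Hypothetical links: perfectly mixed versus single-use
print(land_use_mix([100.0] * 10))          # evenly spread over 10 categories
print(land_use_mix([500.0] + [0.0] * 9))   # one land use only
```

The index ranges from 0 (a single land use) to 1 (floor area spread evenly over all ten categories).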

Table 1 Built environment attributes at the link level
Table 2 Built environment attributes at the route level

The Path Size Correction factor (PSC) is a specific factor in the Path Size Correction (PSC) model that represents the influence of overlapping routes on route choice probabilities (Prato, 2009). In this study, it is calculated as defined in Eq. (2).

$$ {PSC}_l=-\sum \limits_{k\in l}\left[\frac{L_k}{L_l}\cdot \ln \sum \limits_{a\in C}{\delta}_{ka}\right] $$
(2)

where link k belongs to route l, \(L_k\) is the length of link k, \(L_l\) is the length of route l, \(\delta_{ka}\) equals 1 if route a includes link k and 0 otherwise, and C is the choice set of alternative routes.
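Eq. (2) can be made concrete with a small Python sketch. The representation of routes as lists of link identifiers, the function name, and the example network are hypothetical; the route being evaluated is assumed to be part of the choice set.

```python
import math
from typing import Dict, List, Sequence


def path_size_correction(route: Sequence[str],
                         choice_set: List[Sequence[str]],
                         link_length: Dict[str, float]) -> float:
    """PSC of one route as in Eq. (2); `route` must be in `choice_set`."""
    route_length = sum(link_length[k] for k in route)
    psc = 0.0
    for k in route:
        # Number of routes in the choice set that include link k
        n_routes_using_k = sum(1 for a in choice_set if k in a)
        psc -= (link_length[k] / route_length) * math.log(n_routes_using_k)
    return psc


# Hypothetical network: routes r1 and r2 share link "a", r3 is independent
lengths = {"a": 100.0, "b": 100.0, "c": 200.0, "d": 250.0}
r1, r2, r3 = ["a", "b"], ["a", "c"], ["d"]
routes = [r1, r2, r3]
for r in routes:
    print(r, path_size_correction(r, routes, lengths))
```

A route that shares no link with any alternative has a PSC of zero, while overlapping routes receive a negative value that grows in magnitude with the share of their length that is shared.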

3.2 Generating simulated pedestrians

The number of simulated pedestrians is based on the video data collected in 2016. Thus, the route choice of 5040 pedestrians was simulated for the morning period (8:00–9:00), 6810 pedestrians for the noon period (12:00–13:00), and 11,220 pedestrians for the afternoon period (17:00–18:00). Each simulated pedestrian needs to be characterized in terms of the selected socio-demographic characteristics; in addition, trip purpose and OD pair need to be imputed from the travel survey data collected in 2018. Age, gender, working years close to the station, living years close to the station, OD pair, and trip purpose of the simulated pedestrians were generated using Monte Carlo draws from the observed profiles of the sampled pedestrians. Because complete profiles are drawn, any correlations between socio-demographics, OD pairs, and travel purposes are retained in the simulations.
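The idea of drawing complete observed profiles, rather than drawing each attribute independently, can be sketched as follows. This is only an illustration: the field names and survey records are hypothetical, and uniform sampling with replacement is an assumption (the study may, for instance, have conditioned the draws on time of day).

```python
import random
from typing import Dict, List, Optional


def draw_pedestrians(observed_profiles: List[Dict],
                     n_pedestrians: int,
                     seed: Optional[int] = None) -> List[Dict]:
    """Resample whole survey profiles so attribute correlations are kept."""
    rng = random.Random(seed)
    # ASSUMPTION: uniform sampling with replacement from the observed trips
    return [dict(rng.choice(observed_profiles)) for _ in range(n_pedestrians)]


# Hypothetical survey records (one per observed trip)
survey = [
    {"age": 34, "gender": "F", "work_years": 5, "live_years": 0,
     "od": ("YKD", "link_12"), "purpose": "work/school"},
    {"age": 61, "gender": "M", "work_years": 0, "live_years": 20,
     "od": ("link_47", "YKD"), "purpose": "home"},
]
morning = draw_pedestrians(survey, 5040, seed=1)
print(len(morning))
```

Because every simulated pedestrian is a copy of one observed trip record, combinations of attributes that never occur in the survey cannot appear in the simulation.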

Table 3 shows the distribution of age, gender, whether respondents work near the station and for how many years, and whether they live near the station and for how many years. Seven trip purposes were distinguished in the data collection, but to simplify the simulation only three were used: work/school, going home, and going shopping (which includes the other categories). The revealed data showed that trip purposes are distributed quite differently at different times of the day. There were more work/school trips in the morning than in the afternoon, and more shopping trips at noon than in the morning. The distributions are presented in Table 4.

Table 3 Distribution of pedestrian characteristics
Table 4 Distribution of trip purpose by time of day

3.3 Defining the scenarios

In 2019, the Tianjin government planned a series of projects to improve the built environment of the study area, especially for the roads highlighted in Fig. 2 (Tianjin Bureau of Commerce, 2019). Based on these projects, we developed 4 new design scenarios by changing the street-level built environment of 6 links. Three links on Chifeng Road are named Chifeng Road-1 (C-1), Chifeng Road-2 (C-2), and Chifeng Road-3 (C-3). Three links on Harbin Road are named Harbin Road-1 (H-1), Harbin Road-2 (H-2), and Harbin Road-3 (H-3). Chifeng Road, which is directly connected with the metro station, is the main road. Harbin Road does not directly connect with the metro station. The 4 scenarios are shown in Table 5. A baseline scenario without any changes to the built environment serves as a benchmark.

Fig. 2 Scenarios in the study area

Table 5 Scenarios

The detailed new designs are shown in Table 6. The 4 new design scenarios add fences between sidewalks and vehicle lanes, crossing facilities, and increase sidewalk width in line with the main goals of the Tianjin government’s projects. Besides these changes, links C-1, C-2, and C-3 in the Chifeng Road with land use mix change scenario additionally increase land use mix by about 10% with more restaurants and entertainment shops reflecting Tianjin government’s intentions. As shown in Table 6, all 6 links will have a fence between sidewalks and vehicle lanes. All 6 links will have traffic lights at both endpoints. The “baseline” column in Table 6 is the current situation of each link in the study area. The “New design” column in Table 6 represents the new value of each variable on the link.

Table 6 Built environment changes on each link

3.4 Simulating route choice

The core of the simulation is the route choice model. In this study, a latent class route choice model that accounts for overlapping routes was used. The PSC latent class logit model combines the path size correction logit model with the latent class discrete choice model, and thus considers both the effect of overlapping routes and pedestrian preference heterogeneity. The probability \(P_{ns1}\) of pedestrian n belonging to class 1 is shown in Eq. (3), while the probability \(P_{ns2}\) of the same pedestrian belonging to class 2 is shown in Eq. (4). The class utilities are defined in Eqs. (5) and (6).

$$ {P}_{ns1}=\frac{\exp \left({V}_{n\left|s1\right.}\right)}{\exp \left({V}_{n\left|s1\right.}\right)+\exp \left({V}_{n\left|s2\right.}\right)}\kern0.5em n=1,\dots, N $$
(3)
$$ {P}_{ns2}=\frac{\exp \left({V}_{n\left|s2\right.}\right)}{\exp \left({V}_{n\left|s1\right.}\right)+\exp \left({V}_{n\left|s2\right.}\right)}\kern0.5em n=1,\dots, N $$
(4)
$$ {V}_{n\mid s1}={\beta}_{gender\mid s1}{X}_{gender}+{\beta}_{age\mid s1}{X}_{age}+{\beta}_{work\mid s1}{X}_{work}+{\beta}_{live\mid s1}{X}_{live}+{\beta}_{purpose1\mid s1}{X}_{purpose1}+{\beta}_{purpose2\mid s1}{X}_{purpose2}+{constant}_{s1} $$
(5)
$$ {V}_{n\mid s2}={\beta}_{gender\mid s2}{X}_{gender}+{\beta}_{age\mid s2}{X}_{age}+{\beta}_{work\mid s2}{X}_{work}+{\beta}_{live\mid s2}{X}_{live}+{\beta}_{purpose1\mid s2}{X}_{purpose1}+{\beta}_{purpose2\mid s2}{X}_{purpose2}+{constant}_{s2} $$
(6)

where \(V_{n\mid s1}\) is the utility of pedestrian n belonging to class 1 and \(V_{n\mid s2}\) is the utility of pedestrian n belonging to class 2. For class 1, \(\beta_{gender\mid s1}\), \(\beta_{age\mid s1}\), \(\beta_{work\mid s1}\), and \(\beta_{live\mid s1}\) are the parameters of gender, age, working in the study area or not, and living in the study area or not; \(\beta_{purpose1\mid s1}\) is the parameter for the trip purpose going to work/school and \(\beta_{purpose2\mid s1}\) is the parameter for the trip purpose going home. For class 2, \(\beta_{gender\mid s2}\), \(\beta_{age\mid s2}\), \(\beta_{work\mid s2}\), and \(\beta_{live\mid s2}\) represent the effects of gender, age, working in the study area or not, and living in the study area or not; \(\beta_{purpose1\mid s2}\) is the parameter for the trip purpose going to work/school, while \(\beta_{purpose2\mid s2}\) is the parameter for the trip purpose going home. Because the categorical variables were effect-coded, the effect of "shopping and other" can be derived as the negative of the sum of the parameters for the other two trip purposes.
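Eqs. (3) to (6) amount to a binary logit over the two class utilities. The following Python sketch shows the calculation; the coefficient values and variable names are hypothetical and are not the estimates reported in the paper.

```python
import math
from typing import Dict, Tuple


def class_membership_probs(x: Dict[str, float],
                           beta_s1: Dict[str, float],
                           beta_s2: Dict[str, float],
                           const_s1: float = 0.0,
                           const_s2: float = 0.0) -> Tuple[float, float]:
    """Return (P_ns1, P_ns2) following Eqs. (3) and (4)."""
    v1 = const_s1 + sum(beta_s1[name] * x[name] for name in beta_s1)  # Eq. (5)
    v2 = const_s2 + sum(beta_s2[name] * x[name] for name in beta_s2)  # Eq. (6)
    m = max(v1, v2)  # subtract the maximum for numerical stability
    e1, e2 = math.exp(v1 - m), math.exp(v2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


# Hypothetical coefficients; class 2 is used as the reference (all zeros)
b1 = {"gender": 0.2, "age": -0.01, "work": 0.5, "live": -0.3,
      "purpose1": 0.8, "purpose2": -0.4}
b2 = {name: 0.0 for name in b1}
# Effect coding: a "shopping and other" trip has purpose1 = purpose2 = -1
shopper = {"gender": 1, "age": 30, "work": -1, "live": -1,
           "purpose1": -1, "purpose2": -1}
print(class_membership_probs(shopper, b1, b2, const_s1=0.1))
```

Only the difference between the two class utilities matters for the probabilities, which is why one class can serve as a reference with all parameters fixed at zero.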

The probability of pedestrian n of class 1 choosing route k from choice set D is \(P_{nk\mid s1}\), while \(P_{nk\mid s2}\) is the probability of pedestrian n of class 2 choosing route k from choice set D. These probabilities can be expressed as

$$ {P}_{nk\mid s1}=\frac{\exp \left({V}_{nk\mid s1}+{\beta}_{PSC}\cdot {PSC}_k\right)}{\sum_{l\in D}\exp \left({V}_{nl\mid s1}+{\beta}_{PSC}\cdot {PSC}_l\right)}\kern0.5em n=1,\dots, N $$
(7)
$$ {P}_{nk\mid s2}=\frac{\exp \left({V}_{nk\mid s2}+{\beta}_{PSC}\cdot {PSC}_k\right)}{\sum_{l\in D}\exp \left({V}_{nl\mid s2}+{\beta}_{PSC}\cdot {PSC}_l\right)}\kern0.5em n=1,\dots, N $$
(8)
$$ {V}_{nk\left|s1\right.}={\beta}_{1\mid s1}{X}_1+{\beta}_{2\mid s1}{X}_2+{\beta}_{3\mid s1}{X}_3+{\beta}_{4\mid s1}{X}_4+{\beta}_{5\mid s1}{X}_5+{\beta}_{6\mid s1}{X}_6+{\beta}_{7\mid s1}{X}_7+{\beta}_{8\mid s1}{X}_8+{\beta}_{9\mid s1}{X}_9+{\beta}_{10\mid s1}{X}_{10} $$
(9)
$$ {V}_{nk\left|s2\right.}={\beta}_{1\mid s2}{X}_1+{\beta}_{2\mid s2}{X}_2+{\beta}_{3\mid s2}{X}_3+{\beta}_{4\mid s2}{X}_4+{\beta}_{5\mid s2}{X}_5+{\beta}_{6\mid s2}{X}_6+{\beta}_{7\mid s2}{X}_7+{\beta}_{8\mid s2}{X}_8+{\beta}_{9\mid s2}{X}_9+{\beta}_{10\mid s2}{X}_{10} $$
(10)

where \(V_{nk\mid s1}\) is the total utility of route k in latent class 1, \(V_{nl\mid s1}\) is the total utility of route l in latent class 1 (and likewise for class 2), l is a route in choice set D, \(\beta_{PSC}\) is the estimated coefficient for the PSC factor, \(PSC_k\) is the PSC value of route k, \(PSC_l\) is the PSC value of route l in set D, and \(X_1,\dots ,X_{10}\) are the 10 built environment variables listed in Table 7. \(\beta_{1\mid s1},\dots ,\beta_{10\mid s1}\) are the corresponding parameters of the ten built environment attributes in class 1, and \(\beta_{1\mid s2},\dots ,\beta_{10\mid s2}\) are the corresponding parameters in class 2.
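For a given latent class, Eqs. (7) to (10) reduce to a multinomial logit in which the PSC term is added to each route utility. The Python sketch below illustrates this; the function name and all numerical values are hypothetical, and only two attributes are used instead of ten to keep the example short.

```python
import math
from typing import List, Sequence


def route_choice_probs(route_attributes: List[Sequence[float]],
                       beta: Sequence[float],
                       psc: Sequence[float],
                       beta_psc: float) -> List[float]:
    """Class-specific choice probabilities over the routes in a choice set."""
    scores = [
        sum(b * x for b, x in zip(beta, attrs)) + beta_psc * p
        for attrs, p in zip(route_attributes, psc)
    ]
    m = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(s - m) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical choice set with three routes and two attributes per route
attributes = [[3.0, 1.0], [2.0, 0.0], [2.5, 1.0]]   # e.g. width, fence
beta_class1 = [0.4, 0.9]                            # hypothetical values
psc_values = [-0.35, -0.23, 0.0]
print(route_choice_probs(attributes, beta_class1, psc_values, beta_psc=1.0))
```

With a positive coefficient for the PSC factor, routes that overlap strongly with alternatives (more negative PSC) are penalized relative to independent routes.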

Table 7 Parameters of the route choice model

The actual simulation involved the following steps for each simulated pedestrian and each trip:

(1) The membership probability was calculated as a function of gender, age, work location, residential location, and trip purpose.

(2) A Monte Carlo draw was made from these membership probabilities to simulate the latent class of that pedestrian.

(3) Given the simulated latent class, the route choice probabilities were calculated using the route choice model for each route in the pedestrian’s choice set.

(4) A Monte Carlo draw was made from these route choice probabilities to simulate the chosen route.
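The two Monte Carlo draws and the aggregation of chosen routes into link flows can be sketched in Python as follows. This is an illustration only, not the NetLogo implementation used in the study: the data layout is hypothetical, and the membership and route choice probabilities are assumed to have been computed beforehand.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def simulate_flows(pedestrians: List[Dict],
                   seed: Optional[int] = None) -> Counter:
    """One simulation run: draw a class and a route per pedestrian, count links.

    Each pedestrian dict is assumed (hypothetical layout) to hold:
      'p_class1'    membership probability of class 1,
      'routes'      list of routes, each a list of link ids,
      'route_probs' {1: [...], 2: [...]} route probabilities per class.
    """
    rng = random.Random(seed)
    flows: Counter = Counter()
    for ped in pedestrians:
        # Step (2): draw the latent class from the membership probabilities
        latent_class = 1 if rng.random() < ped["p_class1"] else 2
        # Step (4): draw the chosen route from the class-specific probabilities
        probs = ped["route_probs"][latent_class]
        chosen = rng.choices(ped["routes"], weights=probs, k=1)[0]
        flows.update(chosen)  # every link on the chosen route gets one pedestrian
    return flows


# Hypothetical pedestrian with two alternative routes
ped = {"p_class1": 0.6,
       "routes": [["C-1", "C-2"], ["C-1", "H-1"]],
       "route_probs": {1: [0.7, 0.3], 2: [0.4, 0.6]}}
print(simulate_flows([ped] * 1000, seed=42))
```

Repeating such a run with different seeds and averaging the link counts gives the mean flows, and the spread across runs is what the uncertainty analysis quantifies.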

3.5 Uncertainty analysis

Because the route choice model is a probabilistic model, every set of Monte Carlo draws may lead to a different simulated route. Thus, the results of the simulation are prone to simulation error. To quantify the resulting uncertainty, an uncertainty analysis was conducted at the link level by time of day. The coefficient of variation (CV) of the pedestrian flow on each link and time of day, based on n runs, was calculated as shown in Eq. (11). The CV was calculated after 50, 100, 150, 200, 250, and 300 runs.

$$ {CV}_{kt\mid n}={\sigma}_{kt\mid n}/{\mu}_{kt\mid n}\kern0.5em n= 50, 100, 150, 200, 250, 300 $$
(11)

where \(CV_{kt\mid n}\) is the coefficient of variation of the pedestrian flow on link k at time of day t based on n runs, \(\sigma_{kt\mid n}\) is the standard deviation of the pedestrian flow on link k at time of day t based on n runs, and \(\mu_{kt\mid n}\) is the mean pedestrian flow on link k at time of day t based on n runs.

In addition, following Miller (1992), 95% confidence intervals were calculated as

$$ {CI}_{kt\mid n}=\frac{s}{\overline{x}}\pm {z}_{\alpha /2}\sqrt{{\left(n-1\right)}^{-1}{\left(\frac{s}{\overline{x}}\right)}^2\left(0.5+{\left(\frac{s}{\overline{x}}\right)}^2\right)}\kern0.5em n=50,100,150,200,250,300 $$
(12)

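where \(CI_{kt\mid n}\) is the confidence interval of the CV based on n runs for link k at time of day t, \( \frac{s}{\overline{x}} \) is the sample estimate of \(\sigma_{kt\mid n}/\mu_{kt\mid n}\), assuming that the sample is asymptotically normally distributed with mean \(\mu_{kt\mid n}\) and variance \(\sigma_{kt\mid n}^2\), s is the sample standard deviation, \(\overline{x}\) is the sample mean, \(z_{\alpha /2}\) is the upper α/2 critical value of the standard normal distribution (approximately 1.96 for α = 0.05), n is the number of runs, and α is 0.05 for the common 95% confidence level.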
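Eqs. (11) and (12) can be evaluated for one link and time period as in the Python sketch below. The run results are hypothetical, and using the sample standard deviation with n − 1 in the denominator is an assumption about how the standard deviation was computed.

```python
import math
import statistics
from typing import Sequence, Tuple

Z_975 = 1.959964  # z_{alpha/2} for alpha = 0.05


def cv_with_ci(flows: Sequence[float],
               z: float = Z_975) -> Tuple[float, float, float]:
    """Return (CV, lower bound, upper bound) following Eqs. (11) and (12)."""
    n = len(flows)
    mean = statistics.mean(flows)
    sd = statistics.stdev(flows)  # ASSUMPTION: sample s.d. with n - 1
    cv = sd / mean
    half_width = z * math.sqrt(cv ** 2 * (0.5 + cv ** 2) / (n - 1))
    return cv, cv - half_width, cv + half_width


# Hypothetical simulated flows on one link over five runs
runs = [212, 198, 205, 220, 201]
cv, lower, upper = cv_with_ci(runs)
print(f"CV = {cv:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The half-width shrinks roughly with the square root of n − 1, which is consistent with the CV estimates stabilizing as the number of runs increases.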

4 Scenario analysis

4.1 Simulation results

4.1.1 Results of the baseline scenario

First, the pedestrian flows for the baseline scenario were simulated. The simulation was run separately for the three time periods (morning, noon, and afternoon). For each time period, the simulation was repeated 200 times, and the reported pedestrian flows are the averages over these 200 runs. The simulated pedestrian flows are partly shown in Fig. 3 (top part), and the absolute numbers of pedestrians on the 6 links are shown in Table 8. The width of the red links represents the magnitude of the simulated pedestrian flows: the thicker the red link, the more pedestrians were simulated to pass that link. The distribution of pedestrian flows across the different time periods shows a higher accumulation of pedestrians near the metro station and fewer pedestrians further away from it. Table 8 only shows the pedestrian flows on the 6 links. Links H-2 and H-3 have more pedestrians than links C-1, C-2, C-3, and H-1. The results for the other links in the whole study area (Fig. 3, bottom part) are not shown. The destinations/origins of the trips that have the station as either origin or destination are shown in Fig. 3 as green circles. Bigger circles reflect a higher number of pedestrians who have this link as their destination/origin. As expected, the biggest circle is observed for the metro station.

Fig. 3 Pedestrian flow on each link by time period and the distribution of destinations/origins

Table 8 Absolute number of pedestrians on the 6 links by time of day

4.1.2 Results of the 4 scenarios

The simulated pedestrian flows for each time period (morning, noon, and afternoon) under the 4 hypothetical scenarios are shown in Figures 4, 5, 6, and 7, respectively. The horizontal axis is the link. The vertical axis is the absolute number of simulated pedestrians. The numbers on the bars represent the difference between the simulated pedestrian flow on each link under the considered scenario and under the baseline scenario.

Fig. 4 Results of the Harbin Road scenario
Fig. 5 Results of the Chifeng Road without land use mix change scenario
Fig. 6 Results of the Chifeng Road with land use mix change scenario
Fig. 7 Results of the Harbin Road & Chifeng Road without land use mix change scenario

Figure 4 indicates that for the Harbin Road Scenario in the morning (green color) the differences in pedestrian flows with the baseline are very small (from − 0.2 to + 4) for all 6 links. It indicates that at this time of day the pedestrian flows on the 6 links are hardly affected by the Harbin Road scenario. The same applies to the noon period. In the afternoon (grey color), all reported links attract more pedestrians (from + 21.3 to + 276) compared to the baseline scenario. The increase in the number of pedestrians is much higher for links H-1, H-2, and H-3 than for C-1, C-2, and C-3. Thus, the simulated impact of this scenario is higher for Harbin Road than for Chifeng Road.

In the Chifeng Road without land use mix change Scenario (Fig. 5), the number of pedestrians is simulated to increase for links C-1, C-2, and C-3, but to decrease for links H-1, H-2, and H-3 compared to the baseline scenario. The increase is slightly higher on C-3 than C-1 and C-2. Thus, the improvements on the built environment of Chifeng Road negatively affect the pedestrian flows on the Harbin Road especially for H-1 and H-2 links. The increase in the number of pedestrians on Chifeng Road exceeds the decrease in the number of pedestrians on Harbin Road. It suggests that pedestrians from other links would be attracted to Chifeng Road. The impact has a similar pattern for all times of day, but changes are bigger in the afternoon than in the morning and noon.

In the Chifeng Road with land use mix change Scenario (Fig. 6), the changes in pedestrian flows are very similar to the simulated changes under the Chifeng Road without land use mix change Scenario (Fig. 5). Upon reflection, this negligible difference can be explained by the very small parameter estimates for land use mix, while the latent class membership probabilities are the same for the two Chifeng scenarios.

In the Harbin Road & Chifeng Road without land use mix change scenario (Fig. 7), the number of pedestrians is simulated to increase for links C-1, C-2, and C-3 in all three time periods. The increase for links H-1, H-2, and H-3 only happens in the afternoon period. The number of pedestrians decreases on links H-1 and H-2 in the morning and on links H-1, H-2, and H-3 at noon compared to the baseline scenario. The increase is much higher on links C-1, C-2, and C-3 than on links H-1, H-2, and H-3 in each time period. The results indicate that the improvement of the built environment of both Chifeng Road and Harbin Road is simulated to positively affect the pedestrian flows on Chifeng Road all day and the pedestrian flows on Harbin Road mainly in the afternoon. The improvement has negative effects on pedestrian volumes on Harbin Road in the morning and at noon. Based on the change in the number of pedestrians, the simulated impact is higher for Chifeng Road than for Harbin Road.

4.2 Uncertainty analysis

To quantify the uncertainty in the simulated pedestrian flows due to simulation error, coefficients of variation (CV) were calculated for each link and time of day after 50, 100, 150, 200, 250, and 300 runs. Table 9 and Fig. 8 present the results for the morning time period for each link under the baseline scenario. Results for the other time periods and the other scenarios are of the same order of magnitude. As seen in Table 9, the CVs for each link do not change much beyond 150 runs and become stable after 200 runs. The CVs on all links are smaller than 9.3%. Link C-3 has higher CVs than the other links. At the maximum number of 300 runs, the confidence intervals are around 3%. Thus, overall, uncertainty due to simulation error is small.

Table 9 Coefficients of variation and confidence intervals of each link in the morning period by number of runs for the baseline scenario
Fig. 8 Uncertainty analysis of the baseline scenario in the morning period

4.3 Interpretation

The simulated results suggest that pedestrian flows on the main roads near the metro station are affected by the simulated changes in the built environment. The Harbin Road & Chifeng Road without land use mix change scenario suggests that the simulated impact is higher for the main road, which is directly connected with the metro station. The simulated Harbin Road scenario has a very limited impact on the distribution of pedestrian flows in both the morning and noon time periods. This can be explained by differences in trip purpose and socio-demographics of the pedestrians across the time periods. In the morning, the largest share of pedestrians travels for work/school. They have a higher probability of belonging to class 1, for which the increased utility of the presence of a fence is canceled out by the decreased utility of increasing sidewalk width and adding more traffic lights. At noon, there were more pedestrians with the purpose "shopping and other", who have a higher probability of belonging to class 2. This class more likely shops at the large shopping malls on the west side of Harbin Road. The shortest path to reach the west side of Harbin Road is from the main road next to the metro station or via Chifeng Road to the north part of the shopping street.

In the afternoon, the Harbin Road scenario has a positive influence on pedestrian flows on all 3 links of Harbin Road and the 3 links of Chifeng Road. Thus, the improvement of the streetscape of Harbin Road has synergistic effects on Chifeng Road. The increased attraction of Harbin Road seems to lead to more pedestrians traveling through the links of Chifeng Road. In particular, there are more pedestrians going home (54%) and “shopping and other” pedestrians (40%) in the afternoon. These pedestrians have a higher probability of belonging to class 2.

The Chifeng Road with land use mix change scenario and Chifeng Road without land use mix change scenario were predicted to have a positive influence on pedestrian flows of Chifeng Road in the three time periods although the difference varies across time periods. Moreover, the changes in the Chifeng Road built environment decreased the pedestrian flow in Harbin Road which indicates a competitive relationship between these two roads. The most important reason might be the spatial location of the two roads. Links H-1, H-2, H-3 of Harbin Road are not directly linked to the metro station; pedestrians need to pass through the links of Chifeng Road. The routes passing the links of Harbin Road, especially H-1 and H-2, have to pass at least C-2 or C-3. However, the routes from/to the northern part of Chifeng Road do not necessarily include Harbin Road.

The small differences between the two scenarios follow directly from the relatively small parameter for land use mix and the small differences in land use mix. It should be realized, however, that in this simulation we only capture the influence of varied land use on route choice and that the change in land use mix is very small. Past studies found that walking trip frequency and destinations have a positive relationship with higher land use mix (Day, 2016; Saelens & Handy, 2008) but provide less evidence on the relationship between land use mix and pedestrian route choice.

5 Discussion

In completing this paper, a number of limitations should be mentioned. First, the simulation only concerned changing the route choice of pedestrians from/to the metro station. It did not consider pedestrians that enter or leave the study area from other locations. Of course, the route choice behavior of these other pedestrians may also be affected by the changes in the built environment. Thus, the simulated number of pedestrians for each link and their shares of the pedestrian volume are not representative of the area. This is also the reason why the simulations in the baseline scenario cannot be validated against pedestrian counts. If the interest were in the total pedestrian system in the study area, data would have to be collected at every entry/exit point of the study area.

Second, we only considered access and egress trips. In reality, however, a pedestrian may be involved in multi-stop (and multi-purpose) trips. This will influence the distribution of pedestrian flows in the study area. The inclusion of multi-stop, multi-purpose trips means that appropriate data on such trips need to be selected and that the current simple route choice model should be replaced with a much more complex model of route choice behavior under multi-stop, multi-purpose trips.

Third, we did not consider the possible effect of changing land use and street characteristics on destination choice, neither at the city level nor within the study area. It means that the results only depict the changing route choice behavior of a selected group of pedestrians conditional on invariant destination choice.

Fourth, it should be realized that simulation error is only one source of error. An error may also be introduced by model specification. Assessing its impact requires the comparison of different model specifications, based on different assumptions underlying pedestrian route choice behavior. Another source of error is the demarcation of the choice sets. Future research should examine the effect of different specifications of choice set on simulated pedestrian flows.

Finally, the estimation results of the route choice model emphasize the relative importance of the presence of a fence. Consequently, some distinctive simulation results are the direct consequence of this parameter. However, when we looked into the data, it became clear that most links directly surrounding the metro station, where we observed most pedestrians, had a fence. Hence, the distribution of data points in the space spanned by the presence of a fence and the number of observed pedestrians on a route is far from uniform. Consequently, the estimated parameter for the presence of a fence is likely biased. This is a clear limitation of revealed preference models.