Introduction

In the past decade, bike-sharing systems (BSSs) have garnered increasing attention as environmentally sustainable transportation modes and have been implemented worldwide. Currently, more than 2,000 cities have implemented BSSs, which provide a flexible mode of transportation for short-distance trips and help solve the “last mile” problem in urban transportation (Meddin and DeMaio 2021). Moreover, the introduction of BSSs has effectively reduced air and noise pollution, relieved traffic congestion, and improved public health (Hatzopoulou et al. 2013; Wang and Zhou 2017; Otero et al. 2018; Zhang and Mi 2018; Nieuwenhuijsen and Rojas-Rueda 2020).

The rapid introduction of BSSs in many cities has prompted numerous studies, particularly on planning, public policy assessment, operational management, user attitudes and perceptions, modal shifts, and the social transformations associated with BSSs (Bordagaray et al. 2012; Ahillen et al. 2016; Bao et al. 2017; Raux et al. 2017; Kutela et al. 2021). Although BSSs have been successfully deployed in many cities, the uneven distribution of bikes across stations remains a critical problem. When the number of available bikes is extremely imbalanced across stations, users are sometimes unable to rent or return bikes at their desired times because no bikes or no empty docks are available, which degrades their riding experience (Singla et al. 2015; Razzaque and Clarke 2015). Therefore, many BSS operators use trucks or trailers to redistribute bikes across their systems, aiming to increase service availability while minimizing redistribution costs (Liu et al. 2016; Médard de Chardon et al. 2016; Schuijbroek et al. 2017).

The effectiveness of redistribution strategies depends largely on accurate short-term predictions of future demand. Accordingly, many studies have estimated the demand for bikes at each station over a future period. Such prediction requires an understanding of the factors that influence bike-sharing demand and of the usage patterns of shared bikes. Hence, numerous studies have investigated the relationships between bike-sharing usage and various explanatory factors, such as weather conditions, calendar events, built environments, geographical factors, and bike infrastructure (Etienne et al. 2014; El-Assi et al. 2017; Guo et al. 2017; Kim 2018; Yang et al. 2020; Wu et al. 2021).

As previous studies have shown, shared bikes are used for various purposes, ranging from transportation to leisure (Kaplan et al. 2015; Bao et al. 2017). When public bikes are used for transportation, trips are made for various reasons, such as commuting, shopping, and eating (Bao et al. 2017). Travel patterns can therefore differ depending on individual users' main purpose for using public bikes (Bao et al. 2017; Xing et al. 2020). However, the purpose of an individual trip is difficult to infer using only the trip data collected by a BSS, which usually include only the rental and return stations and the rental and return dates and times.

In general, the temporal and spatial regularity and variability of trips differ depending on the trip purpose, which is key when inferring the purpose of individual trips by combining trip data with other data sources, such as point-of-interest (POI) data (Lee and Hickman 2014; Gong et al. 2016; Alsger et al. 2018). For instance, commuting trips show higher regularity and repeatability than trips for other purposes: most commuters depart from home during the morning peak hours (7:00 AM–9:00 AM) and return during the evening peak hours (5:00 PM–7:00 PM) (Long and Thill 2015; Ma et al. 2017). The difference in trip purpose may also influence the choice of pass. Most BSSs offer passes with different validity periods, such as a single ride, one day, one week, one month, or one year. Users who want to use a bike repeatedly gain a price advantage by purchasing a pass that is valid for a long period (e.g., one year), whereas users who want to rent a bike for cycling on a weekend buy a one-day pass (Kong et al. 2020). Therefore, the usage patterns of shared bikes may differ depending on the pass type; however, studies investigating how the effects of explanatory factors on usage differ by pass type are rare. Although Zhou (2015) and Kong et al. (2020) compared the temporal usage patterns of customers (short-term users) and subscribers (long-term users), they did not investigate the factors that contribute to bike-sharing demand for each pass type. To manage BSSs efficiently and effectively, the relationship between bike-sharing demand and factors such as the built environment, weather conditions, and bicycle and public transit infrastructure must be analyzed quantitatively and separately for each pass type. If the influence of a factor differs by pass type, a demand prediction model that ignores pass type is likely to be less accurate.

In this study, bike-sharing data collected in Seoul, South Korea, were used to compare the usage patterns of the BSS by pass type and to investigate how the influence of several explanatory factors on BSS demand depends on pass type. Our specific research questions were as follows: (1) How do the spatiotemporal usage patterns of bike-sharing differ by pass type? (2) How does the influence of explanatory factors on the demand for bike-sharing depend on pass type? To answer these questions, we applied several data-mining techniques, including clustering, regression, and classification, along with basic statistical analysis.

The remainder of this paper is organized as follows. The “Related work” section provides a comprehensive review of studies related to bike-sharing. The “Study area and data description” section describes the study area (Seoul) and the data used in this study. The “Methodology” section introduces the methods used to analyze differences in usage patterns by pass type, and the results are presented in the “Experimental results” section. The “Conclusion” section concludes the paper and outlines directions for future research.

Related work

Factors affecting bike-sharing demand

In recent years, many BSSs have been equipped with systems that allow users to rent bikes using a web or mobile application. Thus, trip records are automatically collected from the system. Based on trip data obtained from the BSS, several studies have investigated usage patterns and examined the influence of explanatory factors, such as weather conditions, calendar events, socio-demographic characteristics, bike infrastructure, land use, and the built environment.

Faghih-Imani et al. (2014) used linear mixed models to determine the effects of weather, temporal characteristics, bicycle infrastructure, land use, and urban form attributes on station-level bike rentals and returns in Montreal, Canada. They showed that the type of day was correlated with bike rentals and returns, and that more commercial enterprises and universities in the vicinity of a station significantly affected bike rentals and returns. In addition, the number of nearby bike stations increased departures and arrivals of shared bikes, whereas the capacity of nearby bike stations decreased them. Similarly, Faghih-Imani et al. (2017) developed a linear mixed model to estimate the influence of bike infrastructure and socio-demographic and land-use characteristics on arrivals and departures in Barcelona and Seville, Spain. They found that the number of nearby bike stations and the proportions of restaurant, recreation, and business POIs among all POIs were significantly associated with arrival and departure rates in both cities. Increases in the proportions of restaurant, recreation, and business POIs raised arrival and departure rates in both cities. However, as the number of nearby bike stations increased, arrival and departure rates increased in Barcelona but decreased in Seville. Other studies have also suggested that the impact of built environment and bike infrastructure factors is not consistent across cities. Rixey (2013) showed that the use of shared bikes increased with job and population densities in the vicinity of a bike station for three bike-sharing systems in Minneapolis, Denver, and Washington, D.C., but Maurer (2012) found no significant relationship between population density and bike-sharing use in Sacramento, California. Moreover, Bao et al. (2018) classified stations into five categories based on the POI distributions in the neighborhood of each station and fitted a separate geographically weighted regression model for each category to relate bike-sharing ridership to various types of POI. The separate ridership models generally outperformed the joint model in prediction, which implies that POI factors influence the demand for public bikes differently depending on the land-use mix.

In terms of weather conditions, many studies have used temperature, precipitation, humidity, and wind speed as explanatory factors for the demand for shared bikes (Eren and Uz 2020). Several studies have revealed that the relationship between temperature and bike-sharing demand is nonlinear (Heinen et al. 2010; El-Assi et al. 2017; Kim 2018). Temperatures between 0 and 30 °C positively affected bike-sharing demand, which was usually highest in the 20–30 °C range (Eren and Uz 2020; El-Assi et al. 2017). However, the effect of higher temperatures differed between studies. Kim (2018) found that temperatures above 30 °C negatively affected bike-sharing demand in Daejeon, South Korea. That study also investigated the effects of weather and calendar events on BSS demand and observed that the temperature-humidity index and non-working days influenced the demand for public bikes, with effects that depended on the type of day (working vs. non-working days). Shen et al. (2018) also found that temperatures above 31 °C decreased bike usage in Singapore, even though temperatures in Singapore remain around 30 °C throughout the year. In contrast, El-Assi et al. (2017) found that the use of shared bikes increased at temperatures above 30 °C in Toronto, Canada. Because cycling is an outdoor activity, heavy precipitation and strong winds were strongly negatively correlated with bike-sharing use in most studies (Faghih-Imani et al. 2014; Kim 2018; Shen et al. 2018; Wang et al. 2018).

Usage patterns depending on users

Many studies have also investigated the effects of users' socio-demographic characteristics on travel demand and patterns, which is necessary for understanding the user profiles of bike-sharing systems and for predicting bike-sharing demand accurately (Eren and Uz 2020). Several studies have provided strong evidence of a correlation between the demand for shared bikes and personal characteristics such as age, gender, education, and income. In terms of gender, male users generally outnumbered female users, as is also the case for bicyclists in general (Zhao et al. 2015; Aldred et al. 2016; Abasahl et al. 2018). Moreover, the usage patterns of shared bikes differ between genders. Pellicer-Chenoll et al. (2021) analyzed the weekday usage patterns of shared bikes in Valencia using network, spatial, and statistical analyses and compared the patterns of men and women. They found no significant difference between men and women in travel distance and time, but women's use of shared bikes decreased in the peripheral areas of the city, which is consistent with other studies suggesting that women ride bikes less than men owing to safety concerns (Garrard et al. 2008; Heesch et al. 2012). In addition, Wang and Akar (2019) found that female users were more sensitive to weather and traffic conditions than male users, although the direction (positive or negative) of several factors affecting the bike-sharing demand of men and women was similar.

Most studies agree that the majority of bike-sharing users are under the age of 40 (Shaheen et al. 2012; Fishman et al. 2013; Böcker et al. 2020). Fishman et al. (2015) revealed that, among bike-sharing users in Melbourne and Brisbane, there were 3.3 times more people in the 18–34 age group than in other age groups. Wang et al. (2018) investigated the effects of various factors, such as the built environment, weather, bicycle infrastructure, and public transit, on bike-sharing use by different age groups among New York City’s bike-sharing users. In that study, the 28–37 age group used bike-sharing the most, and the influence of some factors varied across age groups: population density was positively correlated with bike-sharing demand for the 16–27 age group but was insignificant for the 28–51 age group. In addition, people with high incomes had a higher preference for shared bikes than those with low incomes (Fishman et al. 2015; Li et al. 2018); income also tends to affect price sensitivity, and Kaviti et al. (2019) found that low-income groups were more sensitive to price than other groups.

Moreover, some studies have sought differences in travel patterns between different types of bike-sharing users and between bike-sharing users and traditional bicyclists. Buck et al. (2013) compared the demographics of short-term and long-term bike-share users and regular cyclists in Washington, D.C., and found that D.C.-area cyclists were older than bike-share users and that the mean household income of general cyclists was higher than that of bike-share users. Among bike-share users, short-term or casual users mainly used shared bikes for touring or sightseeing, whereas long-term users used them mainly for work-related or commuting purposes (Buck et al. 2013; Kaviti et al. 2019). Zhou (2015) compared the temporal usage patterns of bike-sharing for customers and subscribers in Chicago, USA, and showed that subscribers used shared bikes more frequently for commuting, whereas customers were more recreation-oriented. Kong et al. (2020) investigated the relationship between bike-sharing and public transit for customers and subscribers in four US cities. In all four cities, the proportion of modal substitution trips among customers was twice that among subscribers, whereas modal integration trips were more frequent among subscribers than among customers.

Models for predicting bike-sharing demand

Factors that previous studies found to influence the demand for public bikes have been used as attributes in demand prediction models (Li et al. 2015; Hulot et al. 2018). These studies mostly focused on improving prediction performance, typically to increase rebalancing efficiency through accurate station-level demand prediction. Therefore, unlike studies that investigated the relationship between bike-sharing usage and various spatiotemporal factors, they used various machine-learning techniques to build accurate prediction models.

Yang et al. (2016) built random forest models for demand prediction, and Li and Zheng (2019) proposed a similarity-based efficient Gaussian process regressor to predict the number of bikes to be rented at different levels, such as stations, clusters, and the whole system. Kim (2021) used random forest models to predict bike rentals and returns at the cluster level and applied two different approaches to obtain station-level demands within a hierarchical demand prediction framework. Lin et al. (2018) and Zi et al. (2021) each developed a novel graph convolutional network (GCN) model. Lin et al. (2018) proposed a GCN with a data-driven graph filter that can learn hidden heterogeneous pairwise correlations between stations to predict hourly station-level demand in a large-scale bike-sharing network, whereas Zi et al. (2021) combined a GCN with temporal attention to model the spatial and temporal dependencies between stations and to reflect the influence of different levels of time granularity.

In addition, several studies have considered spatial dependency to improve model appropriateness (Zhang et al. 2017; Guidon et al. 2019). When modeling BSS traffic, the assumption of non-spatial regression models that individual errors are independent of each other may be violated, leading to biased estimates of the standard errors of the model parameters. Therefore, if spatial dependency exists between samples, spatial regression models that account for spatial lag and spatial error are a better choice. Additionally, many studies have revealed that accounting for spatial dependency significantly improves model fit (Shen et al. 2018; Guidon et al. 2020).

Faghih-Imani and Eluru (2016) incorporated the temporal effects and spatial dependency of BSS demand using spatial lag and spatial error models. They found that accounting for either spatial lag or spatial error improved model fit, and a comparison of the two showed that the spatial lag model performed better than the spatial error model. Shen et al. (2018) used spatial autoregressive models that considered spatial lag and error dependency to analyze the spatiotemporal patterns of dockless bike-sharing in Singapore. Guidon et al. (2019) also used spatial autoregressive models that considered both spatial lag and error dependency to build models that can help determine whether a bike-sharing system should be expanded. Wu et al. (2021) compared spatial lag and spatial error models to investigate the effects of the built environment (e.g., different categories of POI), demographic information, and the network properties of the bike-sharing system in Suzhou, China.
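For reference, the two specifications compared in these studies can be written in the notation used later in Eq. (5), where \({\textbf{W}}\) is a spatial weights matrix, \(\rho \) and \(\lambda \) are the spatial lag and spatial error parameters, and \(\varvec{\epsilon }\) is an independent error term. This is the standard textbook form, not a reproduction of any single cited study's model:

$$\begin{aligned} \text {Spatial lag: } {\textbf{y}} &= \rho {\textbf{W}} {\textbf{y}} + {\textbf{X}}\varvec{\beta } + \varvec{\epsilon }, \\ \text {Spatial error: } {\textbf{y}} &= {\textbf{X}}\varvec{\beta } + {\textbf{u}}, \quad {\textbf{u}} = \lambda {\textbf{W}} {\textbf{u}} + \varvec{\epsilon }. \end{aligned}$$

Setting \(\lambda = 0\) in the SARAR model of Eq. (5) yields the spatial lag model, and setting \(\rho = 0\) yields the spatial error model. Solving the spatial lag model for \({\textbf{y}}\) gives the reduced form \({\textbf{y}} = ({\textbf{I}} - \rho {\textbf{W}})^{-1}({\textbf{X}}\varvec{\beta } + \varvec{\epsilon })\), which shows why a change in the explanatory variables at one station also affects the predicted demand at its neighbors.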

Station-level prediction is more difficult than system- and cluster-level prediction because station-level demand fluctuates significantly, whereas demand at the system and cluster levels is more stable and regular (Kim 2021). However, the instability and fluctuation of station-level demand can differ according to trip purpose. Because the main trip purpose may differ by pass type, demand predictions could be made more accurate if the differences in usage patterns, and in the relationships between explanatory variables and demand, by pass type were revealed.

Study area and data description

The study area was Seoul, which is the capital city of South Korea. As of 2020, the population of Seoul was approximately 9.7 million, comprising nearly 20 % of the total population of South Korea (51.8 million), and its population density reached approximately 16,000 inhabitants per square kilometer, according to the Korean Statistical Information Service.

Seoul Bike (“Tareungyi” in Korean) is a public bike-sharing service launched in October 2015 and operated by the Seoul Metropolitan Government. As of 2020, Seoul Bike had more than 2.78 million users, more than one-fourth of the population of Seoul. In this study, Seoul Bike trip data for the one-year period from January to December 2019 were used because the system changes (installation and closure of stations) in 2019 were marginal compared with those in 2018 and 2020. In addition, the usage patterns of Seoul Bike in 2020 might differ significantly from those in other years because of the COVID-19 pandemic. Each record provides the rental and return stations, the rental and return times, and the pass type of a trip. Seoul Bike offers various passes, which can be categorized as season passes and one-day passes; season passes are available for 7, 30, 180, and 365 days. Each pass is further divided into two types according to the permitted usage time per rental: although bikes can be rented throughout the validity period of a pass, each rented bike must be returned to a station within one or two hours, depending on the type.

Four categories of explanatory factors (weather, POI, population, and station attributes) were considered in this study. Weather data were obtained from the National Climate Data Service System, which provides hourly historical weather observations. From these data, temperature, humidity, precipitation, wind speed, and the temperature-humidity index (THI) were selected as explanatory factors based on previous studies of the relationship between bike-sharing demand and weather conditions (Gebhart and Noland 2014; El-Assi et al. 2017; Kim 2018). To capture the nonlinear relationship between temperature and bike-sharing demand in the regression models, temperature was divided into five ranges (<0, 0–10, 10–20, 20–25, and 25+ °C), and a binary variable was generated for each range to indicate whether the temperature fell within it.
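The temperature encoding described above can be sketched as follows. The paper does not state how boundary values (e.g., exactly 10 °C) were assigned, so the sketch assumes each range includes its lower bound and excludes its upper bound; apart from “Temp <0”, the variable names are hypothetical.

```python
import bisect

# Hypothetical names for the five indicator variables ("Temp <0" is from Table 1).
BIN_LABELS = ["Temp <0", "Temp 0-10", "Temp 10-20", "Temp 20-25", "Temp 25+"]
# Upper edges of the first four ranges, in degrees Celsius.
BIN_EDGES = [0.0, 10.0, 20.0, 25.0]


def temperature_bins(temp_c: float) -> dict:
    """One-hot encode a temperature: exactly one indicator is set to 1.

    bisect_right places a value equal to an edge in the higher range, i.e.
    lower bounds are inclusive (an assumption, not stated in the paper).
    """
    index = bisect.bisect_right(BIN_EDGES, temp_c)
    return {label: int(i == index) for i, label in enumerate(BIN_LABELS)}


print(temperature_bins(-3.5))
print(temperature_bins(22.0))
```

Using indicator variables rather than a single linear temperature term lets the regression estimate a separate effect for each range, which is how a nonlinear (e.g., rise-then-fall) relationship can be represented in a linear model.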

The POI data were collected from the Kakao Local API, which provides the locations of POIs in different categories, such as subway stations, restaurants, tourist attractions, and coffee shops, within a given radius. Because the demand for shared bikes has been shown to correlate with the number of POIs near bike stations (Faghih-Imani et al. 2014, 2017), this study used the numbers of POIs in different categories within walking distance of each bike station as explanatory variables. Based on previous studies (Xu et al. 2019; Chang et al. 2020; Hu et al. 2021), the walking distance was set to 500 m, and POI data were collected within 500 m of each bike station. The variable representing the number of restaurants and cafés within walking distance was log-transformed to bring its distribution closer to normal; the other POI variables were used without transformation. In addition to regression analysis for predicting the demand for shared bikes, this study used classification analysis to distinguish bike stations where long-term season passes were heavily used.
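The following is a minimal sketch of the POI feature construction. The study obtained POIs through the Kakao Local API radius search; this sketch instead assumes that POI coordinates are already available locally and counts them with a haversine (great-circle) distance. The log base and the +1 offset (which keeps stations with zero POIs defined) are assumptions chosen to match Eq. (4); the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_pois_within(station, pois, radius_m=500.0):
    """Count POIs (lat, lon) within radius_m (default: 500 m walking distance)."""
    lat, lon = station
    return sum(1 for plat, plon in pois
               if haversine_m(lat, lon, plat, plon) <= radius_m)


def log_count(n):
    """Assumed log transform for the restaurant/cafe count: log10(n + 1)."""
    return math.log10(n + 1)
```

Straight-line distance is only a proxy for walking distance along the street network; the 500 m threshold is taken from the text, but the distance metric is an assumption of this sketch.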

This study used two different population datasets: the 2019 population and housing census, and the mobile-phone-based floating population data, and generated variables that represent the resident population and floating population within 1 km of bike stations. The 2019 population and housing census was provided by Statistics Korea, and mobile-phone-based floating population data generated by KT, South Korea’s largest telecommunications company, were provided by the Seoul Metropolitan Government. Similar to the POI variable for restaurants and cafés, the two population variables were converted to log-transformed variables. Moreover, the original population variables related to the resident population and floating population may not significantly affect the proportion of usage of long-term season passes. Therefore, similar to the POI variables, instead of the original population variables, the ratio of the floating population to resident population was used as an explanatory variable for the classification analysis.

For the station attributes, the Google Elevation API and Kakao Map were used to obtain the elevation of each bike station and its shortest distance to a bike road, respectively. In addition, a station attribute representing the area density of neighboring bike stations was generated, defined as follows.

$$\begin{aligned} dens(s, A) = \frac{|\{j|j\in A\}|}{area(A)}, \end{aligned}$$
(1)

where dens(s, A) denotes the area density of neighboring bike stations for station s with respect to a given neighborhood A, \(|\cdot |\) denotes the size of a set, and area(A) denotes the area of A in \(\hbox {km}^{2}\). The number of docks at each bike station was not considered in this study because Seoul Bike allows any number of bikes to be parked at a station, regardless of its number of docks.
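Eq. (1) can be computed as below, assuming that the neighborhood A is a circle of a given radius centered on station s. The paper does not state the shape or radius of A here, so `radius_km` is a required, hypothetical parameter; with a circular A, station s itself falls inside A and is counted.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def neighborhood_density(s, stations, radius_km):
    """dens(s, A) = |{j : j in A}| / area(A) for a circular A around s.

    stations: list of (lat, lon) for all stations (including s itself).
    Returns stations per square kilometre.
    """
    in_a = [j for j in stations if haversine_km(s, j) <= radius_km]
    area_km2 = math.pi * radius_km ** 2
    return len(in_a) / area_km2
```

For example, with two stations (s and one station about 100 m away) inside a 1 km circle, the density is 2/π, or roughly 0.64 stations per square kilometre.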

Table 1 summarizes the explanatory factors used in this study. Among them, “Temp <0” was not used in the regression analysis because it was redundant: exactly one of the temperature indicator variables equals 1 and the others equal 0, so “Temp <0” is fully determined by the values of the other four.
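The redundancy can be checked directly. Assuming the regression includes an intercept (the usual case), the five indicator columns always sum to the intercept column of ones, so including all of them would make the design matrix singular. The sketch below, with hypothetical helper names and tuple layout (Temp <0, 0–10, 10–20, 20–25, 25+), shows that the dropped indicator is recoverable from the other four.

```python
def is_exact_dummy_set(rows):
    """True if each row has only 0/1 entries and exactly one 1, meaning the
    columns sum to a column of ones (collinear with an intercept)."""
    return all(set(r) <= {0, 1} and sum(r) == 1 for r in rows)


def recover_reference(row_without_ref):
    """Rebuild the dropped 'Temp <0' indicator from the remaining four."""
    return 1 - sum(row_without_ref)


# Hypothetical observations: below zero, 10-20 degrees C, and 25+ degrees C.
rows = [(1, 0, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 0, 1)]
print(is_exact_dummy_set(rows))
print([recover_reference(r[1:]) for r in rows])
```

Dropping one indicator makes “Temp <0” the reference category, so each remaining coefficient is interpreted relative to sub-zero temperatures.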

Table 1 Description of explanatory factors

Methodology

This study used three analysis methods to investigate in depth the spatiotemporal usage patterns of Seoul Bike according to pass type. The first was a statistical analysis comprising typical exploratory data analysis and clustering analysis, whose main purpose was to identify differences in spatiotemporal usage patterns between passes. Through this analysis, this study aimed to show how the distributions of travel times, hourly bike rentals and returns, and the locations of high-demand bike stations differ by pass type, as well as the relationship between the proportion of bike rentals by pass type and temporal usage patterns. The second was a regression analysis, whose main goal was to investigate the impact of various factors on the demand for shared bikes; it was motivated by the differences in the spatial distributions of bike rentals by pass type observed in the first analysis. The third was a classification analysis, conducted to determine the significant factors that increase the proportion of bike rentals made with long-term season passes. Comparing the estimated coefficients of the regression models for different pass types confirmed that the factors that increase or decrease the demand for shared bikes differ by pass type. However, when the estimated coefficients of the same factor had the same sign in different models, it was difficult to compare the magnitudes of their influence accurately. Therefore, classification analysis was used to quantitatively identify factors significantly related to areas where a specific pass type was frequently used.

Statistical analysis of basic bike usage patterns

Before the quantitative analysis, a statistical analysis of basic bike usage patterns was conducted, in which various spatiotemporal patterns, such as the travel time distribution and hourly bike rentals and returns by pass type, were extracted from the Seoul Bike data.

In addition, clustering analysis was used to examine whether the preference for a specific pass type differed by station location and whether it was correlated with temporal usage patterns. To this end, two different inputs were used for clustering. The first input vector of each station was defined as the proportions of bike rentals by pass type, as follows:

$$\begin{aligned} {\textbf{x}}_s^1 = \left( \frac{f_{s,1,rent}^p}{f_{s,rent}},\ldots ,\frac{f_{s,I,rent}^p}{f_{s,rent}}\right) , \end{aligned}$$
(2)

where s denotes a station, and \(f_{s,rent}\) and \(f_{s,i,rent}^p\) denote the total number of bike rentals at station s in 2019 and the number of bike rentals made with the \(i^{th}\) pass type at station s in 2019, respectively; I denotes the total number of pass types. Clustering with \({\textbf{x}}_s^1 \) was conducted to examine whether the preference for a specific pass type differed by station location. If the clusters show spatial dependency (i.e., stations in the same cluster tend to be located near each other), geographic factors are likely to influence the use of specific passes.
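The input vector of Eq. (2) can be assembled from raw trip records as sketched below. The record layout `(station_id, pass_type)` and the pass-type labels are assumptions for illustration; the denominator counts all rentals at the station, matching \(f_{s,rent}\).

```python
from collections import Counter, defaultdict


def pass_share_vectors(rentals, pass_types):
    """Eq. (2): per-station proportions of rentals by pass type.

    rentals: iterable of (station_id, pass_type), one entry per rental.
    pass_types: ordered list of the I pass types; fixes the vector order.
    Returns {station_id: [f^p_{s,1}/f_s, ..., f^p_{s,I}/f_s]}.
    """
    counts = defaultdict(Counter)
    for station, pass_type in rentals:
        counts[station][pass_type] += 1
    vectors = {}
    for station, c in counts.items():
        total = sum(c.values())  # f_{s,rent}: all rentals at station s
        vectors[station] = [c[p] / total for p in pass_types]
    return vectors
```

Because the vectors are proportions, stations with very different rental volumes become comparable, so the clustering groups stations by pass-type mix rather than by size.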

Based on the results of the basic statistical analysis, the usage patterns of the 7-day pass were similar to those of the 30-day pass, and the usage patterns of the 180-day pass were similar to those of the 365-day pass. Moreover, for every pass type, the differences between the one- and two-hour rental variants were marginal. Hence, the experimental results are presented for three groups of passes: 1) the one-day pass (denoted “One-day”), 2) short-term season passes, including 7- and 30-day passes (denoted “Season(Short)”), and 3) long-term season passes, including 180- and 365-day passes (denoted “Season(Long)”).
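The grouping can be expressed as a simple mapping. The sketch assumes that each raw record's pass type has already been reduced to its validity period in days; the raw Seoul Bike encoding is not described here, so this input format and the function name are hypothetical. The one- and two-hour variants map to the same group because only the validity period is used.

```python
def pass_group(duration_days: int) -> str:
    """Map a pass validity period (days) to the three groups used in the analysis."""
    if duration_days == 1:
        return "One-day"
    if duration_days in (7, 30):
        return "Season(Short)"
    if duration_days in (180, 365):
        return "Season(Long)"
    raise ValueError(f"unexpected pass duration: {duration_days} days")


print([pass_group(d) for d in (1, 7, 30, 180, 365)])
```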

The second input vector represents the temporal usage pattern of each station. It is defined as the hourly proportion of bike rentals and returns, as follows:

$$\begin{aligned} {\textbf{x}}_s^2 = \left( \frac{f_{s,0,rent}^t}{f_{s,rent}}, \ldots , \frac{f_{s,23,rent}^t}{f_{s,rent}},\frac{f_{s,0,return}^t}{f_{s,return}}, \ldots , \frac{f_{s,23,return}^t}{f_{s,return}}\right) , \end{aligned}$$
(3)

where \(f_{s,h,rent}^t\) and \(f_{s,h,return}^t\) denote the numbers of bike rentals and returns at station s in time interval h, and each \(h\in \{0,\ldots ,23\}\) denotes a one-hour interval of the day, starting at midnight. For example, \(f_{s,0,rent}^t\) and \(f_{s,0,return}^t\) indicate the total numbers of bike rentals and returns at station s between midnight and 1 AM in 2019, respectively. Clustering with \({\textbf{x}}_s^2\) was used to examine whether the preference for a specific pass type was correlated with temporal usage patterns, by comparing the two cluster assignments obtained using \({\textbf{x}}_s^1\) and \({\textbf{x}}_s^2\). If the two cluster assignments were correlated, it was concluded that the preference for a specific pass type is associated with a specific temporal usage pattern.
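The 48-dimensional vector of Eq. (3) can be computed as follows for one station. The inputs are assumed to be the hour of day (0–23) of each rental and each return at that station; for the weekday/weekend split described in the next paragraph, the caller would filter trips by day type before calling this hypothetical helper.

```python
def hourly_profile(rent_hours, return_hours):
    """Eq. (3): hourly proportions of rentals (first 24) and returns (last 24).

    rent_hours / return_hours: hour of day (0-23) of every rental / return
    at one station. Both lists must be non-empty.
    """
    rent = [0] * 24
    ret = [0] * 24
    for h in rent_hours:
        rent[h] += 1
    for h in return_hours:
        ret[h] += 1
    n_rent, n_ret = len(rent_hours), len(return_hours)  # f_{s,rent}, f_{s,return}
    return [c / n_rent for c in rent] + [c / n_ret for c in ret]


profile = hourly_profile([8, 8, 18, 18], [9, 19])
print(len(profile), sum(profile))
```

Each half of the vector sums to 1, so the full vector always sums to 2; a commuter-heavy station would show mass concentrated in the morning and evening peak hours.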

For the clustering analysis, we used k-means clustering, which is simple and popular, and determined the appropriate number of clusters by checking the silhouette coefficients. Because the hourly proportions of bike rentals and returns can be distorted at bike stations with few rentals and returns, stations whose bike rentals or returns in 2019 fell below a specified threshold were excluded from the clustering analysis. In this study, only bike stations with at least 100 rentals and at least 100 returns were used; 1,516 stations were selected, covering nearly all of the 1,521 bike stations operated throughout 2019. Moreover, the temporal usage patterns of stations differed between weekdays (excluding public holidays) and weekends (including public holidays), regardless of the pass type. Therefore, all analyses were conducted separately for weekdays and weekends.
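The station filter and the silhouette-based choice of k can be sketched in plain Python as below. This is an illustration of k-means (Lloyd's algorithm) and the mean silhouette coefficient, not the implementation used in the study. The deterministic farthest-first seeding is an assumption made to keep the example reproducible; libraries such as scikit-learn instead default to k-means++ seeding with several restarts.

```python
import math
import random


def eligible_stations(counts, min_count=100):
    """Keep stations with at least min_count rentals AND min_count returns.

    counts: dict station_id -> (n_rentals, n_returns) for the study year.
    """
    return [s for s, (n_rent, n_ret) in counts.items()
            if n_rent >= min_count and n_ret >= min_count]


def farthest_first_init(points, k):
    """Seed with points[0], then repeatedly add the point farthest from all seeds."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(far))
    return centres


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    centres = farthest_first_init(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centre if a cluster becomes empty.
            new_centres.append([sum(x) / len(members) for x in zip(*members)]
                               if members else centres[c])
        if new_centres == centres:  # assignments have stabilised
            break
        centres = new_centres
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(points):
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

In use, `points` would be the vectors of Eq. (2) or Eq. (3) for the stations returned by `eligible_stations`, computed separately for weekdays and weekends. The silhouette rewards clusters that are compact relative to their distance from the nearest other cluster, so the k with the highest mean score is preferred.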

Learning models to analyze bike usage patterns by type of pass

To quantitatively analyze differences by pass type, this study used two approaches: (1) regression, in which a traffic prediction model for the total daily bike rentals at each station was trained for each pass type to examine how the impact of explanatory factors on bike demand differs by pass type, and (2) classification, in which a classifier was trained to separate stations that differed in the proportion of bike rentals by pass type. Comparing the traffic prediction models revealed differences in the effects of several factors on bike rentals by pass type, and the classification analysis characterized the areas where a specific type of pass was used more often than the others.

Regression

To build a method for predicting daily bike-sharing demand at individual stations, this study considered spatial dependency. As observed in Fig. 5, bike rentals over the course of a year showed spatial dependency regardless of pass type, which intuitively suggests that spatial dependency should be considered in traffic modeling. Moreover, the locations of bike stations with high demand for shared bikes differed by pass type as well as by type of day, which indicates that the impact of the explanatory factors on bike rentals may differ by pass type. Based on the observations in Fig. 5, spatial dependency tests were conducted on linear regression models trained for each type of day and pass; these tests confirmed analytically that spatial lag and error dependency should be considered in traffic modeling.

Accordingly, this study built linear and spatial regression models for each pass type. Because the distributions of daily bike rentals were highly right-skewed, the target variable was defined as the log-transformed number of daily bike rentals at each station:

$$\begin{aligned} y_{s,i,d} = \log _{10} (f_{s,i,d}+1), \end{aligned}$$
(4)

where \(y_{s,i,d}\) denotes the target value of station s on date d for the \(i^{th}\) pass type (\(i\in \) {One-day, Season(Short), Season(Long)}) and \(f_{s,i,d}\) is the total number of bike rentals at station s on date d with the \(i^{th}\) pass type. To test for spatial dependency, linear models were trained using the log-transformed average bike rentals over one year as the target variable and only spatial variables as explanatory variables, ignoring temporal variables. Spatial dependency was then tested for these linear models using Moran’s I and Lagrange multiplier (LM) tests; the test results are presented in Table 4 in the “Regression analysis results” section. After the existence of a spatial lag or error dependency was confirmed, spatial regression was applied to train the traffic prediction models. Spatial regression has been successfully used to build demand prediction models for several BSSs (Shen et al. 2018; Guidon et al. 2019, 2020). Based on the results in Table 4, this study used the spatial autoregressive model with an additional autoregressive error structure (SARAR), which has the following form:

$$\begin{aligned} {\textbf{y}} = \rho {\textbf{W}} {\textbf{y}} + {\textbf{X}}\varvec{\beta } + {\textbf{u}}, {\textbf{u}} = \lambda {\textbf{W}} {\textbf{u}} + \varvec{\epsilon }, \end{aligned}$$
(5)

where \({\textbf{y}}\) denotes the target (dependent) variable, \({\textbf{X}}\) is the matrix of explanatory variables with coefficient vector \(\varvec{\beta }\), \(\varvec{\epsilon }\) is the vector of independent errors, \(\rho \) and \(\lambda \) are the parameters for the spatial lags of the dependent variable and the error term, respectively, and \({\textbf{W}}\) denotes the row-standardized spatial weights matrix. The neighbors of each station used to construct \({\textbf{W}}\) are the stations within 2 km of it. A radius of 2 km was chosen because approximately 70 % of bike trips in 2019 were between bike stations within 2 km of each other, and every station had at least one neighboring station within 2 km.
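The neighbor definition and Eq. (5) can be made concrete with the sketch below. It assumes station coordinates given as (latitude, longitude) pairs and great-circle (haversine) distances; the text does not say how distances were measured, so that choice is an assumption, and the coordinates in the usage lines are hypothetical. `simulate_sarar` is a hypothetical helper that generates a target vector from Eq. (5) for given \(\rho\), \(\lambda\), \({\textbf{X}}\varvec{\beta}\), and \(\varvec{\epsilon}\) by fixed-point iteration, which converges when \(|\rho| < 1\) and \(|\lambda| < 1\) because each row of a row-standardized \({\textbf{W}}\) sums to one. It illustrates the model structure only and is not an estimator.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def row_standardized_weights(coords, radius_km=2.0):
    """W[i][j] = 1 / |N(i)| if station j (j != i) is within radius_km of station i, else 0."""
    W = []
    for i, ci in enumerate(coords):
        row = [1.0 if j != i and haversine_km(ci, cj) <= radius_km else 0.0
               for j, cj in enumerate(coords)]
        total = sum(row)
        # A station without neighbours keeps a zero row (the text reports none at 2 km).
        W.append([w / total for w in row] if total else row)
    return W


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def simulate_sarar(W, xb, eps, rho, lam, n_iter=500):
    """Solve y = rho*W*y + xb + u with u = lam*W*u + eps by fixed-point iteration."""
    u = list(eps)
    for _ in range(n_iter):
        u = [e + lam * wu for e, wu in zip(eps, matvec(W, u))]
    y = [b + ui for b, ui in zip(xb, u)]
    for _ in range(n_iter):
        y = [b + ui + rho * wy for b, ui, wy in zip(xb, u, matvec(W, y))]
    return y


# Hypothetical station coordinates near central Seoul (all within 2 km of each other).
coords = [(37.5665, 126.9780), (37.5700, 126.9820), (37.5750, 126.9900)]
W = row_standardized_weights(coords)
y = simulate_sarar(W, xb=[1.0, 2.0, 3.0], eps=[0.1, -0.2, 0.05], rho=0.4, lam=0.3)
```

With \(\rho = \lambda = 0\), the simulated target reduces to \({\textbf{X}}\varvec{\beta} + \varvec{\epsilon}\), the ordinary linear model; nonzero values let each station's target borrow from its neighbors' targets and errors.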

Similar to clustering analysis, the prediction models for each pass type were trained separately for weekdays (excluding public holidays) and weekends (including public holidays).

In addition to the Akaike information criterion (AIC) and adjusted \(R^2\) (\(R^2_{adj}\)), which were used as goodness-of-fit metrics, three evaluation metrics were adopted to assess prediction accuracy: (1) root mean square error (RMSE), (2) mean absolute error (MAE), and (3) root mean square log error (RMSLE). These metrics have different characteristics. The RMSE is sensitive to outliers, whereas the MAE and RMSLE are relatively robust to them. Unlike the MAE and RMSE, the RMSLE measures the relative error between the predicted and actual values. However, the RMSLE can become exceptionally large because of stations with low daily demand: when the actual value is 1 and the predicted value is 2, for example, the prediction is already twice the actual value. Therefore, to measure the prediction accuracy for steadily used bike stations, the RMSLE was also computed for samples with at least 10 actual daily rentals (\(\hbox {RMSLE}_{\ge 10}\)), in addition to the RMSLE over all samples. Although the prediction models were trained on the log-transformed number of daily bike rentals (\(y_{s,i,d}\)), the predicted value (\({\hat{y}}_{s,i,d}\)) was converted back to the original scale (\({\hat{f}}_{s,i,d}\)) by exponentiation, inverting Eq. (4), and the back-transformed value was used to calculate the RMSE, MAE, and RMSLE.
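The metrics can be written out directly. In this sketch, predictions are first mapped back from the log scale by inverting Eq. (4), i.e., \(\hat{f} = 10^{\hat{y}} - 1\); the RMSLE uses the natural logarithm of (x + 1), the common convention, because the text does not state the base; and `rmsle_at_least` is a hypothetical helper for \(\hbox {RMSLE}_{\ge 10}\) that filters on the actual daily rentals. The numbers are made up.

```python
import math


def to_rentals(y_hat):
    """Invert Eq. (4): y = log10(f + 1)  =>  f = 10**y - 1."""
    return 10 ** y_hat - 1


def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmsle(actual, pred):
    # log1p(x) = ln(1 + x); predictions from to_rentals are always > -1.
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, pred)) / len(actual)
    )


def rmsle_at_least(actual, pred, min_actual=10):
    """RMSLE restricted to samples whose actual daily rentals are >= min_actual."""
    kept = [(a, p) for a, p in zip(actual, pred) if a >= min_actual]
    return rmsle([a for a, _ in kept], [p for _, p in kept])


actual = [1, 12, 30]
pred = [to_rentals(y) for y in (math.log10(3), math.log10(13), math.log10(31))]
# pred is approximately [2, 12, 30]: a single error of one rental on a low-demand day
print(rmse(actual, pred), mae(actual, pred), rmsle(actual, pred), rmsle_at_least(actual, pred))
```

In this toy case, the one low-demand sample drives the full RMSLE to about 0.23, whereas the RMSLE restricted to samples with at least 10 rentals is essentially zero, which is the distortion the filtered metric is meant to remove.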

The basic summary statistics of the target variables for the traffic prediction models are listed in Table 2. On both weekdays and weekends, the mean, Q1, Q2, and Q3 of the targets for long-term season passes were the largest among all pass types, implying that bike rentals with long-term season passes accounted for the largest share of bike rentals. If the targets of different models differ in scale, their estimated coefficients are difficult to compare. Moreover, the RMSE and MAE are not scale-invariant. Therefore, new target variables for “One-day” and “Season(Short)” were calculated by multiplying each by a constant such that its mean equaled that of “Season(Long).”
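The rescaling step amounts to one multiplicative constant per pass type. The sketch below assumes the constant is the ratio of the “Season(Long)” mean to the mean of the series being rescaled, which is the only constant that makes the two means equal. The text does not state whether it was applied to the raw rentals or to the log-transformed values, so the function is agnostic; the series are made-up numbers.

```python
def rescale_to_reference(targets, reference_mean):
    """Multiply a target series by c so that its mean equals reference_mean."""
    c = reference_mean / (sum(targets) / len(targets))
    return [c * t for t in targets], c


# Hypothetical target values for two pass types.
season_long = [8.0, 10.0, 12.0]
one_day = [1.0, 2.0, 3.0]
scaled, c = rescale_to_reference(one_day, sum(season_long) / len(season_long))
print(c, scaled)  # 5.0 [5.0, 10.0, 15.0]
```

Because every target in a series is multiplied by the same constant, the fitted coefficients \(\varvec{\beta}\) and the RMSE and MAE scale by that constant as well, while \(\rho\) and \(\lambda\) in Eq. (5) are unchanged; this is what makes the models for different pass types comparable.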

Table 2 Summary statistics for target variables

Classification

To train a classifier, the class of each sample in the training data is required. The classification analysis in this study aimed to identify the factors that significantly increased or decreased bike rentals with a particular type of pass. Hence, the results of clustering on the proportion of bike rentals by pass type were used as the classifier targets. As with the average hourly proportions of bike rentals and returns, the proportion of bike rentals by pass type can be distorted for bike stations with low bike demand. Hence, to determine the targets for the classification analysis, bike stations with at least 50 rentals were selected; this threshold is smaller than that for the clustering analysis because the number of pass types is much smaller than 24, the number of hours in a day. Although a smaller threshold was used, 1,519 bike stations were selected for the classification analysis regardless of the type of day, similar to the 1,516 stations used for clustering.
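The target construction starts from each station's proportion of rentals by pass type. The sketch below assumes a hypothetical nested dictionary of annual rental counts per station and applies the 50-rental threshold to the station's total rentals across the three pass types; the resulting vectors are the kind of input on which the pass-type clustering, and hence the classification targets, are built.

```python
PASS_TYPES = ("One-day", "Season(Short)", "Season(Long)")


def pass_type_proportions(rentals_by_pass, min_total=50):
    """Map each station with >= min_total rentals to its share of rentals per pass type."""
    proportions = {}
    for station, counts in rentals_by_pass.items():
        total = sum(counts.get(p, 0) for p in PASS_TYPES)
        if total >= min_total:
            proportions[station] = tuple(counts.get(p, 0) / total for p in PASS_TYPES)
    return proportions


example = {
    "A": {"One-day": 10, "Season(Short)": 10, "Season(Long)": 30},
    "B": {"One-day": 5, "Season(Long)": 20},  # only 25 rentals: excluded
}
print(pass_type_proportions(example))  # {'A': (0.2, 0.2, 0.6)}
```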

As in the regression analysis, to account for spatial dependency, a logistic mixed model (LMM) was used instead of logistic regression (LR), which cannot incorporate spatial dependency. The LMM extends LR by incorporating random effects, random errors, or both (Klassen et al. 2014; F. Dormann et al. 2007). Moreover, as in the regression analysis, this study included a spatial lag of the response variable. Because the LMM was applied to all bike stations without random effects, the model used in this study can be formulated as follows:

$$\begin{aligned} \ln \left( \frac{{\textbf{p}}}{{\textbf{1}}-{\textbf{p}}}\right) = \rho {\textbf{W}} {\textbf{y}} + {\textbf{X}}\varvec{\beta } + \varvec{\epsilon } + {\textbf{u}}, \end{aligned}$$
(6)

where \({\textbf{p}} = (p_1,\cdots , p_S)\), \(p_s = \text {Pr}(y_s=1)\), \(y_s\) is the binary target of station s, \(\varvec{\epsilon }\) denotes spatially independent errors, and \({\textbf{u}}\) represents the spatially dependent errors used to incorporate the spatial effect into the LR. For the LMM, the same \({\textbf{W}}\) as in SARAR was used, and the correlation structure for the spatial dependency was defined as a spatial Gaussian structure.
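Two ingredients of Eq. (6) are easy to state in code: the inverse logit that turns the linear predictor into \(p_s\), and the spatial Gaussian correlation between stations. The sketch assumes the common parameterization \(\exp(-(d/r)^2)\) with range \(r\) and no nugget, sets \(\varvec{\epsilon}\) to zero, and takes \({\textbf{u}}\) as given. Fitting the model (estimating \(\varvec{\beta}\), \(\rho\), and the range) would require a mixed-model package and is not shown.

```python
import math


def gaussian_correlation(dist_km, range_km):
    """Spatial Gaussian correlation between two stations at distance dist_km."""
    return math.exp(-((dist_km / range_km) ** 2))


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def station_probabilities(W, y, X, beta, rho, u=None):
    """p_s = logistic(rho * (W y)_s + x_s . beta + u_s), i.e. Eq. (6) with epsilon = 0."""
    n = len(y)
    u = u if u is not None else [0.0] * n
    wy = [sum(w * yj for w, yj in zip(row, y)) for row in W]  # spatial lag of the response
    return [
        logistic(rho * wy[s] + sum(b * x for b, x in zip(beta, X[s])) + u[s])
        for s in range(n)
    ]


# Two neighbouring stations; station 0 is labelled 1 (C2), station 1 is labelled 0 (C1).
W2 = [[0.0, 1.0], [1.0, 0.0]]
print(station_probabilities(W2, y=[1, 0], X=[[1.0], [1.0]], beta=[0.0], rho=2.0))
```

With \(\varvec{\beta} = 0\), station 0 stays at probability 0.5, while station 1's probability rises to about 0.88 purely because its neighbour is labelled 1, which is the effect the spatial lag term \(\rho {\textbf{W}} {\textbf{y}}\) adds to ordinary LR.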

The explanatory variables for the classification models included only the spatial factors describing the surroundings of each bike station, as used in the regression analysis.

Experimental results

Statistical analysis results

Basic characteristics of bike usage patterns

Figure 1 shows the hourly proportion of bike rentals by pass type on weekdays and weekends. On weekdays, the one-day pass differs from the season passes in that it is rarely used during the morning peak. Season passes were used more frequently during the morning and afternoon peaks than at other times. However, the long-term season passes showed little difference in the proportion of bike rentals at the two peak times compared with the short-term season passes. In addition, on weekdays, the proportions of bike rentals for the one-day pass in the evening were higher than those for the other passes.

Fig. 1

Distributions in hourly proportion of bike rentals varying with pass types

Figure 2 shows the hourly share of each pass in bike rentals. On both weekdays and weekends, long-term season passes were the most popular. The shares of long-term season passes in the morning on both weekdays and weekends, and during the afternoon peak on weekdays, were significantly higher than at other times. In other words, long-term season passes appear to be used primarily for commuting. Unlike that of the long-term season passes, the share of the short-term season passes did not change significantly over time. The one-day pass is used more frequently on weekends than on weekdays, and its share is particularly high on public holiday afternoons, which suggests that the one-day pass is used for occasional purposes such as leisure. This interpretation is supported by Fig. 3, which shows the monthly shares of the passes. On weekdays, the share of one-day passes did not vary significantly from month to month. However, on weekends, the shares of the one-day pass in April, May, September, and October, the peak seasons for outdoor activities, were higher than those in other months.

Fig. 2

Hourly shares of the passes in bike rentals

Fig. 3

Monthly shares of the passes in bike rentals

Figure 4 shows the travel time distributions for the different passes. Season pass users usually ride public bikes for less than 20 min, whereas one-day pass users typically ride for 20 to 60 min. Moreover, the proportion of trips exceeding 60 min was highest for one-day passes, and on weekends, this proportion increased further.

Fig. 4

Travel time distributions

Spatial characteristics of bike usage patterns

Next, the spatial characteristics of bike usage patterns by pass type were investigated. Figure 5 shows the normalized distributions of bike rentals by station. The thick horizontal blue line represents the Han River, and the thin blue lines correspond to its tributaries, along which well-constructed bike roads run. To obtain Fig. 5, the total bike rentals at each station by pass type in 2019 were calculated and normalized by their maximum values; i.e., the more bike rentals at a station, the darker red its marker. Regardless of the type of day, bike rentals with one-day passes varied more across stations than those with season passes. However, the long-term season passes showed high rental frequencies at more stations, particularly on weekdays.

Fig. 5

The normalized distributions of bike rentals by station

Figure 6 shows the population and the number of corporations in 500 m \(\times \) 500 m grids. The dashed ellipses denote the following representative business districts: (1) the Central Business District (CBD), including the Gwanghwamun area; (2) the Gangnam Business District (GBD); (3) the Yeouido Business District; and (4) the Seoul Digital Industrial Complex. Compared with other regions, these business districts have small populations and large numbers of businesses. Comparing Figs. 5 and 6 shows that stations with high frequencies of bike rentals by long-term season passes, especially on weekdays, are located in business districts. However, this does not hold for the GBD, where bike rentals were relatively low even for the other passes. This may be due to the many hills in this district.

Fig. 6

The population and the number of corporations in grids

Finally, a clustering analysis was conducted to investigate the relationships between the locations and usage patterns of bike stations. Specifically, k-means clustering was performed twice using two different inputs for individual bike stations: (1) the proportion of bike rentals by pass type (\({\textbf{x}}_s^1\) defined by Eq. (2)), and (2) temporal usage patterns (\({\textbf{x}}_s^2\) defined by Eq. (3)). For clustering, 1,516 bike stations were used regardless of the day type. For both inputs, the silhouette coefficients were highest when the number of clusters was two, and they decreased significantly when the number of clusters increased to three, on both weekdays and weekends. Hence, the number of clusters was set to two in all cases. The cluster labels were aligned across the two clustering results so that the pair of clusters sharing more stations received the same label. In other words, C1 obtained by the first clustering analysis overlapped more with C1 than with C2 obtained by the second clustering analysis.
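With two clusters, the label alignment described above reduces to a single comparison: keep the second labelling if it agrees with the first on at least half of the stations, and swap C1 and C2 otherwise. The sketch assumes both assignments are lists of "C1"/"C2" strings over the same stations in the same order; the example labels are made up. The agreement rate it computes is the same kind of quantity reported below as roughly 62 % on weekdays and 67 % on weekends.

```python
def align_two_clusters(reference, other):
    """Return `other` relabelled so that it overlaps the reference assignment as much as possible."""
    same = sum(r == o for r, o in zip(reference, other))
    if same >= len(reference) - same:
        return list(other)
    swap = {"C1": "C2", "C2": "C1"}
    return [swap[o] for o in other]


def agreement(reference, other):
    """Share of stations given the same cluster label by both assignments."""
    return sum(r == o for r, o in zip(reference, other)) / len(reference)


by_pass_type = ["C1", "C1", "C2", "C2"]
by_temporal = ["C2", "C2", "C1", "C2"]  # raw labels from a second k-means run
aligned = align_two_clusters(by_pass_type, by_temporal)
print(aligned, agreement(by_pass_type, aligned))  # ['C1', 'C1', 'C2', 'C1'] 0.75
```

For k > 2, the same idea generalizes to finding the label permutation that maximizes overlap (an assignment problem), but a single swap test suffices for the two-cluster case used here.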

Figure 7 shows the two cluster centers obtained when the proportion vectors of bike rentals by pass type were used as the clustering input. On both weekdays and weekends, the proportion of one-day pass use was higher than the overall average in C1, whereas the proportion of long-term season pass use was higher than the overall average in C2. However, the difference between C1 and C2 was greater on weekends than on weekdays. In addition, the proportions of short-term season passes in C1 and C2 were similar. On weekdays, 770 and 746 stations were assigned to C1 and C2, respectively; on weekends, 491 and 1,025 stations were assigned to C1 and C2, respectively. This result indicates that users rented shared bikes with long-term season passes at more bike stations on weekends than on weekdays, which implies that the travel patterns of long-term season pass holders were more diverse on weekends than on weekdays. This might be because these users frequently rent shared bikes for commuting on weekdays.

Fig. 7

Cluster centers: Proportion of bike rentals depending on the type of pass

Figure 8 shows the two cluster centers (solid lines) obtained when temporal usage-pattern vectors were used as the clustering input; the shaded areas represent the standard deviation of each cluster. On weekdays, C1 corresponds to stations with more rentals than returns during the morning peak and more returns than rentals during the afternoon peak, whereas C2 shows the opposite characteristics. On weekends, the difference between the two clusters was less pronounced than on weekdays. Compared with C2, C1 shows sharp afternoon peaks for both rentals and returns, whereas C2 shows an evening peak in bike returns. Compared with the cluster assignment based on the proportion of bike rentals by pass type, the clustering based on temporal usage patterns produced a larger imbalance in cluster sizes on weekdays. On weekdays, 1,183 and 333 stations were assigned to C1 and C2, respectively; on weekends, 1,023 and 493 stations were assigned to C1 and C2, respectively. Regardless of the type of day, more than 60 % of the bike stations were assigned to the same cluster by the two clusterings based on different inputs (approximately 62 % on weekdays and 67 % on weekends). Considering that temporal usage patterns vary with the trip purpose, this result supports the idea that users select passes according to their main purpose for using shared bikes.

The temporal usage patterns of the two clusters for each pass type are provided in Section S1 of the Supplementary Material. In summary, when temporal usage pattern vectors were clustered separately for each pass type, the cluster centers of one-day passes differed significantly from those of season passes on weekdays, but the difference was marginal on weekends. However, most bike stations were assigned to the same cluster for all pass types.

Fig. 8

Cluster centers: Temporal usage pattern

Table 3 Summary of clustering analysis

Table 3 summarizes the characteristics of C1 and C2 under the two clustering results. In terms of the proportion of bike rentals by pass type, C1 comprises bike stations with a high proportion of one-day pass use, whereas C2 comprises bike stations with a high proportion of long-term season pass use. In terms of temporal usage patterns on weekdays, C1 comprises stations with many bike rentals during the morning peak hours and many bike returns during the evening peak hours, whereas C2 comprises stations with many bike rentals during the evening peak hours and many bike returns during the morning peak hours. On weekends, C1 shows sharper afternoon peaks for bike rentals and returns, whereas C2 shows broader afternoon peaks for bike rentals and an evening peak for bike returns.

Fig. 9

Location of bike stations by clusters: Proportion of bike rentals depending on the type of pass

Fig. 10

Location of bike stations by clusters: Temporal usage patterns

Figures 9 and 10 show the locations of bike stations by cluster assignment based on the proportion of bike rentals by pass type and on temporal usage patterns, respectively. The station locations according to the matching of the two clustering results are provided in Section S1 of the Supplementary Material. On weekdays, bike stations in C2 are mostly located in the business districts of Seoul, which may be because long-term season passes are mainly used for commuting. On weekends, the bike stations in C2 were mainly located in typical residential areas of Seoul. In addition, bike stations in some business districts belong to different clusters depending on the type of day. Based on the temporal usage patterns, bike stations in the CBD and the Yeouido Business District were assigned to C2 on weekdays and to C1 on weekends. Based on the proportion of bike rentals by pass type, bike stations in the Yeouido Business District were also assigned to C2 on weekdays and to C1 on weekends. Given that C1 in the pass-type clustering is the cluster with a higher proportion of one-day passes than season passes, and that C1 in the temporal clustering corresponds to the curve with the sharper peaks in Fig. 8b, the higher use of one-day passes in some business districts on weekends might be due to a high floating population rather than a high residential population. Moreover, on weekends, the average distance to the Han River or its tributaries (where bike roads are well constructed in Seoul) was shorter for C1 than for C2, regardless of the input type. Thus, Seoul Bike users seemed to rent shared bikes for leisure purposes, particularly at bike stations in C1.

Regression analysis results

Table 4 summarizes the results of the tests for spatial dependency in the linear regression models; the parameters of the linear regression models are provided in the Appendix. For both weekdays and weekends, the test results confirmed that spatial lag and error dependency should be considered.

Tables 5 and 6 present the results of the prediction models trained using SARAR on weekdays and weekends, respectively. They list the estimated parameters (“Coef.”) and standard errors (“Std. Err.”), together with the AIC and adjusted \(R^2\) of each model. In terms of AIC and \(R_{adj}^2\), the spatial regression models outperformed the corresponding linear regression models. The spatial lag parameter \(\rho \) and the parameter \(\lambda \) for spatial autocorrelation in the residuals were both significant in all SARAR models.

Table 4 Tests for spatial dependency for the linear regression models
Table 5 Parameter estimates of traffic prediction models using SARAR: Weekdays
Table 6 Parameter estimates of traffic prediction models using SARAR: Weekends

Comparing the absolute values of the estimated coefficients of the linear and spatial regression models shows that, regardless of the pass type, the linear regression coefficients were overestimated, which may have resulted from omitting the spatial lag. This overestimation was stronger in the models for season passes than in those for one-day passes, and in the models for weekdays than in those for weekends.

As expected, high humidity, heavy precipitation, and a high temperature-humidity index negatively influenced bike rental demand in all models. On both weekdays and weekends, one-day pass rentals were more strongly influenced by weather conditions than season pass rentals. The coefficients of the wind speed variable were positive on weekdays, particularly for the season passes. Because these coefficients are small compared with those of the other weather variables, it is more reasonable to interpret season pass use as being less affected by wind speed than as increasing on windy days.

In terms of POIs, several POI categories significantly affected the demand for shared bikes. Except for schools and bus stops, bike rentals usually increased as the number of POIs in the neighborhood of a bike station increased. In Seoul, over 99 % of the bike stations are located within 400 m of a bus stop. One reason for using shared bikes is to access or egress from public transit stations. However, the bus network in Seoul is denser than its subway network, which implies that the first/last-mile problem is much less significant in the bus system than in the subway system in Seoul. Tourist attractions and parks increased one-day pass rentals significantly more than season pass rentals. Universities positively influenced bike rentals on weekdays, but they negatively or insignificantly influenced bike rentals on weekends. This could be because students typically visit their universities on weekdays.

As the resident and floating populations increased, the use of shared bikes increased for all types of passes based on linear regression models. However, according to the SARAR models, an increase in the floating population did not increase bike rentals.

The elevation of bike stations and the shortest distance to bike roads negatively influenced bike rentals in all models. Their impact was greatest for the one-day pass, particularly on weekends. Regarding the distribution of other stations, bike rentals decreased as the area density of bike stations within 1 km increased, regardless of the pass type and modeling algorithm, on both weekdays and weekends. Because 1 km is a walkable distance, the demand for shared bikes was dispersed to other bike stations when many other bike stations existed within this range. By contrast, bike rentals with season passes increased with the area density of bike stations at distances of 1 to 3 km, whereas one-day pass rentals on weekends increased most when many other bike stations existed at distances of 3 to 5 km. This is consistent with the longer travel times observed for one-day passes, particularly on weekends.
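The distance-band variables can be illustrated as follows. The text does not define “area density” precisely, so this sketch assumes it is the number of other stations in a ring around a station divided by the ring's area in km²; `band_density` is a hypothetical helper, and the distances are made-up values.

```python
import math


def band_density(dists_km, inner_km, outer_km):
    """Other stations per km^2 in the ring inner_km < d <= outer_km around a station."""
    count = sum(1 for d in dists_km if inner_km < d <= outer_km)
    return count / (math.pi * (outer_km ** 2 - inner_km ** 2))


# Distances (km) from one station to the other stations in the system.
dists = [0.5, 0.8, 2.0, 4.0, 6.0]
for inner, outer in [(0, 1), (1, 3), (3, 5)]:
    print(f"{inner}-{outer} km: {band_density(dists, inner, outer):.3f} stations/km^2")
```

Because the outer rings cover much larger areas, a single station at 3 to 5 km adds far less density than one within 1 km; normalizing by area keeps the bands comparable as regression inputs.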

In summary, the regression results confirm that one-day passes are also used for leisure as part of outdoor activities. Unlike commuting, bike rentals for leisure are optional and irregular; thus, they are more strongly influenced by weather conditions. Moreover, bike rentals with one-day passes were concentrated at stations that were easily accessible and well suited to riding.

For both weekdays and weekends, the AIC was highest for the one-day pass models and lowest for the long-term season pass models, whereas \(R_{adj}^2\) showed the opposite pattern. This implies that the models for season passes fit better than those for the one-day pass, which was expected because the demand for bike rentals with season passes was more stable than that with one-day passes. Comparing day types, \(R_{adj}^2\) of the one-day pass model was higher on weekends, whereas that of the long-term season pass model was higher on weekdays. Therefore, long-term season pass use is more stable on weekdays and one-day pass use is more stable on weekends, possibly because long-term season passes are usually used for commuting and the one-day pass is frequently used for leisure.

Additionally, the prediction accuracy of the models was compared using the RMSE, MAE, and RMSLE of the predicted values; the results are summarized in Table 7. In Table 7, the one-day pass models perform worse than the other models for all metrics. In addition, the spatial regression models achieved better prediction performance than the linear regression models for all evaluation metrics, and they improved \(\hbox {RMSLE}_{\ge 10}\) dramatically. Finally, the higher RMSE and MAE values on weekdays may result from the larger number of bike rentals on weekdays than on weekends. However, for the one-day pass, the scale-invariant RMSLE values are lower on weekends than on weekdays, implying that temporary demand for public bikes fluctuates more on weekdays than on weekends.

Table 7 Evaluation results of traffic prediction models
Table 8 Evaluation results depending on the type of error

Table 8 summarizes the evaluation results by error type. In this table, “Under” and “Over” represent the MAE of the under- and over-estimated samples, respectively; “Under/Over” is the ratio of the MAE for under-estimated samples to that for over-estimated samples; and “Ratio of Over” is the proportion of over-estimated samples. Consistent with Table 7, the MAE of both the under- and over-estimated samples was greatest for the one-day pass. In particular, the MAE of under-estimated samples exceeded that of over-estimated samples, and their ratio was largest for the one-day pass, even though the proportion of under-estimated samples was smallest for the one-day pass. Hence, Table 8 suggests that bike rentals with one-day passes may increase suddenly and dramatically, which is difficult to predict.
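The quantities in Table 8 can be computed from paired actual and predicted values. In this sketch, samples predicted exactly are counted in neither group but remain in the denominator of “Ratio of Over”; the text does not say how such ties were handled, so that is an assumption, as are the example numbers.

```python
def error_breakdown(actual, pred):
    """MAE of under-/over-estimated samples, their ratio, and the share of over-estimates."""
    under = [a - p for a, p in zip(actual, pred) if p < a]
    over = [p - a for a, p in zip(actual, pred) if p > a]
    mae_under = sum(under) / len(under) if under else 0.0
    mae_over = sum(over) / len(over) if over else 0.0
    return {
        "Under": mae_under,
        "Over": mae_over,
        "Under/Over": mae_under / mae_over if mae_over else float("inf"),
        "Ratio of Over": len(over) / len(actual),
    }


# Frequent small over-estimates and one large under-estimate (a sudden demand spike).
print(error_breakdown(actual=[10, 10, 10, 40], pred=[11, 12, 11, 20]))
```

The toy input reproduces the pattern described for the one-day pass: over-estimates are more frequent (“Ratio of Over” of 0.75), but the single under-estimate is much larger, so “Under/Over” is well above 1.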

Classification analysis results

Table 9 shows the results of the classification analysis used to identify the factors influencing the shares of the different pass types. As mentioned in the “Learning models to analyze bike usage patterns by type of pass” section, 1,519 bike stations were used for the classification analysis. The target was defined as 1 if a station belonged to C2 in the cluster assignment based on the proportion of bike rentals by pass type (i.e., the cluster with a high proportion of bike rentals using long-term season passes) and 0 otherwise. To recap, stations in C1 had a higher-than-average share of one-day pass rentals, and stations in C2 had a higher-than-average share of long-term season pass rentals. On weekdays, 747 (49.2 %) bike stations had a target value of 1, compared with 1,028 (67.7 %) on weekends. Considering that the proportion of bike rentals with one-day passes was greater on weekends than on weekdays at the system level, the large size of C2 on weekends implies that the use of one-day passes increased significantly, mainly at certain bike stations.

Table 9 Parameter estimates of LMMs

On weekdays, many spatial variables did not significantly distinguish the clusters with different proportions of bike rentals by long-term season passes. However, as the proportion of restaurants and cafés in the vicinity of a bike station increased, the probability of bike rentals with long-term season passes at that station increased. This might be because restaurants and cafés are generally plentiful in business districts with many workers. The positive coefficient of the variable representing the ratio of the floating population to the population also implies that long-term season passes are used more frequently in business districts. In addition, the coefficient of the variable representing the number of subway stations near the bike station is negative on weekdays. From this result, it can be inferred that long-term season pass users used shared bikes to reach subway stations beyond walking distance when commuting. In other words, long-term season pass users tend to ride shared bikes as a feeder mode for the subway on weekdays, which is consistent with previous studies on the integration of bike-sharing and public transit (Ma et al. 2018; Kong et al. 2020). Moreover, on weekdays, bike stations with a high proportion of long-term season pass rentals were, on average, farther from the nearest subway station than those with a low proportion.

In contrast, on weekends, more spatial variables significantly influenced the proportion of bike rentals with long-term season passes at individual bike stations than on weekdays. On weekends, bike stations where more shared bikes were rented with one-day passes were located near subway stations, parks, and tourist attractions. In addition, the coefficient of the variable representing the shortest distance to bike roads was positive on weekends but negative on weekdays. This result indicates that one-day pass use on weekends was high at bike stations close to bike roads. In other words, on weekends, areas near parks and bike roads were more popular for bike rentals with occasional travel purposes such as leisure. Unlike on weekdays, the variable representing the ratio of the floating population to the population was insignificant on weekends, which might be because long-term season pass holders also rent shared bikes for various purposes, such as shopping and leisure, on weekends.

Conclusion

Understanding bike usage patterns and their explanatory factors is beneficial for improving public bike utilization efficiency and turnover. The main contributions of this study are summarized as follows.

First, this study revealed that spatiotemporal usage patterns differed depending on pass type. Long-term season passes of more than 180 days were mainly used for transportation (in particular, commuting), whereas one-day or short-term season passes seemed to be used more for leisure than for other purposes. The intended purposes of bike rentals seemed to cause differences in usage patterns and in the variability of demand over time and space. Consequently, each pass type showed differences in the time of day, day of the week, and locations at which it was frequently used. In addition, the distribution of travel time differed by pass type. The clustering analysis showed that different types of passes had different temporal usage patterns and that, even for the same type of pass, distinct temporal patterns emerged depending on the locations of bike stations. However, the factors that determined a particular temporal pattern did not differ significantly from one pass type to another.

Second, this study determined the different influences of the factors on shared bike demands depending on the type of pass, using various machine-learning methodologies. This study used two spatial models for regression and classification. The SARAR models were trained to predict the daily demand for bike stations at each station on weekdays and weekends, and the LMM models were trained to distinguish bike stations where the use of long-term season passes was higher than the others. Statistical tests for spatial dependency showed that spatial lag and error dependency should be considered in modeling, and the trained spatial regression models showed a better model fit than the linear regression models. According to the estimated parameters for all regression models, the one-day pass was more influenced by weather conditions than the long-term passes, especially on weekdays, and the difference in the impact of different factors on bike rentals was larger on weekdays than on weekends; this could be because long-term pass users also rent public bikes for leisure. Bike rentals by one-day passes increased when the density of bike stations was high in an area of three to five km from bikes stations on weekends. This was reasonable because the user’s travel time for a bike rented using a one-day pass was longer than that for other passes, particularly on weekends. According to both the regression and classification models, the use of one-day passes was high at stations near parks, bike roads, and tourist attractions on weekends. Therefore, considering these results, rather than installing bike stations evenly throughout Seoul, expanding bike stations in the vicinity of parks where bike roads are well-equipped and connection to public transit is convenient, makes it more suitable for people to use shared bikes for leisure purposes in conjunction with public transit on weekends. 
By monitoring the status of bike stations, particularly at the aforementioned locations, operators may be able to proactively address bike imbalances over weekends. Moreover, the influence of some factors differed by pass type; if these differences are not reflected in demand prediction, prediction accuracy could decrease, which would hinder efficient bike rebalancing.
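For readers less familiar with the SARAR specification, the general form below shows how spatial lag and spatial error dependency enter a single model. This is the standard SARAR(1,1) structure, not the study's estimated equation; the notation (\(\mathbf{y}\) for station-level daily rentals, \(\mathbf{X}\) for the explanatory factors, \(W\) and \(M\) for spatial weight matrices, \(\rho\) and \(\lambda\) for the lag and error parameters) is assumed for illustration.

\[
\begin{aligned}
\mathbf{y} &= \rho W \mathbf{y} + \mathbf{X}\boldsymbol{\beta} + \mathbf{u}, \\
\mathbf{u} &= \lambda M \mathbf{u} + \boldsymbol{\varepsilon}, \qquad \varepsilon_i \ \text{independent with variance } \sigma^2, \\
\Longrightarrow\quad \mathbf{y} &= (I - \rho W)^{-1}\left[\mathbf{X}\boldsymbol{\beta} + (I - \lambda M)^{-1}\boldsymbol{\varepsilon}\right].
\end{aligned}
\]

Setting \(\lambda = 0\) gives the spatial lag model, setting \(\rho = 0\) gives the spatial error model, and setting both to zero recovers ordinary linear regression, the baseline against which the spatial models were compared.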

In terms of model fit evaluated using RMSE, MAE, and RMSLE, the regression models for the one-day pass performed worse than those for the season passes, although the \(R_{adj}^2\) of the one-day pass models was the highest on weekends. The low prediction accuracy of the one-day pass models may result from the high variability in one-day pass use. This study showed that demand from one-day passes was more variable than that from other passes; therefore, an enhanced prediction model for one-day passes should be developed to accurately predict bike rentals at the station level. In particular, at stations where the proportion of rentals using one-day passes is high, such as stations close to the Han River and its tributaries, demand for public bikes fluctuated significantly. Hence, the number of bikes at such stations should be managed more carefully than at other stations, because situations in which users cannot rent a bike may occur more frequently there. Moreover, over-estimations occurred more frequently than under-estimations, but the magnitude of the errors was, on average, greater for under-estimations than for over-estimations, particularly for one-day passes. This means that on some days the demand for shared bikes increased sharply and the regression models failed to predict it, particularly for one-day passes. Such sudden increases in demand were usually due to external factors, such as weather conditions and holidays, and were observed at certain bike stations. Therefore, understanding when and where extreme demand occurs will help BSS operators manage their systems and reduce bike imbalances.
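The evaluation metrics and the asymmetry between over- and under-estimations described above can be computed as in the following sketch. The function name `error_summary` and the numbers are hypothetical toy values, not data from this study. The sign convention treats a positive error (prediction minus observation) as an over-estimation, and RMSLE uses \(\log(1+x)\), so it assumes non-negative rental counts.

```python
import math
from typing import Dict, Sequence


def error_summary(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, and RMSLE, plus counts and mean sizes of over- and under-estimations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # > 0 means over-estimation
    over = [e for e in errors if e > 0]
    under = [-e for e in errors if e < 0]  # stored as positive magnitudes
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        # log1p(x) = log(1 + x); valid only for non-negative counts
        "rmsle": math.sqrt(
            sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)) / n
        ),
        "n_over": len(over),
        "n_under": len(under),
        "mean_over": sum(over) / len(over) if over else 0.0,
        "mean_under": sum(under) / len(under) if under else 0.0,
    }


# Toy daily rentals at one station: three mild over-predictions, one missed demand spike.
observed = [10, 12, 11, 60]
forecast = [12, 14, 13, 30]
summary = error_summary(observed, forecast)
```

In this toy case there are three over-estimations of 2 bikes each and a single under-estimation of 30 bikes, the same qualitative pattern reported for one-day passes. MAE is 9 while RMSE is about 15.1, because squaring weights the single large miss heavily; reporting both alongside the over/under breakdown makes such demand spikes visible.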

This study had several limitations that should be addressed in future studies. First, because this study aimed to identify differences in usage patterns by pass type, linear regression algorithms were used to determine the relationships between bike demand and explanatory factors. However, more advanced machine-learning algorithms should be adopted to predict demand accurately. Second, for the clustering based on the proportions of bike rentals by pass type, the proportions were calculated from full-year data divided only into weekdays and weekends. Considering seasonal variations could reveal more detailed usage patterns for each pass type. Finally, the Seoul Bike data did not provide user ID information for each trip record; therefore, the usage patterns of Seoul Bike could not be analyzed in depth for each pass type. If user ID information becomes available, differences in usage patterns among users holding the same pass type can be characterized.