1 Introduction

In recent years, the possibility of geo-referenced data collection has opened doors for understanding human behaviour in urban contexts, such as pedestrian mobility (Mizzi et al. 2018), taxi services (Rodrigues et al. 2018) and bike and scooter-sharing mobility (Zhu et al. 2020). In particular, bike-sharing services offer new opportunities for researchers to study human mobility by analysing spatial patterns (Mooney et al. 2019) and temporal patterns (Liu et al. 2018) of bike usage or by studying the effects of weather on bike-sharing services (Nosal & Miranda-Moreno 2014).

Local governments promote cycling as an eco-friendly means of local transportation in tourist and historical cities because of its many health and environmental benefits, including reduced noise and air pollution, a lower need for new parking lots and improved public health. However, identifying the transport demand of cyclists and quantifying road utilisation remain the main challenges in improving infrastructure (e.g., bike lanes and racks) and managing traffic.

Like many other cities, Bologna, a historical and university city in Italy, struggles to manage urban traffic and its side effects, such as CO2 emissions. In 2017, the Public Transport Authority SRM Reti e Mobilità Srl promoted the six-month Bella Mossa initiative, which encouraged eco-friendly means of transportation with the aim of reducing day-to-day reliance on single-occupancy car journeys. The program rewarded users with points (to be converted into prizes later) for walking, cycling and using public transport. Through a mobile application, users selected the type of activity they were about to perform, and the application then tracked and recorded their positions via the smartphone's GPS. In this study, we focused on cycling because this means of transportation covers the city more widely (e.g., short and long trips, small and large streets).

To understand cycling mobility, we analysed bike usage through the 320,118 self-reported unique bicycle trips recorded in the city of Bologna within the Bella Mossa dataset. These trips span a six-month period, from April 2017 to September 2017. In particular, the study covers the following aspects:

  1. Temporal analysis: We evaluated how cycling mobility changes over days, weeks and months, and computed the average speed, length and travel time of the trips to understand the temporal characteristics of cycling mobility.

  2. Spatial analysis: We reconstructed the utilisation map of the city's road network, highlighting popular roads and the most frequented locations within the city. Identifying the points of interest and the routes used made it possible to study how people spread out from the main locations identified.

  3. External variables: To the best of our knowledge, studies in the literature that analyse bike usage have focused on a single aspect, either the impact of weather conditions (Caulfield et al. 2017) or temporal patterns (Liu et al. 2018). In contrast, we considered several variables that can affect cycling mobility, such as weather, pollution, seasonality, holidays and events.

  4. Predictive analysis: We used machine learning and deep learning approaches to forecast the number of trips in the near future, that is, in the next i) 10 minutes, ii) 30 minutes and iii) 60 minutes, to study daily bicycle traffic trends that could help in traffic management and infrastructure deployment.

Our work aims to study bike mobility in a historical and tourist city comprehensively, by addressing the factors that may influence bike usage and by exploring predictive models that can support decision-makers in urban traffic management.

The paper is structured as follows. Section 2 outlines the current state of knowledge within which this study falls. Section 3 describes the datasets in detail. In Section 4, we discuss the results of the descriptive analysis of the cycling dataset and of the other datasets about weather, pollution and events. Section 5 presents the experimental results on short-term mobility forecasting, exploring predictive models that can support decision-makers in urban traffic management. Finally, Section 6 presents concluding remarks and future work.

2 Related works

In this section, we examine works that have studied bike mobility. First, we discuss papers that have investigated the behaviour of bike users (Section 2.1). In particular, we found that several aspects that may affect bike usage (e.g., temperature, precipitation and the characteristics of the city) have been studied separately, using dock-based bike-sharing systems. This study aims to provide a holistic picture of cycling mobility in a tourist and historical city using self-reported bike trips. We also examine papers in which the authors performed prediction analysis for different tasks, such as distributing bikes across the city for easier usage or predicting traffic flow (Section 2.2). In this case, we found that the main studies focused on predicting the use of bike-sharing services to better organise the redistribution of bikes. In contrast, we aim to predict bike use within the city's road network.

2.1 Analysing bike usage

Numerous studies have investigated the effect of weather on bicycle use. For example, Caulfield et al. (2017) studied how weather conditions affect bike usage in the city of Cork, Ireland: trips are shorter when it rains, and longer trips are made on sunny days. Nosal & Miranda-Moreno (2014) found that precipitation in a single hour can significantly affect the number of bike trips. They also noted that cycling on weekends is more affected by weather than cycling on weekdays. El-Assi et al. (2017) studied how weather affects bike-sharing demand in Toronto, Canada. Their findings indicated a significant correlation between air temperature and bike usage. The authors also investigated how the built environment affects bike demand, concluding that bike infrastructure plays a major role in increasing the popularity of bike usage. Nankervis (1999) found that both short-term (e.g., daily temperature) and long-term (e.g., seasonal conditions) weather significantly affect bike usage. Similarly, Sears et al. (2012) discussed how seasonal factors affect bicycle commuting, and their results confirmed a high correlation between weather and bike usage.

Other studies focus on the impact of infrastructure on bike usage. For example, Liu et al. (2018) investigated which road attributes influence bike users' path choices. In Fishman et al. (2015), researchers in Melbourne and Brisbane used a questionnaire to quantify the factors influencing bike-share membership. The results showed that the distance to the closest docking station is highly correlated with membership, and the authors indicated that this result confirms prior research (Fuller et al. 2011; Molina-García et al. 2015).

Other works analysed bike usage across different parts of cities. For example, Froehlich et al. (2009) investigated the behaviour of bike users across different locations and times of day in Barcelona's neighbourhoods. Zhao et al. (2015) explored bike-sharing travel times and trips by gender and day of the week. Their results suggested that demand for bike usage is generated in residential districts, while the biggest hubs are train stations. The paper also showed large differences in trip distance and duration between men and women. Additionally, the authors reported that women use bikes more on weekdays, while men, on average, use them more on weekends. In another paper, researchers examined bike usage in the city of Lyon, France, focusing on how social behaviour can help in planning and designing transportation policies (Borgnat et al. 2011).

In this work, we analysed how different indicators, such as i) air pollution, ii) weather conditions and iii) temperature, affect bike usage. In addition, we analysed the spatial and temporal dynamics of bicycle usage in order to provide a holistic picture of cycling mobility.

2.2 Predictions using bike data

The second line of research concerns predicting bike usage with various machine learning algorithms. Xu et al. (2013) analysed operational data from bike-sharing systems to understand activity patterns and to predict bicycle traffic flow. In particular, the authors used a hybrid model combining clustering with a support vector machine, which improved the error rate of the best model by 31%. Li et al. (2015) employed a hierarchical prediction model consisting of bipartite clustering and a Gradient Boosting Regression Tree to predict the number of bikes rented from each station in order to meet demand. The proposed model reduced the prediction error for anomalous hours. However, clustering models are particularly suited to data from dock-based bike-sharing systems, where the path is not always available and origins and destinations are fixed locations. For instance, Zhang et al. (2016) predicted the destination and arrival time of each bicycle trip, which can help companies move bikes in time to under-supplied stations. In particular, using Multiple Additive Regression Trees based on stochastic gradient boosting, the authors achieved an accuracy of 87% in predicting the destination, and a Mean Absolute Error of 7.39 and an \(R^2\) value of 0.059 in predicting trip duration.

In some recent works, deep learning approaches have been used to forecast bike demand. Some of these studies experimented with large-scale datasets and predicted demand for different time intervals using only historical bike usage data (Ai et al. 2019; Xu et al. 2018). Their results suggested that the Long Short-Term Memory (LSTM) model achieves reasonably good prediction accuracy for different time intervals, with an average Mean Absolute Percentage Error of 32.98, and that the best results were obtained for 30-minute interval prediction. Similar performance was confirmed by Zhang et al. (2018), who used an LSTM model to predict the number of trips by considering public transport usage. To predict cyclists' choice between on-street and off-street facilities, Duc-Nghiem et al. (2018) developed a logistic regression model, which showed a good fit with a Hosmer-Lemeshow value of 12.23. In Singhvi et al. (2015), taxi usage was explored alongside weather data to predict bike trips in the city of New York, obtaining an \(R^2\) value of 0.746 with a simple linear regression model. In this work, we predict the number of trips by considering not only historical bike usage data but also air temperature, precipitation, holidays and events.
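The studies above compare forecasts through MAE, MAPE and \(R^2\). For reference, here is a minimal NumPy sketch of these standard definitions as they might be applied to forecast trip counts; the function names are ours and not part of any library. MAPE is returned in percent and is undefined when an observed count is zero, which can occur in short intervals with no trips.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |y - y_hat|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        # Intervals with zero observed trips make the ratio undefined.
        raise ValueError("MAPE is undefined when an observed value is zero")
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For observed counts [120, 80, 100, 60] and forecasts [110, 90, 100, 66], `mae` gives 6.5 trips and `r2` gives 0.882.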

A number of studies have also investigated dockless bike-sharing systems (Luo et al. 2019; McKenzie 2018; Shaheen & Cohen 2019), studying the aspects that influence their life cycle and their characteristic spatial and temporal patterns as a basis for developing predictive models. In particular, the main emphasis is on analysing the spatial patterns of such systems: without stations, bicycles can be left anywhere in the city, which raises the question of how to redistribute them, particularly on weekdays (Mooney et al. 2019; Gu et al. 2019; Shen et al. 2018). These papers have focused mainly on the problem of redistributing bikes across different locations.

3 Datasets

This section of the paper provides information about the datasets we used for the analysis. In particular, we describe the Bella Mossa 2017 dataset about self-reported bike usage and the other supplementary data sources (weather, pollution, events, etc.) that were used for our analysis.

3.1 Bella Mossa 2017 dataset

The main dataset used in the paper contains mobility data for different means of transportation over a six-month period, from April 1, 2017 to September 30, 2017. Bella Mossa was an initiative promoting a healthy lifestyle and sustainable mobility among users residing in Bologna, Italy. In 2017, the city had 388,367 residents, of whom 243,864 were aged between 15 and 64, while the student population consisted of 85,244 individuals, about half of whom were off-site students. The Bella Mossa initiative gave users a chance to win various gifts and discounts as a reward for using more sustainable and healthy means of transportation. To participate, a user only needed to download the mobile application and start it whenever they went for a walk or used bikes, trains, buses or carpooling. The running application tracked and stored the user's position using the smartphone's GPS, with an accuracy of about 5 metres. During the six months of the program, over 15,000 unique users registered, covering 3.7 million km over 895,000 journeys. For privacy reasons, the dataset does not include users' personal information; thus, we could not distinguish and analyse patterns by users' gender or age. Moreover, the data is anonymised: each trip in the dataset is identified by a unique identification number (ID), but this ID does not allow users to be identified across different days.

In the paper, we focused on bike usage, which generated 72,398,780 data points in 320,118 unique trips over the six months. Each record of the dataset has the following attributes: the activity ID (identifying the points of the same trip), the timestamp, the geographic coordinates (i.e., latitude and longitude), the GPS accuracy and the speed. We grouped the data points by activity ID and timestamp. Missing values were replaced by interpolating between the previous and next values within the same trip. We used trip-level information to analyse the factors affecting the number of trips, and raw data points to reveal which streets and parts of the city were most used by bike users. In particular, we analysed bike usage patterns across different periods, weather conditions and social events. We also used these features to forecast the number of trips in the short term (that is, in the next 10-, 30- and 60-minute time intervals) to understand how busy specific streets would be in terms of bike usage.
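The grouping and gap-filling step described above could look like the following pandas sketch. It assumes hypothetical column names (`activity_id`, `timestamp`, `lat`, `lon`, `speed`) and linear interpolation over consecutive points, treating them as evenly spaced; the text does not state which interpolation scheme was used.

```python
import pandas as pd


def clean_points(points: pd.DataFrame) -> pd.DataFrame:
    """Order GPS points within each trip and fill interior gaps by linear interpolation.

    Column names (activity_id, timestamp, lat, lon, speed) are hypothetical.
    """
    points = points.sort_values(["activity_id", "timestamp"]).reset_index(drop=True)
    value_cols = ["lat", "lon", "speed"]
    # Interpolate trip by trip so values never leak from one trip into another;
    # limit_area="inside" only fills gaps that have a valid value on both sides.
    points[value_cols] = points.groupby("activity_id")[value_cols].transform(
        lambda s: s.interpolate(method="linear", limit_area="inside")
    )
    return points
```

Interpolating per trip, rather than over the whole table, prevents the last point of one trip from being used to fill a gap in the next one; gaps at the start or end of a trip are left missing.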

3.2 Supplementary data sources

Besides the main dataset, we used several supplementary datasets for analysing bike usage and predicting the number of bike trips.

  • 1) Weather: We downloaded historical data about temperature, precipitation and wind for the observation period for the city of Bologna from the website of the Regional Agency for Prevention, Environment and Energy of Emilia-Romagna, Italy. The data comes from a single weather station called “Bologna Urbana”, located near the city centre (latitude 44.501219, longitude 11.329968), which covers the whole city. Samples are collected every 60 minutes and contain the hourly average air temperature at 2 meters above the ground, the hourly average wind speed at 10 meters above the ground and the cumulative precipitation over each one-hour period.

  • 2) Pollution: We obtained pollution data from the website of the Ministry of Economic Development in Italy. Based on the 2005 World Health Organisation (WHO) Air Quality Guidelines, we analysed three main air pollution indicators: i) particulate matter (PM), ii) ozone (O3) and iii) nitrogen dioxide (NO2). Hourly samples come from three stations, “Porta San Felice” (latitude 44.498984, longitude 11.327387), “Via Chiarini” (latitude 44.499151, longitude 11.285007) and “Giardini Margherita” (latitude 44.482668, longitude 11.354143), which together cover the whole city.

  • 3) Holidays and events: We used publicly available information about holidays in Italy and public events in the city of Bologna. The dataset contains information about public holidays, national celebration days and civil solemnities in the city of Bologna. We also gathered information about strikes and protests in Bologna during the observation period.
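To combine these sources with the bike data, trips can be counted per forecasting interval (10, 30 or 60 minutes) and joined with the hourly weather samples and a holiday flag. The sketch below is one possible way to do this with pandas, under assumptions of our own: trips are counted by start time, each hourly weather sample is carried forward over the finer intervals, and all names (`build_features`, `temp_c`, `trips_lag1`) are hypothetical.

```python
import pandas as pd


def build_features(trip_starts, weather: pd.DataFrame, holidays: set,
                   freq: str = "10min") -> pd.DataFrame:
    """Count trip starts per interval and attach weather, calendar and lag features.

    trip_starts: start timestamps, one per trip.
    weather: hourly samples indexed by timestamp (e.g. temperature, precipitation).
    holidays: set of datetime.date objects.
    All names here are hypothetical.
    """
    starts = pd.DatetimeIndex(trip_starts).sort_values()
    counts = pd.Series(1, index=starts).resample(freq).sum().rename("trips").to_frame()
    # Weather is hourly: carry each hourly sample forward over the finer intervals.
    counts = counts.join(weather.sort_index().reindex(counts.index, method="ffill"))
    counts["holiday"] = [ts.date() in holidays for ts in counts.index]
    counts["weekday"] = counts.index.dayofweek
    # Count in the previous interval, a simple autoregressive feature.
    counts["trips_lag1"] = counts["trips"].shift(1)
    return counts
```

Calling it with `freq="30min"` or `freq="60min"` yields the coarser horizons; intervals without trips appear as explicit zeros rather than missing rows.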

3.3 Software

The descriptive analysis was conducted using the Python language (version 3.8.13) and the libraries Pandas (version 1.4.2), NumPy (version 1.22.3), SciPy (version 1.7.3, module “optimize” for curve fitting), TensorFlow (version 2.0) and Scikit-learn (version 1.0.2, module “metrics” for evaluation). The figures were plotted using Tableau (version 2020.1), QGIS (version 3.2) and the Python packages Matplotlib (version 3.5.1) and Seaborn (version 0.11.2). We used the Spyder IDE version 5.1.5 on a macOS (version 12.5) system with an M1 chip and 16GB of RAM.

4 Descriptive analysis

In this section, we present the results of the descriptive analysis of the Bella Mossa bike data. First, we present the results of the temporal analysis, looking into daily, monthly and seasonal trends of bike usage (Section 4.1). Then, we show the results of the spatial analysis to highlight which parts of the city attract most bike trips (Section 4.2). Next, we present the results obtained by combining the cycling mobility data with the supplementary data sources. In particular, we discuss how weather conditions, such as changes in air temperature and precipitation, correlate with bike usage (Section 4.3). We also investigate how three different indicators of air pollution change with bike usage (Section 4.4). Finally, we analyse public holidays and events, such as strikes and protests, in relation to bike usage (Section 4.5).

4.1 Temporal analysis

Firstly, for the whole observation period, we computed the i) statistical distribution of distances, ii) statistical distribution of travel times, and iii) average speeds of the trips, and we derived the distribution laws that theoretically describe the experimental data. We also analysed both daily and weekly usage of bikes, which helped us in understanding long-term trends, such as monthly and seasonal trends. In particular, the seasonality study allowed us to analyse the change in mobility of bicycle users during spring, summer, and the start of autumn.
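As an illustration of how per-trip distance, travel time and average speed can be derived from the GPS points, here is a sketch that sums great-circle (haversine) distances between consecutive points of each trip. The haversine formula, the mean Earth radius of 6,371 km and the column names are our assumptions; the paper does not specify how distances were computed.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between coordinates given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def trip_metrics(points: pd.DataFrame) -> pd.DataFrame:
    """Per-trip length (m), travel time (min) and average speed (m/s)."""
    points = points.sort_values(["activity_id", "timestamp"])
    # Previous point of the same trip (NaN for the first point of each trip).
    prev = points.groupby("activity_id")[["lat", "lon"]].shift(1)
    step = haversine_m(prev["lat"], prev["lon"], points["lat"], points["lon"])
    points = points.assign(step_m=step.fillna(0.0))
    agg = points.groupby("activity_id").agg(
        length_m=("step_m", "sum"),
        start=("timestamp", "min"),
        end=("timestamp", "max"),
    )
    seconds = (agg["end"] - agg["start"]).dt.total_seconds()
    agg["travel_min"] = seconds / 60.0
    # Zero-duration trips get NaN speed instead of a division by zero.
    agg["speed_ms"] = agg["length_m"] / seconds.where(seconds > 0)
    return agg.drop(columns=["start", "end"])
```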

The blue histograms in Fig. 1a and b show the distributions of the covered distances and travel times, with peaks at around 1600 meters and 12 minutes, respectively. This highlights a characteristic of bike usage in Bologna: 40% of the trips cover less than 1.6 kilometres in 12-13 minutes of travel time. In addition, both distances and travel times can be described by the same curved power law function:

$$ y = x^{\alpha + \beta x} \tag{1} $$

where x represents the distance or the travel time, y is the number of trips, and the estimated parameters \(\alpha\) and \(\beta\) are (\(1.41\,\pm \,7.04\times 10^{-3}\)) and (\(-8.40\times 10^{-5}\,\pm \,2.36\times 10^{-6}\)) for the distance covered, and (\(1.65\,\pm \,3.70\times 10^{-3}\)) and (\(-2.99\times 10^{-4}\,\pm \,3.50\times 10^{-6}\)) for the travel time, with R\(^2\) values of 0.94 and 0.98, respectively. Figure 1c shows the histogram of the average travel speeds, with a peak at around 3.2-3.3 meters per second, which is compatible with the speeds usually reported in the experimental literature (Dozza & Werneke 2014; Menghini et al. 2010). The travel speed has been fitted using two different distributions, split at the identified speed peak. Up to 3.2 meters per second, the travel speed follows a linear distribution (Eq. 2), with \(\alpha\) and \(\beta\) estimated as (\(4.32\times 10^{3}\,\pm \,83.04\)) and (\(-2.95\times 10^{2}\,\pm \,1.59\times 10^{2}\)), respectively, with an R\(^2\) value of 0.99.

$$ y = \alpha x + \beta \tag{2} $$

For higher travel speeds, the distribution is best approximated by a Gaussian function (Eq. 3), with \(\alpha\), \(\beta\) and \(\gamma\) estimated as (\(1.48\times 10^{4} \,\pm \, 2.71\times 10^{2}\)), (\(2.72\,\pm \,5.25\times 10^{-2}\)) and (\(1.20\,\pm \,2.54\times 10^{-2}\)), respectively, with an R\(^2\) value of 1.00.

$$ y = \alpha e^{-\frac{(x-\beta )^{2}}{2\gamma ^{2}}} \tag{3} $$

In Eqs. 2 and 3, x and y represent the travel speed and the number of trips, respectively. The double trend of the travel speed highlights a characteristic of bike usage: the decay after the peak speed suggests that, in the urban context, bike speed is limited by the environment and by other road users.
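The fits of Eqs. 1-3 can be reproduced in spirit with `scipy.optimize.curve_fit`, from the SciPy module listed in Section 3.3. The sketch below is a minimal example under stated assumptions: the initial guesses are ours, the Gaussian fit is started at the 3.2 m/s split, and we assume the ± values reported above are standard errors taken from the diagonal of the estimated covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit


def curved_power_law(x, alpha, beta):
    """Eq. (1): y = x ** (alpha + beta * x)."""
    return np.power(x, alpha + beta * x)


def linear(x, alpha, beta):
    """Eq. (2): y = alpha * x + beta."""
    return alpha * x + beta


def gaussian(x, alpha, beta, gamma):
    """Eq. (3): y = alpha * exp(-(x - beta)^2 / (2 gamma^2))."""
    return alpha * np.exp(-((x - beta) ** 2) / (2 * gamma ** 2))


def fit_model(model, x, y, p0=None):
    """Least-squares fit; returns parameters, standard errors and R^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    params, cov = curve_fit(model, x, y, p0=p0)
    std_err = np.sqrt(np.diag(cov))
    residuals = y - model(x, *params)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return params, std_err, r2


def fit_speed(speed, counts, split=3.2):
    """Fit Eq. (2) up to `split` m/s and Eq. (3) above it."""
    speed, counts = np.asarray(speed, float), np.asarray(counts, float)
    low = speed <= split
    lin = fit_model(linear, speed[low], counts[low])
    gauss = fit_model(gaussian, speed[~low], counts[~low],
                      p0=(counts[~low].max(), split, 1.0))
    return lin, gauss
```

For the distance histogram, one would call `fit_model(curved_power_law, bin_centres, counts, p0=(1.5, -1e-4))`, where `bin_centres` and `counts` are hypothetical arrays built from the histogram in Fig. 1a.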

Fig. 1

The distributions and the theoretical laws that describe the covered distances (a), the travel times (b) and the average travel speeds (c)

Figure 2 shows the trips aggregated by day of the week. The fact that 84% of all trips were made on working days suggests that bikes are heavily used on weekdays, possibly for commuting from home to work or study. This interpretation is supported by Fig. 3: on working days, most trips took place between 07:00 and 09:00 and between 17:00 and 19:00, the times when people typically travel to work or study and return home in the evening.
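The weekday share and the hourly profiles behind Figs. 2 and 3 can be sketched with pandas as follows, assuming a Series of trip start times (the function names are hypothetical). Note that this simple version treats public holidays falling on weekdays as working days.

```python
import pandas as pd


def weekday_share(trip_starts: pd.Series) -> float:
    """Fraction of trips starting Monday-Friday (dayofweek 0-4)."""
    starts = pd.to_datetime(trip_starts)
    return float((starts.dt.dayofweek < 5).mean())


def hourly_profile(trip_starts: pd.Series) -> pd.DataFrame:
    """Trips per hour of day, with one column for weekdays and one for weekends."""
    starts = pd.to_datetime(trip_starts)
    day_type = (starts.dt.dayofweek < 5).map({True: "weekday", False: "weekend"})
    return (starts.groupby([day_type, starts.dt.hour]).size()
            .unstack(level=0, fill_value=0))
```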

Fig. 2

Number of trips aggregated by days of the week

Fig. 3

Weekends vs weekdays: the start and end times of work represent two peaks during the week

Figure 4 shows the number of trips over the six months. The number of trips peaked in May (25% of all trips) and then gradually declined, reaching its lowest level in August. In particular, trips dropped by 40% from July to August, and in August only 30% as many trips as in May were made. A plausible explanation is that during the summer more and more people leave the city for vacations, and off-site students return to their hometowns. In September, the number of trips increased again by 28% compared with August, possibly due to students and other residents returning from holidays.
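Monthly totals, each month's share of all trips and the month-over-month changes quoted above can be computed as in this sketch (function name hypothetical). For example, a drop from 100 to 60 trips is reported as -40%.

```python
import pandas as pd


def monthly_change(trip_starts: pd.Series) -> pd.DataFrame:
    """Trips per calendar month, share of the total (%) and month-over-month change (%)."""
    starts = pd.to_datetime(trip_starts)
    counts = starts.dt.to_period("M").value_counts().sort_index()
    return pd.DataFrame({
        "trips": counts,
        "share_pct": 100 * counts / counts.sum(),
        # First month has no predecessor, so its change is NaN.
        "mom_change_pct": 100 * counts.pct_change(),
    })
```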

Fig. 4

Seasonality of cycling mobility

4.2 Spatial analysis

In this section, we discuss the paths used by bike users and the potential attractive hubs of the city. Bologna's historical city centre covers an area of 4.5 km\(^2\), outlined by the red polygon in Fig. 5a. Several characteristics of the city help in understanding urban mobility. In particular, the train station is located in the north of the city (marked with a light blue “T” in Fig. 5a), just outside the historic centre (marked with a purple “C” in Fig. 5a), and it is one of the main nodes of the national railway network. Most departments of the University of Bologna are located within the historic centre; however, the engineering department is in the southwest part of the city (marked with a green “U” in Fig. 5a). There are two big hospitals, in the east and west of the city (marked with a red “H” in Fig. 5a). During the day, a large number of people move to and from the city for work and study.

Fig. 5

Bologna map (a), bike road network in May (b) and bike road network in August (c)

4.2.1 Bike road network

In this section, we identify the parts of the road network most often used by bike users (based on the dataset). Figures 5b and c show the streets most used by bike users in May 2017 and August 2017, using a density-based colour gradient. For each month, we normalised the data before plotting. Yellow roads were the most used, while red ones were used by fewer users. Although the overall volume of bike usage clearly differs between the two months, the relative density of the streets is similar, and the most used paths correspond to the main arteries of the centre's road network. The road network in Bologna has a radial structure and, since the bicycle is used for medium-long trips, it is quite common to cross the city passing through the centre.
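The text does not specify how the data were normalised. One plausible approach, sketched below, counts GPS points per cell of a regular latitude/longitude grid (a simple stand-in for matching points to street segments) and divides each count by the busiest cell of the same month, so that May and August share a 0-1 scale. The grid size and column names are hypothetical.

```python
import numpy as np
import pandas as pd


def normalised_density(points: pd.DataFrame, cell_deg: float = 0.001) -> pd.DataFrame:
    """Count GPS points per grid cell and month, scaled by the monthly maximum.

    A regular lat/lon grid stands in for street segments; the column names
    (timestamp, lat, lon) and the cell size are hypothetical.
    """
    cells = pd.DataFrame({
        "month": pd.to_datetime(points["timestamp"]).dt.to_period("M"),
        "row": np.floor(points["lat"] / cell_deg).astype(int),
        "col": np.floor(points["lon"] / cell_deg).astype(int),
    })
    counts = cells.groupby(["month", "row", "col"]).size().rename("points").reset_index()
    # Divide by the busiest cell of the same month so that months with very
    # different trip volumes can be drawn on the same 0-1 colour scale.
    counts["density"] = counts["points"] / counts.groupby("month")["points"].transform("max")
    return counts
```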

4.2.2 Main hubs

The analysis of the trips allowed us to identify three main hubs of the city with a high number of trip start or end points. The first is Piazza Maggiore (the main square) and its neighbouring streets, one of the main centres of city life. The other two are the train station and the engineering department of the University of Bologna. To understand how bike trips spread out from these three hubs, we examined, for each hub, whether bike usage changed significantly between May (chosen as representative of the non-holiday period) and August (holiday period). In Figs. 6, 7 and 8, locations are aggregated by the number of trips that started from the main hub (the green circle in each figure) and ended at the final destination (the ranked red dots). The circle size indicates the number of trips ending at that destination. Because the number of rides decreased substantially between May and August, marker sizes cannot be compared directly across images; the circle sizes are therefore relative to the trips of the month represented. We also ranked these locations by numbering the circles.

  1. The spreading pattern from Piazza Maggiore (in the city centre) changed considerably between May and August. May is considered a “working month”, and the main destinations were outside the city centre (Fig. 6a). This suggests that people were probably returning home after spending time in the city centre (possibly from workplaces and offices there); indeed, most of the trips ended in residential areas of the city. In contrast, during August, a “vacation month”, most trips ended inside the historical city centre (Fig. 6b).

  2. Similar to the findings of Zhao et al. (2015), we find that many people commute for work or study and use bikes to reach the train station or to ride from the train station to their place of work or study. In May, the four main destinations were various departments of the university (Fig. 7a), while in August (Fig. 7b) the final destinations changed and most trips ended at a large shopping mall just outside the city centre.

  3. For the engineering department, it is interesting to note how the pattern changed from May to August. In May, when lessons are still in progress, a significant number of trips ended in places where students spend their non-university time, such as student residences and pubs (Fig. 8a). In August, the two most popular destinations were near parks (Fig. 8b). In addition, the number of trips dropped sharply in August because of the holiday period.
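The hub-based aggregation in Figs. 6-8 could be approximated as follows: select trips whose start point lies within a given radius of the hub, bin their end points into grid cells, rank the cells by number of trips and scale marker sizes relative to the month's total. The 300 m radius, the grid size and the column names are hypothetical choices, not parameters reported in the paper.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (inputs in degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def rank_destinations(trips: pd.DataFrame, hub_lat: float, hub_lon: float,
                      radius_m: float = 300.0, cell_deg: float = 0.002,
                      top: int = 10) -> pd.DataFrame:
    """Rank destination cells of trips starting within radius_m of a hub.

    trips holds one row per trip of a single month, with hypothetical columns
    start_lat, start_lon, end_lat, end_lon.
    """
    dist = haversine_m(trips["start_lat"], trips["start_lon"], hub_lat, hub_lon)
    from_hub = trips[dist <= radius_m]
    cells = pd.DataFrame({
        "row": np.floor(from_hub["end_lat"] / cell_deg).astype(int),
        "col": np.floor(from_hub["end_lon"] / cell_deg).astype(int),
    })
    # value_counts sorts cells by number of trips, most frequent first.
    ranked = cells.value_counts().rename("trips").reset_index()
    ranked["rank"] = np.arange(1, len(ranked) + 1)
    # Marker size relative to this month's trips from the hub.
    ranked["relative_size"] = ranked["trips"] / ranked["trips"].sum()
    return ranked.head(top)
```

Running it separately on May and August trips mirrors the per-month scaling of the circle sizes described above.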

Fig. 6

Spreading out pattern from the Piazza Maggiore hub, difference between May (a) and August (b)

Fig. 7

Spreading out pattern from the train station hub, difference between May (a) and August (b)

Fig. 8

Spreading out pattern from the engineering department hub, difference between May (a) and August (b)

4.3 Weather’s effects on bike usage

In this section, we discuss how weather conditions, that is, air temperature, precipitation and wind speed, affect the number of bike trips over the six-month period.

4.3.1 Air temperature

Spring in Bologna is characterised by comfortable temperatures (an average of 18 degrees Celsius in March, April and May, with about 70% humidity on average), while the summer is hot and sultry (an average of 30 degrees Celsius in June, July and August, with about 66% humidity on average). Comfortable weather conditions usually play a significant role in bike usage, and when the air temperature is too high or too low, people prefer other means of transport (Nosal & Miranda-Moreno 2014). Figure 9 shows the number of bike trips and the temperature trend over the six months. Temperatures in April and May, around 20 degrees Celsius, are comfortable for cycling, while from mid-June to the end of August, when the average air temperature reaches 27 degrees Celsius or more, the number of trips starts to drop. In summary, our data indicate that bike trips depend strongly on air temperature, in line with the findings of many previous studies (Nosal & Miranda-Moreno 2014; Nankervis 1999).
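Because the relationship described here is not monotonic (trips fall when it is too hot as well as too cold), a single linear correlation coefficient can understate it. A simple alternative, sketched below, is to compare mean daily trips across temperature bands; apart from the 27 °C threshold mentioned above, the band edges and column names are illustrative assumptions.

```python
import pandas as pd


def trips_by_temperature_band(daily: pd.DataFrame,
                              edges=(0, 15, 20, 25, 27, 40)) -> pd.DataFrame:
    """Mean daily trips and number of days per band of daily mean air temperature (°C).

    daily has hypothetical columns temp_c and trips; bands are left-closed,
    e.g. [27, 40), and empty bands are dropped.
    """
    bands = pd.cut(daily["temp_c"], bins=list(edges), right=False)
    return daily.groupby(bands, observed=True)["trips"].agg(["mean", "count"])
```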

Fig. 9

Average daily air temperatures in Celsius and the number of bike trips

4.3.2 Precipitation

During the six-month observation period, most days were dry, with no rain or snow. Only some days, from May 08 to May 14 and from September 04 to September 10, had some precipitation, while the number of rainy days in the other months was very low. We therefore examined the weeks with the most precipitation and compared each of them with the following week. In particular, we compared the week from May 08 to May 14 with the week from May 15 to May 21 (Fig. 10a), and the week from September 04 to September 10 with the week from September 11 to September 17 (Fig. 10b).

Fig. 10

Precipitation amount and number of trips during different weeks in May (a) and September (b)

From May 08 to May 14, there were two rainy days, Tuesday and Friday (Fig. 10a). On Tuesday, the number of trips was less than half that of the following week's Tuesday, whereas on Friday the difference was small. Note that most trips were made in the morning hours, as shown in Fig. 3. The hourly data show that on May 09 heavy rain fell from 2 am to 9 am, strongly affecting the number of trips, while on May 12 it rained for only about one hour, around 3 pm, which did not significantly affect the number of trips.

The comparison between the two weeks of September shows the same behaviour (Fig. 10b). From September 04 to September 10, there were two rainy days, Saturday and Sunday. On Saturday, the number of trips was unchanged compared with the following Saturday, since it rained only from 8 pm to 10 pm, while on Sunday the rain, which fell from 2 am to 11 am, halved the number of trips.
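The week-against-following-week comparison used in this section can be expressed as a small pandas helper (hypothetical name) that lines up each day with the same weekday seven days later and reports their ratio; a ratio of about 0.5 corresponds to the halving observed on the rainy Sunday.

```python
import pandas as pd


def compare_weeks(daily_trips: pd.Series, week_start) -> pd.DataFrame:
    """Pair each day of a week with the same weekday of the following week.

    daily_trips: trip counts indexed by date (one row per day).
    """
    days = pd.date_range(pd.Timestamp(week_start), periods=7, freq="D")
    this_week = daily_trips.reindex(days).to_numpy(dtype=float)
    next_week = daily_trips.reindex(days + pd.Timedelta(days=7)).to_numpy(dtype=float)
    return pd.DataFrame({
        "weekday": days.day_name(),
        "trips": this_week,
        "trips_next_week": next_week,
        # Ratio < 1 means fewer trips than on the same weekday a week later.
        "ratio": this_week / next_week,
    }, index=days)
```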

4.3.3 Wind speed

Finally, we analysed how wind speed affects bike usage. Figure 11 shows the number of trips and the average wind speed over the six months. The results show no correlation between wind speed and the number of bike trips during the observation period. This absence of correlation can likely be explained by the lack of strong winds in Bologna: wind speeds remained comfortable throughout the six months, as summarised in Table 1.

Fig. 11

Average wind speed and the number of bike trips

Table 1 Wind speed statistics (m/s) during the six-month period

4.4 Pollution and bike usage

Because cycling is one of the most ecological means of transportation, it is often suggested that bike usage may reduce air pollution (Hertel et al. 2008; Johansson et al. 2017). We analysed the changes in three air pollution indicators during the observation period: (i) particulate matter (PM), (ii) ozone (O3), and (iii) nitrogen dioxide (NO2). Table 2 summarises the statistics of these three indicators during the study period. However, we could not identify any positive or negative correlation between them and bike usage. We also analysed whether the indicators changed on individual streets heavily used by cyclists. Again, we found no correlation between air pollution and bike use during the observation period. Possible reasons are the small number of air pollution observations and the lack of car usage data.
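The street-level check can be sketched as follows. The sketch assumes that trip counts and one pollution indicator (e.g. PM, O3 or NO2) are stored per street as dictionaries keyed by timestamp. Pollution is measured far less often than trips, so only timestamps present in both series are paired. Streets with too few paired observations are reported as `None` rather than given an unreliable coefficient. The threshold `min_pairs`, the street names and all values are hypothetical. The Pearson helper is redefined so the block runs on its own.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def per_street_correlation(trips, pollution, min_pairs=10):
    """Correlate trips with a pollution indicator, street by street.

    trips, pollution: dict street -> dict timestamp -> value.
    Returns dict street -> (r, n_pairs), or None when fewer than
    `min_pairs` timestamps appear in both series.
    """
    results = {}
    for street, trip_series in trips.items():
        poll_series = pollution.get(street, {})
        common = sorted(set(trip_series) & set(poll_series))
        if len(common) < min_pairs:
            results[street] = None
            continue
        xs = [trip_series[t] for t in common]
        ys = [poll_series[t] for t in common]
        results[street] = (pearson(xs, ys), len(common))
    return results

# Hypothetical data keyed by hour index; pollution is sparser than trips.
trips = {"street_A": {0: 10, 1: 20, 2: 30, 3: 40}, "street_B": {0: 5, 1: 6}}
pollution = {"street_A": {1: 25.0, 2: 20.0, 3: 15.0}, "street_B": {1: 30.0}}
by_street = per_street_correlation(trips, pollution, min_pairs=3)
```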

Table 2 Pollution statistics in micrograms per cubic metre of air (µg/m³) during the six-month period

4.5 Holidays and events

In this last section, we discuss how holidays, national celebrations, and events such as protests and strikes affect bike usage. There were 14 public holidays during the observation period. Figure 12 shows the bike trips aggregated by day of the week for April (Fig. 12a) and August (Fig. 12b), the months where the changes in bike usage were most evident. In particular, the results show a significant difference during the Easter holidays (April 16-17, 2017) and during FerragostoFootnote 10 (i.e., August 15, 2017).

Figure 12a shows the drops in the number of trips during the Easter holidays:

- 35% on Good Friday;
- over 50% on Holy Saturday and Easter Sunday (the orange bars in the figure);
- almost 80% on Easter Monday (the green bar).

Figure 12b shows that on Ferragosto in August 2017 (the green bar in the figure), the number of trips dropped by 60% compared with the corresponding days one week before and one week after.
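The percentage drops quoted above compare a holiday with reference days. The paper does not spell out how the two reference weeks are combined. This minimal sketch assumes the reference is the mean of the same weekday one week before and one week after. The helper name and the trip counts in the example are hypothetical.

```python
from datetime import date, timedelta

def holiday_drop(daily_trips, day):
    """Percentage drop in trips on `day` relative to a reference level.

    Assumption: the reference is the mean of the same weekday one week
    before and one week after `day`.
    """
    before = daily_trips[day - timedelta(days=7)]
    after = daily_trips[day + timedelta(days=7)]
    reference = (before + after) / 2
    return 100.0 * (1 - daily_trips[day] / reference)

# Hypothetical counts around Ferragosto (Tuesday, August 15, 2017).
trips = {date(2017, 8, 8): 780, date(2017, 8, 15): 320, date(2017, 8, 22): 820}
drop = holiday_drop(trips, date(2017, 8, 15))  # about 60
```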

Fig. 12

Number of bike trips aggregated by day of the week for each week, for April (a) and August (b). Holidays are marked with a blue star

Next, we analysed how events such as protests, strikes, and large gatherings in the city affect bike usage. The effect could go either way:

- during protests or strikes, several main roads may be diverted, blocked, or closed, which may increase the number of bike trips, since the bike is a more agile means of transportation;
- a strike may also decrease bike usage, as people may decide not to go to work.

Analysing the protests in Bologna during the six-month period, we did not notice any significant change in bike usage. It is worth mentioning that most protests and strikes were small and lasted no more than a few hours. Moreover, although we found no information on road closures, the number of bike trips did not change significantly during these events.

5 Predictive analysis

In this section, we present the results of the predictive analysis, which aims to predict the number of bike trips in the next 10, 30 and 60 minutes.

5.1 Predictive algorithms and metrics

We employed four methods for the predictive task: Linear Regression, Random Forest, Extreme Gradient Boosting (XGBoost), and LSTM. The choice of these four models is motivated as follows:

- LSTM has been shown in the literature to perform particularly well in regression tasks related to mobility (Ai et al. 2019);
- Linear Regression is a commonly used model with the additional advantage that its coefficients are immediately interpretable (Zhang et al. 2018);
- tree-based ensembles, namely Random Forest and XGBoost, have been successfully exploited in other studies on urban mobility with both dockless and dock-based bike-sharing systems (Li et al. 2015; Xu et al. 2018).

The prediction models are evaluated with several metrics that capture the performance of the predictive algorithms. Given the regression nature of the prediction, we used the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R\(^2\) metrics.
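For reference, the four metrics follow their standard definitions:

- MAE is the mean absolute error;
- MSE is the mean squared error;
- RMSE is the square root of MSE;
- R² is one minus the ratio of the residual sum of squares to the total sum of squares.

The dependency-free sketch below implements these definitions. The function name is hypothetical, and the code assumes the observed values are not all identical, since R² is undefined in that case.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - ss_res / ss_tot}

# Hypothetical observed and predicted trip counts for four time slots.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])
```

Here the errors are -2, 2, -3 and 3. This gives MAE = 2.5, MSE = 6.5, RMSE ≈ 2.55 and R² = 1 - 26/500 = 0.948.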

5.2 Feature selection

We selected the features for the prediction manually:

- temperature;
- precipitation;
- hour_of_the_day;
- month;
- season (spring, summer, and autumn);
- day_of_week (Monday, Tuesday, ..., Sunday);
- holiday (whether a day is a holiday or not);
- hour_history and week_history, which represent the number of trips one hour and one week before, respectively.

Based on the results in Sections 4.3.3 and 4.4, we excluded wind speed and air pollution from the final evaluation.
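The two history features can be built as lagged copies of the trip-count series. The paper does not specify this detail further, so the sketch below makes an assumption: hour_history is the count in the slot exactly one hour earlier, and week_history is the count in the slot exactly one week earlier, for slots of equal length. The function name is hypothetical. For 10-, 30- and 60-minute slots, the hour lag is 6, 2 and 1 slots, and the week lag is 1008, 336 and 168 slots.

```python
def add_history_features(counts, slot_minutes):
    """Pair each slot's trip count with hour_history and week_history.

    counts: trip counts for consecutive slots of `slot_minutes` minutes.
    Returns (count, hour_history, week_history) tuples. The first week of
    slots is dropped because its week_history lag does not exist.
    """
    if 60 % slot_minutes != 0:
        raise ValueError("slot length must divide 60 minutes")
    hour_lag = 60 // slot_minutes      # slots per hour
    week_lag = 7 * 24 * hour_lag       # slots per week
    return [
        (counts[i], counts[i - hour_lag], counts[i - week_lag])
        for i in range(week_lag, len(counts))
    ]

# Hypothetical series of 400 half-hour slots whose count equals the slot index.
rows = add_history_features(list(range(400)), 30)
first = rows[0]  # (336, 334, 0)
```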

5.3 Experimental setup

To predict bike usage in the next 10, 30 and 60 minutes, we prepared three datasets by aggregating the number of trips into 10-minute, 30-minute and 60-minute time slots, respectively. In all datasets, we converted categorical variables into dummy variables. We computed the correlation of each variable with the number of trips. The number of trips is most strongly correlated with hour_history, week_history, temperature, month, and season (i.e., spring, summer, and autumn). We used four training/test splits: 90/10, 80/20, 70/30, and 60/40. For each split, we applied 10-fold cross-validation to the training set and evaluated the resulting model on the test data.
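The setup can be sketched in three steps:

1. count trips per fixed-length slot;
2. split the resulting series into training and test parts;
3. generate 10 folds over the training part.

The sketch makes two assumptions. First, the split is chronological, which is consistent with the test set covering the final 18 days (September 13 to 30, 2017) shown in Fig. 13. Second, the folds are contiguous and unshuffled. All function names and timestamps are hypothetical. Dummy-variable encoding is illustrated with a simple `one_hot` helper.

```python
from datetime import datetime, timedelta

def aggregate_trips(start_times, slot_minutes, origin):
    """Count trips per slot of `slot_minutes` minutes, counted from `origin`.

    Assumes every start time is at or after `origin`. Index k of the
    returned list holds the number of trips starting in slot k.
    """
    slot = timedelta(minutes=slot_minutes)
    n_slots = (max(start_times) - origin) // slot + 1  # timedelta // timedelta -> int
    counts = [0] * n_slots
    for t in start_times:
        counts[(t - origin) // slot] += 1
    return counts

def one_hot(value, categories):
    """Dummy-encode a categorical value, e.g. day_of_week."""
    return [1 if value == c else 0 for c in categories]

def chronological_split(rows, train_ratio):
    """Split time-ordered rows without shuffling (0.9 gives a 90/10 split)."""
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def kfold_indices(n, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # first n % k folds get one extra
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

# Hypothetical example: five trips aggregated into 30-minute slots.
origin = datetime(2017, 4, 1)
starts = [origin + timedelta(minutes=m) for m in (5, 12, 31, 65, 70)]
counts = aggregate_trips(starts, 30, origin)  # [2, 1, 2]
train, test = chronological_split(list(range(100)), 0.9)
folds = list(kfold_indices(len(train), k=10))
```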

5.4 Prediction results

Among all prediction models, LSTM gave the best results for the 60-minute interval, which is the most challenging prediction given that travel times peak at around 720 seconds (12 minutes), as shown in Fig. 1b. Table 3 provides the results of the 60-minute interval prediction for the 90/10 split. On average, the LSTM model improves the four metrics by 70.91% compared with the second-best model, Random Forest.

Given this superior performance, Table 4 reports the LSTM results for different split ratios for the 60-minute, 30-minute and 10-minute interval predictions. The best results were achieved with the 90/10 split. The results improve considerably as the prediction interval shrinks:

- the 30-minute prediction improves the results on average by a factor of 5 compared with the 60-minute prediction;
- the 10-minute prediction improves them on average by a factor of 3.5 compared with the 30-minute prediction.

However, decision-makers need predictions over a wider time span, and travel times are distributed as shown in Fig. 1b. We therefore consider the 30-minute prediction the most interesting and useful, despite its higher error compared with the 10-minute prediction. This choice is also supported by previous literature (Xu et al. 2018).

The hour_history feature proved important: including it improved the model by 30% on average in terms of MSE, MAE and RMSE. The week_history feature improved the results by around 45%. At a finer level, Fig. 13 shows the LSTM predictions over the 18 days of the test set (September 13, 2017 to September 30, 2017). Each time step in the graph is 30 minutes, and the predicted trend (the red line) almost completely overlaps the observed trend (the blue line).
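The 70.91% figure is an average of per-metric relative improvements. The paper does not give the exact formula, so the sketch below shows one plausible reading. For MAE, MSE and RMSE (lower is better), the improvement is the relative reduction. For R² (higher is better), it is the relative increase. The function names and example scores are hypothetical and are not the values in Table 3.

```python
def relative_improvement(best, other, higher_is_better=False):
    """Percentage improvement of `best` over `other` for one metric."""
    if higher_is_better:
        return 100.0 * (best - other) / abs(other)
    return 100.0 * (other - best) / other

def mean_improvement(best_scores, other_scores, higher_is_better=("R2",)):
    """Average percentage improvement across all metrics in `best_scores`."""
    gains = [
        relative_improvement(best_scores[m], other_scores[m], m in higher_is_better)
        for m in best_scores
    ]
    return sum(gains) / len(gains)

# Hypothetical scores for two models (not the values reported in Table 3).
lstm = {"MAE": 1.0, "MSE": 1.0, "RMSE": 1.0, "R2": 0.9}
forest = {"MAE": 2.0, "MSE": 4.0, "RMSE": 2.0, "R2": 0.6}
avg_gain = mean_improvement(lstm, forest)  # about 56.25
```

Averaging MSE and RMSE together counts the same errors twice, once squared and once not. This is one reason such averages should be read alongside the individual metrics in Table 3.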

Table 3 Models comparison in 60-minute interval prediction using 90/10 split ratio
Table 4 Prediction results varying the split ratio. The values represent the LSTM model results predicting 60-minute, 30-minute, and 10-minute intervals, respectively
Fig. 13

LSTM model 30-minute prediction over the 18-day test set (September 13, 2017 - September 30, 2017)

In addition, we present the SHAP summary plot (Lundberg & Lee 2017) for the LSTM model (see Fig. 14). The SHAP plot is based on Shapley values, a concept from cooperative game theory, and it combines feature importance with feature effects. On the y-axis, features are ranked from most to least important. Each point represents one data instance, and its position on the x-axis gives the Shapley value of that instance for a particular feature. The colour of each point indicates whether the feature value is high (red) or low (blue). As shown in Fig. 14, speed is the most important feature for the prediction, followed by hour, and so on. Speed, weather, and precipitation have a negative impact, whereas hour and temperature have a positive impact.
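For a small model, Shapley values can be computed exactly by enumerating all feature coalitions. SHAP-style explainers approximate these values efficiently for larger models. The sketch below is illustrative only. It is not how the LSTM explanation in Fig. 14 was produced, and the paper does not state which SHAP explainer was used. It assumes that a feature absent from a coalition is replaced by a baseline value. Its cost grows as 2^n in the number of features, so it is practical only for a handful of features. The toy models and all values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy models with three and two features.
linear = lambda z: 2 * z[0] - 3 * z[1] + 0.5 * z[2]
phi_linear = shapley_values(linear, [1, 2, 4], [0, 0, 0])
interaction = lambda z: z[0] * z[1]
phi_interaction = shapley_values(interaction, [1, 1], [0, 0])
```

For the linear toy model, each feature's Shapley value equals its weight times its deviation from the baseline (here 2, -6 and 2). The values sum to model(x) - model(baseline). This additivity property is what lets the SHAP summary plot attribute a prediction to individual features. In the interaction model, the joint effect is split equally between the two features.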

Fig. 14

SHAP results for the LSTM model: features’ impact on the model output

6 Conclusions

The growing awareness of climate change and pollution has given rise to the need for eco-friendly and healthy means of transportation. Cycling can relieve road traffic in touristic and historical cities, where traffic congestion is exacerbated. Thus, understanding cycling mobility is of the utmost importance to improve bike infrastructure and encourage bike use.

Previous works focused on either descriptive or predictive analysis. In this paper, we presented a two-way approach for understanding bike usage in the city of Bologna that combines both. For the descriptive analysis, we analysed temporal and spatial patterns of bike users and the impact of weather conditions (air temperature, precipitation, and wind speed), air pollution, and holidays on bike usage. For the predictive analysis, we compared different models to predict trips over short-term periods, that is, 10-minute, 30-minute and 60-minute time intervals.

The results show a seasonality of cycling trips and, in more detail, a weekly trend in favour of working days. This trend is congruent with commuting behaviour from/to places of work and study. The spatial analysis confirms this result: we found several attractive points that coincide with places of study, which were less frequently visited during the summer period. The supplementary datasets used in the descriptive analysis allowed us to confirm a negative correlation between bike trips and precipitation. They also showed that temperatures of around 26/27 degrees Celsius and above lead to a decrease in the number of bike trips. However, we did not find evidence of a correlation between air pollution and bike usage.

In the predictive analysis, we found that LSTM provides the best results for predicting shorter time intervals. This could be of practical help in traffic infrastructure management (e.g., traffic lights, temporary traffic detours) on road network links. In particular, the 30-minute interval prediction seems to offer a good trade-off between accuracy and time span compared with the 10-minute and 60-minute predictions. As LSTM is a black-box model, we also employed the SHAP tool to explain the model, an aspect often missing in previous works.
The model could also be used to predict demand gaps in bike-sharing services. Such services usually require a redistribution service to ensure that people who want to use a bike can find one near their location.

In future works, several directions can be considered. Firstly, we plan to analyse the other transport data within the Bella Mossa dataset to understand interactions with bike usage and different behaviour in the use of the city’s road network. We also plan to study a larger dataset with a longer timeline to improve the analysis of seasonal factors which may also improve the prediction results. Finally, we plan to compare the patterns of bike users from different cities.