Cycling to work in 90 large American cities: new evidence on the role of bike paths and lanes
- Cite this article as:
- Buehler, R. & Pucher, J. Transportation (2012) 39: 409. doi:10.1007/s11116-011-9355-8
This article analyzes the variation in bike commuting in large American cities, with a focus on assessing the influence of bike paths and lanes, which have been the main approach to increasing cycling in the USA. To examine the role of cycling facilities, we used a newly assembled dataset on the length of bike lanes and paths in 2008 collected directly from 90 of the 100 largest U.S. cities. Pearson’s correlation, bivariate quartile analysis, and two different types of regressions were used to measure the relationship between cycling levels and bikeways, as well as other explanatory and control variables. Ordinary Least Squares and Binary Logit Proportions regressions confirm that cities with a greater supply of bike paths and lanes have significantly higher bike commute rates, even when controlling for land use, climate, socioeconomic factors, gasoline prices, public transport supply, and cycling safety. Standard tests indicate that the models are a good fit, with R² ranging between 0.60 and 0.65. Computed coefficients have the expected signs for all variables in the various regression models, but not all are statistically significant. Estimated elasticities indicate that both off-street paths and on-street lanes have a similar positive association with bike commute rates in U.S. cities. Our results are consistent with previous research on the importance of separate cycling facilities and provide additional information about the potentially different role of paths vs. lanes. Our analysis also revealed that cities with safer cycling, lower auto ownership, more students, less sprawl, and higher gasoline prices had more cycling to work. By comparison, annual precipitation, the number of cold and hot days, and public transport supply were not statistically significant predictors of bike commuting in large cities.
Keywords: Bicycling, Urban transport, Infrastructure, Bike lanes, Bike paths, Sustainability
The mounting body of evidence on the health benefits of cycling has led government agencies, public health organizations, and medical journals to advocate more cycling as a way to improve individual health as well as reduce air pollution, carbon emissions, noise, traffic dangers, and other harmful impacts of car use (British Medical Association 1992; Cavill et al. 2006; CEMT 2004; Dora and Phillips 2000; IOTF 2010; NACTO 2010; USDHHS 1996, 2008; USDOT 1994, 2004, 2010d). Cities around the world have been implementing a wide range of infrastructure, programs, and policies to encourage more cycling (Fietsberaad 2010; Heinen et al. 2010; Krizek et al. 2009; Pucher et al. 2010). Most American cities have focused on providing separate bicycling facilities such as off-street bike paths and on-street bike lanes (Alliance for Biking and Walking 2010; NACTO 2010; Pucher et al. 1999; USDOT 2010d). Past research suggests that separate cycling facilities are associated with higher cycling levels. There is contradictory evidence, however, on the impacts of different kinds of facilities. Some studies find that bike paths are associated with higher cycling levels, but that lanes are not. Other studies find that lanes are related to more cycling, but paths are not. Most prior research that distinguishes between paths and lanes focuses on only one city per study. Most comparative analysis of different cities is hampered by small sample size—usually fewer than 45 cities.
This article examines the link between cycling facilities and cycling levels by analyzing new data on bike lanes and paths in 90 of the 100 largest U.S. cities. The League of American Bicyclists and the Alliance for Biking and Walking collected the data for the authors directly from planners, transportation experts, and government officials in each city for the year 2008. The only comparable measure of bike lane supply available for all 90 cities was ‘centerline miles’ of roads with bike lanes. Data collected for bike paths combined off-road facilities exclusively for cycling as well as multi-use paths shared by cyclists, pedestrians, joggers, in-line skaters, and other non-motorized users. Our multiple regression analysis focuses on measuring the relationship of bike paths and lanes to cycling levels while controlling for cycling safety, socioeconomic factors, land-use, gasoline price, public transport supply, and climate.
Determinants of cycling: the role of off-street paths and on-street lanes
Several studies have estimated the relationship of bike paths and lanes to cycling levels. Results from aggregate cross-sectional studies indicate that there is a positive correlation between cycling levels and the supply of bike paths and lanes (Dill and Carr 2003; LeClerc 2002; Nelson and Allen 1997; Parkin et al. 2008). Based on a sample of 18 small and large U.S. cities, Nelson and Allen (1997) find that one additional mile of combined bike paths and lanes per 100,000 residents is associated with a 0.069% increase in commuters cycling to work. Based on a sample of 42 large U.S. cities, Dill and Carr (2003) find that each additional linear mile of bike lanes per square mile of city area is associated with an increase of roughly one percentage point in the share of bike commuters, even after controlling for days of rain, automobile ownership, and state spending on walking and cycling.
Analyzing data from the 1990 and 2000 U.S. Census, Barnes et al. (2006) find that increases in bike commute levels in Minneapolis and St. Paul were concentrated around newly constructed bike paths and lanes. Cleaveland and Douma (2009) apply the same methods in their case study analysis of six cities and report that the relationship of bike facilities and cycling levels is mediated by local circumstances, such as network connectivity, bike promotion programs, and location of bike facilities along commuting routes leading to downtown.
Disaggregate, individual-level studies report a preference for separate paths and lanes over cycling in traffic (Abraham et al. 2002; Akar and Clifton 2009; Broach et al. 2011; Dill 2009; Dill and Gliebe 2008; Howard and Burns 2001; Hunt and Abraham 2007; Krizek et al. 2007; Lusk et al. 2011; Menghini et al. 2010; Shafizadeh and Niemeier 1997). In a study of Calgary, Canada, Abraham et al. (2002) find that cycling along roads is perceived to be two to four times as onerous as cycling on a bike path in a park. Dill and Gliebe (2008) report that women and inexperienced cyclists in Portland, OR prefer riding on bicycle paths, lanes, and low traffic volume roads over cycling on busy streets.
Findings on the relative importance of paths compared to lanes are contradictory. Vernez-Moudon et al. (2005) report that household proximity to bike paths in Seattle, WA increases the likelihood to cycle by 20%, but they find no effect for bike lanes. Using a wide range of datasets and methods, Cervero et al. (2009), de Geus et al. (2008), and Dill and Voros (2007) report no positive correlation between bike lanes and cycling levels. By comparison, a Minneapolis, MN study by Krizek and Johnson (2006) finds an increased likelihood of cycling for individuals living within 400 m of a bike lane, but no significant impact of bike paths.
Without controlling for other determinants of cycling, before-and-after studies show increased levels of cycling after the installation of bike lanes, but report mixed results for bike paths (City of Toronto 2001; City of Vancouver 1999; Cohen et al. 2008; Evenson et al. 2005). A revealed preference survey by Dill (2009) finds that cyclists in Portland are willing to increase trip distance and travel time to ride on bike paths compared to shorter, more direct routes that require cycling on roads with motor vehicle traffic. Furthermore, a revealed preference study by Aultman-Hall et al. (1998) finds that bike paths in Guelph, Ontario are more likely to be used by recreational cyclists than by commuters.
In short, many studies conclude that there is a significant relationship between cycling facilities and cycling levels, but the analyses cannot determine the direction of causation. Moreover, regression analysis of cycling levels is almost always cross-sectional, thus limiting inferences about changes over time. Measurements of cycling volumes before and after the installation of specific facilities provide the simplest kind of time-series evidence, but they almost never control for the range of other factors affecting cycling levels. Most individual-level studies focus on one or a few cities. Such disaggregate, individual level studies can help mitigate some of the problems of aggregate data analysis, but transferring the results to other cities may be difficult because of policy, land use, and cultural differences between cities. Moreover, single-city studies cannot control for the influence of factors such as climate and gasoline price, which do not vary much within any particular city. Aggregate studies usually have a much larger geographic range than disaggregate studies, but they rely on few observations, such as Nelson and Allen (1997) and Dill and Carr (2003), with samples of 18 and 42 cities, respectively. Thus, all studies of the impacts of cycling facilities have their limitations. Our own study is no exception, but it enables analysis of an extensive new dataset of 90 U.S. cities that permits differentiation between bike paths and bike lanes while controlling for a range of other variables.
Data sources and variables
Our regression analysis investigates the relationship between bike lanes and paths and cycling levels in 90 of the 100 largest U.S. cities as determined by population estimates of the 2008 American Community Survey (ACS) (USDOC 2009a). The ACS reports city data following jurisdictional and governmental boundaries (USDOC 2010). City governments provided information on the supply of bike paths and lanes within their official city boundaries. Unless indicated otherwise, data for the variables used in our analysis pertain to the area within the city government jurisdiction. Data for some variables, such as public transport service supply, are only available for the metropolitan statistical area (MSA), including the principal city, suburban areas, and smaller secondary cities. We explicitly indicate in our analysis when we used regional instead of local data. The dependent variable—cycling level—is measured at the city level in two different ways: (1) percentage of commuters by bicycle—bike mode share—which controls for the number of workers in each city; and (2) the number of bike commuters per 10,000 population, which controls for population size.
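As a minimal sketch of how the two dependent variables differ, the following computes both for a hypothetical city. The function names and the input counts are illustrative assumptions, not values from the ACS.

```python
def bike_mode_share(bike_commuters: float, workers: float) -> float:
    """Percent of workers who usually commute by bike (normalizes by workers)."""
    return 100.0 * bike_commuters / workers


def bike_commuters_per_10k(bike_commuters: float, population: float) -> float:
    """Bike commuters per 10,000 residents (normalizes by total population)."""
    return 10_000.0 * bike_commuters / population


# Hypothetical city: the counts below are made up for illustration only.
share = bike_mode_share(bike_commuters=3_000, workers=250_000)
per_10k = bike_commuters_per_10k(bike_commuters=3_000, population=500_000)
print(f"{share:.2f}% of commuters, {per_10k:.0f} bike commuters per 10,000 population")
```

The two measures can rank cities differently, because the ratio of workers to residents varies from city to city.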
Data on cycling levels and bikeway facilities
Data on the share of workers regularly commuting by bicycle were derived from the American Community Survey (ACS) 2006–2008 three-year average sample. The specific question posed to survey respondents was: “How did you usually get to work last week?” Respondents were asked to indicate only the main mode if they used more than one. Pooling data from the ACS surveys for 2006, 2007, and 2008 increases sample size and improves the reliability of estimates. Ideally, we would have measured cycling rates for all trip purposes, but the ACS data only report information on commuting to work, and the ACS is the only source of comparable travel data for all cities. The 2001 and 2009 National Household Travel Surveys (NHTS) provide data for all trip purposes, but their sample sizes are less than 3% as large as the ACS surveys and do not permit statistically reliable estimates for individual cities.
Top ten of 90 of the 100 largest U.S. cities by daily bike commuting levels, 2006–2008
Ranking measures: % of commuters by bike; Bike commuters per 10,000 population; Bike commuters in 1,000
Cities listed: New York City, NY; Los Angeles, CA; San Francisco, CA; San Diego, CA
The League of American Bicyclists and the Alliance for Biking and Walking collected data for the authors on the supply of bike lanes and paths by directly contacting bike planners, transportation officials, and bicycling experts in the 100 largest cities. Data for 10 of the 100 cities were not available even after multiple attempts to obtain the information. In spite of the missing cities, the resulting database for 90 cities is the most current and extensive source of information on the extent of bikeway networks in large U.S. cities.
Cities use different methods for recording the extent of their facilities. To correct for that inconsistency and to ensure the comparability of data among cities, the League of American Bicyclists and the Alliance for Biking and Walking used a uniform definition of bike lanes: centerline miles of roads with bike lanes. In order to be included, bike lanes had to be clearly designated with pavement markings and signage. They exclude shared bus and bike lanes as well as ‘sharrowed’ lanes intended for joint use by motor vehicles and bicycles. Calculating centerline miles of bike lanes requires adding the length of all stretches of roadway with a bicycle lane. Centerline miles do not distinguish between streets with bike lanes on only one side, in only one direction, and streets with bike lanes on both sides, serving both directions of travel. Thus, the centerline measure understates bicycle facility supply on roads with bike lanes in both directions relative to roads with bike lanes in only one direction. We had to accept that limitation of the centerline measure, since it is the only comparable statistic all 90 cities could compute.
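The limitation of the centerline measure can be made concrete with a small sketch. The `Segment` type, the `lane_miles` comparison measure, and the two road segments are hypothetical and serve only to show how centerline miles treat one-sided and two-sided lanes alike.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    length_miles: float
    lanes_both_directions: bool  # True if the road has a bike lane on each side


def centerline_miles(segments):
    """Length of roadway with a bike lane, counted once per road segment."""
    return sum(s.length_miles for s in segments)


def lane_miles(segments):
    """Hypothetical alternative measure that counts each direction separately."""
    return sum(s.length_miles * (2 if s.lanes_both_directions else 1) for s in segments)


# Illustrative network: one two-sided segment and one one-sided segment.
network = [Segment(1.5, True), Segment(2.0, False)]
centerline = centerline_miles(network)
directional = lane_miles(network)
print(centerline, directional)
```

Here the two-sided 1.5-mile segment contributes the same centerline length as a one-sided segment of equal length, which is the understatement described above.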
Bike paths comprised both exclusive off-road facilities for cycling as well as multi-use paths intended for joint use by cyclists, pedestrians, joggers, in-line skaters, and other non-motorized users. In fact, most bike paths in American cities are such multi-use paths, while in Europe, they are often exclusively for cyclists, probably due to the much higher cycling volumes needed to justify completely separate paths only for cyclists (Alliance for Biking and Walking 2010; Fietsberaad 2010; USDOT 2010d).
Descriptive statistics for variables in the analysis
Variable | Description & measurement | Source
Bike share of commuters | Percent of workers regularly commuting by bike | American Community Survey 2006–2008 averages (USDOC 2009a)
Bike commuters per capita | Daily total number of workers regularly commuting by bike per 10,000 population |
Bike lane supply | Miles of bike lanes in city per 100,000 population | Data collected from each city individually; Population data are ACS 2006–2008 averages (USDOC 2009a)
Bike path supply | Miles of bike and shared-use paths in city per 100,000 population |
Cyclist fatality rate | State level data: three year average of bicyclist fatality rate per 10,000 bike commuters |
% College students | Percent of total population enrolled in college or university | ACS 2006–2008 averages (USDOC 2009a)
% Households without car | Percent of households without a motorized vehicle | ACS 2006–2008 averages (USDOC 2009a)
Sprawl index | Regional index combining 22 variables measuring residential density, mix of land uses, strength of downtowns, and connectivity of street network. (Note: higher scores = less sprawl) | Ewing et al. (2002)
Public transport supply | Regional annual vehicle miles of public transport supply per 1,000 inhabitants | National Transit Database (USDOT 2008)
State gas retail price | Average state retail price of gasoline (in cents) (2006–2008) |
Days above 90°F | 30 year average of annual number of days above 90°F | National Climatic Data Center (NCDC) (2010)
Days below 32°F | 30 year average of annual number of days below 32°F | National Climatic Data Center (NCDC) (2010)
Annual inches of precipitation | 30 year average of annual inches of precipitation | National Climatic Data Center (NCDC) (2010)
Cycling safety is an important determinant of cycling levels. The causation probably goes in both directions. Several studies confirm that increased cycling safety encourages more people to cycle (Alliance for Biking and Walking 2010; Fietsberaad 2006, 2010; Jacobsen et al. 2009a; Pucher and Buehler 2008; USDOT 2010d). Conversely, the concept of ‘safety in numbers’ proposes that, as more people cycle, it becomes safer because more cyclists are more visible to motorists, and an increasing percentage of motorists are also cyclists, which probably makes them more considerate of cyclists when driving. As cycling grows, it is increasingly viewed as normal, gains legitimacy as a means of travel, and generates more public and political support for more and better cycling facilities. Regardless of which explanation is correct, several studies find significant time-series as well as cross-sectional evidence of ‘safety in numbers’ (Elvik 2009; Jacobsen 2003; Robinson 2005).
In our analysis, we measured safety as cyclist fatalities per 10,000 bike commuters at the state level. The National Highway Traffic Safety Administration (NHTSA) reports annual fatalities for states but not for cities. Reliable cyclist fatality data are not available at the city level. Cyclist fatalities are rare events, so cities with little cycling have few fatalities and do not collect such data systematically. Thus, the fatality rates used in our analysis refer to cycling safety in the overall state and not the city itself. In addition to that geographic discrepancy, the fatality rate is only a rough approximation of actual cycling safety. Cyclist fatalities result from all trip purposes and not just the trip to work, but the measure of exposure in the denominator of the fatality rate includes only bike commuters. As mentioned earlier, the only nationally comparable source of travel data for all trip purposes is the NHTS. Because the NHTS sample size is less than 3% as large as the ACS sample, it cannot be disaggregated to the state or city level with statistical reliability to calculate total bike trips for all trip purposes. Thus, the fatality rate we calculated is only a very rough approximation, but it helps capture the sharp differences in cycling safety across states: ranging from less than 2 fatalities per 10,000 bike commuters in Alaska, Colorado, Minnesota, and Oregon to over 20 in Alabama (Alliance for Biking and Walking 2010).
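A rough sketch of the safety measure follows. It assumes the three-year average is taken over annual fatality counts before dividing by the number of bike commuters; the text does not spell out the exact averaging, and the state figures below are invented for illustration.

```python
def fatality_rate_per_10k(annual_fatalities, bike_commuters):
    """Average annual cyclist fatalities per 10,000 bike commuters.

    The numerator covers fatalities from all trip purposes, while the
    denominator counts only commuters, so the rate is a rough proxy.
    """
    mean_fatalities = sum(annual_fatalities) / len(annual_fatalities)
    return 10_000 * mean_fatalities / bike_commuters


# Hypothetical state: three years of fatality counts and 20,000 bike commuters.
rate = fatality_rate_per_10k([10, 12, 8], bike_commuters=20_000)
print(f"{rate:.1f} fatalities per 10,000 bike commuters")
```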
Two socioeconomic variables we included were share of students in the population and percent of households without a car. Previous studies find that individuals in households with more cars are less likely to ride a bicycle, while students are more likely to cycle (Dill and Carr 2003; Heinen et al. 2010; Pucher and Buehler 2006). We did not include per-capita income because of its high correlation with car ownership (Pearson’s r = 0.6). The most important impact of income on cycling levels is via car ownership (Dill and Voros 2007; Heinen et al. 2010; Stinson and Bhat 2003). Moreover, the two most recent national travel surveys for the United States, the 2001 and 2009 NHTS, reveal no statistically significant difference in cycling levels among income groups, but a large and statistically significant difference by car ownership levels (Buehler et al. 2011; Pucher et al. 2011a; USDOT 2010b, c).
Previous studies have shown that cycling levels are higher in dense, mixed-use developments with short trip distances and proximity of households to destinations such as offices, stores, and restaurants (Baltes 1997; Ewing and Cervero 2001, 2010; Guo et al. 2007; Handy 1996; Litman 2007a; Moudon et al. 2005; Parkin et al. 2008; Pucher and Buehler 2006; Zahran et al. 2008). Moreover, studies find that a grid-pattern road network increases levels of cycling because short blocks and frequent intersections provide easier bike access and more flexible bicycle route choice to most destinations (Ewing and Cervero 2010).
In our study, we approximate the influence of the built environment by using the composite sprawl index that was developed by Ewing et al. (2002). The sprawl index combines 22 different variables measuring various aspects of urban form, mix of land uses, density, and street network connectivity. Of the cities included in our study, the metropolitan areas with the worst sprawl ratings (lowest numerical values) were Riverside-San Bernardino, CA (14.2), Greensboro, NC (46.8), Raleigh, NC (54.2), and Atlanta, GA (57.7). The metropolitan areas with the best sprawl ratings (highest numerical values) were: New York City, NY (177.8), San Francisco, CA (146.8), and Honolulu, HI (140.2). Although the sprawl index refers to the metropolitan area as a whole, it is also useful for comparing land-use characteristics of the central cities included in our study. For example, the index specifically considers several measures of downtown strength and overall compactness of the urban area. There is no comprehensive land-use index that provides comparable information for central cities only. Thus, we had to assume that the relative differences in land use among metropolitan areas as a whole reflect the relative differences among their central cities.
Public transport may also influence cycling levels. Some studies show that coordinating cycling with public transport can encourage more cycling as well as more public transport use (Brons et al. 2009; Givoni and Rietveld 2007; Hegger 2007; Martens 2004, 2007; TRB 2005; USDOT 1998). Other studies, mainly from Europe, suggest that public transport may compete with bicycling for short trip distances in cities with good public transport supply (Fietsberaad 2010; Heinen et al. 2010; Pucher and Buehler 2007; Schwanen 2002). Our study includes a variable measuring public transport vehicle miles per capita from the National Transit Database (NTD) for the year 2008 (USDOT 2008). Data were only available at the metropolitan level, since service areas of public transport agencies almost always extend beyond central city boundaries into the suburbs (USDOT 2008).
Few studies specifically examine the impact of gasoline prices and taxes on cycling levels (Pucher and Buehler 2006; Rashad 2009). However, many studies find that higher gasoline prices lead to less driving (Buehler 2010; DeJong and Gunn 2001; Espey 1998; Hanly et al. 2002; Litman 2007b). In our study we use average gasoline prices by state for the years 2006–2008, as reported by the Energy Information Administration (EIA) (USDOE 2010a). Comparable data on gasoline prices in each of the 90 cities in our study were not available for the years 2006–2008. The state data are only proxies for the unavailable city data, but at least they capture major differences in state gasoline tax rates, fuel distribution costs, and state standards for fuel composition, all of which help determine the final retail price of gasoline (USDOE 2010a, b). The state rates do not, however, reflect variation within states in gasoline taxes and prices.
Previous research shows that climate and topography can affect cycling levels. Several studies find that cycling is deterred by rain as well as by very cold or hot weather (Baltes 1997; Bergström and Magnusson 2003; Dill and Carr 2003; Gatersleben and Appleton 2007; Heinen et al. 2010; Nankervis 1999; Stinson and Bhat 2003; Winters et al. 2007). Our analysis includes three variables measuring weather and climate: (1) average annual number of days that reach temperatures of over 90°F; (2) average number of days below 32°F; and (3) annual precipitation levels. We used 30 year average data for each city provided by the National Climatic Data Center (2010).
Almost all studies find that flat topography facilitates cycling, and that cyclists choose routes that avoid steep gradients (Hunt and Abraham 2007; Menghini et al. 2010; Rietveld and Daniel 2004; Timperio et al. 2006; Vandenbulcke et al. 2011). Topography uninterrupted by harbors, bays, and rivers also favors cycling by enabling more direct routes (Pucher et al. 2011c). However, standardized indices of topography do not yet exist for the cities in our sample. Thus, we were not able to control for the influence of topography on cycling levels.
Similarly, it was not possible to include variables measuring the extent and quality of the many other policies and programs that might potentially affect cycling levels (Heinen et al. 2010; Krizek et al. 2009; Pucher et al. 2010). These measures include, for example, bike parking, bike racks on buses, bike sharing programs, cycling training courses, media campaigns, and educational events (APBP 2002; Brons et al. 2009; Fietsberaad 2010; Givoni and Rietveld 2007; Hegger 2007; Hunt and Abraham 2007; Martens 2007; Netherlands Ministry of Transport 2009; Noland and Kunreuther 1995; Taylor and Mahmassani 1996; TRB 2005; Wardman et al. 2007). Comparable data for these programs are not available for most of the 90 cities.
Bike commute levels by quartile of independent variables and bivariate Pearson’s correlations for the 90 largest U.S. cities
Panel 1: Share of bike commuters by quartile of independent variable; Difference fourth minus first quartile; Bivariate correlation with share of bike commuters
Panel 2: Bike commuters per 10,000 population by quartile of independent variable; Difference fourth minus first quartile; Correlation with bike commuters per 10,000 population
Independent variables (both panels): Bike lanes per 100,000 pop.; Bike paths per 100,000 pop.; Bike paths and lanes per 100,000 pop.; Cyclist fatality rate; % College students; % Households without car; Transit revenue miles per capita; Days above 90°F; Days below 32°F; Annual inches of precipitation
The correlation coefficients for the control variables suggest the same directions of relationships as previous studies we reviewed, but not all coefficients are statistically significant. City cycling levels and state bike fatality rates have a statistically significant negative correlation. The actual relationship might be stronger, but the state data are obviously an imperfect proxy for city cycling safety. Cities with a higher percentage of students have higher levels of bike commuting. A higher share of households without a car is associated with more bike commuting, but the bivariate correlation is not statistically significant. Bicycle commuting levels are higher in central cities of more compact metropolitan areas. Cities with more public transport supply per capita have higher cycling levels, but the correlation coefficient is not statistically significant. State gasoline retail prices and city cycling levels have a statistically significant positive correlation—consistent with the theory that higher costs of driving encourage cycling. As found by earlier studies, extreme weather conditions deter cycling. Our dataset shows that cycling levels are lower in cities with more days per year with temperatures of 90°F or higher and more annual precipitation. We found no statistically significant relationship between the number of cold days per year and bike commuting.
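The bivariate quartile comparison and Pearson correlation discussed above can be sketched as follows. The eight data points are synthetic and the helper names are assumptions; the sketch only shows the mechanics of sorting cities by an independent variable, averaging cycling levels within each quartile, and computing the correlation.

```python
import math


def quartile_means(x, y):
    """Mean of y within each quartile of x (first quartile = lowest x values)."""
    pairs = sorted(zip(x, y))  # order cities by the independent variable
    n = len(pairs)
    bounds = [round(i * n / 4) for i in range(5)]
    means = []
    for i in range(4):
        group = [value for _, value in pairs[bounds[i]:bounds[i + 1]]]
        means.append(sum(group) / len(group))
    return means


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data for eight hypothetical cities (not from the study).
lanes_per_100k = [1, 2, 4, 5, 8, 10, 15, 20]
bike_share = [0.2, 0.3, 0.4, 0.6, 0.7, 0.9, 1.5, 2.0]

means = quartile_means(lanes_per_100k, bike_share)
difference = means[3] - means[0]  # fourth minus first quartile
r = pearson_r(lanes_per_100k, bike_share)
print(means, difference, round(r, 2))
```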
Multiple regression analysis
The quartile and correlation analyses presented above investigate the relationship between bike commuting and each independent variable, one at a time. The multiple regressions presented below examine the relationship of cycling levels and bike paths and lanes while controlling for safety, socioeconomics, land use, public transport supply, gasoline price, and climate.
We estimated two sets of models. The first is a log–log Ordinary Least Squares (OLS) regression with the natural log of bike commuters per 10,000 population as the dependent variable. The second is a Binary Logit Proportions Model with the share of bike commuters in each city as the dependent variable. In both types of models, the independent variables are expressed as natural logs to assure a more normal distribution of otherwise skewed explanatory variables.
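The equations are not printed in this section, so the following is only a sketch of the standard textbook forms that match the verbal description. The symbols are assumptions: y_i is bike commuters per 10,000 population in city i, p_i is the bike share of commuters, x_{ik} is the k-th explanatory variable, and β and α are the coefficients of the two model types.

```latex
\begin{align}
  % Log-log OLS: both sides in natural logs
  \ln y_i &= \beta_0 + \sum_{k} \beta_k \ln x_{ik} + \varepsilon_i \\
  % Logit form for the share of bike commuters
  p_i &= \frac{1}{1 + \exp\!\left[-\left(\alpha_0 + \sum_{k} \alpha_k \ln x_{ik}\right)\right]} \\
  % Implied elasticities with respect to x_{ik}
  \frac{\partial \ln y_i}{\partial \ln x_{ik}} &= \beta_k,
  \qquad
  \frac{\partial \ln p_i}{\partial \ln x_{ik}} = \alpha_k \,(1 - p_i)
\end{align}
```

Under these forms, the OLS coefficients are elasticities directly, while a logit coefficient has to be scaled by (1 − p) evaluated at the sample mean share to give an elasticity at the mean.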
The log–log specification for the first set of models has two advantages. First, it normalizes the skewed independent and dependent variables, thus helping to meet assumptions of the OLS regression. Second, it allows interpreting the regression coefficients directly as elasticities or percentage changes in bike commuting, which makes the results more intuitive and easier to understand.3
Multiple regression analysis of bike commuters per 10,000 population and bike commute share
Model columns: OLS regression of ln(bike commuters per 10,000 population); Binary logit proportions model for share of bike commuters; Elasticity at mean
Explanatory variables: ln (bike lanes per 100,000 population); ln (bike paths per 100,000 population); ln (fatality rate per 10,000 bike commuters); ln (percent of students in population); ln (percent of households without car); ln (sprawl index); ln (transit revenue miles of service per capita); ln (state gas retail price); ln (annual number of days above 90°F); ln (annual number of days below 32°F); ln (annual inches of precipitation)
Fit statistics: Pseudo LL (Intercept): −9.048; Pseudo LL (Full): −3.399; Pseudo R² (McFadden): 0.62
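The McFadden pseudo R² reported for the logit model can be checked from the two log-likelihood values given with it. The sketch below uses the standard definition, one minus the ratio of the full-model log-likelihood to the intercept-only log-likelihood; the function name is illustrative.

```python
def mcfadden_pseudo_r2(ll_intercept: float, ll_full: float) -> float:
    """McFadden's pseudo R-squared: 1 - LL(full) / LL(intercept only)."""
    return 1.0 - ll_full / ll_intercept


# Log-likelihood values as reported with the regression table.
pseudo_r2 = mcfadden_pseudo_r2(ll_intercept=-9.048, ll_full=-3.399)
print(round(pseudo_r2, 2))  # 0.62
```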
Coefficients are consistent with relationships reported in most other studies, but not all estimators are statistically significant. Both bike lanes and bike paths per 100,000 population are significant predictors for bike commuting. A 10% greater supply of bike lanes is associated with a 3.1% greater number of bike commuters per 10,000 population. Similarly, a 10% greater supply of bike paths is associated with a 2.5% higher level of bike commuting. As in our previous correlation analysis, a t-test comparison shows that the coefficients for bike lanes and paths are not significantly different from each other at the 95% confidence level.
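The percentage statements above read the log–log coefficients as elasticities. As a rough check, the sketch below assumes elasticities of 0.31 for lanes and 0.25 for paths, inferred from the reported 3.1% and 2.5% figures rather than read from the table, and compares the linear reading with the exact constant-elasticity change.

```python
def pct_change_loglog(elasticity: float, pct_change_x: float) -> float:
    """Exact proportional change in y when x changes by pct_change_x
    in a constant-elasticity (log-log) model: (1 + dx)^elasticity - 1."""
    return (1.0 + pct_change_x) ** elasticity - 1.0


# Elasticities inferred from the text (assumption), with a 10% greater supply.
lanes = pct_change_loglog(0.31, 0.10)
paths = pct_change_loglog(0.25, 0.10)
print(f"lanes: {100 * lanes:.2f}%, paths: {100 * paths:.2f}%")
```

The exact changes are marginally below the linear approximation of elasticity times percentage change, which is why the approximation is normally quoted for small changes.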
Cycling safety is statistically significant as well. A 10% higher cyclist fatality rate per 10,000 commuter cyclists is associated with 3.7% fewer bike commuters per 10,000 population. A 10% higher share of students in the population is associated with 8.6% more bike commuting. A 1% increase in the retail price of gasoline is associated with a 5.2% increase in cycling levels. The cross-price elasticity of bike commuting with respect to gasoline price may seem high, but it is in line with other models estimating the relationship between gasoline prices and cycling levels (Pucher and Buehler 2006; Rashad 2009). The coefficients for public transport supply and the climate variables—number of days per year with temperatures of 90°F or higher, 32°F or lower, and precipitation—are not statistically significant.
Models 3 through 6 present regression results for reduced models, excluding explanatory variables to control for potential multicollinearity and endogeneity. For example, prior research suggests that bike paths and lanes contribute to lower cycling fatality rates (CEMT 2004; Fietsberaad 2010; Lusk et al. 2011; Pucher and Buehler 2008; Reynolds et al. 2009). Possible multicollinearity due to the inclusion of both cyclist fatality rate and bikeway supply variables in our model may siphon off strength from the bike path and lane coefficients. In our dataset of 90 cities, bivariate Pearson’s correlations between the fatality rate and the supply of bike paths and lanes are below 0.3, and tests for multicollinearity do not indicate any serious problem.4, 5 Endogeneity is a second potential problem arising from the inclusion of the cyclist fatality rate variable, since ‘safety in numbers’ suggests that cycling safety increases with higher cycling levels (Jacobsen 2003; Jacobsen et al. 2009b). Model 3 excludes the cyclist fatality rate variable in order to test for the possible distorting influence of any multicollinearity and endogeneity problems caused by its inclusion in the model. The Model 3 estimate of the coefficient for bike path supply is only slightly larger (+0.05) than in Model 2—possibly related to greater safety of off-street facilities (Lusk et al. 2011). T-tests show that the estimated coefficients for bike lanes, bike paths, and all other variables in Model 3 are not statistically different from Model 2.
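The passage above does not name the multicollinearity tests that were run. One common diagnostic is the variance inflation factor (VIF); the sketch below shows it for the simple two-predictor case, where it depends only on the pairwise correlation. Using 0.3 as the input follows the upper bound on the correlations reported above; treating the problem as two-predictor is a simplifying assumption.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor when a predictor is correlated with exactly
    one other predictor at Pearson's r: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


# Correlations between fatality rate and bikeway supply are reported as below 0.3.
vif = vif_two_predictors(0.3)
print(round(vif, 2))  # 1.1
```

A value this close to 1 would be far below the thresholds of 5 or 10 often used as rules of thumb, which is consistent with the conclusion that multicollinearity is not a serious problem. With more than two predictors, the VIF comes from regressing each predictor on all the others.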
Including car access and the sprawl index as explanatory variables may also introduce bias into Model 2. Some studies suggest that individuals who cycle more are less likely to own an automobile (Dill and Voros 2007; Parkin et al. 2008; Stinson and Bhat 2003). Similarly, studies show that individuals who prefer to cycle more may choose to live in more compact communities (Heinen et al. 2010; Krizek et al. 2009). Inclusion of these two variables might cause simultaneous equations bias, since cycling levels may also affect the choice to own a car or to live in a compact community. Moreover, car access and sprawl may themselves be negatively correlated with each other, since studies show that individuals living in compact urban areas own fewer cars (Cervero 2003; Ewing et al. 2002, 2008). To test for the possible distorting effects caused by potential simultaneous equations bias and multicollinearity, Models 4 and 5 omit the car access and the sprawl index variables. Similar to our findings in the reduced Model 3, t-test comparisons show that the magnitude and significance of coefficients of the remaining variables in Models 4 and 5 do not change significantly from those estimated in Model 2, where all the variables were entered into the equation.
Finally, Model 6 presents results of a reduced model including only statistically significant variables. This model confirms results from Models 2 through 5, but probably suffers from omitted variable bias. In summary, goodness of fit measures and the direction, magnitude, and significance of the model coefficients are very similar for Models 2 through 6. In all models, the coefficients for the key explanatory variables of interest, bike paths and bike lanes, remain significant and positive, and they are not statistically different from each other at the 95% confidence level. Model 2 seems preferable because it includes all theoretically relevant variables available for this study and is thus less prone to omitted variable bias.
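One common way to check whether two coefficients from the same regression differ is a t-test on their difference, using the estimated coefficient covariance matrix. The sketch below illustrates that idea on synthetic data; it is an assumption about how such a comparison can be made, not a reproduction of the authors' test, and all names and numbers are hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS with an intercept; returns coefficients, their covariance matrix, and residual d.o.f."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = (resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, cov, dof


def t_stat_difference(beta, cov, i, j):
    """t statistic for H0: beta_i == beta_j, accounting for their covariance."""
    diff = beta[i] - beta[j]
    se = np.sqrt(cov[i, i] + cov[j, j] - 2.0 * cov[i, j])
    return diff / se


# Synthetic log-log data for 90 hypothetical cities with equal true elasticities.
rng = np.random.default_rng(1)
n_cities = 90
log_paths = rng.normal(size=n_cities)
log_lanes = rng.normal(size=n_cities)
log_commuters = 1.0 + 0.25 * log_paths + 0.25 * log_lanes + rng.normal(scale=0.5, size=n_cities)

beta, cov, dof = ols(np.column_stack([log_paths, log_lanes]), log_commuters)
t_diff = t_stat_difference(beta, cov, 1, 2)  # index 0 is the intercept
print(f"paths = {beta[1]:.3f}, lanes = {beta[2]:.3f}, t(diff) = {t_diff:.2f}, dof = {dof}")
```

With 87 residual degrees of freedom, a two-sided test at the 95% level rejects equality only when the absolute t statistic exceeds roughly 1.99.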
We also tested the robustness of our results by re-estimating Model 2 excluding cities with extreme values for the explanatory variables. Such outliers, for example, included cities with the most or least bikeway supply, the most extreme climates, highest and lowest car ownership levels, highest and lowest student share, highest and lowest gasoline prices, and most and least public transport supply. The coefficients estimated for Model 2 without the outliers were similar to our original estimates for the entire sample of 90 cities presented in Table 4.
To test further the robustness of our results, we estimated an additional equation, presented as Model 7 in Table 4, using the share of bike commuters in each city as the dependent variable. For this dependent variable, an OLS regression might estimate values beyond the range of actual possible values of the bike share of commuters (0–1.0). To address this issue, we followed Xing et al. (2010) by estimating a non-linear Binary Logit Proportions Model for bicycle mode share (see note 6). This estimation technique transforms the dependent variable into the ‘log of odds’ of the bike share of commuters and approximates a nonlinear Maximum Likelihood estimation (Xing et al. 2010). Transformation of the dependent variable and nonlinear estimation of the model assure that predicted mode shares lie between 0 and 1.0.
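The log-odds transformation described above can be sketched as follows. This is a simplified, unweighted least-squares version under stated assumptions: the five data points are made up for illustration, the function names are hypothetical, and the authors' actual estimation (following Xing et al. 2010) may differ in weighting and other details. The point is only that predictions mapped back through the inverse logit always fall strictly between 0 and 1.

```python
import numpy as np


def logit(p):
    """Log of odds of a share p, with 0 < p < 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def inv_logit(z):
    """Map a log-odds value back to a share in (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit_proportions(X, share):
    """Least-squares fit of logit(share) on X with an intercept (unweighted sketch)."""
    design = np.column_stack([np.ones(len(share)), X])
    beta, *_ = np.linalg.lstsq(design, logit(share), rcond=None)
    return beta


def predict_share(beta, X):
    """Predicted shares; the inverse logit keeps them strictly inside (0, 1)."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    return inv_logit(design @ beta)


# Made-up illustration: bikeway supply and bike commute share for five cities.
bikeways = np.array([[0.5], [1.0], [2.0], [4.0], [8.0]])
share = np.array([0.002, 0.004, 0.007, 0.015, 0.030])

beta = fit_logit_proportions(bikeways, share)
far_out = predict_share(beta, np.array([[0.0], [30.0]]))
print("coefficients:", np.round(beta, 3))
print("predicted shares far outside the sample:", far_out)
```

Even for bikeway values well outside the range of the sample, the predicted shares remain valid proportions, which a linear model of the share itself cannot guarantee.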
Model 7 displays the results of the Binary Logit Proportions regression. Standard test statistics suggest the model is a good fit. For example, McFadden’s Pseudo R² is 0.62. All variable coefficients are consistent with the direction of relationships reported by most other studies. Similar to Models 1 through 6, the coefficients for bike paths and lanes are significant and positive, even after using this very different, non-linear estimation technique. The coefficient estimate for lanes is larger than for paths in Model 7, but as in Models 1 through 6, the two coefficients are not significantly different from each other at the 95% confidence level.
The last column of Table 4 presents elasticities for the Binary Logit Proportions Model, setting all other variables at their means. Estimated elasticities from the linear OLS log–log regression model and elasticities (at the mean) from the non-linear Binary Logit Proportions model are not directly comparable because of differences in functional form, estimation technique, and dependent variables. Nevertheless, the significance, direction, and magnitude of coefficients from Models 1 through 7 are similar. In particular, both estimation techniques yield statistically significant positive coefficients for the two main variables of interest: bike paths and bike lanes.
Limitations of the analysis
The cross-sectional analysis in our study aims to explain differences in cycling rates among cities but cannot be used to predict changes over time. Moreover, as in any cross-sectional regression analysis, none of our models can prove causality, although the significant associations we measured are consistent with the hypothesis that bike paths and lanes encourage more cycling. Our analysis is also limited by its reliance on aggregate, city-level data, which mask variation within cities, among neighborhoods, and among individuals. The results suggest a statistically significant relationship between bike paths and lanes and cycling at the city level, but they do not permit conclusions about individual travel behavior.
In addition to the inherent limitations of cross-sectional regression analysis and aggregate data, there is a problem of endogeneity among some of the variables in our models. Cycling levels and the extent of the bikeway network almost certainly affect each other, so that causation is probably in both directions. In this paper, we have focused on the role of bike paths and lanes in explaining variation among cities in cycling levels. Conversely, however, high cycling levels might help explain the provision of a large supply of bike paths and lanes. Endogeneity and simultaneous equations bias are potentially serious problems in our regression analysis because the key explanatory variables—bike paths and bike lanes—are also a function of cycling levels, the dependent variable.
Three of the control variables may cause additional endogeneity problems. For example, cycling safety and car ownership may be influenced by cycling levels, just as cycling levels may be influenced by these two control variables. Land use might also be a function of cycling rates, but only in the long run, if cyclists move to compact, mixed-use neighborhoods. To explore the potential bias introduced by such endogeneity, Models 3, 4, and 5 in Table 4 remove cycling fatality rate, car ownership, and land use from the model, one at a time. Coefficients for the other variables and goodness of fit measures do not change significantly, suggesting that inclusion of the control variables does not cause serious endogeneity problems in the models. At any rate, exclusion of the variables would be theoretically incorrect and would cause underspecification bias (see note 7).
Aside from methodological limitations, there are problems with the available data on bike paths and lanes. As noted earlier, the centerline measure of bike lanes does not distinguish between streets with bike lanes on only one side, in only one direction, and streets with bike lanes on both sides, serving both directions of travel. Clearly, bike lanes on both sides of a street provide more supply than a bike lane on only one side of the street. In addition, the data do not distinguish between the specific nature and quality of different types of lanes. For example, bike lanes have varying widths, markings, signage, coloring, and intersection treatments. They can be on the right or left side of the street, or even between traffic lanes. Some bike lanes have buffers or barriers of various sorts to separate them from motor vehicle traffic. Moreover, cities have different policies about maintaining bike lanes and keeping them clear of snow, debris, and motor vehicles.
Similar to bike lanes, bike paths vary in their width, pavement, design, and especially in the extent to which they are shared with other users such as pedestrians. Indeed the term ‘bike path’ is a bit of a misnomer in the USA. Most bike paths included in U.S. statistics are simply multi-use paths shared with pedestrians (Alliance for Biking and Walking 2010; Pucher et al. 1999). In contrast, bike paths in most northern European cities are completely separate facilities for the exclusive use of cyclists (Fietsberaad 2006, 2010; Pucher and Buehler 2008; Pucher et al. 2010). Thus, bike paths in the USA might have less impact on cycling levels than the higher-quality, fully separate bike paths in the Netherlands, Germany, and Denmark. Some mixed-use paths in the USA provide suggestive markings to help separate cyclists from pedestrians, but most do not. Some bike paths require cyclists to dismount when crossing a road, while others stop motor vehicles at crossings and give cyclists the right of way. None of the 90 cities in our dataset provided detailed information on those sorts of variations in the types of bike paths, although these differences may be important for cyclists.
Another limitation of our analysis is that the measure of cycling levels used as the dependent variable includes only daily bike commuters and thus excludes bike trips for all other trip purposes. According to the 2009 NHTS, the journey to work accounts for only 12% of all bike trips (Pucher et al. 2011a; USDOT 2010b). The lack of city-level data on cycling for all trip purposes restricts the inferences that can be drawn from our analysis. It seems likely that regular bike commuters have different characteristics and preferences than recreational cyclists. Thus, the coefficients estimated in our models for the various explanatory variables might differ if the dependent variable had included bike trips for all purposes.
Finally, the analysis was hampered by the unavailability or poor quality of data for control variables. For example, we had to use a very rough proxy for cycling fatality rates based on the available state data, and we could only measure exposure in terms of bike commuting levels. Perhaps the most important control variable we could not include was topography, since all studies show that it influences cycling levels. The model is underspecified in this respect.
Many limitations of our study could be overcome with more and better data, which would also facilitate more advanced modeling techniques and better measurement of control variables. A crucial first step is a larger dataset reporting on cycling for all trip purposes that could be disaggregated to the city level. That would require a large new national survey or a vast increase in the sample size of the NHTS, currently the only national travel survey in the USA reporting on travel for all trip purposes. Both of those options seem unrealistic, however, given the difficulty in funding the latest 2009 NHTS (AASHTO 2007). In addition, questions on the proximity to bike paths and lanes might be added to future NHTS surveys, since the 2001 and 2009 NHTS surveys already included questions about car ownership and access to public transport. More detailed information about city-level supply of cycling facilities might be collected by a separate survey, similar to the National Transit Database, which would provide an inventory of bike paths, lanes, and parking. Better statistics on cycling facilities would enable more precision in the analysis of their relationship to cycling levels. Moreover, better local data on cyclist fatality rates in cities and a comparable GIS-based measure of urban topography would also enhance the accuracy of the analysis of cycling levels.
Collecting comparable time-series data on cycling levels as well as bike path and lane supply would facilitate pooled cross-section and time-series regression analysis, which would permit stronger inferences from the models than in our cross-section analysis for only one year. Larger sample size and time series data could also help mitigate some of the endogeneity problems discussed above. For example, more advanced statistical techniques, such as Structural Equation Modeling (SEM), can help control for the simultaneous influence of independent and dependent variables, as well as for correlation among independent variables.
Discussion and conclusion
Over the past two decades, many American cities have focused on building bike paths and lanes to increase cycling (Alliance for Biking and Walking 2010; League of American Bicyclists 2010; Pucher and Buehler 2011; Pucher et al. 2011b; USDOT 2010d). Our analysis of newly collected data on cycling facilities in 90 large U.S. cities shows that cities with a greater supply of bike paths and lanes have higher bike commute levels—even after controlling for other factors that may affect cycling levels. That result is consistent with other studies that confirm the important role of separate facilities (Dill and Gliebe 2008; Dill and Voros 2007; Krizek et al. 2007; Moudon et al. 2005; Nelson and Allen 1997). Most disaggregate, individual-level studies of the relationship between bikeway supply and cycling levels focus on only one city or a few cities. Our study is most similar to two earlier studies, which also used aggregate, city-level data to explore the relationship of bikeways and cycling commute levels (Dill and Carr 2003; Nelson and Allen 1997). We expand on those two studies in several ways.
Our sample of 90 U.S. cities was much larger: more than four times as many cities as Nelson and Allen (18 cities) and more than twice as many cities as Dill and Carr (42 cities). Moreover, our regressions distinguish between paths and lanes, while the multiple regressions in the other two studies either combined the two types of facility (Nelson and Allen) or only included bike lanes (Dill and Carr). Similar to these two previous city-level studies, we find that the supply of bikeways per capita is a statistically significant predictor of bike commuting. By including separate variables for paths and lanes, however, our analysis is able to examine each type of facility separately and finds that they do not have significantly different associations with levels of bike commuting among cities.
Although the main focus of our study was on bike paths and lanes, the models yielded new results about the influence of the control variables on cycling levels. The much larger sample size and data availability for more variables allowed us to include nine control variables in the regression equations, compared to five for Dill and Carr (2003) and four for Nelson and Allen (1997). Our control variables include some of those suggested by Nelson and Allen (1997), such as gasoline price and public transport supply. Similar to the other two city-level studies, our results show that the percentage of college students in the city population is a significant predictor of bike commuting. In contrast to these earlier studies, however, we did not find a significant relationship between bike commuting and precipitation. Although the precipitation variable was estimated to be statistically significant in the regression analysis of Dill and Carr (2003), the authors themselves doubted the actual importance of precipitation as a predictor of cycling, since three of the top ten cycling cities in their sample had very high levels of precipitation. In our own analysis of climate, we included two additional climate control variables—the number of extremely hot and cold days per year—but their estimated coefficients were not statistically significant, either. Thus, none of our three measures of climate were strong predictors of bike commuting.
Similar to Dill and Carr (2003), our study shows that cities with higher car ownership have lower cycling levels. Inclusion of additional control variables in our study revealed that cities with safer cycling, less sprawl, and higher gasoline prices have more cycling. Regional public transport supply per capita was not a statistically significant predictor of bike commuting. Thus, we cannot confirm the speculations by Nelson and Allen (1997) and Schwanen (2002) that public transport supply affects levels of bike commuting.
Most American cities build both bike lanes and bike paths with the expectation that offering both kinds of facilities provides cyclists with more route options and choice of facility type. Prior research finds that some cyclists prefer bike lanes, while others favor bike paths. Some studies find that commuters prefer on-street bike lanes over paths because lanes follow the road network and provide more direct routes (Aultman-Hall et al. 1998). The multiple regression coefficients in our models, however, do not suggest a statistically significant difference between paths and lanes in their relationship to bike commuting. Furthermore, our coefficient estimates for paths and lanes suggest inelastic cycling demand with respect to the supply of cycling facilities. A one percent difference between cities in the supply of bike paths and lanes is associated with less than a one percent difference in cycling levels.
Similar to all previous studies, our estimates of the role of bike paths and lanes do not control for the many other differences among cities in their approaches to encourage cycling. For example, most cities offer suggested bike routes on streets without any separate facilities and consider them an integral part of their overall cycling network. But cities vary greatly in the quality of such routes and do not report statistics consistently, so we did not include bike routes on roads without any dedicated space for cyclists. Similarly, many other infrastructure measures and programs could not be integrated into the model. Intersection improvements and priority traffic signals for cyclists, bike parking, coordination with public transport, traffic education and training, and bike promotion and public awareness campaigns all influence cycling levels to some extent, and should be controlled for in models examining the determinants of cycling. The lack of reliable, comparable data for these other measures prevents their inclusion in the regression models, which are thus inevitably underspecified to some unknown extent. We share this drawback with all other studies.
Whatever the shortcomings of our data and regression models, our estimated equations are consistent with the hypothesis that bike lanes and paths encourage cycling. They reveal a positive relationship even when controlling for a range of other factors expected to affect cycling levels. Although not always statistically significant, the coefficients of explanatory variables in our equations suggest a direction of influence similar to that found in most other studies.
1. The western Census region includes Alaska, Arizona, California, Colorado, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming.
2. Compact land use is measured by a composite ‘sprawl index’ with lower values for sprawled development and higher values for compact development as explained in detail later in the text.
3. Seven cities reported 0 miles of bike lanes or bike paths. These cities would have been lost in our models, because the natural logarithm of 0 is not defined. Thus, we followed the common procedure of transforming the bike lane and path per 100,000 population variable by adding 1, which yields a log value of 0 for the 7 cities. We also estimated the models without this transformation, with only 83 cities. Significance, sign, and magnitude of coefficients and goodness of fit were very similar to the results of the models presented in this paper.
4. Variance Inflation Factor (VIF) yields scores for individual variables below 2.7 and a score of 1.9 for the overall equation. Tolerance values are all above 0.4.
5. A possible reason for this low correlation may be that state cyclist fatality rates are imperfect proxies for city fatality rates.
6. For an alternative approach to estimating fractional response variables using a so-called ‘quasi-likelihood estimation method,’ see Papke and Wooldridge (1996).
7. In an attempt to model the simultaneous dependencies among the variables, we experimented with several alternative instrumental variables to estimate a simultaneous equation system using two-stage regressions. Unfortunately, none of the available variables in the dataset were sufficiently exogenous or strong enough to serve as instrumental variables. They failed on one or more criteria required for statistically robust and valid instrumental variables: (1) underidentification (Anderson LM statistic), (2) weak identification (Cragg–Donald Wald F statistic), (3) overidentification (Sargan statistic), or (4) robust instrument inference (Anderson–Rubin Wald test). The best instrumental variable in the dataset was city land area, since area is fully exogenous and correlated with the total number of bike commuters and the extent of bike paths and lanes. The technical estimation procedure of two-stage least squares (2SLS) required combining the length of bike paths and lanes into one variable, because there was only one instrumental variable available. Moreover, the model was re-specified with the log of total number of bike commuters as dependent variable and the log of total length of bike paths and lanes as regressor. This model satisfied most of the statistical tests for appropriateness of the instrument, but failed to reject the null hypothesis of the Sargan test for overidentification, which casts some doubt on the validity of the instrument.
Estimating a 2SLS equation with this imperfect instrumental variable yields results for the bikeway variable that are similar to those for an OLS regression. In the 2SLS model, bike paths and lanes are statistically significant predictors of cycling levels, even after accounting for endogeneity bias. Another instrumental variable we examined (city population per bicycling advocacy group member) yielded similar results: statistical tests point to weak instrumentation, but bike paths and lanes retain their significant and positive coefficient.
This paper is based on a three-year research project funded by the U.S. Department of Transportation: “Analysis of Bicycling Trends and Policies in Large American Cities: Lessons for New York”. It is part of the Research Initiatives Program of the University Transportation Research Center, Region 2, for New York, New Jersey, Puerto Rico, and the Virgin Islands. The authors are indebted to Pat Mokhtarian, Bob Noland, Daniel Rodriguez, Dan Chatman, Radha Jagannathan, Kris Wernstedt, and Matt Dull for their help in revising the paper.