1 Introduction

Light rail transit (LRT) has attracted increasing interest from policymakers in recent decades, with new services being introduced across a range of metropolitan regions. The improved accessibility engendered by LRT is seen by many observers as a catalyst for transit-oriented developments (TODs), and indeed in many cases residential development accompanies the introduction of new LRT services [1, 2]. The general issue encapsulated by the TOD concept is that the stimulus of introducing LRT is thought to have non-trivial impacts on the real estate market of the service area.

The motivation for this study was to investigate the dynamics of residential relocation associated with the inauguration of new LRT service. Empirical data are sourced from a 2008 survey of passengers (n = 1,023) of the Hudson–Bergen Light Rail (HBLR) system in New Jersey (US). Fieldwork for the survey was undertaken eight years after service inauguration, which implies that this study’s results should be interpreted as mid-term effects after the residential property market has had the chance to initially re-equilibrate in response to the stimulus of the new LRT system. The HBLR system (see Fig. 1) feeds heavy rail and ferry services that connect to midtown and downtown Manhattan, and also provides local connectivity between pre-automobile urban neighborhoods of Hudson County, New Jersey (which is characterized by the sixth-highest residential density among counties in the US). In a study of the land use impacts of the HBLR system, Robins and Wells [3] estimate approximately 10,000 newly built residential units, representing a gross investment of $5.3 billion (in 2008 prices).

Fig. 1 Schematic representation of the Hudson–Bergen Light Rail (HBLR) system

The results reported in this paper are twofold. First, we investigate the distinctive socio-economic profile of LRT passengers who self-report having relocated to the new transit corridor due, at least in part, to the new transit service. Second, we document the spatial patterns (proximity to LRT stations) of residential relocations which were self-reported to have been influenced by the introduction of the LRT system, as a function of socio-economic characteristics. Both descriptive statistics and results from multivariate analyses are presented.

Although the characteristics of transit-oriented development and the impact of transit on property values are well documented (cf. [4]), few studies have explicitly investigated the impact of public transport investment on residential relocation behavior. Table 1 contains a summary listing of the key characteristics of relevant earlier studies. Specifically, the contribution of the present paper is to provide new results regarding two research questions that remain under-researched, while accounting for the possibility of endogeneity between them:

  1. What is the socio-economic profile of people most likely to relocate in response to the stimulus of public transport investment (a new LRT service, in the case study reported in this paper)?

  2. Conditional on an LRT passenger having relocated in the prior 5 years, how do socio-economic characteristics relate to the proximity of their residential choice to the public transport service?

Table 1 Summary of the previous literature

The rest of this paper is structured as follows. Section 2 introduces the empirical data employed in this study, and Sect. 3 outlines the analytical framework. Section 4 presents the estimation results, and Sect. 5 summarizes and concludes this paper.

2 Empirical data

The empirical data for this study were collected via an intercept survey of passengers waiting on platforms at the seven HBLR stations between the Hoboken Terminal and Tonnelle Avenue stations (see Fig. 1) on 1 May 2008. The questionnaire instrument contained 39 questions organized around three themes:

  • travel, employment, and residential patterns and changes,

  • customer satisfaction (not considered for the purposes of the present study), and

  • socio-economic and demographic characteristics.

A total of 5,384 questionnaires were distributed; the overall response rate was 19 %. Following data processing, the 1,023 complete responses for which the respondent’s residential address could be successfully geocoded were taken forward for statistical analysis in this study.

Survey respondents reported the length of time that they had lived at their current address, and those who indicated that they had lived at their current address for five or fewer years were subsequently asked to indicate how important the availability of the HBLR service was to their residential relocation decision (see Footnote 1).

Two-fifths (42 %) of respondents reported having relocated within the past 2 years, with a further quarter (24 %) indicating that they relocated more than two but fewer than five years prior. Figure 2 shows the distribution of responses to the importance-of-LRT-in-relocation decision question; 21 % of movers indicated that the LRT service was very important to their relocation decision, and a further 48 % reported that it was somewhat important.

Fig. 2 Self-reported ‘importance’ of the LRT service in the relocation decision, among LRT passengers who report having relocated within the past 5 years. Adapted from [10]

3 Analytical framework

A Type II Tobit discrete–continuous specification [11] was employed in the subsequent quantitative analysis, with the discrete dimension specified to be whether or not a person had relocated to the HBLR corridor within the prior 5 years and the continuous dimension (conditional on a respondent having relocated) the street-network distance of their new residence to the nearest LRT station.

Formally, we denote these dimensions as follows:

$$d_{i}^{*} = \boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i} + e_{i},\quad i = 1, \ldots, N,$$
(1)
$$d_{i} = 1\quad \text{if}\;\; d_{i}^{*} > 0;\quad d_{i} = 0\quad \text{if}\;\; d_{i}^{*} \le 0,$$
(2)
$$y_{i}^{*} = \boldsymbol{\beta}^{\prime}\boldsymbol{x}_{i} + \varepsilon_{i},\quad i = 1, \ldots, n,\quad n < N,$$
(3)
$$y_{i} = y_{i}^{*}\quad \text{if}\;\; d_{i} = 1;\quad y_{i} = 0\quad \text{if}\;\; d_{i} = 0,$$
(4)

where \(d_{i}^{*}\) and \(y_{i}^{*}\) are the latent, continuous dependent variables for individual respondent i; N and n are the numbers of observations in the full dataset (including respondents who did not relocate) and in the subset of respondents who did relocate, respectively. \(\boldsymbol{z}_{i}\) and \(\boldsymbol{x}_{i}\) are the vectors of observed variables that are treated as independent variables in the discrete and continuous models, respectively. Note that these two vectors need not be disjoint; the same variable can appear in both of the modeled dimensions. \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\) are the corresponding vectors of parameters to be estimated, and \(e_{i}\) and \(\varepsilon_{i}\) are the disturbance terms, which may be correlated through a correlation coefficient ρ that is independent of \(\boldsymbol{z}_{i}\) and \(\boldsymbol{x}_{i}\). \(e_{i}\) and \(\varepsilon_{i}\) are bivariate normally distributed, with zero mean and unknown covariance matrix, denoted in Eq. (5) as follows:

$$\left[ {\begin{array}{*{20}c} {e_{i} } \\ {\varepsilon_{i} } \\ \end{array} } \right] \sim N\left( {\left[ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \right],\left[ {\begin{array}{*{20}c} 1 & {\rho \sigma_{\varepsilon } } \\ {\rho \sigma_{\varepsilon } } & {\sigma_{\varepsilon }^{2} } \\ \end{array} } \right]} \right),$$
(5)

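with \(\sigma_{e}^{2}\), the variance of the selection-equation disturbance \(e_{i}\), normalized to one for purposes of model identification (hence the unit entry in the upper-left cell of the covariance matrix).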
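To make the structure of Eqs. (1)–(5) concrete, the following is a minimal simulation sketch rather than the procedure used in this study. It assumes one hypothetical covariate in each equation, and the parameter values, the function name `simulate_type2_tobit`, and the sample size are illustrative assumptions only. It shows that, when ρ is non-zero, the outcome disturbance no longer averages to zero among the selected (relocating) observations.

```python
import math
import random
from statistics import fmean

def simulate_type2_tobit(n, alpha, beta, rho, sigma_eps, seed=0):
    """Draw n records from Eqs. (1)-(5), with one covariate per equation.

    Returns tuples (d, y, z, x, eps); eps is kept only for inspection,
    since it would be unobserved in real survey data.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)   # hypothetical selection covariate
        x = rng.gauss(0.0, 1.0)   # hypothetical outcome covariate
        e = rng.gauss(0.0, 1.0)   # Var(e) = 1, as in Eq. (5)
        u = rng.gauss(0.0, 1.0)   # independent shock
        # Build eps so that Var(eps) = sigma_eps**2 and Cov(e, eps) = rho * sigma_eps.
        eps = sigma_eps * (rho * e + math.sqrt(1.0 - rho ** 2) * u)
        d_star = alpha[0] + alpha[1] * z + e       # Eq. (1)
        d = 1 if d_star > 0 else 0                 # Eq. (2)
        y_star = beta[0] + beta[1] * x + eps       # Eq. (3)
        y = y_star if d == 1 else 0.0              # Eq. (4)
        records.append((d, y, z, x, eps))
    return records

# Illustrative (assumed) parameter values, not estimates from the HBLR survey.
records = simulate_type2_tobit(20000, alpha=(0.3, 0.8), beta=(1.0, 0.5),
                               rho=0.6, sigma_eps=1.5, seed=1)
movers = [r for r in records if r[0] == 1]
share_movers = len(movers) / len(records)
mean_eps_all = fmean(r[4] for r in records)
mean_eps_movers = fmean(r[4] for r in movers)
print(f"share relocated:       {share_movers:.3f}")
print(f"mean eps, full sample: {mean_eps_all:.3f}")
print(f"mean eps, movers only: {mean_eps_movers:.3f}")
```

With a positive ρ, the movers-only mean of the disturbance is positive, which is the source of the selection bias that the estimator described next is designed to correct.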

The standard Heckman two-step estimator is employed [11], which is based on the conditional mean expressions and the truncated bivariate normal distribution of the error terms. The expected value of the observed dependent variable \(y_{i}\) is

$$E\left( y_{i} \mid \boldsymbol{x}_{i}, \boldsymbol{z}_{i} \right) = E\left( y_{i}^{*} \mid \boldsymbol{x}_{i}, \boldsymbol{z}_{i}, d_{i} = 1 \right) = \boldsymbol{\beta}^{\prime}\boldsymbol{x}_{i} + E\left( \varepsilon_{i} \mid e_{i} > -\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i} \right) = \boldsymbol{\beta}^{\prime}\boldsymbol{x}_{i} + \rho\sigma_{\varepsilon}\lambda_{i}\left( \frac{\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i}}{\sigma_{e}} \right),$$
(6)

which simplifies to

$$\boldsymbol{\beta}^{\prime}\boldsymbol{x}_{i} + \rho\sigma_{\varepsilon}\frac{\phi(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})}{\varPhi(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})},$$
(7)

where \(\sigma_{e}\) is fixed at one; \(\lambda_{i}(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i}) = \phi(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})/\varPhi(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})\) is the inverse Mills ratio; and ϕ and Φ are the standard normal probability density function and cumulative distribution function, respectively. Equation (7) implies that ignoring the term \(\rho\sigma_{\varepsilon}\lambda_{i}\) would in effect omit a variable from Eq. (3) under censoring. Thus, an ordinary least squares (OLS) regression of Eq. (3) on the relocators-only subset will yield unbiased estimates of β only if ρ = 0 or if the correlation between \(\lambda_{i}\) and \(\boldsymbol{x}_{i}\) is zero. Equation (7) also demonstrates that, if α were known, we could estimate β consistently using the “relocators-only” subset of the survey data by an OLS regression of \(y_{i}\) on \(\boldsymbol{x}_{i}\) and \(\lambda_{i}(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})\). Based on this observation, Heckman’s two-step estimator is calculated by the following procedure:

  1. First, the full sample is employed to estimate a binary probit model using standard maximum-likelihood techniques, to obtain estimates of α, i.e., \(\Pr(d_{i} = 1) = \varPhi(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})\) and \(\Pr(d_{i} = 0) = 1 - \varPhi(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})\).

  2. Next, \(\lambda_{i}(\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i})\) is estimated for each survey respondent i who reported having relocated, using the probit estimates of α.

  3. Finally, the sub-sample of only respondents who relocated is used to estimate β and \(\beta_{\lambda} = \rho\sigma_{\varepsilon}\), by an OLS regression of \(y_{i}\) on \(\boldsymbol{x}_{i}\) and the estimated \(\lambda_{i}\) (see Footnote 2).
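The two-step procedure can be sketched in a few dozen lines. The following is a minimal illustration on simulated data using only the Python standard library, not the software or specification used for the results in this paper. All function names (`probit`, `ols`, `heckman_two_step`), the single covariate per equation, and the "true" parameter values are assumptions made for the example. The probit step uses Newton–Raphson iterations, and no standard-error correction is attempted.

```python
import math
import random

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def inverse_mills(t):
    """lambda(t) = phi(t) / Phi(t), as defined below Eq. (7)."""
    return norm_pdf(t) / norm_cdf(t)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def probit(Z, d, iters=15):
    """Step 1: probit maximum likelihood by Newton-Raphson."""
    k = len(Z[0])
    alpha = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        neg_hess = [[0.0] * k for _ in range(k)]
        for z, di in zip(Z, d):
            q = 2 * di - 1
            index = dot(alpha, z)
            lam = q * inverse_mills(q * index)   # score weight
            w = lam * (lam + index)              # curvature weight (positive)
            for i in range(k):
                grad[i] += lam * z[i]
                for j in range(k):
                    neg_hess[i][j] += w * z[i] * z[j]
        step = solve(neg_hess, grad)
        alpha = [a + s for a, s in zip(alpha, step)]
    return alpha

def heckman_two_step(Z, X, d, y):
    alpha_hat = probit(Z, d)                                   # step 1
    movers = [i for i, di in enumerate(d) if di == 1]
    lam = [inverse_mills(dot(alpha_hat, Z[i])) for i in movers]  # step 2
    X_aug = [X[i] + [l] for i, l in zip(movers, lam)]          # step 3
    coef = ols(X_aug, [y[i] for i in movers])
    return alpha_hat, coef[:-1], coef[-1]

# Simulated data under assumed parameter values (not HBLR survey data).
rng = random.Random(42)
alpha_true, beta_true, rho, sigma_eps = [0.3, 0.8], [1.0, 0.5], 0.6, 1.5
Z, X, d, y = [], [], [], []
for _ in range(8000):
    z, x = rng.gauss(0, 1), rng.gauss(0, 1)
    e, u = rng.gauss(0, 1), rng.gauss(0, 1)
    eps = sigma_eps * (rho * e + math.sqrt(1 - rho ** 2) * u)
    di = 1 if alpha_true[0] + alpha_true[1] * z + e > 0 else 0
    Z.append([1.0, z])
    X.append([1.0, x])
    d.append(di)
    y.append(beta_true[0] + beta_true[1] * x + eps if di else 0.0)

alpha_hat, beta_hat, beta_lambda_hat = heckman_two_step(Z, X, d, y)
print("alpha_hat:", [round(a, 3) for a in alpha_hat])
print("beta_hat:", [round(b, 3) for b in beta_hat])
print("beta_lambda_hat (rho * sigma_eps):", round(beta_lambda_hat, 3))
```

In this sketch the selection covariate is excluded from the outcome equation, which helps to separate \(\lambda_{i}\) from \(\boldsymbol{x}_{i}\); when the two vectors overlap heavily, identification rests mainly on the non-linearity of the inverse Mills ratio.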

The OLS standard error estimates calculated by this estimation procedure require correction, as the error term in Eq. (7) may be heteroskedastic and we use fitted rather than actual values of \(\lambda_{i}\). Furthermore, the resulting estimates are consistent, but not asymptotically efficient (i.e., not minimum variance) under a standard assumption of normality. More efficient estimates can be obtained using the full information maximum-likelihood (FIML) approach, which can be expressed as follows:

$$\ln L = \sum\limits_{d_{i} = 0} \ln\varPhi\left( -\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i} \right) + \sum\limits_{d_{i} = 1} \left[ -\ln\sigma_{\varepsilon} + \ln\phi\left( \frac{y_{i} - \boldsymbol{\beta}^{\prime}\boldsymbol{x}_{i}}{\sigma_{\varepsilon}} \right) + \ln\varPhi\left( \frac{\boldsymbol{\alpha}^{\prime}\boldsymbol{z}_{i} + \rho\sigma_{\varepsilon}^{-1}\left( y_{i} - \boldsymbol{\beta}^{\prime}\boldsymbol{x}_{i} \right)}{\sqrt{1 - \rho^{2}}} \right) \right].$$

Maximizing this likelihood function produces simultaneous estimates of the parameters of both the discrete and continuous dimensions (α, β, ρ, and \(\sigma_{\varepsilon}\)). If ρ = 0, the log likelihood function reduces to the sum of a probit log likelihood and that of a standard linear regression with normal errors, which can each be estimated separately. In comparison to the two-step Heckman procedure described above, the FIML estimator is computationally intensive, because the optimal parameter values must be found numerically. Starting values that are reasonably close to the true parameter values are therefore required. In this study, the final values of the Heckman two-step estimation procedure were used as the starting values for the FIML procedure.
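As an illustration of the FIML objective, the log likelihood above can be evaluated directly for given parameter values. The sketch below is a plain transcription of that expression using only the Python standard library; the function name `fiml_loglik` and the four-record toy dataset are hypothetical, and no optimizer is included. It also checks numerically the ρ = 0 decomposition noted in the text.

```python
import math

def norm_logpdf(t):
    return -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fiml_loglik(alpha, beta, rho, sigma_eps, Z, X, d, y):
    """Log likelihood of the Type II Tobit model for given parameters."""
    total = 0.0
    for z, x, di, yi in zip(Z, X, d, y):
        az = dot(alpha, z)
        if di == 0:
            # Non-relocators contribute ln Phi(-alpha'z).
            total += math.log(norm_cdf(-az))
        else:
            # Relocators contribute the density of y and the conditional
            # probability of selection given the outcome residual.
            r = (yi - dot(beta, x)) / sigma_eps
            total += (-math.log(sigma_eps) + norm_logpdf(r)
                      + math.log(norm_cdf((az + rho * r)
                                          / math.sqrt(1.0 - rho ** 2))))
    return total

# Hypothetical toy records: intercept plus one covariate in each equation.
Z = [[1.0, 0.5], [1.0, -1.2], [1.0, 0.3], [1.0, 1.1]]
X = [[1.0, 2.0], [1.0, 0.0], [1.0, 1.0], [1.0, 3.0]]
d = [1, 0, 1, 0]
y = [0.8, 0.0, 0.4, 0.0]
alpha, beta, sigma_eps = [0.1, 0.7], [0.2, 0.25], 0.5

ll_rho0 = fiml_loglik(alpha, beta, 0.0, sigma_eps, Z, X, d, y)

# With rho = 0 the likelihood separates into a probit part and a
# normal linear-regression part.
probit_part = sum(math.log(norm_cdf((2 * di - 1) * dot(alpha, z)))
                  for z, di in zip(Z, d))
regression_part = sum(-math.log(sigma_eps)
                      + norm_logpdf((yi - dot(beta, x)) / sigma_eps)
                      for x, di, yi in zip(X, d, y) if di == 1)
print(math.isclose(ll_rho0, probit_part + regression_part))
```

In practice this function would be passed (negated) to a numerical optimizer, started from the two-step estimates as described above.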

4 Estimation results

Prior to undertaking estimation of the statistical model, the degree of correlation between each pair of the candidate independent variables was calculated. All such correlation coefficients were found to be smaller than 0.40; it was therefore determined that explicit correction for multicollinearity was not necessary.
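A pairwise screening of this kind can be sketched as follows. This is a minimal illustration, assuming Pearson correlation and a comparison of absolute values against the 0.40 threshold; the variable names and toy values are hypothetical and are not drawn from the survey data.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def flag_correlated_pairs(columns, threshold=0.40):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical candidate variables (toy values for illustration only).
candidates = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "cars": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "age_band": [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
}
flagged = flag_correlated_pairs(candidates)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

An empty result from such a screen corresponds to the situation reported here, in which no pair of candidate variables reached the threshold.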

A structured specification search was then undertaken, using the standard Akaike Information Criterion (AIC) to select between alternative candidate specifications. AIC is a metric of global goodness-of-fit which penalizes added parameters, and is widely used to determine objectively whether the improved goodness-of-fit due to adding free parameters is warranted on the grounds of information theory [12].
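For reference, AIC is computed as 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log likelihood, with lower values preferred. The short sketch below applies this to two candidate specifications; the specification labels and log likelihood values are hypothetical and are not taken from Table 2.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: (label, maximized log likelihood, free parameters).
candidate_specs = [
    ("base", -100.0, 5),
    ("base + one extra variable", -99.5, 6),
]
scores = {label: aic(ll, k) for label, ll, k in candidate_specs}
preferred = min(scores, key=scores.get)
for label, score in scores.items():
    print(f"{label}: AIC = {score:.1f}")
print("preferred:", preferred)
```

In this toy comparison the extra parameter raises the log likelihood by only 0.5, which is less than the penalty of one unit of log likelihood per added parameter, so the smaller specification is retained.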

The preferred model specification and resulting parameter estimates are presented in Table 2 (see Footnote 3). Positive coefficients in the move/do not move dimension indicate that the relevant variable has a positive effect, ceteris paribus, on the likelihood of a survey respondent having relocated to the HBLR corridor within the prior 5 years. Likewise, positive coefficients in the distance between LRT station and new residence dimension indicate that this distance increases, ceteris paribus, with the value of that variable.

Table 2 Estimation results

As shown in Table 2, the majority of parameter estimates are statistically significant at the 0.05 significance level. The signs of parameter estimates are consistent with a priori expectations, as discussed in the remainder of this section. Further, the structural parameters and are both estimated to be statistically different than 1.0 (t = 2.12, 2.34, respectively), indicating the presence of statistically significant correlation between the error terms of the discrete and continuous models.

Household income was found to relate negatively, net of confounding effects, with the likelihood of having relocated in the prior 5 years; in other words having a lower household income was associated with a lower level of residential stability. Likewise, having a higher household income was associated with shorter distances between the new residences of residential-movers and the nearest LRT station. The fact that higher income groups chose to live closer to the stations of the LRT network is consistent with access to the LRT being a normal good, though this analysis cannot determine this definitively. Likewise, this result is also consistent with bid-rent theory, in which locations with higher accessibility are allocated to the land use that is willing/able to pay the most to occupy them [13].

Being under age 35 was associated with the highest propensity to have relocated within the prior 5 years, and also with the greatest propensity to relocate near to the access points to the LRT system. These findings are consistent with a more general pattern of younger adults having greater residential mobility (cf. [14]), and also being more sensitive to the accessibility afforded by the LRT system. Similarly intuitive effects were found with respect to household size and the presence of children.

Automobile ownership was, by contrast, negatively associated with propensity to have relocated and, conditional on having moved, with proximity to the nearest LRT station. Having driven to access the LRT system for the surveyed journey was also negatively linked with having relocated (relative to having walked or taken another form of public transit). However, conditional on having relocated, the largest distance between new home and nearest LRT station was associated with having used transit to access the LRT system; this distance was longer than for both driving and walking.

Finally, Table 2 shows that the length of time that one has been using the LRT system is not significantly associated with the propensity to have relocated within the prior 5 years (t = 1.51), though it is positively associated with the distance between [new] home and LRT station. Put another way, this suggests that people who started using the LRT service more recently are likely to live closer to the LRT system’s stations than people who have been using it for a longer period of time (all else equal).
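The t-statistics quoted in this section can be related to the 0.05 threshold with a short calculation. The sketch below assumes a large-sample normal approximation to the t distribution, which seems reasonable for a sample of this size but is an assumption made here rather than a statement of how the reported tests were computed; the function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

# t-statistics quoted in the text of this section.
for t_stat in (1.51, 2.12, 2.34):
    p = two_sided_p(t_stat)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"t = {t_stat:.2f}: p = {p:.3f} ({verdict} at 0.05)")
```

Under this approximation a t-statistic of 1.51 falls short of the 0.05 threshold, whereas 2.12 and 2.34 exceed it, in line with the interpretation given in the text.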

5 Conclusions

This paper contributes to the body of literature regarding the effects of public transport investments on residential property markets. Specifically, we investigated the effects of a new LRT system (the HBLR system in New Jersey, US) on: (1) the likelihood of LRT passengers having relocated in the prior 5 years and (2) the proximity of the ‘new’ residential location to the LRT system’s stations. Given our a priori expectation of correlation in unobserved effects across these two dimensions of analysis, we employed a simultaneous discrete–continuous specification (a Type II Tobit). The empirical results suggest that the residual error terms were indeed correlated across these two dimensions; failing to take this correlation into account would have yielded biased and inefficient parameter estimates (i.e., effects).

The substantive results indicate that, among the sample of surveyed LRT passengers (n = 1,023), small household size, low income, being a young adult (under age 35) and low household car ownership are each independently associated with heightened propensity to have residentially relocated during the prior 5 years. Subsequently, conditional on having relocated, these same characteristics (with the exception of low income) are associated with relocating in close proximity to the LRT network’s access points (stations).

Further research will be required to establish whether the empirical findings reported here are indicative of generally applicable relationships, or are idiosyncratic to the HBLR system and/or the dynamics of the local real estate market in which it is located. For instance, the HBLR serves in part as a connecting service to heavy rail and ferry services to Manhattan’s labor market and cultural destinations. It would therefore be worthwhile to establish whether or not similar functional relationships hold for LRT systems that comprise the core of their urban area’s transit network (e.g., Portland, Oregon’s MAX system or Calgary’s C-Train network), and for urban areas with characteristics different from those of the New York/New Jersey metropolitan region.