# Commuting patterns: the flow and jump model and supporting data


## Abstract

A simple model, named the flow and jump model (FJM), is used to describe commuter fluxes at different distances. The model is based on a master equation that allows a local net probability flow and non-local jumps. FJM is in principle a one-parameter model; however, we find that fixing this parameter yields a parameter-free model, similar to the radiation model. We find that FJM offers an improved description of commuting data from the USA, Italy and Hungary. For a special choice of the model parameter, FJM reduces to the radiation model.

## Keywords

Human mobility models, Commuters data, Population density

## List of abbreviations

- GM: Gravity Model
- RM: Radiation Model
- RMwS: Radiation Model with Selection
- TCORM: Travel Cost Optimized Radiation Model
- FJM: Flow and Jump Model

## 1 Introduction

Commuter mobility patterns are the focus of many recent studies. By its nature the problem belongs to human geography, sociology and economics. Nowadays, however, researchers from many other fields have become interested in the topic. This interest can be explained by the fact that many large electronic datasets have become available, making it possible to test both the assumptions and the main results of the models. Treating commuting as a special case of human mobility, statisticians and data scientists are interested in universal patterns that govern the commuter fluxes at different spatial scales. Physicists and mathematicians are interested in simple models capable of explaining the observed patterns. A detailed account of the state of the art in the field of human mobility is given in the recent review article of Barbosa et al. [1].

Models for commuter fluxes, motivated phenomenologically by simple socio-economic or probabilistic arguments, were proposed as early as 1940 by Stouffer (*the intervening opportunities model*) [2] and in 1960 by Block & Marschak (*the random utility model*) [3]. Analogies with classical physics phenomena were exploited by the very popular *gravity* and *generalized potential models* [4].

From a modeling perspective, the *radiation model* introduced by Simini et al. [5] represented a great leap in the understanding of human mobility patterns. In contrast with earlier models that were argued phenomenologically, the radiation model starts from a basic socio-economic optimization assumption and derives a simple and compact formula for commuter fluxes. Relative to earlier models, the compact result derived in [5] also has the advantage of being parameter free. When compared with real commuting and population data, however, the model contains an undetermined proportionality constant that connects the population to the number of available jobs. In this sense one can argue that it is a model with one free parameter. Other models of similar complexity, also built on realistic assumptions, are the population-weighted opportunities model (PWO) [6] and a newer version of it in which memory effects are also considered [7]. Recently a new parameter-free model was introduced by Liu and Yan [8]. Their basic assumption is that individuals select destination locations that present higher opportunity benefits than those at the origin and than the intervening opportunities between the origin and destination.

The radiation model was generalized to continuous population distributions [9] and was also made more realistic by allowing a realistic job selection for the individuals. This new model, *the radiation model with selection*, offered an improved fit for the commuting data in the USA. A further improvement of the simple radiation model takes into account the travel cost involved in commuting (see for example [10]). Along this line, the *travel cost optimized radiation model* that we introduced recently [11] offered an improved description of the commuting fluxes in Hungary. The main drawback of all these later generalizations of the radiation model is that the parameter-free beauty of the radiation model is lost.

Here we offer a further generalization of the original radiation model, and demonstrate its advantages relative to the earlier models using large-scale population density and commuter flux data from the USA, Italy and Hungary. The appealing aspect of this generalization is that our model is again parameter free, since the only fitting parameter is fixed to a universal value.

## 2 Modelling framework

The *gravity model (GM)* is probably the best-known approach for describing commuter fluxes between cities empirically [12]. It is based on a phenomenological analogy with gravity, assuming that the interaction between two regions or cities is inversely proportional to the distance raised to a positive power and directly proportional to the size of the two regions/cities. Contrary to what is usually believed, GM is not only a simple analogy: there are also theoretical arguments in favour of it. The oldest is probably the one using the maximal entropy hypothesis [13, 14]. Other successful attempts are based on the principle of utility maximization in economics. Both deterministic [15, 16] and random utility theories [17] were considered.

*i* and *j* is written as:

*i* and by \(r_{i,j}\) the distance between settlements *i* and *j*. \(F(x)\) is an arbitrary monotonically increasing kernel function, and *α* and *β* are fitting exponents. From the \(f_{i}(j)\) data one can also compute the \(P^{i}_{>}(W)_{\mathrm{GM}}\) probability, that a worker living in location *i* commutes to a location that is outside of a disk containing a population *W* and centred at its home:

*i* and reaching to location *j*. Now, the \(P_{>}(W)_{\mathrm{GM}}\) probability that commuters travel to work at a distance where they pass a disk with population *W* is:

*α* and *β* exponent values.

*original radiation model (RM)* [5] is based on the simple assumption that jobseekers optimize their income by accepting the closest job offer that pays a better salary than the one available at their current address. Assuming a \(p_{\le}(z)\) distribution function for the incomes in the studied society, the probability \(P_{>}(z|n)\) that a person with income *z* refuses the closest *n* jobs is:

*n* jobs, \(P_{>}(n)\), can be calculated as:

*W* population (\(n=\mu W\)), the radiation model predicts the probability that a person commutes to a location that is outside of a disk centered on their current location and containing a population *W*:

*q*, the simple argument presented above can be generalized [9] (*radiation model with selection (RMwS)*), leading to a result with two fitting parameters (\(q,\mu\)):

*travel cost optimized radiation model (TCORM)* takes into account the fact that travel costs are distance dependent, so in addition to the transited jobs the travel distance *r* has to be considered when applying the arguments used in the radiation model. Assuming an exponential kernel for the income distribution, and repeating the arguments of the original radiation model [11], one arrives again at a result with two fitting parameters:

*λ* fitting parameter incorporates the value of *μ*, the value of a proportionality constant between the travelled distance and the cost of travel, and a third constant governing the shape of an assumed exponential-type income distribution [11].
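The radiation-model argument above can be checked numerically. The sketch below assumes, as the text does, that the current income and all job offers are drawn independently from the same income distribution; integrating \([p_{\le}(z)]^{n}\) over that distribution then gives \(1/(1+n)\), whatever the distribution is. The function names, the uniform distribution, the seed and the number of trials are illustrative choices, not part of the original study.

```python
import random


def refusal_probability(n_jobs, trials=20000, seed=0):
    """Estimate the chance that a worker refuses the n_jobs closest offers.

    The current income and every offer are drawn independently from the
    same distribution (uniform here; only the ranking of the draws matters).
    """
    rng = random.Random(seed)
    refused = 0
    for _ in range(trials):
        income = rng.random()
        # An offer is refused when it does not beat the current income.
        if all(rng.random() <= income for _ in range(n_jobs)):
            refused += 1
    return refused / trials


def rm_prediction(n_jobs):
    """Closed form obtained by integrating p**n over p in [0, 1]."""
    return 1.0 / (1.0 + n_jobs)


for n in (1, 4, 9):
    print(n, refusal_probability(n), rm_prediction(n))
```

With \(n=\mu W\), the same closed form becomes \(1/(1+\mu W)\), which is the form assumed for RM in this sketch.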

*Flow and Jump Model (FJM)*. Following the assumptions of the recently introduced *growth and reset type models* (for a review please consult [18]) we assume now an inverse process: a backward probability flow supplemented by a jump process from the origin to any state with a given *n* value. The discrete version of the process is depicted in Fig. 1. The continuous master equation has the form:

*n* state. For the state-dependent \(\eta(n)\) and \(\gamma(n)\) rates we now consider simple kernels that make sense for the commuting process. The transitions \(0\rightarrow n\), governed by the \(\gamma(n)\rho(n,t)\) rates, describe the probability that workers choose a commuting job. \(\gamma(n)\) should decrease with distance (or correspondingly with *n*), and the proportionality with \(\rho(n,t)\) suggests that where there are already many commuters there should also be many good jobs, so the location is attractive to commuters.

*C* is a constant which also fixes the time unit in the dynamical equation (9). The backward flow characterizes the tendency of commuters to search for appropriate jobs that are closer to their homes, accepting with a higher probability jobs that bring them closer to home. This net flow is described by the \(\eta(n)\) terms. The simplest choice that leads to a final equilibrium distribution is:

*a* it becomes, similarly to RM, a one-parameter model.

## 3 Data source and format

For the *USA* we processed a complete commuter and population database. We analyzed estimated population census data between 2006 and 2010 [20] using \(Q = 73\text{,}803\) settlements (nodes) (white circles in Fig. 2) and \(4\text{,}156\text{,}426\) commuter routes (edges) (blue lines between white circles in Fig. 2). We use the same dataset as the one used in [21], where the authors attempted a region-like geographic division of the USA based on commuting patterns. For studying the geographical population distribution we used a database for the years 2006 to 2010 giving the estimated population of the continental USA divided into \(11\text{,}078\text{,}286\) cells of 1 km^{2} area [22]. We now detail the three data subsets that we constructed and that are the input for our calculations:

a. *Settlement data* where the *settlement code* and their *latitudes* and *longitudes* are given. In the case of USA, the total number of settlements is \(Q = 73\text{,}803\). These geographical locations are the source and targets for commuting. The data is in the format given below:

| Settlement code | Latitude | Longitude |
|---|---|---|
| 1 | 32.4771763256 | −86.4901731173 |
| 2 | 32.474292121 | −86.4733798888 |
| 3 | 32.4754563613 | −86.460168641 |

b. *Commuting data*, containing the sources and targets of \(4\text{,}156\text{,}426\) directed travels to work. The data has the following structure: the first and second columns contain the *source* and *target settlement* codes and the third column gives the *number of commuters*. Below we illustrate the format of this data:

| Source settlement code | Target settlement code | Number of commuters |
|---|---|---|
| 9719 | 9719 | 20,950 |
| 9703 | 9719 | 785 |
| 29719 | 29719 | 540 |
| 69719 | 69719 | 490 |
| 69719 | 69720 | 480 |
| 9711 | 9719 | 465 |

c. *Population distribution data*. The original dataset contains \(11\text{,}078\text{,}286\) square cells of 1 km^{2} area, each with its *population* and the *latitude* and *longitude* of its middle point. In order to speed up our calculations we have spatially renormalized this data and obtained a coarser resolution with 4 km^{2} size cells. This is done by collapsing the data of four neighbouring cells and averaging their latitudinal and longitudinal coordinates. As a result we ended up with \(1\text{,}230\text{,}920\) cells containing a total population \(W=308\text{,}745\text{,}231\). The data we have worked with has the following structure:

| Population | Latitude | Longitude |
|---|---|---|
| 18.0 | 51.8642065666 | −176.664361722 |
| 30.0 | 51.8621521667 | −176.6534376 |
| 9.0 | 51.8700427111 | −176.644826767 |
| 0.0 | 51.8704367889 | −176.633367733 |
| 7.0 | 51.8785901 | −176.629460933 |
| 219.0 | 51.8383112778 | −176.512803256 |
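The coarse-graining step described for the population data can be sketched as follows. This is only an illustration of collapsing a group of neighbouring cells by summing populations and averaging coordinates; the helper name `merge_cells`, the way neighbours are grouped, and the numbers in `block` are hypothetical and are not taken from the dataset.

```python
def merge_cells(cells):
    """Collapse neighbouring cells into one coarse cell.

    cells: iterable of (population, latitude, longitude) tuples.
    Returns the summed population and the unweighted mean coordinates.
    """
    cells = list(cells)
    if not cells:
        raise ValueError("need at least one cell")
    population = sum(pop for pop, _, _ in cells)
    latitude = sum(lat for _, lat, _ in cells) / len(cells)
    longitude = sum(lon for _, _, lon in cells) / len(cells)
    return population, latitude, longitude


# Hypothetical block of four neighbouring cells (invented values).
block = [
    (5.0, 40.000, -100.000),
    (7.0, 40.000, -99.988),
    (0.0, 40.009, -100.000),
    (6.0, 40.009, -99.988),
]
coarse = merge_cells(block)
print(coarse)
```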

From the above three datasets one can compute the \(P_{>}(W)\) dependency. For the USA we have used yet another dataset to verify the linear proportionality between the number of job openings and the total population of a geographical region. For this we obtained the number of listed jobs for each state of the continental USA using the site [23]. On the day we processed the data (12.02.2018) we found a total of \(2\text{,}596\text{,}391\) jobs. The population of the states was obtained using the estimate between 2006 and 2011, available on the Internet [22].

Apart from the large-scale data available for the USA, we have used two smaller datasets for Hungary and Italy. These two additional datasets contain the same three data subsets: settlement data, commuting data and population distribution data. The population distribution data was used in its original form, with cells of 1 km^{2} size.

For *Hungary* we used the same commuting data as in [11]. The commuting data covers \(Q = 3176\) settlements and contains \(81\text{,}664\) commuter routes [24]; the spatial distribution of the population is for the \(W = 9\text{,}972\text{,}000\) total inhabitants [25] as measured in the 2011 population census.

The data for *Italy* contains \(Q = 8093\) settlements and \(556\text{,}120\) commuter routes, and comes from the 2011 Italian population census [26]. The total population \(W = 55\text{,}605\text{,}065\) is mapped in cells of 1 km^{2} area [27].

## 4 Data processing

During the data processing, we select the settlements *i* one by one as the source of commuting and construct the disks with radius \(d(i,j)\) reaching the target settlement *j*. This is illustrated schematically in Fig. 2. We count the total population \(w_{i}[j]\) inside this disk and record the number of commuters \(f_{i}(j)\) starting from settlement *i* and traveling to settlement *j*.
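The disk construction can be sketched in a few lines. The text does not state how \(d(i,j)\) is measured, so this sketch assumes a great-circle (haversine) distance on a sphere with the mean Earth radius; the function names and the brute-force scan over all cells are illustrative assumptions rather than the procedure actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding for nearly antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def disk_population(origin, target, cells):
    """Population w_i[j] inside the disk centred on origin that reaches target.

    origin, target: (latitude, longitude) of settlements i and j.
    cells: iterable of (population, latitude, longitude) tuples.
    """
    radius = great_circle_km(origin[0], origin[1], target[0], target[1])
    return sum(
        pop for pop, lat, lon in cells
        if great_circle_km(origin[0], origin[1], lat, lon) <= radius
    )
```

Scanning every cell for every settlement pair is only practical for small samples; for datasets of the size described here some form of spatial indexing or pre-sorting by distance would presumably be needed.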

Having the data \(d(i,j)\), \(f_{i}(j)\), and \(w_{i}[j]\) for all the settlement pairs \((i,j)\) we compute the experimental \(P_{>}(W)\) probabilities.

*i* are denoted by \(N_{i}\).

We ordered the settlements according to their distance from *i*. Let \(h_{i}^{[k]}\) be the index of the settlement that is the *k*th one in this ordering (for example, \(h_{i}^{[1]}\) is the index of the settlement that is the closest to settlement *i* and \(h_{i}^{[2]}\) is the index of the settlement that is the second closest to *i*). We denote by \(s(i,w)\) the smallest number of settlements for which the population inside a disk centered on *i* becomes larger than or equal to *w*.

*Q* denotes the total number of settlements and *W* is the total population in the studied territory.) If no such number exists, then we will consider \(s(i,w) = Q\).
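A minimal sketch of the rank \(s(i,w)\) for a single origin settlement is given below. It assumes that the disk populations are already listed in order of increasing distance from *i*, so that the list is non-decreasing and a binary search applies. The helper `share_beyond` is a hypothetical illustration of how such a rank could be turned into an empirical fraction of commuters; whether the boundary case uses "larger than" or "larger than or equal to" there is an assumption of this sketch.

```python
from bisect import bisect_left


def smallest_rank(disk_populations, w, total_settlements):
    """Return s(i, w) for one origin settlement i.

    disk_populations[k - 1] is the population inside the disk that reaches
    the k-th closest settlement, so the list is non-decreasing.
    """
    k = bisect_left(disk_populations, w)  # first index with population >= w
    if k == len(disk_populations):
        return total_settlements  # convention from the text: s(i, w) = Q
    return k + 1  # ranks are 1-based


def share_beyond(disk_populations, commuters, w):
    """Share of commuters from i whose destination disk holds at least w people."""
    total = sum(commuters)
    if total == 0:
        return 0.0
    k = bisect_left(disk_populations, w)
    return sum(commuters[k:]) / total


# Invented example: four destinations ordered by distance from i.
pops = [100, 250, 250, 900]
flows = [10, 5, 5, 20]
print(smallest_rank(pops, 250, 5), share_beyond(pops, flows, 250))
```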

*i* are transiting a disk with population *W* inside it can be written as:

*i*, as illustrated in Fig. 3 for a given town in the USA.

*i* settlements:

## 5 Results and discussions

*μ*. Boundary effects become important for large *W* values (the disks centred on the settlements become largely incomplete because they extend over the borders of the continental USA). To minimize these effects we considered the data only up to \(W_{\mathrm{max}}= 1\text{,}000\text{,}000\). Also, to eliminate very short commuting routes (where commuting is questionable) we have imposed a lower threshold of \(W_{\mathrm{min}}=1000\). Fitting was carried out on the \([W_{\mathrm{min}},W_{\mathrm{max}}]\) interval using the nonlinear fitting features of the Wolfram Mathematica® software. For the GM model, equation (3) does not lead to a compact functional form, so fitting was carried out with a progressive mesh method for various *α* and *β* values in the intervals \(\alpha\in[-1.0,2.5]\) and \(\beta\in[-1.0,2.5]\). The best-fit parameters and the goodness of the fits (\(R^{2}\) correlation coefficient) are summarized in Table 1.
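As a rough illustration of a one-parameter fit on the \([W_{\mathrm{min}},W_{\mathrm{max}}]\) interval, the sketch below fits the radiation-model form \(1/(1+\mu W)\) by a repeatedly refined log-spaced grid search, in the spirit of the mesh method mentioned for GM. It is not the Mathematica procedure used in the study: the function names, the grid settings, the definition of \(R^{2}\) as one minus the ratio of residual to total sum of squares, and the noise-free synthetic data are all assumptions made for this example.

```python
import math


def rm_curve(w, mu):
    """Radiation-model form 1 / (1 + mu * W) assumed in this sketch."""
    return 1.0 / (1.0 + mu * w)


def sum_sq(ws, ps, mu):
    """Sum of squared residuals between the data and the curve."""
    return sum((p - rm_curve(w, mu)) ** 2 for w, p in zip(ws, ps))


def fit_mu(ws, ps, lo=1e-8, hi=1e-2, rounds=6, points=41):
    """Refine a log-spaced grid around the best mu a few times."""
    best = lo
    for _ in range(rounds):
        step = (math.log(hi) - math.log(lo)) / (points - 1)
        grid = [lo * math.exp(k * step) for k in range(points)]
        best = min(grid, key=lambda mu: sum_sq(ws, ps, mu))
        # Zoom in on the two grid intervals around the current best value.
        lo, hi = best * math.exp(-step), best * math.exp(step)
    return best


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot; one common definition, assumed here."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Synthetic, noise-free data on the fitting interval (illustrative mu value).
W_MIN, W_MAX = 1000.0, 1_000_000.0
ws = [W_MIN * (W_MAX / W_MIN) ** (k / 49) for k in range(50)]
ps = [rm_curve(w, 3.08e-5) for w in ws]
mu_fit = fit_mu(ws, ps)
r2 = r_squared(ps, [rm_curve(w, mu_fit) for w in ws])
print(mu_fit, r2)
```

The same grid-refinement idea extends to two parameters by searching over a product grid, at the cost of many more curve evaluations.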

RM | RMwS | TCORM | GM | FJM | ||||
---|---|---|---|---|---|---|---|---|

| | | | | | | | |

USA data | 0.0000308 | 0.0000308 | 1.0 | 0.000119 | 0.0056 | 1.2 | 1.2 | 0.000062 |

\(\mathbf{R}^{2}\) | 0.971 | 0.971 | 0.992 | 0.993 | 0.993 |

*a*. We will show however that one can fix this parameter and also get an excellent fit on other datasets (Italy and Hungary). The clear improvement in fitting the data relative to the RMwS and TCORM models is however a great leap forward, since these models offer a two-parameter fit. It is important to notice that *GM* also offers a good fit. This is again a two-parameter fit, but we will show in the following, on other datasets, that one cannot fix any of these parameters and retain a fit quality comparable with FJM.

*α* and *β* so that all datasets are reasonably well fitted. The negative value obtained for *α* is more than strange, and suggests again that the GM model is seemingly not appropriate for fitting the Italian commuting data.

## 6 Conclusions

In order to describe the statistics of commuter fluxes at different distances we introduced the FJM model, based on a mean-field type dynamical approach. The model indirectly takes into account that commuting over larger distances is costly and less probable. Relative to the classical models it offers an improved fit for commuter fluxes in the USA, Hungary and Italy. The probability that commuters travel to their jobs over a population *W* is given compactly by equation (16). The model is a two-parameter one, although we have shown that one parameter can be fixed so that all studied datasets are reasonably well explained. In this sense the model becomes, like the RM model, a one-parameter one, and improves on the RM model considerably.

To comment on the results obtained for the USA, Italy and Hungary, we review from Table 2 the best-fit parameter *μ* obtained with the FJM model. The parameter *μ* characterises both the availability of jobs per population and the attractiveness of these jobs to jobseekers. A higher value of *μ* suggests that there are many jobs relative to the population, and that jobseekers are aware of them and consider them for potential commuting. A smaller *μ* value suggests that the number of available jobs per population is smaller and that jobseekers are very selective about commuting. The fitted values of *μ* are in good agreement with this heuristic interpretation and confirm the known social and economic profiles of the USA, Italy and Hungary. Commuting is more common in the USA than in Europe, and there are more available commuting jobs per population. Regarding the value of the *a* exponent in equation (16), one can also draw some interesting conclusions. The difference from the original radiation model (where we have \(a=2\)) points to an already known issue: commuters are selective, not all available jobs are acceptable to them, and travel cost has to be taken into account when accepting a commuting job [7, 8, 9, 10, 11]. Because of this, the \(C/\eta\) value is smaller than the one for a simple salary optimization mechanism in which commuters accept the closest job that improves on their salary at home (the assumption of RM). This can be achieved either by lowering the *C* constant or by increasing the value of *η*, or by changing both. The seemingly universal value of \(a=7/4\) nevertheless remains a puzzle that motivates further studies.

In conclusion, we believe that the FJM model proposed in the present study rests on simple and reasonable assumptions and that the studied experimental data support its predictions.

## Notes

### Acknowledgements

Not applicable.

### Availability of data and materials

The data used are available on the Internet by following the links from the “Data source and format” and “Data processing” sections. The processed data, in the format indicated in the “Data source and format” section, are available in the Figshare repository, doi:10.6084/m9.figshare.6151130, URL: https://figshare.com/s/b86965bb06ce018f52bf. The repository contains the figs_data.zip file with the data used for plotting the figures, and the Hungary.zip, Italy.zip and USA.zip files with the processed data for Hungary, Italy and the USA, respectively. The fig2.gdf file is a GUESS Graph Data Format file, which is editable in a simple text editor.

### Authors’ information

Z.N. is a professor of theoretical physics, working in the area of interdisciplinary applications of statistical physics. He uses both analytical and computational models to understand complex phenomena from physics, economics, biology and sociology. G.T. is a senior investigator at the Hungarian Central Statistical Office. He is a specialist in geographical data collection and in handling and processing large geographical datasets. L.V. is a PhD student in computational physics with a strong background in computer science and informatics.

### Authors’ contributions

ZN designed the study, elaborated the FJM model and wrote the first version of the manuscript. LV analyzed the data and drew the figures. GT collected the data, interpreted them and put them into the desired form. All authors worked on the final version of the manuscript. All authors read and approved the final manuscript.

### Funding

Work supported by the Romanian Research Council UEFISCDI, Romania through grant Nr: PN-III-P4-PCE-2016-0363.

### Competing interests

The authors declare that they have no competing interests.

## References

- 1. Barbosa-Filho H, Barthélemy M, Ghoshal G, James RC, Lenormand M, Louail T, Menezes R, Ramasco JJ, Simini F, Tomasini M (2018) Human mobility: Models and Applications. Physics Reports. https://doi.org/10.1016/j.physrep.2018.01.001
- 2. Stouffer SA (1940) Intervening opportunities: a theory relating mobility and distance. Am Sociol Rev 5:845–867
- 3. Block H, Marschak J (1960) Random orderings and stochastic theories of responses. In: Contributions to probability and statistics, vol 2, pp 97–132
- 4. Lukermann F, Porter PW (1960) Gravity and potential models in economic geography. Ann Assoc Am Geogr 50:493–504
- 5. Simini F, González MC, Maritan A, Barabási AL (2012) A universal model for mobility and migration patterns. Nature 484:96–100
- 6. Yan XY, Zhao C, Fan Y, Di Z, Wang WX (2014) Universal predictability of mobility patterns in cities. J R Soc Interface 11:20140834
- 7. Yan XY, Wang WX, Gao ZY, Lai YC (2017) Universal model of individual and population mobility on diverse spatial scales. Nat Commun 8:1639
- 8. Liu E, Yan XY New parameter-free mobility model. Preprint. arXiv:1808.06363
- 9. Simini F, Maritan A, Néda Z (2013) Human mobility in a continuum approach. PLoS ONE 8(3):e60069
- 10. Ren Y, Ercsey-Ravasz M, Wang P, Gonzales MC, Toroczkai Z (2014) Predicting commuter flows in spatial networks using a radiation model based on temporal ranges. Nat Commun 5:5347
- 11. Varga L, Tóth G, Néda Z (2017) An improved radiation model and its applicability for understanding commuting patterns in Hungary. Reg Statist 6(2):27–38
- 12. Stefanouli M, Polyzos S (2017) Gravity vs radiation model: two approaches on commuting in Greece. Transp Res Proc 24:65–72
- 13. Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1:253–269
- 14. Hua CI, Porell F (1979) A critical review of the development of the gravity model. Int Reg Sci Rev 4(2):97–126
- 15. Sheppard ES (1978) Theoretical underpinnings of the gravity hypothesis. Geogr Anal 10(4):386–402
- 16. Niedercorn JH, Bechdolt BV (1969) An economic derivation of the “gravity law” of spatial interaction. J Regional Sci 9(2):273–282
- 17. Domencich T, McFadden DL (2015) Urban travel demand: a behavioral analysis. North-Holland, Amsterdam
- 18. Biró TS, Néda Z (2018) Unidirectional random growth with resetting. Physica A 499:355–361
- 19. Thurner S, Kyriakopoulos F, Tallis C (2007) Unified model for network dynamics exhibiting nonextensive statistics. Phys Rev E 76:036111
- 20. CTPP 2006–2010 Census Tract Flows, Commuting data, American Community Survey. https://www.fhwa.dot.gov/planning/census_issues/ctpp/data_products/2006-2010_tract_flows/
- 21. Dash Nelson G, Rae A (2016) An economic geography of the United States: from commutes to megaregions. PLoS ONE 11(11):e0166083
- 22. 2006–2010 Population distribution, American Community Survey. https://www.census.gov/geo/maps-data/data/tiger-data.html
- 23. 2018 USA job openings accessed at 10.02.2018. https://www.indeed.com/
- 24. 2011 Census Tract Flow, Commuting data, Hungary. http://www.ksh.hu
- 25. 2011 Population distribution, Hungary. http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip
- 26. 2011 Census Tract Flow, Commuting data, Italy. http://www.istat.it/storage/cartografia/matrici_pendolarismo/matrici_pendolarismo_2011.zip
- 27. 2011 Population distribution, Italy. http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.