A multi-source data fusion framework for joint population, expenditure, and time use synthesis

Hawkins, Jason; Habib, Khandker Nurul

doi:10.1007/s11116-022-10279-8

Published: 26 March 2022

Transportation, Volume 50, pages 1323–1346 (2023)

Abstract

Data are important components of any research; however, it is often the case that the required data are not readily available. Researchers often fuse multiple datasets to obtain the data required to complete their work. In urban simulation, spatially referencing data is of paramount importance to capture local variations in travel and preferences. Data fusion typically obfuscates the spatial reference by merging records from different locations. Population synthesis is used to match these fused household records to a plausible location based on aggregate sociodemographic statistics. In some cases, researchers must also synthesize the necessary data. This paper outlines a data fusion workflow for a statistically valid synthetic population for use in urban simulation models. We develop the framework for the case of household-level expenditure and individuals’ time use patterns. The Greater Toronto Area (GTA) in Canada is used as a testbed. The results of the data fusion and synthesis are validated against statistics from a large-sample travel survey conducted in the GTA, showing a good fit with the validation dataset. Finally, we outline how the framework could be applied in other contexts where a single dataset is unavailable.
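As a generic illustration of the matching step described above (fitting fused household records to aggregate sociodemographic statistics for a location), the sketch below rakes record weights with iterative proportional fitting. This is a standard technique swapped in for illustration, not necessarily the method used in the paper; the function name `rake_weights`, the attributes `size` and `income`, and all totals are hypothetical.

```python
def rake_weights(records, marginals, iterations=50, tol=1e-9):
    """Rake record weights so weighted category totals match zone marginals.

    records:   list of dicts, one per fused household record (hypothetical attributes)
    marginals: {attribute: {category: target_total}} for a single zone
    Assumes every record's category appears in the marginals for that attribute.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        max_gap = 0.0
        for attr, targets in marginals.items():
            # Current weighted total per category of this attribute
            totals = {cat: 0.0 for cat in targets}
            for rec, w in zip(records, weights):
                totals[rec[attr]] += w
            # Track the largest mismatch seen before rescaling
            max_gap = max(max_gap, max(abs(targets[c] - totals[c]) for c in targets))
            # Scale each record by target / current total for its category
            for i, rec in enumerate(records):
                cat = rec[attr]
                if totals[cat] > 0:
                    weights[i] *= targets[cat] / totals[cat]
        if max_gap < tol:
            break
    return weights


# Hypothetical fused records and zone-level marginals (illustrative values only)
records = [
    {"size": "1", "income": "low"},
    {"size": "1", "income": "high"},
    {"size": "2+", "income": "low"},
    {"size": "2+", "income": "high"},
]
marginals = {
    "size": {"1": 40, "2+": 60},
    "income": {"low": 30, "high": 70},
}

weights = rake_weights(records, marginals)
print(weights)
```

The resulting weights can be read as the expected number of households of each record type in the zone; drawing integer households from them would be a separate step not shown here.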

Notes

Throughout the paper, we use the term expenditure to refer to monetary spending only. Time is referenced as either time spent or time allocated to an activity.
Ye et al. (2017) provide pseudo code for the algorithm in the original paper.

References

Astroza, S., Pinjari, A.R., Bhat, C.R., Jara-Díaz, S.R.: A microeconomic theory-based latent class multiple discrete-continuous choice model of time use and goods consumption. Transp. Res. Rec. 2664, 31–41 (2017). https://doi.org/10.3141/2664-04
Backor, K., Golde, S., Nie, N.: Estimating survey fatigue in time use study. Paper Presented at the 2007 International Association for Time Use Research Conference, Washington, D.C., pp. 1–59 (2007)
Barthelemy, J., Toint, P.L.: Synthetic population generation without a sample. Transp. Sci. 47(2), 266–279 (2013). https://doi.org/10.1287/trsc.1120.0408
Bhat, C.R.: A new generalized heterogeneous data model (GHDM) to jointly model mixed types of dependent variables. Transp. Res. B 79, 50–77 (2015). https://doi.org/10.1016/j.trb.2015.05.017
Browning, M., Gørtz, M.: Spending time and money within the household. Scand. J. Econ. 114(3), 681–704 (2012)
Dane, G., Arentze, T.A., Timmermans, H.J.P., Ettema, D.: Simultaneous modeling of individuals’ duration and expenditure decisions in out-of-home leisure activities. Transp. Res. Part A 70, 93–103 (2014). https://doi.org/10.1016/j.tra.2014.10.003
Fang, L., Zhu, G.: Time allocation and home production technology. J. Econ. Dyn. Control 78, 88–101 (2017). https://doi.org/10.1016/j.jedc.2017.02.009
Gargiulo, F., Ternes, S., Huet, S., Deffuant, G.: An iterative approach for generating statistically realistic populations of households. PLoS ONE (2010). https://doi.org/10.1371/journal.pone.0008828
Hössinger, R., Aschauer, F., Jara-Díaz, S., Jokubauskaite, S., Schmid, B., Peer, S., Axhausen, K.W., Gerike, R.: A joint time-assignment and expenditure-allocation model: value of leisure and value of time assigned to travel for specific population segments. Transportation (2019). https://doi.org/10.1007/s11116-019-10022-w
Huynh, N., Barthélemy, J., Perez, P.: A heuristic combinatorial optimisation approach to synthesising a population for agent-based modelling purposes. JASSS (2016). https://doi.org/10.18564/jasss.3198
Jara-Díaz, S.R.: On the goods-activities technical relations in the time allocation theory. Transportation 30(3), 245–260 (2003). https://doi.org/10.1023/A:1023936911351
Jara-Díaz, S., Rosales-Salas, J.: Understanding time use: Daily or weekly data? Transp. Res. Part A (2015). https://doi.org/10.1016/j.tra.2014.07.009
Jara-Díaz, S.R., Munizaga, M.A., Greeven, P., Guerra, R., Axhausen, K.: Estimating the value of leisure from a time allocation model. Transp. Res. Part B 42(10), 946–957 (2008). https://doi.org/10.1016/j.trb.2008.03.001
Jeong, B., Lee, W., Kim, D.-S., Shin, H.: Copula-based approach to synthetic population generation. PLoS ONE (2016). https://doi.org/10.1371/journal.pone.0159496
Konduri, K.C., Tagle, S.A., Sana, B., Pendyala, R.M., Jara-Díaz, S.R.: A joint analysis of time use and consumer expenditure data: An examination of two alternative approaches to deriving values of time. Transp. Res. Rec. 2231, 53–60 (2011)
Lee, A.: Generating synthetic microdata from published marginal tables and confidentialised files. Comput. Sci. 17, 1–121 (2009)
Lenormand, M., Deffuant, G.: Generating a synthetic population of individuals in households: Sample-free vs sample-based methods. JASSS (2013). https://doi.org/10.18564/jasss.2319
Lohr, S.L.: Sampling: Design and Data Analysis, 2nd edn. Cengage Learning, Brooks/Cole (2010)
Malatest, & DMG: Transportation tomorrow survey 2016. http://dmg.utoronto.ca/transportation-tomorrow-survey/tts-reports (2018)
Munizaga, M., Jara-Díaz, S., Olguín, J., Rivera, J.: Generating twins to build weekly time use data from multiple single day OD surveys. Transportation 38(3), 511–524 (2011). https://doi.org/10.1007/s11116-010-9311-z
Pu, Y., Dai, S., Gan, Z., Wang, W., Wang, G., Zhang, Y., Henao, R., Carin, L.: JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets. (2018)
Rubin, D.B.: Discussion: Statistical disclosure limitation. J. Off. Stat. 9(2), 461–468 (1993)
Saadi, I., Mustafa, A., Teller, J., Farooq, B., Cools, M.: Hidden Markov model-based population synthesis. Transp. Res. Part B 90, 1–21 (2016). https://doi.org/10.1016/j.trb.2016.04.007
Saadi, I., Farooq, B., Mustafa, A., Teller, J., Cools, M.: An efficient hierarchical model for multi-source information fusion. Expert Syst. Appl. 110, 352–362 (2018). https://doi.org/10.1016/j.eswa.2018.06.018
Sakshaug, J.W., Raghunathan, T.E.: Generating synthetic microdata to estimate small area statistics in the American Community Survey. Stat. Transit. 15(3), 341–368 (2014)
van Nostrand, C., Sivaraman, V., Pinjari, A.: Analysis of long-distance vacation travel demand in the United States: A multiple discrete-continuous choice framework. Transportation 40(1), 151–171 (2013). https://doi.org/10.1007/s11116-012-9397-6
Williams, L.J., Hartman, N., Cavazotte, F.: Method variance and marker variables: A review and comprehensive CFA marker technique. Org. Res. Methods 13(3), 477–514 (2010). https://doi.org/10.1177/1094428110366036
Ye, P., Hu, X., Yuan, Y., Wang, F.Y.: Population synthesis based on joint distribution inference without disaggregate samples. JASSS (2017). https://doi.org/10.18564/jasss.3533
Zhang, A., Kang, J.E., Axhausen, K., Kwon, C.: Multi-day activity-travel pattern sampling based on single-day data. Transp. Res. Part C 89, 96–112 (2018). https://doi.org/10.1016/j.trc.2018.01.024

Acknowledgements

The study was funded by an NSERC CGS-D Scholarship and a CRDCN Emerging Scholar Grant awarded to the first author, and by an NSERC Discovery Grant awarded to the second author.

Author information

Authors and Affiliations

Department of Civil & Environmental Engineering, University of Nebraska – Lincoln, 900 N 16 St., Lincoln, NE, 68588, USA
Jason Hawkins
Percy Edward Hart Professor in Civil & Mineral Engineering, University of Toronto, 35 St. George Street, Toronto, ON, M4S 1A4, Canada
Khandker Nurul Habib

Contributions

The authors confirm contribution to the paper as follows: study conception and design: JH; analysis and interpretation of results: JH; draft manuscript preparation: JH; overall supervision: KMNH. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Jason Hawkins.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 7, 8, 9 and 10.

About this article

Cite this article

Hawkins, J., Habib, K.N. A multi-source data fusion framework for joint population, expenditure, and time use synthesis. Transportation 50, 1323–1346 (2023). https://doi.org/10.1007/s11116-022-10279-8

Accepted: 11 March 2022
Published: 26 March 2022
Issue Date: August 2023
DOI: https://doi.org/10.1007/s11116-022-10279-8
