A multi-source data fusion framework for joint population, expenditure, and time use synthesis

Transportation

Abstract

Data are important components of any research; however, it is often the case that the required data are not readily available. Researchers often fuse multiple datasets to obtain the data required to complete their work. In urban simulation, spatially referencing data is of paramount importance to capture local variations in travel and preferences. Data fusion typically obfuscates the spatial reference by merging records from different locations. Population synthesis is used to match these fused household records to a plausible location based on aggregate sociodemographic statistics. In some cases, researchers must also synthesize the necessary data. This paper outlines a data fusion workflow for a statistically valid synthetic population for use in urban simulation models. We develop the framework for the case of household-level expenditure and individuals’ time use patterns. The Greater Toronto Area (GTA) in Canada is used as a testbed. The results of the data fusion and synthesis are validated against statistics from a large-sample travel survey conducted in the GTA, showing a good fit with the validation dataset. Finally, we outline how the framework could be applied in other contexts where a single dataset is unavailable.
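The matching of fused records to zones from aggregate statistics can be illustrated with iterative proportional fitting (IPF), a widely used population synthesis technique. This is a minimal sketch only and is not claimed to be the method used in this paper: the function name `ipf`, the seed table, and the zone targets are all hypothetical values chosen for illustration.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its margins match the targets.

    seed        -- cross-tabulated counts from a sample (hypothetical here)
    row_targets -- aggregate totals for the row attribute in one zone
    col_targets -- aggregate totals for the column attribute in one zone
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Stop once the row sums still match after the column step.
        error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if error < tol:
            break
    return table


# Hypothetical example: household size (rows) by income band (columns).
seed = [[10.0, 5.0], [20.0, 15.0]]   # sample cross-tabulation
row_targets = [40.0, 60.0]           # zone totals by household size
col_targets = [55.0, 45.0]           # zone totals by income band
fitted = ipf(seed, row_targets, col_targets)
```

The fitted cell values can then serve as weights when drawing household records for the zone. The row and column targets must share the same grand total for the procedure to converge.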

Notes

  1. Throughout the paper, we use the term expenditure to refer to monetary spending only. Time is referenced as either time spent or time allocated to an activity.

  2. Ye et al. (2017) provide pseudocode for the algorithm in the original paper.

Acknowledgements

The study was funded by an NSERC CGS-D Scholarship and a CRDCN Emerging Scholar Grant awarded to the first author and an NSERC Discovery Grant awarded to the second author.

Author information

Contributions

The authors confirm contribution to the paper as follows: study conception and design: JH; analysis and interpretation of results: JH; draft manuscript preparation: JH; overall supervision: KMNH. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Jason Hawkins.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 7, 8, 9 and 10.

Fig. 7 Synthetic data construction by joint distribution inference

Fig. 8 Synthetic household construction from synthetic individual pool

Fig. 9 Weekly time use pattern construction from daily time use patterns

Fig. 10 Synthetic population construction for GTA from synthetic household pool

About this article

Cite this article

Hawkins, J., Habib, K.N. A multi-source data fusion framework for joint population, expenditure, and time use synthesis. Transportation 50, 1323–1346 (2023). https://doi.org/10.1007/s11116-022-10279-8

  • DOI: https://doi.org/10.1007/s11116-022-10279-8
