Abstract
This paper proposes a proportional hazard model for predicting personal income, with the aim of imputing missing income data in household travel surveys. The hazard function of the model is the product of two components: (1) a non-parametric baseline hazard function that depends only on the income level and (2) a function that depends only on the other personal attributes of the survey respondents (excluding income). To estimate and validate the model, data are drawn from a travel characteristics survey conducted in Hong Kong in 2001. The model is found to be much more accurate than a conventional ordered probit model, which rests on the assumption that the logarithm of income is normally distributed.
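The multiplicative structure described above can be written out as a short sketch. The symbols here are assumptions introduced for illustration and are not taken from the paper: y denotes income, x the vector of other personal attributes, and the exponential form of the covariate function is the usual proportional hazard convention rather than a form the abstract confirms.

```latex
% Hazard as a product of a baseline term in y and a covariate term in x
\lambda(y \mid x) = \lambda_0(y)\,\phi(x),
\qquad \text{commonly } \phi(x) = \exp(-\beta^{\top} x) \quad \text{(assumed form)}

% Integrated baseline hazard
\Lambda_0(y) = \int_0^{y} \lambda_0(u)\,\mathrm{d}u

% Implied probability that income exceeds y, given x
S(y \mid x) = \exp\bigl(-\Lambda_0(y)\,\phi(x)\bigr)
```

Because the baseline hazard is left non-parametric, only its integrated values at the boundaries of the reported income brackets need to be estimated.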
Notes
Although the likelihood function is in the form of an ordered logit model, it is a particular type of ordered logit model in which the logs of integrated baseline hazards (i.e., the δks) are assumed to be unknowns and treated as model parameters. The comparison model described in Sect. “An ordered probit model” is a standard ordered probit model because it is generally assumed that the logarithm of personal income follows a normal distribution.
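As a minimal sketch of the ordered-logit form mentioned in this note, the snippet below computes income-bracket probabilities from threshold parameters and evaluates a grouped-data log-likelihood. Everything in it is an assumption made for illustration: the function names are hypothetical, the thresholds `delta` stand in for the δks, the sign of the covariate index is one possible convention, and picking the most probable bracket is only one simple way to impute, not necessarily the procedure used in the paper.

```python
import numpy as np


def bracket_probabilities(delta, beta, x):
    """Probability of each income bracket for one respondent.

    delta : increasing thresholds, one per bracket boundary (assumed to
            play the role of the log integrated baseline hazards).
    beta  : coefficient vector (hypothetical values in the usage below).
    x     : attribute vector of the respondent.
    """
    delta = np.asarray(delta, dtype=float)
    index = float(np.dot(beta, x))
    # Logistic CDF evaluated at each threshold (assumed sign convention).
    cdf = 1.0 / (1.0 + np.exp(-(delta - index)))
    # Pad with 0 and 1 so that differences give all K + 1 bracket probabilities.
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    return np.diff(cdf)


def log_likelihood(delta, beta, X, brackets):
    """Sum of log bracket probabilities over respondents with reported income."""
    total = 0.0
    for x, k in zip(X, brackets):
        total += np.log(bracket_probabilities(delta, beta, x)[k])
    return total


def impute_bracket(delta, beta, x):
    """Illustrative imputation: return the most probable bracket index."""
    return int(np.argmax(bracket_probabilities(delta, beta, x)))


# Usage with made-up numbers (three thresholds, hence four brackets).
delta = [-1.0, 0.0, 1.0]
beta = [0.5]
probs = bracket_probabilities(delta, beta, [4.0])
ll = log_likelihood(delta, beta, [[0.0], [4.0]], [1, 3])
```

In estimation, `delta` and `beta` would be chosen jointly to maximize `log_likelihood`; treating the thresholds as free parameters is what distinguishes this form from an ordered model with a fixed parametric baseline.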
Acknowledgments
This study was supported by research grants HKU 7132/03E and No. 717306 from the Hong Kong Research Grant Council. The Transport Department of Hong Kong is gratefully acknowledged for providing the survey data used for model development. Two anonymous referees provided useful comments that led to improvements to an earlier draft of the paper.
Cite this article
Tong, C.O., Lee, J.K.L. The use of a hazard-based duration model for imputation of missing personal income data. Transportation 36, 565–579 (2009). https://doi.org/10.1007/s11116-009-9213-0