Abstract
We apply the powerful, flexible, and computationally efficient nonparametric Classification and Regression Trees (CART) algorithm to analyze real estate mortgage data. CART is particularly appropriate for our data set because of its strengths in dealing with large data sets, high dimensionality, mixed data types, missing data, different relationships between variables in different parts of the measurement space, and outliers. Moreover, CART is intuitive and easy to interpret and implement. We discuss the pros and cons of CART in relation to traditional methods such as linear logistic regression, nonparametric additive logistic regression, discriminant analysis, partial least squares classification, and neural networks, with particular emphasis on real estate. We use CART to produce the first academic study of Israeli mortgage default data. We find that borrowers’ features, rather than mortgage contract features, are the strongest predictors of default if accepting “bad” borrowers is more costly than rejecting “good” ones. If the costs are equal, mortgage features are used as well. The higher (lower) the ratio of misclassification costs of bad risks versus good ones, the lower (higher) are the resulting misclassification rates of bad risks and the higher (lower) are the misclassification rates of good ones. This is consistent with real-world rejection of good risks in an attempt to avoid bad ones.
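The effect of the misclassification cost ratio described above can be illustrated with a minimal sketch. It is not the authors' implementation and uses no real data: the three leaf counts, the function names, and the cost ratios of 1 and 5 are all hypothetical. It assumes a tree has already been grown and shows only the standard cost-sensitive labeling rule for a terminal node, which assigns the class with the lower total misclassification cost.

```python
# Illustrative sketch only: leaf counts and cost ratios are hypothetical,
# not taken from the Israeli mortgage data analyzed in the article.

def assign_leaf_label(n_good, n_bad, cost_bad_as_good, cost_good_as_bad=1.0):
    """Label a terminal node with the class that minimizes total cost.

    Labeling the node "good" misclassifies its bad borrowers;
    labeling it "bad" misclassifies its good borrowers.
    Ties are resolved in favor of "good" (an arbitrary choice here).
    """
    cost_if_labeled_good = n_bad * cost_bad_as_good
    cost_if_labeled_bad = n_good * cost_good_as_bad
    return "bad" if cost_if_labeled_bad < cost_if_labeled_good else "good"


def misclassification_rates(leaves, cost_ratio):
    """Return (bad-risk error rate, good-risk error rate) for a fixed tree.

    `leaves` is a list of (n_good, n_bad) counts per terminal node and
    `cost_ratio` is the cost of accepting a bad borrower relative to the
    cost of rejecting a good one.
    """
    total_good = sum(n_good for n_good, _ in leaves)
    total_bad = sum(n_bad for _, n_bad in leaves)
    accepted_bad = 0   # bad borrowers falling in leaves labeled "good"
    rejected_good = 0  # good borrowers falling in leaves labeled "bad"
    for n_good, n_bad in leaves:
        if assign_leaf_label(n_good, n_bad, cost_ratio) == "good":
            accepted_bad += n_bad
        else:
            rejected_good += n_good
    return accepted_bad / total_bad, rejected_good / total_good


# Hypothetical terminal nodes: (good borrowers, bad borrowers).
LEAVES = [(900, 20), (60, 30), (40, 50)]

for ratio in (1, 5):
    bad_rate, good_rate = misclassification_rates(LEAVES, ratio)
    print(f"cost ratio {ratio}: bad-risk error {bad_rate:.2f}, "
          f"good-risk error {good_rate:.2f}")
```

With these made-up counts, raising the cost ratio from 1 to 5 relabels the middle leaf as "bad", so fewer bad risks are accepted and more good risks are rejected, which is the direction of the trade-off the abstract reports. A full CART implementation would also let the costs influence splitting and pruning, which this sketch leaves out.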
Cite this article
Feldman, D., Gross, S. Mortgage Default: Classification Trees Analysis. J Real Estate Finan Econ 30, 369–396 (2005). https://doi.org/10.1007/s11146-005-7013-7
DOI: https://doi.org/10.1007/s11146-005-7013-7