Abstract
Dataset shift is present in almost all real-world applications, since most of them operate in changing environments. Detecting fractures in datasets in time allows models to be recalibrated before their performance degrades significantly. Because small changes are normal in most applications and do not justify the effort that recalibration requires, we are interested only in changes that are critical to the correct functioning of the model. In this work we propose a model-dependent backtesting strategy designed to identify significant changes in the covariates, relating a confidence zone of the change to a maximal deviance measure obtained from the coefficients of the model. Using logistic regression as the predictive approach, we performed experiments on simulated data and on a real-world credit scoring dataset. The results show that the proposed method outperforms traditional approaches, consistently identifying major changes in variables while taking into account important characteristics of the problem, such as sample sizes, variances, and uncertainty in the coefficients.
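The abstract does not spell out how the confidence zone is built. As a rough illustration only, the sketch below assumes it is a Fieller-type interval for the ratio between a covariate's mean in a new sample and its mean in the reference sample, with independent samples and a normal critical value. The function names and the `tolerance` band are hypothetical stand-ins: in the article the admissible deviation is derived from the model coefficients, which is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def fieller_interval(new, ref, level=0.95):
    """Fieller interval for mean(new) / mean(ref), independent samples.

    Assumes a large-sample normal critical value. Returns None when the
    reference mean is not significantly different from zero, because the
    confidence set is then unbounded.
    """
    a, b = fmean(new), fmean(ref)
    va = variance(new) / len(new)  # variance of the numerator mean
    vb = variance(ref) / len(ref)  # variance of the denominator mean
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Roots of (a - r*b)^2 = z^2 * (va + r^2 * vb), solved for the ratio r.
    denom = b * b - z * z * vb
    if denom <= 0:
        return None
    disc = z * z * (a * a * vb + b * b * va - z * z * va * vb)
    half_width = sqrt(disc) / denom
    centre = a * b / denom
    return centre - half_width, centre + half_width


def within_tolerance(interval, tolerance):
    """True if the whole interval lies inside [1 - tolerance, 1 + tolerance].

    `tolerance` is a hypothetical placeholder for a model-derived bound.
    """
    if interval is None:
        return False
    low, high = interval
    return 1 - tolerance <= low and high <= 1 + tolerance


reference = [10, 12, 11, 9, 10, 11, 12, 10]
shifted = [2 * x for x in reference]  # covariate mean doubles

print(fieller_interval(reference, reference))
print(within_tolerance(fieller_interval(shifted, reference), tolerance=0.25))
```

Working with a ratio rather than a difference makes the tolerance scale-free, and the interval widens automatically for small or noisy samples, which matches the abstract's emphasis on sample sizes and variances.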
Acknowledgements
The first author acknowledges the support of CONICYT Becas Chile PD-74140041. The second author was supported by CONICYT FONDECYT Initiation into Research 11121196. Both authors acknowledge the support of the Institute of Complex Engineering Systems (ICM: P-05-004-F, CONICYT: FBO16).
About this article
Cite this article
Bravo, C., Maldonado, S. Fieller Stability Measure: a novel model-dependent backtesting approach. J Oper Res Soc 66, 1895–1905 (2015). https://doi.org/10.1057/jors.2015.18
DOI: https://doi.org/10.1057/jors.2015.18