Journal of the Operational Research Society

Volume 62, Issue 6, pp 1067–1074

Statistical merging of rating models

Theoretical Paper


In this paper we introduce and discuss statistical models for predicting the default probabilities of Small and Medium Enterprises (SMEs). The models draw on two separate sources of information: quantitative balance sheet ratios and qualitative information derived from opinion mining on unstructured data. We propose a novel methodology for data fusion in longitudinal and survival duration models: quantitative and qualitative variables enter the likelihood function separately, and their scores are then combined linearly by a weight to obtain the probability of default for each SME. Using a real financial database, we compare the model performance and predictive capability of the single models with those of our proposal. Finally, we select the best model in terms of out-of-sample forecasts, based on key performance indicators.
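The linear merging step can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: that each single model returns a default probability in [0, 1], that the merge is a convex combination with one weight w, and that w is chosen on a hold-out sample by minimising the Brier score. The function names (merge_scores, brier_score, select_weight) and the grid search are hypothetical illustrations, not the authors' estimation procedure.

```python
from typing import List, Sequence, Tuple


def merge_scores(p_quant: Sequence[float], p_qual: Sequence[float], w: float) -> List[float]:
    """Combine two score vectors linearly: w * p_quant + (1 - w) * p_qual.

    Assumption: both inputs are default probabilities in [0, 1].
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(p_quant) != len(p_qual):
        raise ValueError("score vectors must have equal length")
    return [w * a + (1.0 - w) * b for a, b in zip(p_quant, p_qual)]


def brier_score(p: Sequence[float], y: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def select_weight(
    p_quant: Sequence[float],
    p_qual: Sequence[float],
    y: Sequence[int],
    grid_size: int = 11,
) -> Tuple[float, float]:
    """Grid-search the weight that minimises the hold-out Brier score.

    Assumption: the Brier score stands in for the paper's key
    performance indicators, which the abstract does not name.
    """
    best_w, best_loss = 0.0, float("inf")
    for k in range(grid_size):
        w = k / (grid_size - 1)
        loss = brier_score(merge_scores(p_quant, p_qual, w), y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


if __name__ == "__main__":
    # Made-up hold-out scores for four firms; y = 1 marks a default.
    quant = [0.10, 0.70, 0.30, 0.90]
    qual = [0.20, 0.40, 0.10, 0.60]
    y = [0, 1, 0, 1]
    print(select_weight(quant, qual, y))
```

A weight near 1 would indicate that the balance sheet model carries most of the predictive information on the hold-out sample, and a weight near 0 the opposite. Any other out-of-sample criterion could replace the Brier score without changing the structure of the sketch.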


predictive models; Bayesian merging; probability of default; parametric models; survival analysis; model selection



Copyright information

© Operational Research Society 2010

Authors and Affiliations

  1. University of Pavia, Pavia, Italy
