Knowledge and Information Systems, Volume 36, Issue 3, pp 731–747

Customer credit scoring based on HMM/GMDH hybrid model

  • Ge-Er Teng
  • Chang-Zheng He (corresponding author)
  • Jin Xiao
  • Xiao-Yi Jiang
Regular Paper


The hidden Markov model (HMM) has achieved great success in many fields, such as speech recognition and engineering. However, because it assumes that observations are conditionally independent given the state, the HMM has very limited capacity to recognize complex patterns involving more than first-order dependencies in customer relationship management. The Group Method of Data Handling (GMDH) can overcome this drawback, so we propose a hybrid model that combines the HMM and GMDH to score customer credit. The model has three phases: training the HMM with multiple observations, adding GMDH to the HMM, and optimizing the hybrid model. The proposed hybrid model is compared with other existing methods in terms of average accuracy, Type I error, Type II error and AUC. Experimental results show that the proposed method performs better than HMM/ANN on two credit scoring datasets. Implementing the HMM/GMDH hybrid model allows lenders and regulators to develop techniques for measuring customer credit risk.
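The four evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `credit_metrics`, the 0.5 threshold, and the labelling convention (1 = bad credit as the positive class, Type I error = good customers flagged as bad, Type II error = bad customers accepted as good) are all assumptions, and some credit scoring papers define the two error types the other way round. The accuracy here is for a single run; an average accuracy would be the mean of this value over repeated runs or folds.

```python
from typing import Dict, Sequence


def credit_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not from the paper
) -> Dict[str, float]:
    """Accuracy, Type I/II error and AUC for binary credit labels.

    Assumed convention: label 1 = bad credit (positive), 0 = good credit.
    """
    preds = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    type1 = fp / (fp + tn)  # good customers wrongly flagged as bad
    type2 = fn / (fn + tp)  # bad customers wrongly accepted as good

    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as one half (Mann-Whitney formulation).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "type1": type1, "type2": type2, "auc": auc}


if __name__ == "__main__":
    # Toy data, purely illustrative.
    labels = [1, 1, 1, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    print(credit_metrics(labels, model_scores))
```

The pairwise AUC computation is quadratic in the number of samples, which keeps the sketch short; a rank-based computation would be preferable for large datasets.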


Keywords: Hybrid model, HMM, GMDH, Credit scoring, CRM



This research is supported by the Natural Science Foundation of China under Grant Nos. 71071101, 71101100 and 71211130018, the New Teachers Fund for Doctor Stations, Ministry of Education under Grant No. 20110181120047, the China Postdoctoral Science Foundation under Grant No. 2011M500418, and the Research Start-up Project of Sichuan University under Grant No. 2010SCU11012.



Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  • Ge-Er Teng (1)
  • Chang-Zheng He (1, corresponding author)
  • Jin Xiao (1)
  • Xiao-Yi Jiang (2)
  1. Business School of Sichuan University, Chengdu, China
  2. Department of Mathematics and Computer Science, University of Münster, Münster, Germany
