
Parametric versus nonparametric methods in risk scoring: an application to microcredit


The importance of credit access for improving economic opportunities in developing markets is well established in the literature. However, there is a strong need to mitigate adverse selection problems in microlending. A risk scoring model that more accurately predicts the likelihood of repayment of potential borrowers can help address this market imperfection and benefit both lenders and borrowers. This paper compares the performance of nonparametric versus semiparametric and traditional parametric risk scoring models based on default probabilities. We show the advantages of relying on less structured, data-driven methods for risk scoring using both simulated data and data from credit loans granted to small and microenterprises in rural Peru. The estimation results indicate that nonparametric methods lead to a better evaluation of creditworthiness and can help prevent including potential “bad” borrowers and excluding “good” borrowers from sensitive microcredit markets.




  1.

    As of December 2010, microfinance institutions reported reaching more than 205 million borrowers worldwide (Maes and Reed 2012). A separate issue is whether microcredit has been an effective tool to lift poor people out of poverty by funding their microenterprises and increasing their wealth, considering that a large number of small businesses have been created through microcredits but only a few have matured into larger businesses. Recent work evaluating the impact of microfinance using randomized field experiments provides mixed evidence regarding the effects of microcredit on household income and consumption (e.g., Banerjee et al. 2010; Dupas and Robinson 2009; Karlan and Zinman 2011).

  2.

    There are also concerns that lending institutions have managed to sustain low interest rates and relatively high default rates due to subsidies and soft loans. Grameen Bank, for example, which charges an average real interest rate of 10 %, experienced losses close to 18 % of their outstanding loans from 1985 to 1996 after properly adjusting for their portfolio size (Armendariz and Morduch 2005).

  3.

    See also Schreiner (2000) for additional discussion on credit scoring in microfinance.

  4.

    Microfinance data in developing countries remain largely unexploited, in part due to the lack of information sharing across lending institutions.

  5.

    We could also consider a continuous variable measuring the percentage of loan (installments) repaid by each individual.

  6.

    The assumption that the threshold is zero is without loss of generality provided that X includes a constant.

  7.

    An alternative estimator can be found in Ichimura (1993), but it is less efficient than the estimator proposed by Klein and Spady for binary choice models.

  8.

    Klein and Spady add a trimming function to the log likelihood function, although trimming does not seem to matter in their simulations. Single index models further require two identification conditions under which the parameter vector \(\beta \) and function \(g(\cdot )\) can be sensibly estimated. First, the set of explanatory variables \(X\) must contain at least one continuous variable. Second, \(\beta \) cannot be identified without some location and scale restrictions (normalizations). One popular location-normalization is to not include a constant in \(X\); one popular scale-normalization is to assume that the first component of \(X\) has a unit coefficient and that this first component is a continuous variable. For further details on single index model estimations refer to Li and Racine (2006).

  9.

    An alternative selection method is the standard rule-of-thumb procedure in which the bandwidth for covariate \(X_s\) is defined as \(h_s = X_{s,sd}\, n^{-1/(4+q)}\), where \(X_{s,sd}\) is the sample standard deviation of \(X_s\), \(n\) is the number of observations in the working sample, and \(q\) is the total number of covariates in \(X\).

  10.

    In this sense, the local linear estimator is similar to the standard linear probability model. We thank an anonymous referee for noting this.

  11.

    See Racine (2008) for further details on nonparametric conditional mode models.

  12.

    While the Probit model is implemented in Stata, the single index and nonparametric models are implemented in R using the np package.

  13.

    The McFadden et al. (1977) performance measure is equal to \(p_{11} + p_{22} - p_{12}^2 - p_{21}^2\), where \(p_{ij}\) is the \(ij\)th entry (expressed as a fraction of the sum of all entries) in the 2 \(\times \) 2 confusion matrix of actual versus predicted (0,1) outcomes.

  14.

    The Logit and linear probability models also perform very similarly to the Probit model. Details are available upon request.

  15.

    Note also that the differences in the MSPEs across models are more pronounced for “high” asset values, largely explained by the much lower correct default classification rate of the Probit and single index models.

  16.

    Of course, it is possible that the odds of defaulting are linear in all covariates; but even in this (implausible) scenario, data-driven methods will perform at least similarly to linear models.

  17.

    The name of the bank is omitted for confidentiality reasons.

  18.

    Unfortunately, we only have information on asset (real estate) ownership but not on asset value. We also do not have information on debt ratio.

  19.

    We estimate a random-effects Probit model since a client may be observed more than once in the database.

  20.

    We also considered alternative data partitions (70–30 and 50–50 %) and obtained qualitatively similar results. The results are also not sensitive to repeated 60–40 % data partitions.

  21.

    As indicated above, the local linear model may yield fitted values greater than one or less than zero. In this case, the fitted values range between \(-\)0.01 and 1.06, where 14 observations (out of 1,739) are greater than one and one observation is less than zero.

  22.

    The predictive performance (both in-sample and out-of-sample) of the Logit and linear probability models is very similar to that of the Probit model. Further details are available upon request.

  23.

    We also do not account for the probability of crop failure or climate conditions, but these variables are unlikely to explain default behavior in this case since the loans analyzed were granted to smallholder farmers operating in a particular rural area in Peru.

  24.

    The nonparametric method also points toward a nonlinear relationship between the odds of defaulting and other covariates.

  25.

    Recent studies have also shown the potential gains of establishing a credit bureau system in microlending (de Janvry et al. 2010; Luoto et al. 2007).
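
The rule-of-thumb bandwidth described in note 9 can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation (the paper reports using the R np package): the function name `rule_of_thumb_bandwidths` and the simulated data are hypothetical, every column of `X` is treated as a continuous covariate, and the sample standard deviation is taken with the n - 1 denominator.

```python
import numpy as np


def rule_of_thumb_bandwidths(X):
    """Return h_s = sd(X_s) * n**(-1 / (4 + q)) for each column s of X.

    Hypothetical helper: assumes X is an (n, q) array of continuous covariates.
    """
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    # Sample standard deviation of each covariate (ddof=1 is an assumption).
    sd = X.std(axis=0, ddof=1)
    return sd * n ** (-1.0 / (4 + q))


# Illustrative simulated covariates with different scales (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([1.0, 5.0, 0.2])
h = rule_of_thumb_bandwidths(X)
print(h)
```

Each bandwidth scales with the spread of its own covariate, while the common factor \(n^{-1/(4+q)}\) shrinks more slowly as the number of covariates \(q\) grows.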


  1. Armendariz B, Morduch J (2005) The economics of microfinance. MIT Press, Cambridge


  2. Banerjee A, Duflo E, Glennerster R, Kinnan C (2010) The miracle of microfinance? Evidence from a randomized evaluation. Working paper, MIT Poverty Action Lab

  3. Capon N (1982) Credit scoring systems: a critical analysis. J Market 46(2):82–91


  4. Coleman B (2006) Microfinance in Northeast Thailand: who benefits and how much? World Dev 34(9):1612–1638


  5. de Janvry A, McIntosh C, Sadoulet E (2010) The supply- and demand-side impacts of credit market information. J Dev Econ 93(2):173–188


  6. Dupas P, Robinson J (2009) Savings constraints and microenterprise development: evidence from a field experiment in Kenya. NBER Working Paper No. 14693

  7. Fan J, Gijbels I (1996) Local polynomial modeling and its applications. Chapman and Hall, London


  8. Ghosh P, Mookherjee D, Ray D (2000) Credit rationing in developing countries: an overview of the theory. In: Mookherjee D, Ray D (eds) Readings in the theory of development economics. Blackwell, London


  9. Hand D, Henley W (1997) Statistical classification methods in consumer credit scoring: a review. J R Stat Soc Ser A 160(3):523–541


  10. Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econom 58(1–2):71–120


  11. Karlan D, Zinman J (2011) Microcredit in theory and practice: using randomized credit scoring for impact evaluation. Science 332:1278–1284


  12. Khandker S (2005) Microfinance and poverty: evidence using panel data from Bangladesh. World Bank Econ Rev 19(2):263–286


  13. Klein R, Spady R (1993) An efficient semiparametric estimator for binary response models. Econometrica 61(2):387–421


  14. Li Q, Racine J (2004) Nonparametric estimation of regression functions with both categorical and continuous data. J Econom 119(1):99–130


  15. Li Q, Racine J (2006) Nonparametric econometrics: theory and practice. Princeton University Press, Princeton


  16. Luoto J, McIntosh C, Wydick B (2007) Credit information systems in less developed countries: a test with microfinance in Guatemala. Econ Dev Cult Change 55(2):313–334


  17. Maes J, Reed L (2012) State of the Microcredit Summit Campaign Report 2012. Microcredit Summit Campaign

  18. McFadden D, Puig C, Kirschner D (1977) Determinants of the long-run demand for electricity. Proc Am Stat Assoc 1:109–117


  19. Pregibon D (1979) Data analytic methods for generalized linear models. PhD dissertation, University of Toronto

  20. Racine J (1997) Consistent significance testing for nonparametric regression. J Bus Econ Stat 15(3):369–378


  21. Racine J (2008) Nonparametric econometrics: a primer. Found Trends Econom 3(1):1–88


  22. Racine J, Hart J, Li Q (2006) Testing the significance of categorical predictor variables in nonparametric regression models. Econom Rev 25(4):523–544


  23. Schreiner M (2000) Credit scoring for microfinance: can it work? J Microfinance 2(2):105–118


  24. Tukey J (1949) One degree of freedom for non-additivity. Biometrics 5(3):232–242




We would like to thank Qi Li, Carlos Martins-Filho, Robert Kunst, and two anonymous referees for their valuable comments. We also thank Christopher Marciniak for his valuable research assistance.

Author information



Corresponding author

Correspondence to Manuel A. Hernandez.



Table 3 Description of variables
Table 4 Summary statistics
Table 5 Modeling the probability of default (dependent variable equal to one if client defaulted, zero otherwise)
Table 6 Predictive performance of different nonparametric regression models using loan data from SMEs in rural Peru
Table 7 Specification error test
Fig. 4

In-sample predictive performance of alternative binary choice models using simulated data with varying measurement error. a MSPE. b Predictive performance. Note: The corresponding regressors are contaminated with an additive random error generated from a Normal distribution with mean zero and increasing standard deviations (1, 2, 3, 4, 5, 10, and 20). The predictive performance measure follows McFadden et al. (1977); the measure is equal to \(p_{11} + p_{22} - p_{12}^2 - p_{21}^2\), where \(p_{ij}\) is the \(ij\)th entry in the standard 2 \(\times \) 2 confusion matrix of actual versus predicted (0,1) outcomes (using the standard 0.5 rule) in which the entries are expressed as a fraction of the sum of all entries. The single index results are based on the Klein and Spady (1993) estimator using a Gaussian kernel function of order two. The nonparametric results follow a local linear least-squares procedure, also using a Gaussian kernel.
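
The predictive performance measure described in the caption can be computed directly from fitted default probabilities. The sketch below is a hedged illustration rather than the authors' code: the function name `mcfadden_measure` and the toy inputs are hypothetical, rows of the confusion matrix are taken to be actual outcomes and columns predicted outcomes, and a fitted probability strictly greater than 0.5 is classified as a default (how ties at exactly 0.5 are treated is an assumption).

```python
import numpy as np


def mcfadden_measure(y_true, p_hat, cutoff=0.5):
    """p11 + p22 - p12**2 - p21**2 from the 2 x 2 confusion matrix.

    Hypothetical helper: rows index actual (0, 1) outcomes, columns index
    predicted (0, 1) outcomes, and entries are fractions of all observations.
    """
    y_true = np.asarray(y_true).astype(int)
    # Classify as 1 when the fitted probability exceeds the cutoff (assumption).
    y_pred = (np.asarray(p_hat, dtype=float) > cutoff).astype(int)
    counts = np.zeros((2, 2))
    for actual, predicted in zip(y_true, y_pred):
        counts[actual, predicted] += 1
    p = counts / counts.sum()
    return p[0, 0] + p[1, 1] - p[0, 1] ** 2 - p[1, 0] ** 2


# Toy example (not the paper's data): two of four cases are misclassified.
y = [0, 0, 1, 1]
p_hat = [0.1, 0.6, 0.8, 0.3]
score = mcfadden_measure(y, p_hat)
perfect = mcfadden_measure([0, 1], [0.2, 0.9])
print(score, perfect)
```

The measure rewards correct classifications on the diagonal and penalizes the squared shares of misclassified cases, so it equals one only when every observation is classified correctly.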


Hernandez, M.A., Torero, M. Parametric versus nonparametric methods in risk scoring: an application to microcredit. Empir Econ 46, 1057–1079 (2014).



  • Risk scoring
  • Microcredit
  • Default models
  • Nonparametric methods

JEL Classification

  • C14
  • O16
  • G17