Hurdle models of repayment behaviour in personal loan contracts

Empirical Economics

Abstract

This paper proposes a hurdle model of repayment behaviour in loans with fixed instalments. Using information on previous and current contracts, the approach yields a model of customer behaviour that is useful, for example, in assessing the impact of determinants of default, a natural concern for credit and behavioural scoring. Under plausible assumptions, a debtor in each period faces a number of missed payments that depends on his previous repayment decisions; meanwhile, as most debtors are expected to meet their financial obligations, the number of missed payments is bound to display excess zeros relative to a single-part law. Each sequence of missed payments is modelled using binomial thinning, a conceptual tool that allows for dependence between integers by defining the support of consecutive counts. Under suitable assumptions on heterogeneity, the model can be formulated under a random effects approach, leading to a two-part panel data model that is estimable by quasi-maximum likelihood. The proposed approach is illustrated using a panel data set on personal loans granted by a Portuguese bank.

Notes

  1. One alternative route to allow for dependence among discrete variables consists in assuming that these variables share common, often time-dependent, unobservable features. Such is the case of hidden Markov models, which are described and illustrated in detail in MacDonald and Zucchini (1997).

  2. Binomial thinning constitutes the most popular form of thinning, a general probabilistic operation that can be applied to random counts—see Weiss (2008) for a survey of thinning operations.

  3. These vectors may obviously include common covariates (indeed, they can be identical in view of the functional separability of the hurdle model; see (12)).

  4. Henceforth, unless required, the individual index, i, is omitted.

  5. A likelihood ratio-type statistic is not valid, because the adopted likelihood very probably implies an incorrect variance assumption and neglects time dependence.

  6. In practice, the sample estimate for the covariance matrix to be used with the Hausman test may fail to be invertible or, even if invertible, positive semi-definite; see Lee (1996, Ch. 5.9) for an estimator of the covariance matrix of \(\hat{\varvec{\theta }}-\hat{\varvec{\theta }}_{ML}\) that is positive semi-definite by construction.

  7. These observed differences suggest a deeper investigation into the small sample properties of both approaches in a wider range of situations; in any event, this seems to reach beyond the scope of the present text, possibly justifying a separate note of its own.

  8. No convergence was achieved with the NLS method, which prevented the presentation of the corresponding estimates. This setback notwithstanding, a sound alternative method is provided by pooled QML.

References

  • Adke S, Gadag V (1995) A new class of branching processes. In: Heyde CC (ed) Branching processes. Springer, New York, pp 90–105

  • Al-Osh MA, Alzaid AA (1987) First-order integer valued autoregressive INAR(1) process. J Time Ser Anal 8:261–275

  • Brännäs K (1994) Estimation and testing in integer valued AR(1) models. Umeå Economic Studies 355. University of Umeå

  • Brännäs K (1995) Explanatory variables in the AR(1) model. Umeå Economic Studies 381. University of Umeå

  • Cameron AC, Trivedi PK (2005) Microeconometrics: methods and applications. Cambridge University Press, Cambridge

  • Cameron AC, Trivedi PK (2013) Regression analysis of count data, 2nd edn. Cambridge University Press, Cambridge

  • Chamberlain G (1980) Analysis of covariance with qualitative data. Rev Econ Stud 47:225–238

  • Chamberlain G (1985) Heterogeneity, omitted variable bias, and duration dependence. In: Heckman JJ, Singer B (eds) Longitudinal analysis of labor market data. Cambridge University Press, Cambridge, pp 3–38

  • Cragg JG (1971) Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 39:829–844

  • Freeland RK, McCabe B (2004) Analysis of low count time series data by Poisson autoregression. J Time Ser Anal 25:701–722

  • Gouriéroux C, Monfort A, Trognon A (1984) Pseudo maximum likelihood methods: theory. Econometrica 52:681–700

  • Hall BH, Cummins C (2005) TSP 5.0 user’s guide. TSP International, Palo Alto (CA)

  • Hausman J, Hall BH, Griliches Z (1984) Econometric models for count data with an application to the patents–R&D relationship. Econometrica 52:909–938

  • Heckman JJ, Willis RJ (1977) A beta-logistic model for the analysis of sequential labor force participation by married women. J Polit Econ 85:27–58

  • Honoré B, Kyriazidou E (2000) Panel data discrete choice models with lagged dependent variables. Econometrica 68:839–874

  • Jazi MA, Jones G, Lai C-D (2012) First-order integer valued AR processes with zero inflated Poisson innovations. J Time Ser Anal 33:954–963

  • Joe H (1997) Multivariate models and dependence concepts. Chapman & Hall, London

  • Jung RC, Kukuk M, Liesenfeld R (2006) Time series of count data: modelling, estimation and diagnostics. Comput Stat Data Anal 51:2350–2364

  • Jung RC, Ronning G, Tremayne AR (2005) Estimation in conditional first order autoregression with discrete support. Stat Pap 46(2):195–224

  • Lee M-J (1996) Methods of moments and semiparametric econometrics for limited dependent variable models. Springer, New York

  • MacDonald IL, Zucchini W (1997) Hidden Markov and other models for discrete-valued time series. Chapman & Hall, London

  • McKenzie E (1985) Some simple models for discrete variate time series. Water Resour Bull 21:645–650

  • McKenzie E (1988) Some ARMA models for dependent sequences of Poisson counts. Adv Appl Prob 22:822–835

  • McKenzie E (2003) Discrete variate time series, stochastic processes: modelling and simulation. In: Shanbhag DN, Rao CR (eds) Handbook of statistics, vol 21. North-Holland, Amsterdam, pp 573–606

  • Mullahy J (1986) Specification and testing of some modified count data models. J Econom 33:341–365

  • Pagan A, Vella F (1989) Diagnostic tests for models based on individual data: a survey. J Appl Econom 4:29–59

  • Ramalho EA, Ramalho JJS, Murteira J (2011) Alternative estimating and testing empirical strategies for fractional regression models. J Econ Surv 25:19–68

  • Ramsey JB (1969) Tests for specification errors in classical linear least squares regression analysis. J R Stat Soc B 31:350–371

  • Santos Silva JMC, Murteira J (2009) Estimation of default probabilities with incomplete contracts data. J Empir Finance 16:457–465

  • Schweer S, Weiß CH (2014) Compound Poisson INAR(1) processes: stochastic properties and testing for overdispersion. Comput Stat Data Anal 77:267–284

  • Stanghellini E (2009) Introduzione ai metodi statistici per il credit scoring. Springer, Milano

  • Steutel FW, van Harn K (1979) Discrete analogues of self-decomposability and stability. Ann Probab 7:893–899

  • Sun J, Zhao X (2013) Statistical analysis of panel count data. Springer, New York

  • Thomas LC, Edelman DB, Crook JN (2002) Credit scoring and its applications. SIAM, Philadelphia

  • Weiss C (2008) Thinning operations for modelling time series of counts—a survey. Adv Stat Anal 92:319–341

  • Windmeijer F (2006) GMM for panel count data models. CeMMAP working papers CWP21/06, Centre for Microdata Methods and Practice, Institute for Fiscal Studies

  • Winkelmann R (2004) Health care reform and the number of doctor visits—an econometric analysis. J Appl Econom 19:455–472

  • Wooldridge J (1997) Multiplicative panel data models without the strict exogeneity assumption. Econom Theory 13:667–678

Author information

Correspondence to José M. R. Murteira.

Additional information

The authors wish to thank João Santos Silva, John Mullahy and Isabel Proença, two anonymous referees and the Editor, as well as participants at the 9th International Conference on Computational and Financial Econometrics (CFE 2015) and the 57th Meeting of the Euro Working Group for Commodities and Financial Modelling, for helpful remarks on previous versions of the text. The usual disclaimer applies. José Murteira gratefully acknowledges financial support from Fundação para a Ciência e a Tecnologia, through strategic grant PEst-OE/EGE/UI0491/2011.

Appendix

This Appendix presents algebraic derivations of expressions for relevant moments of the marginal, joint and conditional distributions involved in the sequence \(\left( {y_t ,t=1,\ldots ,T} \right) \).

Section 2.2 Expressions of the conditional moment \(E( {y_t^k |y_{t-1} }),k\in {\varvec{\mathcal{N}}}\), and the first unconditional moment \(E({y_t })\).

Proof

Consider the sequence of conditional moment generating functions (mgf) of \(y_t \), given \(y_{t-1} \), which, under the model’s assumptions and the definition of the binomial thinning operator, can be written as

$$\begin{aligned} M_{\left. t \right| t-1} (s)\equiv & {} E_d \left( {E\left( {\exp \left( {sy_t } \right) |y_{t-1} ,d_t } \right) } \right) \nonumber \\= & {} p\exp \left( {s\left( {y_{t-1} +1} \right) } \right) +\left( {1-p} \right) \left( {1-p_1 +p_1 \exp (s)} \right) ^{y_{t-1} }. \end{aligned}$$
(17)

By evaluating derivatives of \(M_{\left. t \right| t-1} \left( s \right) \) at \(s=0\) one can produce conditional moments of any order: \(M_{\left. t \right| t-1}^{(k)} \left( 0 \right) =E\left( {y_t^k |y_{t-1} } \right) ,k\in {\varvec{\mathcal{N}}}\). For \(k=1\) and \(k=2\), this yields, respectively, (4) and

$$\begin{aligned} E\left( {y_t^2 |y_{t-1} } \right)= & {} p+\left( {2p+p_1 \left( {1-p_1 } \right) \left( {1-p} \right) } \right) y_{t-1} \nonumber \\&+\left( {p+p_1^2 \left( {1-p} \right) } \right) y_{t-1}^2 . \end{aligned}$$
(18)
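As an informal check of these closed forms, the sketch below enumerates the conditional distribution that (17) appears to describe and compares its first two moments with the formulas. It assumes, reading off (17), that with probability \(p\) the count rises to \(y_{t-1}+1\) and otherwise \(y_t\) is Binomial\((y_{t-1},p_1)\); it also assumes \(r=p+p_1(1-p)\), which is what differentiating (17) once suggests. All function names are hypothetical and exact rational arithmetic is used only for convenience.

```python
from fractions import Fraction
from math import comb


def conditional_pmf(y_prev, p, p1):
    """Return {value: probability} for y_t given y_{t-1} = y_prev.

    Assumed reading of (17): with probability p the count rises by one;
    otherwise y_t is Binomial(y_prev, p1), i.e. binomial thinning.
    """
    pmf = {}
    # Thinning branch (d_t = 0): each past missed payment is retained
    # independently with probability p1.
    for j in range(y_prev + 1):
        pmf[j] = (1 - p) * comb(y_prev, j) * p1**j * (1 - p1) ** (y_prev - j)
    # Increment branch (d_t = 1).
    pmf[y_prev + 1] = pmf.get(y_prev + 1, 0) + p
    return pmf


def conditional_moment(y_prev, p, p1, k):
    """k-th raw conditional moment, computed by direct enumeration."""
    return sum(prob * value**k
               for value, prob in conditional_pmf(y_prev, p, p1).items())


def first_moment_formula(y_prev, p, p1):
    r = p + p1 * (1 - p)  # assumed definition of r
    return p + r * y_prev


def second_moment_formula(y_prev, p, p1):
    """Right-hand side of (18)."""
    return (p
            + (2 * p + p1 * (1 - p1) * (1 - p)) * y_prev
            + (p + p1**2 * (1 - p)) * y_prev**2)


if __name__ == "__main__":
    p, p1 = Fraction(1, 10), Fraction(3, 5)  # illustrative values only
    for y_prev in range(6):
        assert conditional_moment(y_prev, p, p1, 1) == first_moment_formula(y_prev, p, p1)
        assert conditional_moment(y_prev, p, p1, 2) == second_moment_formula(y_prev, p, p1)
    print("closed forms agree with the enumerated moments")
```

Working with `Fraction` keeps the comparison exact, so agreement does not depend on a floating-point tolerance.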

The unconditional first moment of \(y_t \), \(E({y_t})\), can then be obtained by successively applying the law of iterated expectations to \(E\left( {y_t |y_{t-1} } \right) \); with the initial condition \(y_0 \equiv 0\),

$$\begin{aligned} E\left( {y_t } \right)= & {} E\left( {E\left( {y_t |y_{t-1} } \right) } \right) =p+rE\left( {y_{t-1} } \right) \\= & {} p+rE\left( {E\left( {y_{t-1} |y_{t-2} } \right) } \right) =p\left( {1+r} \right) +r^{2}E\left( {y_{t-2} } \right) =\ldots \\= & {} p\left( {1+r+r^{2}+\ldots +r^{t-1}} \right) +r^{t}E\left( {y_0 } \right) =p\left( {1+r+r^{2}+\ldots +r^{t-1}} \right) . \end{aligned}$$

Trivially, if \(p=1\) (which implies \(r=1\)), then \(E\left( {y_t } \right) =t\), and if \(p=0\) then \(E({y_t })=0\). In both cases \(y_t \) is degenerate so \(V({y_t })=0\). If p and \(p_1 \) are both less than 1 (so \(r<1\)), \(E\left( {y_t } \right) =p\left( {1-r^{t}} \right) /\left( {1-r} \right) \). Aside from the trivial case \(p=0\), \(E\left( {y_t } \right) \) varies with t, so the sequence \({\varvec{y}}\) is not stationary. \(\square \)
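The geometric-sum expression for \(E(y_t)\) can be illustrated numerically. The following sketch propagates the exact distribution of \(y_t\) forward from \(y_0=0\) and compares its mean with \(p(1-r^{t})/(1-r)\). It assumes the one-step mechanism read off (17) (increment with probability \(p\), binomial thinning with retention probability \(p_1\) otherwise) and \(r=p+p_1(1-p)\); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def step(dist, p, p1):
    """Push the distribution of y_{t-1} one period forward."""
    new = {}
    for y_prev, mass in dist.items():
        # d_t = 1: one more missed payment.
        new[y_prev + 1] = new.get(y_prev + 1, 0) + mass * p
        # d_t = 0: binomial thinning of y_prev with retention probability p1.
        for j in range(y_prev + 1):
            weight = comb(y_prev, j) * p1**j * (1 - p1) ** (y_prev - j)
            new[j] = new.get(j, 0) + mass * (1 - p) * weight
    return new


def distribution(t, p, p1):
    """Exact distribution of y_t, starting from y_0 = 0."""
    dist = {0: Fraction(1)}
    for _ in range(t):
        dist = step(dist, p, p1)
    return dist


def mean(dist):
    return sum(value * mass for value, mass in dist.items())


def mean_formula(t, p, p1):
    """p (1 + r + ... + r^(t-1)), with r = p + p1 (1 - p) (assumed)."""
    r = p + p1 * (1 - p)
    if r == 1:
        return p * t
    return p * (1 - r**t) / (1 - r)


if __name__ == "__main__":
    p, p1 = Fraction(1, 10), Fraction(3, 5)  # illustrative values only
    for t in range(1, 7):
        assert mean(distribution(t, p, p1)) == mean_formula(t, p, p1)
    print([float(mean_formula(t, p, p1)) for t in range(1, 7)])
```

The printed means increase with t for these values, in line with the non-stationarity noted above.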

Section 2.2 Expression of COV\(({y_t ,y_{t-k} })\); autocorrelation function.

Proof

Through successive application of the law of iterated expectations to (4), one can write the first conditional moment of \(y_t \), given \(y_{t-k} \), as

$$\begin{aligned} E( {y_t |y_{t-k} } )=p( {1+r+r^{2}+\ldots +r^{k-1}} )+r^{k}y_{t-k} . \end{aligned}$$
(19)

From

$$\begin{aligned} {\hbox {COV}}( {y_t ,y_{t-k} } )= & {} E( {y_t ( {y_{t-k} -E( {y_{t-k} } )} )} )\\= & {} E( {E( {y_t |y_{t-k} } )( {y_{t-k} -E( {y_{t-k} } )} )} ) \end{aligned}$$

and from (19) it follows that

$$\begin{aligned} {\hbox {COV}}( {y_t ,y_{t-k} } )= & {} E( {( {p( {1+r+r^{2}+\ldots +r^{k-1}} )+r^{k}y_{t-k} } )( {y_{t-k} -E( {y_{t-k} } )} )} )\nonumber \\= & {} r^{k}E( {y_{t-k} ( {y_{t-k} -E( {y_{t-k} } )} )} )=r^{k}V( {y_{t-k} } ). \end{aligned}$$
(20)

In this expression \(V({y_t})=E( {y_t^2 } )-E( {y_t } )^{2}\) with \(E( {y_t^2 } )\) expressed as the solution to the difference equation (obtained from (18))

$$\begin{aligned} E( {y_t^2 } )= & {} E( {E( {y_t^2 |y_{t-1} } )} )\nonumber \\= & {} p+( {2p+p_1 ( {1-p_1 } )( {1-p} )} )E( {y_{t-1} } )+( {p+p_1^2 ( {1-p} )} )E( {y_{t-1}^2 } ), \end{aligned}$$

with \(E( {y_{t-1} } )\) given in (5) and initial condition \(E( {y_1^2 } )=p\). The variance \(V( {y_t } )\) involves t, so the autocorrelation function, CORR\(( {y_t ,y_{t-k} } )=r^{k}\sqrt{V( {y_{t-k} } )/V( {y_t } )}\), depends not only on the lag (k) but also on t. \(\square \)
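A small numerical sketch may help to see how the autocorrelation varies with t. It iterates the difference equations for \(E(y_t)\) and \(E(y_t^2)\), taking the coefficients of the second one directly from (18), and compares \(r^{k}V(y_{t-k})\) in (20) with a brute-force covariance computed from the exact distributions. The transition mechanism (increment with probability \(p\), otherwise binomial thinning with probability \(p_1\)) and \(r=p+p_1(1-p)\) are assumptions read off (17); all names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def moment_paths(T, p, p1):
    """Lists [E(y_t)] and [E(y_t^2)] for t = 0..T, with y_0 = 0."""
    r = p + p1 * (1 - p)
    a = 2 * p + p1 * (1 - p1) * (1 - p)  # coefficient of y_{t-1} in (18)
    b = p + p1**2 * (1 - p)              # coefficient of y_{t-1}^2 in (18)
    m1, m2 = [0 * p], [0 * p]
    for _ in range(T):
        m1_next = p + r * m1[-1]
        m2_next = p + a * m1[-1] + b * m2[-1]
        m1.append(m1_next)
        m2.append(m2_next)
    return m1, m2


def variance_path(T, p, p1):
    m1, m2 = moment_paths(T, p, p1)
    return [second - first**2 for first, second in zip(m1, m2)]


def covariance(t, k, p, p1):
    """r^k V(y_{t-k}), as in (20)."""
    r = p + p1 * (1 - p)
    return r**k * variance_path(t, p, p1)[t - k]


def autocorrelation(t, k, p, p1):
    """r^k sqrt(V(y_{t-k}) / V(y_t)); needs t - k >= 1 and 0 < p < 1."""
    r = p + p1 * (1 - p)
    v = variance_path(t, p, p1)
    return float(r**k) * sqrt(v[t - k] / v[t])


# Brute-force check based on the exact distributions.
def step(dist, p, p1):
    new = {}
    for y_prev, mass in dist.items():
        new[y_prev + 1] = new.get(y_prev + 1, 0) + mass * p
        for j in range(y_prev + 1):
            weight = comb(y_prev, j) * p1**j * (1 - p1) ** (y_prev - j)
            new[j] = new.get(j, 0) + mass * (1 - p) * weight
    return new


def propagate(dist, n, p, p1):
    for _ in range(n):
        dist = step(dist, p, p1)
    return dist


def mean(dist):
    return sum(value * mass for value, mass in dist.items())


def brute_force_covariance(t, k, p, p1):
    start = propagate({0: Fraction(1)}, t - k, p, p1)  # law of y_{t-k}
    # E(y_t y_{t-k}) by conditioning on each possible value of y_{t-k}.
    cross = sum(mass * y0 * mean(propagate({y0: Fraction(1)}, k, p, p1))
                for y0, mass in start.items())
    return cross - mean(propagate(start, k, p, p1)) * mean(start)


if __name__ == "__main__":
    p, p1 = Fraction(1, 10), Fraction(3, 5)  # illustrative values only
    assert covariance(5, 2, p, p1) == brute_force_covariance(5, 2, p, p1)
    print([autocorrelation(t, 1, p, p1) for t in range(2, 8)])
```

For these illustrative values the lag-one autocorrelations printed at the end differ across t, which is the dependence on t stated in the proof.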

Section 2.2 General expression of \(E({y_t|{\varvec{x}}})\) with time-varying covariates—\({\varvec{x}}\equiv ( {{\varvec{x_1}} ,\ldots ,{\varvec{x_T}} } )\) does not include lags of the dependent variable.

Proof

Suppose that neither p nor \(p_1 \) involves lags of the dependent variable. Then,

$$\begin{aligned} E( {y_1 |{\varvec{x}}_{\varvec{1}} } )= & {} p( {{\varvec{x_1}} } ),\\ E( {y_2 |{\varvec{x}}_{( \mathbf{2} )} } )= & {} E( {E( {y_2 |y_1 ,{\varvec{x}}_{( \mathbf{2} )} } )} )\\= & {} p( {{\varvec{x}}_\mathbf{2} } )+r( {{\varvec{x}}_\mathbf{2} } )E( {y_1 |{\varvec{x}}_\mathbf{1} } )=p( {{\varvec{x_2}} } )+r( {{\varvec{x_2}} } )p( {{\varvec{x_1}} } ),\\ E( {y_3 |{\varvec{x}}_{( \mathbf{3} )} } )= & {} E( {E( {y_3 |y_2 ,{\varvec{x}}_{( \mathbf{3} )} } )} )\\= & {} p({{\varvec{x}}_\mathbf{3} } )+ r( {{\varvec{x}}_\mathbf{3}} )E( {y_2 |{\varvec{x}}_{(\mathbf{2})} } )\\= & {} p( {{\varvec{x_3}} } )+r( {{\varvec{x_3}} } )( {p( {{\varvec{x_2}}} )+r( {{\varvec{x_2}} } )p( {{\varvec{x_1}} } )})\\= & {} p( {{\varvec{x}}_{\varvec{3}} } )+p( {{\varvec{x}}_\mathbf{2} } )r( {{\varvec{x}}_\mathbf{3} } )+ p( {{\varvec{x}}_\mathbf{1} } )r( {{\varvec{x}}_\mathbf{2} } )r( {{\varvec{x_3}} } ). \end{aligned}$$

A mathematical induction argument ensures the general result

$$\begin{aligned} E\left( {y_t |{\varvec{x}}} \right) =\left\{ {{\begin{array}{ll} {p\left( {{\varvec{x_1}}} \right) ,} &{} {t=1,} \\ {p_t +\mathop \sum \nolimits _{j=1}^{t-1} p_j \mathop \prod \nolimits _{k=j+1}^t r_k ,} &{} {t\ge 2,} \\ \end{array} }} \right. \end{aligned}$$

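which reduces to (5) when all covariates (and therefore p and \(p_1 \)) are time invariant. In the general result, \(p_j \equiv p( {{\varvec{x}}_j } )\) and \(r_k \equiv r( {{\varvec{x}}_k } )\), so \(p_j \) with \(j=1\) denotes \(p( {{\varvec{x_1}} } )\) rather than the thinning probability. \(\square \)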
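The general result can be sketched in code as follows. The example takes already-evaluated sequences \(p(\varvec{x}_t)\) and \(r(\varvec{x}_t)\) as inputs (how they depend on covariates is left open here), computes the mean both by the one-step recursion and by the sum-of-products expression, and checks that the two agree. The function names and numerical values are hypothetical.

```python
from fractions import Fraction
from math import prod


def mean_recursive(p, r):
    """E(y_t | x) for t = 1..T via E(y_t) = p_t + r_t E(y_{t-1}), y_0 = 0.

    p[t-1] and r[t-1] hold p(x_t) and r(x_t); r(x_1) plays no role
    because E(y_0) = 0.
    """
    means, previous = [], 0
    for p_t, r_t in zip(p, r):
        previous = p_t + r_t * previous
        means.append(previous)
    return means


def mean_closed_form(p, r, t):
    """p_t + sum_{j=1}^{t-1} p_j prod_{k=j+1}^{t} r_k, with t counted from 1."""
    return p[t - 1] + sum(
        p[j - 1] * prod(r[k - 1] for k in range(j + 1, t + 1))
        for j in range(1, t)
    )


if __name__ == "__main__":
    # Illustrative, made-up sequences for T = 4 periods.
    p = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(1, 4)]
    r = [Fraction(1, 2), Fraction(3, 5), Fraction(7, 10), Fraction(2, 3)]
    recursive = mean_recursive(p, r)
    for t in range(1, len(p) + 1):
        assert mean_closed_form(p, r, t) == recursive[t - 1]
    print([float(m) for m in recursive])
```

With constant sequences the closed form collapses to the geometric sum of the time-invariant case, which gives a quick sanity check.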

Section 3.4 Multiplicative unobserved effects.

Let \({\varvec{e}}_i \equiv \left( {e_{i1} ,\ldots ,e_{iT_i } } \right) \) and suppose that the mean of \(y_{it} \) is affected by a multiplicative time-invariant effect, that is, \(E\left( {y_{it} |{\varvec{e}}_{\varvec{i}} } \right) =e_i E\left( {y_{it} } \right) \) (for simplicity, time-invariant observable covariates are assumed so they are omitted). Hence, for \(t=1\),

$$\begin{aligned} E\left( {y_{i1} |{\varvec{e}}_{\varvec{i}}} \right) =e_i E\left( {y_{i1} } \right) =e_i\,Pr \left( {d_{i1} =1} \right) =e_i p. \end{aligned}$$

For \(t\ge 2\), conditionally on \(d_{it} =0\) and \(y_{i,t-1} \), \(y_{it}\) follows a binomial p.f. with number of Bernoulli trials given by \(y_{i,t-1} \), so any unobserved effect will intervene in the probability of success \(p_1 \): denote this as \(p_1 \left( {e_{1it} } \right) \). Recalling (1), one can check that, for \(t=2\), the term \(e_{1i2} \) must verify the equation

$$\begin{aligned} E\left( {y_{i2} |e_i } \right)= & {} e_i E\left( {y_{i2} } \right) \Leftrightarrow \\ e_i p\left( {1+e_i p+p_1 \left( {e_{1i2} } \right) \left( {1-e_i p} \right) } \right)= & {} e_i p\left( {1+p+p_1 \left( {1-p} \right) } \right) \Leftrightarrow \\ e_i p+p_1 \left( {e_{1i2} } \right) \left( {1-e_i p} \right)= & {} p+p_1 \left( {1-p} \right) . \end{aligned}$$

This equation is formally different from those obtained for \(t>2\), so, for each t, the roots \(e_{1it} \) of the corresponding implicit equations vary with t. In words, an assumption of time-invariant heterogeneity affecting the conditional expectation of \(y_{it} \) rests on a time-varying unobservable affecting \(p_1 \).
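As a worked rearrangement of the \(t=2\) condition (a sketch only, assuming \(e_i p\ne 1\)), the last line of the display above can be solved for the thinning probability:

$$\begin{aligned} p_1 \left( {e_{1i2} } \right) =\frac{p+p_1 \left( {1-p} \right) -e_i p}{1-e_i p}. \end{aligned}$$

Setting \(e_i =1\) returns \(p_1 \), as expected. For a purely illustrative case with \(p=0.1\), \(p_1 =0.6\) and \(e_i =2\), the right-hand side is \((0.64-0.2)/0.8=0.55\). For the result to be a valid probability when \(e_i p<1\), the numerator must be non-negative, that is, \(e_i p\le p+p_1 (1-p)\).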

Cite this article

Murteira, J.M.R., Augusto, M.A.G. Hurdle models of repayment behaviour in personal loan contracts. Empir Econ 53, 641–667 (2017). https://doi.org/10.1007/s00181-016-1140-2
