Modeling dependent credit rating transitions: a comparison of coupling schemes and empirical evidence

Three coupling schemes for generating dependent credit rating transitions are compared and empirically tested. Their distributions, the corresponding variances and default correlations are characterized. Using Standard and Poor's data for OECD countries, parameters of the models are estimated by the maximum likelihood method and MATLAB optimization software. Two pools of debtors are considered: one with 5 and one with 12 industry sectors. The debtors are classified into two non-default credit classes. The first portfolio mimics the Dow Jones iTraxx EUR market index. The default correlations evaluated for 12 industry sectors are compared with their counterparts known for the US economy.


Introduction
Credit risk analysis requires modeling of dependent defaults. A classical approach, due to Merton (1974), employed a stochastic process to describe the (latent) value of a firm. A default event here is triggered by breaching some specified threshold. Termed as structural models, they treat credit risk correlation between two debtors as the correlation between the respective stochastic processes determining values of the firms. For example, within the CreditMetrics approach, see Gupton et al. (1997), dependent defaults of several firms are modeled by using a multivariate Gaussian distribution.
A more realistic and technically sophisticated setting for generating dependent defaults, so-called reduced form models, allows the default probability to depend on several economic factors. Some of them are latent while the others may be observable. The total risk is typically decomposed into an idiosyncratic part and a common component. The latter is often interpreted as a systemic factor. The relative strength of the components and, consequently, correlations between assets are parameterized by deterministic weights. Different types of copulae are used. A variety of distributions have been considered. There are models formulated in discrete as well as in continuous time. For particular examples, see among others Li (2000), Jarrow and Yu (2001), Hull and White (2001), Bangia et al. (2002), Lando (2004), Hull and White (2004), McNeil and Wendin (2007), Stefanescu et al. (2009), Choroś-Tomczyk et al. (2013). Frey and McNeil (2003) analyze and classify the existing approaches to generating dependent defaults.
Within the CreditMetrics approach, "where migration analysis is a corner stone, that is, the study of changes in the credit quality of names through time" (see Gupton et al. 1997, page iv), a (discrete-time) Markovian transition matrix is estimated. It governs the evolution of a representative debtor through credit classes.
While credit risk models concentrate, typically, on dependent defaults, studies on systemic risk attempt to analyze the events that precede a default. See Upper (2011) for a comprehensive analysis of simulation methods in systemic risk analysis. In other words, the whole interdependent migration process of the debtors has to be considered.
The Markovian property of the credit rating migration process, its time-homogeneity and the discrete-time setting have been criticized and several refinements of it have been suggested. See Altman (1998), Bangia et al. (2002), Lando and Skødeberg (2002), Frydman and Schuermann (2008), Korolkiewicz and Elliott (2008), Stefanescu et al. (2009), Xing et al. (2012) among others. Conceptually, dependence of transition probabilities upon macroeconomic factors has been introduced and the corresponding models have been empirically tested. In technical terms, these refinements rely on hidden Markov models and employ a variety of estimation techniques. In order to render models of the migration process more realistic, continuous-time settings have been introduced and estimated.
Taking a credit rating Markovian transition matrix as a marginal distribution, a joint distribution of the whole pool of debtors can be obtained by a coupling scheme. This possibility of introducing dependence among credit rating migrations of the debtors constituting a portfolio is analyzed in Kaniovski and Pflug (2007) and in Wozabal and Hochreiter (2012).
In both cases, transition probabilities are modified according to binary unobserved tendency variables, that can be interpreted in a context of business cycles. Every migration is governed by an idiosyncratic term and a common component. Unlike in the case of a reduced form model, the weights that determine the relative strength of the components are random. A tendency variable affects the distribution of the corresponding common component in the following way. For a credit class, conditional on "favorable" realizations of the corresponding tendency variables, migrations towards a better credit quality become more likely, whereas worsening of the credit quality will be less probable.
In Kaniovski and Pflug (2007), the common component remains the same for all debtors belonging to a credit class irrespective of their industry sectors. Wozabal and Hochreiter (2012) introduced an alternative coupling scheme. It implies much weaker dependence among the debtors. In their case, conditional on realizations of the corresponding tendency variables, the common tendencies affecting a pool of debtors characterized by a combination of a credit class and an industry sector are identically distributed and independent.
In what follows, the model of Kaniovski and Pflug (2007) is referred to as Scheme 1, its modification by Wozabal and Hochreiter (2012) as Scheme 2, and the coupling technique introduced here as Scheme 3.
The distributions corresponding to these three coupling techniques are compared. It is shown that the variances of the number of defaults and the correlations of credit events are the largest for the first scheme and the smallest for the second one. Consequently, the coupling scheme suggested here takes an intermediate position between the known techniques.
While there are explicit formulas for one-year correlations of credit events, multi-year correlations have to be estimated by bootstrapping. In the latter case, repeated Monte-Carlo runs of the model generate transition sample paths, which are then processed by a standard statistical algorithm for sample correlation.
Using a Standard and Poor's (S&P's) data set, parameters of these coupling models are estimated. Two portfolios are considered: one with 5 and one with 12 industry sectors. The debtors are classified into two non-default credit classes.
The maximum likelihood estimates are obtained with MATLAB optimization software, using the Interior Point (IP) algorithm and the Sequential Quadratic Programming (SQP) method.

Coupling schemes
Consider a portfolio containing debtors that are non-homogeneous in their credit ratings and industry sectors. Let there be $M \ge 2$ non-default credit classes. Numbering them in descending order of credit quality, we assign 1 to the most secure assets, while the credit class next to default is indexed by $M$. Defaulted firms receive the index $M + 1$. There are $S \ge 1$ industry sectors. Following the CreditMetrics approach, see Gupton et al. (1997), it is assumed that credit rating migrations are governed by an $M \times (M + 1)$ Markovian transition matrix $P$ with elements $p_{i,j}$. That is, $p_{i,j}$ stands for the probability of a transition, within one year, from the $i$th credit rating to the $j$th. Since $M + 1$ is an absorbing state of the corresponding Markov chain, $p_{M+1,i} = I_{\{i = M+1\}}$. Here $I_{\{A\}}$ denotes the indicator function of a statement $A$. The credit rating migrations occur at times $t = 1, 2, \ldots$. Set $N_{k,i}(t)$ for the number of debtors from industry sector $k$ in credit class $i$ at time $t$. At the beginning there are $N(1) = \sum_{k=1}^{S} \sum_{i=1}^{M} N_{k,i}(1)$ debtors in the portfolio. The coupling techniques generate counts $N_{k,i}(t)$, $t > 1$, such that:
• the evolutions of debtors through credit classes are dependent;
• the corresponding random process of credit rating transitions is time-homogeneous and every individual migration is governed by the same Markovian transition matrix $P$.
Assign a number $n = 1, 2, \ldots, N(1)$ to every debtor in the portfolio at time $t = 1$. Set $X_n(t)$ for the credit rating at time $t \ge 1$ of the firm numbered by $n$. Then $X_n(t)$ is a discrete-time Markov chain with $M + 1$ states. Its transient states are $1, 2, \ldots, M$.
Denote by $s(n)$ the industry sector of firm $n$. The rating changes randomly in time, becoming $X_n(2)$ at time $t = 2$, while the assignment to the sector $s(n)$ remains constant. The evolution of the whole portfolio is captured by a multi-dimensional random process $X(t) = (X_1(t), X_2(t), \ldots, X_{N(1)}(t))$ whose components are identically distributed and dependent. Let us look at a transition from time $t = 1$ to time $t = 2$:

$$X_n(2) = \delta_n \xi_n + (1 - \delta_n)\,\eta_n .$$

Here $\xi_n$ is an idiosyncratic component and $\eta_n$ stands for a common component in the transition from $X_n(1)$ to $X_n(2)$. The latter introduces a dependence mechanism among the $X_n(2)$. Random variables $\{\xi_n\}$, $\{\eta_n\}$ and $\{\delta_n\}$ are independent. Since all debtors are assumed to be governed by the same Markovian transition matrix, $P\{\xi_n = j\} = P\{\eta_n = j\} = p_{X_n(1),j}$. Random variables $\delta_n$ are independent in $n$ and $P\{\delta_n = 1\} = q_{X_n(1),s(n)}$. The common components depend on a vector $\vec{\chi} = (\chi_1, \ldots, \chi_M)$ of binary tendency variables with distribution $\pi(\cdot)$, that is, $P\{\vec{\chi} = \vec{x}\} = \pi(\vec{x})$ for all $\vec{x} \in \{0, 1\}^M$. The distribution $\pi(\cdot)$ is given as an input parameter for the simulation and may be determined by estimation from observed data.
The common component has the following structure. When $\chi_i = 1$, all of the random variables $\eta_n$ such that $X_n(1) = i$ cannot assume values larger than $i$. If the credit class transitions of every debtor belonging to credit class $i$ were governed exclusively by the corresponding $\eta_n$, this would mean that the credit rating of such debtors may not worsen. For this reason, the situation when $\chi_i = 1$ is termed a non-deteriorating tendency. In the same way, $\chi_i = 0$ implies that all of the random variables $\{\eta_n\}$ such that $X_n(1) = i$ take on exclusively values exceeding $i$. This is a deterioration (of their credit ratings).
There are three possibilities for the dependence mechanism. For Scheme 2, Wozabal and Hochreiter (2012) assume that, conditionally on $\vec{\chi}$, the $\eta_n$ are independent in $n$. In Scheme 1, see Kaniovski and Pflug (2007), a unique common component governs all debtors belonging to a credit class, and these random variables are, conditionally on $\vec{\chi}$, independent for different credit classes. (More formally: random variables $\eta_n$ and $\eta_l$ are conditionally on $\vec{\chi}$ independent for $X_n(1) \ne X_l(1)$, while $\eta_n = \eta_l$ for $X_n(1) = X_l(1)$.) Here an intermediate variant is introduced. For Scheme 3 it is assumed that all debtors which belong to the same combination of credit rating and industry sector are affected by the same common component, and these random variables are independent for different combinations. (In short: $\eta_n = \eta_l$ if $X_n(1) = X_l(1)$ and $s(n) = s(l)$, otherwise random variables $\eta_n$ and $\eta_l$ are conditionally on $\vec{\chi}$ independent.) The conditional distribution of $\eta_n$ is defined as follows:

$$P\{\eta_n = j \mid \vec{\chi}\} = p_{i,j}(\chi_i), \quad i = X_n(1),$$

where the conditional probabilities $p_{i,j}(\cdot)$ read:

$$p_{i,j}(1) = \begin{cases} p_{i,j}/p_i^+ & \text{if } j \le i, \\ 0 & \text{if } j > i, \end{cases} \qquad p_{i,j}(0) = \begin{cases} 0 & \text{if } j \le i, \\ p_{i,j}/p_i^- & \text{if } j > i. \end{cases}$$

Here $p_i^+ = \sum_{j \le i} p_{i,j}$ and $p_i^- = 1 - p_i^+$. Counts $N_{k,i}(2)$ at time $t = 2$ are obtained by the following formula:

$$N_{k,i}(2) = \sum_{n:\ s(n) = k} I_{\{X_n(2) = i\}} .$$

Denote by $D_{k,i}(2)$ the number of debtors from industry sector $k$ defaulted at time 2 that had credit rating $i$ at time 1. Then

$$D_{k,i}(2) = \sum_{n:\ s(n) = k,\ X_n(1) = i} I_{\{X_n(2) = M + 1\}} .$$

Since $X(t)$ is a time-homogeneous random process, $X(t)$ (as well as the corresponding counts $N_{k,i}(t)$ and $D_{k,i}(t)$) can be defined analogously for $t \ge 3$. We summarize the three models in Table 1.

Table 1
Model | Source | Common components of debtors n and l, conditional on the tendency vector
Model 1 | Kaniovski and Pflug (2007) | Identical, if n and l belong to the same rating class, otherwise independent
Model 2 | Wozabal and Hochreiter (2012) | Independent
Model 3 | This paper | Identical, if n and l belong to the same rating class and the same industry sector, otherwise independent
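The one-year transition described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that a debtor follows its idiosyncratic component with probability q and the common component otherwise, and that the common component is drawn from the row of P restricted to the ratings allowed by the tendency variable. Ratings and sectors are 0-based, index M marks default, and the function names (`conditional_row`, `draw`, `step`) as well as all numbers are hypothetical.

```python
import random

def conditional_row(P, i, chi_i):
    """Law of the common component for credit class i (0-based) given tendency chi_i."""
    row = P[i]
    # chi_i == 1 (non-deteriorating): only ratings j <= i; chi_i == 0: only j > i.
    allowed = [(j <= i) if chi_i == 1 else (j > i) for j in range(len(row))]
    mass = sum(p for p, ok in zip(row, allowed) if ok)
    return [p / mass if ok else 0.0 for p, ok in zip(row, allowed)]

def draw(weights, rng):
    """Draw an index with the given (unnormalized) weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def step(ratings, sectors, P, Q, pi, scheme, rng):
    """One annual transition of the whole portfolio under coupling scheme 1, 2 or 3."""
    M = len(P)  # number of non-default classes; rating M means default
    outcomes = list(pi)
    chi = rng.choices(outcomes, weights=[pi[c] for c in outcomes])[0]
    shared = {}  # common components already drawn in this step
    new_ratings = []
    for n, (x, s) in enumerate(zip(ratings, sectors)):
        if x == M:  # default is absorbing
            new_ratings.append(M)
            continue
        if rng.random() < Q[x][s]:  # idiosyncratic move
            new_ratings.append(draw(P[x], rng))
            continue
        # Scheme 1 shares per class, Scheme 2 per debtor, Scheme 3 per (class, sector).
        key = {1: (x,), 2: (n,), 3: (x, s)}[scheme]
        if key not in shared:
            shared[key] = draw(conditional_row(P, x, chi[x]), rng)
        new_ratings.append(shared[key])
    return new_ratings

# Illustrative inputs only; these are not the estimates reported in the paper.
P = [[0.97, 0.025, 0.005], [0.03, 0.90, 0.07]]
Q = [[0.8, 0.8], [0.5, 0.5]]
pi = {(1, 1): 0.9021, (1, 0): 0.0679, (0, 1): 0.0279, (0, 0): 0.0021}
print(step([0, 0, 1, 1], [0, 1, 0, 1], P, Q, pi, scheme=3, rng=random.Random(1)))
```

Drawing a shared common component lazily, only when some debtor actually needs it, leaves the joint law of the new ratings unchanged, because an unused common component does not affect any debtor.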

Input parameters
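In order to run the model, the following inputs are required:
• an $M \times (M + 1)$ Markovian transition matrix $P$;
• a distribution $\pi(\cdot)$ of the tendency vector;
• an $M \times S$ matrix $Q$ whose entries $q_{i,s}$ are probabilities of success of the Bernoulli random variables $\{\delta_n\}$.
Since the $i$th coordinate of the tendency vector takes the value 1 with probability $p_i^+ = \sum_{j \le i} p_{i,j}$, $P$ and $\pi(\cdot)$ are related:

$$\sum_{\vec{\chi}:\ \chi_i = 1} \pi(\vec{\chi}) = p_i^+, \quad i = 1, \ldots, M. \qquad (1)$$

However, these $M$ relations are sufficient neither to identify a Markovian matrix $P$ given a distribution $\pi(\cdot)$ nor to find a $\pi(\cdot)$ given a $P$. See Bahadur (1961) for an exhaustive characterization of distributions on binary strings. Given a Markovian matrix $P$ and an $M \times M$ matrix of correlation coefficients of the tendency variables, Kaniovski and Pflug (2007) introduced a quadratic optimization problem in order to find a distribution $\pi(\cdot)$. Note that only for $M = 2$ is there an explicit formula for $\pi(\cdot)$, because in this case relations (1) together with the single correlation coefficient $\mathrm{Corr}(\chi_1, \chi_2)$ determine all four values of $\pi(\cdot)$. Given migration counts, Wozabal and Hochreiter (2012), employing a heuristic global optimization technique, identify $\pi(\cdot)$ for a given $P$ by the maximum likelihood method.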
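For M = 2 the tendency distribution can be written down directly from the two marginal probabilities and one correlation coefficient. The sketch below uses the standard parametrization of a pair of Bernoulli variables; the formula displayed in the original paper is not available here, so this is an illustration of the idea rather than a transcription. The function name and the numerical inputs are hypothetical.

```python
import math

def tendency_distribution(p1_plus, p2_plus, corr):
    """Distribution of (chi_1, chi_2) with P{chi_i = 1} = p_i_plus and Corr(chi_1, chi_2) = corr."""
    cov = corr * math.sqrt(p1_plus * (1 - p1_plus) * p2_plus * (1 - p2_plus))
    p11 = p1_plus * p2_plus + cov  # P{chi_1 = 1, chi_2 = 1}
    pi = {
        (1, 1): p11,
        (1, 0): p1_plus - p11,
        (0, 1): p2_plus - p11,
        (0, 0): 1 - p1_plus - p2_plus + p11,
    }
    # Not every correlation is compatible with the given marginals.
    if min(pi.values()) < -1e-12:
        raise ValueError("correlation is not feasible for these marginals")
    return pi

# Illustrative marginals, not the paper's estimates.
print(tendency_distribution(0.97, 0.93, 0.5))
```

The feasibility check matters in practice: for marginals close to 1, strongly negative correlations lead to a negative value of π(0, 0) and are therefore excluded.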
Since rating agencies report their (annual) Markovian transition matrices, conventionally a transition matrix P is assumed to be known and all estimation efforts concentrate on finding Q and π(·) as factors determining dependencies among components of a portfolio.

Distributions of defaults
Let us denote by $\overrightarrow{\mathrm{Mul}}(N, p_1, \ldots, p_k)$ a multinomial distribution with probabilities of success $p_i$ and number of trials $N$, as well as a $k$-dimensional random vector with this distribution. At time $t = 2$, debtors are allocated to credit classes according to a randomization of the following distributions: The corresponding weights are $\pi(\vec{\chi})$.
In order to compare variances of these randomized distributions, observe that the contributions due to debtors with credit rating $i$ to the variance of the $j$th coordinate are ordered across the schemes: the contribution under Scheme 2 does not exceed the one under Scheme 3, which in turn does not exceed the one under Scheme 1, for $i = 1, \ldots, M$ and $j = 1, \ldots, M + 1$. Consequently, Scheme 2 implies the smallest variances and Scheme 1 the largest.
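The ordering of variances can be checked numerically in the simplest setting: a pool of debtors sharing one credit class and one industry sector, with the number of defaults as the coordinate of interest. The closed forms below are a derivation made for this illustration, assuming that a debtor follows the idiosyncratic component with probability q and the common one otherwise, and that the tendency variable of the class equals 1 with probability p_plus = 1 - p_minus. They are not formulas quoted from the paper. Within a single sector Scheme 3 coincides with Scheme 1. The function name and all numbers are hypothetical.

```python
def default_count_variance(n_debtors, p_default, p_minus, q, scheme):
    """Variance of the number of defaults in a pool with one credit class and one sector.

    p_default: one-year default probability of the class
    p_minus:   probability of a deteriorating tendency (chi_i = 0) for the class
    q:         probability that a debtor follows its idiosyncratic component
    """
    damp = (1.0 - q) ** 2
    if scheme in (1, 3):
        # Debtors share one common component: its default indicator is Bernoulli(p_default).
        pair_cov = damp * p_default * (1.0 - p_default)
    elif scheme == 2:
        # Debtors share only the tendency variable; given chi_i = 0 the common
        # component of each debtor defaults with probability p_default / p_minus.
        pair_cov = damp * p_default ** 2 * (1.0 - p_minus) / p_minus
    else:
        raise ValueError("scheme must be 1, 2 or 3")
    binomial_part = n_debtors * p_default * (1.0 - p_default)
    return binomial_part + n_debtors * (n_debtors - 1) * pair_cov

# Illustrative numbers: a class with default probability 0.005 and p_minus = 0.03.
for scheme in (1, 2):
    print(scheme, default_count_variance(100, 0.005, 0.03, 0.8, scheme))
```

Under these assumptions the two schemes give the same variance of defaults for the credit class next to default, where a deterioration can only mean default and p_minus equals p_default.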

Likelihood functions and optimization problems
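The likelihood function for Scheme 2 is given in Wozabal and Hochreiter (2012) by $I \times L_2(\pi(\cdot), Q)$, where

$$L_2(\pi(\cdot), Q) = \prod_{t=1}^{T} \sum_{\vec{\chi} \in \{0,1\}^M} \pi(\vec{\chi}) \prod_{s=1}^{S} \prod_{m_1=1}^{M} \prod_{m_2=1}^{M+1} \bigl[ q_{m_1,s}\, p_{m_1,m_2} + (1 - q_{m_1,s})\, p_{m_1,m_2}(\chi_{m_1}) \bigr]^{I^t(s, m_1, m_2)} .$$

Time instants from $t = 1$ through $t = T$ correspond to the period of observation. $I^t(s, m_1, m_2)$ denotes the number of companies in sector $s$ that have moved from credit class $m_1$ to credit class $m_2$ in period $t$, and $p_{m_1,m_2}(\chi_{m_1})$ is the conditional probability that the common component of a debtor in credit class $m_1$ equals $m_2$ given the tendency $\chi_{m_1}$. Containing no unknowns, the multiplier $I$ cannot affect the outcome of maximizing the likelihood function. It is ignored in the calculations reported next. By a similar argument, sketched in the "Appendix", the likelihood functions for models 1 and 3 are obtained as $I \times L_1(\pi(\cdot), Q)$ and $I \times L_3(\pi(\cdot), Q)$, respectively. Here

$$L_1(\pi(\cdot), Q) = \prod_{t=1}^{T} \sum_{\vec{\chi} \in \{0,1\}^M} \pi(\vec{\chi}) \prod_{m_1=1}^{M} \sum_{m_2=1}^{M+1} p_{m_1,m_2}(\chi_{m_1}) \prod_{s=1}^{S} \Bigl[ (q_{m_1,s}\, p_{m_1,m_2} + 1 - q_{m_1,s})^{I^t(s, m_1, m_2)} \prod_{j \ne m_2} (q_{m_1,s}\, p_{m_1,j})^{I^t(s, m_1, j)} \Bigr],$$

$$L_3(\pi(\cdot), Q) = \prod_{t=1}^{T} \sum_{\vec{\chi} \in \{0,1\}^M} \pi(\vec{\chi}) \prod_{s=1}^{S} \prod_{m_1=1}^{M} \sum_{m_2=1}^{M+1} p_{m_1,m_2}(\chi_{m_1})\, (q_{m_1,s}\, p_{m_1,m_2} + 1 - q_{m_1,s})^{I^t(s, m_1, m_2)} \prod_{j \ne m_2} (q_{m_1,s}\, p_{m_1,j})^{I^t(s, m_1, j)} .$$

The components of $Q$ and $\pi(\cdot)$ belong to $[0, 1]$. There are linear constraints:

$$\sum_{\vec{\chi} \in \{0,1\}^M} \pi(\vec{\chi}) = 1, \qquad (2)$$

$$\sum_{\vec{\chi}:\ \chi_i = 1} \pi(\vec{\chi}) = p_i^+, \quad i = 1, \ldots, M. \qquad (3)$$

The first one states that the values $\pi(\cdot)$ form a probability distribution, while the remaining equalities are relations (1). Conceptually they mean that the $i$th coordinate of a feasible tendency vector takes the value 1 with probability $p_i^+$.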
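A log-likelihood of the Scheme 2 type can be evaluated as follows. The sketch assumes the conditional-independence structure described for Scheme 2: given the tendency vector, every debtor in class m1 and sector s moves to m2 with probability q p + (1 - q) p(chi), where p(chi) is the row of P restricted to the ratings allowed by the tendency. The data layout `counts[t][s][m1][m2]`, the 0-based indexing and the function names are assumptions of this illustration, and the multiplier I is dropped. Sums over tendency vectors are accumulated with a log-sum-exp step, since products of probabilities raised to large counts underflow quickly.

```python
import math

def conditional_row(P, i, chi_i):
    """Law of the common component for credit class i (0-based) given tendency chi_i."""
    row = P[i]
    allowed = [(j <= i) if chi_i == 1 else (j > i) for j in range(len(row))]
    mass = sum(p for p, ok in zip(row, allowed) if ok)
    return [p / mass if ok else 0.0 for p, ok in zip(row, allowed)]

def log_likelihood_scheme2(counts, P, Q, pi):
    """Log-likelihood, without the constant multiplier, of yearly migration counts."""
    M = len(P)
    total = 0.0
    for year in counts:
        log_terms = []  # one entry per tendency vector with positive weight
        for chi, weight in pi.items():
            if weight <= 0.0:
                continue
            acc = math.log(weight)
            for s, sector_counts in enumerate(year):
                for m1 in range(M):
                    cond = conditional_row(P, m1, chi[m1])
                    for m2, n in enumerate(sector_counts[m1]):
                        if n == 0:
                            continue
                        prob = Q[m1][s] * P[m1][m2] + (1.0 - Q[m1][s]) * cond[m2]
                        acc += n * math.log(prob) if prob > 0.0 else -math.inf
            log_terms.append(acc)
        peak = max(log_terms)
        if peak == -math.inf:
            return -math.inf  # the observed counts are impossible under these parameters
        total += peak + math.log(sum(math.exp(v - peak) for v in log_terms))
    return total

# Illustrative inputs: one year, two sectors, two non-default classes.
P = [[0.97, 0.025, 0.005], [0.03, 0.90, 0.07]]
Q = [[0.8, 0.8], [0.5, 0.5]]
pi = {(1, 1): 0.9021, (1, 0): 0.0679, (0, 1): 0.0279, (0, 0): 0.0021}
counts = [[[[95, 4, 1], [3, 90, 7]], [[50, 2, 0], [1, 40, 3]]]]
print(log_likelihood_scheme2(counts, P, Q, pi))
```

A function of this kind could be handed to a constrained optimizer over the entries of Q and π(·), subject to the linear constraints on π(·).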

Input data
Using an S&P's data set covering 10,413 firms from 30 OECD countries for T = 16 years, from 1991 through 2006, two cases, with S = 5 and with S = 12 industry sectors, are analyzed. There are M = 2 non-default credit classes: investment grade and non-investment grade debtors. The investment grade debtors are characterized by S&P's ratings from AAA to BBB, while the non-investment grade ones occupy ratings from BB downward. An investment grade debtor, a non-investment grade one and a defaulted debtor are indexed by 1, 2 and 3, respectively. The first pool of debtors mimics the portfolio generating the Dow Jones iTraxx EUR market index. It comprises the part of the data set represented by debtors belonging to the following industry sectors:
1 - auto and industrial
2 - consumer
3 - energy with utilities
4 - finance and insurance
5 - telecommunications, media and technology.
The second pool contains all debtors of the data set, classified into the following industries:
1 - aero, auto, capital goods, metal
2 - consumer, service
3 - energy, natural resources
4 - financial institutions
5 - forest and building products, homebuilders
6 - health care, chemicals
7 - high technology, computers, office equipment
8 - insurance, real estate
9 - leisure time, media
10 - telecommunications
11 - transportation
12 - utilities.
The same list of twelve industry sectors was analyzed by Nagpal and Bahar (2001), who dealt with an S&P's data set covering American firms for the period from 1991 through 1999. They reported one-, five- and seven-year default correlations and suggested practical applications to credit risk analysis based on them. Using a traditional statistical technique, these authors encountered a natural pitfall: "too few defaults (seven) to draw any meaningful conclusions" in the telecommunications sector. See Nagpal and Bahar (2001), p. 94. Their results serve as a benchmark for the estimates of credit event correlations suggested here.

Estimates and their interpretation
Applying time averages, the following Markovian matrix is obtained: The values in parentheses are standard deviations of the respective probabilities. Logarithms of L i (π(·), Q) have to be maximized in a unit hypercube subject to constraints (2) and (3). According to Allman et al. (2009), statistical estimation problems of this kind have typically multiple solutions. Given this, a variety of methods and initial approximations have to be tried, including the use of a solution obtained by one of the methods as a starting point for the rest.
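The time-averaging step that produces the matrix P can be illustrated as follows. The sketch assumes that "time averages" means the unweighted average of the yearly transition frequency matrices and that the reported dispersion is the standard deviation of those yearly frequencies; the paper does not spell out either detail, so both are assumptions, as are the function name and the toy counts.

```python
from statistics import mean, pstdev

def time_averaged_matrix(yearly_counts):
    """Average yearly transition frequencies and their standard deviations.

    yearly_counts[t][i][j]: number of debtors that moved from class i to rating j in year t.
    """
    yearly_freqs = [
        [[c / sum(row) for c in row] for row in year] for year in yearly_counts
    ]
    n_rows = len(yearly_freqs[0])
    n_cols = len(yearly_freqs[0][0])
    avg = [[mean([f[i][j] for f in yearly_freqs]) for j in range(n_cols)]
           for i in range(n_rows)]
    # Population standard deviation; a sample standard deviation would be an equally
    # plausible reading of the text.
    sd = [[pstdev([f[i][j] for f in yearly_freqs]) for j in range(n_cols)]
          for i in range(n_rows)]
    return avg, sd

# Two hypothetical years of counts for M = 2 non-default classes.
counts = [
    [[90, 10, 0], [0, 80, 20]],
    [[80, 20, 0], [0, 90, 10]],
]
P_hat, P_sd = time_averaged_matrix(counts)
print(P_hat)
print(P_sd)
```

Averaging frequencies year by year gives every year the same weight regardless of how many debtors were observed in it, whereas pooling all counts first would weight years by portfolio size.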
Unlike Wozabal and Hochreiter (2012), who maximized $L_2(\pi(\cdot), Q)$ by a heuristic global optimization method tailored for this case, here standard constrained optimization programs of MATLAB are used. The package contains two suitable methods: the IP and SQP algorithms. In all cases the optimal values and the corresponding solutions were identical for both algorithms. Each time a maximum point was found in a couple of seconds. The gradient and the Hessian matrix were estimated numerically. In order to find a solution, the SQP method required about 30% fewer iterations than the IP algorithm. This is consistent with what is reported in the literature on constrained optimization. See, for example, Nocedal and Wright (2006). Given an initial point, the (local) maximum value found by the SQP algorithm was at least as good as the solution of the IP algorithm. That is, typically a maximum point reported below was "discovered" by the SQP algorithm and then "confirmed" by the IP method.
Also $c_{1,2} = 0.9727$, where $c_{1,2}$ stands for $\mathrm{Corr}(\chi_1, \chi_2)$. Since for two Bernoulli random variables lack of correlation is equivalent to independence, the small absolute values of $c^{(1)}_{1,2}$ and $c^{(3)}_{1,2}$ mean that the coordinates of $\vec{\chi}^{(1)}$ and $\vec{\chi}^{(3)}$ are almost independent. In other words, hidden tendencies governing investment grade debtors depend very weakly on the corresponding tendencies for non-investment grade debtors. The negative sign may indicate a mismatch among the trends.
Turning to matrices $Q^{(i)}$, note that the larger $q_{X_n(1),s(n)}$ is, the weaker will be the impact of the common tendency on the evolution of debtor $n$. Investment grade debtors appear to be affected almost exclusively by idiosyncratic factors. The second and third schemes seem to imply a stronger dependence on common factors than the first one. This conclusion does not contradict the claim that Scheme 1 (2) generates the strongest (weakest) dependence pattern for a fixed set of parameters. In fact, here the distributions $\pi^{(i)}(\cdot)$ are different for all cases and this prevents a comparison of matrices $Q^{(i)}$. Moreover, the reported estimates represent a "reaction" of the corresponding model to the actually observed counts. That is, if the distributions $\pi^{(i)}(\cdot)$ were the same for all schemes, in order to reproduce a given dependence pattern, Scheme 1 could have required a weaker impact of common components and, consequently, larger entries of matrix $Q^{(1)}$ than Schemes 2 and 3.
The quality of the estimates for $\pi(\cdot)$ and $Q$ depends upon the following three factors. First, entries of $P$ can be evaluated with errors. Their magnitude depends upon the numbers of migrations in the data set. In fact, some of the standard deviations quoted above do not look negligible in comparison with the corresponding transition probabilities. Facing a sample of counts, one has to ignore this factor, taking matrix $P$ as a given input. Second, it is not guaranteed that the numerical methods, even if both of them arrive at the same result, find a global maximum point. Third, sixteen years of observation may be insufficient for achieving a good precision even if the (global) maximum points were found correctly. In fact, the (sample) likelihood function in hand is based on a finite sample of counts $I^t(s, m_1, m_2)$. Consequently, there can be a bias between the true parameter values and the numerically found maximum points.
For the above transition matrix $P$, simulations have been run for different values of $Q$ and $\pi(\cdot)$, each time 100 trials for 5 industry sectors and 16 years. At the beginning of every time instant $t$, in each credit class $i$ and in each industry sector $k$ there were $N_{k,i}(t) = 100$ debtors. In other words, new firms were added to the portfolio in order to replace defaulted ones. In every run both of the optimization algorithms were used to improve each other. For the schemes in hand, the deviation between the estimates and the true values was approximately 0.1. The bias appears to be attributable to the finite sample size rather than to an error in finding a global maximum point of a (sample) likelihood function. Carreira-Perpiñán and Renals (2000) demonstrate in a similar numerical study that the known theoretical complexity of such statistical settings does not prevent their successful application.
In the case of Scheme 3 the same distribution π(·) came out, while Q (3) reads: As compared with the case of five industries, only for Scheme 3, distributions π (3) (·) differ profoundly. In fact, a slight mismatch transforms into nearly perfect synchronicity of trends.
Matrices Q (i) estimated here give rise to a wider scope of conceptual interpretations than those in the situation with five industries. In general, the contribution of common tendencies seems to be stronger. Because the corresponding distributions π(·) coincide, matrices Q (2) and Q (3) can be compared.
Since the observed dependence pattern, as characterized by the transition counts, is identical for both schemes, intuitively the entries of $Q^{(2)}$ should not exceed their counterparts in $Q^{(3)}$. (Remember, an identical parametrization implies a stronger dependence for Scheme 3, as compared with Scheme 2, only between debtors belonging to the same credit class and the same industry sector. Then, in order to produce the observed "strength" of dependence, one would expect the corresponding elements of $Q^{(2)}$ to be smaller than their analogs in $Q^{(3)}$.) Among the 24 entries of $Q^{(3)}$, 6, or 25%, are not consistent with this intuition. Considering the two credit classes separately reveals that the situation is better in the case of non-investment grade debtors: 2 out of 12 against 4 out of 12 values, that is, 17 and 33%, respectively. However, when interpreting these values, one has to take into account that they are obtained numerically, using a procedure where the precision of the estimates cannot be guaranteed. Among the most evident factors affecting the final result are the different numbers of counts for different industries. In particular, the counts in industry sector 8 are more than 40 times those in sector 10.
Whole economy. Since Nagpal and Bahar (2001) analyze default correlations for the whole economy, in order to obtain counterparts of their estimates, $\pi(\cdot)$ and $Q$ characterizing the whole data set were found. That is, here $S = 1$ and the required counts are obtained by summing the corresponding numbers of transitions over all industries. Scheme 2: $c_{1,2} = -0.0267$.
Since the distributions corresponding to the three schemes are different, it is not possible to decide which of them fits the data set best. However, assuming that one of them is the true distribution, the likelihood ratio can be used in order to rank the remaining two according to their similarity to the true one.
For estimates $\pi^{(i)}(\cdot)$ and $Q^{(i)}$ given above set These values for five and twelve industry sectors as well as for the whole economy, separated by a slash, are given in the following table (Table 2): Intuitively, considering a true statistical model and its alternatives, the smallest likelihood ratio, or, equivalently, its logarithm, can indicate which of them is the most similar to the true one. In particular, if Scheme 1 is the correct model, Scheme 3 fits the data in hand better than Scheme 2. In the same way, considering Scheme 2 as the true model, we see that Scheme 3 would be more suitable than Scheme 1. Finally, if Scheme 3 were the correct model, Scheme 1 would be preferred to Scheme 2. This informal argument based on likelihood ratios shows that, once again, Scheme 3 takes an intermediate position between Schemes 1 and 2.

Correlations of credit events
For debtors numbered by $n$ and $r$, if the following relations hold true:
Note that Schemes 1-3 imply that, respectively:
For fixed $P$, $Q$ and $\pi(\cdot)$, these relations imply that the correlations $\rho^{k,l}_{i,j}(1)$ coincide for the three coupling schemes as long as the debtors have different credit ratings, that is, $i \ne j$. If these correlations are not equal to zero, then they will be positive or negative depending upon the sign of
If the debtors have the same credit rating, then all $\rho^{k,l}_{i,i}(1)$ are non-negative: the largest for Scheme 1 and the smallest for Scheme 2. Scheme 3 is characterized by intermediate values: if the debtors are from different industry sectors, default correlations are identical to those for Scheme 2, while if the debtors belong to the same industry sector, the correlations coincide with their counterparts for Scheme 1.
[Table header residue: column groups One year, Five years and Seven years, with entries $\rho_{1,1}(T)$, $\rho_{1,2}(T)$, $\rho_{2,2}(T)$ for $T = 1, 5, 7$.]
Turning to the case in hand, note that whenever an entry of $Q$ equals 1, all default correlations involving the corresponding debtors will be 0. Moreover, if $\pi(0, 0) = 0$, as is the case for the first coupling scheme, then
Having a triple $P$, $Q$ and $\pi(\cdot)$ and substituting these inputs in relations (4), one-year default correlations can be found, while multi-year event correlations can be estimated by a traditional statistical technique based on repeated runs of the model, in other words, Monte-Carlo simulations. Note that the actual iTraxx portfolio contains only investment grade titles.
Default correlations corresponding to the triples estimated for coupling Scheme 3 are summarized in the next three tables. The columns One year, formula contain the values obtained by formula (4), while the other correlations are estimated using averages based on 100000 independent observations of the respective random variables (Tables 3, 4, 5).
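The Monte-Carlo estimation of multi-year default correlations can be sketched for a single pair of debtors. This is an illustration under stated assumptions rather than the procedure used for the tables: a fresh tendency vector is drawn every year, a debtor follows its idiosyncratic component with probability q and the common one otherwise, and the credit event of interest is "defaulted by the end of the horizon". All function names and numerical inputs are hypothetical, and ratings and sectors are 0-based with index M marking default.

```python
import math
import random

def conditional_row(P, i, chi_i):
    """Law of the common component for credit class i (0-based) given tendency chi_i."""
    row = P[i]
    allowed = [(j <= i) if chi_i == 1 else (j > i) for j in range(len(row))]
    mass = sum(p for p, ok in zip(row, allowed) if ok)
    return [p / mass if ok else 0.0 for p, ok in zip(row, allowed)]

def simulate_pair(start, sectors, P, Q, pi, scheme, horizon, rng):
    """Default indicators of two debtors after `horizon` annual transitions."""
    M = len(P)
    outcomes = list(pi)
    weights = [pi[c] for c in outcomes]
    ratings = list(start)
    for _ in range(horizon):
        chi = rng.choices(outcomes, weights=weights)[0]  # fresh tendencies each year
        shared = {}
        for n in range(2):
            x, s = ratings[n], sectors[n]
            if x == M:  # default is absorbing
                continue
            if rng.random() < Q[x][s]:  # idiosyncratic move
                ratings[n] = rng.choices(range(M + 1), weights=P[x])[0]
                continue
            # Scheme 1 shares per class, Scheme 2 per debtor, Scheme 3 per (class, sector).
            key = {1: (x,), 2: (n,), 3: (x, s)}[scheme]
            if key not in shared:
                law = conditional_row(P, x, chi[x])
                shared[key] = rng.choices(range(M + 1), weights=law)[0]
            ratings[n] = shared[key]
    return [int(r == M) for r in ratings]

def sample_correlation(xs, ys):
    """Plain sample correlation; returns NaN if one of the samples is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")

def default_correlation(start, sectors, P, Q, pi, scheme, horizon, runs, seed=0):
    """Monte-Carlo estimate of the correlation between two default indicators."""
    rng = random.Random(seed)
    pairs = [simulate_pair(start, sectors, P, Q, pi, scheme, horizon, rng)
             for _ in range(runs)]
    return sample_correlation([a for a, _ in pairs], [b for _, b in pairs])

# Illustrative inputs only; these are not the estimates reported in the paper.
P = [[0.97, 0.025, 0.005], [0.03, 0.90, 0.07]]
Q = [[0.8, 0.8], [0.5, 0.5]]
pi = {(1, 1): 0.9021, (1, 0): 0.0679, (0, 1): 0.0279, (0, 0): 0.0021}
print(default_correlation((1, 1), (0, 0), P, Q, pi, scheme=3, horizon=5, runs=2000))
```

Because ratings change over time, the set of debtors sharing a common component is re-determined every year from the current ratings, which is why multi-year correlations are easier to simulate than to express in closed form.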
The tables demonstrate that the analytically evaluated default correlations follow their sample counterparts very well, as long as these values are not too small. In particular, this is the case for non-investment grade debtors. The poor match for the correlations close to 0 is caused by the multiplier $1 - q_{i,k}$. It makes the correlations evaluated according to formula (4) equal to 0 if $q_{i,k}$ is sufficiently close to 1.
While for the scheme of Wozabal and Hochreiter (2012) (Kaniovski and Pflug (2007)) the variances and the correlations were the smallest (largest), the scheme suggested in this paper takes an intermediate position. Using real data concerning OECD countries, parameters of the models were estimated by standard optimization methods available in MATLAB for two portfolios. One of them mimics the Dow Jones iTraxx EUR market index. The other one, covering the same industry sectors as the study of Nagpal and Bahar (2001), who analyzed American firms, allows a quantitative comparison of the corresponding dependence patterns in these two economic environments. A bootstrap procedure was suggested in order to estimate correlations of credit events. The corresponding Monte-Carlo estimates match their counterparts obtained analytically.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix
On the other hand, each of the $I^t(s, m_1, i)$ debtors in industry sector $s$ migrating to rating $i$ is driven either by the common or by the idiosyncratic component. The corresponding events occur with probabilities $1 - q_{m_1,s}$ or $q_{m_1,s}\, p_{m_1,i}$, respectively. Since these migrations are independent in $s$, all transitions from $m_1$ to $i$ occur with probability

$$\prod_{s=1}^{S} (q_{m_1,s}\, p_{m_1,i} + 1 - q_{m_1,s})^{I^t(s, m_1, i)} .$$

By these two observations, the probability of all transitions starting at $m_1$ reads

$$\sum_{i=1}^{M+1} p_{m_1,i}(\chi_{m_1}) \prod_{s=1}^{S} \Bigl[ (q_{m_1,s}\, p_{m_1,i} + 1 - q_{m_1,s})^{I^t(s, m_1, i)} \prod_{j \ne i} (q_{m_1,s}\, p_{m_1,j})^{I^t(s, m_1, j)} \Bigr] .$$

Since, given a realization $\vec{\chi}$, common components are independent in $m_1$, the corresponding terms have to be multiplied. Then the whole evolution at time $t$ takes place with probability

$$\sum_{\vec{\chi} \in \{0,1\}^M} \pi(\vec{\chi}) \prod_{m_1=1}^{M} \sum_{i=1}^{M+1} p_{m_1,i}(\chi_{m_1}) \prod_{s=1}^{S} \Bigl[ (q_{m_1,s}\, p_{m_1,i} + 1 - q_{m_1,s})^{I^t(s, m_1, i)} \prod_{j \ne i} (q_{m_1,s}\, p_{m_1,j})^{I^t(s, m_1, j)} \Bigr] .$$

In the case of Scheme 3, common components are independent in $s$ and $i$. Therefore products over all industry sectors and over all non-default credit classes arise. For industry $s$, the sum in $m_2$ corresponds to the mutually exclusive events $A_{m_2}$ = {the respective common component assumes the value $m_2$}. Conditional on $A_{m_2}$, credit rating $m_2$ is reachable either, with probability $q_{m_1,s}\, p_{m_1,m_2}$, through an idiosyncratic move, or, with probability $1 - q_{m_1,s}$, through the common component. All other credit ratings, $j \ne m_2$, are reachable in this case only by idiosyncratic moves and the corresponding probabilities are $(q_{m_1,s}\, p_{m_1,j})^{I^t(s, m_1, j)}$. By independence, multiplying the respective probabilities, the following term is obtained:

$$\sum_{\vec{\chi} \in \{0,1\}^M} \pi(\vec{\chi}) \prod_{s=1}^{S} \prod_{m_1=1}^{M} \sum_{m_2=1}^{M+1} p_{m_1,m_2}(\chi_{m_1})\, (q_{m_1,s}\, p_{m_1,m_2} + 1 - q_{m_1,s})^{I^t(s, m_1, m_2)} \prod_{j \ne m_2} (q_{m_1,s}\, p_{m_1,j})^{I^t(s, m_1, j)} .$$