
Closed-form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modeling


Generalized linear models with categorical explanatory variables are considered, and their parameters are estimated by an exact maximum likelihood method. The existence of a sequence of maximum likelihood estimators is discussed, and possible choices of link function are examined. The paper then focuses on two particular positive distributions: the Pareto 1 distribution and the shifted log-normal distribution. Finally, the approach is illustrated on an actuarial dataset to model insurance losses.


Fig. 1
Fig. 2
Fig. 3


  • Albert A, Anderson JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71(1):1–10

  • Beirlant J, Goegebeur Y (2003) Regression with response distributions of Pareto-type. Comput Stat Data Anal 42(4):595–619

  • Beirlant J, Goegebeur Y, Verlaak R, Vynckier P (1998) Burr regression and portfolio segmentation. Insur Math Econ 23(3):231–250

  • Beirlant J, Goegebeur Y, Teugels J, Segers J (2004) Statistics of extremes: theory and applications. Wiley, Hoboken

  • Bühlmann H, Gisler A (2006) A course in credibility theory and its applications. Springer, Berlin

  • Chavez-Demoulin V, Embrechts P, Hofert M (2015) An extreme value approach for modeling operational risk losses depending on covariates. J Risk Insur 83(3):735–776

  • Davison A, Smith R (1990) Models for exceedances over high thresholds. J R Stat Soc Ser B 52(3):393–442

  • Fahrmeir L, Kaufmann H (1985) Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann Stat 13(1):342–368

  • Fienberg SE (2007) The analysis of cross-classified categorical data, 2nd edn. Springer, Berlin

  • Goldburd M, Khare A, Tevet D (2016) Generalized linear models for insurance rating. CAS monograph series number 5. Casualty Actuarial Society, Arlington

  • Haberman SJ (1974) Log-linear models for frequency tables with ordered classifications. Biometrics 30(4):589–600

  • Hambuckers J, Heuchenne C, Lopez O (2016) A semiparametric model for generalized Pareto regression based on a dimension reduction assumption. HAL

  • Hogg RV, Klugman SA (1984) Loss distributions. Wiley, Hoboken

  • Johnson N, Kotz S, Balakrishnan N (2000) Continuous univariate distributions, vol 1, 2nd edn. Wiley, Hoboken

  • Lehmann EL, Casella G (1998) Theory of point estimation, 2nd edn. Springer, Berlin

  • Lipovetsky S (2015) Analytical closed-form solution for binary logit regression by categorical predictors. J Appl Stat 42(1):37–49

  • McCullagh P, Nelder JA (1989) Generalized linear models, vol 37. CRC Press, Boca Raton

  • Nelder JA, Wedderburn RWM (1972) Generalized linear models. J R Stat Soc Ser A 135(3):370–384

  • Ohlsson E, Johansson B (2010) Non-life insurance pricing with generalized linear models. Springer, Berlin

  • Olver FWJ, Lozier DW, Boisvert RF, Clark CW (eds) (2010) NIST handbook of mathematical functions. Cambridge University Press, Cambridge

  • Ozkok E, Streftaris G, Waters HR, Wilkie AD (2012) Bayesian modelling of the time delay between diagnosis and settlement for critical illness insurance using a Burr generalised-linear-type model. Insur Math Econ 50(2):266–279

  • R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

  • Reiss R, Thomas M (2007) Statistical analysis of extreme values, 3rd edn. Birkhäuser, Basel

  • Rigby R, Stasinopoulos D (2005) Generalized additive models for location, scale and shape. Appl Stat 54(3):507–554

  • Silvapulle MJ (1981) On the existence of maximum likelihood estimators for the binomial response models. J R Stat Soc Ser B (Methodological) 43(3):310–313

  • Smyth GK, Verbyla AP (1999) Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10(6):696–709

  • Venables W, Ripley B (2002) Modern applied statistics with S. Springer, Berlin



This research also benefited from the support of the ‘Chair Risques Emergents ou atypiques en Assurance’, under the aegis of Fondation du Risque, a joint initiative of Le Mans University, Ecole Polytechnique and the MMA company, a member of the Covea group. The authors thank Vanessa Desert for her active support during the writing of this paper. The authors are also very grateful for the useful suggestions of the two referees. This work is supported by the research project “PANORisk” and Région Pays de la Loire (France).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Christophe Dutang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Proofs of Sect. 3

Proof for the one-variable case

Proof of Theorem 3.1

We have to solve the system

$$\begin{aligned} \left\{ \begin{array}{ll}S(\varvec{\vartheta }) = 0\\ \varvec{R}\varvec{\vartheta }=0. \end{array}\right. \end{aligned}$$

The system \(S(\varvec{\vartheta })=0\) is

$$\begin{aligned} \left\{ \begin{array}{ll}\displaystyle \sum _{i=1}^n\ell '(\eta _i)\left( y_i - b'\circ \ell (\eta _i)\right) = 0\\ \displaystyle \sum _{i=1}^nx_i^{(2),j}\ell '(\eta _i)\left( y_i - b'\circ \ell (\eta _i)\right) = 0,\quad \forall j\in J. \end{array}\right. \end{aligned}$$

that is

$$\begin{aligned} \left\{ \begin{array}{ll}\displaystyle \sum _{j\in J}\ell '(\vartheta _{(1)}+\vartheta _{(2),j})\left( \sum _{i=1}^nx_i^{(2),j}y_i - m_jb'\circ \ell (\vartheta _{(1)} + \vartheta _{(2),j})\right) = 0\\ \ell '(\vartheta _{(1)}+\vartheta _{(2),j})\left( \displaystyle \sum _{i=1}^nx_i^{(2),j}y_i - m_jb'\circ \ell (\vartheta _{(1)} + \vartheta _{(2),j})\right) = 0,\quad \forall j\in J. \end{array}\right. \end{aligned}$$

The first equation in the previous system is redundant (it is the sum of the other equations over \(j\in J\)), and

$$\begin{aligned} S(\varvec{\vartheta }) = 0 \Leftrightarrow \ell '(\vartheta _{(1)}+\vartheta _{(2),j})\left( \sum _{i=1}^nx_i^{(2),j}y_i - m_jb'\circ \ell (\vartheta _{(1)} + \vartheta _{(2),j})\right) = 0,\quad \forall j\in J. \end{aligned}$$

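Hence, if \(Y_i\) takes values in \(\mathbb {Y}\subset b'(\varLambda )\) and \(\ell \) is injective, writing \(\overline{Y}_n^{(j)} = \frac{1}{m_j}\sum _{i=1}^n x_i^{(2),j}Y_i\) for the empirical mean of the \(m_j\) observations of modality \(j\) and \(g = \ell ^{-1}\circ (b')^{-1}\) for the link function, we have

$$\begin{aligned} \vartheta _{(1)}+\vartheta _{(2),j} = g(\overline{Y}_n^{(j)})\quad \forall j\in J. \end{aligned}$$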

The system (23) is

$$\begin{aligned} \left\{ \begin{array}{ll}\varvec{Q}\varvec{\vartheta }= \varvec{g({\bar{Y}})}\\ \varvec{R}\varvec{\vartheta }=0. \end{array}\right. \Leftrightarrow \left( \begin{array}{c} \varvec{Q} \\ \varvec{R}\end{array}\right) \varvec{\vartheta }=\left( \begin{array}{c}\varvec{g({\bar{Y}})}\\ 0\end{array}\right) . \end{aligned}$$

Let us compute the determinant of the matrix \(M_d = \left( \begin{array}{c} \varvec{Q} \\ \varvec{R}\end{array}\right) \). Consider \(\varvec{R} = (r_0,r_1,\ldots ,r_d)\). We have

$$\begin{aligned} M_d = \left( \begin{array}{c@{\quad }c} \varvec{1}_d &{} I_d \\ r_0 &{} \varvec{r} \end{array}\right) = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 1 &{} 0 &{} \dots \\ 1 &{} 0 &{} 1 &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \ddots &{} \ddots \\ 1 &{} 0 &{} \dots &{} 0 &{} 1 \\ r_0 &{} r_1 &{} &{} \dots &{} r_d \\ \end{array}\right) , \text { with } \varvec{r}= \left( \begin{array}{c@{\quad }cc} r_1 &{} \dots &{} r_d \\ \end{array}\right) , \varvec{1}_d = \left( \begin{array}{c} 1 \\ \vdots \\ 1 \end{array}\right) . \end{aligned}$$

Expanding along the last column gives the recursion

$$\begin{aligned} \det (M_d) = r_d \left| \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 1 &{} 0 &{} \dots \\ 1 &{} 0 &{} \ddots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} 1 \\ 1 &{} 0 &{} \dots &{} 0 \\ \end{array}\right| - \left| \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 1 &{} 0 &{} \dots \\ 1 &{} 0 &{} \ddots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} 1 \\ r_0 &{} r_1 &{} \dots &{} r_{d-1} \\ \end{array}\right| = (-1)^{d+1} r_d - \det (M_{d-1}). \end{aligned}$$

Since \( \det (M_1) = -r_0+ r_1 \) and \( \det (M_2) = -r_2 -(-r_0 +r_1) = r_0 - r_1 -r _2, \) induction gives \(\det (M_d) = (-1)^d r_0+ (-1)^{d+1}(r_1+\dots + r_d) =(-1)^d( r_0 - r_1-\dots -r_d)\). This determinant is nonzero as long as \(r_0 \ne \sum _{j=1}^d r_j\).
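The determinant formula can be checked numerically. The following Python sketch (our illustration, not the authors' code) builds \(M_d\) with exact rational arithmetic from the standard-library `fractions` module, computes its determinant by Gaussian elimination and compares it with \((-1)^d(r_0 - r_1 - \dots - r_d)\). The helper names `build_m`, `det` and `det_closed_form` are hypothetical.

```python
from fractions import Fraction


def build_m(r0, r):
    """Return M_d = [[1_d, I_d], [r0, r]] as a list of rows of Fractions."""
    d = len(r)
    rows = [[Fraction(1)] + [Fraction(int(k == j)) for k in range(d)] for j in range(d)]
    rows.append([Fraction(r0)] + [Fraction(x) for x in r])
    return rows


def det(a):
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [row[:] for row in a]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for k in range(col, n):
                a[i][k] -= factor * a[col][k]
    return sign * result


def det_closed_form(r0, r):
    """(-1)^d (r0 - r1 - ... - rd), the formula derived above."""
    return (-1) ** len(r) * (Fraction(r0) - sum(Fraction(x) for x in r))


for r0, r in [(1, [0, 0]), (0, [1, 0, 0]), (0, [1, 1, 1]), (3, [2, -1, 5, 0])]:
    print(r0, r, det(build_m(r0, r)), det_closed_form(r0, r))
```

Exact fractions avoid any floating-point tolerance in the comparison; the case \(r_0 = \sum_j r_j\) returns a zero determinant, as expected.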

We now compute the inverse of \(M_d\) directly, by solving \(M_d M_d^{-1} = I_{d+1}\) blockwise:

$$\begin{aligned} \left( \begin{array}{c@{\quad }c} \varvec{1}_d &{} I_d \\ r_0 &{} \varvec{r} \end{array}\right) \left( \begin{array}{c@{\quad }c} \varvec{a}' &{} b \\ C &{} \varvec{d} \end{array}\right) = \left( \begin{array}{c@{\quad }c} I_d &{} \varvec{0} \\ \varvec{0}' &{} 1 \end{array}\right) \Leftrightarrow \left\{ \begin{array}{ll}\varvec{1}_d \varvec{a}' + I_d C = I_d \\ b \varvec{1}_d + I_d \varvec{d} = \varvec{0} \\ r_0 \varvec{a}' + \varvec{r} C = \varvec{0}' \\ b r_0 + \varvec{r} \varvec{d} = 1 \end{array}\right. \Leftrightarrow \left\{ \begin{array}{ll}C = I_d - \frac{1}{-r_0 + \varvec{r} \varvec{1}_d}\varvec{1}_d\varvec{r} \\ \varvec{d}= \frac{1}{-r_0+\varvec{r} \varvec{1}_d} \varvec{1}_d \\ \varvec{a}' = \frac{\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} \\ b = \frac{-1}{-r_0+\varvec{r} \varvec{1}_d} \\ \end{array}\right. \end{aligned}$$

We check that this matrix is indeed the inverse of \(M_d\):

$$\begin{aligned}&\left( \begin{array}{c@{\quad }c} \varvec{1}_d &{} I_d \\ r_0 &{} \varvec{r} \end{array}\right) \left( \begin{array}{c@{\quad }c} \frac{\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} &{} \frac{-1}{-r_0+\varvec{r} \varvec{1}_d} \\ I_d - \frac{\varvec{1}_d\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} &{} \frac{\varvec{1}_d}{-r_0+\varvec{r} \varvec{1}_d} \end{array}\right) \\&\qquad = \left( \begin{array}{c@{\quad }c} \frac{\varvec{1}_d \varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} + I_d - \frac{\varvec{1}_d\varvec{r} }{-r_0 + \varvec{r} \varvec{1}_d} &{} \frac{-\varvec{1}_d}{-r_0+\varvec{r} \varvec{1}_d} +\frac{\varvec{1}_d}{-r_0+\varvec{r} \varvec{1}_d}\\ r_0 \frac{\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d}+ \varvec{r} - \frac{\varvec{r}\varvec{1}_d\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} &{} \frac{-r_0}{-r_0+\varvec{r} \varvec{1}_d}+ \frac{\varvec{r}\varvec{1}_d}{-r_0+\varvec{r} \varvec{1}_d} \end{array}\right) \\&\qquad = \left( \begin{array}{c@{\quad }c} I_d &{} 0 \\ 0 &{} 1 \end{array}\right) . \end{aligned}$$
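The same check can be carried out with exact arithmetic. This sketch (hypothetical helper names, standard-library Python only) assembles the blockwise inverse \(\big(\begin{smallmatrix}\varvec{a}' & b\\ C & \varvec{d}\end{smallmatrix}\big)\) derived above and verifies \(M_d M_d^{-1} = I_{d+1}\) for a few admissible contrasts with \(r_0 \ne \sum_j r_j\):

```python
from fractions import Fraction


def build_m(r0, r):
    """Return M_d = [[1_d, I_d], [r0, r]] with exact Fraction entries."""
    d = len(r)
    rows = [[Fraction(1)] + [Fraction(int(k == j)) for k in range(d)] for j in range(d)]
    rows.append([Fraction(r0)] + [Fraction(x) for x in r])
    return rows


def inverse_closed_form(r0, r):
    """Blockwise inverse [[a', b], [C, d]] from the text; requires r0 != sum(r)."""
    r = [Fraction(x) for x in r]
    s = -Fraction(r0) + sum(r)  # the scalar -r0 + r 1_d
    if s == 0:
        raise ValueError("M_d is singular when r0 == sum(r)")
    d = len(r)
    rows = [[x / s for x in r] + [Fraction(-1) / s]]  # first row: (a', b)
    for j in range(d):
        # row j of (C, d): C = I_d - 1_d r / s and d = 1_d / s
        rows.append([Fraction(int(k == j)) - r[k] / s for k in range(d)] + [1 / s])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


identity = [[Fraction(int(i == j)) for j in range(4)] for i in range(4)]
print(matmul(build_m(3, [2, -1, 5]), inverse_closed_form(3, [2, -1, 5])) == identity)
```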

Hence, as long as \(r_0 \ne \sum _{j=1}^d r_j\),

$$\begin{aligned} \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c@{\quad }c} \frac{\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} &{} \frac{-1}{-r_0+\varvec{r} \varvec{1}_d} \\ I_d - \frac{\varvec{1}_d\varvec{r}}{-r_0 + \varvec{r} \varvec{1}_d} &{} \frac{\varvec{1}_d}{-r_0+\varvec{r} \varvec{1}_d} \end{array}\right) \left( \begin{array}{c} \varvec{g}({\varvec{{\bar{Y}}}})\\ 0 \end{array}\right) = \left( \begin{array}{c} \frac{\varvec{r} \varvec{g({\bar{Y}})}}{-r_0 + \varvec{r} \varvec{1}_d} \\ {\varvec{g({\bar{Y}})}} - \varvec{1}_d\frac{\varvec{r} \varvec{g({\bar{Y}})}}{-r_0 + \varvec{r} \varvec{1}_d} \end{array}\right) . \end{aligned}$$

Alternatively, premultiplying by \((\varvec{Q}',\varvec{R}')\), the system (24) is equivalent to

$$\begin{aligned} (\varvec{Q}',\varvec{R}')\left( \begin{array}{c} \varvec{Q} \\ \varvec{R}\end{array}\right) \varvec{\vartheta }= \varvec{Q}' \varvec{g({\bar{Y}})}, \end{aligned}$$

and when \(\left( \begin{array}{c} \varvec{Q} \\ \varvec{R}\end{array}\right) \) has full column rank, the matrix \(\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R}\) is invertible and \( {\varvec{\vartheta }} = (\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'\varvec{g({\bar{Y}})}. \)\(\square \)
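To illustrate that the two expressions for \(\widehat{\varvec{\vartheta }}_n\) agree, the sketch below solves the normal equations \((\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})\varvec{\vartheta } = \varvec{Q}'\varvec{g({\bar{Y}})}\) by exact Gauss-Jordan elimination and compares the result with the explicit formula. The transformed group means `g_bar` are arbitrary illustrative values, not data from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be invertible)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for i in range(n):
            if i != col and m[i][col] != 0:
                f = m[i][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[col])]
    return [row[n] for row in m]


def theta_normal_equations(g_bar, r0, r):
    """(Q'Q + R'R)^{-1} Q' g for Q = [1_d, I_d] and R = (r0, r)."""
    d, p = len(g_bar), len(g_bar) + 1
    q = [[Fraction(1)] + [Fraction(int(k == j)) for k in range(d)] for j in range(d)]
    rr = [Fraction(r0)] + [Fraction(x) for x in r]
    lhs = [[sum(q[i][u] * q[i][v] for i in range(d)) + rr[u] * rr[v] for v in range(p)]
           for u in range(p)]
    rhs = [sum(q[i][u] * g_bar[i] for i in range(d)) for u in range(p)]
    return solve(lhs, rhs)


def theta_closed_form(g_bar, r0, r):
    """Explicit solution: intercept r g / (-r0 + r 1_d), effects g - intercept."""
    s = -Fraction(r0) + sum(Fraction(x) for x in r)
    intercept = sum(Fraction(x) * gj for x, gj in zip(r, g_bar)) / s
    return [intercept] + [gj - intercept for gj in g_bar]


g_bar = [Fraction(3, 2), Fraction(-1, 4), Fraction(2)]  # illustrative g(Ybar_j) values
print(theta_normal_equations(g_bar, 0, [1, 1, 1]))
print(theta_closed_form(g_bar, 0, [1, 1, 1]))
```

Because the stacked matrix is square and invertible when \(r_0 \ne \sum_j r_j\), both routes return the unique solution.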

Examples—Choice of the contrast vector \(\varvec{R}\)

  1.

    Taking \(r_0=1, \varvec{r}=\varvec{0}\) leads to \( -r_0 + \varvec{r} \varvec{1}_d=-1 \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} 0 \\ \varvec{g({\bar{Y}})} \end{array}\right) . \)

  2.

    Taking \(r_0=0, \varvec{r}=(1,\varvec{0})\) leads to

    $$\begin{aligned} -r_0 + \varvec{r} \varvec{1}_d=1 \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} g({\bar{Y}}_n^{(1)})\\ 0\\ g({\bar{Y}}_n^{(2)}) - g({\bar{Y}}_n^{(1)})\\ \vdots \\ g({\bar{Y}}_n^{(d)}) - g({\bar{Y}}_n^{(1)}) \end{array}\right) . \end{aligned}$$
  3.

    Taking \(r_0=0, \varvec{r}=\varvec{1}\) leads to

    $$\begin{aligned} -r_0 + \varvec{r} \varvec{1}_d=d \Rightarrow \widehat{\varvec{\vartheta }}_n = \left( \begin{array}{c} \overline{\varvec{g({\bar{Y}})}}\\ g({\bar{Y}}_n^{(1)}) - \overline{\varvec{g({\bar{Y}})}}\\ \vdots \\ g({\bar{Y}}_n^{(d)}) - \overline{\varvec{g({\bar{Y}})}} \end{array}\right) , \text { with } \overline{\varvec{g({\bar{Y}})}} = \dfrac{1}{d}\displaystyle \sum _{j=1}^dg(\overline{Y}_n^{(j)}). \end{aligned}$$
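The three contrast choices can be illustrated with a short Python sketch. It assumes a log link \(g=\log \) and three hypothetical group means; whichever contrast is used, the fitted linear predictor \(\vartheta _{(1)}+\vartheta _{(2),j}\) equals \(g(\bar{y}_n^{(j)})\), and only the split between intercept and modality effects changes.

```python
import math


def theta_closed_form(g_bar, r0, r):
    """Explicit MLE for one categorical variable under the contrast R = (r0, r)."""
    s = -r0 + sum(r)
    if s == 0:
        raise ValueError("inadmissible contrast: r0 equals sum(r)")
    intercept = sum(rj * gj for rj, gj in zip(r, g_bar)) / s
    return [intercept] + [gj - intercept for gj in g_bar]


y_bar = [2.0, 5.0, 10.0]               # hypothetical group means
g_bar = [math.log(v) for v in y_bar]   # log link, chosen for illustration
d = len(g_bar)

no_intercept = theta_closed_form(g_bar, 1, [0] * d)          # example 1
reference = theta_closed_form(g_bar, 0, [1] + [0] * (d - 1))  # example 2
sum_to_zero = theta_closed_form(g_bar, 0, [1] * d)            # example 3

for theta in (no_intercept, reference, sum_to_zero):
    # fitted means exp(theta_1 + theta_(2),j) recover the group means
    print([round(math.exp(theta[0] + t), 10) for t in theta[1:]])
```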

Proof of Remark 3.4

We have to solve the system

$$\begin{aligned} S(\vartheta ) = 0 \Leftrightarrow \displaystyle \sum _{i=1}^n\ell '(\eta )\left( y_i - b'\circ \ell (\eta )\right) = 0. \end{aligned}$$

If \(\ell \) is injective, the equation simplifies, since \(b'\circ \ell = g^{-1}\), to

$$\begin{aligned} \displaystyle \sum _{i=1}^n y_i - n\, b'\circ \ell (\eta ) = \sum _{i=1}^n y_i - n\, g^{-1}(\eta ) = 0 \Leftrightarrow \eta = g({\overline{y}}_n) \Leftrightarrow \vartheta = g({\overline{y}}_n). \end{aligned}$$

\(\square \)
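As a sanity check of Remark 3.4, consider a Poisson response (\(b(\lambda )=e^\lambda \), \(a(\phi )=1\)) with a hypothetical sample. For the canonical log link, \(\ell (\eta )=\eta \) and \(g=\log \); for a square-root link \(g(\mu )=\sqrt{\mu }\), we have \(\ell (\eta )=\log (\eta ^2)\) and \(\ell '(\eta )=2/\eta \) (for \(\eta >0\)). In both cases the score vanishes at \(\eta = g({\overline{y}}_n)\):

```python
import math

y = [0, 3, 1, 4, 2, 2, 5, 1]  # hypothetical counts
y_bar = sum(y) / len(y)


def score(eta, ell_prime, mean_fn):
    """Intercept-only score: sum_i l'(eta) * (y_i - b'(l(eta))), with b' o l = g^{-1}."""
    return sum(ell_prime(eta) * (yi - mean_fn(eta)) for yi in y)


# canonical log link: l(eta) = eta, g = log, g^{-1} = exp
eta_log = math.log(y_bar)
score_log = score(eta_log, lambda e: 1.0, math.exp)

# square-root link: g = sqrt, g^{-1}(eta) = eta**2, l(eta) = log(eta**2)
eta_sqrt = math.sqrt(y_bar)
score_sqrt = score(eta_sqrt, lambda e: 2.0 / e, lambda e: e ** 2)

print(eta_log, score_log, eta_sqrt, score_sqrt)
```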

Proof of Remark 3.5

Let \(Y_i\) belong to the exponential family \(F_{exp}(a,b,c,\lambda ,\phi )\). It is well known that the moment generating function of \(Y_i\) is

$$\begin{aligned} \mathbf {E}e^{t Y_i} =\exp \left( \frac{b(\lambda +ta(\phi )) - b(\lambda )}{a(\phi )}\right) . \end{aligned}$$

Hence, the moment generating function of the average \({\overline{Y}}_m\) of \(m\) i.i.d. copies of \(Y_i\) is

$$\begin{aligned} M_{{\overline{Y}}_m}(t) = \left( \exp \left( \frac{b(\lambda +\frac{t}{m} a(\phi )) - b(\lambda )}{a(\phi )}\right) \right) ^m = \exp \left( \frac{b(\lambda +t a(\phi )/m) - b(\lambda )}{a(\phi )/m}\right) . \end{aligned}$$

We thus recover the known result that \({\overline{Y}}_m\) belongs to the exponential family \(F_{exp}(x\mapsto a(x)/m,b,c,\lambda ,\phi )\) (see e.g. McCullagh and Nelder 1989).

In our setting, the random variables in the average \(\overline{Y}_n^{(j)}\) are i.i.d. with functions \(a, b, c\) and parameters \(\lambda =\ell (\vartheta _{(1)}+\vartheta _{(j)})\) and \(\phi \). Hence \({\overline{Y}}_n^{(j)}\) also belongs to the exponential family with the same parameters but with the function \({\bar{a}}:x\mapsto a(x)/m_j\). In particular,

$$\begin{aligned} \mathbf {E}{\overline{Y}}_n^{(j)} = b'(\ell (\vartheta _{(1)}+\vartheta _{(j)})) = g^{-1}(\vartheta _{(1)}+\vartheta _{(j)}),~ \text{ Var }{\overline{Y}}_n^{(j)} = \frac{a(\phi )}{m_j} b''(\ell (\vartheta _{(1)}+\vartheta _{(j)})). \end{aligned}$$

However, \(\mathbf {E}g({\overline{Y}}_n^{(j)})\) remains difficult to compute unless g is linear. By the strong law of large numbers and the continuity of g, the estimator is consistent as \(m_j\rightarrow +\,\infty \), since

$$\begin{aligned} {\overline{Y}}_n^{(j)}{\mathop {\underset{m_j\rightarrow +\infty }{\longrightarrow }}\limits ^{\text {a.s.}}} g^{-1}(\vartheta _{(1)}+\vartheta _{(j)}) \Rightarrow g({\overline{Y}}_n^{(j)}){\mathop {\underset{m_j\rightarrow +\infty }{\longrightarrow }}\limits ^{\text {a.s.}}} g(g^{-1}(\vartheta _{(1)}+\vartheta _{(j)}))=\vartheta _{(1)}+\vartheta _{(j)}. \end{aligned}$$

By the central limit theorem, \(\sqrt{m_j}\big ({\overline{Y}}_n^{(j)} - g^{-1}(\vartheta _{(1)}+\vartheta _{(j)})\big )\) converges in distribution to \({\mathcal {N}}\big (0, a(\phi )b''(\ell (\vartheta _{(1)}+\vartheta _{(j)}))\big )\), and the delta method applied to the differentiable function g yields

$$\begin{aligned}&\sqrt{m_j}\left( g({\overline{Y}}_n^{(j)}) - (\vartheta _{(1)}+\vartheta _{(j)})\right) {\mathop {\underset{m_j\rightarrow +\infty }{\longrightarrow }}\limits ^{\mathcal {L}}} \\&\quad {\mathcal {N}}\left( 0, a(\phi )b''(\ell (\vartheta _{(1)}+\vartheta _{(j)})) g'(g^{-1}(\vartheta _{(1)}+\vartheta _{(j)}))^2 \right) . \end{aligned}$$

\(\square \)
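The asymptotic variance can be checked by simulation. In this sketch (our illustration, with arbitrary settings) one modality contains \(m_j = 100\) Poisson observations with mean \(\mu =4\) and the log link is used, so that \(a(\phi )b''(\lambda )g'(\mu )^2 = \mu \cdot \mu ^{-2} = 1/\mu \). To keep the sketch dependency-free, Poisson variates are drawn with Knuth's multiplication method; the empirical variance of \(\sqrt{m_j}(g({\overline{Y}}_n^{(j)}) - \log \mu )\) should be close to \(1/4\).

```python
import math
import random


def poisson_draw(mu, rng):
    """Knuth's multiplication method; adequate for small mu."""
    threshold = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(mu=4.0, m=100, reps=1000, seed=1):
    """Empirical variance of sqrt(m) * (log(Ybar) - log(mu)) over `reps` samples."""
    rng = random.Random(seed)
    eta = math.log(mu)  # true vartheta_(1) + vartheta_(j) under the log link
    stats = []
    for _ in range(reps):
        y_bar = sum(poisson_draw(mu, rng) for _ in range(m)) / m
        stats.append(math.sqrt(m) * (math.log(y_bar) - eta))
    mean = sum(stats) / reps
    return sum((s - mean) ** 2 for s in stats) / (reps - 1)


empirical = simulate()
theoretical = 1 / 4.0  # a(phi) b''(lambda) g'(mu)^2 = mu * (1 / mu)^2
print(empirical, theoretical)
```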

Proof of Corollary 3.1

The log-likelihood evaluated at \(\widehat{\varvec{\vartheta }}_n\) is

$$\begin{aligned} \log L(\widehat{\varvec{\vartheta }}_n\,|\,\underline{\varvec{y}}) = \frac{1}{a(\phi )}\sum _{i=1}^n \left( y_i \ell ({\widehat{\eta }}_i) - b(\ell ({\widehat{\eta }}_i))\right) + \sum _{i=1}^nc(y_i,\phi ). \end{aligned}$$

We first verify that \(\ell ({\widehat{\eta }}_i)\) does not depend on the function g. For \(\widehat{\varvec{\vartheta }}_n\) defined by (8), we have \(\varvec{Q}\widehat{\varvec{\vartheta }}_n = \varvec{g({\bar{y}})}\), since \(\widehat{\varvec{\vartheta }}_n\) solves the system (23), i.e. \(\varvec{Q}(\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'=I\). Using \({\widehat{\eta }}_i= (\varvec{Q}\widehat{\varvec{\vartheta }}_n)_j\) for i such that \(x_i^{(2),j}=1\), we obtain

$$\begin{aligned} \ell ({\widehat{\eta }}_i)= \displaystyle \sum _{j=1}^d\ell \circ g(\bar{y}_n^{(j)})x_i^{(2),j} = \displaystyle \sum _{j=1}^d\ell \circ \ell ^{-1}\circ (b')^{-1}({\bar{y}}_n^{(j)})x_i^{(2),j} = \displaystyle \sum _{j=1}^d (b')^{-1}({\bar{y}}_n^{(j)})x_i^{(2),j}, \end{aligned}$$

and therefore

$$\begin{aligned} \log L(\widehat{\varvec{\vartheta }}_n\,|\,\underline{\varvec{y}}) = \frac{1}{a(\phi )}\sum _{j=1}^d\sum _{i, x_i^{(2),j}=1} \left( y_i (b')^{-1}\left( {\overline{y}}_n^{(j)}\right) - b\left( \left( b'\right) ^{-1}\left( {\overline{y}}_n^{(j)}\right) \right) \right) + \sum _{i=1}^nc(y_i,\phi ). \end{aligned}$$

In the same way,

$$\begin{aligned} \widehat{\mathbf {E}Y_i}= & {} b'(\ell ({\widehat{\eta }}_i)) = \sum _{j=1}^d \bar{y}_n^{(j)}x_i^{(2),j}, \quad \widehat{\text{ Var }Y_i} = a(\phi )b''(\ell ({\widehat{\eta }}_i)) \\= & {} a(\phi )\sum _{j=1}^d b''\circ (b')^{-1}({\bar{y}}_n^{(j)})x_i^{(2),j}. \end{aligned}$$

\(\square \)
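The invariance of the maximized log-likelihood with respect to g can be illustrated numerically. The sketch below assumes a Poisson response (\(b = \exp \), \(a(\phi )=1\), \(c(y,\phi ) = -\log y!\)) on two hypothetical groups and three links; for each link it sets \({\widehat{\eta }} = g({\bar{y}}_n^{(j)})\), maps it to \(\ell ({\widehat{\eta }}) = (b')^{-1}(g^{-1}({\widehat{\eta }}))\) and evaluates the log-likelihood, which is the same up to rounding.

```python
import math

groups = {"A": [0, 2, 1, 3], "B": [4, 6, 5, 9, 6]}  # hypothetical claim counts

links = {  # name: (g, g^{-1})
    "log": (math.log, math.exp),
    "sqrt": (math.sqrt, lambda eta: eta ** 2),
    "identity": (lambda mu: mu, lambda eta: eta),
}


def loglik_at_closed_form(g, g_inv):
    """Poisson log-likelihood at the closed-form MLE eta_hat_j = g(ybar_j)."""
    total = 0.0
    for ys in groups.values():
        eta_hat = g(sum(ys) / len(ys))
        lam = math.log(g_inv(eta_hat))  # l(eta) = (b')^{-1}(g^{-1}(eta)) with b' = exp
        total += sum(y * lam - math.exp(lam) - math.lgamma(y + 1) for y in ys)
    return total


values = {name: loglik_at_closed_form(*pair) for name, pair in links.items()}
print(values)
```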

Proof for the two-variable case

Proof of Theorem 3.2

The system \(S(\varvec{\vartheta })=0\) is

$$\begin{aligned} \left\{ \begin{array}{ll}\displaystyle \sum _{i=1}^n\ell '(\eta _i)\left( y_i - b'\circ \ell (\eta _i)\right) = 0\\ \displaystyle \sum _{i=1}^nx_i^{(3),l}\ell '(\eta _i)\left( y_i - b'\circ \ell (\eta _i)\right) = 0,\quad \forall l\in L\\ \displaystyle \sum _{i=1}^nx_i^{(2),k}\ell '(\eta _i)\left( y_i - b'\circ \ell (\eta _i)\right) = 0,\quad \forall k\in K\\ \displaystyle \sum _{i=1}^nx_i^{kl}\ell '(\eta _i)\left( y_i - b'\circ \ell (\eta _i)\right) = 0,\quad \forall (k,l)\in KL^\star . \end{array}\right. \end{aligned}$$

that is

$$\begin{aligned} \left\{ \begin{array}{ll}\displaystyle \sum _{(k,l)\in KL^\star }\ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\left( \sum _{i=1}^nx_i^{(k,l)}y_i - m_{k,l}b'\circ \ell (\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\right) = 0\\ \displaystyle \sum _{k\in K_l^\star }\ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\left( \sum _{i=1}^nx_i^{(k,l)}y_i - m_{k,l}b'\circ \ell (\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\right) = 0\quad \forall l\in L\\ \displaystyle \sum _{l\in L_k^\star }\ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\left( \sum _{i=1}^nx_i^{(k,l)}y_i - m_{k,l}b'\circ \ell (\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\right) = 0\quad \forall k\in K\\ \ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\left( \displaystyle \sum _{i=1}^nx_i^{(k,l)}y_i - m_{k,l}b'\circ \ell (\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\right) = 0\quad \forall (k,l)\in KL^\star . \end{array}\right. \end{aligned}$$

The system has exactly \(1+d_2+d_3\) redundant equations, and \(S(\varvec{\vartheta })=0\) reduces to

$$\begin{aligned}&\ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl})\left( \displaystyle \sum _{i=1}^nx_i^{(k,l)}y_i - m_{k,l}b'\circ \ell (\vartheta _{(1)}+\vartheta _{(2),k} \right. \nonumber \\&\quad \left. + \vartheta _{(3),l} + \vartheta _{kl}) \right) = 0\quad \forall (k,l)\in { KL}^\star . \end{aligned}$$

Hence the system has rank \(|{ KL}^\star |\), and if \(Y_i\) takes values in \(\mathbb {Y}\subset b'(\varLambda )\) and \(\ell \) is injective, we have

$$\begin{aligned} \vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl} = g({\bar{Y}}_n^{(k,l)})\quad \forall (k,l)\in KL^\star . \end{aligned}$$

As in the proof of Theorem 3.1, we have to solve

$$\begin{aligned} \left\{ \begin{array}{ll}\varvec{Q}\varvec{\vartheta }= \varvec{g({\bar{Y}})}\\ \varvec{R}\varvec{\vartheta }=\varvec{0}. \end{array}\right. \end{aligned}$$

that is, since \(\varvec{Q}'\varvec{Q}+\varvec{R}'\varvec{R}\) is invertible (as in the proof of Theorem 3.1),

$$\begin{aligned} {\varvec{\vartheta }} = (\varvec{Q}'\varvec{Q} + \varvec{R}'\varvec{R})^{-1}\varvec{Q}'\varvec{g({\bar{Y}})}. \end{aligned}$$

In that case, the MLE solves a least-squares problem with response vector \(\varvec{g({\bar{Y}})}\) and design matrix \(\varvec{Q}\), under the linear constraint \(\varvec{R}\varvec{\vartheta }=\varvec{0}\).

  1.

    Under linear contrasts (\({\tilde{C}}_0\)), the model (10) is equivalent to model (6) with \(J=KL^\star \) modalities, so the solution follows directly from Theorem 3.1.

  2.

    Under linear contrasts (\({\tilde{C}}_\varSigma \)), the system

    $$\begin{aligned} \vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl} = g({\bar{Y}}_n^{(k,l)})\quad \forall (k,l)\in KL^\star \end{aligned}$$

    implies that

    $$\begin{aligned} \sum _{(k,l)\in KL^\star }m_{k,l}(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l} + \vartheta _{kl}) = \sum _{(k,l)\in KL^\star }m_{k,l} g({\bar{Y}}_n^{(k,l)}). \end{aligned}$$


    Since

    $$\begin{aligned} \sum _{(k,l)\in KL^\star }m_{k,l}= & {} n,\quad \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{(2),k} = \sum _{k\in K}\sum _{l\in L^\star _k}m_{k,l}\vartheta _{(2),k}\nonumber \\= & {} \sum _{k\in K}m^{(2)}_k\vartheta _{(2),k}= 0,\\ \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{(3),l}= & {} \sum _{l\in L}\sum _{k\in K^\star _l}m_{k,l}\vartheta _{(3),l}= \sum _{l\in L}m^{(3)}_l\vartheta _{(3),l}= 0,\nonumber \\&\quad \sum _{(k,l)\in KL^\star }m_{k,l}\vartheta _{kl} =0, \end{aligned}$$

    we get \(\vartheta _{(1)} = \dfrac{1}{n}\displaystyle \sum \nolimits _{(k,l)\in KL^\star }m_{k,l} g({\bar{Y}}_n^{(k,l)}).\) Similarly, summing over \(K^\star _l\) for \(l\in L\) and over \(L^\star _k\) for \(k\in K\), we find \(\vartheta _{(2),k}\) and \(\vartheta _{(3),l}\), and then \(\vartheta _{kl}\).

With main effects only, the system \(S(\varvec{\vartheta })=0\) is

$$\begin{aligned} \left\{ \begin{array}{ll}\displaystyle \sum _{i=1}^n\ell '(\eta _i)y_i = \sum _{i=1}^n g^{-1}(\eta _i)\ell '(\eta _i) \\ \displaystyle \sum _{i=1}^nx_i^{(3),l}\ell '(\eta _i) y_i = \sum _{i=1}^nx_i^{(3),l} g^{-1}(\eta _i)\ell '(\eta _i) \quad \forall l\in L\\ \displaystyle \sum _{i=1}^nx_i^{(2),k}\ell '(\eta _i)y_i = \sum _{i=1}^nx_i^{(2),k} g^{-1}(\eta _i)\ell '(\eta _i),\quad \forall k\in K \end{array}\right. \end{aligned}$$

There are \(1+d_2+d_3\) equations for \(1+d_2+d_3\) parameters, but the dummy variables of each explanatory variable are collinear with the intercept. The two additional constraints \(\varvec{R}\varvec{\vartheta }=0\) therefore remove this indeterminacy, leaving \(d_2+d_3-1\) free parameters. Using \(\sum _k x_i^{(2),k}=1\), the second set of equations becomes, for all \(l\in L\),

$$\begin{aligned}&\displaystyle \sum _{k\in K}\ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l}) {\bar{y}}_n^{(k,l)} m_{k,l} \\&\quad = \sum _{k\in K} g^{-1}(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l}) \ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l}) m_{k,l} \end{aligned}$$

Similarly, the third set of equations becomes \(\forall k\in K\)

$$\begin{aligned}&\displaystyle \sum _{l\in L}\ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l}) {\bar{y}}_n^{(k,l)} m_{k,l} \\&\quad = \sum _{l\in L} g^{-1}(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l}) \ell '(\vartheta _{(1)}+\vartheta _{(2),k} + \vartheta _{(3),l}) m_{k,l} \end{aligned}$$

Even with the canonical link \(\ell (x)=x\), so that \(\ell '(x)=1\), this system is not a least-squares problem when g is nonlinear. \(\square \)
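To see concretely that the main-effects system has no closed form even in the canonical case, take a Poisson response with log link (\(\ell (x)=x\), \(g^{-1}=\exp \)): the two sets of equations reduce to matching the row and column margins of the table of cell totals. The paper does not prescribe a solver; this sketch swaps in iterative proportional fitting, a standard algorithm for log-linear models, on hypothetical totals \(\sum _i x_i^{(k,l)}y_i\) and cell sizes \(m_{k,l}\). The fitted cell means differ from the cell averages \({\bar{y}}_n^{(k,l)}\), which would be the solution of the model with interaction.

```python
# Hypothetical 2 x 3 table: S[k][l] = sum of y_i in cell (k, l), M[k][l] = m_{k,l}
S = [[12.0, 30.0, 9.0], [20.0, 41.0, 25.0]]
M = [[10, 20, 5], [8, 15, 10]]
K, L = len(S), len(S[0])


def fit_main_effects(iters=500):
    """Iterative proportional fitting for mu_kl = alpha_k * beta_l (Poisson, log link)."""
    alpha, beta = [1.0] * K, [1.0] * L
    for _ in range(iters):
        # row equations: sum_l m_kl mu_kl = sum_l S_kl
        alpha = [sum(S[k]) / sum(M[k][l] * beta[l] for l in range(L)) for k in range(K)]
        # column equations: sum_k m_kl mu_kl = sum_k S_kl
        beta = [sum(S[k][l] for k in range(K)) / sum(M[k][l] * alpha[k] for k in range(K))
                for l in range(L)]
    return [[alpha[k] * beta[l] for l in range(L)] for k in range(K)]


fitted = fit_main_effects()
cell_means = [[S[k][l] / M[k][l] for l in range(L)] for k in range(K)]
print(fitted)
print(cell_means)
```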

Computation of the log-likelihoods appearing in Sects. 4 and 5

Consider the Pareto GLM described in (13) and (15). The function b is \(b(\lambda ) = -\log (\lambda )\); using Corollary 3.1, we have \(\ell (\hat{\eta }_i) = (b')^{-1}(\overline{z}_n^{(j)})=-(\overline{z}_n^{(j)})^{-1}\) for j such that \(x_i^{(2),j}=1\), and, since \(\sum _{i, x_i^{(2),j}=1} z_i = m_j{\overline{z}}_n^{(j)}\),

$$\begin{aligned} \log L(\widehat{\varvec{\vartheta }}_n\,|\,\underline{\varvec{z}}) = \sum _{j=1}^d\sum _{i, x_i^{(2),j}=1} \left( -z_i/{\overline{z}}_n^{(j)} - \log \left( -{\overline{z}}_n^{(j)} \right) \right) = -n -\sum _{j=1}^d m_j \log \left( -{\overline{z}}_n^{(j)} \right) . \end{aligned}$$

The log-likelihood of the original Pareto 1 sample is

$$\begin{aligned} \log L(\varvec{\vartheta }\,|\,\underline{\varvec{y}}) = \sum _{i=1}^n \big (\log \ell (\eta _i) + \ell (\eta _i)\log \mu - (\ell (\eta _i) +1)\log y_i \big ). \end{aligned}$$

Hence, with \(z_i=-\log (y_i/\mu )\),

$$\begin{aligned} \log L(\widehat{\varvec{\vartheta }}_n\,|\,\underline{\varvec{y}})= & {} \sum _{j=1}^d\sum _{i, x_i^{(2),j}=1}\left( -\log (-\overline{z}_n^{(j)}) -\frac{\log \mu }{{\overline{z}}_n^{(j)}} + \frac{\log (y_i)}{ {\overline{z}}_n^{(j)}} - \log y_i \right) \\= & {} -n - \sum _{j=1}^d m_j\log (-{\overline{z}}_n^{(j)}) - \sum _{i=1}^n\log y_i =\log L(\widehat{\varvec{\vartheta }}_n\,|\,\underline{\varvec{z}})- \sum _{i=1}^n\log y_i. \end{aligned}$$
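The closed-form Pareto 1 log-likelihood can be checked numerically. Summing the log-density over modality j at \({\widehat{\alpha }}_j = -1/{\overline{z}}_n^{(j)}\) gives \(-m_j\log (-{\overline{z}}_n^{(j)}) - m_j - \sum _i \log y_i\), because \({\widehat{\alpha }}_j\sum _i\log (y_i/\mu ) = m_j\). The sketch below, with a hypothetical threshold \(\mu \) and hypothetical losses above it, compares the direct sum with this expression.

```python
import math

mu = 1000.0  # hypothetical known Pareto 1 threshold
groups = {1: [1200.0, 1500.0, 4100.0, 1050.0], 2: [2500.0, 1800.0, 9000.0]}  # hypothetical losses > mu

n = sum(len(ys) for ys in groups.values())
direct = 0.0
closed = -float(n) - sum(math.log(y) for ys in groups.values() for y in ys)
for ys in groups.values():
    z_bar = sum(-math.log(y / mu) for y in ys) / len(ys)  # mean of z_i = -log(y_i / mu), negative
    alpha_hat = -1.0 / z_bar                              # l(eta_hat) = (b')^{-1}(z_bar)
    # Pareto 1 log-density: log(alpha) + alpha log(mu) - (alpha + 1) log(y)
    direct += sum(math.log(alpha_hat) + alpha_hat * math.log(mu) - (alpha_hat + 1) * math.log(y)
                  for y in ys)
    closed -= len(ys) * math.log(-z_bar)

print(direct, closed)
```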

Now consider the shifted log-normal GLM described in (18) and (19). Here \(b(\lambda )=\lambda ^2/2\); hence, using Corollary 3.1, we have \(\ell (\hat{\eta }_i) = (b')^{-1}(\overline{z}_n^{(j)})=\overline{z}_n^{(j)}\) for j such that \(x_i^{(2),j}=1\), and Eq. (21) holds.

The log-likelihood of the original shifted log-normal sample is

$$\begin{aligned} \log L(\varvec{\vartheta }\,|\,\underline{\varvec{y}})= & {} \sum _{i=1}^n\left( - \log (y_i-\mu ) - \log (\sqrt{2\pi \phi }) -\dfrac{(\log (y_i-\mu ) - \ell (\eta _i))^2}{2\phi }\right) \\= & {} - \sum _{i=1}^n z_i - n\log (\sqrt{2\pi \phi }) - \sum _{i=1}^n \dfrac{(z_i - \ell (\eta _i))^2}{2\phi }, \end{aligned}$$

with \(z_i=\log (y_i-\mu )\). Hence

$$\begin{aligned} \log L(\widehat{\varvec{\vartheta }}\,|\,\underline{\varvec{y}})= & {} - \sum _{i=1}^n z_i - n\log (\sqrt{2\pi \phi }) - \frac{1}{2\phi }\sum _{j=1}^d\sum _{i, x_i^{(2),j}=1} (z_i - \overline{z}_n^{(j)})^2. \end{aligned}$$

Using \( {\widehat{\phi }} = \frac{1}{n}\sum _{j\in J}\sum _{i, x_i^{(2),j}=1}\left( z_i - {\bar{z}}_n^{(j)}\right) ^2 \) leads to the desired result.
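Since the residual sum of squares equals \(n{\widehat{\phi }}\), plugging \({\widehat{\phi }}\) in gives the profile form \(-\sum _i z_i - \frac{n}{2}\log (2\pi {\widehat{\phi }}) - \frac{n}{2}\). The sketch below (hypothetical shift \(\mu \) and losses) evaluates the shifted log-normal log-likelihood directly at \(\ell ({\widehat{\eta }}_i) = {\bar{z}}_n^{(j)}\) and \({\widehat{\phi }}\), and compares it with that profile form.

```python
import math

mu = 500.0  # hypothetical known shift
groups = {1: [800.0, 1500.0, 650.0, 2300.0], 2: [900.0, 4000.0, 1200.0]}  # hypothetical losses > mu

z = {j: [math.log(y - mu) for y in ys] for j, ys in groups.items()}  # z_i = log(y_i - mu)
z_bar = {j: sum(v) / len(v) for j, v in z.items()}                    # l(eta_hat) = z_bar_j
n = sum(len(v) for v in z.values())
phi_hat = sum((zi - z_bar[j]) ** 2 for j, v in z.items() for zi in v) / n

direct = sum(
    -math.log(y - mu) - 0.5 * math.log(2 * math.pi * phi_hat)
    - (math.log(y - mu) - z_bar[j]) ** 2 / (2 * phi_hat)
    for j, ys in groups.items() for y in ys
)
profile = (-sum(zi for v in z.values() for zi in v)
           - 0.5 * n * math.log(2 * math.pi * phi_hat) - 0.5 * n)
print(direct, profile)
```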

Link functions and descriptive statistics

See Fig. 4 and Table 10.

Fig. 4

Graphs of link functions

Table 10 Empirical quantiles and moments (in euros)


Cite this article

Brouste, A., Dutang, C. & Rohmer, T. Closed-form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modeling. Comput Stat 35, 689–724 (2020).



  • Regression models
  • Heavy-tailed distributions
  • Explicit MLE
  • Insurance claim modeling