Multivariate mixed Poisson Generalized Inverse Gaussian INAR(1) regression

In this paper, we present a novel family of multivariate mixed Poisson-Generalized Inverse Gaussian INAR(1), MMPGIG-INAR(1), regression models for flexibly modelling time series of overdispersed count response variables. The statistical properties of the proposed family of models are discussed, and we derive the joint distribution of the innovations across all the sequences. Finally, for illustrative purposes, different members of the MMPGIG-INAR(1) class are fitted via maximum likelihood estimation to Local Government Property Insurance Fund data from the state of Wisconsin.


Introduction
In recent years, there has been growing interest in modelling integer-valued time series of univariate and multivariate count data in a plethora of scientific fields such as sociology, econometrics, manufacturing, engineering, agriculture, biology, biometrics, genetics, medicine, sports, marketing, and insurance. Regarding the univariate case, Al-Osh and Alzaid (1987) and McKenzie (1985) were the first to consider an INAR(1) model based on the so-called binomial thinning operator. Subsequently, many articles extended this setup by applying different thinning operators or by varying the distribution of the innovations. For more details, the interested reader can refer to Weiß (2018), Davis et al. (2016), Scotto et al. (2015) and Weiß (2008), among many others. The INAR(1) model with Poisson marginal distribution (Poisson INAR(1)) has been the most popular choice due to the simplicity of its log-likelihood function, which makes parameter estimation via maximum likelihood (ML) straightforward. Also, Freeland and McCabe (2004) considered an extension of the model by allowing for regression specifications on the mean of the Poisson innovations as well as on the parameter of the binomial thinning operator. On the other hand, the literature on the multivariate case is less developed. In particular, Latour (1997) introduced a multivariate GINAR(p) model with a generalized thinning operator. Pedeli and Karlis (2011, 2013a, b) focused on the diagonal case, under which the thinning operators do not introduce cross correlation among the different counts; in this case, the dependence structure is introduced by the innovations. Additionally, Ristić et al. (2012), Popović (2016), Popović et al. (2016) and Nastić et al. (2016) constructed multivariate INAR distributions with cross correlations among counts and random-coefficient thinning.
Finally, Karlis and Pedeli (2013) extended the setup of the previous articles by allowing for negative cross correlation via a copula-based approach for modelling the innovations.
In this paper, we extend the model proposed by Pedeli and Karlis (2011) by introducing the multivariate mixed Poisson-Generalized Inverse Gaussian INAR(1), MMPGIG-INAR(1), regression model for multivariate count time series data. The MMPGIG-INAR(1) is a general three-parameter family of INAR(1) models driven by mixed Poisson regression innovations, where the mixing densities are chosen from the Generalized Inverse Gaussian class of distributions. Thus, the proposed modelling framework provides the flexibility needed to model positive correlations of different magnitudes among time series of different types of overdispersed count response variables. In particular, depending on the value taken by the shape parameter, the MMPGIG-INAR(1) family includes many members, such as the Poisson-Inverse Gaussian (PIG), as special cases, and several others as limiting cases, such as the Negative Binomial (or Poisson-Gamma), the Poisson-Inverse Gamma (PIGA), the Poisson-Inverse Exponential, the Poisson-Inverse Chi Squared and the Poisson-Scaled Inverse Chi Squared distributions. Therefore, it can accommodate different levels of overdispersion depending on the chosen parametric form of the mixing density. Furthermore, the MMPGIG-INAR(1) family of models is constructed by parameterizing the probability mass function (pmf) of the MMPGIG innovations in terms of the mean parameter, which results in a more orthogonal parameterization that facilitates maximum likelihood (ML) estimation when regression specifications are allowed for the mean parameters of the MMPGIG-INAR(1) regression model. For expository purposes, we derive the joint probability mass functions and the derivatives of several special cases of the MMPGIG-INAR(1) family which are used as innovations. These models are fitted to time series of claim count data from the Local Government Property Insurance Fund (LGPIF) in the state of Wisconsin.
At this point it is worth noting that modelling the correlation between different types of claims from the same and/or different types of coverage is very important from a practical business standpoint. Many articles have been devoted to this topic; see, for example, Bermúdez and Karlis (2011), Bermúdez and Karlis (2012), Shi and Valdez (2014a, b), Abdallah et al. (2016), Bermúdez and Karlis (2017), Pechon et al. (2018), Pechon et al. (2019), Vernic (2019), Denuit et al. (2019), Fung et al. (2019), Bolancé et al. (2020), Pechon et al. (2021), Jeong and Dey (2021), Gómez-Déniz and Calderín-Ojeda (2021), Tzougas and di Cerchiara (2021a, b). However, with the exception of very few articles, such as Bermúdez et al. (2018) and Bermúdez and Karlis (2021), the construction of bivariate INAR(1) models which can capture both the serial correlation between the observations of the same policyholder over time and the correlation between different claim types remains largely uncharted territory. This is an additional contribution of this study.
The rest of the paper proceeds as follows. Section 2 presents the derivation of the MMPGIG-INAR(1) model. Statistical properties of the MMPGIG innovations are discussed in Sect. 3. In Sect. 4, we present a description of the alternative special cases of the MMPGIG-INAR(1) family. Section 5 discusses the parameter estimation for these models based on the maximum likelihood method and integer-valued prediction. Section 6 contains our empirical analysis for the LGPIF data set. Finally, concluding remarks are given in Sect. 7.

Generalized setting
Let X_t = (X_{1,t}, …, X_{m,t})^T and R_t = (R_{1,t}, …, R_{m,t})^T be non-negative integer-valued random vectors in ℝ^m, and let P be a diagonal matrix in ℝ^{m×m} with elements p_i ∈ (0, 1). The multivariate Poisson-Generalized Inverse Gaussian INAR(1) is defined as

X_t = P ∘ X_{t−1} + R_t,

where the thinning operator ∘ is the widely used binomial thinning operator such that p_i ∘ X_{i,t} = ∑_{k=1}^{X_{i,t}} U_k, where the U_k are independent identically distributed Bernoulli random variables with success probability p_i, i.e. P(U_k = 1) = p_i. Hence p_i ∘ X_{i,t} is binomially distributed with size X_{i,t} and success probability p_i, so its pmf f_{p_i}(x, X_{i,t}) can be written down directly as the binomial pmf. Note that, given X_{i,t} and X_{j,t}, i ≠ j, p_i ∘ X_{i,t} and p_j ∘ X_{j,t} are independent of each other. To accommodate the heteroscedasticity arising from the data, {R_{i,t}}_{i=1,…,m} are mixed Poisson random variables Po(θ_t λ_{i,t}) with random effect θ_t. The rate λ_{i,t} is characterized by its observed covariate vector z_{i,t} ∈ ℝ^{a_i×1}, for some positive integer a_i, and they are connected through a log link function such that log(λ_{i,t}) = z_{i,t}^T β_i, where β_i ∈ ℝ^{a_i×1}. Furthermore, {R_{i,t}}_{i=1,…,m} share the same random effect θ_t with mixing distribution G(θ), which means that the dependence structure among the X_{i,t} can be controlled by the choice of the mixing distribution and its parameter values. The joint distribution of R_t is obtained by integrating the product of the conditional Poisson pmfs against the mixing density. We let θ_t be a continuous random variable from the Generalized Inverse Gaussian distribution with density function

g(θ) = (ψ/χ)^{ν/2} θ^{ν−1} exp(−(χ/θ + ψθ)/2) / (2 K_ν(√(χψ))), θ > 0,

where −∞ < ν < ∞, χ > 0, ψ > 0 and K_ν(ω) is the modified Bessel function of the third kind of order ν and argument ω. The Generalized Inverse Gaussian distribution is a widely used family: for example, it includes the Inverse Gaussian as a special case and the Gamma and Inverse Gamma as limiting cases. To avoid identification problems for the mixed Poisson regression random variable θ_t, the mean of θ_t is restricted to one, i.e.
E[θ_t] = 1, and all the parameters ν, χ, ψ will be either fixed or a function of a single parameter φ. With these two constraints, there is only one parameter that is free to vary (e.g. for the Inverse Gaussian distribution, ν = −1/2 and χ = ψ = φ). The joint distribution of R_t then becomes an MPGIG distribution. In Sect. 4, we discuss the innovation pmf in detail for some special cases. Finally, it should be noted that several articles discuss multivariate versions of the MPGIG distribution and/or the MPIG distribution, which is the special case ν = −0.5; see, for instance, Barndorff-Nielsen et al. (1992), Ghitany et al. (2012), Amalia et al. (2017), Mardalena et al. (2020), Tzougas and di Cerchiara (2021b) and Mardalena et al. (2021). However, this is the first time that the MMPGIG-INAR(1) family of INAR(1) models driven by mixed Poisson regression innovations is considered for modelling time series of count response variables.
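As a concrete illustration of the binomial thinning recursion X_t = P ∘ X_{t−1} + R_t, the following minimal Python sketch (not from the paper; function names are ours) simulates a univariate INAR(1) path with Poisson innovations:

```python
import numpy as np

def binomial_thinning(x, p, rng):
    # p ∘ x: the sum of x i.i.d. Bernoulli(p) variables, i.e. a Binomial(x, p) draw
    return rng.binomial(x, p)

def simulate_inar1(n, p, innov, x0=0, seed=42):
    """Simulate a univariate INAR(1): X_t = p ∘ X_{t-1} + R_t."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = x0
    for t in range(1, n + 1):
        x[t] = binomial_thinning(x[t - 1], p, rng) + innov(rng)
    return x[1:]

# Poisson(2) innovations: the stationary mean is E[R] / (1 - p) = 2 / 0.5 = 4
path = simulate_inar1(20000, p=0.5, innov=lambda rng: rng.poisson(2.0))
print(path.mean())
```

The sample mean of a long simulated path should be close to the stationary mean E[R]/(1 − p); in the multivariate model each coordinate follows the same recursion with its own p_i.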

The marginalization property enables insurers, for example, to easily price policyholders who engage in only some, but not all, lines of business. The last property concerns the identifiability of θ_t, which ensures the uniqueness of the model.
Proof Under the assumption that the covariate matrix is of full rank and that the log link function log(λ_{i,t}) = z_{i,t}^T β_i is monotonic, the identification problem for the mixed Poisson regression random variable θ_t reduces to identification for the mixed Poisson random variable without regression, which means the parameter set can be re-parametrized accordingly. The 'if' statement is then obvious, since the same set of parameters will necessarily lead to the same joint distribution function. For the 'only if' statement, for two distribution functions to match, all the moments (mean, variance, covariance) must coincide. From the moment properties above, matching E[R_{i,t}] leads to λ_{i,t} = λ̃_{i,t}. Likewise, given that the first moments are matched, only φ = φ̃ leads to the same Var(R_{i,t}). Matching these moments already forces the re-parametrized sets to coincide, and then the covariances Cov(R_{i,t}, R_{j,t}) must also match. ◻

Model specification
The distributional properties of R_t, in particular the correlation structure and the 'tailedness' of the distribution, are mainly determined by the random effect θ_t, more specifically by the mixing density g(θ). On the other hand, the explicit form of the derivatives of the innovation pmf can significantly accelerate the computations when performing estimation. Hence, the innovation pmf and its derivatives are derived for the two limiting cases (Gamma, Inverse Gamma) and some other special cases (GIG with unit mean and different values of ν).

Mixing by Gamma distribution
If R_t is univariate, the resulting distribution is the well-known negative binomial distribution, and this result extends readily to the multivariate case, the so-called multivariate negative binomial distribution (see e.g. Marshall and Olkin 1990; Boucher et al. 2008; Cheon et al. 2009). The Gamma density is obtained by letting ν = φ, ψ = 2φ and χ = 0 in the Generalized Inverse Gaussian density of Sect. 2; the resulting mixing density is a Gamma(φ, φ) density with unit mean and variance 1/φ. The expectation defining the innovation pmf can then be evaluated explicitly, and all the derivatives follow easily except the one involving the gamma function. The derivative of the gamma function can be derived by utilizing Weierstrass's definition,

1/Γ(z) = z e^{γz} ∏_{k=1}^∞ (1 + z/k) e^{−z/k},

which is valid for all complex numbers z except the non-positive integers, where γ is the Euler-Mascheroni constant. Differentiating the log transform log Γ(z + 1) leads to the series expansion of the digamma function,

ψ(z + 1) = −γ + ∑_{k=1}^∞ z / (k(k + z)).

The derivative of the innovation pmf can then be derived step by step.
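The Poisson-Gamma identity above can be checked numerically: mixing a Poisson rate over a unit-mean Gamma(φ, φ) density yields exactly a negative binomial pmf. A small sketch (not from the paper; it uses SciPy's `nbinom` parametrization with size φ and success probability φ/(φ+λ)):

```python
import numpy as np
from scipy import integrate, stats

def poisson_gamma_pmf(r, lam, phi):
    """P(R = r) for R | theta ~ Po(theta * lam) with theta ~ Gamma(phi, phi)
    (unit mean, variance 1/phi), evaluated by numerical integration."""
    integrand = lambda th: stats.poisson.pmf(r, th * lam) * stats.gamma.pdf(th, a=phi, scale=1.0 / phi)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

# Closed form: negative binomial with size phi and mean lam
lam, phi = 2.5, 1.7
for r in range(6):
    nb = stats.nbinom.pmf(r, phi, phi / (phi + lam))
    assert abs(poisson_gamma_pmf(r, lam, phi) - nb) < 1e-7
```

The same integral representation, taken jointly over all m coordinates sharing one θ, gives the multivariate negative binomial innovation pmf.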

Mixing by Inverse Gamma
The Inverse Gamma distribution, another limiting case of the Generalized Inverse Gaussian distribution, is discussed in Sect. 9.3 of Johnson et al. (1995). An Inverse Gamma random variable has a relatively thicker right tail and a low probability of taking values close to 0. In this case, the density function g(θ) is obtained by letting ψ = 0, χ = 2φ and ν = −φ − 1, which yields unit mean and variance 1/(φ − 1) for φ > 1. It is also called the reciprocal Gamma distribution. In this case, numerical differentiation is applied to calculate log K_ν(ω), since the parameter φ appears in both the order and the argument of the modified Bessel function K_ν(ω).
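The stated unit-mean constraint and variance of this limiting case can be verified directly with SciPy's `invgamma` (shape φ + 1, scale φ under the parametrization above); this is a check of the moments only, not of the paper's pmf derivation:

```python
from scipy import stats

# theta ~ InverseGamma(shape = phi + 1, scale = phi): the limiting GIG case with
# psi = 0, chi = 2*phi, nu = -phi - 1 (unit mean, variance 1/(phi - 1) for phi > 1)
phi = 3.0
dist = stats.invgamma(phi + 1.0, scale=phi)
mean, var = dist.mean(), dist.var()
assert abs(mean - 1.0) < 1e-12
assert abs(var - 1.0 / (phi - 1.0)) < 1e-12
```

Since the variance 1/(φ − 1) exceeds the Gamma case's 1/φ at the same φ, this mixing density produces the heavier-tailed innovations noted in the text.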

Mixing by Generalized Inverse Gaussian
Likewise, if R_t is univariate, the distribution of R_t is known as the Poisson Generalized Inverse Gaussian distribution. To comply with the constraints made in Sect. 2, the mixing density takes the unit-mean form, with variance var(θ_t) = K_ν(φ)K_{ν+2}(φ)/K_{ν+1}(φ)² − 1. The innovation pmf again involves the modified Bessel function, with order and argument depending on ν, φ, the unit-mean normalising constant c and the sums S_t = ∑_{i=1}^m λ_{i,t} and s_t = ∑_{i=1}^m r_{i,t}. Furthermore, we let ν be constant and fixed in order to avoid potential identification problems which may appear when performing estimation. In general, however, the derivative with respect to ν is hard to obtain, since the constant c involves the Bessel function. On the other hand, it is worth noting that var(θ_t) is roughly unbounded for ν ∈ [−2, 0], and the skewness and kurtosis are decreasing in ν, which can easily be verified with standard statistical software. We will therefore discuss the cases ν = −1/2, −3/2, −3/4, two of which have 'explicit' distributions in the sense that the constant c can be evaluated in closed form.
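The unit-mean constraint can be checked numerically. The sketch below is ours and rests on one parametrization assumption consistent with the special cases quoted later (c = K_{ν+1}(φ)/K_ν(φ), with χ = φ/c and ψ = φc, so that √(χψ) = φ); it verifies both E[θ_t] = 1 and the variance formula by quadrature:

```python
import numpy as np
from scipy import integrate
from scipy.special import kv

def gig_pdf(th, nu, chi, psi):
    """Generalized Inverse Gaussian density."""
    w = np.sqrt(chi * psi)
    norm = (psi / chi) ** (nu / 2.0) / (2.0 * kv(nu, w))
    return norm * th ** (nu - 1.0) * np.exp(-0.5 * (chi / th + psi * th))

# assumed unit-mean parametrization: c = K_{nu+1}(phi)/K_nu(phi), chi = phi/c, psi = phi*c
nu, phi = -0.75, 2.0
c = kv(nu + 1, phi) / kv(nu, phi)
chi, psi = phi / c, phi * c
m1, _ = integrate.quad(lambda th: th * gig_pdf(th, nu, chi, psi), 0.0, np.inf)
m2, _ = integrate.quad(lambda th: th ** 2 * gig_pdf(th, nu, chi, psi), 0.0, np.inf)
assert abs(m1 - 1.0) < 1e-8
assert abs((m2 - 1.0) - (kv(nu, phi) * kv(nu + 2, phi) / kv(nu + 1, phi) ** 2 - 1.0)) < 1e-8
```

Under this assumption the variance reduces to 1/φ at both ν = −1/2 and ν = −3/2, in line with the text's remark that those two cases share the Inverse Gaussian variance.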

Generalized Inverse Gaussian with ν = −1/2
In this case, the resulting distribution, known as the Poisson Inverse Gaussian distribution, has been investigated by many authors (see, e.g., Sichel 1974, 1982; Atkinson and Yeh 1982; Stein and Juritz 1988, among others). When ν = −1/2, c = 1, and the innovation pmf takes an explicit form. For convenience, we reparametrize the density by squaring the parameter, so that p = s_t − 1/2 and ω = √(φ² + 2S_t). The derivatives of the pmf with respect to the different parameters can be derived by making use of the derivative of K_ν(ω) with respect to its argument,

∂K_ν(ω)/∂ω = −(K_{ν−1}(ω) + K_{ν+1}(ω))/2,

which leads to the corresponding derivatives, where e_i = (0, …, 0, 1, 0, …, 0)^T ∈ ℝ^{m×1} is the vector with i-th element equal to one and 0 elsewhere.
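The Bessel-derivative identity used above is easy to verify numerically against a central finite difference (the sketch is ours, using SciPy's `kv`):

```python
from scipy.special import kv

def dK_domega(nu, w):
    # identity: d/dw K_nu(w) = -(K_{nu-1}(w) + K_{nu+1}(w)) / 2
    return -0.5 * (kv(nu - 1, w) + kv(nu + 1, w))

nu, w, h = -0.5, 1.3, 1e-6
numeric = (kv(nu, w + h) - kv(nu, w - h)) / (2.0 * h)
assert abs(dK_domega(nu, w) - numeric) < 1e-6
```

Because K_{−ν} = K_ν, the same identity serves all the half-integer orders appearing in this section.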

Generalized Inverse Gaussian with ν = −3/2
In this case, the constant is c = φ/(1 + φ) and the variance is var(θ_t) = 1/φ, which is exactly the same as in the Inverse Gaussian case, but the random effect θ_t will in general have larger skewness and kurtosis. The resulting pmf has p = s_t − 3/2 and ω = √(φ² + 2(φ + 1)S_t). The derivatives with respect to the different parameters can be derived similarly to the Inverse Gaussian case.

The remaining case, ν = −3/4, cannot be simplified, since c = K_{1/4}(φ)/K_{3/4}(φ) cannot be written in terms of elementary functions. Hence numerical differentiation has to be applied when evaluating the derivatives of the innovation pmf. Table 1 summarises the parametrization of all the mixing densities, and Table 2 shows the moment formulas for each mixing density. Although the formulas for the variances differ slightly due to the parametrizations, they can easily be reparametrized and compared with each other. It turns out that the Inverse Gamma has the largest skewness and kurtosis while the Gamma density has the smallest, which means that the 'tailedness' of these densities increases in a 'top-down' order in the Table. Hence, one can choose different densities to accommodate different tail structures encountered in real data.

Maximum likelihood estimation for the MMPGIG-INAR(1) model
In this section, we derive the log-likelihood function and the score functions of the MMPGIG-INAR(1) model defined above for the general case. Let the whole parameter set be Θ = {p_i, β_i, φ | i = 1, …, m}. Since the model is a discrete Markov chain, the likelihood is the product of the conditional probabilities, L(Θ) = ∏_t P(X_t | X_{t−1}), where each conditional probability is the convolution of the m binomial thinning pmfs with the joint innovation pmf, the expectation being taken with respect to the random effect θ_t. The corresponding log-likelihood ℓ(Θ) and its score functions follow from the innovation pmfs and their derivatives already discussed in Sect. 4 for the different cases. Hence, the maximum likelihood estimates can be obtained through numerical algorithms, for example Newton-Raphson or quasi-Newton methods. However, the optimization becomes computationally intensive as m increases. One can address this issue by adopting the composite likelihood method introduced in Pedeli and Karlis (2013a), in which the high-dimensional likelihood function is reduced to a sum of bivariate cases.
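To make the convolution structure of the conditional likelihood concrete, here is a univariate sketch with Poisson innovations (not the full MMPGIG case; function names, starting values and the data-generating values p = 0.5, λ = 2 are ours for illustration):

```python
import numpy as np
from scipy import optimize, stats

def inar1_negloglik(params, x):
    """Negative log-likelihood of a univariate INAR(1): each conditional pmf
    P(X_t | X_{t-1}) is the convolution of the binomial thinning pmf with the
    innovation pmf (Poisson here, for simplicity)."""
    p, lam = params
    if not (0.0 < p < 1.0) or lam <= 0.0:
        return np.inf
    ll = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        k = np.arange(min(prev, cur) + 1)  # possible survivors of the thinning
        cond = np.sum(stats.binom.pmf(k, prev, p) * stats.poisson.pmf(cur - k, lam))
        ll += np.log(cond)
    return -ll

# simulate a path with hypothetical true values p = 0.5, lam = 2.0, then fit
rng = np.random.default_rng(1)
x = [0]
for _ in range(500):
    x.append(rng.binomial(x[-1], 0.5) + rng.poisson(2.0))
x = np.asarray(x)
res = optimize.minimize(inar1_negloglik, x0=[0.3, 1.0], args=(x,), method="Nelder-Mead")
p_hat, lam_hat = res.x
print(p_hat, lam_hat)
```

In the multivariate model the inner sum becomes an m-fold convolution over all thinning outcomes, which is exactly why supplying analytic gradients and/or a composite likelihood pays off as m grows.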

Integer-valued prediction
Based on the estimates obtained by maximum likelihood and the observed sequence (X_1, …, X_n), the h-step-ahead distribution of X_{n+h} conditional on X_n is given by the model with Θ̂ plugged in, where Θ̂ is obtained from the estimation procedure above. In a classical time series model, one would minimise MSE(h) = E[(X̂_{n+h} − X_{n+h})² | X_n] to obtain the optimal predictor X̂_{n+h} = E[X_{n+h} | X_n]. However, this inevitably produces real values for X̂_{n+h}, which is not coherent with the integer-valued nature of the MMPGIG-INAR(1) model. To solve this, one can instead use the median X̃_{n+h} of X_{n+h}, i.e. the 50% quantile, as the predicted value, as also discussed by Pavlopoulos and Karlis (2008) and Homburg et al. (2019). In the univariate case, the median is obtained by minimising the mean absolute error MAE(h) = E[|X̂_{n+h} − X_{n+h}| | X_n]. This idea extends to the multivariate case, where the resulting median X̃_{n+h}, called the geometric median, is calculated by minimising the expected Euclidean distance E[‖X̂_{n+h} − X_{n+h}‖ | X_n]. The expectation can be evaluated numerically by simulating random samples of X_{n+h}.
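The simulation-based geometric median can be sketched as follows (our code, not the paper's; for illustration the simulated h-step-ahead draws are replaced by a stand-in bivariate Poisson sample). It searches the integer grid spanned by the draws for the point minimising the Monte Carlo estimate of the expected Euclidean distance:

```python
import numpy as np

def integer_geometric_median(samples):
    """Integer-valued geometric median: among integer candidate points, pick the one
    minimising the Monte Carlo expected Euclidean distance to the simulated draws."""
    samples = np.asarray(samples, dtype=float)
    lo = samples.min(axis=0).astype(int)
    hi = samples.max(axis=0).astype(int)
    grids = np.meshgrid(*[np.arange(l, h + 1) for l, h in zip(lo, hi)], indexing="ij")
    candidates = np.stack([g.ravel() for g in grids], axis=1)
    dists = np.linalg.norm(samples[None, :, :] - candidates[:, None, :], axis=2).mean(axis=1)
    return candidates[np.argmin(dists)]

rng = np.random.default_rng(0)
draws = rng.poisson([1.0, 3.0], size=(5000, 2))  # stand-in for simulated X_{n+h} draws
print(integer_geometric_median(draws))
```

Restricting the candidates to the integer grid keeps the prediction coherent with the count-valued model; for larger m one would replace the exhaustive grid search with a local search around the coordinate-wise medians.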

Empirical analysis
The data used in this section come from the Local Government Property Insurance Fund (LGPIF) of the state of Wisconsin. This fund provides property insurance to different types of government units, including villages, cities, counties and towns. For each policyholder j, we denote the two types of claim counts at time t by X^{(j)}_{1,t} and X^{(j)}_{2,t}, respectively. In what follows, basic statistical analysis is shown in Table 3 and Figs. 1 and 2. The proportion of zeros for the two types of claims is higher than 90% during the period 2006-2010. Also, both types of claims exhibit overdispersion, since their variances exceed their means during this period. Furthermore, the overdispersion for X_{2,t} is even stronger than that of X_{1,t}, which indicates the need to employ an overdispersed distribution for these data. Additionally, the correlation tests for X_{1,t} and X_{2,t} show a positive correlation between the two claim types. At this point it is worth noting that modelling positively correlated claims has been explored in many articles; see, for example, Bermúdez and Karlis (2011), Tzougas and di Cerchiara (2021a, b) and Bermúdez and Karlis (2021). Finally, the proportion of zeros and the kurtosis show that the marginal distributions of X_{1,t} and X_{2,t} are positively skewed and fat-tailed, which indicates the appropriateness of adopting a positively skewed and fat-tailed (GIG) mixing distribution. The description and some summary statistics for all the explanatory variables (covariates z_{1,t}, z_{2,t}) relevant to X_{1,t} and X_{2,t} are shown in Table 4. Variables 1-5, including 'TypeVillage', are categorical variables indicating the entity type of a policyholder. Due to the strongly heavy-tailed structure of variables 6 and 9, which can drastically distort the model fitting, those variables are transformed by means of the 'rank' function in R and then standardized, which mitigates the effect of outliers.
Variables 6-8 are relevant to the IM claims X_{1,t}, while variables 9 and 10 provide information for the CN claims X_{2,t}. The covariate vector z_{1,t} includes variables 1-8, and z_{2,t} contains variables 1-5 and variables 9 and 10. These covariates act as the regression part for λ_{i,t} mentioned in Sect. 2, which may help explain part of the heterogeneity between X_{1,t} and X_{2,t}.
The MMPGIG-INAR(1) with m = 2 is applied to model the joint behaviour of X^{(j)}_{1,t} and X^{(j)}_{2,t} across all policyholders, where R_{i,t} ∼ Po(θ_t λ_{i,t}), i = 1, 2, and the random effect θ_t does not depend on i. Note that when the Gamma mixing density is used, the resulting model is the "BINAR(1) process with BVNB innovations" of Pedeli and Karlis (2011), which we use as the comparison benchmark for the other choices of mixing density. The likelihood function then simply becomes the product over policyholders, where L_j(Θ) is the likelihood function for policyholder j. Note that all policyholders share the same set of parameters p_i, β_i for claim type X_{i,·}. Similarly, the likelihood functions for the other models have the same form as Eq. (6.1) but a different joint innovation distribution. For comparison purposes, we also fit the bivariate mixed Poisson (BMP) regression model, with the training data starting from 2007, because the BMP model does not need to consider lagged responses.
All estimations are implemented in R with the 'optim' function using the 'BFGS' method (a quasi-Newton method). The gradient functions with respect to all the parameters are derived in Sects. 4 and 5 and can be supplied via the gradient argument of 'optim', which significantly decreases the computation time compared to the default numerical gradient.
Model fitting results are shown in Tables 5 and 6. All the results show a great improvement from adopting a time series model compared to the BMP results in Table 6. Focusing on the BINAR results in Table 6, except for the case where the mixing density is GIG with ν = −3/2, there is a significant improvement from introducing a fat-tailed mixing density for θ_t compared to the Gamma case. On the other hand, the improvement from the optimal TINAR to the optimal BINAR (cells in bold face) is obvious, as indicated by the lower AIC and BIC of the BINAR with GIG ν = −3/4 compared to the TINAR with GIG ν = −3/4 and Inverse Gaussian; this implies that there is significant correlation between the two claim sequences. Maximum likelihood estimates for three cases are given in Table 7, together with their standard deviations, which are estimated by inverting the numerical Hessian matrix. From Table 7 we see that the estimates of p_i and β_i are very close to each other, while the estimate of φ differs markedly among the three mixing densities; this is expected, because φ influences the tail and correlation structure of the bivariate sequence X_{1,t}, X_{2,t}. Furthermore, we see that the explanatory variables have a similar effect (positive and/or negative) and are almost identical for both response variables in the case of all three models. Finally, the variables which are statistically significant at the 5% level for X_{1,t} are TypeCounty, TypeMisc, TypeVillage and NoClaimCreditIM, and those which are statistically significant at the 5% level for X_{2,t} are TypeCity, TypeCounty, TypeVillage, CoverageIM and CoverageCN.
Fig. 2 presents the predictions for both types of claims at t = 2011 for 1025 policyholders, based on the geometric median of Eq. (5.5). The predictions for the number of policyholders who make no claims are reasonably good, while the predictions for X_{1,t} are generally underestimated in the tail and those for X_{2,t} are overestimated in the tail. On the other hand, Table 8 shows the prediction sum of squared errors (PSSE) and the frequencies of some basic combinations of observations, namely (0, 0), (1, 0), (0, 1), (1, 1), for the best fitted models within the three classes: bivariate mixed Poisson regression, two independent INAR(1) (TINAR) and bivariate INAR(1) (BINAR). It is again clear that the introduction of the autoregressive part makes sense, as it greatly reduces the prediction error. Although the best TINAR model has the closest frequency of (0, 0), the best BINAR model has the lowest overall prediction error.

Concluding remarks
In this paper we proposed the MMPGIG-INAR(1) regression model for modelling multiple time series of different types of count response variables. The proposed model, which is an extension of the BINAR(1) regression model introduced by Pedeli and Karlis (2011), can accommodate positive correlation and multivariate overdispersion in a flexible manner. In particular, the Generalized Inverse Gaussian class includes many distributions as special and limiting cases that can be used for modelling the innovations R_t. Thus, the proposed modelling framework can efficiently capture the stylized characteristics of a variety of complex data sets. Furthermore, due to the simple form of its density function, statistical inference for the MMPGIG-INAR(1) model is straightforward via the ML method, whereas other models that have been proposed in the literature, such as copula-based models, may suffer from numerical instability during the ML estimation procedure. For demonstration purposes, different members of the proposed family of models were fitted to the LGPIF data from the state of Wisconsin. Finally, it is worth mentioning that a possible line of further research would be to also allow for cross correlation in the thinning, meaning that the non-diagonal elements of P could take positive values.