1 Introduction

Reliability and system engineers often face the difficulty of dealing with the uncertainties present in a system when more than one study variable is of interest. Medical experts face similar situations when the lives of patients are at stake due to the failure of vital organs such as the heart, brain, kidneys, liver, or lungs. The choice of a discrete or continuous and a bivariate or multivariate distribution depends on the nature and number of the study variables. A vast literature exists on the construction of probability distributions; for this, one may refer to [1]. No hard and fast criteria have been established for constructing probability distributions. More details on this issue may be found in [1,2,3].

If bivariate continuous distributions are to be used, one may choose from the parametric distributions suggested in the literature for analyzing bivariate lifetime data [4,5,6,7,8]. As our study concerns the reliability of a system generating bivariate count data, the most suitable distribution to model such datasets is believed to be the bivariate generalized geometric distribution (BGGD) proposed by [9]. Many bivariate distributions for continuous random variables have been introduced in the literature for data analysis, especially in applications to survival data in the presence of censored observations and covariates (see, for example, [10,11,12,13,14,15,16,17,18,19,20]); a recent study is [21]. By contrast, the use of bivariate distributions for discrete survival data is much less common. Some discrete bivariate distributions have been introduced in the literature, such as the bivariate geometric distribution of [4] or that of [22], but these discrete distributions are still not very popular in the analysis of bivariate lifetime data, especially in the presence of censored data and covariates (see also [4, 23,24,25,26,27,28]).

Classical methods are frequently used in such analyses, but they suffer from certain drawbacks. Frequentists consider the parameters to be unknown fixed quantities; they rely on the current data alone and thus deprive the results of any prior information available about the parameters of interest. Bayesians, in contrast, treat the parameters as random quantities and hence assign a probability distribution to them. Bayesian analysis is a modern inferential technique that estimates the model parameters by taking both the current data and the prior information about the parameters into account. The result is a posterior distribution that balances the current data and the prior information. This posterior distribution is the cornerstone and workbench of the Bayesian approach: all subsequent inference about the parameters, typically carried out through numerical procedures, is based on it. A good review of the advantages of using Bayesian methods may be found in [29]. Posterior distributions are often complex multidimensional functions that require the use of Markov chain Monte Carlo (MCMC) methods to draw results [30,31,32,33,34]. Their use is very popular for analyzing bivariate continuous or discrete random variables in the presence of censored data and covariates (see, for example, [29, 35,36,37,38,39,40]). Recently, [41] considered a weighted bivariate geometric distribution and [42] used the q-Weibull distribution in both the classical and Bayesian frameworks. In recent years, MCMC methods have gained much popularity [43, 44, 45].

It has been established that the BGGD is a good choice for modeling and analyzing reliability count data appearing in medicine and engineering. The probability mass function (pmf) of the BGGD is given as

$$f\left(x,y;\alpha ,{\theta }_{1},{\theta }_{2}\right)=\frac{\alpha \,{\theta }_{1}^{x}{\theta }_{2}^{y}\left(1-{\theta }_{1}\right)\left(1-{\theta }_{2}\right)\left(1-{\alpha ^{\prime}}^{2}{\theta }_{1}^{2x+1}{\theta }_{2}^{2y+1}\right)}{\left(1-{\alpha }^{\prime}{\theta }_{1}^{x+1}{\theta }_{2}^{y}\right)\left(1-{\alpha }^{\prime}{\theta }_{1}^{x+1}{\theta }_{2}^{y+1}\right)\left(1-{\alpha }^{\prime}{\theta }_{1}^{x}{\theta }_{2}^{y}\right)\left(1-{\alpha }^{\prime}{\theta }_{1}^{x}{\theta }_{2}^{y+1}\right)},$$
(1)

for \(x=0,1,\dots\), \(y=0,1,\dots\), \(0<{\theta }_{1}<1\), \(0<{\theta }_{2}<1\), \(\alpha >0\), \({\alpha }^{\prime}=1-\alpha\). The cumulative distribution function (CDF) is

$$F\left(x,y\right)=\frac{1-{\theta }_{1}^{x}{\theta }_{2}^{y}}{1-{\alpha }^{\prime}{\theta }_{1}^{x}{\theta }_{2}^{y}},\qquad \alpha >0,\ {\alpha }^{\prime}=1-\alpha .$$

Here \(\alpha\), \({\theta }_{1}\) and \({\theta }_{2}\) are unknown parameters that control the behavior of datasets emerging from the BGGD. Estimating these unknown parameters is the ultimate goal of inferential statistics.
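To make these expressions concrete, the following minimal R sketch (ours, not taken from the paper's code) evaluates the pmf in Eq. (1) and the CDF expression above; the function and argument names are our own choices.

```r
# BGGD pmf of Eq. (1); alpha > 0, 0 < theta1, theta2 < 1, x, y = 0, 1, ...
dbggd <- function(x, y, alpha, theta1, theta2) {
  ap <- 1 - alpha                                   # alpha' = 1 - alpha
  num <- alpha * theta1^x * theta2^y * (1 - theta1) * (1 - theta2) *
    (1 - ap^2 * theta1^(2 * x + 1) * theta2^(2 * y + 1))
  den <- (1 - ap * theta1^(x + 1) * theta2^y) *
    (1 - ap * theta1^(x + 1) * theta2^(y + 1)) *
    (1 - ap * theta1^x * theta2^y) *
    (1 - ap * theta1^x * theta2^(y + 1))
  num / den
}

# CDF expression as stated above
pbggd <- function(x, y, alpha, theta1, theta2) {
  ap <- 1 - alpha
  (1 - theta1^x * theta2^y) / (1 - ap * theta1^x * theta2^y)
}

# Example: probability of the pair (x = 2, y = 1) at arbitrary trial values
dbggd(2, 1, alpha = 2, theta1 = 0.7, theta2 = 0.6)
```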

Due to the variety of applications of the BGGD, the efficient estimation of the pmf and the CDF of the BGGD is the purpose of the present study. The authors of [9] have recently worked out the classical maximum likelihood estimators for the BGGD. To avail the aforesaid advantages, we estimate the parameters of the BGGD in the Bayesian framework. We use MCMC methods to draw results and apply different model selection criteria to compare the methods under consideration, namely ML, AIC, AICC, BIC (also known as the Schwarz criterion), and HQC.

2 The Frequentist Approach to Statistical Analysis

In statistical terminology, the data-generating pattern of any system or model depends on system-specific characteristics, called parameters. The data generated from the model is therefore believed to reflect the values of the parameters that caused the system to generate that dataset. The uncertainty associated with the data values is defined in terms of the frequencies with which those values emerge again and again from the system under study. The objective of the analysis is to infer the characteristics of the system or model from relevant, randomly collected data. The frequentist approach is considered the default approach in a variety of areas of science. Commonly used frequentist methods of statistical inference include uniformly minimum variance unbiased estimation, maximum likelihood (ML) estimation, percentile estimation, least squares estimation, weighted least squares estimation, etc. Here we report only the most commonly used ML estimation method, whose results will be compared with their Bayesian counterparts.

2.1 Maximum Likelihood (ML) Estimation

The likelihood function gives the probability that the model, system or distribution under study would have generated the observed sample. The frequentist method of maximum likelihood estimation, proposed by [46], calls for choosing those values of the parameters that maximize the probability of the observed sample. One generally opts for algebraic maximization of the likelihood function to find the ML estimates, but one may also evaluate the probability of the observed sample at many possible values of the parameters and choose as estimates the parametric values that maximize this evaluated probability.

Algebraically, ML estimation may proceed as follows. Consider a random sample of \(n\) correlated bivariate pairs \(\left({x}_{i},{y}_{i}\right)\), \(i=1,2,\dots ,n\), from the BGGD \(f\left(x,y;\alpha ,{\theta }_{1},{\theta }_{2}\right)\) given in (1). The log-likelihood function \(l\left(x,y;\alpha ,{\theta }_{1},{\theta }_{2}\right)\) may be written as

$$\begin{aligned}l\left(x,y;\alpha ,{\theta }_{1},{\theta }_{2}\right)=&\,n\log \left(\alpha \right)+n\overline{x}\log \left({\theta }_{1}\right)+n\overline{y}\log \left({\theta }_{2}\right)+n\log \left(1-{\theta }_{1}\right)+n\log \left(1-{\theta }_{2}\right)\\&+\sum _{i=1}^{n}\log \left(1-{\alpha ^{\prime}}^{2}{\theta }_{1}^{2{x}_{i}+1}{\theta }_{2}^{2{y}_{i}+1}\right)-\sum _{i=1}^{n}\log \left(1-{\alpha }^{\prime}{\theta }_{1}^{{x}_{i}+1}{\theta }_{2}^{{y}_{i}}\right)-\sum _{i=1}^{n}\log \left(1-{\alpha }^{\prime}{\theta }_{1}^{{x}_{i}+1}{\theta }_{2}^{{y}_{i}+1}\right)\\&-\sum _{i=1}^{n}\log \left(1-{\alpha }^{\prime}{\theta }_{1}^{{x}_{i}}{\theta }_{2}^{{y}_{i}}\right)-\sum _{i=1}^{n}\log \left(1-{\alpha }^{\prime}{\theta }_{1}^{{x}_{i}}{\theta }_{2}^{{y}_{i}+1}\right).\end{aligned}$$

Equating to zero the first partial derivatives of the log-likelihood function \(l\left(x,y;\alpha ,{\theta }_{1},{\theta }_{2}\right)\) with respect to the unknown parameters \(\alpha ,{\theta }_{1},{\theta }_{2}\) yields the normal equations, which may be solved simultaneously to obtain the ML estimates. However, if the normal equations are too complex to be solved simultaneously, we resort to numerical methods or to direct maximization of the log-likelihood function.
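As a hedged illustration of this numerical route, the R sketch below minimises the negative log-likelihood with optim(). It assumes the 100 \((x,y)\) pairs of Appendix A1 are available as numeric vectors x and y and reuses the dbggd() function sketched after Eq. (1); it is not the paper's own code.

```r
# Negative log-likelihood of the BGGD for paired counts x, y
negloglik <- function(par, x, y) {
  alpha <- par[1]; theta1 <- par[2]; theta2 <- par[3]
  p <- dbggd(x, y, alpha, theta1, theta2)
  if (any(!is.finite(p)) || any(p <= 0)) return(1e10)  # guard invalid regions
  -sum(log(p))
}

# L-BFGS-B keeps alpha > 0 and 0 < theta1, theta2 < 1 during the search
fit <- optim(par = c(1, 0.5, 0.5), fn = negloglik, x = x, y = y,
             method = "L-BFGS-B",
             lower = c(1e-6, 1e-6, 1e-6), upper = c(Inf, 1 - 1e-6, 1 - 1e-6),
             hessian = TRUE)
fit$par                          # ML estimates of (alpha, theta1, theta2)
sqrt(diag(solve(fit$hessian)))   # approximate standard errors from the Hessian
```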

2.2 ML Estimation–Algebraic Approach

Let us consider the real dataset for the BGGD given in Appendix A1, Table 8, which appears in [47,48,49], where X represents the counts of surface faults and Y the counts of interior faults in 100 lenses. A summary of the data is presented in Table 1, along with its graphical representation in Fig. 1.

Table 1 Summary of the observed data
Fig. 1 Graphs of the BGGD at the observed data points

The observed data is evidently positively skewed and negatively correlated. As stated in Sect. 2.1, the normal equations obtained from the observed dataset are complicated, and hence the estimates are found using numerical methods. Following [9], the ML estimates, standard errors (SE) and 95% confidence intervals for the parameters are reported in Table 2.
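A quick numerical check of this claim can be sketched as follows, again assuming the Appendix A1 counts are loaded as vectors x and y.

```r
# Moment-based sample skewness and the correlation between X and Y
skew <- function(v) {
  m <- mean(v); n <- length(v)
  (sum((v - m)^3) / n) / (sum((v - m)^2) / n)^(3 / 2)
}
c(skew_x = skew(x), skew_y = skew(y), corr_xy = cor(x, y))
```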

Table 2 ML estimates, standard error and 95% confidence intervals

2.3 ML Estimation–Graphical Approach

As already discussed in Sect. 2.1, we may opt for direct maximization of the likelihood function to find the ML estimates. The ML estimation theory suggested by [46] calls for choosing those values of the parameters that maximize the probability of the observed sample, and these values are regarded as the ML estimates. This technique is used here by evaluating the likelihood of the observed BGGD dataset at different parametric values. The resulting plots, generated in R, are displayed in Fig. 2. The highest probability is obtained at \(\alpha =2.288\), \({\theta }_{1}=0.676\), \({\theta }_{2}=0.652\) (the first of the plots in Fig. 2); hence these values may be regarded as the ML estimates.

Fig. 2 BGGD at the observed data points for different parametric values, including the ML estimates \(\alpha =2.288\), \({\theta }_{1}=0.676\), \({\theta }_{2}=0.652\) (first cell)

The graph in the first cell corresponds to the one produced using the ML estimates. The maximum of the likelihood is found to be 3.2619E-192 at the ML estimates \(\alpha =2.288\), \({\theta }_{1}=0.676\), \({\theta }_{2}=0.652\). It is also interesting to note that the highest ordinate, 0.02822, occurs at the data pair (\(x=2,y=1\)) at the ML estimates, and the value of the log-likelihood is found to be -432.957.
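The same idea can be scripted rather than read off plots. The sketch below (ours, not the paper's R code) evaluates the log-likelihood on a coarse parameter grid and reports the best triple; it assumes x, y and dbggd() from the earlier sketches, and the grid limits are arbitrary choices around the reported estimates.

```r
# Grid evaluation of the BGGD log-likelihood (the "graphical" route in code)
grid <- expand.grid(alpha  = seq(1.5, 3.0, by = 0.05),
                    theta1 = seq(0.55, 0.80, by = 0.01),
                    theta2 = seq(0.55, 0.80, by = 0.01))
grid$loglik <- apply(grid, 1, function(p)
  sum(log(dbggd(x, y, p["alpha"], p["theta1"], p["theta2"]))))
grid[which.max(grid$loglik), ]   # parameter triple with the highest likelihood
```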

3 The Bayesian Approach to Statistical Analysis

A brief overview of this topic was already given in Sect. 1. The Bayesian method combines prior information about the model parameters with the dataset using Bayes' rule, yielding the posterior distribution. Bayes' rule is named after Thomas Bayes, whose work on this topic was published in 1763, two years after his death [50]. To set up Bayesian inference, we need a model or system in the form of a probability distribution controlled by a set of parameters to be estimated, the sample dataset generated by that model or distribution, and a prior distribution based on the prior knowledge of experts regarding the parameters of interest. These elements are formally combined into the posterior distribution, which is regarded as the key distribution and workbench for the subsequent analyses. The algorithm is explained in [42].

If \(f(D|\theta )\) is the data distribution depending upon the vector of parameters \(\theta\), \(p(\theta )\) is the joint prior distribution of \(\theta\), and \(L(D|\theta )\) is the likelihood function defining the joint probability of the observed sample data \(D\) conditional upon the parameter vector \(\theta\), then the posterior distribution of \(\theta\) conditional upon the data \(D\), denoted by \(f\left(\theta |D\right)\), is given by

$$f\left(\theta |D\right)=\frac{p(\theta )L(D|\theta )}{{\int }_{\theta }p(\theta )L(D|\theta )d\theta }.$$

The denominator is also termed the predictive distribution of the data and is usually treated as the normalizing constant that makes the posterior a proper density. It may be omitted when evaluating the Bayes estimates but must be retained when comparing models. The posterior distribution balances the information provided by the data and the prior distribution. It is of supreme interest to the Bayesians but often has a very complex and complicated form and hence requires numerical methods for its evaluation.

3.1 The Prior Distributions

It has already been highlighted that the main difference between the frequentist and Bayesian approaches is the incorporation of prior information about the model parameters into the analysis. The formal way of doing so is to quantify the initial knowledge of experts in the form of a prior distribution that adequately fits the nature of the parameter and the experts' opinion. The parameters of the prior distribution, known as hyperparameters, are elicited in the light of subjective expert opinion. So the prior is helpful if it is selected and elicited carefully and adequately; otherwise it may be misleading.

3.1.1 Uninformative Priors

In situations where there is a lack of knowledge about the model parameters, we choose vague, diffuse or flat priors. As the BGGD is based on the set of parameters \(\alpha ,{\theta }_{1}, {\theta }_{2}\), we assume uninformative uniform priors for the parameters as follows:

$$\alpha \sim \mathrm{uniform}\left({h}_{1},{h}_{2}\right), { \theta }_{1}\sim \mathrm{uniform}\left({h}_{3},{h}_{4}\right), { \theta }_{2}\sim \mathrm{uniform}\left({h}_{5},{h}_{6}\right),$$

where \(\alpha >0\), \(0<{\theta }_{1}<1\), \(0<{\theta }_{2}<1\), and \({h}_{i}>0\), for \(i=1,2,\dots ,6\), are the hyperparameters associated with the priors.

3.1.2 Informative Priors

When there is sufficient information available about the model parameters, we assign informative priors to them in such a way that the priors adequately represent the knowledge available about the parameters being examined. In the present situation, we assign an exponential distribution to \(\alpha\) and independent beta distributions to \({\theta }_{1}\) and \({\theta }_{2}\), given as

$$\left.\begin{array}{l}p\left(\alpha \right)={h}_{7}{e}^{-{h}_{7}\alpha },\quad \alpha >0,\\ p\left({\theta }_{1}\right)=\frac{\Gamma \left({h}_{8}+{h}_{9}\right)}{\Gamma \left({h}_{8}\right)\Gamma \left({h}_{9}\right)}{\theta }_{1}^{{h}_{8}-1}{\left(1-{\theta }_{1}\right)}^{{h}_{9}-1},\quad 0<{\theta }_{1}<1, \\ p\left({\theta }_{2}\right)=\frac{\Gamma \left({h}_{10}+{h}_{11}\right)}{\Gamma \left({h}_{10}\right)\Gamma \left({h}_{11}\right)}{\theta }_{2}^{{h}_{10}-1}{\left(1-{\theta }_{2}\right)}^{{h}_{11}-1},\quad 0<{\theta }_{2}<1.\end{array}\right\}$$

Here again \({h}_{i}>0\), for \(i=7,8,\dots ,11\), are the hyperparameters associated with the priors, which should be elicited in the light of expert opinion. Note that the elicitation of hyperparameters is beyond the scope of this study, so we merely choose values of the hyperparameters to be used in the subsequent Bayesian analysis.
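For illustration, this joint informative prior can be evaluated in R as follows; the default hyperparameter values are purely illustrative placeholders, not elicited ones.

```r
# Log of the joint informative prior: exponential on alpha, betas on theta1, theta2
log_prior <- function(alpha, theta1, theta2,
                      h7 = 0.5, h8 = 2, h9 = 2, h10 = 2, h11 = 2) {
  dexp(alpha, rate = h7, log = TRUE) +
    dbeta(theta1, h8, h9, log = TRUE) +
    dbeta(theta2, h10, h11, log = TRUE)
}
log_prior(2.3, 0.68, 0.65)   # example evaluation at arbitrary parameter values
```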

3.2 The Posterior Distribution

Turning to the estimation of the parameters of the BGGD, let the vector of parameters of interest and the data be denoted by \(\theta =(\alpha ,{\theta }_{1}, {\theta }_{2})\) and \(D=(X,Y)\), respectively. The data distribution is denoted by \(f(D|\theta )\) and the prior distribution by \(p(\theta )\). Then, using Bayes' rule, the posterior distribution, denoted by \(f(\theta |D)\), may be written as

$$f\left(\theta |D\right)=\frac{p\left(\alpha \right)p\left({\theta }_{1}\right)p\left({\theta }_{2}\right)L\left(D|\theta \right)}{{\int }_{\alpha =0}^{\infty }{\int }_{{\theta }_{1}=0}^{1}{\int }_{{\theta }_{2}=0}^{1}p\left(\alpha \right)p\left({\theta }_{1}\right)p\left({\theta }_{2}\right)L\left(D|\theta \right)d\alpha \,d{\theta }_{1}\,d{\theta }_{2}},$$

where all the notations are as already defined. The posterior distribution \(f\left(\theta |D\right)\) may also be written as a kernel density in proportional form as

$$f\left(\theta |D\right)\propto p\left(\alpha \right)p\left({\theta }_{1}\right) p\left({\theta }_{2}\right) L\left(D|\theta \right).$$

The marginal posterior distributions \(g\left(\alpha |D\right)\), \(g({\theta }_{1}|D)\) and \(g({\theta }_{2}|D)\) of the parameters \(\alpha\), \({\theta }_{1}\) and \({\theta }_{2}\) may be found by integrating out the nuisance parameters from the posterior distribution \(f\left(\alpha ,{\theta }_{1}, {\theta }_{2}|D\right)\) as follows:

$$g\left(\alpha |D\right)={\int }_{{\theta }_{1}=0}^{1}{\int }_{{\theta }_{2}=0}^{1}f\left(\theta |D\right) d{\theta }_{1}\,d{\theta }_{2},$$
$$g\left({\theta }_{1}|D\right)={\int }_{\alpha =0}^{\infty }{\int }_{{\theta }_{2}=0}^{1}f\left(\theta |D\right)d\alpha \,d{\theta }_{2},$$

and

$$g\left({\theta }_{2}|D\right)={\int }_{\alpha =0}^{\infty }{\int }_{{\theta }_{1}=0}^{1}f\left(\theta |D\right)d\alpha \,d{\theta }_{1}.$$

3.3 Bayes Estimates

To work out the Bayes estimates of the parameters \((\alpha ,{\theta }_{1}, {\theta }_{2})\), we need to specify a loss function. A variety of loss functions can be used to derive Bayes estimates. Under the well-known squared error loss function, the Bayes estimates \(\widehat{\alpha }\), \({\widehat{\theta }}_{1}\) and \({\widehat{\theta }}_{2}\) are the means of the corresponding marginal posterior distributions and are evaluated as

$$\begin{array}{l}\widehat{\alpha }=E\left(\alpha |D\right)={\int }_{0}^{\infty }\alpha .g\left(\alpha |D\right)d\alpha ,\\ {\widehat{\theta }}_{1}=E\left({\theta }_{1}|D\right)={\int }_{0}^{1}{\theta }_{1}.g\left({\theta }_{1}|D\right)d{\theta }_{1},\\ {\widehat{\theta }}_{2}=E\left({\theta }_{2}|D\right)={\int }_{0}^{1}{\theta }_{2}.g\left({\theta }_{2}|D\right)d{\theta }_{2}.\end{array}$$

The marginal posterior distributions are generally of complicated and complex forms and hence need numerical methods for their evaluation. Markov chain Monte Carlo (MCMC) is the most frequently used numerical method in Bayesian inference. We therefore proceed with MCMC, implemented in the WinBUGS package, to find the posterior summaries of the parameters of interest.
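The paper carries out this step in WinBUGS. Purely as a transparent stand-in, the following random-walk Metropolis sketch in R targets the same posterior and returns posterior means (the Bayes estimates under squared error loss) and credible intervals. It assumes x, y, dbggd() and log_prior() from the earlier sketches; the step sizes and iteration counts are ad hoc choices.

```r
# Log-posterior (kernel): log-likelihood + log-prior, on the log scale
log_post <- function(par) {
  alpha <- par[1]; theta1 <- par[2]; theta2 <- par[3]
  if (alpha <= 0 || theta1 <= 0 || theta1 >= 1 || theta2 <= 0 || theta2 >= 1)
    return(-Inf)
  p <- dbggd(x, y, alpha, theta1, theta2)
  if (any(!is.finite(p)) || any(p <= 0)) return(-Inf)
  sum(log(p)) + log_prior(alpha, theta1, theta2)
}

set.seed(1)
n_iter <- 15000; burn_in <- 5000
draws <- matrix(NA_real_, n_iter, 3,
                dimnames = list(NULL, c("alpha", "theta1", "theta2")))
cur <- c(2, 0.6, 0.6); cur_lp <- log_post(cur)
step <- c(0.15, 0.02, 0.02)            # proposal scales, tuned by trial and error
for (t in seq_len(n_iter)) {
  prop <- cur + rnorm(3, 0, step)      # random-walk proposal
  prop_lp <- log_post(prop)
  if (log(runif(1)) < prop_lp - cur_lp) { cur <- prop; cur_lp <- prop_lp }
  draws[t, ] <- cur
}
post <- draws[-seq_len(burn_in), ]
colMeans(post)                              # Bayes estimates under squared error loss
apply(post, 2, quantile, c(0.025, 0.975))   # 95% credible intervals
```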

3.4 The MCMC Method

The MCMC method draws random samples from a probability distribution via a random process termed a Markov chain, in which each new state depends only on the current state and, given it, is independent of all earlier states. MCMC methods can be implemented in any of the standard software environments such as R or Python, but the most specific software for Bayesian analysis is the Windows-based Bayesian inference Using Gibbs Sampling (WinBUGS) package. We have implemented WinBUGS using the following scheme.

  1. Define the model based on the probability mass function (1) of the BGGD and click the Check Model menu in the WinBUGS software.

  2. Load the data given in Appendix A1.

  3. Specify the nodes and run the chain for 10,000 iterations following a burn-in of 5000 iterations.

WinBUGS codes used to analyze the data are given in Appendix A2.
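One convenient way to drive such a WinBUGS model from R is the R2WinBUGS package; the sketch below is a hedged illustration only. The file name bggd_model.txt, the data list and the node names are our assumptions standing in for the actual Appendix A2 code (where the non-standard BGGD likelihood is typically coded via the zeros/ones trick).

```r
library(R2WinBUGS)

bugs_data  <- list(x = x, y = y, n = length(x))
bugs_inits <- function() list(alpha = 2, theta1 = 0.5, theta2 = 0.5)

# Run the (assumed) Appendix A2 model: 10,000 kept iterations after 5000 burn-in
fit_bugs <- bugs(data = bugs_data, inits = bugs_inits,
                 parameters.to.save = c("alpha", "theta1", "theta2"),
                 model.file = "bggd_model.txt",
                 n.chains = 1, n.iter = 15000, n.burnin = 5000)
print(fit_bugs)   # posterior means, standard deviations and quantiles
```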

3.5 Bayesian Results Under Uniform Non-informative Priors

Here we have assumed uniform priors for all the parameters under study, as defined in Sect. 3.1.1, and the resulting Bayes estimates, standard errors, medians and 95% highest density regions, along with the values of the hyperparameters, are presented in Table 3.

Table 3 Summary of Bayes estimates under the uninformative priors

It is observed that the chosen hyperparameters have a high impact on the Bayes estimates. The initial values have no significant effect on the posterior estimates if true convergence is achieved.

3.6 Convergence Diagnostics

Sequential plots are used in WinBUGS to assess difficulties with the MCMC run and the realization of the model. In MCMC simulations, the values of the parameters of interest are sampled from the posterior distributions. The estimates are therefore reliable only if the chains have reached their stationary distributions and the Markov chain appears to be mixing well. To check convergence, different graphical representations of the parameters' behavior are used in MCMC as implemented through WinBUGS.

3.6.1 History Time Series Plots

The time series history plots of the parameters are presented in Fig. 3. Here the Markov chain seems to be mixing well and appears to be sampling from the stationary distribution. The plots take the form of horizontal bands with no long upward or downward trends, indicating that the Markov chain has converged.

Fig. 3 Time series plots of the parameters \(\alpha ,{\theta }_{1}, {\theta }_{2}\)

3.6.2 Dynamic Traces and Autocorrelation Function

The dynamic traces and autocorrelation graphs of \(\alpha ,{\theta }_{1}\) and \({\theta }_{2}\) are presented in Fig. 4. These graphs also confirm convergence.
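Comparable diagnostics can be reproduced outside WinBUGS with the coda package; the sketch below applies them to the draws from the earlier Metropolis sketch (the object post), purely for illustration.

```r
library(coda)

chain <- as.mcmc(post)   # convert the matrix of retained draws to an mcmc object
traceplot(chain)         # history plots: horizontal bands suggest convergence
autocorr.plot(chain)     # autocorrelation should die out quickly for good mixing
effectiveSize(chain)     # effective sample size per parameter
```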

Fig. 4 Dynamic traces and autocorrelation functions of the parameters

3.7 Bayes Estimates Using Informative Exponential-Beta Priors

The ideal characteristic of Bayesian analysis is that it can accommodate in the analysis the prior information shared by field experts about the unknown parameters. It is important to note that these experts may not be experts in statistics and hence cannot translate their expertise into statistical terms. It is therefore the responsibility of the statistician to formally use the experts' prior information to elicit the values of the hyperparameters of the prior density, which are subsequently used in the Bayesian analysis. Elicitation of hyperparameters is beyond the scope of our study; an exhaustive discussion of the elicitation of hyperparameters may be found in [51]. We have chosen widely varying values of the hyperparameters, and a summary of the Bayes estimates for all of these values is presented in Table 4.

Table 4 Summary of Bayes estimates under the informative priors

It is observed that the chosen hyperparameters have a high impact on the Bayes estimates. The initial values have no significant effect on the posterior estimates once convergence of the Markov chain is achieved, although a change in the initial values causes a slight change in the parameter estimates.

3.8 Possible Predictive Inference

Having found the Bayes estimates, it is worthwhile to evaluate them through predictive inference. Since the observed data contains 100 observations, a predicted sample of 100 observations is generated from the Bayes estimates obtained through the MCMC-based analysis. The predicted data and their summaries are presented in Tables 5 and 6.
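A rough way to generate such a predicted sample is sketched below: the BGGD pmf of Eq. (1) is evaluated on a truncated support grid at the Bayes estimates and 100 pairs are resampled from it. It assumes dbggd() from the earlier sketch, and the estimates shown are illustrative placeholders rather than the values reported in Table 4.

```r
est <- c(alpha = 2.3, theta1 = 0.68, theta2 = 0.65)   # illustrative values only

support <- expand.grid(x = 0:60, y = 0:60)            # truncated (x, y) support
prob <- dbggd(support$x, support$y, est["alpha"], est["theta1"], est["theta2"])
prob <- prob / sum(prob)                              # renormalise the truncation

set.seed(2)
idx  <- sample(nrow(support), size = 100, replace = TRUE, prob = prob)
pred <- support[idx, ]
summary(pred); cor(pred$x, pred$y)                    # compare with Table 6
```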

Table 5 The predicted expected data of counts of surface (X) and interior (Y) faults in 100 lenses
Table 6 Summary of the predicted dataset based on Bayes estimates

Evidently, there are some differences between the predicted summaries and those of the original observed dataset. These changes may be attributed to the Bayes estimates, which are evaluated after accommodating the prior information about the model parameters via the hyperparameters.

4 Comparison of the Frequentist and Bayesian Approaches

An important aspect of this study is to compare the Bayesian estimation method with the classical ML estimation method. We accomplish this using the different model selection criteria presented below.

4.1 Model Selection Criteria

The classical and Bayesian methods of estimation are compared using the model selection criteria ML, AIC, AICC, BIC, and HQC, which are defined by

$$ML = -2 ln\left[L\left(\alpha ,{\theta }_{1},{\theta }_{2}\right)\right]$$
$$AIC = -2 ln\left[L\left(\alpha ,{\theta }_{1},{\theta }_{2}\right)\right]+2k$$
$$AICC = -2\, ln[L\left(\alpha ,{\theta }_{1},{\theta }_{2}\right)] + \frac{2kn}{n - k - 1}$$
$$BIC = -2 ln[L\left(\alpha ,{\theta }_{1},{\theta }_{2}\right)] + k ln(n)$$

and

$$HQC =-2 ln\left[L\left(\alpha ,{\theta }_{1},{\theta }_{2}\right)\right]+ 2k ln\left[ln\left(n\right)\right].$$

Here \(ln[L\left(\alpha ,{\theta }_{1},{\theta }_{2}\right)]\) denotes the log-likelihood, \(n\) the number of observations and \(k\) the number of parameters of the distribution under consideration. The smaller the values of these criteria, the better the fit. For more discussion of these criteria, see [52, 53]. The maximum likelihood estimates, uninformative Bayes estimates and informative Bayes estimates, along with their associated values of the model selection criteria, are reported in Table 7.
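For reference, the criteria above translate directly into code; the small helper below (ours) computes all five from a log-likelihood value, the sample size n and the parameter count k (k = 3 for the BGGD).

```r
model_criteria <- function(loglik, n, k) {
  c(ML   = -2 * loglik,
    AIC  = -2 * loglik + 2 * k,
    AICC = -2 * loglik + 2 * k * n / (n - k - 1),
    BIC  = -2 * loglik + k * log(n),
    HQC  = -2 * loglik + 2 * k * log(log(n)))
}
model_criteria(loglik = -432.957, n = 100, k = 3)  # e.g., at the log-likelihood of Sect. 2.3
```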

Table 7 Model selection criteria for classical and Bayes estimation methods for the data of the example

Here we see that the values of the model selection criteria produced by the Bayesian method are smaller than those produced by the ML estimation method, which declares the Bayesian method more appropriate. This is due to the distinctive characteristic of Bayesian methods of incorporating prior information about the model parameters. However, it is pertinent to note that these results are sensitive to the selected values of the hyperparameters; hence careful elicitation of the hyperparameters is an earnest need when using Bayesian methods. Carefully selected or elicited values of the hyperparameters may lead to even better estimates.

5 Summary and Conclusions

The bivariate generalized geometric distribution is believed to model reliability count datasets emerging from diverse phenomena. To understand the data-generating phenomenon, it is necessary to estimate the model parameters of the BGGD. To accomplish this, statistical theory offers two competing approaches, namely the frequentist and the Bayesian. The former is based on the current data only, whereas the latter utilizes prior information in addition to the current dataset produced by the system or phenomenon. This study offers a comparison between the frequentist and Bayesian estimation approaches. To elaborate the frequentist approach, different descriptive measures and the maximum likelihood estimates are evaluated. The Bayesian estimation approach is illustrated using both uninformative and informative priors. We have worked out the posterior summaries of the parameters, comprising posterior means, standard errors, medians, credible intervals and predictions, for both types of priors using the MCMC simulation technique. A correlated bivariate count dataset on counts of surface and interior faults is used for illustration. The two estimation methods are compared using different model selection criteria. All the criteria considered, namely ML, AIC, AICC, BIC, and HQC, indicate that the Bayesian approach outperforms the competing ML approach across the board. It has also been observed that the results may coincide if the information contained in the prior distribution and in the dataset agree, and that improved prior information may improve the results. As a future study, it is recommended that the Bayesian analysis of such datasets be carried out using formally elicited hyperparameters of the priors instead of values chosen by the experimenter.