1 Introduction

The smoothed Dirichlet distribution has the same parametric form as the Dirichlet distribution but forces neighboring cells to be closer to each other through an added penalty function. Currently, many researchers force neighboring cells to be closer to each other by choosing suitable parameter values (\(\varvec{\alpha }\)) for the Dirichlet distribution. A distribution designed for this purpose has been lacking, and our proposed distribution fills that gap, allowing for efficient modeling of such data.

A new parametric family of distributions was introduced and developed with some extensions to the Dirichlet distribution to suit different purposes. [11] introduced the generalized Dirichlet distribution, which has a more general covariance structure than the Dirichlet distribution. The grouped Dirichlet distribution (GDD), first described by [5], is a multivariate generalization of the Dirichlet distribution used to model incomplete categorical data. [6] likewise introduced the nested Dirichlet distribution to model incomplete categorical data. The spherical-Dirichlet distribution, introduced by [1], is obtained by transforming the Dirichlet distribution on the simplex to the corresponding space on the hyper-sphere.

[4] proposed a different version of the smoothed Dirichlet distribution in the context of smoothed language model representations of documents. That smoothed Dirichlet distribution is the same as the Dirichlet distribution except that its domain is restricted to include only smoothed language models. [3] used the above-mentioned smoothed Dirichlet distribution for image categorization on a smoothed simplex. The above-proposed smoothed Dirichlet distribution uses a smoothed domain and focuses mainly on multimedia data, whereas our proposed smoothed Dirichlet distribution can be applied to data from any context. [7] proposed the modified Dirichlet distribution, which simultaneously performs smoothing by allowing the parameters (\(\varvec{\alpha }\)) to be negative.

2 Basic Properties

In this section, we introduce the proposed smoothed Dirichlet distribution and its basic properties.

2.1 Probability Density Function

The smoothed Dirichlet distribution was suggested by [2] as a variation of the Dirichlet distribution that forces successive cell categories to be closer to each other. Let \({\textbf{x}}=(x_{1}, x_{2}, \ldots , x_{K})^{t}\) be a vector with K components where \(x_{j} \ge 0\) for \(j=1, 2, \ldots , K\) and \(\sum _{j=1}^K x_{j} = 1\). Also, let \(\varvec{\alpha } = (\alpha _1, \alpha _2, \ldots , \alpha _K)^{t}\), where \(\alpha _j >0\) for each j, and let \(\delta >0\). The smoothed Dirichlet (SD) probability density function is

$$\begin{aligned} f({\textbf{x}}|\varvec{\alpha }, \delta ) = C(\varvec{\alpha }, \delta ) \prod _{j=1}^K x_{j}^{\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{x}})), \end{aligned}$$

where \(\text {exp}(-\delta \Delta ({\textbf{x}}))\) is a penalty function that forces successive \(x_{j}\)’s to be close to each other with higher probability than under the standard Dirichlet distribution satisfying \(\sum _{j=1}^K x_{j} = 1\). The parameter \(\delta\) dictates the extent to which the neighboring \(x_{j}\)’s are forced to be close. For instance, a large value of \(\delta\) forces realizations of \({\textbf{x}}\) to have small values of \(\Delta ({\textbf{x}})\). The constant \(C(\varvec{\alpha }, \delta )\) can be written as

$$\begin{aligned} C(\varvec{\alpha }, \delta )&= \dfrac{D(\varvec{\alpha })}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]} \text { with } {\textbf{x}}\sim \text {Dirichlet}(\varvec{\alpha }) \text { and } D(\varvec{\alpha }) = \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)}. \end{aligned}$$
(1)

From now on, we write \({\textbf{X}} = (X_1, X_2, \ldots , X_K)^{t}\) \(\sim \text {SDD} (\varvec{\alpha }, \delta , \Delta )\) for the smoothed Dirichlet distribution (SDD). Here \(\Delta = \Delta ({\textbf{X}})\), and its importance is discussed later.
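To make (1) concrete, below is a minimal Monte Carlo sketch (not the authors' code) of the normalizing constant: draws from Dirichlet(\(\varvec{\alpha }\)) approximate \(E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]\). It assumes the squared-difference penalty \(\Delta ({\textbf{x}}) = \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\) introduced in Sect. 2.4, and all function names are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def penalty(X):
    """Delta(x) = sum_j (x_{j+1} - x_j)^2, applied row-wise to an (M, K) array."""
    return np.sum(np.diff(X, axis=1) ** 2, axis=1)

def log_norm_const(alpha, delta, M=100_000, seed=0):
    """log C(alpha, delta) = log D(alpha) - log E_alpha[exp(-delta Delta(X))], per (1)."""
    rng = np.random.default_rng(seed)
    X = rng.dirichlet(alpha, size=M)                # Monte Carlo draws from Dirichlet(alpha)
    log_D = gammaln(np.sum(alpha)) - np.sum(gammaln(alpha))
    w = -delta * penalty(X)
    log_E = np.logaddexp.reduce(w) - np.log(M)      # log-mean-exp, numerically stable
    return log_D - log_E

alpha = np.array([5.0, 5.0, 5.0])
print(log_norm_const(alpha, delta=10.0))
```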

2.2 Moments

In this sub-section, we compute the first, second, and \(n{\text {th}}\)-order moments, variance, and covariance of the smoothed Dirichlet distribution. First, the mean of the smoothed Dirichlet distribution is

$$\begin{aligned} E(X_i)&= C(\varvec{\alpha }, \delta ) \int \ldots \int x_{i}^{\alpha _i} \prod _{j \ne i} x_{j}^{\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K. \end{aligned}$$

Let \(\varvec{\gamma }_i = (\gamma _{i1}, \gamma _{i2}, \ldots , \gamma _{iK})^{t}\), where \(\gamma _{ii} = 1\) and \(\gamma _{il} = 0\) for all \(l \ne i\). Then

$$\begin{aligned} E(X_i)&= \dfrac{C(\varvec{\alpha }, \delta )}{C(\varvec{\alpha +\gamma _{i}}, \delta )} \int \ldots \int C(\varvec{\alpha +\gamma _{i}}, \delta ) \prod _{j=1}^K x_{j}^{\alpha _{j} + \gamma _{ij} -1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K,\\&= \dfrac{C(\varvec{\alpha }, \delta )}{C(\varvec{\alpha +\gamma _{i}}, \delta )}, \end{aligned}$$

where \(C(\varvec{\alpha +\gamma _{i}}, \delta ) = \dfrac{1}{\int \ldots \int \prod _{j=1}^K x_{j}^{\alpha _{j} + \gamma _{ij} - 1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K}\).

Using (1),

$$\begin{aligned} E(X_i)&= \dfrac{\dfrac{D(\varvec{\alpha })}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}}{\dfrac{D(\varvec{\alpha +\gamma _{i}})}{E_{\varvec{\alpha +\gamma _{i}}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}}, \\&= \dfrac{D(\varvec{\alpha })}{D(\varvec{\alpha +\gamma _{i}})} \times \dfrac{E_{\varvec{\alpha +\gamma _{i}}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$

Here

$$\begin{aligned} D(\varvec{\alpha +\gamma _{i}})&= \dfrac{\Gamma (\sum _{j=1}^K (\alpha _{j} + \gamma _{ij}))}{\prod _{j=1}^K \Gamma (\alpha _{j} + \gamma _{ij})},\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j}+1)}{\prod _{j \ne i} \Gamma (\alpha _{j})\Gamma (\alpha _{i}+1)},\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})\sum _{j=1}^K \alpha _{j}}{\prod _{j \ne i} \Gamma (\alpha _{j})\Gamma (\alpha _{i})\alpha _{i}},\nonumber \\&= \dfrac{\sum _{j=1}^K \alpha _{j}}{\alpha _{i}} \times \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\prod _{j=1}^K \Gamma (\alpha _{j})},\nonumber \\&= \dfrac{\sum _{j=1}^K \alpha _{j}}{\alpha _{i}} D(\varvec{\alpha }). \end{aligned}$$
(2)

Substituting (2),

$$\begin{aligned} E(X_i)&= \dfrac{\alpha _{i}}{\sum _{j=1}^K \alpha _{j}} \times \dfrac{E_{\varvec{\alpha +\gamma _{i}}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$
(3)

Similarly, we compute the expected value for \(X_{i}^n\) as

$$\begin{aligned} E(X_{i}^n)&= \int \ldots \int x_{i}^n f({\textbf{x}}|\varvec{\alpha }, \delta ) dx_1 dx_2 \ldots dx_K,\\&= \int \ldots \int x_{i}^n C(\varvec{\alpha }, \delta ) \prod _{j=1}^K x_{j}^{\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K,\\&= C(\varvec{\alpha }, \delta ) \int \ldots \int x_{i}^{\alpha _i+n-1} \prod _{j \ne i} x_{j}^{\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K. \end{aligned}$$

Then

$$\begin{aligned} E(X_{i}^n)&= \dfrac{C(\varvec{\alpha }, \delta )}{C(\varvec{\alpha }+n\varvec{\gamma }_{i}, \delta )} \int \ldots \int C(\varvec{\alpha }+n\varvec{\gamma }_{i}, \delta ) \prod _{j=1}^K x_{j}^{\alpha _{j} +n\gamma _{ij} -1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K,\\&= \dfrac{C(\varvec{\alpha }, \delta )}{C(\varvec{\alpha }+n\varvec{\gamma }_{i}, \delta )}, \end{aligned}$$

where \(C(\varvec{\alpha }+n\varvec{\gamma }_{i}, \delta ) = \dfrac{1}{\int \ldots \int \prod _{j=1}^K x_{j}^{\alpha _{j} + n\gamma _{ij} -1} \text {exp}(-\delta \Delta ({\textbf{x}})) dx_1 dx_2 \ldots dx_K}\).

Using (1),

$$\begin{aligned} E(X_{i}^n)&= \dfrac{\dfrac{D(\varvec{\alpha })}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}}{\dfrac{D(\varvec{\alpha }+n\varvec{\gamma }_{i})}{E_{\varvec{\alpha }+n\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}},\\&= \dfrac{D(\varvec{\alpha })}{D(\varvec{\alpha }+n\varvec{\gamma }_{i})} \times \dfrac{E_{\varvec{\alpha }+n\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$

Here

$$\begin{aligned} D(\varvec{\alpha }+n\varvec{\gamma }_{i})&= \dfrac{\Gamma (\sum _{j=1}^K (\alpha _{j} + n\gamma _{ij}))}{\prod _{j=1}^K \Gamma (\alpha _{j} + n\gamma _{ij})},\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j}+n)}{\prod _{j \ne i} \Gamma (\alpha _{j})\Gamma (\alpha _{i}+n)},\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j}+n)}{\prod _{j \ne i} \Gamma (\alpha _{j})\Gamma (\alpha _{i}+n)} \times \dfrac{\Gamma (\alpha _{i})}{\Gamma (\alpha _{i})} \times \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\Gamma (\sum _{j=1}^K \alpha _{j})},\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j}+n)}{\Gamma (\sum _{j=1}^K \alpha _{j})} \times \dfrac{\Gamma (\alpha _{i})}{\Gamma (\alpha _{i} +n)} \times \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\prod _{j=1}^K \Gamma (\alpha _{j})},\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j}+n)}{\Gamma (\sum _{j=1}^K \alpha _{j})} \times \dfrac{\Gamma (\alpha _{i})}{\Gamma (\alpha _{i} +n)} \times D(\varvec{\alpha }). \end{aligned}$$
(4)

Substituting (4),

$$\begin{aligned} E(X_{i}^n)&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\Gamma (\sum _{j=1}^K \alpha _{j}+n)} \times \dfrac{\Gamma (\alpha _{i} +n)}{\Gamma (\alpha _{i})} \times \dfrac{E_{\varvec{\alpha }+n\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$
(5)

More generally, the product moments of smoothed Dirichlet distribution random variables can be expressed as

$$\begin{aligned} E\left( \prod _{i=1}^K X_{i}^{n_i}\right)&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\Gamma (\sum _{j=1}^K (\alpha _{j}+n_j))} \times \left[ \prod _{i=1}^K\dfrac{\Gamma (\alpha _{i} +n_i)}{\Gamma (\alpha _{i})} \right] \times \dfrac{E_{\varvec{\alpha }+{\varvec{n}}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}, \end{aligned}$$
(6)

where \(\varvec{\alpha }+{\varvec{n}} = (\alpha _1+n_1, \alpha _2+n_2, \ldots , \alpha _K+n_K)^{t}\). We can easily compute the second moment by plugging \(n=2\) into (5),

$$\begin{aligned} E(X_{i}^2)&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\Gamma (\sum _{j=1}^K \alpha _{j}+2)} \times \dfrac{\Gamma (\alpha _{i} +2)}{\Gamma (\alpha _{i})} \times \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}, \nonumber \\&=\dfrac{\alpha _{i}}{\sum _{j=1}^K \alpha _{j}} \times \dfrac{\alpha _{i}+1}{\sum _{j=1}^K \alpha _{j}+1} \times \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$
(7)

Then the variance of the smoothed Dirichlet distribution is

$$\begin{aligned} \text {Var} (X_i)&= E(X_{i}^2) - (E(X_{i}))^2, \nonumber \\&= \dfrac{\alpha _{i}}{\sum _{j=1}^K \alpha _{j}} \times \dfrac{\alpha _{i}+1}{\sum _{j=1}^K \alpha _{j}+1} \times \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \nonumber \\&\quad - \left( \dfrac{\alpha _{i}}{\sum _{j=1}^K \alpha _{j}} \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \right) ^2, \nonumber \\&= \dfrac{\alpha _{i}}{\left( \sum _{j=1}^K \alpha _{j}\right) ^2 \left( \sum _{j=1}^K \alpha _{j} +1\right) E_{\varvec{\alpha }}^2 [\text {exp}(-\delta \Delta ({\textbf{X}}))]} \nonumber \\&\times \left( (\alpha _i +1)\left( \sum _{j=1}^K \alpha _{j}\right) E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))] E_{\varvec{\alpha }} [\text {exp}(-\delta \Delta ({\textbf{X}}))] \right. \nonumber \\&\quad \left. - \alpha _i \left( \sum _{j=1}^K \alpha _{j}+1\right) E_{\varvec{\alpha }+\varvec{\gamma }_{i}}^2 [\text {exp}(-\delta \Delta ({\textbf{X}}))]\right) . \end{aligned}$$
(8)

Next, we compute the covariance of \(X_i\) and \(X_l\) (\(\text {Cov} (X_i, X_l)\)), \(i, l =1, 2, \ldots , K\), \(i \ne l\). Using (6),

$$\begin{aligned} E (X_i X_l)&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j})}{\Gamma (\sum _{j=1}^K \alpha _{j}+2)} \times \left[ \dfrac{\Gamma (\alpha _{i}+1)}{\Gamma (\alpha _{i})} \right] \times \left[ \dfrac{\Gamma (\alpha _{l}+1)}{\Gamma (\alpha _{l})} \right] \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_i + \varvec{\gamma }_l}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}\\&= \dfrac{\alpha _i \alpha _l}{\sum _{j=1}^K \alpha _{j} \left( \sum _{j=1}^K \alpha _{j}+1\right) } \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_i + \varvec{\gamma }_l}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$

Then

$$\begin{aligned} \text {Cov} (X_i, X_l)&= E (X_i X_l) - E (X_i) E(X_l)\\&= \dfrac{\alpha _i \alpha _l}{\sum _{j=1}^K \alpha _{j} \left( \sum _{j=1}^K \alpha _{j}+1\right) } \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_i + \varvec{\gamma }_l}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \\&- \dfrac{\alpha _{i}}{\sum _{j=1}^K \alpha _{j}} \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \times \dfrac{\alpha _{l}}{\sum _{j=1}^K \alpha _{j}} \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_{l}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}\\&= \dfrac{\alpha _{i}\alpha _{l}}{\left( \sum _{j=1}^K \alpha _{j}\right) ^2 \left( \sum _{j=1}^K \alpha _{j} +1\right) E_{\varvec{\alpha }}^2 [\text {exp}(-\delta \Delta ({\textbf{X}}))]}\\&\times \left( \scriptstyle \left( \sum _{j=1}^K \alpha _{j}\right) E_{\varvec{\alpha }+\varvec{\gamma }_{i}+\varvec{\gamma }_{l}}[\text {exp}(-\delta \Delta ({\textbf{X}}))] E_{\varvec{\alpha }} [\text {exp}(-\delta \Delta ({\textbf{X}}))] \right. \\&\quad \left. - \left( \sum _{j=1}^K \alpha _{j}+1\right) E_{\varvec{\alpha }+\varvec{\gamma }_{i}} [\text {exp}(-\delta \Delta ({\textbf{X}}))] E_{\varvec{\alpha }+\varvec{\gamma }_{l}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]\right) . \end{aligned}$$
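The moment formulas above are ratios of expectations of \(\text {exp}(-\delta \Delta ({\textbf{X}}))\) under shifted Dirichlet parameters, so they can be evaluated numerically. Below is a sketch (assuming the squared-difference penalty \(\Delta _1\) of Sect. 2.4; function names are illustrative) that computes \(E(X_i)\) and \(\text {Var}(X_i)\) from (3), (7), and (8) by Monte Carlo.

```python
import numpy as np

def penalty(X):
    # Delta_1 applied row-wise to an (M, K) array of compositions
    return np.sum(np.diff(X, axis=1) ** 2, axis=1)

def E_exp(beta, delta, M=200_000, seed=0):
    # Monte Carlo estimate of E_beta[exp(-delta Delta(X))], X ~ Dirichlet(beta)
    rng = np.random.default_rng(seed)
    X = rng.dirichlet(beta, size=M)
    return np.mean(np.exp(-delta * penalty(X)))

def sd_mean_var(alpha, delta, i):
    alpha = np.asarray(alpha, dtype=float)
    A = alpha.sum()
    g = np.zeros_like(alpha)
    g[i] = 1.0                                  # the indicator vector gamma_i
    E0 = E_exp(alpha, delta)
    E1 = E_exp(alpha + g, delta)
    E2 = E_exp(alpha + 2 * g, delta)
    mean = alpha[i] / A * E1 / E0                               # equation (3)
    second = alpha[i] / A * (alpha[i] + 1) / (A + 1) * E2 / E0  # equation (7)
    return mean, second - mean ** 2                             # equation (8)

print(sd_mean_var([5, 5, 5], delta=10.0, i=0))
```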

2.3 Sample Generation

We use the acceptance-rejection method to generate a random sample from the smoothed Dirichlet distribution with parameters \(\varvec{\alpha }, \delta\), and \(\Delta\). In the acceptance-rejection method, we generate candidate values from a proposal distribution \({\varvec{Y}}\) with density g and accept a candidate \({\varvec{y}}\) with probability \(f({\varvec{y}})/(C g({\varvec{y}}))\), where f is the density of the target distribution \({\varvec{X}}\) and C is a constant satisfying \(f({\varvec{y}}) \le C g({\varvec{y}})\) for all \({\varvec{y}}\). We consider the Dirichlet distribution as the proposal distribution; since \(\text {exp}(-\delta \Delta ({\textbf{x}})) \le 1\), we may take \(C = 1/E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]\), and the acceptance ratio simplifies to \(f({\varvec{y}})/(C g({\varvec{y}})) = \text {exp}(-\delta \Delta ({\varvec{y}}))\). Here are the steps of the acceptance-rejection algorithm.

1. Generate a set of random samples \({\varvec{y}}_1, {\varvec{y}}_2, \ldots , {\varvec{y}}_M\) from the Dirichlet distribution with the given \(\varvec{\alpha }\).

2. Compute \(E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{Y}}))]\) using all the generated random samples; by (1), this yields the envelope constant C and the normalizing constant \(C(\varvec{\alpha }, \delta )\).

3. Set \(i=1\).

4. Generate a random number u from Uniform(0,1). If \(u < \text {exp}(-\delta \Delta ({\textbf{y}}_i))\), accept \({\varvec{y}}_i\); otherwise reject \({\varvec{y}}_i\). Set \(i=i+1\) and repeat this step until \(i=M\).
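A minimal sketch of this sampler follows, assuming the squared-difference penalty \(\Delta _1\) of Sect. 2.4; with the Dirichlet(\(\varvec{\alpha }\)) proposal, the acceptance probability for each candidate reduces to \(\text {exp}(-\delta \Delta ({\varvec{y}}))\), as noted above. The function name rsd is illustrative.

```python
import numpy as np

def penalty(y):
    # Delta_1(y) = sum of squared successive differences
    return np.sum(np.diff(y) ** 2)

def rsd(n, alpha, delta, seed=0):
    """Draw n vectors from SDD(alpha, delta, Delta_1) by acceptance-rejection."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        y = rng.dirichlet(alpha)               # candidate from the Dirichlet proposal
        u = rng.uniform()
        if u < np.exp(-delta * penalty(y)):    # accept with probability exp(-delta Delta(y))
            out.append(y)
    return np.array(out)

sample = rsd(1000, alpha=[5, 5, 5], delta=10.0)
print(sample.mean(axis=0))
```

Note that the average acceptance rate equals \(E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]\), so the sampler slows down as \(\delta\) grows.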

2.4 Role of the Penalty Term (\(\text {exp}(-\delta \Delta ({\textbf{x}}))\))

As mentioned before, the penalty function forces successive \(x_{j}\)’s to be close to each other with higher probability than under the standard Dirichlet distribution satisfying \(\sum _{j=1}^K x_{j} = 1\). The roles of \(\delta\) and \(\Delta\) are very important when constructing the smoothed Dirichlet distribution. The penalty parameter (\(\delta\)) dictates the extent to which cell probabilities of neighboring categories have to be similar. Some examples of \(\Delta\) functions that we consider are

$$\begin{aligned} \Delta _1 = \sum _{j=1}^{K-1} (x_{j+1} - x_j)^2, \Delta _2 = \sum _{j=1}^{K-1} (\text {log } x_{j+1} - \text {log } x_j )^2, \text { and }\Delta _3 = \sum _{j=2}^{K-1} (x_{j+1} - 2x_j + x_{j-1})^2. \end{aligned}$$

From now on, for our analysis, we use \(\Delta = \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\). When \(x_1 = x_2 = \ldots = x_K = \frac{1}{K}\), then

$$\begin{aligned} \Delta = \left( \frac{1}{K} - \frac{1}{K}\right) ^2 +\left( \frac{1}{K} - \frac{1}{K}\right) ^2 + \ldots + \left( \frac{1}{K} - \frac{1}{K}\right) ^2 = 0. \end{aligned}$$

This is the minimum value of \(\Delta = \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\). Also, the maximum value of \(\Delta\) is attained when \(x_j=1\) for some interior index j, \(2 \le j \le K-1\). The maximum value of \(\Delta\) then satisfies

$$\begin{aligned} \Delta&= \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2,\\&\le \sum _{j=1}^{K-1} x_{j+1}^2 + \sum _{j=1}^{K-1} x_{j}^2,\\&\le \sum _{j=1}^{K-1} x_{j+1} + \sum _{j=1}^{K-1} x_{j} \le 2. \end{aligned}$$
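The three penalty functions, and the bounds just derived, can be checked numerically. The sketch below (a quick check with illustrative names, not the authors' code) evaluates \(\Delta _1\), \(\Delta _2\), and \(\Delta _3\), and confirms empirically that \(\Delta _1\) stays in [0, 2] on the simplex.

```python
import numpy as np

def d1(x):  # squared first differences
    return np.sum(np.diff(x) ** 2)

def d2(x):  # squared first differences on the log scale
    return np.sum(np.diff(np.log(x)) ** 2)

def d3(x):  # squared second differences
    return np.sum(np.diff(x, n=2) ** 2)

x_flat = np.full(4, 0.25)                  # uniform vector: the minimum of Delta_1
x_spike = np.array([0.0, 1.0, 0.0, 0.0])   # interior spike: the maximum of Delta_1
print(d1(x_flat), d1(x_spike))             # 0.0 and 2.0

rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=100_000)
vals = np.sum(np.diff(X, axis=1) ** 2, axis=1)
print(vals.min(), vals.max())              # empirically within [0, 2]
```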

Fig. 1 shows the effect of different \(\delta\) and \(\varvec{\alpha }\) values for the same penalty function \(\Delta = \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\) after simulating data from the smoothed Dirichlet distribution with \(K=3\). The plots in each row of Fig. 1 share the same \(\varvec{\alpha }\), and the plots in each column share the same \(\delta\) value. We consider \((1, 1, 1)^{t}\), \((5, 5, 5)^{t}\), \((10, 10, 10)^{t}\), \((10, 5, 1)^{t}\), \((1, 5, 10)^{t}\) and \((1, 10, 1)^{t}\) as \(\varvec{\alpha }\) for the rows of Fig. 1. The color scale runs from yellow (lowest value) to red (highest value).

Fig. 1

The effect of \(\delta\) for different \(\varvec{\alpha }\); row 1 with \((1, 1, 1)^{t}\), row 2 with \((5, 5, 5)^{t}\), row 3 with \((10, 10, 10)^{t}\), row 4 with \((10, 5, 1)^{t}\), row 5 with \((1, 5, 10)^{t}\) and row 6 with \((1, 10, 1)^{t}\)

As \(\alpha _j\) increases, the distribution becomes more tightly concentrated around the center of the simplex for a given \(\delta\) value. Also, for a given \(\varvec{\alpha }\), the distribution becomes more tightly concentrated around the center of the simplex as \(\delta\) increases. If the \(\alpha _j\) values are different, we obtain an asymmetric (non-central) distribution with more mass toward the component with the largest \(\alpha _j\). Also, the highest penalty occurs at \((x_1 = 0, x_2 = 1, x_3 = 0)^{t}\) for this penalty function.

Fig. 2

The effect of \(\delta\) and the penalty function for same \(\varvec{\alpha } = (5, 5, 5)^{t}\); row 1 with \(\Delta _1\), row 2 with \(\Delta _2\), and row 3 with \(\Delta _3\)

Figure 2 shows the effect of different \(\delta\) values and penalty functions for \(\varvec{\alpha } = (5, 5, 5)^{t}\) after simulating data from the smoothed Dirichlet distribution with \(K=3\). The first row of Fig. 2 is for the penalty function \(\Delta _1= \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\). The second and third rows of Fig. 2 are for the penalty functions \(\Delta _2 = \sum _{j=1}^{K-1} (\text {log } x_{j+1} - \text {log } x_j )^2\) and \(\Delta _3 = \sum _{j=2}^{K-1} (x_{j+1} - 2x_j + x_{j-1})^2\), respectively. It is clear that the distribution becomes more tightly concentrated around the center of the simplex for the penalty function \(\Delta _2 = \sum _{j=1}^{K-1} (\text {log } x_{j+1} - \text {log } x_j )^2\) than for the other penalty functions. Also, the highest penalty occurs at \((x_1 = 0, x_2 = 1, x_3 = 0)^{t}\) for all of these penalty functions.

2.5 The Upper and Lower Limits for \(E(X_i)\) and \(\text {Var} (X_i)\)

Next, we compute the upper and lower limits for \(E(X_i)\) and \(\text {Var} (X_i)\). These limits establish the possible range for \(E(X_i)\) and \(\text {Var} (X_i)\). First, we compute the upper and lower limits for \(E(X_i)\). We know that the value of \(\Delta ({\textbf{x}}) = \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\) lies in [0, 2]. Then

$$\begin{aligned} 0 \le \Delta ({\textbf{x}}) \le 2\\ -2\delta \le -\delta \Delta ({\textbf{x}}) \le 0\\ \text {exp} (-2\delta ) \le \text {exp} (-\delta \Delta ({\textbf{x}}) ) \le 1. \end{aligned}$$

Now we can compute

$$\begin{aligned}&\int \ldots \int \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} \text {exp} (-2\delta ) dx_1 dx_2 \ldots dx_K \\&\qquad \le \int \ldots \int \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} \text {exp} (-\delta \Delta ({\textbf{x}}) ) dx_1 dx_2 \ldots dx_K\\&\quad \le \int \ldots \int \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K,\\&\quad \text {exp} (-2\delta ) \int \ldots \int \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K\\&\qquad \le \int \ldots \int \text {exp} (-\delta \Delta ({\textbf{x}}) ) \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K\\&\quad \le \int \ldots \int \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K. \end{aligned}$$

Then we obtain

$$\begin{aligned} \text {exp} (-2\delta ) \le E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))] \le 1. \end{aligned}$$
(9)

Similarly,

$$\begin{aligned}&\int \ldots \int x_i \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} \text {exp} (-2\delta ) dx_1 dx_2 \ldots dx_K \\&\qquad \le \int \ldots \int x_i \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} \text {exp} (-\delta \Delta ({\textbf{x}}) ) dx_1 dx_2 \ldots dx_K\\&\quad \le \int \ldots \int x_i \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K,\\&\quad \text {exp} (-2\delta ) \int \ldots \int x_i \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K \\&\quad \le \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \dfrac{\prod _{j=1}^K \Gamma (\alpha _{j} + \gamma _{ij})}{\Gamma (\sum _{j=1}^K \alpha _{j} + \gamma _{ij})}\int \ldots \int \text {exp} (-\delta \Delta ({\textbf{x}}) )\\&\qquad \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j} + \gamma _{ij})}{\prod _{j=1}^K \Gamma (\alpha _{j} + \gamma _{ij})} \prod _{j=1}^K x_{j}^{\alpha _{j} + \gamma _{ij} -1} dx_1 dx_2 \ldots dx_K\\&\quad \le \int \ldots \int x_i \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K, \end{aligned}$$

Then

$$\begin{aligned} \text {exp} (-2\delta ) E_{\varvec{\alpha }} (X_i) \le \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \dfrac{\prod _{j \ne i} \Gamma (\alpha _{j}) \Gamma (\alpha _{i}+1)}{\Gamma (\sum _{j=1}^K \alpha _{j}+1)} E_{\varvec{\alpha }+\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))] \le E_{\varvec{\alpha }} (X_i), \end{aligned}$$
$$\begin{aligned} \text {exp} (-2\delta ) E_{\varvec{\alpha }} (X_i) \le \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} E_{\varvec{\alpha }+\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))] \le E_{\varvec{\alpha }} (X_i). \end{aligned}$$
(10)

By using (9) and (10),

$$\begin{aligned} \text {exp} (-2\delta ) E_{\varvec{\alpha }} (X_i) \le \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \le \dfrac{1}{\text {exp} (-2\delta )}E_{\varvec{\alpha }} (X_i). \end{aligned}$$
(11)

We know that \(E_{\varvec{\alpha }} (X_i) = \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\). Then,

$$\begin{aligned} \text {exp} (-2\delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \le \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \le \dfrac{1}{\text {exp} (-2\delta )}\left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) , \end{aligned}$$
$$\begin{aligned} \text {exp} (-2\delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \le E (X_i) \le \text {exp} (2\delta )\left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) . \end{aligned}$$
(12)

The lower limit of \(E (X_i)\) is \(\text {exp} (-2\delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right)\) and the upper limit of \(E (X_i)\) is \(\text {exp} (2\delta )\) \(\left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right)\). Note that when \(\delta \rightarrow 0\), \(E (X_i) \rightarrow \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\), which is the expected value under the Dirichlet distribution. Next, we approximate these limits for small \(\delta\). We know that

$$\begin{aligned} \text {exp}(x) = \sum _{k=0}^{\infty } \dfrac{x^k}{k!}. \end{aligned}$$

This is the Maclaurin series of the exponential function. Then,

$$\begin{aligned} \text {exp}(-2\delta )&= 1 + \dfrac{(-2\delta )^1}{1!}+ \dfrac{(-2\delta )^2}{2!} + \ldots \\&= 1 - 2 \delta + 2 \delta ^2 - \dfrac{4}{3}\delta ^3+ \ldots \\&= 1 -2 \delta + O(\delta ^2). \end{aligned}$$

Similarly, \(\text {exp}(2 \delta ) = 1 + 2 \delta + O(\delta ^2)\). Then, to first order in \(\delta\),

$$\begin{aligned} (1 -2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \le E (X_i) \le (1 + 2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) . \end{aligned}$$
(13)

Also,

$$\begin{aligned}&\int \ldots \int x_{i}^2 \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} \text {exp} (-2\delta ) dx_1 dx_2 \ldots dx_K \\&\quad \le \int \ldots \int x_{i}^2 \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} \text {exp} (-\delta \Delta ({\textbf{x}}) ) dx_1 dx_2 \ldots dx_K\\&\quad \le \int \ldots \int x_{i}^2 \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K,\\&\quad \text {exp} (-2\delta ) \int \ldots \int x_{i}^2 \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K \\&\quad \le \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \dfrac{\prod _{j=1}^K \Gamma (\alpha _{j} + 2\gamma _{ij})}{\Gamma (\sum _{j=1}^K \alpha _{j} + 2\gamma _{ij})}\int \ldots \int \text {exp} (-\delta \Delta ({\textbf{x}}) )\\&\quad \dfrac{\Gamma (\sum _{j=1}^K \alpha _{j} + 2\gamma _{ij})}{\prod _{j=1}^K \Gamma (\alpha _{j} + 2\gamma _{ij})} \prod _{j=1}^K x_{j}^{\alpha _{j} + 2\gamma _{ij} - 1} dx_1 dx_2 \ldots dx_K\\&\quad \le \int \ldots \int x_{i}^2 \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K, \end{aligned}$$
$$\begin{aligned} \text {exp} (-2\delta ) E_{\varvec{\alpha }} (X_{i}^2) \le \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \dfrac{\prod _{j \ne i} \Gamma (\alpha _{j}) \Gamma (\alpha _{i}+2)}{\Gamma (\sum _{j=1}^K \alpha _{j}+2)} E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))] \le E_{\varvec{\alpha }} (X_{i}^2), \nonumber \\ \text {exp} (-2\delta ) E_{\varvec{\alpha }} (X_{i}^2) \le \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))] \le E_{\varvec{\alpha }} (X_{i}^2). \end{aligned}$$
(14)

By using (9) and (14),

$$\begin{aligned} \text {exp} (-2\delta )E_{\varvec{\alpha }} (X_{i}^2) \le \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} \le \dfrac{1}{\text {exp} (-2\delta )}E_{\varvec{\alpha }} (X_{i}^2). \end{aligned}$$
(15)

Also,

$$\begin{aligned} E_{\varvec{\alpha }} (X_{i}^2)&= \int \ldots \int x_{i}^2 \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \prod _{j=1}^K x_{j}^{\alpha _j-1} dx_1 dx_2 \ldots dx_K,\nonumber \\&= \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j)} \dfrac{\prod _{j=1}^K \Gamma (\alpha _{j} + 2\gamma _{ij})}{\Gamma (\sum _{j=1}^K (\alpha _{j} + 2\gamma _{ij}))} \int \ldots \int \nonumber \\&\quad \dfrac{\Gamma (\sum _{j=1}^K (\alpha _{j} + 2\gamma _{ij}))}{\prod _{j=1}^K \Gamma (\alpha _{j} + 2\gamma _{ij})} \prod _{j=1}^K x_{j}^{\alpha _{j} + 2\gamma _{ij} -1} dx_1 dx_2 \ldots dx_K ,\nonumber \\&= \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) . \end{aligned}$$
(16)

Substituting (16) into (15),

$$\begin{aligned}&\text {exp} (-2\delta )\left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) \le \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}\nonumber \\&\quad \le \text {exp} (2\delta )\left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) ,\nonumber \\&\quad (1 -2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) \le \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{i}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}\nonumber \\&\quad \le (1 +2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) . \end{aligned}$$
(17)

We know that, \(\text {Var} (X_i) = E(X_{i}^2) - (E(X_i))^2\). Then we compute the lower limit of \(\text {Var} (X_i)\) \((\text {Var}_{LL} (X_i))\) and the upper limit of \(\text {Var} (X_i) (\text {Var}_{UL} (X_i))\) separately.

$$\begin{aligned} \text {Var}_{LL} (X_i)&= (1 -2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) - \left[ (1 +2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \right] ^2,\\&= \left( \dfrac{ \alpha _i}{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \left[ (1 -2 \delta ) (\alpha _i+1) \left( \sum _{j=1}^K \alpha _j \right) \right. \\&\quad \left. - \alpha _i (1+2\delta )^2 \left( \sum _{j=1}^K \alpha _j+1\right) \right] ,\\&= \left( \dfrac{ \alpha _i}{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \\&\qquad \times \left[ \left( \sum _{j=1}^K \alpha _j - \alpha _i \right) -2 \delta \left( 2 \delta \alpha _i + \sum _{j=1}^K \alpha _j +3 \alpha _i \sum _{j=1}^K \alpha _j + 2\delta \alpha _i \sum _{j=1}^K \alpha _j + 2 \alpha _i\right) \right] ,\\&= \left( \dfrac{ \alpha _i \left( \sum _{j=1}^K \alpha _j - \alpha _i \right) }{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \\&\quad - \left( \dfrac{2 \delta \alpha _i}{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \left( 2 \delta \alpha _i \left( \sum _{j=1}^K \alpha _j +1\right) \right. \\&\quad \left. + (1+\alpha _i) \sum _{j=1}^K \alpha _j + 2 \alpha _i \left( \sum _{j=1}^K \alpha _j +1 \right) \right) , \end{aligned}$$

and

$$\begin{aligned} \text {Var}_{UL} (X_i)&= (1 +2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j}\right) \left( \dfrac{\alpha _i+1}{\sum _{j=1}^K \alpha _j+1}\right) - \left[ (1 -2 \delta ) \left( \dfrac{\alpha _i}{\sum _{j=1}^K \alpha _j} \right) \right] ^2,\\&= \left( \dfrac{ \alpha _i}{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \left[ (1 +2 \delta ) (\alpha _i+1) \left( \sum _{j=1}^K \alpha _j \right) \right. \\&\quad \left. - \alpha _i (1-2\delta )^2 \left( \sum _{j=1}^K \alpha _j+1\right) \right] ,\\&= \left( \dfrac{ \alpha _i}{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \\&\qquad \times \left[ \left( \sum _{j=1}^K \alpha _j - \alpha _i \right) -2 \delta \left( 2 \delta \alpha _i - \sum _{j=1}^K \alpha _j -3 \alpha _i \sum _{j=1}^K \alpha _j + 2\delta \alpha _i \sum _{j=1}^K \alpha _j - 2 \alpha _i\right) \right] ,\\&= \left( \dfrac{ \alpha _i \left( \sum _{j=1}^K \alpha _j - \alpha _i \right) }{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \\&\quad - \left( \dfrac{2 \delta \alpha _i}{\left( \sum _{j=1}^K \alpha _j \right) ^2 \left( \sum _{j=1}^K \alpha _j+1\right) } \right) \left( 2 \delta \alpha _i \left( \sum _{j=1}^K \alpha _j +1\right) \right. \\&\quad \left. - (1+\alpha _i) \sum _{j=1}^K \alpha _j -2 \alpha _i \left( \sum _{j=1}^K \alpha _j +1 \right) \right) . \end{aligned}$$

Note that when \(\delta \rightarrow 0\), both \(\text {Var}_{LL} (X_i)\) and \(\text {Var}_{UL} (X_i) \rightarrow \dfrac{\alpha _{i}\left( \sum _{j=1}^K \alpha _{j} - \alpha _{i}\right) }{(\sum _{j=1}^K \alpha _{j})^2 (\sum _{j=1}^K \alpha _{j} +1)}\), which is the variance of the Dirichlet distribution. A numerical check of these limits is sketched below; we then discuss the marginal distribution of the smoothed Dirichlet distribution.
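As a sanity check, the limits in (12) can be compared with a Monte Carlo estimate of \(E(X_i)\) from the acceptance-rejection sampler of Sect. 2.3; an analogous check applies to \(\text {Var}_{LL} (X_i)\) and \(\text {Var}_{UL} (X_i)\). This is a sketch assuming the penalty \(\Delta _1\), with illustrative names.

```python
import numpy as np

def penalty(y):
    return np.sum(np.diff(y) ** 2)            # Delta_1

def rsd(n, alpha, delta, seed=0):
    # acceptance-rejection sampler of Sect. 2.3
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        y = rng.dirichlet(alpha)
        if rng.uniform() < np.exp(-delta * penalty(y)):
            out.append(y)
    return np.array(out)

alpha = np.array([5.0, 5.0, 5.0])
delta, i = 0.5, 0
m_dir = alpha[i] / alpha.sum()                # Dirichlet mean alpha_i / sum(alpha)
m_hat = rsd(100_000, alpha, delta)[:, i].mean()
# lower limit <= Monte Carlo mean <= upper limit, per (12)
print(np.exp(-2 * delta) * m_dir, m_hat, np.exp(2 * delta) * m_dir)
```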

2.6 Marginal Distributions

We know that the marginal distributions of the standard Dirichlet distribution are beta distributions. The smoothed Dirichlet (SD) probability density function is

$$\begin{aligned} f({\textbf{x}}|\varvec{\alpha }, \delta ) = \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\prod _{j=1}^K \Gamma (\alpha _j) E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]} \prod _{j=1}^K x_{j}^{\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{x}})). \end{aligned}$$

Let \(A=\sum _{j=1}^K \alpha _j\). Then

$$\begin{aligned} f(x_1|\varvec{\alpha }, \delta )&= \dfrac{\Gamma (A)}{\Gamma (\alpha _1) \Gamma (A - \alpha _1) E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]} x_{1}^{\alpha _1 -1} (1 - x_{1})^{A - \alpha _1 -1}\text {exp}(-\delta \Delta ({\textbf{x}})),\nonumber \\&= \dfrac{\Gamma (A)}{\Gamma (\alpha _1) \Gamma (A - \alpha _1)} x_{1}^{\alpha _1 -1} (1 - x_{1})^{A - \alpha _1 -1} \dfrac{\text {exp}(-\delta \Delta ({\textbf{x}}))}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]}. \end{aligned}$$
(18)

The marginal probability density function of \(X_1\) is the product of the Beta\((\alpha _1, A - \alpha _1)\) density and the factor \(\dfrac{\text {exp}(-\delta \Delta ({\textbf{x}}))}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]}\). We next obtain the marginal distribution of \(X_2\). The joint distribution of \(x_1\) and \(x_2\) is

$$\begin{aligned} f(x_1, x_2|\varvec{\alpha }, \delta )&= \dfrac{\Gamma (A)}{\Gamma (\alpha _1) \Gamma (\alpha _2) \Gamma (A - \alpha _1 - \alpha _2) E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]}\\&\quad x_{1}^{\alpha _1 -1} x_{2}^{\alpha _2 -1} (1 - x_{1} - x_{2})^{A - \alpha _1 - \alpha _2 -1}\text {exp}(-\delta \Delta ({\textbf{x}})). \end{aligned}$$

Then

$$\begin{aligned} f(x_2|\varvec{\alpha }, \delta )&= \int _{0}^{1-x_2} \dfrac{\Gamma (A)}{\Gamma (\alpha _1) \Gamma (\alpha _2) \Gamma (A - \alpha _1 - \alpha _2) E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]} \\&\quad x_{1}^{\alpha _1 -1} x_{2}^{\alpha _2 -1} (1 - x_{1} - x_{2})^{A - \alpha _1 - \alpha _2 -1}\text {exp}(-\delta \Delta ({\textbf{x}})) dx_1,\\&= \dfrac{\Gamma (A)}{\Gamma (\alpha _1) \Gamma (\alpha _2) \Gamma (A - \alpha _1 - \alpha _2)} \dfrac{x_{2}^{\alpha _2 -1}}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{x}}))]} \\&\quad \int _{0}^{1-x_2} x_{1}^{\alpha _1 -1} (1 - x_{1} - x_{2})^{A - \alpha _1 - \alpha _2 -1}\text {exp}(-\delta \Delta ({\textbf{x}})) dx_1. \end{aligned}$$

Substitute \(x_1 = (1 - x_2)u\), then

$$\begin{aligned} f(x_2|\varvec{\alpha }, \delta )&= \dfrac{\Gamma (A)}{\Gamma (\alpha _1) \Gamma (\alpha _2) \Gamma (A - \alpha _1 - \alpha _2)} \dfrac{x_{2}^{\alpha _2 -1} (1 - x_{2})^{A - \alpha _2 -1}}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta (x_2, u))]} \\&\quad \int _{0}^{1} u^{\alpha _1 -1} (1 - u)^{A - \alpha _1 - \alpha _2 -1}\text {exp}(-\delta \Delta (x_2,u)) du. \end{aligned}$$

3 Estimation of Parameters and Bayesian Inference

We now outline the estimation of the parameters of the smoothed Dirichlet distribution. We first derive estimators for \(\alpha _j, j=1, 2, \ldots , K\) and \(\delta\) using the method of moments (MOM).

3.1 Method of Moments (MOM)

Suppose we have a random sample with n random vectors \({\varvec{X}}_1, {\varvec{X}}_2, \ldots , {\varvec{X}}_n\) such that \({\varvec{X}}_{i} = (X_{i1}, X_{i2}, \ldots , X_{iK})^{t}, X_{ij} \ge 0\) and \(\sum _{j=1}^K X_{ij} = 1\). We know that the first and second population moments are

$$\begin{aligned} E(X_{j})&= \dfrac{\alpha _{j}}{\sum _{l=1}^K \alpha _{l}} \times \dfrac{E_{\varvec{\alpha +\gamma _{j}}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}, \end{aligned}$$

and

$$\begin{aligned} E(X_{j}^2)&=\dfrac{\alpha _{j}}{\sum _{l=1}^K \alpha _{l}} \times \dfrac{\alpha _{j}+1}{\sum _{l=1}^K \alpha _{l}+1} \times \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{j}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}. \end{aligned}$$

We define the first and second sample moments as

$$\begin{aligned} m_{1j} = \dfrac{1}{n} \sum _{i=1}^n x_{ij}, \end{aligned}$$

and

$$\begin{aligned} m_{2j} = \dfrac{1}{n} \sum _{i=1}^n x_{ij}^2 \end{aligned}$$

for \(j=1, 2, \ldots , K\). We have \(K-1\) first-order moment equations and \(K-1\) second-order moment equations with which to solve for the K unknown \(\alpha _j\)'s and \(\delta\). Then

$$\begin{aligned} \dfrac{\alpha _{j}}{\sum _{l=1}^K \alpha _{l}} \times \dfrac{E_{\varvec{\alpha +\gamma _{j}}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]} = \dfrac{1}{n} \sum _{i=1}^n x_{ij}, \end{aligned}$$
(19)

and

$$\begin{aligned} \dfrac{\alpha _{j}}{\sum _{l=1}^K \alpha _{l}} \times \dfrac{\alpha _{j}+1}{\sum _{l=1}^K \alpha _{l}+1} \times \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{j}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}&= \dfrac{1}{n} \sum _{i=1}^n x_{ij}^2\nonumber \\ \dfrac{1}{n} \sum _{i=1}^n x_{ij} \times \dfrac{\alpha _{j}+1}{\sum _{l=1}^K \alpha _{l}+1} \times \dfrac{E_{\varvec{\alpha }+2\varvec{\gamma }_{j}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }+\varvec{\gamma }_{j}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}&= \dfrac{1}{n} \sum _{i=1}^n x_{ij}^2\nonumber \\ \dfrac{\alpha _{j}+1}{\sum _{l=1}^K \alpha _{l}+1} \times \dfrac{E_{\varvec{\alpha }+\varvec{\gamma }_{j}}[X_j\text {exp}(-\delta \Delta ({\textbf{X}}))]}{E_{\varvec{\alpha }+\varvec{\gamma }_{j}}[\text {exp}(-\delta \Delta ({\textbf{X}}))]}&= \dfrac{\sum _{i=1}^n x_{ij}^2}{\sum _{i=1}^n x_{ij}} \end{aligned}$$
(20)

for \(j=1, 2, \ldots , K\). There is no closed-form solution for the \(\alpha _j\)'s and \(\delta\) when solving (19) and (20) simultaneously, so we solve them numerically to obtain the corresponding method of moments estimators, as sketched below.
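One way to solve (19) and (20) numerically is sketched below: the expectations are replaced by Monte Carlo estimates under a common random seed, positivity of the \(\alpha _j\)'s and \(\delta\) is enforced with a log transform, and the squared discrepancies between population and sample moments are minimized. This is an illustrative sketch assuming the penalty \(\Delta _1\), not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def penalty(X):
    return np.sum(np.diff(X, axis=1) ** 2, axis=1)       # Delta_1, row-wise

def E_exp(beta, delta, M=50_000, seed=0):
    rng = np.random.default_rng(seed)                    # fixed seed: common random numbers
    X = rng.dirichlet(beta, size=M)
    return np.mean(np.exp(-delta * penalty(X)))

def mom_loss(theta, m1, m2):
    K = m1.size
    alpha, delta = np.exp(theta[:K]), np.exp(theta[K])   # log transform keeps parameters positive
    A = alpha.sum()
    E0 = E_exp(alpha, delta)
    loss = 0.0
    for j in range(K):
        g = np.zeros(K)
        g[j] = 1.0
        E1 = E_exp(alpha + g, delta)
        E2 = E_exp(alpha + 2 * g, delta)
        mu1 = alpha[j] / A * E1 / E0                     # population moment, equation (19)
        mu2 = mu1 * (alpha[j] + 1) / (A + 1) * E2 / E1   # population moment, equation (20)
        loss += (mu1 - m1[j]) ** 2 + (mu2 - m2[j]) ** 2
    return loss

def mom_fit(data):
    """data: an (n, K) array of compositions; returns (alpha_hat, delta_hat)."""
    m1, m2 = data.mean(axis=0), (data ** 2).mean(axis=0)  # sample moments m_1j, m_2j
    K = data.shape[1]
    res = minimize(mom_loss, np.zeros(K + 1), args=(m1, m2), method="Nelder-Mead")
    return np.exp(res.x[:K]), np.exp(res.x[K])
```

For example, mom_fit can be applied to output of the sampler in Sect. 2.3 to check parameter recovery.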

3.2 Bayesian Inference

In the Bayesian paradigm, if the posterior distribution is in the same probability distribution family as the prior distribution, then the prior is called a conjugate prior. Like the Dirichlet distribution, the smoothed Dirichlet distribution is a conjugate prior for the multinomial cell-probability vector. The distribution of the cell counts \(({\textbf{X}})\) is given by

$$\begin{aligned} {\textbf{X}} = (X_{1}, X_{2}, \ldots , X_{K})^{t} \sim \text {Multinomial}(n,\text {prob}= {\textbf{p}}), \end{aligned}$$

where \({\textbf{p}} = (p_{1}, p_{2}, \ldots , p_{K})^{t}\) denotes the vector of cell probabilities for the multinomial population. We suggest using the smoothed Dirichlet distribution as the prior distribution for \({\textbf{p}}\),

$$\begin{aligned} {\textbf{p}} = (p_{1}, p_{2}, \ldots , p_{K})^{t} \sim \text {SD} (\varvec{\alpha }, \delta , \Delta ). \end{aligned}$$

Then, the posterior distribution of \({\textbf{p}}\), given the observed counts \({\textbf{x}}\), is

$$\begin{aligned} {\textbf{p}} | {\textbf{x}}; \varvec{\alpha } \sim \text {SD}(\varvec{\alpha } + {\varvec{x}}, \delta , \Delta ). \end{aligned}$$

We now introduce the smoothed Dirichlet-multinomial distribution, a compound of a multinomial distribution and a smoothed Dirichlet distribution. The marginal distribution of \({\varvec{x}}\) is given by

$$\begin{aligned} f(\varvec{x|\alpha })&= \int f(\varvec{x|p}) f(\varvec{p|\alpha }) d{\varvec{p}}\\&= \int {n \atopwithdelims (){\varvec{x}}} \prod _{j=1}^K p_{j}^{x_j} \times C(\varvec{\alpha }, \delta ) \prod _{j=1}^K p_{j}^{\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{p}})) d{\varvec{p}}\\&= {n \atopwithdelims (){\varvec{x}}} \dfrac{C(\varvec{\alpha }, \delta )}{C(\varvec{\alpha +x}, \delta )} \int C(\varvec{\alpha +x}, \delta ) \prod _{j=1}^K p_{j}^{x_j+\alpha _j -1} \text {exp}(-\delta \Delta ({\textbf{p}})) d{\varvec{p}}\\&= {n \atopwithdelims (){\varvec{x}}} \dfrac{C(\varvec{\alpha }, \delta )}{C(\varvec{\alpha +x}, \delta )}\\&= {n \atopwithdelims (){\varvec{x}}} \dfrac{\Gamma (\sum _{j=1}^K \alpha _j)}{\Gamma (n+\sum _{j=1}^K \alpha _j)} \prod _{j=1}^K \left( \dfrac{\Gamma (x_j+\alpha _j)}{\Gamma (\alpha _j)} \right) \dfrac{E_{\varvec{\alpha +x}}[\text {exp}(-\delta \Delta ({\textbf{p}}))]}{E_{\varvec{\alpha }}[\text {exp}(-\delta \Delta ({\textbf{p}}))]}. \end{aligned}$$
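The conjugate update and the smoothed Dirichlet-multinomial marginal can both be computed with the same Monte Carlo device: each \(E_{\cdot }[\text {exp}(-\delta \Delta ({\textbf{p}}))]\) term is approximated with Dirichlet draws. The following is a sketch under the \(\Delta _1\) penalty, with illustrative names.

```python
import numpy as np
from scipy.special import gammaln

def penalty(X):
    return np.sum(np.diff(X, axis=1) ** 2, axis=1)     # Delta_1, row-wise

def log_E_exp(beta, delta, M=100_000, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.dirichlet(beta, size=M)
    w = -delta * penalty(X)
    return np.logaddexp.reduce(w) - np.log(M)          # log-mean-exp

def log_marginal(x, alpha, delta):
    """log f(x | alpha) for counts x under the smoothed Dirichlet-multinomial."""
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    n = x.sum()
    log_pmf = (gammaln(n + 1) - np.sum(gammaln(x + 1))           # multinomial coefficient
               + gammaln(alpha.sum()) - gammaln(n + alpha.sum())
               + np.sum(gammaln(x + alpha) - gammaln(alpha)))
    return log_pmf + log_E_exp(alpha + x, delta) - log_E_exp(alpha, delta)

x = np.array([3, 5, 2])                    # observed cell counts
alpha = np.array([2.0, 2.0, 2.0])
post_alpha = alpha + x                     # conjugacy: p | x ~ SD(alpha + x, delta, Delta)
print(post_alpha, log_marginal(x, alpha, delta=5.0))
```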

4 Data Analysis

4.1 A Simulation Study

An in-depth simulation study would be useful to demonstrate and compare how the proposed and existing estimators perform over simulated datasets under different scenarios. Here, we performed a brief simulation study using 10,000 Monte Carlo simulations with known true cell probabilities. We report the mean squared error (MSE) and compare the estimators below. We varied the number of populations (\(N = 100\), 200, and 500) and the number of categories (\(K = 3\), 6, and 9) to explore the performance of the estimators. We generated data from multinomial distributions with sample sizes ranging from 15 to 75 and considered two scenarios for the true cell probabilities. We compare our proposed method with the Dirichlet process method proposed by [10] and the weighted likelihood method of [8]. Note that for the empirical Bayes method, we considered the standard Dirichlet distribution. In the proposed method, we used the penalty function \(\Delta = \sum _{j=1}^{K-1} (x_{j+1} - x_{j})^2\) for the smoothed Dirichlet distribution. Also, \(\delta\) changes with the scenario, N, and K, and ranged from 100 to 300.

4.1.1 Scenario 1

In this scenario, the true cell probabilities are strictly decreasing, but the differences between successive cell probabilities are the same. For example, when \(K=3\), the true cell probabilities are \({\textbf{p}} = \left( \dfrac{3}{6}, \dfrac{2}{6}, \dfrac{1}{6}\right) ^{t}\).

Table 1 MSE values for scenario 1

Table 1 provides the mean squared error values for each estimator under Scenario 1. When N is fixed and K increases, the MSE decreases. The MSE also decreases when K is fixed and N increases. The Bayesian shrinkage estimator based on a smoothed Dirichlet prior is the best estimator in terms of MSE.

4.1.2 Scenario 2

In this scenario, the true cell probabilities alternate between increasing and decreasing (a zig-zag pattern). For example, when \(K=3\), the true cell probabilities are \({\textbf{p}} = \left( \dfrac{1}{12}, \dfrac{10}{12}, \dfrac{1}{12}\right) ^{t}\).

Table 2 MSE values for scenario 2

Table 2 provides the mean squared error values for each estimator under Scenario 2. As in Scenario 1, when N is fixed and K increases, the MSE decreases; the MSE also decreases when K is fixed and N increases. The estimator based on the weighted likelihood approach is the best estimator in terms of MSE for this scenario. The Bayes estimator based on the smoothed Dirichlet prior does not perform well here, which is not surprising given that it is not designed for cases where successive cell probabilities differ sharply.

4.2 Real Data Analysis

Real-world situations often arise where outcomes in certain categories are not observed due to limited sample size and small, but non-zero, cell probabilities. In such cases, maximum likelihood estimation (MLE) may yield poor results by underestimating the actual cell probabilities. The proposed approach is valuable in such scenarios, particularly when dealing with ordinal categories that possess a natural ordering. It ensures that the borrowing of information among neighboring cell categories is conceptually meaningful, making it an effective methodology for analyzing data with unobserved outcomes in the presence of small probabilities.

Let us now consider the estimation of \({\textbf{p}}\) using a smoothed Dirichlet prior. For the data application, we consider 2018 Major League Baseball (MLB) batting data from the Baseball-Reference website (www.baseball-reference.com). We consider data for all regular season games taking place between March 29, 2018, and October 12, 2018. Our analysis includes \(m=556\) players with at least 25 plate appearances. [8] proposed an estimator based on the weighted likelihood approach to predict a good baseball batting metric for each batter, especially when a batter has only a few plate appearances. This estimator borrows information from other similar batters to make inferences about a target batter. Our proposed Bayesian estimator using the smoothed Dirichlet prior distribution borrows information across other batters but, more importantly, also across neighboring ordinal categories (batting outcomes) to improve the estimation of cell probabilities. [9] used this Bayesian estimator with a smoothed Dirichlet prior to estimate the distribution of positive COVID-19 cases across age groups for Canadian health regions.

In our analysis, we consider \(K=11\) possible batting outcomes: SO - strikeout, GO - ground out, AO - air out, SH - sacrifice hit, SF - sacrifice fly, HBP - hit by pitch, BB - base on balls/walk, S - single, D - double, T - triple, and HR - home run. The outcome of batting in baseball can be divided into discrete categories; this is the basis for constructing metrics that evaluate the batting performance of players, and it is also the basis of our analysis. Let \(x_{ij}\) be the number of plate appearances in which batting outcome j occurs for the \(i^{\text {th}}\) batter \((j=1, 2, \ldots , K)\), and denote the number of plate appearances for the \(i^{\text {th}}\) batter by \(n_i\). The joint distribution of the counts for these 11 discrete categories for batter i is given by

$$\begin{aligned} \mathbf {x_i} = (x_{i1}, x_{i2}, \ldots ,x_{i11})^{t} \sim \text {Multinomial}(n_i,\mathbf {p_i}), \end{aligned}$$

where \(\mathbf {p_i} = (p_{i1}, p_{i2}, \ldots ,p_{i11})^{t}\) represents the vector of outcome specific probabilities satisfying \(\displaystyle \sum _{j} p_{ij}=1\). Taking a Bayesian approach, assume that

$$\begin{aligned} \mathbf {p_i} = (p_{i1}, p_{i2}, \ldots , p_{i11})^{t} \sim \text {SD}(\varvec{\alpha } = (\alpha _1, \alpha _2, \ldots , \alpha _{11})^{t}, \delta , \Delta ). \end{aligned}$$

Then, the posterior distribution for the \(i{\text {th}}\) batter is given by

$$\begin{aligned} \mathbf {p_i} | \mathbf {x_i}; \varvec{\alpha } \sim \text {SD}(\varvec{\alpha } + \mathbf {x_i}, \delta , \Delta ). \end{aligned}$$

There are essentially two groups of outcomes: strikeout, ground out, and air out are outs/dismissals, while single, double, triple, and home run are hits. Information should therefore be borrowed across cell categories within each group only. For our analysis, we use \(\Delta = \sum _{j=1}^{K-1} (p_{i(j+1)} - p_{ij})^2\), modified slightly so that the borrowing of information across cell categories is done within each group only. Assuming the batting outcomes are arranged in the order given above (SO, GO, \(\ldots\), HR), the modified penalty function is

$$\begin{aligned} \Delta = (p_{i2}-p_{i1})^2+(p_{i3}-p_{i2})^2 + (p_{i9}-p_{i8})^2+(p_{i10}-p_{i9})^2+(p_{i11}-p_{i10})^2. \end{aligned}$$
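For reference, a sketch of this grouped penalty (assuming the 11 outcomes are ordered as above and using 0-based indexing; names are illustrative):

```python
import numpy as np

def grouped_penalty(p):
    """Borrow information only within the outs group (SO, GO, AO) and the hits group (S, D, T, HR)."""
    outs = np.sum(np.diff(p[0:3]) ** 2)    # (p_i2 - p_i1)^2 + (p_i3 - p_i2)^2
    hits = np.sum(np.diff(p[7:11]) ** 2)   # (p_i9 - p_i8)^2 + ... + (p_i11 - p_i10)^2
    return outs + hits

p = np.full(11, 1 / 11)
print(grouped_penalty(p))                  # 0.0 for a flat outcome profile
```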

Fig. 3 shows that the estimates of the cell probabilities for the top 10 batters using the smoothed Dirichlet prior with different \(\delta\) are very close to the MLEs. This behavior was to be expected, given that the top 10 batters have a large number of plate appearances: in their case, as \(\delta\) increases, we see only small fluctuations from the MLEs. On the other hand, Fig. 4 shows that the estimates of the cell probabilities for the last 10 batters are very close to the overall proportion \({\bar{p}}_j = \frac{\sum _{i=1}^{556} x_{ij}}{\sum _{i=1}^{556} \sum _{j=1}^{11} x_{ij}}\). For these last ten batters, who have only a small number of plate appearances, the estimates fluctuate substantially from the MLEs as \(\delta\) increases.

Fig. 3

Comparison of MLE, the posterior expectation using the SD prior with different \(\delta\), and the overall proportion of outcomes for the top 10 batters

Fig. 4

Comparison of MLE, the posterior expectation using the SD prior with different \(\delta\), and the overall proportion of outcomes for the last 10 batters with fewer plate appearances

5 Conclusions

The proposed smoothed Dirichlet distribution constitutes a superior alternative for borrowing information among neighboring cells. The proposed smoothed Dirichlet distribution forces the probabilities of neighboring cells to be closer to each other than under the standard Dirichlet distribution. Inference results were in close agreement with the behavior we expected for the data application. This data application also shows that the smoothed Dirichlet distribution is flexible enough to accommodate different ways of borrowing information from the cells. Future research may be aimed at determining a suitable \(\delta\) for the smoothed Dirichlet distribution.