Normal distribution
Table 1 Convergence comparison for AIFS, IFS, and EM using crop data.
To illustrate how the steplength adjustment works in an empirical context, the AIFS method is compared with the IFS method with \(q^{(t)}=1.5\) and with the EM algorithm. Note that \(q^{(t)}=1.5\) is an arbitrarily chosen value, and that the IFS method with \(q_1^{(t)}=q_2^{(t)}=1\) is identical to the EM algorithm. Hereafter, these are referred to as AIFS, IFS, and EM, respectively. Apple crop data from Table 7.4 of Little and Rubin (2002) are used for expository purposes. The data take the same form as in Sect. 2.1, with \(n=18\) and \(m=12\). The three methods share the initial values \({\varvec{\mu }}^{(0)}=(30,30)^{\prime }\) and \(\varSigma ^{(0)}=\mathrm{diag}(100,100)\), and the convergence criterion \(\Vert \nabla \ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})\Vert <10^{-4}\), where \(\mathrm{diag}(\varvec{x})\) denotes the diagonal matrix with the elements of \(\varvec{x}\) on its diagonal and zeros elsewhere.
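For readers who wish to reproduce the baseline, a minimal sketch of the standard EM update for a bivariate normal distribution with the second variable partly missing is given below (Python/NumPy; the function name and the NaN-for-missing data layout are illustrative assumptions, not taken from the paper). As noted above, the IFS method with \(q_1^{(t)}=q_2^{(t)}=1\) coincides with the EM algorithm, so IFS and AIFS can be viewed as rescaling this increment by their steplengths.

```python
import numpy as np

def em_bivariate_normal_step(y, mu, Sigma):
    """One EM update for a bivariate normal with y[:, 1] possibly missing (NaN).

    Missing y2 values are replaced by their conditional expectations given y1,
    and the second moments are corrected by the conditional variance of y2.
    """
    n = y.shape[0]
    miss = np.isnan(y[:, 1])
    beta = Sigma[0, 1] / Sigma[0, 0]              # regression slope of y2 on y1
    resid_var = Sigma[1, 1] - beta * Sigma[0, 1]  # conditional variance of y2 given y1
    y_hat = y.copy()
    y_hat[miss, 1] = mu[1] + beta * (y[miss, 0] - mu[0])   # E-step imputation
    S = y_hat.T @ y_hat                            # completed cross-product matrix
    S[1, 1] += miss.sum() * resid_var              # E[y2^2 | y1] correction
    mu_new = y_hat.mean(axis=0)                    # M-step: mean of completed data
    Sigma_new = S / n - np.outer(mu_new, mu_new)   # M-step: ML covariance update
    return mu_new, Sigma_new
```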
Table 1 gives \(\varDelta ({\varvec{\alpha }}^{(t)})=|\ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})-\ell _\mathrm{obs}(\varvec{\alpha }^{*})|\), the steplength \(s^{(t)}q^{(t)}\) of AIFS and IFS, and \(q^{(t)}\) of EM. All values are rounded to four decimal places; thus, 0.0000 denotes a value rounded to zero, not an exact zero. An entry “–” denotes an empty cell, used where the steplength is not applicable or the computation has already terminated. In the initial stages, EM comes closest to \(\ell _\mathrm{obs}(\varvec{\alpha }^{*})\). In later stages, as \(\varDelta ({\varvec{\alpha }}^{(t)})\) approaches zero, EM falls behind the other two methods, while AIFS and IFS move closer to \(\varvec{\alpha }^{*}\). Consistent with Theorem 2, AIFS converges faster than IFS and EM around the convergent point; this example further shows that AIFS is the fastest of the three over the whole sequence.
The standard error of the estimates obtained by the (A)IFS method is computed by the IFS-Variance method. In this example, the second derivative of \(\ell _\mathrm{obs}(\varvec{\alpha })\) can be obtained explicitly (e.g., Savalei 2010), and thus so can the standard error matrix. The distance between the two, measured by the sum of squared differences, was less than \(10^{-8}\), indicating that the IFS-Variance method works well here.
Poisson mixture distribution
Three parameter estimation methods are compared using empirical data: the AIFS method, the IFS method with the arbitrarily guessed initial steplength \(q^{(t)}=2\), and the EM algorithm. These are denoted by AIFS, IFS, and EM, respectively. Mortality data from the London Times newspaper for 1910–1912 (Hasselblad 1969; Titterington 1984) are used for this purpose. The data record the number of deaths per day and the number of days on which \(y_i\) deaths occurred (\(i=1,\ldots ,1096\)). The data are better fitted by a two-component Poisson mixture model than by a single-component model (Lange 1995b). The initial values are the moment estimates \((\pi ^{(0)},\alpha _1^{(0)},\alpha _2^{(0)})=(0.2870,2.582,1.101)\), and the convergence criterion is \(\Vert \nabla \ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})\Vert <10^{-4}\) for all methods.
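For concreteness, a minimal sketch of the EM update that serves as the baseline in this comparison is given below (Python/SciPy; the function name, variable names, and the grouped (count, frequency) data layout, which is equivalent to the per-day records described above, are illustrative assumptions). Starting this update from the moment estimates above and iterating until the convergence criterion is met corresponds to the EM run compared in Fig. 1.

```python
import numpy as np
from scipy.stats import poisson

def em_poisson_mixture_step(counts, freqs, pi, a1, a2):
    """One EM update for a two-component Poisson mixture on grouped data.

    counts[k] is a distinct observed count (deaths per day) and freqs[k] is
    the number of days on which that count occurred.
    """
    # E-step: posterior probability that each count comes from component 1
    f1 = poisson.pmf(counts, a1)
    f2 = poisson.pmf(counts, a2)
    tau = pi * f1 / (pi * f1 + (1.0 - pi) * f2)
    # M-step: frequency-weighted updates of the mixing proportion and rates
    n = freqs.sum()
    w1 = (freqs * tau).sum()
    w2 = (freqs * (1.0 - tau)).sum()
    pi_new = w1 / n
    a1_new = (freqs * tau * counts).sum() / w1
    a2_new = (freqs * (1.0 - tau) * counts).sum() / w2
    return pi_new, a1_new, a2_new
```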
Figure 1 shows the convergence process for AIFS, IFS, and EM. The horizontal axis indicates the iteration number up to the 300th iteration, and the vertical axis is \(\varDelta ({\varvec{\alpha }}^{(t)})\). Within the first 300 iterations AIFS converges, whereas the other two methods are still converging. When the convergence criterion is satisfied, the log-likelihood value is \(-1989.946\) for each of AIFS, IFS, and EM; this is the value reported as the maximum of the log-likelihood function in Lange (1995b). The number of iterations until convergence is 196, 1474, and 2208 for AIFS, IFS, and EM, respectively. The figure shows that AIFS converges much faster than the other two methods, although its decrease is less smooth; IFS and EM decrease smoothly. Around the convergent point of IFS and EM, which is not shown in the figure, IFS satisfies the convergence criterion sooner than EM. The limitation of EM here is its slow behavior around the convergent point. It is thus confirmed that AIFS converges faster than IFS, and IFS faster than EM, over the whole sequence.
The standard error can again be obtained through the IFS-Variance method. In this example, the standard error of the estimates of \((\pi ,\alpha _1,\alpha _2)\) can also be obtained in explicit form from the second derivative of \(\ell _\mathrm{obs}(\varvec{\alpha })\). The sum of squared differences over all elements between the two is less than \(10^{-6}\); thus, the IFS-Variance method again works well.
Multivariate t-distribution
Multivariate t-distributions are often used in data analysis, especially in the context of robust statistics (McLachlan and Peel 2000). Let the observed data \(\varvec{y}_1,\ldots ,\varvec{y}_n\) follow the p-variate t-distribution \(t_p({\varvec{\mu }},\varSigma ,v)\), where \({\varvec{\mu }}\) is a location vector, \(\varSigma \) is a scale matrix, and v is the known degrees of freedom. The log-likelihood function of this t-distribution for \(\varvec{y}_i\) (\(i=1,\dots ,n\)) is denoted by \(\ell _\mathrm{obs}(\varvec{\alpha })\). In the following, the parameters to be estimated, \({\varvec{\mu }}\) and the nonredundant elements of \(\varSigma \), are collectively written as \(\varvec{\alpha }\). To estimate \(\varvec{\alpha }\), it is assumed that the complete data are \((\varvec{y}_i,z_i)\) (\(i=1,\ldots ,n\)) and that
$$\begin{aligned} \varvec{y}_i|z_i\sim N_p({\varvec{\mu }},z_i^{-1}\varSigma ),~~z_i\sim v^{-1}\chi _v^2, \end{aligned}$$
where \(N_p({\varvec{\mu }},z_i^{-1}\varSigma )\) is a normal distribution with mean \({\varvec{\mu }}\) and covariance matrix \(z_i^{-1}\varSigma \), and \(\chi _v^2\) is a chi-squared distribution with v degrees of freedom. The complete-data log-likelihood function is
$$\begin{aligned} \ell _\mathrm{com}(\varvec{\alpha })\propto & {} - \frac{n}{2} \log |v^{-1}\varSigma | -\sum _{i=1}^n\frac{1}{2} (\varvec{y}_i-{\varvec{\mu }})^{\prime } (z_i^{-1}\varSigma )^{-1}(\varvec{y}_i-{\varvec{\mu }}). \end{aligned}$$
The (A)IFS method is derived as follows. The first derivative of \(\ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})\) is
$$\begin{aligned}&\nabla \ell _\mathrm{obs}({\varvec{\alpha }}^{(t)}) \\&\quad = \sum _{i=1}^n \frac{v+p}{v+h_i^{(t)}} \left( \begin{array}{c} \left( \varSigma ^{(t)}\right) ^{-1}(\varvec{y}_i-{\varvec{\mu }}^{(t)})\\ 0.5\,\mathrm{vec}(\left( \varSigma ^{(t)}\right) ^{-1}(\varvec{y}_i-{\varvec{\mu }}^{(t)})(\varvec{y}_i-{\varvec{\mu }}^{(t)})^{\prime }\left( \varSigma ^{(t)}\right) ^{-1}) \end{array} \right) \\&\qquad - \sum _{i=1}^n \left( \begin{array}{c} {\varvec{0}}\\ 0.5\,\mathrm{vec}(\left( \varSigma ^{(t)}\right) ^{-1}) \end{array} \right) , \end{aligned}$$
where \(h_i^{(t)}=(\varvec{y}_i-{\varvec{\mu }}^{(t)})^{\prime } \left( \varSigma ^{(t)}\right) ^{-1}(\varvec{y}_i-{\varvec{\mu }}^{(t)})\). To derive \(\nabla \ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})\), \(\nabla ^{10}Q({\varvec{\alpha }}^{(t)}|{\varvec{\alpha }}^{(t)})\) is used, since it is easier to calculate. The inverse matrix of the expected second derivative is
$$\begin{aligned} J_\mathrm{com}(\varvec{\alpha })^{-1}= \left( \begin{array}{cc} \varSigma &{} O\\ O &{} 2\varSigma \otimes \varSigma \end{array} \right) . \end{aligned}$$
Thus, the iteration equation of the IFS method is given as
$$\begin{aligned} {\varvec{\mu }}^{(t+1)}= & {} {\varvec{\mu }}^{(t)} + q_1^{(t)}\frac{1}{n}\sum _{i=1}^n \frac{v+p}{v+h_i^{(t)}} (\varvec{y}_i-{\varvec{\mu }}^{(t)}),\\ \varSigma ^{(t+1)}= & {} \varSigma ^{(t)} + q_2^{(t)} \frac{1}{n}\sum _{i=1}^n\nonumber \\&\times \left\{ \frac{v+p}{v+h_i^{(t)}} (\varvec{y}_i-{\varvec{\mu }}^{(t)})(\varvec{y}_i-{\varvec{\mu }}^{(t)})^{\prime } - \varSigma ^{(t)} \right\} , \end{aligned}$$
where \(q_1^{(t)}\) and \(q_2^{(t)}\) denote the steplengths, which may differ between the two updates.
The IFS method with
$$\begin{aligned} q_1^{(t)} = n\left( \sum _{i=1}^n\frac{v+p}{v+h_i^{(t)}}\right) ^{-1}~\mathrm{and}~ q_2^{(t)} = 1 \end{aligned}$$
(8)
becomes the iteration of the EM algorithm. These steplengths can be obtained in the same way as for the mixture of Poisson distributions. Let \(\varvec{d}_1\) be the direction for \({\varvec{\mu }}\). The equation that yields the exact optimal steplength is
$$\begin{aligned} \frac{d}{dq_1^{(t)}} \ell _\mathrm{obs}({\varvec{\mu }}^{(t)}+q_1^{(t)}\varvec{d}_1^{(t)},\varSigma ^{(t)})=0. \end{aligned}$$
This equation is difficult to solve directly, so the following is solved instead:
$$\begin{aligned} \sum _{i=1}^n \frac{v+p}{v+h_i^{(t)}} \varvec{d}_1^{(t)\prime }\left( \varSigma ^{(t)}\right) ^{-1} (\varvec{y}_i-{\varvec{\mu }}^{(t)}-q_1^{(t)}\varvec{d}_1^{(t)}) = 0, \end{aligned}$$
where \(h_i^{(t)}\) is fixed at the current estimate, although it is actually a function of \(\varvec{\alpha }^{(t)}\). This leads to \(q_1^{(t)}\) of Eq. (8). A similar argument for \(q_2^{(t)}\) gives \(q_2^{(t)}=1\). Fixing \(h_i^{(t)}\) is justified when the direction vector of the IFS method is sufficiently close to zero, that is, when \({\varvec{\alpha }}^{(t)}\) is near the convergent point. Near convergence, therefore, the directions of the EM algorithm and the IFS method are almost identical.
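To make the step from this linearized equation to Eq. (8) explicit, write \(w_i^{(t)}=(v+p)/(v+h_i^{(t)})\) (a shorthand introduced here only) and note from the \({\varvec{\mu }}\)-update above that \(\varvec{d}_1^{(t)}=n^{-1}\sum _{i=1}^n w_i^{(t)}(\varvec{y}_i-{\varvec{\mu }}^{(t)})\). Solving the linear equation for \(q_1^{(t)}\) then gives
$$\begin{aligned} q_1^{(t)} = \frac{\sum _{i=1}^n w_i^{(t)}\, \varvec{d}_1^{(t)\prime }\left( \varSigma ^{(t)}\right) ^{-1}(\varvec{y}_i-{\varvec{\mu }}^{(t)})}{\left( \sum _{i=1}^n w_i^{(t)}\right) \varvec{d}_1^{(t)\prime }\left( \varSigma ^{(t)}\right) ^{-1}\varvec{d}_1^{(t)}} = \frac{n\,\varvec{d}_1^{(t)\prime }\left( \varSigma ^{(t)}\right) ^{-1}\varvec{d}_1^{(t)}}{\left( \sum _{i=1}^n w_i^{(t)}\right) \varvec{d}_1^{(t)\prime }\left( \varSigma ^{(t)}\right) ^{-1}\varvec{d}_1^{(t)}} = n\left( \sum _{i=1}^n w_i^{(t)}\right) ^{-1}, \end{aligned}$$
which is exactly the \(q_1^{(t)}\) of Eq. (8).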
The IFS method with
$$\begin{aligned} q_1^{(t)} = q_2^{(t)} = n\left( \sum _{i=1}^n\frac{v+p}{v+h_i^{(t)}}\right) ^{-1} \end{aligned}$$
becomes the iteration of the parameter-expansion EM algorithm (PX-EM algorithm; Liu et al. 1998), which was developed to accelerate the convergence of the EM algorithm. The IFS method with this common steplength (i.e., the PX-EM algorithm) is expected to be faster than the IFS method with the different steplengths of Eq. (8) (i.e., the EM algorithm). The question is how this steplength can be improved further.
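To show how these steplength choices enter the same iteration, a minimal sketch of a single IFS update for the multivariate t-distribution is given below (Python/NumPy; the function name and argument conventions are illustrative assumptions). Passing \(q_1^{(t)}\) as in Eq. (8) together with \(q_2^{(t)}=1\) reproduces the EM iteration, the common steplength above reproduces PX-EM, and supplying an initial value such as 1.2 corresponds to the IFS variant compared below (AIFS and IFS may further adjust the steplength during the run).

```python
import numpy as np

def ifs_t_step(y, mu, Sigma, v, q1=None, q2=None):
    """One IFS update for the p-variate t-distribution with known df v.

    q1 = n / sum(w_i) (Eq. 8) with q2 = 1 gives the EM iteration;
    q1 = q2 = n / sum(w_i) gives the PX-EM iteration; user-supplied values
    give an IFS step with those steplengths. Defaults reproduce EM.
    """
    n, p = y.shape
    r = y - mu                                    # y_i - mu, shape (n, p)
    Sinv = np.linalg.inv(Sigma)
    h = np.einsum('ij,jk,ik->i', r, Sinv, r)      # h_i = (y_i-mu)' Sigma^{-1} (y_i-mu)
    w = (v + p) / (v + h)                         # weights (v+p)/(v+h_i)
    if q1 is None:
        q1 = n / w.sum()                          # EM/PX-EM steplength for mu
    if q2 is None:
        q2 = 1.0                                  # EM steplength for Sigma
    mu_new = mu + q1 * (w[:, None] * r).mean(axis=0)
    dir_Sigma = (w[:, None, None] * np.einsum('ij,ik->ijk', r, r)).mean(axis=0) - Sigma
    Sigma_new = Sigma + q2 * dir_Sigma
    return mu_new, Sigma_new
```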
Here, these four methods are compared: the AIFS method, the IFS method with the initial steplength \(q^{(t)}=1.2\), the PX-EM algorithm, and the EM algorithm. These are referred to as AIFS, IFS, PXEM, and EM, respectively. To compare these methods, data of size \(n=100\) are randomly generated from the multivariate t-distribution with the location parameter \({\varvec{\mu }}=(1,2,3)^{\prime }\) and the scale matrix
$$\begin{aligned} \varSigma = \left( \begin{array}{ccc} 1.0&{}\quad 0.9 &{}\quad 0.5 \\ 0.9&{}\quad 1.0&{}\quad 0.3\\ 0.5&{}\quad 0.3&{}\quad 1.0 \\ \end{array} \right) \end{aligned}$$
with six degrees of freedom. For all four computation methods, the initial values are the moment estimates, and the convergence criterion is \(\Vert \nabla \ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})\Vert <10^{-4}\).
Table 2 Convergence comparison for AIFS, PXEM, IFS, and EM using randomly generated data
Table 2 shows the process of convergence for the four methods in terms of \(\varDelta ({\varvec{\alpha }}^{(t)})\) and the steplength chosen to improve the current value at each iteration. Empty cells, “–”, occur for the same reason as in the normal distribution example above. All methods attain the same maximum value of the log-likelihood function. All values are rounded to four decimal places; thus, again, 0.0000 for \(\varDelta ({\varvec{\alpha }}^{(t)})\) denotes a rounded zero. The numbers of iterations until convergence are 10, 15, 11, and 19 for AIFS, IFS, PXEM, and EM, respectively. Neither AIFS nor IFS requires any steplength adjustment. AIFS and PXEM require almost the same number of iterations until convergence. From the first to the fourth iteration, AIFS is closer to the convergent point than IFS and EM, but farther from it than PXEM. From the fifth iteration onward, AIFS becomes closer to the convergent point than PXEM. This indicates that AIFS improves more slowly than PXEM in the initial iterations, but more quickly in subsequent iterations. In this example, AIFS converges in almost the same number of iterations as PXEM, both are faster than IFS, and IFS is faster than EM, over the whole sequence. Although applying PXEM requires an extended model that contains the original model as a special case, AIFS can be applied even when no such extended model can be found; in such cases, AIFS can substitute for PXEM.
Dirichlet distribution
The (A)IFS method is applied to the parameter estimation of a Dirichlet distribution. The Dirichlet distribution admits no explicit form of the EM iteration, since the iteration involves the gamma function and, in practice, its derivative. The EM algorithm must therefore resort to Newton–Raphson-type iterations, losing its simplicity. The (A)IFS method, on the other hand, can compute the parameter estimates without resorting to such an inner computational method.
Let \(\varvec{y}=(y_1,\ldots ,y_p)^{\prime }\) be a data vector taken independently from the Dirichlet distribution with density function,
$$\begin{aligned} \frac{\varGamma (\sum _{j=1}^p\alpha _j)}{\varGamma (\alpha _1)\varGamma (\alpha _2)\ldots \varGamma (\alpha _p)}y_1^{\alpha _1-1}y_2^{\alpha _2-1}\ldots y_p^{\alpha _p-1}, \end{aligned}$$
(9)
where \((\alpha _1,\ldots ,\alpha _p)\) is the parameter vector. The Dirichlet random vector can be derived by transforming independent gamma random variables. Assume that \(x_1,\ldots ,x_p\) are independent positive random variables and that each \(x_j\) follows a gamma distribution with shape parameter \(\alpha _j\) (\(j=1,\ldots ,p\)). By setting
$$\begin{aligned} \frac{x_j}{\sum _{k=1}^{p}x_k} =y_j,~~j=1,\ldots ,p, \end{aligned}$$
the vector \((y_1,\ldots ,y_p)\) is distributed according to Eq. (9) (Mosimann 1962). Thus \((y_1,\ldots ,y_p)\) can be regarded as the incomplete data and \((x_1,\ldots ,x_p)\) as the complete data for the estimation of the parameters \((\alpha _1,\ldots ,\alpha _p)\).
Assume that we have a sample of size n taken independently from the Dirichlet distribution, \(\varvec{y}_{1},\ldots ,\varvec{y}_{n}\), with \(\varvec{y}_i=(y_{i1},\ldots ,y_{ip})\). The observed-data log-likelihood function is
$$\begin{aligned} \ell _\mathrm{obs}(\varvec{\alpha })= & {} n \log \varGamma \left( \sum _{j=1}^p \alpha _j\right) -n \sum _{j=1}^p \log \varGamma (\alpha _j) \\&+ \sum _{j=1}^p(\alpha _j-1) \sum _{i=1}^n \log y_{ij}, \end{aligned}$$
where \(\varvec{\alpha }=(\alpha _1,\ldots ,\alpha _p)^{\prime }\). The complete-data log-likelihood function is, ignoring an irrelevant term,
$$\begin{aligned} \ell _\mathrm{com}(\varvec{\alpha }) = -n \sum _{j=1}^p \log \varGamma (\alpha _j) + \sum _{j=1}^{p} (\alpha _j-1) \sum _{i=1}^n \log x_{ij}. \end{aligned}$$
As mentioned earlier, applying the EM algorithm does not produce an explicit form of the iteration. In fact, the E-step yields
$$\begin{aligned} Q(\varvec{\alpha }|{\varvec{\alpha }}^{(t)})= & {} -n \sum _{j=1}^p \log \varGamma (\alpha _j) + \sum _{j=1}^{p} (\alpha _j-1) \\&\sum _{i=1}^n E[\log x_{ij}|\varvec{y}_i;{\varvec{\alpha }}^{(t)}]. \end{aligned}$$
Setting the first derivative of this Q-function to zero yields an equation that cannot be solved explicitly with respect to \(\varvec{\alpha }\), since the derivative of \(\varGamma (\alpha _j)\) is involved. To carry out the M-step, we must therefore resort to Newton–Raphson-type iterations; in this case, using the EM algorithm brings no benefit such as simplicity.
On the other hand, the (A)IFS method does not need to solve the first-derivative equation of the Q-function. To apply the (A)IFS method, we need the direction vector \(\varvec{d}=n^{-1}J_\mathrm{com}(\varvec{\alpha })^{-1}\nabla \ell _\mathrm{obs}(\varvec{\alpha })\). First, we compute the Fisher information matrix for a single datum:
$$\begin{aligned} J_\mathrm{com}(\varvec{\alpha })=- \frac{1}{n} E\left[ \frac{\partial ^2\ell _\mathrm{com}(\varvec{\alpha })}{\partial \varvec{\alpha }\partial \varvec{\alpha }^{\prime }} \right] = \mathrm{diag}(\varPsi _1(\alpha _1),\ldots ,\varPsi _1(\alpha _p)), \end{aligned}$$
where \(\varPsi (\alpha _j)\) denotes the digamma function and \(\varPsi _1(\alpha _j)\) its first derivative (the trigamma function). Next, we compute the first derivative of the observed-data log-likelihood function, whose jth element is
$$\begin{aligned} n \varPsi \left( \sum _{k=1}^p \alpha _k \right) - n \varPsi (\alpha _j) + \sum _{i=1}^n \log y_{ij}. \end{aligned}$$
As a result, the direction vector has its jth element as
$$\begin{aligned} \frac{\varPsi \left( \sum _{k=1}^p \alpha _k \right) - \varPsi (\alpha _j) + n^{-1}\sum _{i=1}^n \log y_{ij}}{\varPsi _1(\alpha _j)}. \end{aligned}$$
In this case, the iteration is identical to that of the EM gradient algorithm (Lange 1995a) except for the steplength.
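As a concrete illustration of this direction vector, a minimal sketch of one such update for the Dirichlet parameters is given below (Python/SciPy; the function name and the choice to precompute the mean log-observations are illustrative assumptions). With a fixed steplength q it coincides with the EM gradient/IFS iteration; AIFS would instead choose and adjust the steplength at each iteration.

```python
from scipy.special import digamma, polygamma

def ifs_dirichlet_step(alpha, logy_mean, q=1.0):
    """One IFS-style update for the Dirichlet parameters alpha.

    logy_mean[j] = (1/n) * sum_i log y_ij are the sufficient statistics.
    q is the steplength: Lange (1995a) suggests q = 2 for the EM gradient
    algorithm, while AIFS would tune q at every iteration.
    """
    trigamma = polygamma(1, alpha)                              # Psi_1(alpha_j)
    score = digamma(alpha.sum()) - digamma(alpha) + logy_mean   # (1/n) * gradient of l_obs
    direction = score / trigamma                                # d = n^{-1} J_com^{-1} gradient
    return alpha + q * direction
```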
The AIFS method is compared with the IFS method for the data from Mosimann (1962). The data represent the proportions of diet ingredients for ducklings, with \(n=23\) observations and \(p=3\) variables. The same data are used in Lange's (1995a) paper on the EM gradient algorithm. The EM gradient algorithm has the same form of iteration as the IFS method shown in Eq. (1), and both have an arbitrary steplength. A steplength of two is used here, because Lange (1995a) suggests that a steplength of two approximately halves the number of iterations until convergence for the EM gradient algorithm. Hereafter, the AIFS method and the IFS method with the steplength of two are referred to as AIFS and IFS, respectively.
Starting from the initial value \((1.0, 1.0, 1.0)\), AIFS and IFS converge to the same value of the observed-data log-likelihood function, 73.1250, with the same estimates \((3.22, 20.38, 21.69)\), which are set to be \(\varvec{\alpha }^{*}\) for later use. The convergence criterion is \(\Vert \nabla \ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})\Vert <10^{-4}\), as before. The number of iterations until convergence is 193 for AIFS and 735 for IFS. The process of convergence up to the 50th iteration is shown in Fig. 2. The horizontal axis indicates the iteration number up to the 50th iteration, and the vertical axis is \(\varDelta ({\varvec{\alpha }}^{(t)})=|\ell _\mathrm{obs}({\varvec{\alpha }}^{(t)})-\ell _\mathrm{obs}(\varvec{\alpha }^{*})|\). Note that neither method has yet converged within the range of the graph. During the initial iterations, IFS is closer to the convergent point than AIFS; thereafter, AIFS becomes closer. This example illustrates that AIFS and IFS can handle models for which the EM algorithm cannot be used without resorting to numerical methods, and that they do so without relying on other computational methods. It also shows, with the duckling diet data, that IFS, which is equivalent to the EM gradient algorithm, can be drastically improved by AIFS, which appropriately adjusts the steplength at each iteration.
Computing time
In the previous four subsections, the (A)IFS method was compared with the EM algorithm and its variants from the perspective of the number of iterations until convergence. The theorems given above also mainly concern the number of iterations. Another perspective of interest is computing time. In this subsection, we compare the AIFS method, the IFS method, the PX-EM algorithm, and the EM algorithm in terms of computing time using simulation.
We conduct four simulations, one corresponding to each example above. In each simulation, the same data set as in the corresponding example is used with the same initial values so that the same comparison can be made. The parameter estimation is performed 1000 times, and the computing time is measured in milliseconds, yielding the mean and standard deviation of the computing times. Note that although the multivariate t-distribution example uses randomly generated data, the simulation uses the data set of Sect. 5.3 as a fixed data set.
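A sketch of how such timings could be collected is given below (Python; the helper name and the use of time.perf_counter are illustrative assumptions rather than a description of the simulation code actually used).

```python
import time
import numpy as np

def time_estimation(run_once, n_rep=1000):
    """Run a parameter-estimation routine n_rep times and return the mean
    and standard deviation of its computing time in milliseconds."""
    times_ms = np.empty(n_rep)
    for r in range(n_rep):
        start = time.perf_counter()
        run_once()                  # e.g. AIFS, IFS, PX-EM, or EM on the fixed data set
        times_ms[r] = (time.perf_counter() - start) * 1000.0
    return times_ms.mean(), times_ms.std()
```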
Table 3 Computing time (milliseconds) and its standard deviation in parentheses. Note that “dist.” stands for “distribution,” and all values are rounded
At each iteration, the (A)IFS method requires more computation than the EM algorithm, because the (A)IFS method performs steplength adjustment: it sets an initial steplength and adjusts it if necessary. Thus, if the number of iterations until convergence were the same for the (A)IFS method and the EM algorithm on the same data, the (A)IFS method would take a longer computing time than the EM algorithm. However, this is not the case: the number of iterations until convergence differs between the (A)IFS method and the EM algorithm. As all the examples show, the (A)IFS method requires fewer iterations until convergence. Although the (A)IFS method adds computation at each iteration, the total computing time should be shorter provided the number of iterations decreases sufficiently. A moderate decrease in the number of iterations would merely offset the per-iteration increase, leaving the total computing time unchanged, whereas a large decrease would shorten the total computing time.
Table 3 shows the results of the simulations. In the table, AIFS denotes the AIFS method, and IFS denotes the IFS method with an initial steplength of 1.5 for the normal distribution example, 2 for the Poisson mixture and Dirichlet distribution examples, and 1.2 for the multivariate t-distribution example. PXEM and EM denote the parameter-expansion EM algorithm and the EM algorithm, respectively. The mean computing time is given with its standard deviation in parentheses. All values are rounded to four decimal places.
In the normal distribution example, the number of iterations decreases from 24 for EM to 17 for IFS and to 12 for AIFS. This decrease is not large enough to offset the increase in per-iteration computation due to the use of IFS and AIFS, so the ordering of computing times appears largely to reflect the ordering of computational cost per iteration: AIFS requires more computation than IFS, since it computes the initial steplength and adjusts it, and IFS requires more than EM, since it adjusts the initial steplength of 1.5.
In the remaining simulations, AIFS always takes the shortest computing time. This reflects the fact that the number of iterations under AIFS is small enough, while the additional computation per iteration is small: AIFS uses a theoretically good initial steplength, and the selected steplength needs almost no adjustment. IFS outperforms EM in the multivariate t-distribution example, but not in the normal distribution and Poisson mixture examples. The reason may be that IFS uses a guessed initial steplength, which then requires appropriate adjustment.
Taken as a whole, the results indicate that the AIFS method performs better in terms of computing time as well as the number of iterations. Considering also that the AIFS method can compute the standard errors of the estimates without additional programming, the AIFS method is all the more preferable to the IFS method, the EM algorithm, and its variants.
A practical issue in deciding whether to use the AIFS method is that it is not known, before the actual computation, which of the AIFS method, the IFS method, or the EM algorithm will take the shortest computing time. This issue is beyond the scope of this paper and will be addressed in future work.