On maximum likelihood estimation of the concentration parameter of von Mises–Fisher distributions

Maximum likelihood estimation of the concentration parameter of von Mises–Fisher distributions involves inverting the ratio Rν=Iν+1/Iν\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_\nu = I_{\nu +1} / I_\nu $$\end{document} of modified Bessel functions and computational methods are required to invert these functions using approximative or iterative algorithms. In this paper we use Amos-type bounds for Rν\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_\nu $$\end{document} to deduce sharper bounds for the inverse function, determine the approximation error of these bounds, and use these to propose a new approximation for which the error tends to zero when the inverse of Rν\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_\nu $$\end{document} is evaluated at values tending to 1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1$$\end{document} (from the left). We show that previously introduced rational bounds for Rν\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_\nu $$\end{document} which are invertible using quadratic equations cannot be used to improve these bounds.

It can be shown (e.g., Schou 1978) that for ν ≥ 0, R ν is strictly increasing, and satisfies the Riccati equation R ν (t) = 1 − ((2ν + 1)/t)R ν (t) − R ν (t) 2 . As R ν and hence also its derivatives can efficiently be computed via its Perron or Gauss continued fraction representation (Gautschi and Slavik 1978;Tretter and Walster 1980;Song et al. 2012), solving R ν (t) = ρ should conveniently be achievable by standard iterative root finding techniques, provided that good starting approximations are available (which is particularly important in the right tail of R ν where R ν is rather "flat").
In this paper, we use a family of bounds for R ν first introduced in Amos (1974) to provide substantially sharper bounds for R −1 ν , which have approximation error at most 3ρ/2, and use these results to suggest a new approximation. We establish that these improved bounds also hold for the Dhillon-Sra approximation, which thus has the same maximal approximation error. We also show that the error of the suggested new approximation tends to zero for ρ → 1−, whereas the error tends to −1/2 for the Dhillon-Sra approximation, which thus is too large for large ρ. Finally, we investigate whether the rational bounds for R ν developed by Nåsell (1978) can be used to obtain improved explicit bounds for R −1 ν , and show that for the rational bounds which can be inverted by solving quadratic equations, no such improvement is possible.
We thus see that the Dhillon-Sra approximation is not invalidated by the available inverse Amos-type bounds (in the sense of being outside the range provided by these bounds), and has the same maximal approximation error as these bounds (indicating that it is indeed a good approximation).
As the Nåsell bounds considered in the previous theorem are dominated by the Amos-type bounds used for R ν , the same must be true for the respective inverses.

Numerical comparisons
In the following we compare the number of iterations required to reach convergence by two different algorithms based on nested intervals for finding roots using (1) the Tanabe et al. (2007) and (2) the newly established bounds for initialization. The two algorithms used are (a) a one-dimensional root finding algorithm as implemented in function uniroot() available in R (R Core Team 2013) and (b) the variant of the Newton-Fourier method for the case of strictly increasing concave functions (see e.g., Atkinson 1989, pp. 62-64). Concavity of R ν can be established by using that R ν is the pointwise minimum of a set of Amos-type functions (see Hornik and Grün 2013, Theorem 11). It is straightforward to show that the second derivative of these Amostype functions is non-positive and hence R ν is concave, because it is the pointwise minimum of concave functions. The numbers of iterations required are determined for a range of different d and κ values which are selected similar to those used in Sra (2012), i.e., κ = 100, 500, 1,000, 5,000, 10,000, 20,000, 50,000, 10,000 and d = 500, 1,000, 5,000, 10,000, 20,000, 10,000. Each value of d is combined with each value of κ and they are used to determine the corresponding ρ. For the given d and implied ρ the root finding algorithms are employed to determine κ again. For uniroot() convergence is assessed using argument tol and the Newton-Fourier algorithm is stopped if the relative distance of the midpoint to the endpoints of the interval is smaller than tol. This pre-specified precision value tol is varied and set to values 10 e for exponents e ∈ {−6, −8, −10, −12}.
The results are given in Table 1. For the Newton-Fourier algorithm the newly established bounds always require the same or a smaller number of iterations. In 31 % of the cases one iteration less is required, reducing the number of iterations required by 60 or 20 %. When using uniroot(), there is an improvement achievement in 55 % of the cases, the same number of iterations are required in 42 % of the cases and only in 3 % of the cases a deterioration occurs and one iteration more is required when using the newly established bounds for initialization.
Overall it has to be noted that the number of iterations is rather small to reach convergence when either of the two sets of intervals is used for initialization. However, the general tendency to require even less iterations of the newly established bounds will nevertheless be of interest from a computational point of view if these roots are solved repeatedly, which is required when for example the expectation-maximization algorithm is used to estimate mixtures of von Mises-Fisher distributions (Banerjee et al. 2005).