Bayesian Inference on the Bimodality of the Generalized von Mises Distribution

This article introduces Bayesian inference on the bimodality of the generalized von Mises (GvM) distribution for planar directions (Gatto and Jammalamadaka in Stat Methodol 4(3):341–353, 2007). The GvM distribution is a flexible model that can be axial symmetric or asymmetric, unimodal or bimodal. Two inferential approaches are analysed. The first is the test of null hypothesis of bimodality and Bayes factors are obtained. The second approach provides a two-dimensional highest posterior density (HPD) credible set for two parameters relevant to bimodality. Based on the identification of the two-dimensional parametric region associated with bimodality, the inclusion of the HPD credible set in that region allows us to infer on the bimodality of the underlying GvM distribution. A particular implementation of the Metropolis–Hastings algorithm allows for the computation of the Bayes factors and the HPD credible sets. A Monte Carlo study reveals that, whenever the samples are generated under a bimodal GvM, the Bayes factors and the HPD credible sets do clearly confirm the underlying bimodality.


Introduction
Measurements of directional type appear in many scientific fields: the direction flight of a bird and the direction of earth's magnetic pole are two examples. These directions can be in the plane, viz. in two dimensions, as in the first example, or they can be in the space, viz. in three dimensions, as in the second example. These measurements are called directional data, and they appear in various scientific fields: in the analysis of protein structure, cf. e.g. Kim et al [23], in machine learning, cf. e.g. Navarro et al [30], in ecology, cf. e.g. Jammalamadaka and Ulric [35], in ornithology, cf. e.g. Schmidt-Koenig [38], in oceanography, cf. e.g. Lin and Dong [26], in meteorology, c.f. e.g. Zhang et al [41], etc. A two-dimensional direction is a point in R 2 without magnitude, e.g. a unit vector. It can also be represented as a point on the circumference of the unit circle or as an angle, measured for example in radians and after fixing the null direction and the sense of rotation (clockwise or counterclockwise). Because of this circular representation, observations on two-dimensional directional data are distinctively called circular data. During the last two or three decades, there has been a considerable rise of interest for statistical methods for the analysis of directional data. Recent applications can be found, for example, in Ley and Verdebout [25]. Some monographs on this topic are Mardia and Jupp [27], Jammalamadaka and SenGupta [20], Ley and Verdebout [24] and also Pewsey et al. [32]. A basic introduction is given in Gatto and Jammalamadaka [14], and a recent review can be found in Pewsey and García-Portugués [31]. We refer to Fisher [8] for the presentation of circular data from various scientific fields.
This article considers the problem of Bayesian inference on the bimodality of the generalized von Mises (GvM) circular distribution, which is defined in (1). The number of modes for a circular distribution has practical relevance in various fields. It is the case, for example, in meteorology, precisely in the context of the analysis of directions of winds, and in biology, precisely in the study of directions taken by various animals.
Besides its greater flexibility in terms of asymmetry and bimodality, the GvM distribution possesses the following properties that other asymmetric or multimodal circular distributions usually do not have. First, after a re-parametrization, the GvM distribution belongs to the canonical exponential class. In this form, it admits a minimal sufficient and complete statistic. One can refer to Section 2.1 of Gatto and Jammalamadaka [13] for these facts. Second, the maximum likelihood estimator and the trigonometric method of moments estimator of the parameters are the same; cf. Section 2.1 of Gatto [10]. As explained in the next paragraph, the computation of the maximum likelihood estimator is simpler with the GvM distribution than with the mixture of two vM distributions. Third, it is shown in Section 2.2 of Gatto and Jammalamadaka [13] that for fixed trigonometric moments of orders one and two, the GvM distribution is the one with largest entropy. The "maximum entropy principle" for selecting a distribution on the basis of partial knowledge tells that one should always choose distributions having maximal entropy, within distributions satisfying the partial knowledge. In Bayesian statistics, whenever a prior distribution has to be selected and information on the first two trigonometric moments is available, then the GvM is the optimal prior, cf. Berger [4], Section 3.4. For other information theoretic properties of the GvM, see Gatto [11].
Mixture of circular distributions has been proposed in the literature. For example, McVinish and Mengersen [28] examined the use of Bayesian mixtures of triangular distributions in the context of semiparametric analysis of circular data. The mixture of two vM or other distributions provides an alternative bimodal or asymmetric model to the GvM. However, that mixture model does not share the given properties of the GvM. The mixture is not necessarily more practical. While the likelihood of the GvM distribution is bounded, the likelihood of the mixture of the vM(μ 1 , κ 1 ) and the vM(μ 2 , κ 2 ) distributions is unbounded. As κ 1 → ∞, the likelihood when μ 1 is equal to any one of the sample values tends to infinity. This follows from I 0 (κ 1 ) ∼ (2πκ 1 ) −1/2 e κ 1 , as κ 1 → ∞; cf. Abramowitz and Stegun [1], 9.7.1 at p. 377. For alternative estimators to the maximum likelihood for vM mixtures, refer to Spurr and Koutbeiy [39].
The GvM distribution has been applied in various scientific fields, and some recent applications are: Zhang et al. [41], in meteorology, Lin and Dong [26], in oceanography, Astfalck et al. [2], in offshore engineering, and Christmas [7], in signal processing. In spectral analysis of stationary processes, the bimodality of the spectral distribution is an important property. For instance, the frequency domain approach to fatigue life analysis is common in the design of mechanical components that are subject to random loads. In this context, bimodal spectral densities are often observed in the chassis components of road vehicles, cf. Fu and Cebon [9]. In the context of time series, Gatto [12] proposed the GvM spectral distribution as the one that maximizes the spectral entropy. However, the Bayesian literature on the GvM model is practically inexistent: one can only find the Bayesian test on the symmetry of the GvM distribution by Salvador and Gatto [36].
As mentioned, the aim of this article is to introduce Bayesian inference on the bimodality of the GvM distribution. Two approaches of Bayesian inference are considered. The first concerns test of hypotheses, where the null hypothesis H 0 is the bimodality of (1) and the alternative hypothesis H 1 is unimodality. This test was motivated by Basu and Jammalamadaka [3], who proposed a similar test for mixtures of vM distributions. The Bayes factor of this test is obtained. The second approach concerns HPD credible set. Two parameters relevant for bimodality are obtained and the HPD credible set for those parameters is compared with the subset of all parameters yielding bimodality.
We can summarize the Bayesian test as follows. Define by the set of all vectors of parameters ψ = (κ 1 , κ 2 , μ 1 , μ 2 ) of (1) that lead to bimodality and denote the prior probability of bimodality by p = P ψ ∈ . Because we assume that the elements of ψ are independent, the computation of p can be done by simple Monte Carlo. Assume that the components of the sample θ = (θ 1 , . . . , θ n ) are independent and identically distributed (i.i.d.), with GvM distribution (1). Let P [B|θ] be the conditional probability given θ, viz. the sample, for any measurable set B. The Bayesian test requires the posterior probability of bimodality P ψ ∈ |θ .
In the posterior distribution, the components of ψ are generally not independent. The computation of the posterior probability is done by Markov chain Monte Carlo (MCMC). We remind that MCMC is a class of iterative simulation techniques that is based on three main algorithms: the Metropolis, due to Metropolis et al. [29], the Metropolis-Hastings, due to Hastings [17] and the Gibbs sampler due to Geman and Geman [16]. MCMC algorithms allow to generate a Markov chain whose equilibrium distribution coincides with the posterior distribution. The states of the Markov chain are recorded only after a certain initial period, usually called burn-in period: after this period the generated values of the Markov chain become very close to generations from the posterior distribution.
Precisely, we propose the following nested simulation scheme for the posterior distribution of ψ. The main algorithm is the Gibbs sampler: we generate from the full conditional distributions of each parameter. The full conditional of one element of ψ is the posterior conditional distribution of that element given all other parameters fixed. The generation from each one of the full conditionals is done with the Metropolis-Hastings (MH) algorithm. Here, the sampling or instrumental density is a piecewise function that interpolates the full conditional.
The simulations from prior and posterior distributions are used to compute prior and posterior probability of bimodality. These probabilities allow us to infer on the bimodality by means of Bayes factors and HPD credible sets.
The next sections of the article present the following topics. Section 2 presents two Bayesian approaches for the bimodality of the GvM distribution: the testing approach, with the Bayes factor, and the HPD approach. It shows how to obtain posterior and full conditional distributions, for some given prior distributions. Then, a nested simulation algorithm from the posterior distribution is explained. It uses the Gibbs sampler and a MH algorithm. Section 3 presents the results of the simulation study. Bayes factors for unimodal and bimodal samples are obtained. HPD credible intervals for single parameters are presented. A bivariate HPD credible set for two parameters that determine bimodality of the GvM is given. Gelman's convergence diagnostic is presented. This method consists in studying the variance between, the variance within and the shrink factor of m simultaneous chains. We refer to [15] for this. Simulation results under a more general prior are given at the end of the numerical study. Final remarks are given in Sect. 4.

Bayesian Inference and MCMC
This section presents the methodology for Bayes inference on bimodality for the GvM, precisely the derivation of BF and HPD credible sets and the related MCMC. Section 2.1 provides the derivation of the posterior distribution. Section 2.2 concerns simulation from the posterior distribution. Its multidimensional and complicated form requires MCMC. We suggest using the Gibbs sampling from the posterior distribution, in conjunction with the MH algorithm for sampling from the one-dimensional full conditionals.
Inferential aspects related to BF and HPD credible sets are explained in Sect. 2.3.
The prior distribution of κ j is the uniform distribution over 0,κ j , for someκ j > 0, for j = 1, 2. Note that although the major part of the simulation study that follows uses these uniform priors, the more general beta prior is also considered at the end of the simulation study (in Sect. 3.3). The prior distribution of μ 1 is the vM(ν 1 , τ 1 ) distribution with density given by (3). Thus, where τ 1 > 0 and ν 1 ∈ [0, 2π). The prior distribution of μ 2 is the axial vM with density where τ 2 > 0 and ν 2 ∈ [0, π) and τ 2 > 0. We use the notation μ 2 ∼ vM 2 (ν 2 , τ 2 ).
The likelihood of the sample θ = (θ 1 , . . . , θ n ) is given by: The priors (4) and (5) and the likelihood (6), yield the joint posterior density of (κ 1 , κ 2 , μ 1 , μ 2 ) given by where I denotes the indicator function. We note in particular that, although (κ 1 , κ 2 , μ 1 , μ 2 ) are a priori independent, they are a posteriori dependent. The number of modes of the GvM(μ 1 , μ 2 , κ 1 , κ 2 ) distribution is determined by the nature of the roots of the quartic Gatto and Jammalamadaka [13]. An algebraic analysis that can be found in Salvador and Gatto [37] shows that the null hypothesis of bimodality admits the analytical expression Note that repeated roots are admissible. Unfortunately, the boundary of W does not admit a closed-form expression. We can, however, determine it numerically, and a graphical representation of W is given in Fig. 7a. The alternative hypothesis H 1 is the general one, namely that the roots are of any other possible nature.

Computational Aspects
Bayesian inference on the bimodality of the GvM distribution requires simulation from prior and posterior distributions. Prior simulations are straightforward and posterior simulations are done by MCMC. Besides Monte Carlo, we mention that we compute the normalizing constant of the GvM distribution by where δ ∈ [0, π) and κ 1 , κ 2 > 0, see, e.g. Gatto and Jammalamadaka [13].

Prior Simulation
It follows from the assumed a priori independence of the parameters; their simulation is elementary. For the sake of completeness, we describe here their generation. Simulating the prior probability of bimodality is accomplished with the following simple Monte Carlo algorithm. Let ψ = (κ 1 , κ 2 , μ 1 , μ 2 ).

Posterior Simulation via MCMC
The computation of the posterior probability of bimodality is more complicated. We compute the posterior distribution ψ by MCMC and precisely by Gibbs sampling (cf. Geman and Geman [16]). By iterative generations from the full conditionals, we obtain a Markov chain whose stationary distribution is the posterior (7) (cf. e.g. Robert and Casella [34], p. 371-424).
The full conditional of κ 1 , namely the conditional density of κ 1 given μ 1 , μ 2 , κ 2 and θ , is given by The full conditional of κ 2 is given by The full conditional of μ 1 is given by The last full conditional is for μ 2 and it is given by exp τ 2 cos 2ν 2 cos 2μ 2 + τ 2 sin 2ν 2 sin 2μ 2 Given the above full conditionals, the Gibbs sampling scheme is as follows.
Posterior Gibbs sampling of ψ 1. Select an arbitrary starting value However, the generation from the full conditionals with one of basic methods for univariate distributions (such as inversion, acceptance-rejection) seems neither simple nor efficient. The complication arises from the part 1/G n 0 (δ, κ 1 , κ 2 ). Either by the expansion (10) or by numerical integration of (2), its evaluation is relatively complicated in the context of simulation and it tends to take very small values when n is large. We apply the MH algorithm presented in Robert and Casella [34], p. 276. This algorithm requires a density that is close to the full conditional of interest and from which the simulation is simple. We call this surrogate density as instrumental density, and we present it just after the following MH algorithm.

For a selected burn-in
Thus, our posterior simulation algorithm has two levels of MCMC. The instrumental density g in the above MH algorithm is the normalized piecewise linear interpolation of the full conditional. When the knots are (ψ 0 , y 0 ), (ψ 1 , y 1 ), . . . , (ψ k , y k ), it is given by Let us explain the simulation from the instrumental density of κ 1 , for example. The interpolation of the full conditional f 1 (κ 1 |μ 1 , μ 2 , κ 2 , θ ) is taken with 5 interpolation points, A-E in Fig. 1, which are chosen as follows. We fix the first and last points with abscissae values ε = 0.2 andκ 1 , respectively. The remaining points are generated uniformly in the interval (ε,κ 1 ). The instrumental density g is the normalized interpolation function. We generate from g with the inverse transform method. Figure 1 shows the full conditional f 1 (κ 1 |μ 1 , μ 2 , κ 2 , θ ) and its piecewise interpolation function.

Bayes Factor, HPrD and HPD Credible Sets
This section briefly summarizes the definitions of the Bayes factor, of the HPrD and of the HPD credible sets.
In our setting, the Bayes factor B 01 is an indicator of the support of the sample for the null hypothesis H 0 of bimodality of the GvM, against the alternative hypothesis H 1 of unimodality. It is precisely the ratio of the posterior odds over the prior odds, of  Alternatively to the Bayes factor, Bayesian inference can rely on the HPD credible set. We firstly introduce a general concept. Let f be a density over R. The highest density region (HDR) at level 1 − α is the set where f α is the largest constant such that P [R ( f α )] ≥ 1−α, for some (typically small) α ∈ (0, 1); cf. e.g. Hyndman [19]. In particular, it follows that, amongst all regions with coverage probability 1 − α, the HDR has the smallest volume. Computational aspects of HDR are studied in Wright [40], Hyndman [18] and Hyndman [19].
In the Bayesian setting where f is the posterior density, the HDR is called HPD credible set; cf. e.g. Box and Tiao [5]. When f is the prior density, we call the HDR as the highest prior density (HPrD) credible set. We refer to Chen and Shao [6] for computational aspects of HPD credible sets. Bimodality of the GvM is related to the parameters ρ and δ, as explained in Sect. 2.1. Thus, Bayesian inference on bimodality can rely on bivariate HPD credible sets for (ρ, δ).

Simulation Study
In this section, we apply the methods that we proposed for the inference on the bimodality of the GvM: the MCMC algorithm of Sect. 2.2.2, the convergence diagnostics by Gelman et al. [15] and the Bayes factors and HPD of Sect. 2.3.
• A sample θ of n = 100 values is generated from the unimodal GvM(π, π/2, 1.2, 0.2) in Sect. 3.1.1, for the Bayes factor. This sample is also used in Sect. 3.3.1, for the Bayes factor with an alternative prior. • A sample θ of n = 100 values is generated from the bimodal GvM(π, π/2, 0.16, 0.2) in Sect. 3.1.2, for the Bayes factor. This sample is also used in Sect. 3.3.3, for the Bayes factor with an alternative prior. • A sample θ of n = 100 values is generated from the bimodal GvM(π, π/2, 1.5, 1.5) in Sect. 3.2, for the HPD credible set, and also used at the end of Sect. 3.2, for convergence diagnostic. • Two samples θ of n = 10 and n = 200 values are additionally generated from the same bimodal GvM(π, π/2, 1.5, 1.5) in Sect. 3.2, for the study of the influence of the sample size n on the HPD credible intervals. • A sample θ of n = 200 values is generated from the unimodal GvM(π, π/2, 1.2, 0.2) in Sect. 3.3.2, for the Bayes factor with an alternative prior.
All these generated samples are displayed through their kernel density estimations. The hyperparameters for the prior specifications are chosen as follows: Prior and posterior probabilities of bimodality are obtained by simple Monte Carlo and Gibbs sampling, respectively, as explained in Sect. This computational intensive Monte Carlo study was run with the software R on a main computer cluster. Computer programs are available at http://www.stat.unibe.ch.

Bayes Factor
This section presents simulation results for the Bayes factor of the test of bimodality of the GvM. For the given sample θ from a GvM, we compute prior and posterior probabilities of bimodality by simulating {ψ (t) } t=1,...,T from the prior and posterior distributions, respectively, with the algorithms of Sect. 2.2. These prior and posterior simulations are iterated N = 500 times, and the retained Bayes factor is the average of N simulated Bayes factors. The section is divided into two parts: in part 3.1.1 we generate the sample θ from the unimodal GvM, whereas in part 3.1.2 we generate the sample θ from the bimodal GvM.

Unimodal GvM Sample
We generate the sample θ of size n = 100 from the unimodal GvM(π, π/2, 1.2, 0.2). Unimodality is formally justified by the fact that the point (ρ, δ) = (1.5, π/2) is really in the region W c of unimodality. This region is shown in white in Fig. 7a only for values of ρ smaller than 1.25, but for values larger than 1.25 the region does remain white. The kernel density estimation of this sample is given in Fig. 2a, which confirms unimodality.
We Bayes factor obtained with this study is given bȳ The aim of the process of averaging N = 500 values is the control of the simulation variability. The boxplot of the generated N values of B 01 can be found in Fig. 2b. The asymptotic normal confidence interval at level 0.95 for the Bayes factor is given by . Obviously, although the N Bayes factors may not be normally distributed, their mean is asymptotically normal. We thus obtain the confidence interval (0.0437, 0.0438).
As the sample θ is generated from a unimodal GvM, our prior evidence of bimodality is no longer supported by the sample and thusB 01 does not support bimodality. The Bayes factor gives negative evidence of bimodality according to Table 1. Note that very few large values are discarded from Fig. 2b, but they are nevertheless considered in the computation of the average Bayes factor and of the confidence interval.

Bimodal GvM Sample
We now consider a sample of size n = 100 θ generated from a bimodal GvM, precisely from GvM(π, π/2, 0.16, 0.2). We compute (ρ, δ) = (0.2, π/2), which belongs to the region of bimodality W shown in grey in Fig. 7a. The kernel density estimation of θ shown in Fig. 3a is also indicating bimodality. As before, we compute the representative Bayes factor by the mean of N = 500 values. The prior probability is the same as in In this case, the Bayes factor gives decisive evidence of bimodality. Figure 3b shows the boxplot of the Bayes factors. Here also, very few large values are discarded from the boxplot but are considered for the average of the Bayes factors and for the confidence interval.

Remark: Non-GvM Sample
In this remark, we study the behaviour of this test when the sample does not follow the assumed GvM distribution and thus when the given likelihood is inappropriate. We The density estimation of this non-GvM sample is given in Fig. 4a, and we can observe some clear evidence of trimodality. Figure  This value lies just within the range of positive evidence of bimodality of Table 1. Thus, two of the three modes have been associated with a single one, but it appears difficult to tell which ones they are. Perhaps the two modes of similar height (at η 1 and η 2 ) have been confounded or perhaps the lower mode (at η 3 ) has been neglected. Although we cannot provide an unambiguous explanation, we note that the amount of positive evidence for bimodality is minor. The behaviour of the test in this erroneous situation seems reasonable. This test of bimodality is indeed constructed around the GvM distribution. The null hypothesis is characterized by the region of bimodality of the GvM, W given in (9), which is obtained from the nature of the roots of the quartic (8). The adequacy of the sample with the GvM model could be checked with the goodness-of-fit tests of Section 7.2 of Jammalamadaka and SenGupta [20]. Let us remind that various types of tests are available: Kuiper's test (the circular version of Kolmogorov-Smirnov test), Watson's test (the circular version of Cramer-von Mises test), Ajne's test (the circular version of Chi-square test) and Rao's test (based on spacings between sample angles). The GvM distribution function is required for some of these tests, and it can be found in Section 3.3 of Gatto [12].

HPD and HPrD Credible Sets
This section provides a simulation of the HPD credible set of (ρ, δ), which allows to determine the a posteriori evidence of uni-or bimodality. It also provides the simulation of HPrD and HPD credible intervals for κ 1 , κ 2 , μ 1 , μ 2 . One sample θ of size n = 100 is considered, and it is generated under the bimodal GvM(π, π/2, 1.5, 1.5). Its kernel density estimation is shown in Fig. 5b, and we can see that bimodality is preserved.
We begin with the presentation of the HPrD and HPD credible intervals. These intervals are given in Table 2. We see that all HPD credible intervals are narrower than the HPrD credible intervals and that they contain their true value. The level is 0.95. Then, we illustrate the effect of the sample size n on the HPD credible interval by considering two additional samples of sizes n = 10 and 200 that are generated from the same GvM(π, π/2, 1.5, 1.5). Their density estimation is given in Fig. 5a, c, respectively, and we see that bimodality is preserved. We restrict the analysis to the parameter μ 1 . The results given in Table 3 show that, as n increases, the HPD credible intervals at level 0.95 become narrower around the true value. A graph of this can be found in Fig. 6, where the posterior densities, HPrD and HPD credible intervals are given. The grey regions have area 0.95, and they are based over the HPD credible intervals. The HPrD credible intervals are displayed with a bold line along the abscissae. As the sample size n increases, the HPD credible intervals become narrower around the true value of the parameter. Computations are done using emp.hpd of the R-package TeachingDemos, with α = 0.05.
As mentioned in Sect. 2.1, the bimodality of the GvM is determined by the number of real roots in [−1, 1] of the quartic q ρ,δ in (8). We can infer on the bimodality of the GvM through the bivariate HPD credible set of (ρ, δ). We use the same sample θ of size n = 100. The level of the HPD credible set is 0.95, and we compute the HPD with the function HPDregionplot of the R-package emdbook.   shows that the HPD credible set obtained from the simulation is completely included in W, the region of bimodality of the set of the parameters (ρ, δ) given in (9). This region of bimodality is shown in grey. The graph with the region of bimodality is due to Salvador and Gatto [37] and Pfyffer and Gatto [33]. Thus, the HPD credible set gives us a clear confirmation of the bimodality. Figure 7b shows the posterior of (ρ, δ) and indicates that the HPD of Fig. 7a should maintain the connected form at different levels.
We now verify the convergence of the simulations. We consider only the sample θ of size n = 100, which is also used in the first part of this section.  In order to apply Gelman's diagnostic, we generate m = 5 chains. According to Gelman et al. [15], the values of the variance between b, the variance within w and the shrink factor r in Table 4 confirm convergence for each one of the four components of the chain. Thus, convergence of the whole Markov chain {ψ (t) } t=1,...,T is achieved. A visual representation of the convergence is done using the command gelman.plot of the package coda that plots the shrink factor r evaluated at different instants of the simulation. The Markov chains are divided into bins according to the parameters bin.width and max.bins, which are the number of simulations per segment and the maximum number of bins, respectively. The shrink factor is repeatedly calculated. The first shrink factor is calculated with 50 simulations, the second with 50 + bin.width simulations, the third with 50 + 2 · bin.width simulations, and so on. In Fig. 8, we can see how the shrink factor of each parameter κ 1 , κ 2 , μ 1 , μ 2 evolves with respect to the iterations of the chain: the convergence of the chains associated with κ 1 and κ 2 appears slower, since at the beginning the shrink factors fluctuate. But then they stabilize around the value 1. Thus, we can consider the test passed and convergence is achieved. Fig. 7 a Uni-and bimodality regions of GvM together with HPD credible set of (ρ, δ) at level 0.95. b Posterior density of (ρ, δ) corresponding to the HPD credible set given in a Visuals results for the convergence of the single parameters can be found in Fig. 9. The four histograms of the simulations are close to four graphs of the full conditionals (solid line), indicating that each one of the four components of the Markov chain converges to the respective full conditional.

More General Prior
So far, the prior distribution of κ j has been the uniform distribution over 0,κ j , for someκ j > 0, for j = 1, 2. This choice may not always be convenient mainly because of the abrupt jump of the densities atκ 1 andκ 2 , making the choice of these values rather consequent. Because we may want a prior that decays slowly on the right tail, we can consider the two beta priors with densities proportional to κ j /κ j α j −1 1 − κ j /κ j β j −1 , ∀κ j ∈ [0,κ j ], with α j , β j > 0, for j = 1, 2. Thus, the uniform priors used so far and given in Sect. 2.2.1 are retrieved with α j = β j = 1, for j = 1, 2. In the simulation study, we select α j = β j = 2 and, as before,κ j = 3, for j = 1, 2. The priors for μ 1 , μ 2 are taken as in Sect. 2.2.2. All hyperparameters are as in Sect.

3.
Thus, the full conditional of κ 1 is given by and the full conditional of κ 2 is The full conditionals of μ 1 and μ 2 are as in Sect. 2.2.2.
We use the unimodal sample of size n = 100 and the bimodal sample of size n = 100 that are used in Sects. 3.1.1 and 3.1.2, respectively. Moreover, we consider a third additional unimodal sample with larger size n = 200.

Unimodal Sample of Size n = 100
We first consider the same unimodal sample of size n = 100 of Sect. 3.1.1, which is generated from the GvM(π, π/2, 1.2, 0.2) distribution. The density estimation with this sample is given in Fig. 2a The boxplot of the resulting Bayes factors is given in Fig. 10a. The posterior probability of bimodalityP[H 0 |θ] is indeed smaller than its priorP[H 0 ], but not much smaller. The mean Bayes factor shows negative evidence of bimodality, but its value is quite larger than the value obtained in Sect. 3.1.1.

Unimodal Sample of Size n = 200
In order to clarify the question of the size of the previous Bayes factor, we increase the sample size to n = 200. A new sample θ of size n = 200 is generated from GvM(π, π/2, 1.2, 0.2). Its kernel density estimation is given in Fig. 10c The Bayes factor gives again negative evidence of bimodality, but it is now more substantial. The boxplot of the simulated Bayes factors can be found in Fig. 10c. The Bayes factor and the posterior probability are significantly lower than with the sample of size n = 100. This is what we would have expected. The prior probability of bimodalityP[H 0 ] = 0.882 is bigger than the one of Sect. 3.1.1. Thus, we need an unimodal sample of larger size in order to obtain a sufficiently small posterior probability of bimodality.

Bimodal Sample of Size n = 100
We now consider the bimodal sample used in Sect. 3.1.2, of n = 100 replications generated from GvM(π, π/2, 0.16, 0.2). The density kernel estimation of the sample can be found in Fig. 3a The Bayes factor gives thus decisive evidence of bimodality. The boxplots of the resulting Bayes factors can be found in Fig. 10d.

Final Remarks
This article studies Bayesian inference on the bimodality of the GvM distribution: MCMC algorithms for computing Bayes factors and HPD regions are proposed and tested by simulation. Thus, it extends the list of available inferential methods for the GvM model: trigonometric method of moments estimation, cf. Gatto [10], and Bayesian test of symmetry, cf. Salvador and Gatto [36].
In Sect. 1, various interesting properties of the GvM model are mentioned together with the importance of bimodality in various applied fields. For example, data on wind directions Gatto and Jammalamadaka [13] show bimodality. Bimodality is indeed more frequent and important with circular than it is with linear data. In this context, the proposed Bayesian tests are practically relevant.
One may extend this study in two ways. First by proposing a frequentist test of bimodality. One could write a likelihood ratio test and exploit the partition of the parametric set in terms of unimodality and bimodality which is shown in Fig. 7a. A second and perhaps more challenging way of extending this study would be by considering prior distributions with dependent GvM parameters. One would thus generalize the full conditionals and the MCMC algorithm of Sect. 2.2.2. We would then obtain circularcircular, circular-linear or other unusual types of joint distributions. A difficulty would be to obtain specific simulation methods from these joint distributions. Another difficulty would be the quantification of the dependence of the joint distributions on these manifolds. We could use or adapt the circular-circular and circular-linear correlation coefficients that are given, for example, in Sections 8.2-8.5 of Jammalamadaka and SenGupta [20].
Funding Open access funding provided by University of Bern.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.