When a non-native-English-speaking scientist submits a manuscript to an international ecological journal, he/she often asks a professional or a native-English-speaking colleague to proofread the English. Interestingly, however, there are few authorized systems to encourage proofreading of statistical methods. One reason might be that statistical methods, unlike scientific English, have no authorized standards. Statistical practices, and in some cases paradigms, differ considerably among scientific fields. Population ecologists, who are believed to be relatively better at statistics than ecologists specializing in other fields, also have to consider which statistical methods and paradigms they should apply to their own research. Are we population ecologists actually good at statistics? I would say no. Most of us only specialize in specific statistical methods and paradigms.

That would be why Dr. Takashi Saitoh, who was the president of the Society of Population Ecology, asked Dr. Kohji Yamamura and me to organize a special feature on statistics of population ecology from a broad perspective. In this introductory review, I briefly list questions and concerns about statistics that I have felt during my career as a population ecologist. I first discuss the dominance of classical frequentist approaches, in the chronological order in which I personally encountered them; then I briefly discuss the two newcomers, Bayesian and evidential statistics; and finally, I introduce the three contributing articles for this special feature. This special feature is based on a symposium held in Tsukuba, Japan, on 11 October 2014.

Dominance of classical frequentist approaches

The vast majority of textbooks on statistics in the library of my university in the middle of the 1980s were, and might still be, classified as classical frequentist statistics. Here “classical frequentist” refers to statistics that are neither Bayesian nor evidential, consisting mainly of null-hypothesis testing and P value worship approaches that assume normal distributions of original or transformed target variables. Like other students of population ecology, I had to start learning classical statistics when I was a graduate student. I have always wondered why regressions and ANOVA-type methods have two steps: significance tests of explanatory variables for the data variation as a whole, followed by significance tests for parameters or means of sub-units. Even for a simple one-way ANOVA test for three categories, once we detect a significant difference among the three categories, we cannot simply claim that the largest mean value for a category is larger than those of the other two categories. I was taught that we had to perform appropriate post hoc tests even when the plot of mean values clearly showed the difference.

The former step is model fitting, and the latter is parameter estimation. These two steps sometimes invoke different statistical methods, e.g., model fitting with information criteria, such as AIC or BIC, and parameter estimation with Bayesian methods. The former requires post hoc tests to compare parameters of the best models, but the latter can compare multiple parameters directly after obtaining their posterior probabilities, by checking the overlap of their posterior distributions. Post hoc tests are a variant of multiple comparison (Hsu 1998). Multiple comparison per se does not inherently mean post hoc tests, and there are relevant a priori tests of multiple comparison. The difference between post hoc and a priori comparison lies in the researcher’s epistemological attitude towards data collection. If one designed the comparison before data collection, the test is a priori, but it should be treated as post hoc if one devised the comparison after data collection. This epistemological difference affects the complexity of calculating appropriate variances in the comparison. Even much simpler methods of post hoc comparison, for example the Bonferroni test or its variants (Holm 1979; Moran 2003), often require some kind of programming skills, so one would prefer to be able to claim, “I did design the comparison beforehand!”
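The programming burden of the simpler corrections is modest. As a minimal sketch, the Bonferroni adjustment and Holm's step-down variant can be written in a few lines of Python. The function names and the three raw P values below are hypothetical and chosen only for illustration; they do not come from any data set discussed here.

```python
def bonferroni_adjust(p_values):
    """Bonferroni: multiply every P value by the number of comparisons."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw P values from three pairwise comparisons.
raw = [0.010, 0.040, 0.030]
print("Bonferroni:", bonferroni_adjust(raw))
print("Holm:      ", holm_adjust(raw))
```

Holm's adjusted values are never larger than the Bonferroni ones, which is why the step-down variant is usually the less conservative choice of the two.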

My supervisor, Dr. Koichi Fujii, mastered statistics under Dr. Robert R. Sokal, who is the author of the famous textbook Biometry (Sokal and Rohlf 1981), which has a good flavor of classical statistics. My friends believed that I would become an obedient successor of this “normal distribution empire.” Then Dr. Nobuhiro Minaka, who taught statistics at various institutes and universities at that time, secretly sent me his image of a statistical mandala at the end of the 1980s (Fig. 1). I was very excited about this mandala because with it I learned that there were options other than the “normal distribution empire.” Moreover, those other options were extremely attractive. After that, Dr. Mark L. Taper visited my laboratory as a post-doctoral fellow of the National Science Foundation, USA, and introduced me to the bossa nova of statistics. At the time, besides discriminating egg shapes of two bean beetle species (Taper and Ponciano 2015), Dr. Taper was struggling with quantitative genetic problems using MANOVA (Taper 1990). He was always aware of the statistical power of constructed statistical models. He often asked me how many replicates we needed to obtain significant differences among treatments, considering statistical power. He recommended that I read a textbook by Dr. Jerrold H. Zar (Zar 1984) rather than Biometry (Sokal and Rohlf 1981). Zar’s book (2nd ed.) was, as far as I knew, the only book that started the first chapter with frequency data analysis, which taught me the meaning of degrees of freedom.

Fig. 1
Minaka’s statistical mandala. This image is recreated from Dr. Nobuhiro Minaka’s original one posted at

At the beginning of the 1990s, there was a small boom of randomization statistics (Noreen 1989; Good 1993; Edgington 1995; Manly 1997) among young behavioral ecologists in Japan. Dr. Eiiti Kasuya and his collaborators claimed, “from now on, randomization will take over those classical statistics such as ANOVA and multiple regressions.” They emphasized that randomization methods were custom made, so we could tailor the statistics to ask any question and judge any problem. Randomization tests were first devised by Dr. Ronald A. Fisher (Salsburg 2001) and extensively developed by Dr. Bradley Efron (Efron 1982; Hall 1992) as the jackknife, bootstrap, and other resampling methods for reconstructing parameter distributions of populations. One can reconstruct the background distribution believed to exist by simply or honestly resampling the obtained data. It is just like believing that nature is full of fractals (Peitgen et al. 1992). Resampling plans need sophisticated stratification of variables if you have problems with multiple variables. I was not sure how to apply randomization tests to all of the statistical problems illustrated in Dr. Minaka’s mandala (Fig. 1).
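To make the “custom made” character of these methods concrete, here is a minimal sketch of a two-sided permutation test for a difference in means, together with a percentile bootstrap confidence interval for a mean. It uses only the Python standard library. The function names, the number of resamples, and the two small samples are assumptions made for illustration, not a procedure taken from the cited books.

```python
import random
import statistics


def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # Counting the observed arrangement keeps the estimate above zero.
    return (extreme + 1) / (n_perm + 1)


def bootstrap_ci(sample, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the data with replacement and record each resampled mean.
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical measurements for two treatments.
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [6.2, 6.8, 5.9, 7.1, 6.5, 6.0]
print("permutation P value:", permutation_test(control, treated))
print("bootstrap 95% CI for the treated mean:", bootstrap_ci(treated))
```

Swapping the mean for a median, a ratio, or any other statistic only requires changing the line that computes `diff`, which is the flexibility the proponents emphasized. With several explanatory variables, however, the shuffling itself has to be stratified, and this sketch does not attempt that.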

In the middle of the 1990s, many population ecologists in Japan routinely used generalized linear models (GLM; e.g., Dunteman 1984; Dobson 1990; Crawley 1993) for their analyses. They fit models to their data and examined the parameter values of the models. Some models showed quite low explanatory power, or had low adjusted or generalized coefficient of determination (\(R^2\)) values (Nagelkerke 1991), but their discussions were based on highly significant parameters of the models. Some researchers applied information criteria, such as AIC and its variants, but again they derived conclusions from significant parameters even though there might have been alternative models with similar AIC values. Model selection and the subsequent parameter summarization were somehow estranged from one another.
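One common way to see whether alternative models have “similar AIC values” is to convert the AIC values into differences from the best model and then into Akaike weights, \(w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)\). The following sketch does this in Python. The model names and AIC values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def akaike_weights(aic_values):
    """Return (delta AIC, Akaike weight) lists for a set of candidate models."""
    best = min(aic_values)
    deltas = [aic - best for aic in aic_values]
    # Relative likelihood of each model given the data.
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_lik)
    return deltas, [r / total for r in rel_lik]


# Hypothetical AIC values for three candidate GLMs.
candidates = {
    "density only": 212.4,
    "density + rainfall": 211.9,
    "density * rainfall": 218.6,
}
deltas, weights = akaike_weights(list(candidates.values()))
for name, delta, weight in zip(candidates, deltas, weights):
    print(f"{name:20s} delta AIC = {delta:5.2f}  weight = {weight:.3f}")
```

In this made-up example the two simplest models differ by only 0.5 AIC units and receive comparable weights, so basing the whole discussion on the significant parameters of one of them would ignore a nearly equally supported alternative.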

Significance tests for parameters often ask whether the parameter values are greater or less than zero. We all know the criticisms of silly null hypotheses that reflect a lack of thinking about plausible alternatives: finding little or no support for such nulls does little to provide evidence for the alternatives (Burnham et al. 2011). Yet we perhaps forget the criticisms when we perform GLMs. Earnest population ecologists are aware of random effects as well as fixed effects, but decisions on whether factors are fixed or random effects are often arbitrary (Royle and Dorazio 2008). Quite a few articles encourage scientists to get rid of P values and of testing between null-model and non-null-model hypotheses (e.g., Anderson et al. 2000; Stephens et al. 2005). Recently, the scientific journal “Basic and Applied Social Psychology” has gone so far as to ban P value significance tests (Trafimow and Marks 2015)! But many scientific articles still adopt classical statistical methods. This situation resembles that of Mac and Linux users blaming Windows for its inability to stop malware proliferation while, at the same time, Windows users make up the vast majority of the world’s computer-using population.

Bossa-nova statistics from Bayesian and evidential approaches

Bayesian approaches are the most recent trend in population ecology (e.g., Ellison 2004; Qian and Shen 2007). As with randomization methods, evangelists of Bayesian statistics claimed that “everything is solved with Bayesian” (e.g., Albert 2007; McCarthy 2007; Gill 2008). Several Bayesian introductory textbooks criticize classical approaches, sometimes even devoting an entire chapter to the criticism, and introduce Bayesian methods as a replacement for all of them (e.g., McCarthy 2007; McGrayne 2011). Some extremist opinions claim that Bayesian philosophy cannot coexist with classical philosophy (e.g., Ellison 2004). There was, in fact, stubborn resistance to Bayesian approaches from old schools of thought (e.g., Yamamura 2015). Students would ask, “well, we can obtain posterior distributions of target parameters, but how can we say those parameters are significantly different from zero?” Some textbooks even introduce significance tests in terms of Bayesian approaches (e.g., Albert 2007). “Then which model should we select?” is another question. Bayes factors, DIC, and BIC have been proposed, but there are pros and cons for each of them (Ward 2008; Spiegelhalter et al. 2014; Hooten and Hobbs 2015). On the other hand, there are more moderate Bayesian evangelists who would not mind combining Bayesian with other, even classical, approaches (e.g., Bolker 2008; Royle and Dorazio 2008; Qian 2010).

Just as the rise of randomization approaches depended heavily on advances in computer science, new and practical Bayesian approaches, such as Markov chain Monte Carlo (MCMC, Dorazio 2015) and Hamiltonian Monte Carlo (Stan Development Team 2015), have been enabled by progress in computational techniques. The development of Bayesian-statistics-oriented languages, such as OpenBUGS, WinBUGS, JAGS, and Stan, also accelerated the spread of Bayesian approaches (Kruschke 2011; Kéry and Schaub 2012; Stan Development Team 2015). After copying BUGS scripts from books, adjusting the prior probabilities of parameters for one’s data, and then running the scripts, posterior distributions are returned. It is often recommended to check the state of convergence of the posterior distribution by trace plots or \(\hat{R}\) values (Gelman and Rubin 1992), but those checks do not guarantee parameter convergence (Dorazio 2015).
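For readers who have only run copied scripts, the following is a minimal sketch of what such software does internally: a random-walk Metropolis sampler (one simple MCMC algorithm, not the samplers actually implemented in BUGS, JAGS, or Stan) for the mean of normally distributed data, followed by the classic \(\hat{R}\) computed from several chains. The data, the known standard deviation, the prior, the step size, and the chain settings are all assumptions made for illustration.

```python
import math
import random
import statistics


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Log posterior (up to a constant) of a normal mean with known sigma."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum(((y - mu) / sigma) ** 2 for y in data)
    return log_prior + log_lik


def metropolis(data, start, n_iter=4000, step=0.5, seed=0):
    """Random-walk Metropolis sampler returning the full chain."""
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start, data)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


# Hypothetical observations; four chains start from dispersed values.
data = [1.8, 2.4, 2.1, 1.5, 2.9, 2.2, 1.7, 2.6, 2.0, 2.3]
starts = [-5.0, 0.0, 5.0, 10.0]
chains = [metropolis(data, s, seed=i)[2000:] for i, s in enumerate(starts)]  # drop burn-in
posterior_mean = statistics.fmean([x for c in chains for x in c])
print("posterior mean:", round(posterior_mean, 3))
print("R-hat:", round(gelman_rubin(chains), 3))
```

An \(\hat{R}\) close to 1 only indicates that the chains are indistinguishable from one another. As noted above, it does not by itself guarantee that they have converged to the target posterior.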

Evidential statistical approaches appear to have spread more modestly than Bayesian and other approaches (Taper and Lele 2004). They mainly rely on the invariant characteristic of maximum likelihood or variants of information criteria, and provide simple but clear ways to tell which models should be selected. Interestingly, all the following tools were invented by Dr. Fisher: P value, randomization test, ANOVA, and maximum likelihood estimates. Dr. Fisher himself strongly criticized Bayesian approaches (e.g., McGrayne 2011), but evidential approaches seek a harmonious collaboration with Bayesian methods as well as with classical methods. So far there does not seem to have been any big boom in evidential approaches in Japan or in other regions of the world.

Walking through Bayesian, Fisherian, error, and evidential statistical approaches

Dr. Minaka’s mandala (Fig. 1) shows us nearly the whole scope of statistics that we population ecologists should be aware of. I felt that a more simplified version of the mandala could be drawn. A similar attempt was made by Dr. Efron, who categorizes himself as a Fisherian (Efron 1998, Fig. 2), but here I would like to propose an even simpler mandala (Fig. 3). The horizontal line in Fig. 3 shows the one-dimensional problem space of statistics. The shaded rectangle shows the domain of classical methods, or the “normal distribution empire.” Yes, there are many problems outside of the rectangle: on the left-hand side, the amount of data is too small to apply a t test or an ANOVA. On the right-hand side, we have plenty of data, but they are too entangled to apply a simple ANOVA or even a MANOVA. So, for situations represented by the left-hand side, proper guidance would be, “collect more data!” How much data is necessary to shift into the shaded rectangle? And what about situations in which we cannot collect more data? Non-parametric methods, and sometimes Bayesian methods, are often invoked to cope with small sample sizes (e.g., Hinton 2004). Note that neither non-parametric nor Bayesian methods were invented for that purpose (Neave and Worthington 1988; Noether 1991; Sprent 1993; Salsburg 2001).

Fig. 2
Efron’s statistical mandala. For each aspect, the sitting place of Dr. Fisher indicates the position of the Fisherian between Bayesian and frequentist. This image is recreated from Fig. 1 in Efron (1998)

Fig. 3
A simplified statistical mandala. The horizontal axis indicates a statistical space that ranges from small/simple to large/complex. The shaded rectangle shows the domain of the normal distribution empire. Statistical problems often lie outside of the shaded rectangle

The problem is more serious if your data are located on the right-hand side of the shaded rectangle in Fig. 3. Explanatory variables are correlated in complicated ways, and the variables to be explained are also highly entangled. Applying classification methods, such as cluster analyses and correspondence analyses, may reveal distant relationships among the variables, but some criteria for grouping them are necessary. The proper guidance for such situations is merely “muddle through whatever tools you have!” (Taper and Ponciano 2015). One way of “muddling through” might be to construct hierarchical models with the Bayesian method or variants of GLM methods. But one should still be aware of the non-identifiability problem (Raue et al. 2013). MCMC methods are so powerful that they will output tentative posterior probabilities of parameters anyway; however, these outputs may be scientifically nonsensical (see Dorazio 2015; Taper and Ponciano 2015).

This Special Feature is another, albeit non-visualized and rather verbal, mandala. You have to read through it, but after that, you will be able to visualize your own image in order to solve your statistical problems. In this Special Feature, we have three contributing papers by four statistics experts from different disciplines: Dr. Mark L. Taper and Dr. José M. Ponciano from evidential statistics, Dr. Robert M. Dorazio from Bayesian statistics, and Dr. Kohji Yamamura from Fisherian statistics.

Dr. Taper and Dr. Ponciano first give an overview of different statistical approaches (Fisherian, Bayesian, error, and evidential) in terms of population ecology. Their long introduction shows conflicts among the approaches from methodological as well as philosophical points of view. Then, they discuss the evidential statistical approach in depth. This explanation might be a good place to start for those who have never heard the name “evidential statistical approach.” The final section is a detailed list of misunderstandings and confusions about statistics in general, with which population ecologists will no doubt be confronted at some point in their research. Readers may wish to compare these comments with previous ones from different points of view (e.g., Burnham et al. 2011).

Dr. Dorazio demonstrates contemporary views and attitudes of Bayesian approaches. Based on the learning aspects of Bayesian approaches, he tries to persuade us that “hierarchical modeling” is an engine for current research in the field of population ecology. He strongly recommends Bayesian approaches as a first-choice method. He is not a fanatical Bayesian evangelist at all, and discusses the pros and cons of Bayesian approaches. In particular, he admits that the weaknesses in choosing prior probabilities and in model comparison have not yet been resolved solely within Bayesian approaches, and hence he recommends combinations with other statistical approaches for those issues. He also provides brief but lucid explanations of MCMC techniques, which most users of Bayesian software packages leave as black boxes. His examples are very useful and practical even for Bayesian beginners.

Dr. Yamamura describes himself as a Fisherian rather than a frequentist. He has repeatedly claimed in academic meetings that “Bayesian estimates can be used as an approximation to maximum likelihood (ML) estimates,” which became the title of his article. His main criticism of Bayesian approaches concerns the ill effects of inappropriate prior probabilities of parameters. He then proposes a Bayesian approximation of objective ML with an appropriate transformation that makes the posterior distribution close to a normal one. He explains his idea, named the “empirical Jeffreys prior,” with a practical example of sika deer populations in Hokkaido, Japan. The approximation method is, as Dr. Taper has repeatedly indicated, believed to have a tight relationship with data cloning (Lele et al. 2007).
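The link between Bayesian estimates and ML estimates can be illustrated with a toy case of data cloning. The sketch below is not Dr. Yamamura’s empirical Jeffreys prior method. It assumes a binomial proportion with a conjugate Beta prior, so that the posterior after copying the data K times is available in closed form. The counts and the deliberately informative prior are hypothetical.

```python
def cloned_posterior(successes, trials, clones, prior_a=2.0, prior_b=5.0):
    """Mean and variance of the Beta posterior when the data are cloned `clones` times."""
    a = prior_a + clones * successes
    b = prior_b + clones * (trials - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical data: 14 successes in 20 trials.
successes, trials = 14, 20
mle = successes / trials
mle_var = mle * (1 - mle) / trials  # asymptotic variance of the ML estimate

print(f"ML estimate = {mle:.4f}, asymptotic variance = {mle_var:.5f}")
for k in (1, 10, 100, 1000):
    mean, var = cloned_posterior(successes, trials, k)
    # The posterior mean approaches the ML estimate, and k times the
    # posterior variance approaches the asymptotic variance of the ML estimate.
    print(f"K = {k:4d}  posterior mean = {mean:.4f}  K * variance = {k * var:.5f}")
```

With a single copy of the data the informative prior pulls the estimate well away from the ML value, whereas with many clones the influence of the prior vanishes. This is the sense in which a Bayesian computation can be used to approximate an ML estimate.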

After reading through the above three articles, I am convinced that readers will have a better understanding of what model selection is and of what parameter estimation is, and will learn what kinds of tools, such as ML and Bayesian procedures, have been implemented for those purposes. Discriminating between these two aspects, as well as properly combining (not confusing) them, will work as a compass as readers “muddle through” the mandalas of statistics.