On the Importance of Power Analyses for Cognitive Modeling
The high prevalence of underpowered empirical studies has been identified as a central element of the current crisis in psychological research. Accordingly, the need for proper analyses of statistical power and for sample size determination before data collection has been emphasized repeatedly. In this commentary, we argue that—contrary to the opinions expressed in this special issue’s target article—cognitive modeling research will similarly depend on the implementation of power analyses and the use of appropriate sample sizes if it aspires to robustness. In particular, the increasing desire to include cognitive modeling results in clinical and brain research raises the demand for assessing and ensuring the reliability of parameter estimates and model predictions. We discuss the specific complexity of estimating statistical power for modeling studies and suggest simulation-based power analyses as a solution to this challenge.
Keywords: Cognitive modeling · Power analysis · Sample size · Cognitive neuroscience · Computational psychiatry · Simulations
In their target article of the current special issue, Lee et al. (2019) take the current crisis in psychological research as a starting point and motivation to propose a catalog of good research practices for cognitive modeling. This catalog features many plausible and laudable methods such as preregistering models and evaluation criteria, testing the generalizability of a model, and keeping track of all changes made during iterative model development (“postregistration”). Surprisingly, Lee et al. (2019) dismiss the potential need for a priori power analyses and sample size determination rather swiftly, stating that the respective recommendations developed for other psychological research would “not carry over to the standard practices of cognitive modelers” (p. 4). They concede that in a specific case, that is, when conducting null hypothesis significance tests (NHST) and collecting data non-iteratively, power analyses are required, but for other instances of cognitive modeling the results of classic power analyses are considered negligible.
In this commentary, we argue that analyses of power—the probability of detecting a true effect given that it exists—are as critical for cognitive modeling as they are for other research areas within and beyond psychology. We comment on the specific intricacies of performing a power analysis in cognitive modeling, stress the importance of developing new techniques to meet these challenges, and provide examples of simulation-based solutions. Finally, we argue that this issue will become even more pressing as insights from cognitive modeling are increasingly used in neuroscientific research and as efforts grow to implement them in clinical applications.
Power Analyses—a Necessary Part of the Toolbox for Good Scientific Practices
Multiple sources have contributed to the current crisis of confidence in psychology and related fields, including underpowered studies, flexibility in data analysis, and publication bias (Munafò et al. 2017; Poldrack et al. 2017). We believe a systematic a priori power analysis belongs in the general toolbox of good scientific practice for cognitive modeling. Power in cognitive modeling can mean the ability to identify a data-generating model, to recover a set of parameters, or to detect an effect. Notably, ignoring power considerations can diminish the benefits of other recommendable practices, such as preregistration.
For example, suppose cognitive modelers wish to demonstrate that their favorite model A provides a better account of a cognitive phenomenon than an alternative model B. They preregister both the “players of the game” (i.e., models A and B) and the “rules of the game” (i.e., the evaluation criterion), but do not conduct an a priori power analysis to determine their sample size. This leaves the researchers free to test only a small number of participants or to check after each participant whether the data favor model A, engaging in “optional stopping.” In statistical inference, both of these questionable research practices are known to increase the probability of false positives (Button et al. 2013; Wagenmakers 2007). Consequently, we argue that preregistering cognitive models alone will not ensure improved replicability as long as a power analysis is not part of the preregistration.
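The inflation of false positives caused by optional stopping is easy to demonstrate by simulation. The sketch below is our illustration, not part of the original argument; all numbers (a two-sided z-test at α = .05, checks after every participant from 10 to 100) are assumptions. It draws data with no true effect and compares the false-positive rate of a fixed-n design with that of a design that tests after every additional participant:

```python
import numpy as np

def false_positive_rate(optional_stopping, n_max=100, n_min=10,
                        z_crit=1.96, n_sims=4000, seed=0):
    """Proportion of null experiments (true effect = 0) declared 'significant'
    by a two-sided z-test at alpha = .05 (the z approximation is used for
    simplicity; a t-test would behave almost identically at these n)."""
    rng = np.random.default_rng(seed)
    fp = 0
    for _ in range(n_sims):
        data = rng.standard_normal(n_max)
        if optional_stopping:
            # peek after every added participant; stop at first 'significant' result
            for n in range(n_min, n_max + 1):
                x = data[:n]
                z = x.mean() / (x.std(ddof=1) / np.sqrt(n))
                if abs(z) > z_crit:
                    fp += 1
                    break
        else:
            # single test at the planned sample size
            z = data.mean() / (data.std(ddof=1) / np.sqrt(n_max))
            if abs(z) > z_crit:
                fp += 1
    return fp / n_sims

print("fixed n = 100    :", false_positive_rate(False))
print("optional stopping:", false_positive_rate(True))
```

Under these assumptions the fixed-n design stays near the nominal 5%, whereas peeking after every participant roughly triples the false-positive rate.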
Are Concerns of Sample Size Less Relevant for Cognitive Modelers?
Power plays a critical role in sample size planning. An insufficient sample size in cognitive modeling can lead to imprecise estimates of model parameters and biased inference about the models themselves. Regarding model parameters, larger samples increase the precision of parameter estimates (e.g., Wagenmakers et al. 2007), and more data points facilitate parameter discriminability (e.g., Broomell and Bhatia 2014). This also affects NHST among model parameters. But even in projects that do not compare estimated parameters by NHST, sample size influences the inferences drawn about models themselves. Consider model comparison and the need to account for model complexity: penalties for model complexity are not sample-size invariant. In small samples, some flexibility-penalizing goodness-of-fit measures favor relatively simple models (e.g., AIC; Busemeyer and Wang 2000), whereas other measures favor relatively complex models (e.g., normalized maximum likelihood/minimum description length). In small samples, nested models can sometimes be treated as more complex than the full models they are nested in, although they are formally simpler (Navarro 2004). Even for non-nested models, the complexity ranking of two models can switch as sample size increases (Heck et al. 2014; Wu et al. 2010). In fact, Heck et al. (2014) put forward minimum sample size requirements for the specific case of multinomial processing tree models, akin to model-dependent rules of thumb for sample sizes. Sample size also affects the accuracy of model recovery: small samples decrease the proportion of correctly identified models in model comparisons across a range of fit indices, as simulations have shown (Pitt et al. 2002). The interplay between sample size, model flexibility, and task design therefore gives power analyses an important but complex role in cognitive modeling.
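The sample-size dependence of complexity penalties can be seen already with two standard criteria: AIC (penalty 2k, constant in n) and BIC (penalty k·ln n, used here as a convenient sample-size-sensitive stand-in for measures such as NML/MDL, which require dedicated tooling). The toy calculation below is our illustration; the per-observation fit advantage of the more complex model (0.01 log-likelihood units) and the parameter counts (5 vs. 3) are assumptions:

```python
import math

def aic(ll, k):
    """Akaike information criterion (relative): penalty 2k, constant in n."""
    return 2 * k - 2 * ll

def bic(ll, k, n):
    """Bayesian information criterion (relative): penalty k * ln(n) grows with n."""
    return k * math.log(n) - 2 * ll

# Assumed toy numbers: the complex model (5 parameters) fits better than the
# simple one (3 parameters) by 0.01 log-likelihood units per observation.
DELTA_LL = 0.01

for n in (50, 400, 2000):
    ll_simple, ll_complex = 0.0, n * DELTA_LL  # log-likelihoods relative to the simple model
    pref_aic = "complex" if aic(ll_complex, 5) < aic(ll_simple, 3) else "simple"
    pref_bic = "complex" if bic(ll_complex, 5, n) < bic(ll_simple, 3, n) else "simple"
    print(f"n = {n:4d}: AIC prefers {pref_aic}, BIC prefers {pref_bic}")
```

With these numbers, both criteria prefer the simple model at n = 50 and the complex model at n = 2000, but they disagree at n = 400: which model a penalized fit measure selects is not sample-size invariant.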
These issues concerning sample size may suffice to illustrate why we believe that cognitive modeling benefits from power analyses. As Lee et al. (2019) state, simplistic NHST-based recipes for power may not carry over to cognitive modeling unless NHST is key; yet, it is precisely because of the immense degrees of freedom and the rise in complexity that come with cognitive modeling that modelers need to be even more careful in justifying their design and analysis decisions. Power analyses, particularly for determining sample size, matter in more than just corner cases. Rather, the problem with power analyses for cognitive modeling seems to be that quantifying a cognitive model’s power is a particularly hard computational problem. One crucial step, therefore, is to develop methods and tools that address this problem.
Power analysis in cognitive modeling is complicated by challenges that warrant further theoretical work. The concept of “power,” for instance, is not well defined in model-based studies; model recoverability and task design affect power when cognitive models are used. Given these challenges, we agree with Lee et al. (2019) that it might be impossible to develop rules of thumb for sample sizes in cognitive modeling studies. However, this does not release modelers from the responsibility to perform power analyses. Instead, we argue that these analyses can be based on simulating the planned studies.
Example: Power Analysis for Detecting Differences in Cognitive Strategies
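Suppose we plan a study to test whether participants rely on decision strategy A rather than a competing strategy B. A minimal simulation-based power analysis for such a design might look as follows. This is our illustrative sketch: the trial count, the strategy-execution error rate, the majority-based classification rule, and the experiment-level binomial test are all assumptions, to be replaced by the models, task, and evaluation criterion of the actual study:

```python
import math
import numpy as np

def binom_tail(k, n, p=0.5):
    """Exact one-sided binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def simulate_power(n_participants, n_trials=40, error_rate=0.40,
                   alpha=0.05, n_sims=1000, seed=0):
    """Estimate the power to detect that participants use strategy A.

    Assumed design: on each diagnostic trial, strategies A and B predict
    opposite choices; a participant following A matches A's prediction with
    probability 1 - error_rate. A participant is classified as an A-user if
    the majority of their choices match A, and the experiment 'detects'
    strategy A if a one-sided exact binomial test rejects the hypothesis
    that A-classifications occur at chance (p = .5).
    """
    rng = np.random.default_rng(seed)
    detections = 0
    for _ in range(n_sims):
        # number of choices consistent with A, per simulated participant
        consistent = rng.binomial(n_trials, 1 - error_rate, size=n_participants)
        classified_a = int(np.sum(consistent > n_trials / 2))
        if binom_tail(classified_a, n_participants) < alpha:
            detections += 1
    return detections / n_sims

# Search for the smallest sample size reaching 80% power under these assumptions
for n in (5, 10, 15, 20, 25):
    power = simulate_power(n)
    print(f"n = {n:2d} participants: estimated power = {power:.2f}")
    if power >= 0.80:
        break
```

The specific sample size the search returns is an artifact of our made-up numbers; the point is the procedure: simulate the planned study under the hypothesized models and design, apply the preregistered evaluation criterion to each simulated data set, and read off power as the proportion of simulated studies that yield the expected result.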
The Increasing Role of Cognitive Modeling in Neuroscience and Clinical Research
The target article’s comment on power analysis leaves the impression that cognitive modelers engage much less in NHST than other scientists. Although modelers may indeed have a comparatively high affinity for Bayesian statistics, and simulation studies often do not require inferential statistics, we still believe that the prevalence of NHST in the cognitive modeling community is high. For instance, when qualitative predictions of cognitive models—such as context effects in decision-making (Busemeyer et al. 2019)—are validated by testing whether experimental data meet them, these tests are often performed with frequentist statistics (e.g., Gluth et al. 2017; Trueblood et al. 2014; Tsetsos et al. 2012; but see Evans et al. 2019 for an example of using Bayesian statistics). Furthermore, even a Bayesian data analysis approach does not exempt researchers from outlining the study design before data acquisition (Schönbrodt and Wagenmakers 2018). More specifically, Schönbrodt and Wagenmakers criticize the absence of systematic design analyses for experiments that rely on Bayes factors and introduce three possible design classes: a fixed-n design, an open-ended sequential sampling design, and a hybrid design with sequential sampling and a maximum n. They argue that—similar to the case of NHST—the highest probability of a false-positive error occurs at early termination of sequential designs. To address this problem, they recommend defining a minimum sample size and applying the (Bayesian) inferential statistics only after this sample size has been reached. Similar to our proposal, Schönbrodt and Wagenmakers’ approach relies on simulating hypothetical experiments, given that the complexity of the matter obviates analytical solutions.
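This point can likewise be illustrated by simulation. The sketch below is a toy example of our own, not Schönbrodt and Wagenmakers' code: it runs a sequential Bayes-factor design on a binomial rate with a point null θ = .5 and a uniform prior under H1 (for which the Bayes factor has a closed form), and compares the rate of falsely declared evidence for H1 on null data with and without a minimum sample size. The evidence threshold, maximum n, and minimum n are assumptions:

```python
import math
import numpy as np

def bf10(k, n):
    """Bayes factor for H1: theta ~ Uniform(0, 1) vs H0: theta = .5, given
    k successes in n Bernoulli trials. The marginal likelihood under the
    uniform prior is 1 / (n + 1); under the point null it is C(n, k) * 0.5**n."""
    return 2**n / ((n + 1) * math.comb(n, k))

def sequential_fp_rate(min_n, threshold=6, n_max=100, n_sims=4000, seed=0):
    """Proportion of null-generated sequences (theta = .5) on which the
    sequential design stops with BF10 >= threshold, i.e. declares evidence
    for H1 although H0 is true."""
    rng = np.random.default_rng(seed)
    false_alarms = 0
    for _ in range(n_sims):
        data = rng.integers(0, 2, size=n_max)
        k = 0
        for n in range(1, n_max + 1):
            k += int(data[n - 1])
            if n < min_n:
                continue  # no inference before the minimum sample size
            bf = bf10(k, n)
            if bf >= threshold:       # stop: 'evidence for H1' -> false positive
                false_alarms += 1
                break
            if bf <= 1 / threshold:   # stop: evidence for H0
                break
    return false_alarms / n_sims

print("no minimum n   :", sequential_fp_rate(min_n=1))
print("minimum n of 20:", sequential_fp_rate(min_n=20))
```

Because very short sequences easily produce extreme runs, testing from the first observation yields markedly more false alarms than the same design with a minimum sample size, which is exactly the rationale behind the recommended hybrid designs.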
On a more general note, a new age of “computational psychiatry” has recently been proclaimed, according to which cognitive as well as neurobiological and biophysical modeling should be embraced as a promising route to improved diagnoses and therapies in psychiatry (Huys et al. 2016; Montague et al. 2012). To fulfill this promise, cognitive modelers must provide robust and reliable tools to infer latent cognitive mechanisms from (potentially aberrant) overt behavior. Among other things, this will require understanding how sample sizes, task designs, and model-fitting procedures affect the recoverability of model parameters.
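The kind of parameter-recovery analysis this calls for can itself be sketched in a few lines. In the toy example below (our illustration; the logistic choice model, the parameter ranges, and the trial numbers are all assumptions), choices are simulated from participants with known sensitivity parameters β, each β is re-estimated by maximum likelihood, and the true-versus-recovered correlation is tracked as a function of the number of trials per participant:

```python
import numpy as np

def simulate_participant(beta, value_diffs, rng):
    """Simulate binary choices from a logistic choice rule with sensitivity beta."""
    p = 1 / (1 + np.exp(-beta * value_diffs))
    return rng.random(len(value_diffs)) < p

def fit_beta(choices, value_diffs, grid=np.linspace(0.01, 5, 200)):
    """Maximum-likelihood estimate of beta via grid search (avoids an optimizer)."""
    best, best_ll = grid[0], -np.inf
    for b in grid:
        p = 1 / (1 + np.exp(-b * value_diffs))
        ll = np.sum(np.where(choices, np.log(p), np.log(1 - p)))
        if ll > best_ll:
            best, best_ll = b, ll
    return best

def recovery_correlation(n_trials, n_participants=100, seed=0):
    """Correlation between true and recovered beta across simulated participants."""
    rng = np.random.default_rng(seed)
    true = rng.uniform(0.2, 3.0, n_participants)
    recovered = []
    for beta in true:
        vd = rng.uniform(-2, 2, n_trials)        # trial-wise value differences
        choices = simulate_participant(beta, vd, rng)
        recovered.append(fit_beta(choices, vd))
    return np.corrcoef(true, recovered)[0, 1]

for n_trials in (20, 100, 500):
    print(f"{n_trials:3d} trials: true-recovered r = {recovery_correlation(n_trials):.2f}")
```

Recovery improves steeply with the number of trials in this toy setting; before using an individual parameter estimate as, say, a clinical marker, one would want to know where on such a curve the planned design sits.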
Simply transferring the power-analytic recommendations from statistical inference to cognitive modeling may indeed not improve the robustness of model-based cognitive research. But sample size profoundly affects the inferences we draw from formal models, such as those in computational psychiatry, and can thereby harm the robustness of scientific findings despite pre- and postregistration of formal models. If we subscribe to Lee and colleagues’ goal of achieving robustness in cognitive modeling, we consider it necessary to begin studying how, when, and to what degree sample size affects the inferences we draw from and about models (beyond, but also in, the cases in which NHST is essential). Only then might we achieve the degree of robustness we wish for in cognitive-science modeling. We may also find that cognitive modeling needs new power-analysis tools, and discover new and challenging avenues of methodological research on power analyses.
We thank the members of the Decision Neuroscience and Economic Psychology groups at the University of Basel for critical discussions of the target article in our journal club. We thank Florian Seitz for his work on the power simulation.
S.G. was supported by a grant from the Swiss National Science Foundation (SNSF Grant 100014_172761).
- Lee, M. D., Criss, A. H., Devezer, B., Donkin, C., Etz, A., Leite, F. P., et al. (2019). Robust modeling in cognitive science. PsyArXiv. https://doi.org/10.31234/osf.io/dmfhk.
- Poldrack, R. A., Baker, C. I., Durnez, J., Gorgolewski, K. J., Matthews, P. M., Munafò, M. R., Nichols, T. E., Poline, J. B., Vul, E., & Yarkoni, T. (2017). Scanning the horizon: towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18, 115–126.