Commentary on “Robust Modeling in Cognitive Science: Misunderstanding the Goal of Modeling”
The article “Robust Modeling in Cognitive Science” (2019) by Lee et al. makes several recommendations about best practices for cognitive science modelers. Many of these are reasonable and will not be discussed in this commentary. I believe several other critically important recommendations either put too much emphasis on less important components of good practice, or are somewhat misguided, and suggest that these are distorted in part because they are based on a misunderstanding of, or a failure to take into account, the goals of modeling. This commentary will highlight those areas where I believe the recommendations for good modeling practice deserve refinement and change.
Keywords: Modeling, Modeling goals, Modeling practices, Practice of science
The article “Robust Modeling in Cognitive Science” (2019) by Lee et al. makes several recommendations about best practices for cognitive science modelers. Some of these are reasonable and will not be discussed in this commentary. However, I believe several critically important recommendations either put too much emphasis on less important components of good practice, or are somewhat misguided, and suggest that these are distorted in part because they are based on a misunderstanding of the goals of modeling. This commentary will highlight those areas where I believe the recommendations for good modeling practice deserve refinement and restatement.
Consider first the recommendations for good practices before the data are collected. These include pre-registration and registered reports in various forms. I have argued elsewhere in venues not specifically addressing these issues for modelers (e.g., Shiffrin et al. 2018) that various forms of pre-registration are in many and perhaps most cases unhelpful and perhaps counterproductive. These arguments are much more cogent in the case of cognitive science modeling. The models we use in cognitive science are all wrong and in almost all cases crude approximations to reality, but useful because they help explain the primary factors causing the data patterns observed. This leads me to focus on models that provide causal accounts and not those that are primarily aimed to provide quantitative descriptions of observed patterns of data. Causal models have two major components: Those representing the primary causal factors and those that tune the model for the exact paradigm and setting of the experiment(s) modeled. A good modeler is acutely aware of the difference between these two components and takes care to show that the qualitative patterns of data and the major portion of variance in the data are produced by the causal components and not the components needed to fit the model to the paradigm and setting. Nonetheless, both components are necessary to fit the results of any study or studies, and as a result, it is not helpful to make exact model predictions in advance. Instead, modelers work back from the observed data, adjusting the model to take into account the components specific to the paradigm and setting, while trying to establish a primary role for the causal components of the model (causal components that could have been part of a prior theory or model, or could be new for the current study or studies). 
There should be almost no situations where it makes sense to pre-register predictions of a model, because the model is known to be wrong (usually very much so) and not tuned to the new aspects of the new study or studies. In addition, failures of advance predictions are often due to the lack of omniscience on the part of the theorist, a well-known part of being human; there is nothing better than seeing the data to lead one down a path toward better understanding.
It is worth taking a higher-level perspective on this issue: The article clearly makes an attempt to import recommendations for good practice arising out of the “reproducibility crisis,” a movement not aimed at modeling research, but rather at empirical research and its analysis and interpretation. Among other matters, that movement argues against HARK-ing, hypothesizing after the data are known. In fact, almost all of science is based, and should be based, on HARK-ing, although this procedure can be and sometimes is misused in the hands of some scientists. Whatever the arguments against HARK-ing when used in empirically oriented research, they hardly ever apply in the case of modeling-oriented research: The modeling enterprise is rooted in hypothesizing from the data. Of course, every model, once developed, should be tested and explored further; such additional testing would have the primary aims of assessing generalization and finding ways to improve the model or replace it with a better one. However, the modeler should (as before) start with the new data and then use them to improve and/or replace the prior model or models.
A second form of pre-registration that the article suggests (though apparently misplaced in the post-registration section) is pre-registration of the method of analysis and evaluation. This assumes we know what data will arrive, which generally is not the case, even for a good theory, because each experimental setting changes. Beyond this point, science and modeling progress when new data cause us to refine our existing models and theories, or even better, develop new and better ones. The new data patterns generally motivate new and more appropriate methods of analysis and evaluation, compared with those anticipated in advance. Thus, it is routine and appropriate to design one’s methods of analysis and evaluation based on the data observed, rather than on the data that had been wrongly anticipated. Is there a worry that researchers would try to hide the fact that the data collected were unexpected, perhaps because they are trying to defend the applicability of a prior theory? This may sometimes happen but is rather foolish, because researchers become famous and science progresses when new data are found and new theories designed to explain them. Protecting against such foolishness is hardly a strong argument for pre-registration. Furthermore, pre-registering one’s prior model and its predictions is likely to produce an even stronger bias to confirm that model (e.g., Cialdini 2001).
The target article goes further to suggest that pre-registration is useful because much data proves non-diagnostic with respect to model comparisons, partly because different evaluation metrics provide conflicting answers. One does not need pre-registration to deal with such cases, but rather honesty concerning the fact that the data provide weak conclusions at best (in which case, a good scientist might well choose to delay publication until more diagnostic data are available). Just as important, there is no good reason to commit in advance to one evaluation metric when it is obviously better to employ several, and when the data might suggest some better suited for analysis than others.
For these reasons (and others), I believe “registered modeling reports” are an even worse idea than other forms of pre-registration. They add to the burden faced by authors and reviewers. At best, they will lead us to confirm our wrong models. Moreover, the fact that we observe failures of advance predictions or see the need for new analyses and models is not a problem but the way that science and modeling should move forward.
Consider next “post-registration” and practices after the data are seen. Much space is devoted in the article to checklists and commitment to certain utility functions. The variability in studies, data outcomes, goals, resources, and so on that I mentioned earlier implies that such recommendations will work well only when the results turn out as expected, which seldom happens. Results that turn out as expected tell us very little compared with the cases where new and unexpected results are found. To look at this a slightly different way, checklists and advance commitments look backwards to confirming the past and reifying what we think we now know, rather than looking to new results and models in the future; in addition, they are based at heart on an idea that there is one best way to carry out science, failing to take into account the immense diversity of ways we make advances.
Let me make a few more remarks about the article’s suggestions concerning model comparison. Unfortunately, these overly emphasize quantitative fit, thus missing the more important goal of modeling: helping us understand the mechanisms and causal processes underlying cognition and behavior. Among other things, this goal implies that it is the qualitative pattern of results across conditions in both the present study and in prior studies that is more important than quantitative measures of fit. It is also critical when comparing models to understand for each the degree to which the patterns of predictions for that model are due to the proposed causal mechanisms and not the auxiliary assumptions used to adapt the model to the setting and the results. This is partly assessed by exploring the parameter space of the model, something that should always be done. However, such exploration is only partly helpful, because assessment of the real causal properties often requires interventions such as deletion of aspects of the model, and because many critically important elements of a model are not parameterized, but assumed in order to fit the data, or fit the patterns of data. Theory building and modeling in science have little choice but to operate quantitatively, but it is best to keep in mind that there is much more to evaluating the worth of models.
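As a purely illustrative sketch (not drawn from the target article or from any published model), the following Python toy shows one way the two checks just described might look in practice: sweeping the parameter space to see whether a qualitative pattern survives, and then deleting the causal component to see whether the pattern disappears. Everything here is a hypothetical assumption made for illustration: the function `predict`, the parameters `strength` and `bias`, the three study-time conditions, and the choice of "accuracy increases across conditions" as the qualitative pattern.

```python
from itertools import product


def predict(strength, bias, use_causal=True):
    """Toy model (hypothetical): predicted accuracy in three conditions.

    `strength` scales the causal component (longer study time -> better memory).
    `bias` is an auxiliary, setting-specific offset applied to every condition.
    Setting `use_causal=False` deletes the causal component from the model.
    """
    study_times = [1.0, 2.0, 4.0]  # assumed conditions, for illustration only
    predictions = []
    for t in study_times:
        causal = strength * t / (1.0 + strength * t) if use_causal else 0.0
        predictions.append(bias + (1.0 - bias) * causal)
    return predictions


def is_increasing(values):
    """Qualitative pattern of interest: strictly increasing across conditions."""
    return all(a < b for a, b in zip(values, values[1:]))


def pattern_rate(use_causal):
    """Fraction of a coarse parameter grid on which the pattern appears."""
    strengths = [0.1, 0.5, 1.0, 2.0, 5.0]
    biases = [0.0, 0.1, 0.2, 0.3]
    grid = list(product(strengths, biases))
    hits = sum(is_increasing(predict(s, b, use_causal)) for s, b in grid)
    return hits / len(grid)


full_rate = pattern_rate(use_causal=True)      # full model
ablated_rate = pattern_rate(use_causal=False)  # causal component deleted
print(f"pattern rate, full model:    {full_rate:.2f}")
print(f"pattern rate, ablated model: {ablated_rate:.2f}")
```

In this toy case the increasing pattern appears everywhere on the grid for the full model and nowhere once the causal term is deleted, which is the kind of evidence that would attribute the pattern to the causal component rather than to the auxiliary offset. Real models are rarely this clean, and, as noted above, many important assumptions are not parameterized at all and so cannot be probed by a sweep of this kind.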
Related to this point is the troubling statement on page 7: “It is critical, however, that exploratory evidence not be misinterpreted as confirmatory evidence if the model was not anticipated before the data were seen.” Pretty much all modeling is, and should be, carried out after the data are seen. Even more important, the evaluation and merit of a model should not be based strongly on the fact that it had been anticipated (modelers most often try to fit some version of their prior model, when one exists, but the prior model is almost always changed when applied to the present data), but on all the many criteria for evaluating models: elegance, simplicity, ability to account for a variety of findings from the current study or studies and prior studies, and more along these lines. I can think of numerous examples of new and largely unexpected data leading to the formulation of new quantitative and conceptual theories that were compelling and elegant despite being formulated to explain the data. I will give just one example, the research carried out by George Sperling and Charlie Chubb and colleagues on statistical summary representations as seen in centroid judgments and their use in understanding attentional filtering (Sun et al. 2016). They collected an immense amount of largely unanticipated data in many conditions and formulated an elegant model to explain the processes at work. Their model loses no credit because it was formulated to explain their data; rather it stands out as a perfect example of how a good model that explains a wide array of data stands on its own.
One other point of disagreement: I believe the test of the usefulness of a theory or model should not be based mostly on whether it works in practical applications, as suggested in the section “Solution-oriented modeling.” Equal importance should be given to basic research that produces advances in theory.
To wrap up: There are elements in the target article with which I agree. However, of the four summary points in the conclusion, I agree strongly with only the third, “undertaking detailed evaluation of models to understanding their strengths and weaknesses,” and disagree significantly with the first, “pre-registering models”; the second, “post-registering exploratory models”; and the fourth, “registered modeling reports.” The justification for these three seems rooted in the assumption that the goal of modeling is confirmation of an existing model (usually one’s own). That is not and should not be the goal of modeling. There can be many goals for modelers, but at the head of the list should be the following: Modeling should be used to upgrade and improve our understanding of the causal mechanisms producing cognition and behavior. Such a goal highlights refining, changing, and replacing, not confirming.
- Cialdini, R. B. (2001). Influence: science and practice (4th ed.). Boston: Allyn & Bacon.
- Lee, M. D., Criss, A., Devezer, B., Donkin, C., Etz, E., Leite, F. P., Matzke, D., Rouder, J. N., Trueblood, J. S., White, C. N., & Vandekerckhove, J. (2019). Robust modeling in cognitive science. Computational Brain and Behavior. https://doi.org/10.31234/osf.io/dmfhk.