We thank Levine and Wackerle (2020) for their response to our literature review, ‘The emotional facet of subjective and neural indices of similarity’. They raised an excellent point: whether to control for possible confounding factors should be decided in light of the specific research question. For example, in studies primarily interested in individual differences in the representational space, it is not necessary to control for visual and semantic similarity during stimulus selection; such control could even be detrimental if it severely reduces the stimulus sample.

We agree with the authors that it is challenging to exert experimental control over complex scenes, and that the eventual level of experimental control is therefore often a compromise between the desirable and the feasible. For example, while available datasets are easier to use, they also present significant limitations. To overcome these limitations, in a recent unpublished study we selected 72 complex pictures depicting four different outdoor situations (4 categories, 18 pictures per category) from several sources [e.g. Emopics (Wessa et al. 2010), NAPS (Marchewka et al. 2014), Google images]. According to the results from Levine and Wackerle (2020) reported in Fig. 1(B), 18 pictures per category should be enough to sample the representational space and avoid spurious correlations. Yet creating this stimulus sample required substantial effort, which will not be justified in every case.

We also agree that the relevance of the methodological concerns we discussed depends on the goal of the study. When researchers are interested in individual differences between participants in the representational space, the factors that we mentioned are not confounds. Yet when researchers are interested in individual or group differences in emotional similarity judgements, it may be beneficial to control for factors that affect thematic similarity within emotional and neutral stimulus categories. While individuals may feel differently about specific stimuli, it is possible to control for differential thematic similarity (often higher within emotional than within neutral categories) in advance, because such differences depend on shared semantic knowledge. Removing average differences (here, between stimulus categories) can facilitate the interpretation of individual differences.

In conclusion, we appreciate the comments and thank the authors for the excellent points they raised. We hope that this exchange will guide researchers during the difficult process of stimulus selection.