Methods
Participants
We recruited 85 adults (50 women, mean age = 23.35 years, SD = 7.82) in the Paris region. All had attended university (mean length of university curriculum = 2.85 years, SD = 1.18), but none majored in mathematics. Considering the low complexity of the math problems involved, participants’ curriculum was a clear indicator that they possessed the mathematical expertise required to solve the problems. Sample size was determined using uncertainty and publication bias correction on results from a previous study (Gros et al., 2016), following Anderson, Kelley, and Maxwell’s recommendations (2017).
Materials
Our materials were inspired by Gamo et al. (2010), who showed that problems with the same formal mathematical structure are nevertheless preferentially solved with one of two available solving strategies, depending on the semantic content of the problem. Consider the weight problem in Table 1, which can be solved through two strategies. One is a three-step algorithm that consists of calculating the weight of each individual dictionary in order to compute the weight of the stack of dictionaries Lola is carrying: 14 – 5 = 9; 5 – 2 = 3; 9 + 3 = 12. The other is a one-step algorithm that requires understanding that, since Lola and Joe carry the same Spanish dictionary, calculating the weight of each book is unnecessary. Since the German dictionary is 2 kg lighter than the Russian dictionary, the difference between the weights of Joe’s and Lola’s books is also 2 kg: 14 – 2 = 12.
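The two strategies can be written out as a minimal sketch. The function and parameter names are illustrative and do not come from the original materials. The mapping of the values 14, 5, and 2 onto Whole 1, Part 1, and the Difference is inferred from the prose description of the weight problem.

```python
def three_step(whole_1, part_1, difference):
    """Three-step strategy: compute each part, then add them up."""
    shared_part = whole_1 - part_1      # 14 - 5 = 9 (the shared Spanish dictionary)
    other_part = part_1 - difference    # 5 - 2 = 3 (the German dictionary)
    return shared_part + other_part     # 9 + 3 = 12


def one_step(whole_1, difference):
    """One-step strategy: the shared part cancels out, so only the
    difference between the two non-shared parts matters."""
    return whole_1 - difference         # 14 - 2 = 12


# Both strategies give the same answer for the weight problem.
answer_three_step = three_step(whole_1=14, part_1=5, difference=2)
answer_one_step = one_step(whole_1=14, difference=2)
```

The one-step function never uses Part 1, so it is the only strategy that still applies once the value of Part 1 is removed from a problem statement.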
Table 1. Two isomorphic problems sharing the same mathematical structure but evoking different aspects of our knowledge about the world
The duration problem in Table 1 has the same mathematical structure and can be solved using the same solving procedures. However, Gamo et al. (2010) showed that the two solving procedures are not randomly distributed across the two types of problems. Participants favor the three-step algorithm on problems like the dictionary problem (called cardinal problems) and the one-step algorithm on the second type of problems (called ordinal problems). This imbalance in strategy use was our starting point. Gamo et al. (2010) and Gros et al. (2017) showed that the differences in the world semantics evoked by the problems resulted in different spontaneous encodings of the situations, from which this imbalance originated (see Footnote 1; see also Fig. 1 for a description of this effect). Since cardinal and ordinal problems shared the same structure, featuring the same parts and wholes presented in the same order with the same numerical values, the imbalance in strategy use could only be attributed to the variations in the semantic content of the problem statements. Additionally, when correct answers obtained with either algorithm were considered, there was no significant difference in adults’ performance between cardinal and ordinal problems, which indicates that the strategy imbalance was not a matter of problem difficulty (Gros et al., 2017).
Gros et al. (2017) have shown that most adults encode collection, price, and weight problems as cardinal representations, whereas they encode duration, distance, and floor problems as ordinal representations. We modified their problems and removed the value of Part 1 so that the three-step strategy could not be used (see Table 2). Consequently, the only solution left was the one-step strategy, which required using the values of Whole 1 and of the Difference (see Fig. 1). The constructed materials are available online (https://osf.io/fxgqh/?view_only=ed1374ef4d204c90a0cb03a30cb0a099). Ordinal problems were 333.5 characters long on average (SD = 38.37) and cardinal problems were 304 characters long on average (SD = 44.94). This length difference was not statistically significant (t(10) = 1.18, p = .26, paired t-test). Crucially, for each problem, participants were presented with the correct one-step solution (e.g. “14 – 2 = 12; Jolene has 12 marbles”). Participants’ task was to decide whether the provided solution worked, or whether there was no solution to the problem. Due to the already established imbalance in strategy use between problems evoking a cardinal encoding and problems evoking an ordinal encoding (Gamo et al., 2010; Gros et al., 2017), we assumed that the measure of participants’ ability to use the only remaining strategy on problems evoking different aspects of world semantics would be an effective assessment of the robustness of these effects.
Table 2. Example of target problems used in the study. Changes introduced from Gros et al.’s (2017) problem statements are italicized in the table for the sake of clarity, but they were not made apparent in the experiment. Translated from French.
The world semantics hypothesis predicts lower performance on cardinal than on ordinal problems, even among experts, because cardinal problems would require a re-representation of the situation when the only solution available is the one-step algorithm. By contrast, ordinal problems should be easier to solve because participants’ spontaneous encoding facilitates the use of the one-step algorithm. Since university-educated adults can be considered experts in solving subtractions such as 14 – 2 = 12, and since the deep structure of a problem is identical regardless of the objects involved, this prediction could not be made without the world semantics view, especially when participants only need to check the validity of the proposed solution. Additionally, we predict that recoding a situation initially encoded as a combination of subsets (such as a cardinal encoding) into a representation in terms of states and transitions between states (such as an ordinal encoding) is a costly process, requiring a longer response time.
Although our hypotheses only concern solvable problems, we also included unsolvable distractors in the materials, so that the correct answer would not always be “This problem can be solved.” In those distractors, the value of Whole 1 was removed instead of the value of Part 1, which rendered the problems unsolvable with either algorithm.
Procedure
Participants answered the questions using three keyboard keys on a 17-in. laptop. Instructions stated that “Some of the problems can be solved using the values provided, while other problems cannot be solved with the available information. Your task is to tell apart problems that can be solved from problems that cannot. Answer as quickly as you can, although being correct is more important than being fast.”
Participants were presented with six target problems that were only solvable with the one-step algorithm: three cardinal and three ordinal problems. An equal number of distractors was introduced to fulfill subjects’ expectations regarding the uniform distribution of yes/no answers. Problem order, cover stories, and numerical values were randomized between participants. The value of Whole 1 was between 11 and 15, Whole 2 between 5 and 9, and the Difference was either 2 or 3.
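The composition of one session can be sketched as follows. This is an illustration, not the authors' experiment script: `build_session` and the dictionary keys are hypothetical names, and the seed is arbitrary. The text does not specify the semantic type of each distractor, so it is left unset here. The reported range for Whole 2 is also left out, because the text does not say how that value enters the problem statement.

```python
import random


def build_session(rng):
    """Return a shuffled list of 12 trials for one hypothetical participant."""
    trials = []
    # Six target problems: three cardinal and three ordinal, all solvable
    # with the one-step algorithm only.
    for semantics in ("cardinal", "ordinal"):
        for _ in range(3):
            whole_1 = rng.randint(11, 15)    # inclusive bounds, as reported
            difference = rng.choice((2, 3))  # the Difference was either 2 or 3
            trials.append({
                "semantics": semantics,
                "solvable": True,
                "whole_1": whole_1,
                "difference": difference,
                "proposed_solution": whole_1 - difference,
            })
    # An equal number of unsolvable distractors (semantic type not specified).
    for _ in range(6):
        trials.append({"semantics": None, "solvable": False})
    rng.shuffle(trials)                      # problem order randomized
    return trials


session = build_session(random.Random(0))
```

Passing a seeded `random.Random` instance keeps each simulated participant's order reproducible while still differing between participants when the seed changes.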
We used a segmented self-presentation procedure displaying the text line by line on the screen when participants pressed the spacebar. Below, a question appeared: “Given the data provided, is it possible to find the solution?” followed by two possible choices: “(A) No, there is not enough information to find the solution.” “(B) Yes, and the following solution is correct:” (followed by, in the case of the marble problem: “14 – 2 = 12. Lucy has 12 marbles in total”). A solution was proposed for each problem, and it was up to the participants to assess whether it was valid or whether the problem was unsolvable.
Results
Data collected for both studies are available online (https://osf.io/fxgqh/?view_only=ed1374ef4d204c90a0cb03a30cb0a099). The dependent variable was the proportion of correct answers for solvable problems (see Fig. 2). Because multiple binary data points were recorded in a repeated design (each participant provided a binary answer to three ordinal and three cardinal solvable problems), the use of repeated measures ANOVA was deemed inappropriate and replaced by a mixed model (Hector, 2015). We used a generalized linear mixed model with a binary distribution, with the cardinal versus ordinal semantic nature of the problems as a fixed factor, and participants as a random effect. In line with our hypothesis, lay adults performed significantly better on ordinal (81.18%) than on cardinal problems (46.67%); z = 7.84, p < .001, R²GLMM(c) = .29 (see Footnote 2). Additionally, looking at individuals’ response patterns showed us that 65.9% of the participants made fewer mistakes on ordinal than on cardinal problems, 11.8% made no mistakes at all, 15.3% made the same number of mistakes in cardinal and in ordinal problems, and only 7.1% made more mistakes on ordinal than on cardinal problems.
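The individual response patterns can be made explicit with a small classification sketch. The function name and category labels are illustrative. Treating "no mistakes at all" as a category separate from "same number of mistakes" is an assumption, suggested by the fact that the four reported percentages sum to roughly 100%.

```python
def classify_pattern(cardinal_errors, ordinal_errors):
    """Assign one participant to one of four mutually exclusive patterns,
    given the number of errors (0 to 3) on each type of solvable problem."""
    if cardinal_errors == 0 and ordinal_errors == 0:
        return "no mistakes"
    if ordinal_errors < cardinal_errors:
        return "fewer mistakes on ordinal"
    if ordinal_errors > cardinal_errors:
        return "more mistakes on ordinal"
    # Equal, non-zero error counts (assumed to exclude the error-free case).
    return "same number of mistakes"


# Made-up error counts for three hypothetical participants.
examples = [classify_pattern(2, 0), classify_pattern(0, 0), classify_pattern(1, 1)]
```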
Further analyses were conducted on participants’ response times (RTs) on solvable problems that had been successfully identified as such by the participants (see Fig. 3). Because the number of correct answers could vary from 0 to 6 for each participant, the number of RT data points varied accordingly, and the use of repeated-measures ANOVA was again deemed inappropriate (Hector, 2015). A linear mixed model with subjects as a random effect and semantic nature of the problems as a fixed factor showed that participants took more time to correctly solve cardinal (M = 34.05, SD = 18.78) than ordinal problems (M = 26.85, SD = 12.49), χ²(1) = 29.14, p < .001, R²LMM(c) = .44. Additionally, we studied the participants’ individual response patterns to identify whether different participant profiles existed. For each participant, we computed the difference between their mean RTs on correctly solved cardinal and ordinal problems (see Fig. 4) and we performed Hartigan’s dip test for unimodality versus multimodality on the resulting distribution (Hartigan & Hartigan, 1985). The analysis failed to reject the null hypothesis that participants’ responses came from a unimodal distribution (D = .028, p = .94), thus providing no empirical ground to assume that the distribution of response times was multimodal.
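The per-participant difference score that feeds the dip test can be sketched as below. The function name, the tuple layout, and the RT values are all made up for illustration. Returning `None` for a participant with no correct answer in one condition is an assumption, since the text does not say how such cases were handled.

```python
from statistics import mean


def rt_difference(trials):
    """Mean RT on correctly solved cardinal problems minus mean RT on
    correctly solved ordinal problems, for one participant.

    `trials` is a list of (semantics, correct, rt) tuples covering the
    solvable problems only.
    """
    cardinal = [rt for semantics, correct, rt in trials
                if correct and semantics == "cardinal"]
    ordinal = [rt for semantics, correct, rt in trials
               if correct and semantics == "ordinal"]
    if not cardinal or not ordinal:
        return None  # assumed handling: no difference score can be computed
    return mean(cardinal) - mean(ordinal)


# Made-up data for one hypothetical participant (not from the study).
participant = [
    ("cardinal", True, 30.0),
    ("cardinal", True, 40.0),
    ("cardinal", False, 99.0),  # incorrect answer, excluded from the means
    ("ordinal", True, 25.0),
    ("ordinal", True, 27.0),
    ("ordinal", False, 50.0),   # incorrect answer, excluded from the means
]
difference = rt_difference(participant)
```

A positive score means the participant was slower on cardinal problems. The distribution of these scores across participants is what a unimodality test would then examine.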
Discussion
The difference in performance between cardinal and ordinal problems indicates that despite their expertise regarding basic subtractions, the adults’ answers were significantly influenced by the semantic content of the problem statements. This confirms previous results obtained with the “complete” version of the problems that could be solved either with the three-step algorithm or with the one-step algorithm (Gamo et al., 2010; Gros et al., 2017). Here, we showed that the strategy imbalance observed in these previous studies was not an effect of mere preference for one strategy over another, but an actual impossibility to identify the relevance of the one-step algorithm on cardinal problems, as attested by the fact that on these problems more than half of the participants rejected a perfectly valid solution, despite only needing to check its validity. Regarding RTs, the fact that correct answers took more time on cardinal problems suggests that recognizing the solution to a problem evoking aspects of world semantics seemingly incompatible with the solution required an extra processing step. This is also supported by the fact that there was no significant difference in length between cardinal and ordinal problems. This is in line with the recoding process we predicted. These results show that the semantic content of a problem can prevent university-educated adults from recognizing a simple subtraction as the solution to a problem whose mathematical structure is undoubtedly within their level of expertise. We designed a second study to identify whether such effects would remain with expert mathematicians, known to be especially accustomed to abstract reasoning.