In the decision-making literature, numerous reports have demonstrated that the presentation order of the choice options affects the outcome systematically (e.g., Bruine de Bruin & Keren, 2003; Houston & Sherman, 1995; Wänke, Schwarz, & Noelle-Neumann, 1995). For example, presentation order has been shown to influence the results in contexts as varied as opinion polls (Wänke et al., 1995), preference for political candidates (Houston & Roskos-Ewoldsen, 1998), preference for job options (Slaughter & Highhouse, 2003), evaluation of food products (Dean, 1980), and professional judges’ ratings in figure-skating contests (Bruine de Bruin, 2005, 2006). Analogous effects have been found by psychophysicists for one and a half centuries (see, e.g., Guilford, 1954, and Hellström, 1985, for reviews), and the psychophysical tradition of using many stimulus pairs of varying magnitudes has made it possible to discover that the size and direction of presentation-order effects vary with, among several factors, the stimuli’s magnitude level. For example, Hellström (2003) found that, in length comparisons of lines of relatively long durations, subjects overestimated the left out of two short lines but the right out of two long lines. Similarly, in comparisons of musical excerpts, Koh (1967) found a tendency to prefer the first of two unpleasant excerpts but the second of two pleasant excerpts; the size of this effect varied linearly with the rated pleasantness of the stimulus pairs. Analogous results were found by Englund and Hellström (2012b) for pairs of successive jingles as well as for color patterns. These are but a few examples of what Fechner (1860) called space-order error (SOE) and time-order error (TOE). As is hinted at by the two terms, a SOE appears when stimuli are separated spatially, and a TOE when stimuli are separated temporally. Fechner defined these effects as being positive whenever the left (for SOEs) or the first (for TOEs) stimulus is overestimated relative to the other, and negative for the opposite case.

Inspired by these psychophysical results, Englund and Hellström (2012a) investigated preference choices between everyday objects and phenomena denoted by printed labels (e.g., apple–pear, headache–stomach ache) and found analogous magnitude-level-dependent order effects. Specifically, the stimuli in each pair were presented at the left and right margins, respectively, of a page, and preference statements were printed in-between them (e.g., “I like [much more, more, somewhat more, somewhat less, less, much less] than . . .”). Preference was indicated for each stimulus pair by marking the statement that agreed most with the participant’s opinion. Participants also rated their overall opinions of the stimuli (i.e., the stimuli’s valence levels). The results showed a tendency to prefer the left/first-read of two attractive stimuli (e.g., apple–pear) and the right/last-read of two unattractive stimuli (e.g., headache–stomach ache); the size of the effect varied linearly with the rated valence level of the stimuli. Englund and Hellström (2012a) dubbed this effect the word-order effect (WOE); in analogy to the SOE and TOE, a positive WOE indicates a tendency to prefer the left/first-read stimulus. One possible explanation for this valence-level-dependent WOE is that it is due to the left–right positioning of the words, that is, that it simply amounts to a SOE. Another possibility is that the effect is due to the semantics of the preference statements dictating a comparison direction (Tversky, 1977; Wänke, 1996). Obviously, pitting these two hypotheses against each other is of great theoretical, but also of practical, importance (e.g., in domains of product marketing and evaluation). Therefore, in the present article we present two experiments that were designed to test the two alternative explanations for the valence-level-dependent WOE reported by Englund and Hellström (2012a). First, however, we present the framework for description of the phenomenon and then the two alternative explanations.

The sensation-weighting model

Englund and Hellström (2012a) found that the valence-level-dependent WOE that they observed could be described well by the sensation-weighting (SW) model, which has proved a powerful tool for describing the effects of temporal and spatial presentation order (i.e., TOEs and SOEs) in comparisons of psychophysical stimuli (e.g., Hellström, 1979, 1985, 2003; Patching, Englund, & Hellström, 2012), and also of aesthetic stimuli (Englund & Hellström, 2012b). This model rests, essentially, on the simple notion, common across many disciplines, that the impacts of two independent variables X 1 and X 2 (here, the valences of the compared stimuli) on a dependent variable Y (here, the preference judgment) can often be efficiently analyzed by calculating the weights in a linear regression of Y on X 1 and X 2. According to the SW model, the subjective difference, d 12, between two compared stimuli can be described as the difference between two weighted subjective magnitudes (the subscripts 1 and 2 denote the left and right [first and second] stimuli, respectively):

$$ {d_{12}}=k\left\{ {\left[ {{s_1}{\psi_1}+\left( {1-{s_1}} \right){\psi_{\mathrm{ref}}}} \right]-\left[ {{s_2}{\psi_2}+\left( {1-{s_2}} \right){\psi_{\mathrm{ref}}}} \right]} \right\}+b, $$
(1)

where k is a scale constant, s 1 and s 2 are weighting coefficients, b is a constant to account for effects not attributable to the weighting process (e.g., a judgment bias), ψ 1 and ψ 2 are the subjective stimulus magnitudes (here, valences), and ψ ref is the magnitude corresponding to the current reference level (ReL). The ReL is conceived similarly to the adaptation level (Helson, 1964), which results from the pooling of focal, background, and residual stimulation, creating a subjectively neutral point—that is, an internal standard. According to the SW model, the size and sign of the order effect equals d 12 when ψ 1 = ψ 2 = ψ, and with the often reasonable assumption that b = 0, Eq. 1 reduces to

$$ {d_{12}}=k\left( {{s_1}-{s_2}} \right)\left( {\psi -{\psi_{\mathrm{ref}}}} \right). $$
(2)

This simplification is particularly useful in studying effects on preferences, using pairs of stimuli that are rather close in their positions on the valence continuum. According to Eq. 2, with the weight relation s 1 > s 2, the left (first) stimulus is overestimated when comparing two stimuli above the ReL, and the right (last) is overestimated when comparing two stimuli below the ReL. This weight relation could then account for the valence-level-dependent order effects obtained by Englund and Hellström (2012a). That is, the left stimulus had a greater impact than did the right on the outcome of the preference comparison, which led to the tendency to prefer the leftmost of two attractive stimuli and the rightmost of two unattractive stimuli.
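
The sign behavior described by Eqs. 1 and 2 can be illustrated numerically. The following is a minimal sketch, not part of the original analyses: the function names and the parameter values (s 1 = 0.8, s 2 = 0.6, ψ ref = 0, k = 1, b = 0) are hypothetical and were chosen only to satisfy s 1 > s 2.

```python
def sw_difference(psi1, psi2, s1, s2, psi_ref, k=1.0, b=0.0):
    """Eq. 1: subjective difference d12 between stimulus 1 (left/first) and 2."""
    weighted1 = s1 * psi1 + (1.0 - s1) * psi_ref
    weighted2 = s2 * psi2 + (1.0 - s2) * psi_ref
    return k * (weighted1 - weighted2) + b


def sw_order_effect(psi, s1, s2, psi_ref, k=1.0):
    """Eq. 2: order effect when psi1 = psi2 = psi and b = 0."""
    return k * (s1 - s2) * (psi - psi_ref)


# Hypothetical values with s1 > s2 (illustration only, not estimates from the data).
s1, s2, psi_ref = 0.8, 0.6, 0.0

for psi in (2.0, 0.0, -2.0):  # valence above, at, and below the reference level
    via_eq1 = sw_difference(psi, psi, s1, s2, psi_ref)
    via_eq2 = sw_order_effect(psi, s1, s2, psi_ref)
    print(f"psi = {psi:+.1f}: Eq. 1 gives {via_eq1:+.3f}, Eq. 2 gives {via_eq2:+.3f}")
```

Under these assumed values, the order effect is positive for a pair above the ReL, zero at the ReL, and negative below it, which is the pattern the weight relation s 1 > s 2 is meant to capture.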

The space-order hypothesis

As we noted above, the first possible explanation for the valence-dependent WOE, described by the weight relation s 1 > s 2, is that this order effect is due to the horizontal spacing of the words—that is, to a SOE. This interpretation is in line with previous psychophysical results as well as with the right-hemisphere lateralization hypothesis; a higher weight for the left than for the right stimulus of two horizontally spaced stimuli has been found previously for comparisons of line length (Hellström, 2003; Masin & Agostini, 1991), and more efficient processing by the right hemisphere (receiving stimulation from the left visual field) than by the left hemisphere has been found for emotional stimuli (e.g., Borod et al., 1998; Lang, Bradley, & Cuthbert, 1990). A modified version of this hypothesis would point to the possible importance of the successiveness of stimulus presentation that arises from the habitual reading order (for Swedish participants, from left to right), which would make the WOE akin to the TOE. (It should be noted that self-administered presentation of printed stimuli cannot be characterized as strictly simultaneous or successive.)

The comparison direction hypothesis

The second explanation—that the valence-dependent WOE, described by the weight relation s 1 > s 2, is due to the comparison direction—comes from cognitive psychology. In the cognitive research on presentation-order effects in preference comparisons, the most prominent paradigm is that of feature matching (e.g., Houston & Sherman, 1995; Houston, Sherman, & Baker, 1989, 1991; Wänke et al., 1995), which is based on Tversky’s (1977) contrast model of similarity comparison. The basic assumption of feature matching is that each stimulus can be represented by a set of features, and that the comparison is directed so that one stimulus, the subject, is compared to another stimulus, the referent. Furthermore, Houston and Sherman (1995) argued that, for judgments of preference, the shared features of the subject and referent are cancelled out because they do not carry any distinguishing information. Then, the unique features of the subject work as a checklist, and these features will be looked for actively among the features of the referent. Given equally many unique features of both stimuli, this checklist procedure will induce the perception that the subject possesses a larger number of the unique features. Therefore, the subject will be chosen whenever the unique features of the two stimuli are attractive, whereas the referent will be chosen whenever the unique features are unattractive. Consequently, one critical assumption behind the accuracy of feature-matching models based on Tversky’s (1977) ideas is that the direction of the order effect depends on the comparison direction—that is, on which stimulus is the subject in the comparison. In line with this interpretation of the effect, Englund and Hellström (2012a) suggested that the higher weight for the left stimulus (s 1 > s 2) in their results was due to the semantics of the preference statements pointing out the left stimulus as the subject, to be compared to the referent (the right stimulus). 
Such an effect could be mediated by more attention being allotted to the subject than to the referent (cf. Hellström, 1985; Tversky, 1977).

The present study

To the best of our knowledge, in only one study (Wänke, 1996) have WOEs been studied by pitting a word-order hypothesis against a comparison direction hypothesis. Wänke investigated the matter explicitly by using a factorial design with comparison requests with different semantic structures and different stimulus presentation orders. She concluded from her results that semantic determination of the comparison direction affects the responses, but that mere word order does not. A few important points need to be made regarding Wänke’s study, however. First, participants were asked the questions by interviewers, thus making the stimuli separated temporally rather than spatially. Thus, her results did not address the present issue of whether the valence-dependent WOE is analogous to a SOE—that is, whether it is induced by the spatial order of the stimuli. Second, the possibility exists that the different wordings used to vary the comparison direction “may have changed the comprehensibility of the sentence” (p. 404). Third, out of the three choices that Wänke’s participants made, only one was of a preferential nature. Consequently, before generalizing the interpretation of semantic dictation of the comparison direction to preference choices—and, in particular, to the valence-level-dependent WOE—it is of utmost interest to investigate this hypothesis further. Therefore, in the present article, we present two experiments that were designed to investigate, using an alternative method to Wänke’s, whether the valence-level-dependent WOE is due to the horizontal stimulus positioning (i.e., the space-order hypothesis) or to the semantics dictating a comparison direction by pointing out one stimulus as the subject of the comparison (i.e., the comparison direction hypothesis).

General method

The participants, who were different for each experiment, were Stockholm University psychology students who participated as volunteers or to fulfill a partial course requirement. In each experiment, the participants received a booklet consisting of three sections—a preference judgment task, a filler task (two personality tests not related to the experimental hypotheses: the Barratt Impulsiveness Scale of Patton, Stanford, & Barratt, 1995, and the Wender Utah Rating Scale of Ward, Wender, & Reimherr, 1993), and a valence-rating task. In the preference task, the participants were to make comparisons of stimulus pairs. The stimuli (e.g., apple–pear, headache–stomach ache) were spaced either horizontally (Exp. 1), by being printed at the respective margins, or vertically (Exp. 2), by being printed centrally on the page. In both experiments, the response alternatives were printed centrally on separate lines in-between the stimuli of each pair. The order of the response alternatives was counterbalanced, and the order of the stimulus pairs was randomized for each participant.

In the stimulus-valence rating task of Experiments 1 and 2, participants were to rate their general opinion on each stimulus separately. In Experiment 1, the stimulus was printed in the left margin, and there were seven statements (from uppermost to lowermost: “… I generally [like greatly, like, like somewhat, neither like nor dislike, dislike somewhat, dislike, dislike greatly] _______”) written on separate lines, with the stimulus on a level with the neutral alternative. Participants indicated their general opinions of the stimuli by marking with an “X” the dashed line to the right of the valence statement that most accurately represented their opinions. In Experiment 2, the response alternatives were the same as for Experiment 1, and in the same vertical order, but the stimulus was printed centrally above the response alternatives instead of in the left margin. The presentation order of the stimuli was randomized for each participant with the restriction that two stimuli belonging to the same pair occurred at least two full pages from one another.

Experiment 1

In Experiment 1, the aim was to test whether the valence-level-dependent WOE (Englund & Hellström, 2012a) is due to the left–right spatial positioning of the stimuli (i.e., to a SOE). This was tested by attempting to eliminate the semantically dictated comparison direction but still using a left–right stimulus positioning. If the valence-dependent WOE is due to the stimulus positioning (in accordance with the space-order hypothesis), it should occur even under these conditions, but if it is due to a comparison direction that is dictated semantically (in accordance with the comparison direction hypothesis), it should disappear.

Method

Participants

A group of 168 participants took part (48 men and 119 women, plus one participant who did not state gender and age), ranging in age from 19 to 54 years (M age = 26.3). (Two participants, uncounted here, failed to finish the booklets, and their data were discarded from all calculations.) The participants were assigned randomly to one of two groups receiving opposite orders of the preference response alternatives.

Stimuli and procedure

In the preference comparison task, participants were to make paired comparisons of 24 stimulus pairs (see Table 1); three stimulus pairs were printed on each page. The stimuli in each pair were spaced horizontally by being printed at the opposite margins of the page. The six response alternatives were printed on separate lines in-between the stimuli and consisted of a short preference statement, together with an open arrow pointing at the left or the right stimulus (see Fig. 1). Participants were instructed that the arrow pointed to the stimulus that was preferred according to the statement. The arrow designations were counterbalanced between participants; for half of the participants, the three topmost alternatives’ arrows pointed at the left stimulus and the three bottommost pointed at the right stimulus, and for the other half of the participants, the arrow allocations were reversed. The participants were to indicate their preferences by marking the response alternative that agreed most with their own opinions.

Table 1 Experiment 1: Means (and standard deviations) of preference and valence ratings for horizontally spaced stimulus pairs with unique within-pair stimulus randomization (in translation from the original Swedish, where each stimulus was denoted by a single noun)

Fig. 1 Experiment 1: Layout of response sheet for horizontal stimulus positioning

The within-pair presentation order was randomized so that, for each participant, a random set of 12 pairs was presented in the within-pair order specified in Table 1, and the remaining 12 pairs were presented in the opposite within-pair order. Thus, each participant received a unique combination of which stimulus pairs were presented in the within-pair orders \(A_i B_i\) and \(B_i A_i\). The order of the stimulus pairs was randomized for each participant.

Results and discussion

Preference ratings were scaled from 2.5 (maximum preference for the left stimulus) to −2.5, in steps of 1. The valence ratings were scaled from 3 (highest positive valence) to −3, also in steps of 1. The means and standard deviations of the preference and valence ratings for each word order (AB or BA) are displayed in Table 1.

The relative effect of the presentation order on the preference judgments, the WOE, for a stimulus pair {A, B} was defined theory-independently for this design by Englund and Hellström (2012a) as one-half of the difference between the mean preference judgments in the two stimulus-presentation orders AB and BA. The WOE was defined as being positive (in accordance with the SOE definition of Fechner, 1860) whenever the left tended to be preferred over the right stimulus, and as being negative when the right stimulus tended to be preferred. Englund and Hellström (2012a) also showed that the WOE can be expressed in terms of the SW model (Hellström, 1979, 1985, 2003); a first step would be to rewrite Eq. 1 to yield the simpler formulation (with the subscripts L and R denoting the left and right stimulus, respectively)

$$ {d_{\mathrm{L}\mathrm{R}}}={W_{\mathrm{L}}}\cdot {\psi_{\mathrm{L}}}-{W_{\mathrm{R}}}\cdot {\psi_{\mathrm{R}}}+C, $$
(3)

where W L = k · s L, W R = k · s R, and C = (W R − W L)ψ ref + b. Then, using Eq. 3, the preference d LR in the pair {A, B} can be expressed for the two possible presentation orders, respectively, as

$$ {d_{\mathrm{A}\mathrm{B}}}={W_{\mathrm{L}}}\cdot {\psi_{\mathrm{A}}}-{W_{\mathrm{R}}}\cdot {\psi_{\mathrm{B}}}+C $$
(4a)

and

$$ {d_{\mathrm{B}\mathrm{A}}}={W_{\mathrm{L}}}\cdot {\psi_{\mathrm{B}}}-{W_{\mathrm{R}}}\cdot {\psi_{\mathrm{A}}}+C, $$
(4b)

where the subscripts AB and BA denote the presentation orders. Finally, using Eqs. 4a and 4b, the WOE for the pair {A, B} can be expressed as the predicted d LR in a hypothetical pair where each stimulus has the valence (ψ A + ψ B)/2:

$$ \begin{array}{l} {d_{\mathrm{L}\mathrm{R}}}=\mathrm{WOE}=\frac{{d_{\mathrm{A}\mathrm{B}}}-\left( {-{d_{\mathrm{B}\mathrm{A}}}} \right)}{2}=\frac{{d_{\mathrm{A}\mathrm{B}}}+{d_{\mathrm{B}\mathrm{A}}}}{2} \\ =\frac{\left( {{W_{\mathrm{L}}}-{W_{\mathrm{R}}}} \right)\left( {{\psi_{\mathrm{A}}}+{\psi_{\mathrm{B}}}} \right)+2C}{2} \\ =\left( {{W_{\mathrm{L}}}-{W_{\mathrm{R}}}} \right)\left[ {\frac{{\psi_{\mathrm{A}}}+{\psi_{\mathrm{B}}}}{2}-{\psi_{\mathrm{ref}}}} \right]+b. \end{array} $$
(5)

In parity with the SOE definition above, the WOE as defined in Eq. 5 will be positive whenever the left stimulus tends to be preferred over the right, and negative in the reverse case.
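
The algebra leading from Eqs. 4a and 4b to Eq. 5 can be checked numerically. The sketch below is an illustration under assumed values, not part of the reported analyses: the function names and all parameter values are hypothetical. It computes the WOE once as the mean of the two order-specific predictions and once from the closed form in Eq. 5.

```python
def d_lr(psi_left, psi_right, w_l, w_r, c):
    """Eq. 3: predicted preference for the left over the right stimulus."""
    return w_l * psi_left - w_r * psi_right + c


def woe_from_orders(psi_a, psi_b, w_l, w_r, psi_ref, b):
    """WOE as the mean of the predictions for the orders AB and BA."""
    c = (w_r - w_l) * psi_ref + b            # constant C as defined after Eq. 3
    d_ab = d_lr(psi_a, psi_b, w_l, w_r, c)   # Eq. 4a: A on the left
    d_ba = d_lr(psi_b, psi_a, w_l, w_r, c)   # Eq. 4b: B on the left
    return (d_ab + d_ba) / 2.0


def woe_closed_form(psi_a, psi_b, w_l, w_r, psi_ref, b):
    """Last line of Eq. 5."""
    return (w_l - w_r) * ((psi_a + psi_b) / 2.0 - psi_ref) + b


# Hypothetical values, chosen only to exercise the identity.
params = dict(psi_a=1.5, psi_b=0.5, w_l=0.9, w_r=0.7, psi_ref=0.25, b=0.05)
print(woe_from_orders(**params), woe_closed_form(**params))
```

Both routes should agree up to floating-point error for any choice of values, because the terms in ψ A − ψ B cancel when the two presentation orders are averaged.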

The WOE values for each stimulus pair were calculated in accordance with the definition WOE = (d AB + d BA)/2 (using the values in Table 1) and were plotted against the respective pair’s mean valence (see Fig. 2). The prediction based on the space-order hypothesis, according to which the spatial separation of the stimuli should yield a valence-level-dependent WOE, is falsified by the results in Fig. 2. According to this prediction and Eq. 5, a linear relationship should hold between WOE and valence level. However, no such relationship was found. As can be seen in Fig. 2 (and Table 1), the WOE values for some stimulus pairs are different from zero, but these effects seem to reflect chance rather than systematic effects. Indeed, we found no significant linear (p = .986) or quadratic (R 2 < .001, p = .9998) relationship between WOE and valence level. These results are in opposition to the predictions of the space-order hypothesis, but compatible with the comparison direction hypothesis.
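
The pair-level computation described above can be sketched as follows. This is a hedged illustration only: the data are synthetic stand-ins for the Table 1 means, the function names are hypothetical, and the ordinary least squares fit with t statistics is one plausible way to run the linear test, not necessarily the procedure used for the reported p values.

```python
import numpy as np


def woe_per_pair(mean_d_ab, mean_d_ba):
    """WOE = (d_AB + d_BA) / 2; inputs are mean preferences for the left
    stimulus in the orders AB and BA, one value per stimulus pair."""
    return (np.asarray(mean_d_ab, dtype=float) + np.asarray(mean_d_ba, dtype=float)) / 2.0


def linear_fit_with_t(x, y):
    """Ordinary least squares fit of y on x with t statistics (df = n - 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    df = x.size - 2
    sigma2 = residuals @ residuals / df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return {"intercept": beta[0], "slope": beta[1],
            "t_intercept": beta[0] / se[0], "t_slope": beta[1] / se[1], "df": df}


# Synthetic stand-in for 24 stimulus pairs (NOT the data in Table 1).
rng = np.random.default_rng(0)
n_pairs = 24
mean_valence = rng.uniform(-2.5, 2.5, n_pairs)
a_minus_b = rng.normal(0.0, 1.0, n_pairs)           # how much A is liked over B
d_ab = a_minus_b + rng.normal(0.0, 0.1, n_pairs)    # A printed on the left
d_ba = -a_minus_b + rng.normal(0.0, 0.1, n_pairs)   # B printed on the left

woe = woe_per_pair(d_ab, d_ba)
fit = linear_fit_with_t(mean_valence, woe)
print(fit)  # a p value would additionally require the t distribution with fit["df"] df
```

With 24 pairs the fit has 22 degrees of freedom. Averaging the two orders removes the pair's intrinsic preference (A over B), so only order-related effects remain in the WOE.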

Fig. 2 Experiment 1: Horizontal stimulus positioning. The word-order effect (WOE) is plotted against the mean valence level of each stimulus pair (labels are from Table 1). The fitted regression line is also displayed. A positive WOE means a tendency to prefer the left over the right stimulus

For each participant, Eq. 3 was fitted by linear regression of d LR on ψ L and ψ R, yielding W L, W R, and C (due to the minus sign in the equation, the sign of W R was changed). The mean multiple R across participants was .68 (SD = .13, range .05–.91). The mean W L was 0.76 (SD = 0.25, range 0.01–1.42), and the mean W R was 0.76 (SD = 0.24, range 0.03–1.48). The mean C was 0.019 (SD = 0.24, range −0.68 to 0.69). The mean of W L − W R was −0.005 (SD = 0.13, SEM = 0.010, range −0.52 to 0.37).
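
The per-participant fit of Eq. 3 can be sketched as below. This is a minimal illustration under stated assumptions: it assumes that a participant's own valence ratings serve as ψ L and ψ R, the function name is hypothetical, and the data are synthetic (generated with W L = 0.8, W R = 0.7, and C = 0.1), not taken from the experiment.

```python
import numpy as np


def fit_sw_weights(psi_left, psi_right, d_lr):
    """Least squares fit of Eq. 3 for one participant.

    Returns (W_L, W_R, C, multiple R). The raw coefficient on psi_right
    estimates -W_R, so its sign is flipped to obtain W_R.
    """
    psi_left = np.asarray(psi_left, dtype=float)
    psi_right = np.asarray(psi_right, dtype=float)
    d_lr = np.asarray(d_lr, dtype=float)
    design = np.column_stack([psi_left, psi_right, np.ones_like(psi_left)])
    coef, *_ = np.linalg.lstsq(design, d_lr, rcond=None)
    predicted = design @ coef
    multiple_r = np.corrcoef(predicted, d_lr)[0, 1]
    return coef[0], -coef[1], coef[2], multiple_r


# Synthetic participant with 24 pairs (assumed generating values, NOT real data).
rng = np.random.default_rng(1)
psi_l = rng.integers(-3, 4, 24).astype(float)   # valence ratings on the -3..3 scale
psi_r = rng.integers(-3, 4, 24).astype(float)
d = 0.8 * psi_l - 0.7 * psi_r + 0.1 + rng.normal(0.0, 0.5, 24)

w_l, w_r, c, r = fit_sw_weights(psi_l, psi_r, d)
print(f"W_L = {w_l:.2f}, W_R = {w_r:.2f}, C = {c:.2f}, R = {r:.2f}, W_L - W_R = {w_l - w_r:.2f}")
```

Repeating such a fit for every participant yields the individual W L, W R, and C values whose means and spreads are summarized above.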

The individual W values were submitted to a repeated measures analysis of variance (ANOVA) with Stimulus Position as a within-participants factor (left vs. right) and Gender and Scale Order (uppermost arrow pointing left vs. right) as between-participants factors. No main or interaction effects of gender approached significance, so the Gender factor was dropped from further analyses. No main effect of stimulus position was apparent, F(1, 166) = 0.23, p = .629, η p 2 = .001; that is, the left stimulus had the same impact (mean W L = 0.756) on the comparison as did the right stimulus (mean W R = 0.761). We also found no significant overall effect of scale order, F(1, 166) = 0.74, p = .390, η p 2 = .004, and stimulus position did not interact significantly with this variable, F(1, 166) = 0.00, p = .963, η p 2 = .000. Thus, the lack of a systematic effect of presentation order was mirrored by the absence of systematic differential weighting of the compared stimuli. Next, the individual values of C (i.e., the predicted WOE for a stimulus pair with valences of zero) were submitted to a two-way ANOVA with Gender and Scale Order as between-participants factors. The main effect of gender and its interaction with scale order did not approach significance. Therefore, the Gender factor was dropped. The effect of scale order did reach significance, F(1, 166) = 4.57, p = .034, η p 2 = .027: With the uppermost arrow pointing to the left and the right, respectively, the mean Cs were 0.060 (SD = 0.22) and −0.017 (SD = 0.25).
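
As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity η p 2 = (F · df effect)/(F · df effect + df error). The sketch below applies it to the three F values of Experiment 1; the function name is hypothetical, and because the reported F values are rounded, agreement is expected only to the reported precision.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported for Experiment 1, each with df = (1, 166).
reported = {
    "stimulus position (W)": 0.23,
    "scale order (W)": 0.74,
    "scale order (C)": 4.57,
}

for label, f_value in reported.items():
    print(f"{label}: {partial_eta_squared(f_value, 1, 166):.3f}")
```

Rounded to three decimals, the results match the reported values of .001, .004, and .027.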

It may be noted that in the study by Phaf and Rotteveel (2009), arrows to the right induced more positive affect than did arrows to the left. However, such a tendency, if present, should not have affected the weighting here. The arrows were open, so any association of right-directed arrows with rewarding “Play” buttons, suggested by Phaf and Rotteveel, should be ruled out. If the arrows should still have evoked a tendency to prefer the right alternative over the left, this tendency should have been equally present for all pairs, and thus cancelled out in the data by the thoroughly randomized design. The same applies to a possible tendency to prefer the uppermost statements, which might be predicted by Meier and Robinson’s (2004) finding of a tendency to associate upper and lower positions with positive and negative words, respectively.

Experiment 2

The results from Experiment 1, where a left–right presentation order was used, weaken seriously the hypothesis that the valence-dependent WOE found by Englund and Hellström (2012a) was simply a SOE with a higher weight for the left stimulus (Hellström, 2003). Instead, the results are compatible with the suggestion made by Englund and Hellström that valence-dependent WOEs occur because the comparison direction is dictated semantically by the preference statements (Wänke, 1996). In Experiment 2, the plausibility of the comparison direction hypothesis was investigated further by using vertical stimulus positioning and the same response alternatives that Englund and Hellström (2012a) had used; if the linear valence dependence of the WOE is due to the semantic dictation of the comparison direction, it should reappear. It was hypothesized that the upper stimulus (the subject) would be given more attention, and therefore receive greater weight, than the nether stimulus (the referent), and that this would lead to a linearly valence-dependent WOE with a tendency to prefer the upper of two attractive stimuli, or the nether of two unattractive stimuli.

Method

Participants

A total of 174 participants took part, 41 men and 131 women (plus two participants who did not state gender, one of whom also did not state age), ranging in age from 20 to 53 years (M age = 27.2). The participants were assigned randomly to one of two groups (with opposite orders of the response alternatives; see below).

Stimuli and procedure

The procedure and the stimuli were the same as in Experiment 1, but in the preference comparison task, the stimuli of each pair were printed centrally and spaced vertically, with the response alternatives (those from Englund and Hellström’s, 2012a, Exp. 2—i.e., “. . . I like [much more, more, somewhat more, somewhat less, less, much less] than . . .”) printed on separate lines in-between them (see Fig. 3). A check-box was placed just to the left of each response alternative; for each stimulus pair, participants were to respond by choosing the statement that most accurately represented their opinion.

Fig. 3 Experiment 2: Layout of response sheet for vertical stimulus positioning with a dictated comparison direction

Results and discussion

The scaling of the preference and valence ratings (for the means and standard deviations, see Table 2), as well as the calculation of the WOE values for the respective stimulus pairs, was performed using the methods described for Experiment 1. The WOE values are displayed in Fig. 4, plotted against the mean overall valence for the respective pair, and a fitted regression line is also shown. As can be seen from Fig. 4, with the comparison direction dictated semantically, the expected linearly valence-dependent WOE reappeared. The slope of the regression of the WOE on mean valence for the stimulus pairs was statistically significant, t(22) = 4.31, p < .001, as was the intercept, t(22) = −2.38, p = .027. The significant regression slope indicates a valence-level-dependent WOE, and the significant intercept suggests a slight bias toward a higher preference for the nether stimulus. The SW model is well able to account for these results, which becomes clear by replacing the subscripts L and R in Eq. 5 with subscripts denoting the upper (U) and nether (N) stimulus, respectively, which yields

$$ \mathrm{WOE}=\left( {{W_{\mathrm{U}}}-{W_{\mathrm{N}}}} \right)\left[ {\left\{ {\frac{{\left( {{\psi_{\mathrm{A}}}+{\psi_{\mathrm{B}}}} \right)}}{2}} \right\}-{\psi_{\mathrm{ref}}}} \right]+b. $$
(6)

Table 2 Experiment 2: Means (and standard deviations) of preference and valence ratings for vertically spaced stimulus pairs with unique within-pair stimulus randomization (in translation from the original Swedish, where each stimulus was denoted by a single noun)

Fig. 4 Experiment 2: Vertical stimulus positioning with a dictated comparison direction. The word-order effect (WOE) is plotted against the mean valence level of each stimulus pair (labels are from Table 2). The fitted regression line is also displayed. A positive WOE means a tendency to prefer the upper over the nether stimulus

Thus, in SW terms, the valence-dependent WOE is described as a higher weight for the upper stimulus, W U > W N.

Analyses were also performed on the individual W and C values, which were estimated by linear regression as in Experiment 1. The mean multiple R across participants was .70 (SD = .12, range .23–.88). The mean W U was 0.72 (SD = 0.27, range −0.91 to 1.50), and the mean W N was 0.65 (SD = 0.28, range −0.88 to 1.38). The mean C was −0.019 (SD = 0.26, range −0.61 to 0.88).

The individual W values were submitted to a repeated measures ANOVA with Stimulus Position (upper vs. nether) as a within-participants variable and Gender and Scale Order (“. . . I like much more than . . .” being the uppermost vs. nethermost response alternative) as between-participants factors. No main or interaction effects of gender approached significance, so the Gender factor was dropped from further analyses. The analysis showed that the weight for the upper stimulus was significantly higher than that for the nether stimulus, \( M_{W_{\text{U}}-W_{\text{N}}} = 0.071\) (SD = 0.18, SEM = 0.013, range −0.44 to 0.84), F(1, 172) = 28.82, p < .001, η p 2 = .144, indicating that the upper stimulus had a greater impact on the comparison than did the nether stimulus. The main effect of scale order was nonsignificant, F(1, 172) = 0.001, p = .979. The interaction of stimulus position and scale order approached significance, F(1, 172) = 3.11, p = .080, η p 2 = .018, with the mean weight differences (W U − W N) being 0.048 and 0.094 for participants with “much more” printed uppermost and nethermost, respectively. However, simple analyses confirmed that the mean weight difference was significant for both scale orders, t(86) = 2.74, p = .007, and t(86) = 4.73, p < .001, respectively. Next, the individual values of C (the predicted WOE for a stimulus pair with valences of zero) were submitted to a two-way ANOVA with Gender and Scale Order as between-participants factors. Here, the effect of gender reached significance, F(1, 168) = 4.21, p = .042, η p 2 = .024. For the women, M C was −0.043 (SD = 0.26), indicating a slight tendency to prefer the nethermost stimulus, and for the men, M C = 0.060 (SD = 0.27). The effect of scale order was nonsignificant, F(1, 168) = 0.83, p = .364, η p 2 = .005. The Gender × Scale Order interaction was also nonsignificant, F(1, 168) = 1.24, p = .267, η p 2 = .007.

The upper versus nether position per se might be suspected to have affected the results. As was noted earlier, Meier and Robinson (2004) found a tendency to associate an upper and a nether position with positive and negative words, respectively. Such an association could be expected to lead to a general tendency to prefer the upper stimulus, which is in opposition to the slight tendency that we obtained in the opposite direction. Furthermore, due to the randomized design, such a preference should have applied equally to all stimulus pairs, and could hardly have changed the weighting. Catrambone, Beike, and Niedenthal (1996) found, consistent with Tversky’s (1977) feature-matching theory, higher similarity judgments of pairs of countries (i.e., “How similar is A to B?”) when B (the referent) was highly familiar and A (the subject) less familiar than were found in the reverse case. In contrast, no effect was apparent of the vertical position (upper, nether) of the familiar and less familiar country names on nondirectional similarity judgments (i.e., “How similar are A and B?”). By analogy, the higher weight for the upper stimulus in Experiment 2 is not likely to be due to the vertical placement of the stimuli to be compared.

The present results strengthen further the comparison direction hypothesis—that is, that the linear valence-level dependence of the WOE arises as a result of the systematic use of one stimulus, in this case the upper, as the subject in the comparison (e.g., Houston & Sherman, 1995; Wänke, 1996). This would then increase the attention that this stimulus receives, and thereby its weight—that is, its impact on the comparison outcome. At the same time, these results further weaken the space-order hypothesis, the notion that Englund and Hellström’s (2012a) results—a valence-level-dependent WOE expressing itself as a higher weight for the left than for the right stimulus—were due to the space order per se—that is, that they constituted a SOE proper (Hellström, 2003; Masin & Agostini, 1991).

General discussion

The two experiments presented here were designed to investigate further the potential impact of the semantics of the response alternatives on the valence-level-dependent WOE (Englund & Hellström, 2012a). Specifically, the purpose was to investigate the two rival hypotheses that the valence-level-dependent WOE was due to (a) the horizontal stimulus positioning (i.e., a classic SOE), or (b) the semantics of the response alternatives dictating a comparison direction by pointing out one stimulus, the first-read, as the subject to be compared to the referent, the last-read. Taken together, the results of these two experiments show that the valence-level-dependent WOE (Englund & Hellström, 2012a) is not due to the spatial positioning of the stimuli (i.e., to a SOE proper). Instead, the effect seems to result from the comparison being directed, with a subject being compared to a referent. In the present case, the appointing of the first-read stimulus as the subject is the result of the specific preference statements emphasizing the first stimulus as possessing the property of being liked more or less than the second stimulus (“A I like . . . than B”). In such a directed stimulus comparison, the stimulus in focus, the subject, has a greater impact on the comparison (e.g., Houston & Sherman, 1995; Tversky, 1977), which may be mediated by a higher degree of attention and, in SW terms, is represented by a higher weight for the subject than for the referent. As is illustrated by Eq. 2, letting the subscripts 1 and 2 represent the subject and referent, respectively, the result is that the WOE varies in size and direction with the valence level (\( \psi \)) of the compared stimuli relative to a reference level (\( \psi_{\text{ref}} \)). Although the present results make it unlikely that a valence-dependent WOE would be evoked by lateral stimulus positioning alone, this factor might possibly modulate the effect of the subject–referent relationship of the alternatives. It should therefore be of interest to investigate such possible interactions.
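
The dependence of the WOE on valence level can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the predicted WOE for a pair of equally valenced stimuli is the weight difference between subject and referent multiplied by the distance of the valence level from the reference level. This linear form is a simplification consistent with the verbal description above and is not a reproduction of Eq. 2. The function name, the weights, and the valence scale are hypothetical.

```python
def predicted_woe(psi: float, w_subject: float, w_referent: float,
                  psi_ref: float = 0.0) -> float:
    """Predicted order effect for two stimuli of equal valence psi.

    Assumed linear form: (w_subject - w_referent) * (psi - psi_ref).
    Positive values mean a tendency to prefer the subject.
    """
    return (w_subject - w_referent) * (psi - psi_ref)


# Hypothetical weights: the subject carries slightly more weight than the referent.
for psi in (-2.0, -1.0, 0.0, 1.0, 2.0):
    woe = predicted_woe(psi, w_subject=0.6, w_referent=0.5)
    print(f"valence level {psi:+.1f}: predicted WOE = {woe:+.3f}")
```

With a higher weight for the subject, the sketch yields a preference for the subject above the reference level and for the referent below it, and the magnitude grows toward the ends of the valence continuum.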

As was discussed by Englund and Hellström (2012a), changes of the size and direction of the WOE with the stimulus valence cannot be explained by most common preference models, including expected-utility models (e.g., Bradley & Terry, 1952; Kahneman & Tversky, 1979; Luce, 1959; Marley, Flynn, & Louviere, 2008), because these models do not even address order effects. Even though some suggested model modifications are designed specifically to accommodate order effects (e.g., Beaver & Gokhale, 1975; Davidson & Beaver, 1977), these extended models fail to explain the present data. The reason for that failure is the attempt to account for order effects by merely adding a constant to the model. This kind of extension will only help to explain a positive or a negative order effect, but not both (see Englund & Hellström, 2012a, for a fuller discussion). Feature-matching models, though (e.g., Houston & Sherman, 1995; Tversky, 1977), are exceptions to this argument, as these models can be used to explain changes in the direction of order effects. However, as Englund and Hellström (2012a) noted, it is unclear whether and how feature-matching models can be used to account for the valence-level dependence of the WOE. Englund and Hellström (2012a) suggested one possibility, attaching valence values to the features, but argued that feature-matching models generally lead to predictions similar to those of the simpler SW model, which makes the latter the preferred model (see Englund & Hellström, 2012a, for a discussion on the limitations of the feature-matching paradigm).

The present results and those of previous research (Englund & Hellström, 2012a; Wänke, 1996) strongly suggest that comparison direction is a factor that is decisive for the direction of presentation-order effects; thus, it is imperative for the researcher to know in which direction the comparisons are made. In the present case, the direction was dictated explicitly by the preference statements, but it may not be quite as easy to predict the comparison direction when using other methods or instructions. This issue does not seem to have been investigated specifically, but researchers have made suggestions. For example, it has been suggested that the comparison direction can be determined by explicit instructions or questions with an implied direction (e.g., Houston & Sherman, 1995; Houston et al., 1991; Wänke, 1996; Wänke et al., 1995), or by sequential stimulus presentation (e.g., Brunner & Wänke, 2006; Houston & Sherman, 1995; Houston, Sherrill-Mittleman, & Weeks, 2001; Kardes & Sanbonmatsu, 1993). However, the results in the literature are inconsistent (e.g., Agostinelli, Sherman, Fazio, & Hearst, 1986; Englund & Hellström, 2012a; Houston & Sherman, 1995; Houston et al., 1989; McGill, 1990; Wänke et al., 1995). More importantly, mere instruction or implication does not ensure knowing which stimulus has actually been used as the subject, and this uncertainty is a danger to the researcher; it is all too easy to interpret the results by assuming that a particular stimulus was indeed the subject, because the results support the model given this assumption, which is a circular argument. The present results, in concert with those of Englund and Hellström (2012a), suggest that estimating the weights (W values) is one way to determine post hoc which stimulus was used as the subject. This is also in line with Tversky’s (1977) original formulation of his contrast model. Therefore, because the issue of determining which stimulus is used as the subject is crucial for the interpretation of results and for the systematic testing of models, the further investigation of the validity of the W values as reflecting the comparison direction is a natural task for future research.

The present results suggest important implications for research practice. One commonly used method of handling potential order effects is the use of within-participants counterbalancing of the order of the choice options. This may be an effective method, but it is limited to contexts where it is reasonable. As was pointed out by Englund and Hellström (2012a), this is not the case when stimuli are remembered easily and participants have some motivation to appear consistent, and therefore tend to respond in accordance with previous responses. Decision models that do not account for presentation-order effects are inherently at fault with respect to decision-making reality, because these effects occur even in decisions that are made only once. Indeed, the presentation order of choice options has been shown to systematically influence the results in contexts as diverse and practically relevant as, in addition to those mentioned in the introduction, evaluation of consumer brands (Brunner & Wänke, 2006), preference comparisons of paintings (MacLaughlin & Kermisch, 1997) and of musical excerpts (Koh, 1967), TV audience voting of musical performances (Li & Epley, 2009), and selection of causal explanations (McGill, 1990).

Researchers seeking a maximally unbiased preference measure should use both presentation orders and calculate the arithmetic mean of the two measures (cf. Eqs. 4a and 4b, which yield \( d_{AB} = \{[s_1 + s_2]/2\}[\psi_A - \psi_B] \)). When this is not feasible for individual participants (e.g., in polls or market research), one should give different groups different presentation orders and calculate the arithmetic means (Beike & Sherman, 1998). As is implied by Eq. 2, WOEs become largest at the ends of the valence continuum. Thus, the importance of counterbalancing the presentation order increases with the extremity of the stimulus valence, whether positive or negative.
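
The effect of averaging over both presentation orders can be sketched in Python. The sketch assumes, for illustration only, a linear directed-comparison model in which the preference for the subject over the referent is the weighted subject valence minus the weighted referent valence plus a constant. The function names, the constant `c`, and the numerical values are hypothetical and are not taken from Eqs. 4a and 4b.

```python
def directed_preference(psi_subject: float, psi_referent: float,
                        s1: float, s2: float, c: float = 0.0) -> float:
    """Assumed linear model: preference for the subject over the referent."""
    return s1 * psi_subject - s2 * psi_referent + c


def counterbalanced_preference(psi_a: float, psi_b: float,
                               s1: float, s2: float, c: float = 0.0) -> float:
    """Mean preference for A over B across both presentation orders."""
    # Order 1: A is the subject, so the measure is already "A over B".
    a_as_subject = directed_preference(psi_a, psi_b, s1, s2, c)
    # Order 2: B is the subject, so the sign is flipped to express "A over B".
    a_as_referent = -directed_preference(psi_b, psi_a, s1, s2, c)
    return (a_as_subject + a_as_referent) / 2.0


# Hypothetical valences and weights.
averaged = counterbalanced_preference(psi_a=1.5, psi_b=0.5, s1=0.6, s2=0.4, c=0.2)
expected = ((0.6 + 0.4) / 2.0) * (1.5 - 0.5)
print(f"averaged over both orders: {averaged:.3f}")
print(f"mean weight times valence difference: {expected:.3f}")
```

Under these assumptions the constant and the weight asymmetry cancel in the average, leaving the mean of the two weights multiplied by the valence difference.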

In conclusion, the results presented here solidify the valence-dependent WOE as a real effect, and together with those of Englund and Hellström (2012a), indicate that it is caused by differential weighting of the choice options when preference comparisons are directed. This is accounted for well by Hellström’s (1979, 1985, 2003) SW model with a higher weight for the subject in the comparison: That is, the subject has a greater impact on the comparison than the referent. Thus, a tendency arises to prefer the subject out of two attractive stimuli, and the referent out of two unattractive stimuli, and this tendency becomes largest at the ends of the valence continuum.

The present results and those of Englund and Hellström (2012a) demonstrate that any researcher investigating order effects in preference comparisons needs to take into account the valence level of the stimuli and to establish which stimulus, if either, is actually being used as the subject in the comparisons. When studying preference comparison of paired stimuli, performing a multiple linear regression of the preference measure on the valences of the stimuli, and determining their respective weights, should be a natural first step in exploring the process behind the comparison.
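
The regression step recommended above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code. It assumes the linear model preference = W_subject · ψ_subject − W_referent · ψ_referent + C, fits it by ordinary least squares via the normal equations, and uses simulated data with made-up true weights. All function and variable names are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_weights(psi_subject, psi_referent, preference):
    """Least-squares estimates of (W_subject, W_referent, C)."""
    # The referent column is negated so that W_referent comes out positive.
    rows = [(ps, -pr, 1.0) for ps, pr in zip(psi_subject, psi_referent)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * d for r, d in zip(rows, preference)) for i in range(3)]
    w_subject, w_referent, c = solve_linear_system(xtx, xty)
    return w_subject, w_referent, c


# Simulated (not real) data: hypothetical true weights 0.6 and 0.5, constant 0.05.
rng = random.Random(0)
psi_s = [rng.uniform(-3, 3) for _ in range(200)]
psi_r = [rng.uniform(-3, 3) for _ in range(200)]
pref = [0.6 * s - 0.5 * r + 0.05 + rng.gauss(0, 0.1) for s, r in zip(psi_s, psi_r)]

w_s, w_r, const = estimate_weights(psi_s, psi_r, pref)
print(f"W_subject = {w_s:.2f}, W_referent = {w_r:.2f}, C = {const:.2f}")
```

In practice a statistics package would normally be used for the fit. The explicit solver only keeps the sketch free of external dependencies. A clearly higher estimated weight for one stimulus position would then point to that stimulus having served as the subject.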