Introduction

In modern society, the internet is arguably the most important tool that individuals use to learn about current social issues. Aside from quick access to information about social issues, individuals can easily share their arguments on the issues online. Arguments on social issues are no longer scarce in this era; instead, individuals may be overwhelmed by numerous arguments during a random search or an entry into social websites. Nevertheless, the cognitive resources of individuals are limited. In the face of a large quantity of information and the arguments that come from different sources, which may be contradictory to each other, how do individuals process and evaluate the information?

Previous studies have found that individuals often show confirmation bias when processing information: they selectively search for, interpret, and remember information that is consistent with their motivations and preferences. For instance, when individuals handle information, they tend to delve into information that is consistent with their beliefs (Jonas et al., 2001). Even if there are two ways to interpret the same information, individuals tend to perceive what they want to see (Balcetis & Dunning, 2006). In terms of memory, individuals seem to remember information that is consistent with their beliefs more accurately, although this conclusion was often drawn from a small group of participants tested with only a few memory items (e.g., 10 participants tested with one item per condition by Levine & Murphy, 1943; 30–34 participants tested with three items per condition by Frost et al., 2015).

While confirmation bias during information learning and interpretation is a ubiquitous and robust effect (Nickerson, 1998), confirmation bias in memory is, on balance, not as robust and pronounced as one might expect. Eagly et al. (1999) carried out a meta-analysis of how attitude influenced memory and called the more accurate memory of the arguments consistent with attitude the “congeniality effect.” They found that studies before the 1960s adopted problematic analysis methods and observed the effect; however, in post-1960s studies, the effect was small or nearly nonexistent. A follow-up empirical study by Eagly et al. (2000) further explicated the reason behind the null effect: given sufficient motivation and capability, individuals may actively defend against, rather than passively avoid, uncongenial information. Such active processing of uncongenial information can enhance rather than reduce memory for counter-attitudinal information.

Note that past studies on the attitude–memory relationship did not observe the congeniality effect, likely because they often recruited a relatively small number of Western college students and tested their memory performances in laboratory settings. First of all, studies with a small sample size might be underpowered to discover the congeniality effect, if any. Secondly, relatively non-conformative Western individuals might be more willing to challenge and actively defend against uncongenial arguments than conformative Eastern individuals, hence processing uncongenial arguments as deeply as congenial information (Bond & Smith, 1996; Eagly et al., 1999; Henrich et al., 2010). Thirdly, research participants might not have faithfully reported their true positions and therefore appeared neutral on socially sensitive issues as their identities were exposed in laboratories (Hofmann et al., 2005), which would then mask the true attitude–memory relationship.

To address these power, culture, and response bias issues, we conducted a large-scale study on an East Asian population of 5,180 Taiwanese who participated anonymously through the internet. Because these volunteer participants anonymously used their own computers to browse and respond to each argument in this web-based study, as if they sequentially encountered and responded to each message on social media websites (Frost et al., 2015), we expected their self-reported attitudes on social issues to reflect their true positions. Furthermore, with improved statistical power for our investigation of a relatively conformative population, we anticipated seeing a much more pronounced congeniality effect compared to earlier studies.

To directly examine whether the aforementioned active vs. passive processing of information changes the attitude–memory relationship, this study presented messages to participants in two conditions: messages for one’s own information or for discussion. Specifically, the two conditions differed only in the question to which participants had to respond during exposure to each argument about a social issue. The question was whether they wanted to learn more about an argument in the informational condition and whether they wanted to further discuss an argument in the discussional condition. We expected the congeniality effect size to decrease for arguments presented in the discussional context relative to the informational context because deeper processing of uncongenial information in the discussional context would lead to a better memory of uncongenial information (Eagly et al., 2000). In the following sections, we will describe in detail how these two experimental conditions were implemented and whether they affected information learning and memory.

Materials and methods

Social issues and arguments

We selected four public issues that are widely discussed in Taiwan: marriage equality, abolishment of death penalty, nuclear power generation, and legalization of euthanasia. Marriage equality and legalization of euthanasia were less disputable, whereas abolishment of death penalty and nuclear power generation were highly controversial. Most of the arguments were collected from the hearings of the court, the Join Public Policy Network Participation Platform (https://join.gov.tw/), and commentaries on various websites. All these arguments were used as memory materials in the experiment.

To control for the difficulty of the memory materials, we rewrote the statements in the argument bank to approximately equalize the sentence length and complexity across arguments and issues (Table 1). In the end, each argument had only one premise to support its conclusion and had 55.68 ± 14.56 (M ± SD) Chinese characters on average. For example, one argument supporting the abolishment of the death penalty was “As it is futile trying to dampen crime with death penalty, it should be abolished.” One opposing argument was “In line with the principle of proportion, it is injustice keeping felony offenders alive.” In total, 80 arguments listed in the supplementary materials were used for this study, with 20 arguments on each of the four social issues. Of these 20 arguments, 10 were supportive, while the other 10 were antagonistic.

Table 1 Descriptive statistics of argument length for each issue

Participants

We used the G*Power software to calculate the sample size needed for the present study. According to the meta-analysis reported by Eagly et al. (1999), the mean effect size of the congeniality effect, with outliers being excluded, was 0.08 in terms of Cohen’s d. Because memory performances may not be normally distributed and some studies have observed reversals of the effect (i.e., negative effect sizes), the type of power analysis was set to detect the effect by the two-tailed Wilcoxon signed-rank test. When the alpha level was set to be 0.05 and 0.001, the required sample size for achieving a power of 0.99 was 3,009 and 5,168, respectively. Therefore, we planned to recruit at least 5,168 participants.
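The sample-size figures above can be roughly cross-checked with a normal approximation. The sketch below is not the G*Power procedure itself: it assumes that the required sample size for a two-tailed paired test is divided by the asymptotic relative efficiency (ARE) of the Wilcoxon signed-rank test, and that the ARE equals 3/π (a normal parent distribution). The function name `wilcoxon_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_sample_size(d, alpha, power, are=3 / math.pi):
    """Approximate n for a two-tailed signed-rank test of effect size d.

    Assumption: normal-approximation sample size for a paired t-test,
    inflated by 1/ARE, with ARE = 3/pi (normal parent distribution).
    """
    z = NormalDist().inv_cdf
    n_t = ((z(1 - alpha / 2) + z(power)) / d) ** 2  # t-test approximation
    return math.ceil(n_t / are)                      # inflate for Wilcoxon


n_alpha_05 = wilcoxon_sample_size(d=0.08, alpha=0.05, power=0.99)
n_alpha_001 = wilcoxon_sample_size(d=0.08, alpha=0.001, power=0.99)
print(n_alpha_05, n_alpha_001)
```

Under these assumptions, the two estimates should land within about one percent of the reported 3,009 and 5,168; small differences are expected because the approximation ignores finite-sample corrections.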

The study website was hosted on our own server, and the participants could access it directly over the Internet. Compared with laboratory studies, web-based studies like ours can reach a geographically wider population, and participants can respond using their own computers in familiar settings. We placed advertisements on Facebook to target participants aged between 18 and 65 because people who are too young may not fully understand the social issues, and the memory system of the elderly may have undergone some qualitative changes (Grady, 2012). In the end, 5,180 Taiwanese individuals took part in this study, and 4,170 of them (1,264 males; 2,906 females) were regarded as not randomly responding and subsequently analyzed (see the later section on Data Analysis for details). Their ages ranged from 18 to 62 (M = 26.16, SD = 5.89). The highest level of education these participants had received ranged from senior high school (5.95%), college (72.16%), to graduate school (21.89%). The average time for them to finish the entire study was 37.23 min.

Because some participants may repeatedly take part in a study to get multiple payments (Gosling & Mason, 2015), we did not offer participants monetary incentives. Instead, we motivated participants on the opening page by the statements that their participation would help advance science about the memory of attitudinal information and let them learn more about themselves in this regard from a personalized report at the end. On the ending page, we did provide visual feedback to the participants as follows. Their six-dimensional personality characteristics were presented in a radar chart. Their accuracies of recognition memory were visualized by four doughnut charts, each showing a conspicuous percent correct score for a social issue in the middle of the doughnut. Additionally, there was a button to easily share the website on Facebook and another button to copy the website link for sharing the study through other channels.

Experimental procedure

The participants could visit the study webpage through a computer or a mobile device. Online informed consent was obtained from all participants before their enrollment in this study, which was approved by a university research ethics committee.

The overall study procedure is shown in Fig. 1. First, the participants read the descriptions of the four social issues and answered the questions about their familiarity with and positions on each issue. Then, they underwent two study phases, each of which was randomly paired with two social issues and one condition (i.e., informational vs. discussional context in which participants were exposed to attitudinal arguments). In each phase, the participants went through the following stages, one after another: argument exposure, personality testing, and argument recognition. The different steps of the study procedure are detailed below.

Fig. 1 Experimental procedure

Issue Familiarity and Positions. The participants were asked about their familiarity with and position on the four social issues. One question with a Likert scale was used to measure a participant’s familiarity with a particular issue, where “1 Point” indicated strong unfamiliarity, while “5 Points” indicated strong familiarity. Another question with a visual analog scale was used to measure a participant’s position on a particular social issue, where “-50 Points” indicated strong disagreement while “50 Points” indicated strong agreement. The sign of this attitude score was used to classify each participant’s position as supportive or antagonistic toward an issue, and the absolute value of the score was used as attitude strength in the later analyses.

Argument Exposure Stage. There were two experimental conditions: arguments exposed in an informational or discussional context. As shown in Fig. 2, these two conditions differed only in the response question given to the participants, who had to make a binary choice (yes vs. no) for each argument to indicate whether they were willing to further learn about an argument (i.e., informational condition) or to further discuss the argument (i.e., discussional condition). To prevent other factors from confounding the effects of these experimental conditions, we adopted the following design. First, the presentation order of these two conditions was randomized (Fig. 1). Second, two issues were randomly assigned to be presented in each condition. Third, arguments about the two issues under the same condition were presented on screen, one at a time, in a randomized order. Fourth, no additional information or steps followed a participant’s response to each argument.
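The first two randomization steps above can be sketched as follows. This is a minimal illustration, not the study's actual code: the issue labels, the function name `assign_phases`, and the seed are placeholders chosen for the example.

```python
import random

ISSUES = [
    "marriage equality",
    "abolishment of death penalty",
    "nuclear power generation",
    "legalization of euthanasia",
]
CONDITIONS = ["informational", "discussional"]


def assign_phases(rng):
    """Randomize condition order and assign two issues to each condition."""
    issues = ISSUES[:]
    conditions = CONDITIONS[:]
    rng.shuffle(issues)      # random pairing of issues with phases
    rng.shuffle(conditions)  # random presentation order of the conditions
    return [
        {"condition": conditions[0], "issues": issues[:2]},
        {"condition": conditions[1], "issues": issues[2:]},
    ]


phases = assign_phases(random.Random(2024))  # seed is arbitrary
for phase in phases:
    print(phase["condition"], phase["issues"])
```

Each participant would receive an independent draw, so that condition order and issue pairing are balanced across the sample in expectation.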

Fig. 2 Two arguments of the same issue were presented in the informational condition (left panel) and in the discussional condition (right panel), respectively. The original statements were in Traditional Chinese

The informational condition was equivalent to the sequential information search paradigm used for examining confirmation bias in previous studies (e.g., Frost et al., 2015; Jonas et al., 2001), and our newly introduced discussional condition took a step further to investigate the contextual influences on the incidental learning and memory of value-laden information. We expected that the discussional condition would incur deeper processing of information in participants than the informational condition, as the participants had to further judge whether a viewed piece of information was arguable.

Personality Testing Stage. This stage evaluated the personalities of the participants and also functioned as a memory retention period of a few minutes. This stage in each memory-testing phase used half of the questions (i.e., 30 items) in the Chinese 60-item Revised HEXACO Personality Inventory (Ashton & Lee, 2009) to evaluate the six personality dimensions of each participant: Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness, and Openness to Experience.

Argument Recognition Stage. This stage administered a surprise memory test to evaluate the participants’ incidental rather than intentional memory of the arguments that were presented in the argument exposure stage of the same memory-testing phase. This stage in each memory-testing phase intermixed 20 arguments about one issue and 20 arguments about another issue and presented them on screen, one at a time, in a randomized order. Among the 20 arguments about each issue, 5 supportive and 5 antagonistic ones had been randomly selected for presentation in the argument exposure stage of the experimental condition, while the other 5 supportive and 5 antagonistic arguments served as novel lures. Participants had to judge, without any time constraint, whether each of the 40 arguments under testing had previously appeared in this study.
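The construction of the exposure and recognition lists for one phase can be sketched as below. The argument identifiers, the issue labels, and the function name `build_phase_lists` are hypothetical; only the counts (10 supportive and 10 antagonistic arguments per issue, 5 of each exposed) follow the description above.

```python
import random


def build_phase_lists(arguments, rng):
    """Return (exposure list, recognition list) for one two-issue phase.

    `arguments` maps each issue to {"supportive": [...], "antagonistic": [...]},
    with 10 argument identifiers per stance.
    """
    exposure, recognition = [], []
    for issue, by_stance in arguments.items():
        for stance, args in by_stance.items():
            old = set(rng.sample(args, 5))  # 5 exposed, 5 kept as lures
            exposure.extend(a for a in args if a in old)
            recognition.extend({"argument": a, "old": a in old} for a in args)
    rng.shuffle(exposure)     # one argument at a time, random order
    rng.shuffle(recognition)  # old items and lures intermixed
    return exposure, recognition


# Placeholder material: two issues, identifiers instead of real statements.
arguments = {
    issue: {
        stance: [f"{issue}/{stance}/{k}" for k in range(10)]
        for stance in ("supportive", "antagonistic")
    }
    for issue in ("issue_A", "issue_B")
}
exposure, recognition = build_phase_lists(arguments, random.Random(7))
print(len(exposure), len(recognition))
```

Sampling within each issue and stance keeps the number of old items and lures equal for supportive and antagonistic arguments, which the later hit and false-alarm rates rely on.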

Posttest Procedure. After all the memory tests, the participants were asked to report their issue positions again and provide some basic personal information. The second measurement of the issue positions aimed to check the consistency of the participants’ self-reported attitudes before and after reading others’ arguments. The basic personal information included sex, age, and level of education. After completing all the questions, each participant obtained a test report on screen, which showed her/his scores on the HEXACO Personality Inventory and the memory recognition tests.

Data analysis

Response Times. No questionnaire or test was time-limited throughout the study. However, the time each participant took to respond to each item was recorded to examine whether there was a difference in participants’ cognitive effort between the two argument-responding conditions.

Data Pre-processing. To ensure data quality, we adopted two exclusion criteria to select samples for further analysis. The first criterion was to exclude the participants whose recognition memory accuracies did not statistically differ from chance level (i.e., 50% correct). According to a one-tailed binomial test, participants who correctly recognized at least 48 out of 80 arguments (i.e., 60% correct) performed significantly better than random guessing (p < .05). We hence excluded 1,010 participants whose mean argument-recognizing accuracies were below 60% correct, leaving 4,170 (80.50%) of the 5,180 participants.
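The cutoff for the first criterion can be reproduced with an exact one-tailed binomial test against a chance level of 0.5. The sketch below uses only the Python standard library; the function names are ours.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n, alpha=0.05):
    """Smallest number correct whose one-tailed p-value falls below alpha."""
    for k in range(n + 1):
        if upper_tail(k, n) < alpha:
            return k
    return None


threshold = min_correct_above_chance(80)
print(threshold, threshold / 80)
```

With 80 test items, the smallest count that beats chance at the .05 level is 48, which corresponds to the 60% accuracy cutoff used for exclusion.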

Fig. 3 Participants changed attitudes on issues. (a) Each point corresponds to each participant’s response to each issue (N = 16,680). The points in gray were those with a sign flip between pre- and post-test attitudes (e.g., from −50 to 50 or from 50 to −50) and hence excluded from further analyses. (b) The attitude changes of the remaining points (N = 15,667) were mostly around zero

The second criterion was to exclude data points where the congeniality of arguments on a social issue, the core variable of the present study, was ill-defined. Specifically, we removed data points where a participant changed attitudinal position on an issue after reading others’ arguments, as detected by a sign flip between pre- and post-test attitudes (see Fig. 3). In the end, 93.93% (15,667 issues from 4,168 participants) of the original data points (16,680 participant × issue pairs) were retained. Because recognition performances were computed separately for congenial and uncongenial arguments about an issue, each of the remaining data points yielded two samples, which amounted to a total of 15,667 × 2 = 31,334 samples for our statistical models. In all the analyses reported hereafter, the attitude strength was the absolute value of the mean of the pre- and post-test attitude scores for each remaining sample.

Congenial and Uncongenial Arguments. The supportive and antagonistic arguments on each social issue in the memory materials were re-labeled as congenial arguments and uncongenial arguments according to a participant’s attitude towards a particular issue. In our statistical analyses, if a participant’s attitude towards an issue was the same as that of a particular argument, that argument would be regarded as a congenial argument; otherwise, it would be regarded as an uncongenial argument.
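The sign-flip exclusion and the congeniality relabeling described above can be sketched for a single participant × issue data point. The function names and the returned dictionary layout are hypothetical. The handling of a score of exactly zero is also our assumption, since the text does not specify it: here such data points are treated as having no defined position and are dropped.

```python
def position(score):
    """Classify an attitude score on the -50..50 scale by its sign."""
    if score > 0:
        return "supportive"
    if score < 0:
        return "antagonistic"
    return None  # assumption: a score of 0 has no defined position


def congeniality_labels(pre, post):
    """Return labels for one participant x issue data point, or None if excluded."""
    pre_pos, post_pos = position(pre), position(post)
    if pre_pos is None or pre_pos != post_pos:
        return None  # sign flip (or undefined position): congeniality ill-defined
    other = "antagonistic" if pre_pos == "supportive" else "supportive"
    return {
        "attitude_strength": abs((pre + post) / 2),
        "congenial_stance": pre_pos,   # arguments sharing the participant's side
        "uncongenial_stance": other,   # arguments on the opposite side
    }


print(congeniality_labels(40, 20))
print(congeniality_labels(-50, 50))
```

Because retained data points have pre- and post-test scores of the same sign, the absolute value of their mean equals the mean of their absolute values.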

Recognition Performances. The recognition memory of the participants was analyzed on the issue level. Two types of recognition performances were obtained from each participant’s responses to each of the four social issues in the recognition memory test: the recognition performance of congenial arguments and that of uncongenial arguments. Each type of these issue-level recognition performances was derived correspondingly from a participant’s 10 argument-level data samples. Specifically, we computed the continuous-valued percentage-correct score as well as the “discriminability” (d’) and “response criterion” (c) measures in the signal detection theory (Macmillan & Creelman, 2005; Shapiro, 1994) for each type of argument. Note that the discriminability and criterion measures are calculated from the z scores of hit and false alarm rates, but the z scores corresponding to proportions of 0 and 1 are infinite. Therefore, rates of 0 were replaced by 0.5/n, and rates of 1 were replaced by 1-0.5/n, where n was the denominator in the calculation of hit or false alarm rates (Stanislaw & Todorov, 1999).
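A minimal sketch of these two measures follows, using the standard signal detection formulas d’ = z(H) − z(F) and c = −[z(H) + z(F)]/2 together with the 0.5/n replacement described above. The function name `sdt_measures` and the example counts are ours; with 5 old arguments and 5 lures per argument type, n is assumed to be 5 for both rates.

```python
from statistics import NormalDist


def sdt_measures(hits, n_old, false_alarms, n_new):
    """Return (d', c) from hit and false-alarm counts.

    Rates of 0 and 1 are replaced by 0.5/n and 1 - 0.5/n so that
    their z scores stay finite.
    """

    def corrected_rate(count, n):
        if count == 0:
            return 0.5 / n
        if count == n:
            return 1 - 0.5 / n
        return count / n

    z = NormalDist().inv_cdf
    z_hit = z(corrected_rate(hits, n_old))
    z_fa = z(corrected_rate(false_alarms, n_new))
    d_prime = z_hit - z_fa             # discriminability
    criterion = -(z_hit + z_fa) / 2    # positive values = conservative responding
    return d_prime, criterion


# Perfect performance on 5 old items and 5 lures (both rates get corrected).
d_prime, criterion = sdt_measures(hits=5, n_old=5, false_alarms=0, n_new=5)
print(round(d_prime, 3), round(criterion, 3))
```

With only five items per rate, the corrected rates are 0.9 and 0.1 at the extremes, so d’ is bounded at roughly ±2.56 in this design.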

Statistical Models. Because multiple responses from the same participant were likely correlated with each other and thus violated the assumption of independent samples in multiple regression models, we used mixed-effects models to examine to what extent our explanatory variables could account for participants’ overall memory performances.

Each of the mixed-effects models consisted of fixed slopes and random intercepts. The fixed slopes estimated the linear relationships between a dependent variable (i.e., discriminability or response criterion) and all the explanatory variables, such as age, sex, experimental conditions (informational vs. discussional context), familiarity with an issue, the strength of attitude toward an issue, and congeniality of arguments. The random intercepts were intercepts estimated separately for each participant so as to capture variations in participants’ baseline memory abilities.
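The model structure just described can be written as a generic random-intercept model. The notation below is ours and is meant only as a sketch: i indexes the samples of participant j, the x terms stand for the K explanatory variables, and the variance symbols follow the usual convention for between-participant and residual variance.

```latex
y_{ij} = \beta_0 + u_{0j} + \sum_{k=1}^{K} \beta_k x_{kij} + \varepsilon_{ij},
\qquad u_{0j} \sim \mathcal{N}(0, \tau_{00}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here the slopes β_k are shared by all participants, whereas the intercept β_0 + u_{0j} varies from one participant to another.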

Statistical Packages. The statistical software R was used for cleaning, manipulation, and modeling of the study data. We used the lme4 package to construct mixed-effects models (Bates et al., 2014). Then, the sjPlot package (Lüdecke, 2021) was used to obtain the conditional and marginal R² of these models.

Results

Issue familiarity and issue position

The descriptive statistics about participants’ familiarity with and positions on the four social issues are summarized in Table 2. On average, participants were familiar with the four social issues. Regarding argument positions, abolishment of death penalty and nuclear power generation were controversial, but marriage equality and legalization of euthanasia were not. Such a difference in controversiality across issues allowed us to examine the effects of controversiality on memory performances.

Table 2 Participants’ Familiarity with the Social Issues and Their Argument Tendency

Effects of experimental conditions

The mean response times (RTs) for argument-responding and argument-recognizing under the two experimental conditions are summarized in Fig. 4a. Because RTs under each condition were not normally distributed across participants, we used one-tailed Wilcoxon signed-rank tests to compare the difference in RTs between the informational condition and the discussional condition. During the argument-responding stage, the median RT in the discussional condition was 0.98 s longer than that in the informational condition (p < .001 with a moderate effect size \(r = z/\sqrt{N} = 0.36\)). By contrast, during the argument-recognizing stage, the median RT in the discussional condition was 0.08 s shorter than that in the informational condition (p < .001 with a small effect size r = .057).

For each argument presented during the argument-responding stage, the participants expressed whether they would like to further learn about that argument in the informational condition or whether they would like to further discuss that argument in the discussional condition. Figure 4b summarizes the mean number of positive responses to congenial or uncongenial arguments in the informational or discussional condition.

Fig. 4 Effect of the two conditions. (a) The discussional condition induced a longer response time than the informational condition. (b) While there was a confirmation bias toward learning more about congenial than uncongenial arguments in the informational condition, such a bias disappeared in the discussional condition. Note that the participants were more willing to further discuss than learn about uncongenial arguments

Because the numbers of positive responses under each condition were not normally distributed across participants, we used Wilcoxon signed-rank tests to compare the difference in expressed interests between the informational condition and the discussional condition. Under the informational condition, the participants made significantly more positive responses (p < .001) to congenial arguments (Mdn = 5, IQR = 1) than to uncongenial arguments (Mdn = 3, IQR = 3). Nevertheless, under the discussional condition, the number of positive responses to congenial arguments (Mdn = 4, IQR = 3) was not significantly different (p = .48) from that to uncongenial arguments (Mdn = 3, IQR = 3).

Recognition performances

We used recognition discriminability and response criterion as dependent variables in separate regression models. The distributions of the explanatory and response variables in these models are also visualized in Fig. 5.

Fig. 5 The distributions of the explanatory and response variables used by the models in Table 3. Note that the participants were mostly young adults with strong attitudes toward the four issues

We constructed four mixed-effects models of the participants’ recognition performances, as shown in Table 3. Specifically, these are linear models with one intercept estimated for each participant and the same slope estimated across all the participants. The first two models are individual-level base models with discriminability and response criterion as dependent variables, respectively. The other two models are issue-level full models, which further consider experimental effects.

Table 3 Mixed-effects models of recognition performances

Each of the four linear mixed models comprises fixed-effects slopes and random-effects intercepts. The fixed-effects slopes can be interpreted in a similar manner to multiple regression, such as standardized coefficients viewed as effect sizes. The random-effects intercepts further partition the variance left unexplained by the fixed-effects slopes into the between-individual variance of intercepts (τ00) and residual, within-individual variance (σ²). The intra-class correlation (ICC) calculates the proportion of such variance explained by the between-individual differences (τ00/[σ² + τ00]), ranging from zero to one. In the extreme case of an ICC of zero, by-participant sample grouping is unnecessary, and the data can be simply modeled by multiple regression. Relatedly, marginal R² calculates the proportion of total variance explained by the fixed effects, whereas conditional R² calculates the proportion of total variance explained by both the fixed and random effects.

In Table 3, the two base models primarily consider explanatory variables on the individual level, including age, sex, and educational level. Among these explanatory variables, age is most predictive of the recognition performances in terms of effect size and significance: older participants tended to be lower in recognition discriminability and higher in response criterion (i.e., more conservative in reporting having seen an argument). This negative relationship between age and discriminability has also been observed in other studies (e.g., Graves et al., 2017). Note, however, that the tiny marginal R² (0.002 for d’; 0.001 for c) relative to the conditional R² (0.291 for d’; 0.243 for c) in these models indicates that the random intercepts explain the variations in recognition performances much better than the fixed slopes.
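The link between the two R² values and the ICC can be made concrete with a short calculation. The sketch below assumes the common variance decomposition in which marginal R² is the fixed-effects variance over the total variance and conditional R² is the fixed-effects plus random-intercept variance over the total variance; under that assumption, the ICC defined as τ00/(σ² + τ00) equals (conditional R² − marginal R²)/(1 − marginal R²). The function name is ours, and the inputs are the base-model values quoted above.

```python
def implied_icc(marginal_r2, conditional_r2):
    """ICC implied by marginal and conditional R^2.

    Assumption: R2m = var_fixed / total and
    R2c = (var_fixed + tau00) / total, so that
    tau00 / (tau00 + sigma2) = (R2c - R2m) / (1 - R2m).
    """
    return (conditional_r2 - marginal_r2) / (1 - marginal_r2)


icc_d_prime = implied_icc(marginal_r2=0.002, conditional_r2=0.291)
icc_criterion = implied_icc(marginal_r2=0.001, conditional_r2=0.243)
print(round(icc_d_prime, 3), round(icc_criterion, 3))
```

Under this assumption, the implied ICCs are roughly 0.29 for d’ and 0.24 for c, meaning that about a quarter or more of the variance left unexplained by the fixed effects lies between participants.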

In Table 3, the two full models further examine the effects of stimulus properties and participants’ responses on the recognition performances of congenial or uncongenial arguments about an issue. The additional explanatory variables include the strength of a participant’s attitude toward the issue, the familiarity of a participant with the issue, whether an issue was controversial rather than uncontroversial (see Table 2), whether the issue was exposed early in condition 1 or late in condition 2 (see Fig. 1), whether the issue was arranged in a discussional or an informational context, whether the recognition performance was derived from responses to congenial or uncongenial arguments, and the total number of “yes” responses to the congenial or uncongenial arguments of an issue for more information or discussion. Note that the addition of these predictors does not substantially increase the explanatory power of the model, as evidenced by the tiny marginal R² for d’ (0.018) and c (0.017).

The effect sizes of these additional explanatory variables help clarify their relative contributions to the full model. The controversiality of an issue and its presentation context have the largest effects on recognition discriminability: the participants tended to better discriminate arguments on controversial issues or arguments presented in a discussional context. This controversiality-enhanced memory phenomenon has also been observed in earlier studies (Eagly et al., 1999). By contrast, the variables of our primary interest, namely the strength of attitude toward an issue and the congeniality of an argument to a participant, have the smallest effects on recognition discriminability. In other words, congenial arguments were not remembered better than uncongenial arguments. As a side note, while issue familiarity has an expected positive relationship with recognition discriminability, the number of “yes” responses for more information or discussion unexpectedly has a negative relationship with recognition discriminability, which may reflect a decreased quality of memory encoding because of an attention shift from encountered arguments to related thoughts.

Discussion

The main contributions of the present research are twofold. First, we addressed the power, culture, and response bias issues in earlier studies, all of which could diminish the congeniality effect. Nonetheless, we still did not observe a substantial congeniality effect. Second, we examined the contextual influence on information learning and memory and found that a discussional context could correct biased attention and processing toward congenial information. In the following sections, we will elaborate on these two major findings.

Insubstantial attitude–memory relationship

In spite of improved statistical power and measurement sensitivity for the examination of a relatively conformative population, the present study still found an extremely small, almost null, congeniality memory effect (Table 3). This result is consistent with earlier empirical and meta-analytic findings showing that the congeniality effect is small to nonexistent (Eagly et al., 1999), particularly when the information source is low in credibility or unknown (Frost et al., 2015).

Our analyses using mixed-effects models revealed that within-individual, attitude-related factors, such as the strength of attitude toward an issue, had little power in explaining the variations in the overall recognition performances (Table 3). Instead, variations in participants’ memory performances were mainly explained by between-individual differences, such as differences in motivation or memory capability. Overall, the attitude–memory relationship in question was rather weak.

Contextual modulation of learning and memory

The two experimental conditions were designed to manipulate participants’ levels of processing of encountered messages, and the participants indeed showed behavioral differences between the two conditions. In terms of response times, the participants spent around one additional second when responding to each argument in the discussional than in the informational condition (Fig. 4a), suggesting that the participants spent more cognitive effort in the discussional than in the informational condition. In terms of choices, the participants preferred to learn more about congenial than uncongenial arguments, showing a confirmation bias during information learning like in earlier studies (e.g., Frost et al., 2015; Jonas et al., 2001). However, such a choice bias toward congenial information almost disappeared when the presented arguments were for discussion rather than for information (Fig. 4b). In the discussional condition, the participants might engage in more motivated reasoning to counterargue uncongenial arguments (Eagly et al., 2000) or become more open and receptive to uncongenial information. In either case, the presumably deeper processing of encountered messages in the discussional than in the informational condition could account for the positive contribution of the discussional context to participants’ memory performances (Table 3) according to the levels-of-processing theory (Craik, 2002).

Limitations

Several limitations of the current study should be noted. First, participants’ attitudes toward social issues were measured by only a single visual analog scale. Using more items for measuring attitude might have improved such estimates. Second, while participants could precisely report their attitudes on the visual analog scale ranging from −50 to 50, most of their responses were at the two extremes of the scale, namely −50 or 50 (Fig. 5), which would decrease the sensitivity of detecting the effects of attitude strength on memory performances. Third, the first recognition test was administered at the end of the first rather than the second condition to equalize the retention periods of the two conditions. Such a design made the learning in the second condition no longer incidental, as participants would anticipate a later memory test in the second condition. This difference between the first and second conditions was controlled for in the mixed models (Table 3). Fourth, our observed patterns of memory performances could have been different if we had employed recall rather than recognition for memory assessment. This is because the two memory-testing methods differ in many respects (Yonelinas, 2002), although the overall congeniality effect sizes in earlier studies were not larger when recall rather than recognition was used (Eagly et al., 1999).

Conclusion

This large-scale, web-based study examined the learning and memory of congenial vs. uncongenial information in an East Asian population with a continuous measure of attitudes. We found that these East Asian participants were more interested in learning congenial than uncongenial information but did not show a superior memory for congenial information, which supports the external and ecological validity of previous findings from small-scale laboratory studies about a strong confirmation bias and little congeniality effect in Western individuals.

Importantly, while confirmation bias during information learning is a default mode of cognitive processing and hard to overcome (Johnston, 1996; Lundgren & Prislin, 1998; Lilienfeld et al., 2009), our effective experimental manipulation implies that, when provided with an appropriate context, individuals can be more balanced between congenial and uncongenial arguments during information learning and processing.

In conclusion, effective communication between individuals with opposing arguments requires proper mutual understanding. In cognitive terms, this requirement can be seen as an equal opportunity for opposing information to be selected, processed, and remembered. Despite previous findings of confirmation bias and the congeniality effect, such a cognitively equal opportunity for opposing social arguments can exist, as demonstrated in the present study.