In recent years, people around the world have become increasingly dependent on search engines to obtain information, including information that helps them make decisions about complex and socially important matters, such as whom to vote for in an upcoming election (Arendt & Fawzi, 2018; Trevisan et al., 2016; Wang et al., 2017). A growing body of evidence also shows that search results that favor one candidate, cause, or company – by which we mean that they link to web pages that make that candidate, cause, or company appear superior to competitors – can have a rapid and dramatic impact on people’s opinions, purchases, and votes (Agudo & Matute, 2021; Allam et al., 2014; Epstein & Robertson, 2015, 2016; Epstein et al., 2022; Ghose et al., 2014; Joachims et al., 2007; Knobloch-Westerwick et al., 2015; Pan et al., 2007; Prinz et al., 2017; Wilhite & Houmanfar, 2015; cf. Feezell et al., 2021). In five randomized, controlled experiments with 4,556 participants in two countries, Epstein and Robertson (2015) showed that search rankings favoring one political candidate can rapidly produce dramatic shifts in the opinions and voting preferences of undecided voters, producing vote margins as high as 80% in some demographic groups after just one online search. They labeled this new form of influence the “search engine manipulation effect” (SEME) and demonstrated that these shifts can occur without people being aware that they have been manipulated. SEME has been replicated several times since 2015 (Agudo & Matute, 2021; Draws et al., 2021; Epstein et al., 2022; Eslami et al., 2017; Haas & Unkel, 2017; Knobloch-Westerwick et al., 2015; Ludolph et al., 2016; Pogacar et al., 2017; Trielli & Diakopoulos, 2019).

Moreover, since search results are ephemeral experiences (West, 2018; cf. McKinnon & MacMillan, 2018) – fleeting, often personalized, experiences that are generated spontaneously, impact the user, and subsequently disappear without being stored anywhere – they can impact millions of users every day without leaving a paper trail for authorities to trace (Epstein, 2018a). One cannot go back in time to determine what ephemeral content people have been shown, even if one has access to the algorithm that generated that content (Hendler & Mulvehill, 2016; Paudyal & Wong, 2018; cf. Taylor, 2019).

The fact that more than 90% of searches in almost every country in the world are conducted on just one search engine (Google) (StatCounter GlobalStats, n.d.) raises special concerns about SEME (Epstein, 2018a). It means that a single company – one that is unregulated, highly secretive, not accountable to the public, and that has, for all practical purposes, no competitors (Singer, 2019) – could be producing systematic changes in the thinking of billions of people every day with no way for other parties to counteract its influence, or even, for that matter, to detect and document that influence (Hazan, 2013; Ørmen, 2016; see S1 Text for additional information about bias in search results).

Why is SEME so large? It is a list effect, but it seems different, both qualitatively and quantitatively, from previously studied list effects. Researchers have been studying list effects, such as the serial position effect, for more than a century (Ebbinghaus, 2013; Mack et al., 2017; Murre & Dros, 2015), and such effects are sometimes substantial. For example, when Candidate A’s name consistently appears above his or her opponent’s name on a ballot – perhaps simply because the names are in alphabetical order – this tends to boost Candidate A’s share of the votes by 3–15% – an effect called the “ballot-order effect” (Grant, 2017; Ho & Imai, 2008; Koppell & Steen, 2004). While counterbalancing the order of names on ballots is easy to do – even for paper ballots – it has rarely been done (Beazley, 2013).

The serial position effect itself alters the likelihood that a word in a list will be recalled: words at the beginning of a list (the primacy effect) and at the end of a list (the recency effect) are usually recalled more often than words in the middle (Murdock, 1962). The ranking of content in lists can even affect juries’ opinions (Anderson, 1958; Carlson & Russo, 2001), the opinions of judges in singing contests (Bruine de Bruin, 2005), and wine preferences (Mantonakis et al., 2009).

SEME might be large, at least in part, because people generally trust computer output more than they trust content in which the human hand is evident (Bogert et al., 2021; Logg et al., 2019). Most people have no idea how computers work or what an algorithm is; as a result, they are inclined to view computer-generated content as impartial or objective (Fast & Jago, 2020; Logg et al., 2018). This trust has also been fostered by the positive image Big Tech companies maintained for many years – an image tarnished in recent years by data breaches and other scandals (Burt, 2019; Fortune, n.d.; Kramer, 2019). Moreover, leaks of documents and videos from these companies, along with reports by whistleblowers, have shown that the algorithmic output we see is frequently adjusted by employees; at Google, search results are apparently adjusted by employees at least 3,200 times a year (Google, n.d.; Meyers, 2019).

Trust in companies and trust in computer output can be driven by a number of factors – marketing and advertising, for example (Danbury et al., 2013; Sahin et al., 2011), or the fact that nearly all the services we receive from Big Tech companies appear to be free (Epstein, 2016; Nicas et al., 2019). It is not clear how SEME can be accounted for by such trust, however. How can we account for the fact that high-ranking search results are more trusted than lower-ranking results (Edelman, 2011; Marable, 2003; Pan et al., 2007)? Why is the preference for high-ranking results so strong – strong enough not only to influence purchases (Ghose et al., 2014; Joachims et al., 2007) but to have a large and almost immediate impact on opinions and voting preferences?

The preference for high-ranking search results might be due in part to what people sometimes call “laziness” or “convenience.” People are busy, so, sometimes at least, they attend to and click on a high-ranking search result because doing so saves time. As one might expect, eye-tracking and other studies show that people generally attend to the first results displayed on a screen before they scroll down or click to another page (Athukorala et al., 2015; Nielsen & Pernice, 2010; Schultheiß & Lewandowski, 2020). This finding is comparable to the attention people pay (or at least used to pay) to above-the-fold content in newspapers. The limited attention span of users can be problematic for longer pages; people want information that gets to the point and are unlikely to read long web pages filled with text (Nielsen, 2010; Weinreich et al., 2008).

Convenience might contribute to some extent to the large impact of SEME, but in the present study, we explore another possibility – namely, that the power of SEME derives in part from the distinctive way in which people interact with search results. In an authoritative list of the 100 most common search terms people use (Soulo, n.d.), 86% of the search queries were one or two words long and merely directed users to simple facts or specific websites – search terms such as “news,” “speed test,” and “nfl scores.” The correct website invariably turns up in the highest position of the search results that are generated; frequently, the same information appears in the second or third positions as well. Other lists of common search terms are also dominated by queries that tend to produce simple factual answers in the top position of search results (Hardwick, n.d.; Siege Media, n.d.).

Because, day after day, the vast majority of search queries produce simple factual answers in the highest position of search results (Rose, 2018), we all learn, over and over again, that what is higher in the list is better or truer than what is lower in the list. To be more specific, we usually attend to and click on the highest-ranking search result because doing so is reinforced by the appearance of the correct answer to our query. Almost any reply to a verbal inquiry strengthens inquiries of that type, but a correct answer to an inquiry is an especially powerful reinforcer, presumably because it makes a speaker more effective (Skinner, 1957; cf. Kieta et al., 2018), and when the same source provides a series of correct answers over time, the value and potential power of those answers increases. As B. F. Skinner put it in his classic text on verbal behavior, “The extent to which the listener judges the response as true, valid, or correct is governed by the extent to which comparable responses by the same speaker have proved useful in the past” (Skinner, 1957, p. 427).

When, at some point, people finally enter an open-ended search query that either has no definitive answer (“trump”) or that seeks an opinion (“what’s the best restaurant in Denver”), they will tend both to attend to and click on high-ranking search results. We are speculating, in effect, that SEME is a large effect because it is supported by a daily regimen of operant conditioning. Although the idea that operant conditioning plays a role in voting behavior is not new (Visser, 1996), in this paper, we are emphasizing a kind of operant conditioning that never stops and that people are entirely unaware of – specifically, one that reinforces attending to and clicking on high-ranking search results that appear in response to routine factual searches.

We test this hypothesis with a randomized, controlled experiment – a modified version of the experimental procedure used by Epstein and Robertson (2015) in their original SEME experiments (see S2 Text for details about the procedure). The present study added one feature to the Epstein and Robertson (2015) procedure: Before beginning the political opinion study, participants experienced a pre-training procedure that either reinforced or extinguished the tendency to attend to and click on high-ranking search results. In theory, extinguishing that tendency should (a) change the pattern of clicks that typifies search behavior, and (b) reduce the impact that statistically biased search results have on people’s opinions and voting preferences.

Method

Participants

A total of 551 eligible US voters from 46 states were recruited through Amazon Mechanical Turk (MTurk, accessed through a company called Cloud Research, which screens out bots) and were paid a small fee (US$7.50) to participate. Fifty-nine point nine percent (n = 330) of participants identified themselves as female and 40.1% (n = 221) as male. The mean age was 38.3 (SD = 11.7). Seventy-three point nine percent (n = 407) of participants identified themselves as White, 8.2% (n = 45) as Black, 6.5% (n = 36) as Hispanic, 6.4% (n = 35) as Asian, 4.4% (n = 24) as Mixed, and 0.7% (n = 4) as Other. A majority of participants were college educated, with 55.2% (n = 304) reporting having received a bachelor’s degree or higher.

Procedure

See S3 Text in our Supplementary Material for our statement of compliance with current ethical standards.

The experiment was conducted online, and participants identified themselves using their MTurk Worker IDs; we had no knowledge of their names or email addresses. Before the experiment began, participants were asked a series of demographic questions and were then given instructions about the experimental procedure (see S4 Text). In compliance with APA and HHS guidelines, participants also clicked to indicate their informed consent to participate in the study. We also asked participants how familiar they were with the two candidates identified in the political opinion portion of the study.

The initial dataset contained 806 records and was cleaned as follows: Records were deleted in which no clicks were recorded, in which people’s reported familiarity with either candidate exceeded 3 on a scale from 1 to 10 (where 1 was labeled “Not at all” and 10 was labeled “Quite familiar”), or in which people reported English fluency below 6 on a scale from 1 to 10 (where 1 was labeled “Not fluent” and 10 was labeled “Highly fluent”).
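
Concretely, these cleaning rules amount to a simple record filter. The sketch below is illustrative only – it assumes a pandas DataFrame with hypothetical column names (clicks, familiarity_a, familiarity_b, english_fluency) rather than our actual pipeline.

```python
# Sketch of the record-cleaning rules described above. Column names are
# hypothetical; the study's actual data pipeline is not shown here.
import pandas as pd

def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    keep = (
        (df["clicks"] > 0)              # at least one click recorded
        & (df["familiarity_a"] <= 3)    # familiarity with candidate A (1-10)
        & (df["familiarity_b"] <= 3)    # familiarity with candidate B (1-10)
        & (df["english_fluency"] >= 6)  # self-reported fluency (1-10)
    )
    return df[keep]
```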

The experiment itself had two main parts (Fig. 1).

Fig. 1

The two parts of the experimental procedure. In the pre-training portion of the procedure, participants were randomly assigned to either a High-Trust or a Low-Trust group. The trust pre-training trials were followed by a conventional SEME experiment, in which the two trust groups were first divided (by random assignment) into three search conditions: one favoring UK candidate David Cameron, one favoring UK candidate Ed Miliband, and one favoring neither candidate (control group). See text for details

Pre-Training

In the pre-training portion of the experiment, participants were randomly assigned to either a High-Trust (n = 312) or a Low-Trust (n = 239) group. Each group was given five pre-training trials in which they were shown a search question that had a simple factual answer (such as “What is the capital of Lesotho?”) (see S5 Text for details), and they were then given 2 minutes to find the answer using the Kadoodle search engine, which closely simulates the functioning of the Google search engine. All participants had access to the same search results (on two search result pages, each listing six search results) and web pages (which could be accessed by clicking on the corresponding search result). Only the order of the search results varied between the groups.

In the High-Trust group, the answer could always be found by clicking on the highest-ranking result – just as it is virtually always found in that position on the leading search engine. In the Low-Trust group, the correct answer could be found in any of the 12 search result positions except the first two. At the end of 2 minutes, participants were given a five-option, multiple-choice question and were asked to provide the correct answer to the question they were shown earlier. They were then immediately told whether their answer was correct or incorrect. In theory, the pre-training trials in the High-Trust group were strengthening the user’s tendency to attend to and click on the highest-ranking search result, and the pre-training trials in the Low-Trust group were either (a) extinguishing tendencies to attend to and click on high-ranking search results, (b) reinforcing tendencies to attend to and click on low-ranking search results (differential reinforcement of alternative behavior), or (c) having both effects.
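
In schematic terms, the only programmed difference between the two groups was the rank at which the correct answer could appear. A minimal sketch of that contingency follows; the function name and structure are ours, not part of the actual experimental software.

```python
import random

def correct_answer_rank(group: str) -> int:
    """Rank (1-12, across two pages of six results) at which the correct
    answer appears. Illustrative sketch of the pre-training contingency."""
    if group == "high_trust":
        return 1                     # always the top result, as on real engines
    return random.randint(3, 12)     # Low-Trust: any rank except the first two
```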

SEME Experiment

Immediately following the pre-training, the participants in each of the trust groups were randomly assigned to three sub-groups: Pro-Candidate-A, Pro-Candidate-B, or a control group in which neither candidate was favored. The election we used was the United Kingdom’s 2015 general election; the candidates for Prime Minister were David Cameron and Ed Miliband. We chose this election to try to assure that our participants – all from the US – would initially be “undecided” voters. On a 10-point scale, our participants reported an average familiarity level of 1.3 (SD = 0.6) for David Cameron and 1.3 (SD = 0.6) for Ed Miliband.

All participants (in each of the six sub-groups) were then given basic instructions about the “political opinion study” in which they were about to participate. Then they read brief, neutral biographies of both candidates (approximately 150 words each, see S6 Text), after which they were asked eight questions about any preferences they might have for each candidate: their overall impression of each candidate, how likeable each candidate was, and how much they trusted each candidate. We also asked which candidate they would likely vote for if they had to vote today (on an 11-point scale from –5 for one candidate to +5 for the other, with the order of the names counterbalanced from one participant to another), and, finally, which of the two candidates they would in fact vote for today (forced choice).

They were then given up to 15 minutes to use our mock search engine to conduct research on the candidates. All participants had access to five pages of search results, six results per page (see S7 Text for details). All search results were real (from the 2015 UK election, obtained from Google.com), and so were the web pages to which the search results linked. The only difference between the groups was the order in which search results were shown. In the Pro-Candidate-A group, higher ranking search results linked to web pages that favored Cameron (Candidate A), and the lowest ranking search results (on the last pages of search results) favored Miliband (Candidate B). In the Pro-Candidate-B group, the order of the search results was reversed. In the control group, pro-Cameron search results alternated with pro-Miliband search results (and the first search result had a 50/50 chance of favoring either candidate), so neither candidate was favored. Prior to the experiment, the “bias” of all web pages had been rated on an 11-point scale from –5 to +5 (with the names of the candidates counterbalanced) by five independent judges to determine the extent to which a web page favored one candidate or another. The mean bias rating for each web page was used in determining the ranking of search results.
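
The ordering logic can be sketched as follows, with two flagged assumptions: ratings are coded so that positive values mean “favors Candidate A,” and results within each bias condition are ordered by the magnitude of their mean rating (the paper specifies only that mean bias ratings determined the ranking).

```python
import random
from statistics import mean

def order_results(results, condition):
    """Order search results by mean judge bias rating (-5 to +5; positive is
    assumed to mean 'favors Candidate A'). results: list of
    (result_id, [ratings by five judges]) pairs. Illustrative sketch."""
    scored = [(rid, mean(ratings)) for rid, ratings in results]
    if condition == "pro_a":
        return sorted(scored, key=lambda x: x[1], reverse=True)
    if condition == "pro_b":
        return sorted(scored, key=lambda x: x[1])
    # Control: alternate pro-A and pro-B results, with a coin flip deciding
    # which candidate the first result favors.
    pro_a = sorted((s for s in scored if s[1] > 0), key=lambda x: -x[1])
    pro_b = sorted((s for s in scored if s[1] < 0), key=lambda x: x[1])
    first, second = (pro_a, pro_b) if random.random() < 0.5 else (pro_b, pro_a)
    return [r for pair in zip(first, second) for r in pair]
```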

When participants chose to exit from our search engine, they were asked those eight preference questions again, and they were then asked whether anything “bothered” them about the search results they had been shown. If they answered “yes,” then they could type the details about their concerns. This was our way of trying to detect whether people spotted any bias in the search results they saw. We could not ask about bias directly, because leading questions of that sort generate predictable and often invalid answers (Loftus, 1975). We subsequently searched textual responses for words such as “bias,” “skewed,” or “slanted” to identify people in the bias groups who had apparently noticed the favoritism in the search results we showed them.
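
The scan for bias-related words amounts to a simple keyword search. A sketch appears below; the three stems shown are the ones quoted above, and the study’s full word list may have been longer.

```python
import re

# Matches "bias"/"biased", "skew"/"skewed", "slant"/"slanted", and similar
# forms. The stems shown are those quoted in the text; the full list used
# in the study may have been longer.
BIAS_TERMS = re.compile(r"\b(bias\w*|skew\w*|slant\w*)\b", re.IGNORECASE)

def noticed_bias(comment: str) -> bool:
    """Flag a free-text 'bothered' response that mentions bias-related words."""
    return bool(BIAS_TERMS.search(comment or ""))
```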

Results

We focused our data analysis on people in the two pre-training groups who answered all five of the pre-training questions correctly. These individuals not only demonstrated high compliance with our instructions; they also presumably were most highly impacted by the pre-training contingencies. On any given trial in which people did not find the correct answer, they presumably were not impacted by the low-trust contingencies.

For comparison purposes, we also analyzed data from people who scored lower than 100% on the pre-training questions; the bulk of this analysis is included in the Supplementary Material of this paper. As one might expect, participants in the High-Trust group answered our multiple-choice questions more accurately (M = 4.8 correct out of 5 [0.4]) than participants in the Low-Trust group did (M = 4.1 [1.0]; t = 10.37, p < 0.001, d = 0.92) (also see S1 Fig.). This was presumably because Low-Trust participants had more trouble finding the correct answer in the allotted 2 minutes. Focusing on the high-compliance participants reduced the number of people in the High-Trust group from 312 to 255 and the number of people in the Low-Trust group from 239 to 100.

Please note that we did not exclude any participants from the experiment; rather, we chose to analyze separately data we obtained from high-compliance participants – that is, people who were most likely to have been impacted by the training contingencies – and low-compliance participants – that is, people who were less likely to have been impacted by the training contingencies.

Pre-Training

Participants in the High-Trust group spent significantly more time on the webpages that were linked to the first two search results (M = 169.7 s [124.9]) than participants in the Low-Trust group did (M = 135.7 s [86.1]; t = 2.92, p = 0.004, d = 0.32). Participants in the High-Trust group also clicked more frequently on the webpages linked to the first two search results (M = 5.9 [1.2]) than participants in the Low-Trust group did (M = 5.4 [1.5]; t = 3.00, p = 0.003, d = 0.37). Participants in the High-Trust group also spent substantially less time on each of the search engine results pages (M = 83.5 s [49.0]) than participants in the Low-Trust group did (M = 168.2 s [66.1]; t = –11.63, p < 0.001, d = 1.46). In other words, participants in the High-Trust group attended more to the first two search results and spent less time searching overall.
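
For readers who want to reproduce comparisons of this kind, a sketch follows. It computes an equal-variance independent-samples t test and Cohen’s d with a pooled SD; the paper does not state whether Welch’s correction or a different d convention was used, so treat both choices as assumptions.

```python
import numpy as np
from scipy import stats

def compare_groups(high, low):
    """Independent-samples t test plus Cohen's d (pooled SD) for two arrays
    of per-participant scores. Illustrative sketch, not the authors' code."""
    t, p = stats.ttest_ind(high, low)  # equal-variance t test by default
    n1, n2 = len(high), len(low)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(high, ddof=1)
                         + (n2 - 1) * np.var(low, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(high) - np.mean(low)) / pooled_sd
    return t, p, d
```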

SEME Experiment

Immediately following the pre-training trials, all participants transitioned to a standard SEME procedure, in which it appears that the Low-Trust pre-training impacted behavior in a number of ways.

The main finding in SEME experiments is that participants show little preference for one candidate or the other before they conduct their search, and that post-search, the preferences of the participants in the two bias groups tend to shift in the direction of the bias that was present in the search results they had been shown. SEME studies look at five different measures of this shift, the most important of which is called “vote manipulation power” or VMP (see S8 Text for how VMP is calculated). VMP is of special interest because it is a direct measure of the increase in votes produced by the bias. It is calculated from answers given to a forced-choice question we ask participants both pre- and post-search, namely “If you had to vote right now, which candidate would you vote for?”
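
The exact formula is given in S8 Text. Under one plausible reading – VMP as the percentage increase, from pre- to post-search, in forced-choice votes for the favored candidate – the computation would look like this; treat this reading as an assumption.

```python
def vmp(pre_votes_favored: int, post_votes_favored: int) -> float:
    """Percentage increase in forced-choice votes for the favored candidate
    from pre-search to post-search. One plausible reading of the measure;
    see S8 Text for the exact formula."""
    return 100.0 * (post_votes_favored - pre_votes_favored) / pre_votes_favored
```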

Biased search results tend to produce substantial VMPs after a single search (Epstein & Robertson, 2015; Epstein et al., 2022). This finding was replicated in the present study; however, the bias-driven VMP in the High-Trust group (VMP = 34.6%, McNemar’s χ² = 23.56, p < 0.001) was substantially larger than the bias-driven VMP in the Low-Trust group (VMP = 17.1%, χ² = 1.56, p = 0.21 NS; z = –3.25, p = 0.001) (see Table 1 and S1 Table for further details; cf. S2 and S3 Tables for low-compliance data; cf. S4 Table for high-compliance versus low-compliance VMP comparisons).
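
McNemar’s test operates on the paired pre/post forced-choice votes. A sketch using statsmodels follows; the 2 × 2 counts shown are placeholders, not the study’s data.

```python
from statsmodels.stats.contingency_tables import mcnemar

# Paired votes: rows = pre-search choice (favored, non-favored candidate),
# columns = post-search choice. Counts are illustrative placeholders.
table = [[40,  5],
         [30, 25]]
result = mcnemar(table, exact=False, correction=True)  # chi-square version
print(result.statistic, result.pvalue)
```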

Table 1 VMP percentages, search times, and results clicked by trust group (high-compliance participants, 100% accuracy in pre-training)

The different VMPs for the High- and Low-Trust groups can be explained by the different ways – all predictable from the pre-training session – these two groups interacted with our search engine in the political opinion portion of our study. Participants in the High-Trust group spent more time viewing the web page linked to the highest search result than participants in the Low-Trust group did (M_High = 60.9 s [58.1]; M_Low = 53.4 s [57.0]; t = 1.11, p = 0.27 NS; d = 0.13) (also see Fig. 2). In addition, participants in the High-Trust group clicked on the link to the first search result significantly more often than participants in the Low-Trust group did (M_High = 0.9 [0.4], M_Low = 0.8 [0.5], t = 2.18, p = 0.03, d = 0.22) (Fig. 3). Participants in the High-Trust group spent more time on web pages linked to search results on the first page of search results than participants in the Low-Trust group did (M_High = 241.5 s [193.9], M_Low = 204.6 s [153.2], t = 1.71, p = 0.09 NS, d = 0.21), and participants in the Low-Trust group spent more than twice as much time on web pages linked to search results past the first page of search results than participants in the High-Trust group did (M_Low = 51.0 s [51.8], M_High = 20.4 s [32.5], t = –5.51, p < 0.001, d = 0.71) (Fig. 4). Participants in the High-Trust group also clicked on search results on the first page of search results significantly more often than participants in the Low-Trust group did (M_High = 4.0 [1.6], M_Low = 3.6 [1.6], t = 2.54, p = 0.01, d = 0.25), and participants in the Low-Trust group clicked on search results past the first page of search results significantly more often than participants in the High-Trust group did (M_High = 0.5 [0.8], M_Low = 1.0 [1.1], t = –3.94, p < 0.001, d = 0.52) (Fig. 5). These differences presumably emerged because people in the Low-Trust group had learned in pre-training to attend to and click on lower-ranked search results that people in the High-Trust group tended to ignore.

Fig. 2

Time spent on search result web pages as a function of search result rank (high-compliance participants). Participants in the Low-Trust group spent less time on web pages linked to the first page of search results and more time on web pages linked to subsequent pages of search results than participants in the High-Trust group did. For low-compliance data, see S2 Fig

Fig. 3

Clicks on search results as a function of search result rank (high-compliance participants). Participants in the Low-Trust group were less likely to click on results on the first page of search results and more likely to click on results on subsequent pages than participants in the High-Trust group were. For low-compliance data, see S3 Fig

Fig. 4

Time spent on search result pages as a function of page number (high-compliance participants). Error bars show standard error of the mean. For low-compliance data, see S4 Fig

Fig. 5

Cumulative clicks on search results per page as a function of page number (high-compliance participants). Error bars show standard error of the mean. For low-compliance data, see S5 Fig

Post-search, differences also emerged on most of the answers to the seven other preference questions. Pre-search, for question 7 – voting preference measured on an 11-point scale – we found no significant differences in mean ratings among the three sub-groups (pro-Cameron, pro-Miliband, and control) in either the High- or the Low-Trust condition (Table 2). Post-search, the mean ratings in the three sub-groups were significantly different in both the High- and Low-Trust conditions (Table 2).

Table 2 Changes in voting preferences measured on an 11-point scale (high-compliance participants)

Pre- vs. post-search shifts in ratings on the 11-point scale were consistent with the predicted impact of the bias, with pre/post gaps larger in the High-Trust group than in the Low-Trust group (Table 3). In the control group, pre/post shifts were minimal and non-significant (U = 1259.5, p = 0.82 NS).

Table 3 Changes in voting preference for the favored candidate measured on an 11-point scale, bias groups only (high-compliance participants)

Pre-search, we found no significant differences among the three sub-groups (pro-Cameron, pro-Miliband, and control) on their answers to any of the six opinion questions we asked about the candidates (S5 Table; see S6 Table for low-compliance data). Post-search, significant differences emerged for all six of those opinion questions for participants in both the High- and Low-Trust groups (S7 Table; see S8 Table for low-compliance data). Moreover, the net impact of biased search results on people’s opinions (that is, the change in opinions about the favored candidate vs. the change in opinions about the non-favored candidate) was always larger in the High-Trust group than in the Low-Trust group and always shifted opinions (for both groups) in a way that was advantageous to the favored candidate (S9 Table; see S10 Table for low-compliance data; cf. S11 and S12 Tables for control group comparisons). However, nearly all the High- versus Low-Trust differences between pre/post changes in opinions about the candidates were nonsignificant (S13 Table; see S14 Table for low-compliance data). See S9 Text for information about perceived bias in the SEME experiment.

Discussion

The present study supports the theory that operant conditioning contributes to the power that search results have to alter thinking and behavior. The fact that a large majority (about 86%) of people’s searches are for simple facts, combined with the fact that the correct answer to such queries invariably turns up in the highest-ranked position of search results, appears to teach people to attend to and click on that first result and, perhaps as a kind of generalization effect, to attend to and click on nearby search results in a pattern resembling one side of a generalization gradient. Both eye-tracking studies and studies looking at click patterns find those kinds of gradients for both attention and clicks (Athukorala et al., 2015; Chitika Insights, 2013; Cutrell & Guan, 2007; Dean, n.d.; Epstein & Robertson, 2015; Granka et al., 2004; Joachims et al., 2007; Kammerer & Gerjets, 2014; Lorigo et al., 2008; Pan et al., 2007; Schultheiß & Lewandowski, 2020). On the cognitive side, it could also be said that that daily regimen of operant conditioning is causing people to believe, trust, or have faith in the validity of high-ranking search results, and it is notable that people are entirely unaware that this regimen exists.

The fact that people generally believe that algorithms inherently produce objective and impartial output does not in and of itself explain the existence of that gradient of attention and responding. When, in the pre-training portion of the current experiment, we directed attention and clicks away from the top positions in the search list, we disrupted the usual gradient so that in the SEME portion of the study, attention was directed toward lower-ranking search results (in everyday language, we “broke the trust” people have in high-ranking results). As a result, the extreme candidate bias that was present in the search results we presented to participants in our two bias groups had less impact on the people in our Low-Trust pre-training group (VMP = 17.1%) than it did on the people in our High-Trust pre-training group (VMP = 34.6%, p = 0.001).

We note that if SEME is a large effect because of generalization, it is not the simple kind of generalization that occurs when wavelengths of light or sound are altered (Mis et al., 1972). That is because the nature of the task in the training situation is inherently different from the nature of the task in what we might call the test situation (the SEME experiment) – and this observation applies both to the present experiment and to the way people use search engines on a daily basis. In the pre-training phase of our experiment, people are searching for simple facts, and the reinforcing consequence is the correct answer; this is also the case when people are searching for simple facts on real search engines. In the test situation, however, there is no correct answer; the user is asking an open-ended question on an issue about which people might have a wide range of different opinions. In other words, there is a mismatch between the informational properties of the training and test settings (Hogarth et al., 2015). This problem has long been a challenge when new behavior taught to various impaired populations in a classroom setting fails to occur in, say, the home setting; hence the long-running concern with “transfer of training” in the behavior-analytic literature (Baldwin & Ford, 1988). Although a simple-fact query might be easily discriminable from an opinion query – at least most of the time – the present experiment sheds no light on this issue. We can assert only that pre-training that favors lower-ranked search results causes people to look more closely at lower-ranked search results, and that this in turn reduces the magnitude of the shift in voting preferences.

As noted earlier, convenience might also play a role in the power that SEME has to shift opinions and voting preferences, but if that were the main or even a significant factor in explaining SEME’s power, it seems unlikely that the Low-Trust training procedure we employed in the present experiment would have disrupted performance as much as it did. Breaking the pattern of reinforcement that usually supports search behavior seemed to override any role that convenience (that is, search position alone) might play in SEME.

Limitations and Future Research

At first glance, it might appear to be remarkable that so little retraining – a mere five search trials in which the correct answer to a search query could appear anywhere among 12 search results other than in the top two positions – could interfere with years of conditioning that reinforced attending to and clicking on the highest-ranking search items. Presumably, with more training trials, we could have reduced the impact of our biased search results far more than we did in the present procedure. But bear in mind that attending to and clicking on the highest-ranking search results has been consistently reinforced on a nearly continuous schedule – the kind of schedule that often makes behavior highly vulnerable to disruption when reinforcement is discontinued (Kimble, 1961; Lerman et al., 1996; Mackintosh, 1974). It is especially easy to disrupt behavior when it has been continuously reinforced in discrete trials (Nevin, 2012), which is always the case for search behavior on a search engine.

The present study is also limited in the extent to which participants were motivated to express their views about political candidates: they had little or no familiarity with the candidates or the issues, given that they were evaluating a foreign election. Would similar numbers emerge in a study with real voters in the middle of a real election? This issue was addressed in Experiment 5 of the Epstein and Robertson study (Epstein & Robertson, 2015). That experiment included more than 2,000 undecided voters throughout India during the final weeks of the 2014 Lok Sabha election for Prime Minister. Biased search results shifted both opinions and voting preferences, with shifts in voting preferences (the VMP) exceeding 60% in some demographic groups.

That said, recent research suggests that low-familiarity (also called “low-information”) voters differ in nontrivial ways from high-familiarity (“high-information”) voters (Yarchi et al., 2021). Our 2014 Lok Sabha experiment suggests that low-familiarity voters may be more vulnerable to SEME than high-familiarity voters, and so does a set of experiments we recently conducted on what we call the “multiple exposure effect” (MEE) (Epstein et al., 2023). Understanding the relationship between familiarity and vulnerability to manipulation will require a systematic investigation, however, not simply a comparison of values found in separate SEME experiments.

The familiarity issue does raise another question that we can address directly with the data we collected in the present study: Can we be assured that our participants were indeed undecided? Here we have strong affirmative evidence. As we noted in our Results section, the differences in pre-search opinion ratings across the three groups (pro-Cameron, pro-Miliband, and control) were nonsignificant (Table 2). In addition, both the voting preferences on the 11-point scale and the voting preferences on the forced-choice question showed no candidate preferences (Table 3, S1 Table). Post-search, all these measures showed clear and predictable differences.

Pollsters often seek out people who are likely to vote, and, presumably, a company like Google – given the vast amount of information it collects about people – can easily discriminate between likely and unlikely voters. In the present study, we did not screen for this characteristic. In future studies, we will consider screening potential participants with a question such as, “How likely are you to vote in upcoming elections?”

We have other concerns about the real-world applicability of the present study, and we are addressing them in other research. The present study exposed voters to biased search results just once, but in the real world, voters might be exposed to similarly biased search results hundreds of times before an election. Are multiple exposures to similarly biased search results additive over time? And how might opinions and voting preferences be affected if people are exposed to search results biased toward Candidate A on some occasions and Candidate B on others? Overall, do the opinions and voting preferences of undecided voters shift in the direction of the net bias?

In the real world, moreover, people are impacted by multiple sources of bias. In the traditional, non-digital world of political influence, many if not all of these sources of influence might cancel each other out. If Candidate A erects a billboard or buys a television commercial, Candidate B can do the same. However, in the world of Big Tech, things work differently. If, for any reason, the algorithm of a large online platform favors one candidate, there is no way to counteract its impact, and if multiple online platforms all favor the same candidate, the impact of these different sources of influence might be additive.

Implications and Concerns

Given the concerns that have been raised about the power of biased search results to impact people’s thinking and behavior, one might wonder whether informing people about the role that operant conditioning appears to play in their online decision making would have any practical benefit. We submit that raising such awareness would, unfortunately, have few or no benefits, for one simple reason: Search algorithms are designed to put the best possible answer in the top position; when one is searching for simple facts, that means the correct answer. A search engine that listed the best answer in a lower search position – especially in an unpredictable position – would be of little value. That means that the daily regimen of conditioning we described earlier will continue to occur as long as people continue to use properly functioning search engines. Worse still, people will always be unaware that the process by which they make both trivial and important decisions is being affected by a perpetual regimen of operant conditioning, as if they were rats trapped forever in an operant chamber.

So how can people be protected from bias that might occur in search results that are displayed in response to open-ended queries about, say, election-related issues? No matter what the cause of the bias, it can have a rapid and profound effect on the thinking and behavior of people who are undecided on an issue, and that, we believe, should be a matter for concern.

We suggest three ways to provide such protection. One would be for the US Congress, the European Parliament, or other relevant authorities to declare Google’s index – the database it uses to generate search results – to be a public commons (Epstein, 2019). This would quickly lead to the creation of hundreds, then thousands, of competing search platforms, each vying for the attention of different populations, just as thousands of news sources do currently. With numerous platforms having access to the index through a public API (an application programming interface), search would become both competitive and innovative again, as it was before Google began to dominate the search industry more than a decade ago.

Users could also be protected to some extent if browsers or search engines were at some point required to post bias alerts on individual search results or on entire search pages, with bias continuously rated by algorithms, human raters, or both. Epstein and Robertson (2016) showed that the magnitude of SEME could be reduced to some extent by such alerts (cf. Tapinsky et al., 2018; Wu et al., 2023). Alerts of this sort could also be used to flag the rising tide of online “misinformation” – an imperfect but not entirely unreasonable method for appeasing free speech advocates without suppressing content (Nekmat, 2020; Shin et al., 2023; cf. Bak-Coleman et al., 2022; BBC, 2017; Bruns et al., 2023).

Finally, a leak of documents from Google in 2019 showed that the company has long been concerned with finding ways to assure “algorithmic fairness,” primarily as a way of correcting what Google executives and employees perceive to be social inequities (Lakshmanan, 2019). Setting aside the concerns one might have about the possibility that a highly influential company might be engaging in a large-scale program of social engineering (Chigne, 2018; Epstein, 2018b; Savov, 2018), the good news is that Google has developed tools for eliminating bias in algorithmic content quickly and efficiently. One of the leaked documents was a manual for Google’s “Twiddler” application, which was developed “for re-ranking results from a single corpus” (Google, 2018). In other words, Google has the power to eliminate political or other bias in search results “almost as easily as one can flip a light switch” (Z. Vorhies, personal communication, June 26, 2020).

If steps are eventually taken to protect users from the bias in search results that might be displayed in response to open-ended queries, perhaps operant conditioning or other factors that currently focus user attention on high-ranking results will do no harm. As it stands, we believe that this almost irresistible tendency to attend to and click on high-ranking results, which is currently affecting the thinking and behavior of more than 5 billion people worldwide with no mechanisms in place to offset its influence, poses a serious threat to democracy, free speech, and human autonomy.