Online mis- and disinformation as a societal problem

By substantial margins, Americans believe that misinformation and disinformation are serious societal problems. Misinformation is information that is false or inaccurate. Disinformation is a subset of misinformation and refers to false or inaccurate information that is disseminated with the knowledge that it is false or inaccurate (deliberate lies, as it were). For example, as early as 2016, the Pew Research Center found that 64% of US adults believed that “fabricated news stories cause a great deal of confusion about the basic facts of current issues and events,” noting that this sense was shared widely across incomes, education levels, partisan affiliations, and most other demographic characteristics (Barthel et al., 2016, para. 2).

Since then, concern about the problem has only grown. In a 2021 poll, the Pearson Institute and the Associated Press-NORC Center for Public Affairs Research found that 95% of Americans identified misinformation as a problem when they are trying to access important information.

In the face of this seeming consensus that fake news is a problem, however, there is no consensus on what content constitutes fake news or misinformation. One study found, for example, that exposure to fake news was linked to higher trust in government and lower trust in media when one’s own side is in power but not when the other side is in power (Ognyanova et al., 2020).

It is thus not surprising that many on the right believe that there is rampant misinformation about Covid-19 (believing that media falsely claim that masks help and that vaccines work) or about the Presidential election of 2020 (believing that media falsely claim the election was won by Biden fair and square), and that many on the left believe there is rampant misinformation about Covid-19 (believing that media falsely claim that masks don’t help and that vaccines don’t work or are dangerous) or about the Presidential election of 2020 (believing that media falsely claim that Trump was the real winner of the election).

So, there is widespread agreement that mis- and disinformation are problems and that there is too much of both in the public square, but widespread disagreement about what counts as mis- or disinformation. This paper focuses primarily on ameliorating one aspect of the problem, namely reducing the public’s exposure to deliberate lies or falsehoods (that is, disinformation), although the discussion has some relevance to the broader misinformation problem as well.

The marketplace of ideas (and its failures)

This discussion of the marketplace metaphor and the First Amendment analysis is taken largely from Lin (2019).

Historically and traditionally, the response to the expression of a variety of competing views has been to invoke the marketplace of ideas. This metaphor posits that the value of a specific idea is determined in competition with other ideas rather than by the judgment of an external authority (such as government), and the judgments that people make in weighing these various ideas against each other determine which ones survive. Truth emerges through public debate and discourse, uninhibited by governmental interference.

Both US political leaders and courts have invoked the marketplace metaphor. For example, John F. Kennedy said “We are not afraid to entrust the American people with unpleasant facts, foreign ideas, alien philosophies, and competitive values. For a nation that is afraid to let its people judge the truth and falsehood in an open market is a nation that is afraid of its people” (1962, as cited in Peters & Woolley, n.d., para. 7). Nearly 150 years earlier, Thomas Jefferson contended that “for here we are not afraid to follow truth wherever it may lead, nor to tolerate any error so long as reason is left free to combat it” (1820, as cited in Thomas Jefferson, n.d.).

As for the US courts, Justice Oliver Wendell Holmes wrote in his dissenting 1919 opinion in Abrams v. United States that “the ultimate good desired is better reached by free trade in ideas – that the best test of truth is the power of the thought to get itself accepted in the competition of the market, and that truth is the only ground upon which their wishes safely can be carried out” (1919). Thirty-four years later, Justice William O. Douglas in United States v. Rumely explicitly introduced the term “marketplace of ideas” when he wrote “Like the publishers of newspapers, magazines, or books, this publisher bids for the minds of men in the marketplace of ideas” (1953).

Since then, many Supreme Court decisions have invoked the metaphor (Footnote 2). In the marketplace of ideas, good ideas push out bad ideas. Thus, under current First Amendment jurisprudence and decades of precedent (Footnote 3), it is virtually certain that any government regulation directed at intentionally false, misleading, or polarizing speech would have to be drafted quite narrowly and with exceptional precision to avoid running afoul of the First Amendment.

Moreover, the wisdom of allowing government agencies to make decisions about what counts as “a false, misleading, or inauthentic statement designed to manipulate and distort the political process” can legitimately be questioned. The political context of sharp polarization in which any government action would be conducted simply amplifies such concerns, as demonstrated by a 2023 decision of the Fifth Circuit Court of Appeals finding that the Biden Administration’s contact with social media platform companies to persuade them to take down posts on elections and Covid likely violated the First Amendment (Zakrzewski & Menn, 2023).

Nevertheless, it is also true that markets sometimes experience failure for various reasons, at which point governments often step in to remediate those failures. The philosophy commonly invoked in support of the First Amendment was articulated by John Stuart Mill (1869/2002) at a time in human history when the overall volume of information available to the public was sparser by many orders of magnitude than it is today. Tim Wu (2017, para. 2) has explored this issue, arguing that “it is no longer speech itself that is scarce, but the attention of listeners,” an assertion that, if true, undercuts the basis on which the First Amendment was originally written and adopted.

In addition, some empirical research indicates that individuals of different political orientations have different sensitivities to and awareness of mis- and disinformation. For example, based on a longitudinal study of individuals engaging with social media regarding a variety of high-profile political news stories, Garrett and Bond (2021) find that conservatives perform worse than liberals at distinguishing truths and falsehoods, a finding partly explained by the fact that at present the most widely shared falsehoods tend to promote conservative positions, while corresponding truths typically favor liberals.

The plenitude of information also calls into question an implicit assumption of the marketplace metaphor: that information consumers have access to all of the ideas and information that must be compared. Amid today’s cacophony of tweets, emails, TikTok videos, and so on, this assumption no longer holds, and so the principle of remedying false speech with true speech fails when listeners never encounter the responses to unbalanced or false statements, a condition facilitated by a social media environment that algorithmically insulates listeners from competing messages.

The ability to remedy false speech with true speech is also hampered by the perverse incentives created by Sect. 230 of the Communications Decency Act. Section 230 was enacted to provide liability protections for new market entrants at a time when their power to damage society was limited. It still does so today, but it also empowers large social media companies—whose very business model is to promote engagement by any possible legal means—to ignore entirely the consequences of any choices they make regarding content moderation.

A proposal for warrant-based content self-moderation

Van Alstyne (2020) identifies negative externalities as a root cause of the misinformation problem and, using the tools of Coase (1960), develops a theory of how markets might clear these externalities. Under this theory, the independent actions of speakers and listeners could be incentivized to reduce fake news without the need for centralized content regulation. In 2021, Van Alstyne developed a system based on “warranted content” to internalize these externalities but did not include implementation details (2023).

In 2023, Van Alstyne, Smith, and Lin (2023) used Van Alstyne’s 2021 system as the basis for a set of principles (see below) that should govern changes to Sect. 230 and that, if adopted, could pass tests of political acceptability while still helping to reduce the prevalence of online mis- and disinformation. Clemons (2024, forthcoming) reviews the scope and nature of today’s disinformation problem and kindly invited this paper as one to stimulate discussion. Accordingly, this paper recaps the main points of the Van Alstyne, Smith, and Lin paper (which of course drew heavily on Van Alstyne’s 2021 paper) and elaborates on several points that would have to be addressed in any serious implementation of the concepts originally outlined. One approach to implementation is developed here.

To guide changes to Sect. 230, the 2023 paper proposed four principles.

Principle 1 was that First Amendment jurisprudence should continue to reign and that the government should be kept out of the business of defining unacceptable content. At the same time, this principle draws on the Supreme Court’s majority opinion in Gertz v. Welch (1974) that intentional lies do not have the same social value as other forms of speech. Specifically, the opinion held that “there is no constitutional value in false statements of fact. Neither the intentional lie nor the careless error materially advances society’s interest in ‘uninhibited, robust, and wide-open’ debate on public issues.” Significantly, focusing on statements that are falsifiable allows definitive and dispositive judgments to be made about their truth value.

We also argued that original speech should have a higher degree of First Amendment protection than algorithmically amplified speech. In the eighteenth century, the purpose of the First Amendment was to guarantee the rights of people speaking; there were no computers, internet, or social media then. Why, then, should we grant the same degree of rights today to computers making decisions about what speech to amplify? Others, such as the Aspen Commission on Information Disorder (Aspen Institute, 2021), have come to similar conclusions.

Principle 2 was that users should have a high degree of control, though not necessarily absolute control, over the rationale for the content they see. Today, users are subject to the rationales of the firms that provide social media services, which afford users little choice at all. Instead, users should have the ability to import content moderation algorithms of their choice into the infrastructure where their data reside (Footnote 4). At the same time, users who are content with existing arrangements should not be forced to change them.

In principle, a content moderation service could allow a user who employed it to receive only news items that were shown on the Daily Caller, or Fox News, or the Wall Street Journal, or only items from The Nation or MSNBC or the Washington Post. Users selecting different content moderators would see different sets of news items. A shopping algorithm could show only items made in the USA. Nor would display or suppression necessarily be the only choices. One algorithm might label content as “unverified” or “challenged.” Another might make it harder but not impossible to access undesirable content (e.g., locating it later in a user’s feed).

Fundamentally, all such choices about the scope and nature of the content moderation would belong to the user rather than anyone else. Content streams seen by the user would include mostly (but not exclusively) content that conformed to the moderation policies the user chose. Any party so motivated could create and offer for sale a filtering algorithm that would suppress certain kinds of content designated as undesirable. One person could choose content moderation handled by BBC, another by Fox News, and another by Consumer Reports. The aggregate of all such parties would constitute a competitive marketplace for filtering products that users could select for themselves.

In our proposal, the role of a content moderator is to filter the content streams coming from the social media companies with which the user interacts. Social media companies can continue to deliver the same content streams that they always have, but users see only that fraction of those streams that conforms to the moderation policy that users themselves have chosen. At their own discretion, users could also seek out other policy-conforming content that would otherwise not appear in the streams of the social media companies. In addition, advertisers could contract with content moderators as well as platforms, based on their particular moderation policies, to avoid association with content the advertisers do not like. Moderators would be free to make money from ads, subscription fees, or both.

Finally, although third-party content moderators are conceptually independent from the social media companies themselves, the latter would have to be required to offer Application Programming Interfaces to third-party moderating software that allow those parties to implement the services described above.

Principle 3 was that platforms have some responsibility for facilitating a degree of diversity in the information environment to which individuals are exposed, so that the “‘uninhibited, robust, and wide-open’ debate on public issues” praised by the Supreme Court can realistically occur. For this to happen, it must be possible to implement content-neutral measures that facilitate exposure to at least some counter-speech as a remedy for false speech, that is, to pierce the filter bubble and expose the listeners inside it to other points of view, even if they would prefer not to be exposed at all. Consistent with this principle, listeners inside a filter bubble could be insulated from most counter-speech but not all of it.

We noted that one way to implement Principle 3 would be to give content creators the ability to bypass moderation policies by “warranting” their content, the basis of what this paper calls a warrant-based content self-moderation scheme. A warrant is an enforceable attestation that the content in question is not per se illegal (e.g., does not violate copyright laws, is not child pornography, is not an immediate incitement to violence or some illegal act) and that it is not materially false (e.g., does not claim that the Pope endorsed my candidate, that vaccines contain microchips, or that elections are on Wednesday, not Tuesday).

The content creator warrants content by placing some specific asset at risk, an asset which is forfeited if the attestation proves to be invalid or untrue. In return, the content creator gains the right for his or her content to not be screened out in delivery to the user by the moderator’s filter. This ensures that “counter-speech” can pierce any filter bubble that a content moderation policy might create. A party objecting to the specifics of the warranted content is free to challenge the warrant, and if the content creator loses the challenge, the creator loses the asset. It is the prospect of losing the asset that incentivizes the creator to be truthful.
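The filtering-plus-bypass rule described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original proposal: the `Item` type, the `warranted` flag, and the `only_outlets` policy are hypothetical names, and the sketch assumes a user-chosen moderation policy can be modeled as a yes/no predicate over items (real policies might label or downrank rather than suppress).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Item:
    creator: str
    text: str
    warranted: bool = False  # True if the creator has placed an asset at risk


# Hypothetical interface: a user-chosen moderation policy is a yes/no predicate.
Policy = Callable[[Item], bool]


def moderate(stream: Iterable[Item], policy: Policy) -> List[Item]:
    """Pass items that conform to the user's policy, plus every warranted item."""
    return [item for item in stream if item.warranted or policy(item)]


def only_outlets(*names: str) -> Policy:
    """Hypothetical policy that admits only items from the named outlets."""
    allowed = set(names)
    return lambda item: item.creator in allowed


stream = [
    Item("OutletA", "story 1"),
    Item("OutletB", "story 2"),
    Item("Charlie", "counter-speech", warranted=True),
    Item("Dana", "unwarranted post"),
]
visible = moderate(stream, only_outlets("OutletA"))
print([item.text for item in visible])  # ['story 1', 'counter-speech']
```

Charlie’s warranted item reaches the user even though it fails the policy; the pass-through decision does not look at the content itself, which is what keeps the bypass content-neutral.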

It’s worth elaborating here on an important question that arises in implementing Principle 3—what is the infrastructure for adjudicating challenges? Our original proposal suggested that a decentralized system of juries could be established for this purpose. Making that more specific, I propose here that this infrastructure should be built around peer juries that would be politically balanced by selection. The costs of maintaining and operating this infrastructure would be covered by nominal fees paid by those bringing challenges.

This proposal for peer juries is based on Allen et al. (2021), who found that politically balanced peer juries of ten persons can judge the veracity of an article as well as professional fact checkers can, and further that such juries could be recruited at relatively low cost (about $1 per judgment). Allen et al. successfully recruited jurors from an online labor market, paying about 1.2 times the federal minimum wage in the USA. (Perhaps surprisingly, Allen et al. also found that jurors were able to make these relatively accurate judgments based only on reading the article’s headline and lede rather than reading the full article and/or doing their own research; thus, the time needed to render judgments could be minimized.) The implication of Allen et al.’s findings is that a system for adjudicating challenges could be implemented at relatively low cost, and charging challengers a nominal fee to bring a challenge would not pose a substantial burden for them.
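A politically balanced jury and its verdict could be sketched as follows. The group labels, the jury size of ten (taken from Allen et al.), and the strict-majority rule with ties resolved in the creator’s favor are assumptions made for illustration; the proposal itself does not specify a selection procedure or voting rule.

```python
import random
from typing import Dict, List, Optional, Sequence


def select_balanced_jury(pool: Dict[str, List[str]], size: int = 10,
                         rng: Optional[random.Random] = None) -> List[str]:
    """Draw the same number of jurors from each political group in the pool."""
    if rng is None:
        rng = random.Random()
    groups = sorted(pool)
    if size % len(groups):
        raise ValueError("jury size must divide evenly among the groups")
    per_group = size // len(groups)
    jury: List[str] = []
    for group in groups:
        jury.extend(rng.sample(pool[group], per_group))  # without replacement
    return jury


def warrant_forfeited(votes_false: Sequence[bool]) -> bool:
    """Each vote is True if that juror judged the warranted content materially false.

    Hypothetical rule: a strict majority forfeits the warrant; a tie leaves it intact.
    """
    return sum(votes_false) > len(votes_false) / 2


pool = {
    "left": [f"L{i}" for i in range(50)],
    "right": [f"R{i}" for i in range(50)],
}
jury = select_balanced_jury(pool, rng=random.Random(42))
print(len(jury), sum(j.startswith("L") for j in jury))  # 10 5
print(warrant_forfeited([True] * 6 + [False] * 4))       # True
print(warrant_forfeited([True] * 5 + [False] * 5))       # False
```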

Principle 4 was that the parties responsible for content moderation should incur some liability when the content they pass to their users is inconsistent with their moderation principles and causes harm. The extent of liability should also depend on the magnitude of harm caused—harmful disinformation propagated widely should result in greater liability than the same disinformation propagated minimally. Since content moderation entails decisions about what content to deprecate and to recommend, amplification is conceptually a part of content moderation.

A key point is that warranting content shifts liability for specific content from content moderators to content creators: Creators don’t need to warrant content, and moderators don’t need to pass nonwarranted content to their users. However, content moderators are liable not for the truth value of individual messages or any particular content but rather for behaving consistently in accordance with their stated policy. For that task, they are subject to all of the standard practices and regulations that govern deceptive advertising—firms that contract with users to provide services described by a certain policy must indeed provide services consistent with that policy, and if they do not, they can be held liable under consumer protections regarding unfair and deceptive advertising and trade practices.

I note here that in practice, content moderators have strong business incentives to err on the side of overblocking (not passing content that is consistent with their filtering policy) rather than underblocking (passing nonwarranted content that is inconsistent with their filtering policy), because their business model is built on blocking content that their users do not want. From the user’s perspective, underblocking is much more noticeable than overblocking, because underblocking results in the presence of something objectionable to the user, while overblocking results in the absence of something that the user would have wanted to see. As long as overblocking is not excessive and the user still gets a reasonably rich stream of desired content (even if false), content moderators will focus on reducing the amount of underblocking. And because overblocking so often goes unnoticed, content moderators acting in accordance with their business incentives are unlikely to face significant complaints from their users.

An issue is how liability should be determined for failure to abide by a moderator’s performance standards. Traditionally, the extent of liability for defamation depends on the damage done to the party harmed. But if the issue is instead whether the practices of a content moderator live up to its own policies, it makes sense to use the revenue earned as a result of the policy violations as a baseline metric for liability. Holding moderators liable for more than the ad revenues they generate from amplifying untrue and nonwarranted content makes such amplification unprofitable, even if (as we propose) the original content remains accessible (see below). The social cost of the unamplified original content is not eliminated but can be regarded as the cost of preserving free speech rights.
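The liability baseline can be checked with simple arithmetic. In this hypothetical sketch, liability is set at a multiple greater than one of the revenue earned from policy violations; the 1.5 multiplier and the function names are illustrative assumptions, and the sketch ignores how violations would be detected or adjudicated.

```python
def moderator_liability(violation_revenue: float, multiplier: float = 1.5) -> float:
    """Hypothetical rule: liability is a multiple (> 1) of the revenue a moderator
    earned by passing content inconsistent with its own stated policy."""
    if multiplier <= 1.0:
        raise ValueError("multiplier must exceed 1 for violations to be unprofitable")
    return multiplier * violation_revenue


def net_gain_from_violations(violation_revenue: float, multiplier: float = 1.5) -> float:
    """Revenue from policy violations minus the resulting liability."""
    return violation_revenue - moderator_liability(violation_revenue, multiplier)


print(net_gain_from_violations(10_000.0))  # -5000.0
```

Any multiplier above one makes the net gain negative for every positive amount of violation revenue, which is the sense in which the rule makes amplifying nonwarranted, untrue content unprofitable.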

Additional questions about a warrant-based content self-moderation system

The original proposal outlined in Van Alstyne, Smith, and Lin (2023) was silent on important details.

The nature of the asset placed at risk

We did not address the nature of the asset placed at risk by content creators wishing to penetrate moderator bubbles.

Responding to this point, I note that dollars are not the only form of value—content creators also value larger audiences, and so a point system that influences the size of their potential audiences could also serve the role of an asset placed at risk.

Consider a system in which a content creator, Charlie, would put at risk X “deprecation points” to warrant a particular item of content, and a third party keeps a running total (RT) of deprecation points equal to the sum of the points from all warrants that Charlie has forfeited. Charlie’s RT would determine the degree to which any social media platform would deprecate all of Charlie’s subsequent content in its distribution; a larger RT would mean greater deprecation. A history of bad behavior thus reduces Charlie’s future content distribution and future revenues, at least for a while. To give Charlie an opportunity to make up for past forfeits, these deprecation points would expire after a certain time, so the deprecation penalty would not last forever. Deprecation points thus act as a currency whose overall loss (forfeiture) Charlie wishes to minimize, giving Charlie an incentive to refrain from warranting content that may be false. (He is, of course, free to generate any content, true or false, without constraint by this proposal if he chooses not to warrant it.)
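A running total with expiring points might be computed as below. This is a sketch under stated assumptions: time is measured in whole days, each forfeit expires after a hypothetical 365-day lifetime, and `CreatorRecord` and `running_total` are illustrative names rather than part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Forfeit:
    points: float
    day: int  # day on which the warrant was forfeited


@dataclass
class CreatorRecord:
    forfeits: List[Forfeit] = field(default_factory=list)

    def forfeit(self, points: float, day: int) -> None:
        """Record a lost challenge: the staked points join the running total."""
        self.forfeits.append(Forfeit(points, day))

    def running_total(self, today: int, lifetime_days: int = 365) -> float:
        """Sum of forfeited points that have not yet expired as of `today`."""
        return sum(
            (f.points for f in self.forfeits if 0 <= today - f.day < lifetime_days),
            0.0,
        )


charlie = CreatorRecord()
charlie.forfeit(50.0, day=0)
charlie.forfeit(20.0, day=200)
for today in (100, 300, 400, 600):
    print(today, charlie.running_total(today))
# 100 50.0
# 300 70.0
# 400 20.0
# 600 0.0
```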

The appropriate amount of an asset placed at risk

Another important question is how to determine the number of points that Charlie should place at risk. A value close to zero is nearly equivalent to no warrant at all, since virtually nothing is placed at risk; under these circumstances, Charlie could lie with impunity. Thus, the structure of the proposal should provide incentives for Charlie to place more points at risk. One possible incentive calls for modifying the original proposal: a higher amount at risk could correspond to a higher probability of being passed through the moderator’s bubble. Any monotonically increasing function f(x) of the amount x at risk with f(0) = 0 and f(x) approaching 1 as x becomes arbitrarily large will satisfy this requirement.
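Two familiar functions with the required shape are shown below, one exponential and one rational (Hill-type). Neither is prescribed by the proposal; the choice of function and the scale parameter `k` (the stake at which the pass-through probability reaches about 0.63 or exactly 0.5, respectively) are assumptions for illustration.

```python
import math


def exp_pass_probability(x: float, k: float = 100.0) -> float:
    """1 - exp(-x/k): equals 0 at x = 0, strictly increasing, tends to 1."""
    if x < 0:
        raise ValueError("stake must be non-negative")
    return 1.0 - math.exp(-x / k)


def hill_pass_probability(x: float, k: float = 100.0) -> float:
    """x / (x + k): equals 0 at x = 0, strictly increasing, 0.5 at x = k, tends to 1."""
    if x < 0:
        raise ValueError("stake must be non-negative")
    return x / (x + k)


print(round(exp_pass_probability(100), 3), hill_pass_probability(100))  # 0.632 0.5
```

The exponential form approaches 1 quickly once the stake exceeds a few multiples of `k`, while the rational form approaches 1 more slowly, so the latter keeps rewarding larger stakes over a wider range.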

Under this scheme, Charlie wants to minimize the number of points at risk so as to keep his RT low but also wants to increase the number of points for any given warranted item so as to increase the likelihood of passing through a moderator’s bubble. It is this tension that influences Charlie to refrain from staking a number of points that is either too high or too low.
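The tension can be made concrete with a simple expected-payoff model, which is an assumption layered on top of the proposal rather than part of it. Suppose Charlie gains value V if a warranted item passes the bubble, with pass-through probability f(x) = 1 − e^(−x/k), and forfeits the x staked points (each worth c to him) with probability q. His expected payoff is U(x) = V(1 − e^(−x/k)) − qcx, which is concave, so setting U′(x) = (V/k)e^(−x/k) − qc = 0 gives x* = k ln(V/(qck)) when V > qck, and x* = 0 otherwise. All names and numbers below are hypothetical.

```python
import math


def expected_payoff(x: float, value: float, k: float,
                    p_lose: float, point_cost: float) -> float:
    """U(x) = V * (1 - exp(-x/k)) - q * c * x for a stake of x points."""
    pass_prob = 1.0 - math.exp(-x / k)
    return value * pass_prob - p_lose * point_cost * x


def optimal_stake(value: float, k: float, p_lose: float, point_cost: float) -> float:
    """Closed-form maximizer of U: x* = k * ln(V / (q c k)), or 0 if that is negative."""
    ratio = value / (p_lose * point_cost * k)
    return k * math.log(ratio) if ratio > 1.0 else 0.0


# A creator who rarely expects to lose a challenge versus one who always does.
likely_true = optimal_stake(value=100.0, k=20.0, p_lose=0.1, point_cost=1.0)
likely_false = optimal_stake(value=100.0, k=20.0, p_lose=1.0, point_cost=1.0)
print(round(likely_true, 1), round(likely_false, 1))  # 78.2 32.2
```

In this toy model, a creator who expects to lose a challenge more often rationally stakes fewer points and is therefore less likely to pierce the bubble, which is the behavior the scheme is meant to elicit.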

The relationship between the amount of an asset placed at risk and degrees of content deprecation

Still another question is how point totals correspond to different types and degrees of deprecation.

Consider first what it might mean in practical terms to deprecate content. Social media platform companies have been built around a goal of reducing “friction” for disseminating information—they seek to make it easy for users to reach audiences quickly, and friction is anything that interferes with this process.

From time to time, the social media companies have acknowledged some degree of responsibility for reducing the spread of disinformation. To a large extent, they have relied on three types of deprecation mechanism. The first is a mechanism that flags a given disinformation item as false, disputed, or questionable in an effort to inform the user about its provenance; this mechanism relies on the assumption that users will be less likely to retransmit items flagged in such a manner. This assumption is questionable at best, but this paper does not deal further with this mechanism.

The second deprecation mechanism is for the platform to downrank disinformation to make such information less prominent in the information stream to users; this mechanism relies on the assumption that messages that are less prominent in the information stream are less likely to be noticed and thus users will be less likely to retransmit downranked items as compared to comparable items that are not downranked. Of course, the most extreme version of downranking is the complete suppression of a given disinformation item—such an item would never appear in the user’s information stream.

A third mechanism focuses not on the information per se but on restricting a bad actor so that the actor cannot retransmit or originate content. Such restriction can be temporary or permanent. If temporary, the user can be given a warning to behave in accordance with the terms of service.

However, these three mechanisms do not come anywhere near exhausting the possibilities for how to deprecate content. In particular, many technical parameters underlie the user experience, and all of these parameters can be adjusted. As noted above, ranking or prominence in the information stream is one such parameter. Other parameters include the following:

  • Latency, which refers to how long it takes for a user action retransmitting a particular information item to be reflected in the information streams of that user’s audience. When latency is high, the time between the user retransmission of an item and the appearance of that item in his or her audience’s information stream is long—the US Postal Service is an information channel with high latency, as it takes a few days for a letter to travel coast-to-coast.

  • Bandwidth, which refers to how long it takes for a particular information item to be fully available to someone in a user’s audience. Over a high-bandwidth channel, a movie clip takes only a short time to be fully available to a viewer; over a low-bandwidth channel, it takes longer.

  • Resolution, which refers loosely to the number of bits used to capture details in a particular information item. For example, a low-resolution photograph shows fewer details than a high-resolution photograph.

  • Convenience of retransmission, which refers to the effort that the user must expend in retransmitting an item. For example, low convenience might mean that the user would have to make more clicks or to wait longer to retransmit an item.

  • Volume, which refers to the number of information items that a user retransmits in a given time period. Prolific users are characterized by high volume; i.e., they retransmit many items.

  • Audience per retransmission, which refers to the number of additional users in a given user’s audience. Users with many followers have large audiences, and all can be reached with a single retransmission action.

In principle, any or all of these performance parameters can be adjusted. Note further that they are all analog parameters; i.e., they can be adjusted continuously. They are also bounded, with zero at the lower end and a known upper limit, namely the value associated with ordinary content being transmitted. Perhaps most importantly, they can easily be used in a way that is content-neutral: any viewers of content from deprecated content creators (any kind of content, whether or not it is mis- or disinformation) will experience additional friction in the form of a delayed appearance, a slower-to-load screen display, an image or a video or an audio that is degraded in fidelity, and so on. (These undesirable impacts would be felt to a lesser extent if the total RT for a given content creator were lower.)

How should a deprecation RT be used operationally? The same kind of function as previously discussed works well here: recall that the output of that function is a number between zero and one, and the input is a number from zero to something arbitrarily large. With the RT as input, the output gives the fraction of the maximum (ordinary) value by which the performance parameter being deprecated is reduced, so that a creator with an RT of zero experiences no added friction.
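One way to apply the RT to the performance parameters listed above is sketched below. The parameter names, their ordinary values, and the scale `k` are hypothetical; the sketch covers only parameters where a higher value means better service and treats the deprecated value as the ordinary value reduced by the fraction f(RT).

```python
import math
from typing import Dict

# Hypothetical ordinary-content values for parameters where higher means better
# service. A parameter where higher is worse (e.g., latency) would be handled by
# raising it toward a maximum delay by the same fraction instead.
ORDINARY: Dict[str, float] = {
    "bandwidth_mbps": 20.0,
    "resolution_lines": 1080.0,
    "audience_fraction": 1.0,
}


def deprecation_fraction(rt: float, k: float = 100.0) -> float:
    """Same kind of function as before: 0 at RT = 0, rising toward 1 as RT grows."""
    return 1.0 - math.exp(-rt / k)


def deprecated_parameters(rt: float, ordinary: Dict[str, float] = ORDINARY,
                          k: float = 100.0) -> Dict[str, float]:
    """Reduce each parameter by the fraction f(RT) of its ordinary value.

    Content-neutral: the result depends only on the creator's RT, not on the item.
    """
    frac = deprecation_fraction(rt, k)
    return {name: value - frac * value for name, value in ordinary.items()}


print(deprecated_parameters(0.0) == ORDINARY)  # True
```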

Interpreting deprecation scores

An RT can thus be interpreted as a real-time global judgment about the worthiness of a given content creator to broadcast to its audience in an unrestricted (frictionless) manner. If Charlie’s RT were low, it would indicate Charlie’s worthiness to broadcast in a more unrestricted manner—because Charlie would be one of the following:

  1. A newcomer in the content creation business (and hence entitled to a presumption of worthiness)

  2. A party who had prevailed in most or all of the challenges to content he had previously warranted

  3. Someone who had waited long enough without transgressing that he should be given another chance to broadcast in an unrestricted manner

By contrast, a high RT would mean that Charlie is none of these, and thus should have his expressive rights curtailed for some period of time.

It is worth noting that at some point in its history, Facebook apparently did employ a method similar to the one described above to moderate content. In a blog post dated November 15, 2018, Mark Zuckerberg, CEO of Facebook, observed that “left unchecked, people will engage disproportionately with more sensationalist and provocative content,” noting that “no matter where we [Facebook] draw the lines for what is allowed, as a piece of content gets close to that line, people will engage with it more on average – even when they tell us afterwards they don’t like the content” (2018, Discouraging Borderline Content section, paras. 1–2). To illustrate this phenomenon graphically, he offered a figure very similar to Fig. 1.

Fig. 1. User engagement increases when content is more provocative or sensationalist (Zuckerberg, 2018)

Assuming this figure is based on real data, it is particularly useful because it implies the existence of a calculable metric for sensationalism that indicates how far a given piece of content is from the policy line beyond which the content would be prohibited. That is, it is possible to assign to any piece of content a score that indicates (a) whether that content exceeds some policy-determined threshold and (b) whether that piece of content is closer to or farther from the policy threshold than another piece of content. Scoring any given piece of content for sensationalism then becomes the responsibility of the screeners, which, given the amount of content that needs to be examined, must necessarily be automated AI agents.

In this context, the sensationalism metric roughly corresponds to a deprecation score, except that the former applies to a particular item of content and the latter to a content creator.

Zuckerberg goes on to indicate that it is possible to “penaliz[e] borderline content so it gets less distribution and engagement. By making the distribution curve look like Fig. 2 below where distribution declines as content gets more sensational, people are disincentivized from creating provocative content that is as close to the line as possible” (2018, Discouraging Borderline Content section, para. 3). Zuckerberg calls this approach the second most effective way to stop the spread of misinformation (the most effective way being to delete the accounts that create misinformation).
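The shape of the curve in Fig. 2 can be approximated by a simple distribution multiplier. The linear decline, the onset point, and the normalization of the sensationalism score so that the policy line sits at 1.0 are all assumptions; the post as quoted here does not give a formula.

```python
def distribution_multiplier(score: float, line: float = 1.0, onset: float = 0.5) -> float:
    """Hypothetical penalty curve for borderline content.

    Scores at or below `onset` get full distribution (1.0); between `onset` and the
    policy `line`, distribution falls linearly to 0; content at or past the line is
    treated as prohibited and gets none.
    """
    if score >= line:
        return 0.0
    if score <= onset:
        return 1.0
    return (line - score) / (line - onset)


print([distribution_multiplier(s) for s in (0.2, 0.75, 1.0, 1.2)])  # [1.0, 0.5, 0.0, 0.0]
```

Under the proposal here, the same kind of curve could take a creator’s deprecation RT as input in place of an item’s sensationalism score.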

Fig. 2. User engagement forced to decrease when content is more provocative or sensationalist (Zuckerberg, 2018)

(Understanding borderline content is tricky. At the time of Zuckerberg’s post, Facebook had five categories of content (divided into 23 subcategories) that it defined as potentially harmful, i.e., as potentially raising issues of conformance to its community standards. Facebook sought to remove content that fell into these categories, which can generically be called harmful content. But no matter where a line is drawn to separate harmful from non-harmful content, some kinds of content will approach but not cross that line; such content could be called “borderline content.”)

It’s unknown whether Facebook (now Meta) continues to use this approach to managing the spread of disinformation. But taking Zuckerberg’s November 2018 blog post at face value, it appears that it would be relatively straightforward to adapt the scheme he outlined then to the proposal set forth here: all of the deprecation mechanisms described above could serve as ways to reduce user engagement and could be plugged into the scheme Facebook had in place in 2018, with the deprecation RT substituted for Facebook’s sensationalism metric.
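One way to picture the substitution is to let a creator’s deprecation RT, rather than a per-item sensationalism score, scale how widely that creator’s content is distributed. The Python sketch below assumes a simple hyperbolic weight, 1 / (1 + alpha * RT), with a hypothetical tuning parameter alpha; the functional form, the parameter, and the helper names are illustrative assumptions, not part of Zuckerberg’s scheme or of the proposal as stated.

```python
def distribution_weight(running_total: int, alpha: float = 0.5) -> float:
    """Hypothetical mapping from a creator's deprecation RT to a distribution multiplier.

    A creator with RT 0 gets full distribution (1.0); each additional
    deprecation point shrinks the multiplier. alpha is an assumed tuning knob.
    """
    if running_total < 0:
        raise ValueError("deprecation running totals are never negative")
    return 1.0 / (1.0 + alpha * running_total)


def ranked_feed(items, creator_rts, alpha=0.5):
    """Order (item_id, creator_id, base_score) tuples by base score times creator weight.

    creator_rts maps creator_id -> RT; creators absent from it are treated as RT 0.
    """
    def adjusted(item):
        _, creator_id, base_score = item
        return base_score * distribution_weight(creator_rts.get(creator_id, 0), alpha)

    return sorted(items, key=adjusted, reverse=True)


feed = ranked_feed([("p1", "alice", 10.0), ("p2", "bob", 8.0)], {"alice": 4})
print([item_id for item_id, _, _ in feed])
```

With alpha = 0.5, a creator with an RT of 4 keeps one third of the base ranking score, so in the example bob’s lower-scored post outranks alice’s. Because the weight depends on the creator rather than on the item, it penalizes a track record of forfeited warrants rather than any single piece of content.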

Managing deprecation running totals

The deprecation scheme here depends on the ability to associate a deprecation RT with a given content creator.

To manage RTs for a large number of content creators, either centralized or decentralized mechanisms can be imagined. A centralized mechanism is easiest to describe. Consider, then, a third-party organization whose sole purpose is to manage deprecation RTs for content creators: when queried about a particular content creator, it simply returns that creator’s current RT. The adjudicatory mechanisms that determine whether a warrant is forfeited also report the outcome of their decisions to this third-party organization, which then adjusts the RT as appropriate.

The third party can also act as a repository of all warranted content that has been judged unfavorably, i.e., all content that has resulted in an increase in its creator’s RT. Making this repository freely available online and publicizing its existence makes deprecated content easier to find for those who wish to search for it (and, not incidentally, insulates the proposal against charges of content suppression).
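A minimal Python sketch of such a centralized registry follows. The class and method names (DeprecationRegistry, report_adjudication, public_repository) are hypothetical, and the rule that each forfeited warrant adds a fixed penalty to the RT is an illustrative assumption; the text does not specify how large an increment a lost warrant should carry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeprecationRegistry:
    """Hypothetical third-party registry of deprecation running totals (RTs)."""

    running_totals: dict[str, int] = field(default_factory=dict)
    # Public record of (creator_id, content_id) pairs whose warrants were forfeited.
    deprecated_content: list[tuple[str, str]] = field(default_factory=list)

    def query(self, creator_id: str) -> int:
        """Return the creator's current RT; unknown creators start at zero."""
        return self.running_totals.get(creator_id, 0)

    def report_adjudication(self, creator_id: str, content_id: str,
                            warrant_forfeited: bool, penalty: int = 1) -> None:
        """Called by an adjudicatory body after it rules on a challenged warrant.

        Only a forfeited warrant changes anything: the RT rises by the
        (assumed) penalty and the content joins the public repository.
        """
        if not warrant_forfeited:
            return
        self.running_totals[creator_id] = self.query(creator_id) + penalty
        self.deprecated_content.append((creator_id, content_id))

    def public_repository(self, creator_id: Optional[str] = None) -> list[tuple[str, str]]:
        """Return all deprecated content, optionally limited to one creator."""
        if creator_id is None:
            return list(self.deprecated_content)
        return [entry for entry in self.deprecated_content if entry[0] == creator_id]
```

Keeping the repository as an append-only public list mirrors the point above: deprecated content is not deleted, only recorded and made searchable.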

One obvious issue arises—what prevents a content creator with a high RT from adopting another brand-new online identity, which would presumably start fresh with a deprecation RT of zero? I argue that a couple of factors work to inhibit this behavior.

First, and perhaps most importantly, content creators have strong incentives to brand themselves for the very purpose of building audiences, and adopting new online identities works against consistent branding. To the extent that creators adopted a new online identity but still tried to maintain an association with previous identities through their content (e.g., “This post brought to you by ‘X,’ formerly named ‘Y’”), they would leave behind an evidentiary trail that could reassociate the new identity with the old one. Such reassociation would eliminate whatever benefit of escaping punishment the new identity might otherwise confer.

Admittedly, the new identity could benefit the creator in the short run, that is, until the adoption of the new identity is detected. Thus, some kind of periodic check of content emanating from apparently new creators (i.e., for all new online identities) would be useful. If a given account did not accumulate any deprecation points for a long period of time, one could be reasonably confident that the account now had a significant investment in its reputation and the periodic check could be discontinued.
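The periodic check can be expressed as a simple rule: keep reviewing an account until it has gone a sufficiently long period without accumulating deprecation points. In the Python sketch below, the one-year CLEAN_PERIOD and the Account fields are hypothetical; the text leaves the length of the “long period of time” unspecified.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed length of a "long period" without deprecation points; the text gives no figure.
CLEAN_PERIOD = timedelta(days=365)


@dataclass
class Account:
    """Minimal, hypothetical record of an online identity."""
    created: date
    last_deprecation: Optional[date] = None  # date of most recent RT increase, if any


def needs_periodic_check(account: Account, today: date,
                         clean_period: timedelta = CLEAN_PERIOD) -> bool:
    """True while the account has not yet gone `clean_period` without deprecation points.

    The clock starts at creation and restarts at every new deprecation, so a
    fresh identity (possibly a rebranded high-RT creator) stays under review
    until it has built up a clean track record.
    """
    clean_since = account.last_deprecation or account.created
    return today - clean_since < clean_period
```

Restarting the clock at each new deprecation means a creator who reverts to old habits under a new identity is drawn back under review rather than exempted permanently.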

Discussion and limitations

It’s important to understand what this proposal does not do.

  • This proposal does not address the full extent of the very broad issue of information disorder (Aspen Institute, 2021), information pollution (Wanless, 2023), or corruption of the information environment more generally. This proposal focuses only on information that is falsifiable, and arguably the largest part of what makes people unhappy with their online experiences is not of that nature: it is content that is obnoxious, distasteful, hurtful, and so on, but ultimately not legally harmful or prohibited.

  • This proposal, if implemented, would not guarantee that warranted content will be seen by everyone who disagrees with it. It would only increase the likelihood that such content is seen, by ensuring that it is not filtered out by a content moderator. Content moderators are under no obligation to seek out warranted content on their own, though they may do so if they choose.

  • The proposal does not suppress or inhibit disinformation in real time—it only acts to create disincentives for a content creator to push disinformation in the future. A content creator can warrant information that he knows to be false and then circulate it in the hopes that no one will challenge it. If so, then he has “gotten away” with it. If the warrant is challenged and he loses, his ability to deliver a high-quality media experience in the immediate future is degraded—but the original disinformation remains available.

  • The provisions of the proposal are not mandatory. No content creator is required to warrant content, and in that sense, the proposal does not inhibit the creation of any kind of content, whether or not such content is mis- or disinformation. Indeed, it is anticipated that the vast majority of content creators will not take the trouble to warrant content, simply because they don’t particularly care about piercing the filter bubbles of people who don’t want to see the content they create. This means that the vast majority of online content will not be warranted and will be just as available to users as it has ever been. Those who are satisfied with the content moderation of the social media companies today are free to choose “status-quo” moderation options.

  • Counter-speech that does pierce filter bubbles will be a very small fraction of the content displayed therein. As noted immediately above, most content creators won’t bother to warrant content, which means that a content moderator remains free to filter out most content that users do not want. Unwarranted content that is consistent with user preferences passes freely to the user, but any content passed to users that is inconsistent with their preferences must be warranted.

  • The proposal does not solve the problem of people who want to remain in their own filter bubbles where they can (mostly) hear only what they want to hear. One could well argue that societal discourse on important issues should proceed from a common base of knowledge, and it was famously said that “everyone is entitled to their own opinions but not their own facts.”Footnote 5 This maxim should be obviously true to anyone, but societal discourse today, especially discourse involving politics, does seem to reflect the sentiment that people should be able to choose their own facts. And there is no law that forbids ignorance.

  • The proposal does not deal directly with complaints that the platforms preferentially suppress conservative or liberal content. Because the proposal presumes that content moderators are filtering content streams intended for the user, a content item that the platform doesn’t stream to the user is irrelevant to the operation of the content moderator. However, the proposal does allow content moderators to “enrich” the content stream to have much more conservative or liberal content. It also allows content moderators to seek out other content that is not included in the platform’s stream to the user, and in this way to reduce any alleged platform suppression.

  • The proposal does not eliminate the ability of social media platforms to have content moderation policies of their own, and users who are satisfied with current policies should have the option of staying with them. That is, they could be said to want a “status-quo” moderating option.Footnote 6 A social media platform could indeed offer such a moderation service for users to choose voluntarily (as long as it did not privilege its own moderation service when presenting users with choices among moderation services). The platform company would most likely offer its status-quo moderating option for free, which suggests that competitors would also offer their services for free or for a nominal charge, and hence that their revenues would come from selling ads.
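The filtering rule implied by these points can be stated compactly: content consistent with user preferences passes, and content inconsistent with them passes only if warranted. In the Python sketch below, the field names and the yes/no notion of “matching user preferences” are simplifying assumptions; real moderation policies would presumably produce graded judgments rather than booleans.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ContentItem:
    """Hypothetical item in the platform's stream to a user."""
    item_id: str
    warranted: bool                 # creator has warranted the content as true
    matches_user_preferences: bool  # per the user's chosen moderation policy


def moderator_passes(item: ContentItem) -> bool:
    """Preferred content always passes; dispreferred content passes only if warranted."""
    return item.matches_user_preferences or item.warranted


def moderate(stream: Iterable[ContentItem]) -> list[ContentItem]:
    """Apply the warrant rule to a stream. Passing here does not guarantee the user sees it."""
    return [item for item in stream if moderator_passes(item)]


def counter_speech_share(passed: list[ContentItem]) -> float:
    """Fraction of passed items that are (necessarily warranted) counter-speech."""
    if not passed:
        return 0.0
    counter = sum(1 for item in passed if not item.matches_user_preferences)
    return counter / len(passed)
```

The counter_speech_share helper makes the point about relative volume measurable: since few creators are expected to warrant content, the share of counter-speech among passed items should stay small.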

Perhaps the most important uncertainty about this proposal relates to the willingness of social media platforms to engage at all in this or any similar effort. Recent trends in the social media space may foreshadow a declining interest among platform companies in trying to diminish the disinformation passing through their services. The most prominent example, of course, is Twitter, now X, which has dissolved its trust and safety groups, lifted restrictions on radical and risky profiles, eliminated tags that notified users of affiliations with foreign governments (including outlets propagating Russian and Chinese narratives), and replaced its identification system for verified users with an “authentication through payment” method (Hammond-Errey, 2023). A number of commentators, including Frances Haugen (Haugen, 2023) and Kara Swisher (Swisher, 2022), argue that Facebook in fact does not particularly care to engage in disinformation management practices that impinge on its business model.

Also significant is that the very idea of warranting content and requiring platform companies to allow (indeed, to enable) third-party content moderators to control the flow of content to end users (in other words, the entire Van Alstyne approach) would constitute a large-scale disruption of the existing business models of the social media companies.

Today, the platform companies make their money by selling ads directly to businesses wanting consumer attention, and the former dictate the terms to the latter. Under this proposal, these companies could sell ads and thus obtain revenue directly only through their status-quo moderation; all other revenue would be derived from a share of the revenue taken in by content moderators, who could in principle take over the majority of the advertising market. This alone may be reason enough for platforms to resist this proposal.

On this last point, our original proposal called for standardized revenue sharing between the platform companies and the content moderators, with the standard to be established on fair, reasonable, and non-discriminatory (FRAND) terms. But FRAND terms generally presume the existence of organizations that can negotiate them in the first place. It’s possible to imagine the platform companies establishing a joint body to participate in such negotiations, but bootstrapping into existence what is now a nonexistent set of content moderators remains a problem. Thus, one could imagine that the initial establishment of FRAND terms, or at least of a process for arriving at them, would have to be accomplished legislatively.

Given that warrant-based content self-regulation would be strongly resisted by the social media companies, it makes sense to seek some kind of legislative mandate to force a change in their business practices. Many legislative efforts of this kind have foundered on First Amendment concerns about government involvement in making content determinations, and thus we (Van Alstyne, Smith, and Lin, 2023) were led to make this proposal in the context of possible changes to Section 230.


This paper addresses the tension between disinformation as a broadly recognized societal problem and the lack of political consensus on what specific information counts as disinformation. This tension is further exacerbated in the USA by First Amendment jurisprudence minimizing the role that government may play in content-based regulation of speech. The result is an online information environment that renders inoperative the mechanisms posited by such jurisprudence to promote constructive dialogue between opposing points of view and to allow higher quality speech to win out over speech of lower quality and value. These mechanisms are inert because people of various persuasions retreat into their own information bubbles and have fewer opportunities even to hear fair presentations of opposing views, i.e., counter-speech.

Van Alstyne’s proposal for what I have named “warrant-based content self-regulation” offers an opportunity to provide a measure of counter-speech in such situations without relying on centralized content-based regulation. By selecting their preferred content moderation policy, users get to filter out most speech that they don’t want to hear; the exception is “warranted” speech, i.e., content that its creator has warranted to be true. Warranted speech can override content moderation policies, but this affordance does not guarantee that such speech will reach users: a warrant only guarantees that the associated content won’t be filtered out by the content moderator. A successfully challenged warrant (that is, warranted speech that is found to be untrue) results in a penalty being imposed on the content creator.

Because this approach is content-neutral, government regulation requiring platform companies to make application programming interfaces available to third-party content moderators should pass First Amendment muster. Furthermore, to the extent that content creators choose to warrant their content, users living inside algorithmically generated filter bubbles will have a greater likelihood of being exposed to some degree of counter-speech.

But it is worth asking whether algorithmically generated filter bubbles are a major contributor to political polarization. On this point, the literature is mixed. For example, Levy (2021) finds experimental evidence that social media algorithms limit exposure to counter-attitudinal news and thus increase polarization, in essence promoting the creation of filter bubbles. Stein et al. (2023) found that because ideological segregation disproportionately helps to spread messages that are otherwise too implausible to spread, the fraction of false information circulating is systematically greater in ideologically segregated networks.

Other scholars take an opposing view. Gentzkow and Shapiro (2011) observe that “internet news consumers with homogeneous news diets are rare.” Garrett (2013) describes as both prevalent and wrong the idea of “perfect ideological insularity,” i.e., of “political echo chambers, in which news consumers seek out likeminded partisans while systematically shielding themselves from other viewpoints.” Barberá et al. (2015) suggest that the degree of ideological segregation in social media usage is less than is commonly believed. One possible reconciliation of these disparate findings is that perfect echo chambers and filter bubbles do not exist and that the boundaries of all communities exhibit some porosity to discordant information, which shifts the question to one of degree rather than of kind.

A second important question is how, and to what extent, counter-speech experienced by those in a filter bubble acts to change their attitudes, opinions, or behavior, if at all. Here again, the literature is mixed. For example, the phenomenon of belief perseverance was first demonstrated by Festinger et al. (1956), who found that members of a cult who believed the world would end on a specific day continued to believe in their cult’s teachings even after that day came and went without the world ending. Generalizing this phenomenon, Ross and Anderson (1982) found that people hold to their initial beliefs to a degree that is normatively inappropriate, even in the presence of credible challenges to them. Nyhan and Reifler (2010) found evidence of a backfire effect for some corrections of political misconceptions: exposure to information correcting a mistaken belief resulted in a stronger commitment to the original belief rather than a weaker one. Relatedly, Ecker et al. (2014) documented a continued influence effect, in which individuals continue to rely at least to some extent on information in reasoning or decision-making even after that information has been authoritatively discredited or shown to be false.

On the other hand, a number of studies have failed to reproduce the backfire effect, at least under certain circumstances. Wood and Porter (2019) conclude that “backfire is stubbornly difficult to induce, and is thus unlikely to be a characteristic of the public’s relationship to factual information. Overwhelmingly, when presented with factual information that corrects politicians—even when the politician is an ally—the average subject accedes to the correction and distances himself from the inaccurate claim.” Bullock et al. (2015) find that simply paying people as an incentive for providing correct answers to factual questions reduces partisan gaps in those answers, and thus that “the apparent gulf in factual beliefs between members of different parties may be more illusory than real.” Guess and Coppock (2020) find that when people are exposed to corrective information, they do, on average, update their views in the direction implied by that information.

What is clear from these and other studies is that misperceptions are often, to a certain extent, “sticky”: it takes more than a simple correction to permanently shift a belief based on uncorrected information. The backfire effect corresponds to a shift in the “wrong” direction, entailing a greater commitment to one’s original belief. An educational or corrective effect corresponds to a shift in the appropriate direction, entailing a lesser commitment to one’s original belief. In either case, whether belief shifts depends on conditions that may include political or ideological predispositions, the manner in which corrective information is presented, and so on.

Finally and as noted above, the volume of warranted counter-speech that must be passed to users will be small relative to the volume of unwarranted “preferred” speech (i.e., speech that is consistent with user preferences). Thus, to the extent that relative volume makes a difference, preferred speech will have a considerable edge.

This background is necessary to understand the likely effects of introducing warrant-based content self-regulation on the extent to which belief in misinformation poisons public debate. Such regulation would increase the flow of counter-speech into filter bubbles. But if a substantial fraction of the public does in fact live in filter bubbles, and if people are highly resistant to counter-messaging, the overall effect of an increased flow of counter-speech would be small.

Thus, a reasonable question to ask is the following: given that the proposal calls for a substantial upheaval in the business models of social media companies, and thus would likely entail substantial political effort and cost to implement in practice, would the modest benefits described in the previous paragraph be worth the effort? Is the juice worth the squeeze?

I argue the answer is yes, not because of what might be a small reduction in the degree to which misinformation is part of the public debate, but for an entirely different reason. In particular, an important element of political debate today concerns the nature and extent of government involvement in suppressing disinformation. When Democrats control the executive branch, Republicans will inevitably regard government efforts to improve the quality of online information as attempts to interfere with free speech and the free flow of information, and vice versa. Because warrant-based content self-regulation relies on a decentralized (i.e., non-governmental) approach to speech and counter-speech, it is an approach to the problem that both sides should be able to support. That in itself makes the approach worthy of consideration.

The worst aspect of the original Van Alstyne proposal and the implementation described above is the large-scale restructuring of the current social media ecosystem that it entails, and the resulting complexity. But the relatively simple system that exists today has downsides that arguably pose existential threats to democratic society. Any effort to resolve (or at least to mitigate) conflicting imperatives (e.g., between individuals enjoying their free speech rights and individuals being better informed) is bound to entail some degree of complexity, since balancing imperatives is always a complex task. Thus, I offer the ideas in this paper as a further step in outlining an arrangement that strikes a better balance than exists today. Two directions for future research thereby suggest themselves: mechanisms to encourage platforms’ acceptance and a simpler implementation of the Van Alstyne approach.