Introduction

Social media microblogging platforms, specifically Twitter, have become highly influential and relevant to shaping attitudes towards vaccination. With 206 million daily active users as of 2021, Twitter has substantial reach and daily exposure being the most popular social network for news consumption (Auxier & Anderson, 2021; Statista, 2021). Moreover, Twitter allows people to express their beliefs about vaccine confidence or hesitancy, their trust or mistrust in vaccines as well as their stance on civil rights and vaccination mandates. Vaccine hesitancy was defined by the 2011 interdisciplinary World Health Organization’s Strategic Advisory Group of Experts (SAGE) Working Group on Vaccine Hesitancy as the “delay in acceptance or refusal of vaccines despite its availability” (MacDonald & group, 2015). Moreover, the SAGE working group recognized at least three universal factors (3C Model) contributing to vaccine hesitancy, subsequently developing the 3C vaccine hesitancy model consisting of (1) vaccine confidence, (2) vaccine complacency, and (3) vaccine constraints (practical vaccine barriers). The 3C and subsequent vaccine hesitancy models (Betsch et al., 2018) have shown vaccine confidence to play a significant role and explain the most substantial proportion of variance underpinning vaccine doubt that in turn contributes to individuals not vaccinating. Therefore, in this study we explored the degree to which vaccine confidence was framed on social media and how it informs profiles of vaccine hesitancy for both HPV and COVID-19—the two most controversial yet effective and underutilized vaccines for which there remains substantial reluctance among the public. Hence, our first research question asks:

  • RQ1 How is confidence in the HPV and the COVID-19 vaccines framed in the Twitter discourse?

Answering this question was possible by casting the search for framings of vaccine confidence as a Natural Language Processing (NLP) problem operating on Twitter. To our knowledge, there are no current NLP techniques capable of identifying how vaccine confidence is framed in social media discourse. Social science stipulates that discourse almost inescapably involves framing (Boydstun et al., 2014; Chong & Druckman, 2007; Entman, 2007), a strategy of highlighting certain issues to promote a certain interpretation or attitude. For example, when misinformation is used in framing vaccine confidence, it typically results in vaccine hesitancy. Similarly, when civil rights are highlighted in a particular framing, it determines vaccine refusal, while when trust in vaccines is increased, it leads to vaccine acceptance, and eventual uptake. Recent work in NLP concerning automatic recognition of framings targeted the study of political bias and polarization in social and news media (Baumer et al., 2015; Boydstun et al., 2014; Card et al., 2015; Field et al., 2018; Roy & Goldwasser, 2020; Tsur et al., 2016), mainly addressing the recognition of 15 cross-cutting dimensions of political framing (Card et al., 2015), e.g., economic dimensions, fairness and equality or policy prescription and evaluation. Although recent Twitter content analysis (Rao et al., 2021) revealed that there is a significant correlation between polarized attitudes towards vaccines and political dimensions, to our knowledge, no NLP methods have yet been developed to identify vaccine hesitancy framings, even though vaccine hesitancy is often discussed in social/news media.

In this paper we present a Question/Answering (Q/A) solution for the identification of hesitancy framings, enabled by the questions introduced by Rossen et al. (2019) as the Vaccine Confidence Repository (VCR). Q/A is an established NLP framework that consists of the automatic processing of the question language which enables the identification of its answer in large collections of documents (Strzalkowski & Harabagiu, 2006). When the questions are complex (Harabagiu & Hickl, 2006; Scialom et al., 2019), the answer is automatically produced by summarizing multiple passages deemed relevant to the question. Given that large collections of tweets discussing either the COVID-19 or the HPV vaccine can be retrieved from the Twitter API, questions targeting the confidence in vaccines (available from the VCR) can be used in a Q/A system operating on the index of tweets to capture the framing of vaccine hesitancy.

The questions from the VCR were informed by the antivaccine content analysis of Kata (Kata, 2010, 2012). Kata’s analysis of the content of anti-vaccination websites was among the first to reveal six classes of content attributes related to vaccine hesitancy or resistance: (1) concerns about the safety and effectiveness of vaccines; (2) the consideration of alternative medicine; (3) the interaction of civil liberties with vaccination programs; (4) reference to conspiracy theories; (5) the influence of morality, religion and ideology on choice of vaccination and (6) the usage of misinformation and falsehoods. This pioneering work on categorizing vaccine hesitancy or resistance informed more than 500 studies. In the study reported in Rossen (2019), 3–4 questions were generated by researchers targeting each of the classes of content attributes reported by Kata (2010, 2012), producing the VCR. The VCR questions were used in that study as survey links available from Facebook pages and parenting forums. Nearly 300 Australian visitors answered the questions, enabling the discovery of hesitancy profiles. We were inspired by this work, which aimed to discover vaccine hesitancy profiles, and instead of soliciting answers from Twitter users, we decided to (a) retrieve the tweets that answer the same questions; and (b) infer from them the vaccine hesitancy framings. Moreover, we were interested to examine if this methodology of discovering hesitancy framings can inform the discovery of hesitancy profiles. Furthermore, we applied the methodology not only to the HPV vaccine, but also to another vaccine, namely the COVID-19 vaccine. For this purpose, we extended the VCR with questions addressing confidence in the newer COVID-19 vaccine, modifying the original VCR questions when necessary. Not only were we able to discern hesitancy framings from Twitter for both vaccines, but when analyzing these framings, we noticed that they often relied on misinformation. 
In this work, we considered misinformation as any misconception, references to conspiracy theories, or any flawed reasoning. This allowed us to address the second research question:

  • RQ2 What specific misinformation about the HPV and COVID-19 vaccines is propagated on Twitter?

Answering RQ2 entails discovering the specific misinformation that was unveiled by answering the questions from VCR, but also and importantly, the derivation of a taxonomy of misinformation that is used to frame confidence in the HPV or the COVID-19 vaccines. Misinformation has exploded on social media platforms such as Twitter (Cacciatore, 2021; Hou et al., 2021; Wawrzuta et al., 2021), but it is less known which misinformation themes are propagated and what concerns they address. Building a taxonomy of misinformation to uncover HPV and COVID-19 vaccine related misinformation is needed, against which inoculation interventions can be prepared. A growing literature suggests that vaccine acceptance depends to a large extent on public trust and related confidence in the safety and efficacy of vaccines (Larson et al., 2018; Latkin et al., 2021; Siegrist, 2021). To further understand the way in which trust in vaccines impacts vaccine confidence, we considered a third question in our study:

  • RQ3 What trust issues are associated with the HPV and COVID-19 vaccines in Twitter conversations?

The multidimensional concept of trust involves not only trust in the vaccine, but also trust in the healthcare practitioners who administer vaccines, the healthcare systems, public health authorities and governments who advocate for vaccination. Trust is increasingly important especially in the context of high uncertainty for vaccine decision-making such as with the recent coronavirus pandemic, rapidly changing emerging science on what is known about Coronavirus as example, changing vaccine recommendations, growing science illiteracy, and the growing number of vaccines being recommended. Under these conditions of uncertainty, the public depends increasingly on the expertise, judgements, competency, and transparency in sharing what is known about vaccines. The case of trust and vaccination carries with it a history of vaccine development and missteps, but social movements and reactions render trust in vaccines highly variable and locally specific (Larson et al., 2016). In the context of our study, we explored trust erosion, or trust increase in vaccination. The answer to RQ3 led to the development of two trust taxonomies for each vaccine, namely a taxonomy for trust building and a taxonomy for eroding trust. These taxonomies revealed a constellation of concerns that addressed several trust themes impacting confidence in vaccines. Interestingly, we found that many of the trust concerns that we discovered aligned with a multitude of definitions of trust, ranging from the individual level, e.g. trust involving the overall reluctance to obtain vaccination due to fear of side effects (Latkin et al., 2021), to societal and system levels of trust in science and public health authorities (Siegrist, 2021; Sutton et al., 2020). Explaining the differences in vaccine trust or in the tension between civil rights and vaccine mandates is made possible by considering the moral aspects of the vaccine hesitancy framings. Consequently, we also addressed the research question:

  • RQ4 What moral dimensions characterize the confidence in the HPV and COVID-19 vaccines on Twitter?

Previous work in social psychology considered the Moral Foundations Theory (MFT) (Haidt & Graham, 2007; Haidt & Joseph, 2004) as a theoretical framework advocating that there is a small number of moral values, emerging from evolutionary, social, and cultural origins, that humans support. The moral values are referred to as Moral Foundations (MFs) and include Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion and Purity/Degradation. The contrasting Care/Harm MFs are concerned with care for others, generosity, compassion, ability to feel pain of others, sensitivity to suffering of others, prohibiting actions that harm others, or the opposite respectively. The contrasting Fairness/Cheating MFs are concerned with fairness, justice, reciprocity, reciprocal altruism, rights, autonomy, equality, proportionality, prohibiting cheating, or the opposite respectively. The contrasting Loyalty/Betrayal MFs are concerned with group affiliation and solidarity, virtues of patriotism, self-sacrifice for the group, prohibiting betrayal of the group, or the opposite respectively. The contrasting Authority/Subversion MFs are concerned with fulfilling social roles, submitting to authority, respect for social hierarchies or tradition, leadership, prohibiting rebellion against authority, or the opposite respectively. Finally, the contrasting Purity/Degradation MFs are concerned with associations with the sacred or holy, religious notions which guide how to live, prohibiting violation of the sacred or disgust and contamination, respectively. We used these MFs when analyzing hesitancy framings and encoded each framing with the MFs it evoked. These moral encodings proved to be very informative in the discovery of the hesitancy profiles based on vaccine confidence, as revealed by Twitter discourse. Ultimately, in this study we were most interested to answer the research question:

  • RQ5 What hesitancy profiles can be discerned from Twitter for the HPV and COVID-19 vaccines?

Answering this question entails discovering how hundreds of thousands of Twitter users frame their confidence in vaccines and what stance they have towards these framings. This was possible because we had access to tweets discussing the HPV vaccine authored by 192,487 users and tweets discussing the COVID-19 vaccine authored by 2,268,358 users. However, we found that only 138,779 Twitter users framed their confidence in the HPV vaccine and only 665,798 users framed their confidence in the COVID-19 vaccines. Nevertheless, we hypothesized that Twitter authors evoking vaccine hesitancy framings in similar ways, with respect to their adoption or rejection of misinformation, their erosion or building trust in the vaccines, the vaccine literacy they have or lack, their stance on the respect of civil rights as well as their focus on certain moral foundations, must belong to the same hesitancy profile.

To our knowledge, this is the first study that aims at the automatic discovery of hesitancy profiles at large scale, especially by using vaccine hesitancy framings discovered automatically as answers to a set of questions about confidence in vaccines. We believe that this method uncovered hesitancy profiles that provide a more nuanced interpretation than those reported in Rossen (2019), which were based on answers provided by human participants that indicated their agreement with each VCR question. This is because hesitancy framings were further characterized by linking them to the misinformation or trust taxonomies while also considering the moral foundations and the vaccine literacy. This ontological characterization of the hesitancy framings allowed us to interpret the hesitancy profiles against the stance tweet authors had when evoking their vaccine confidence, instead of only relying on the quantified attitudes as answers to the VCR questions. We were pleasantly surprised by the hesitancy profiles revealed by the method presented in this paper, and the insightful interpretations that could be derived. We believe that these profiles identify where interventions can be delivered on the Twitter platforms, and most importantly, what vaccine hesitancy issues the interventions need to consider. Finally, as the method was successfully applied for two different vaccines, it highlights its portability for considering confidence across vaccines and deriving vaccine-specific hesitancy profiles.

Methods

Overview of methodology

The methodology that we employed for uncovering vaccine hesitancy profiles from the Twitter discourse addressing vaccine confidence is illustrated in Fig. 1, which shows the four main processing steps. In Step 1, we identified the vaccine hesitancy framings for either the HPV or COVID-19 vaccines as answers to questions asking about confidence in vaccines. Step 2 scaled up the discovery of vaccine hesitancy by automatically discovering (a) all the tweets that evoked any of the hesitancy framings identified in Step 1 and (b) the stance of the tweet author towards the framing. In Step 3, the ontological commitments of the hesitancy framings identified in Step 1 are derived. First, the framings are categorized and then taxonomies of misinformation or trust are derived, while also recognizing in the hesitancy framings the implied Moral Foundations (MFs) provided by the Moral Foundations Theory (MFT) (Haidt & Graham, 2007; Haidt & Joseph, 2004), health literacy and the impact of civil rights on vaccine hesitancy. These ontological commitments, along with the stances of tweet authors towards the framings identified in Step 2, inform the representation of each Twitter user framing their vaccine hesitancy. The user representations enabled the discovery of the hesitancy profiles in Step 4 of the method.

Fig. 1 Overview of Methodology: Step 1: Identifying hesitancy framings using Question/Answering; Step 2: Scaling-up the discovery of tweets evoking any hesitancy framing; Step 3: Derivation of ontological commitments of the framings; Step 4: Discovery of vaccine hesitancy profiles

Step 1: Identification of vaccine hesitancy framings

The Q/A framework that was used for identifying how vaccine hesitancy is framed on Twitter introduces some novelties in the typical Q/A architecture, which consists of a Question Processing Module (QPM) that transforms any question into queries; a Tweet Processing Module (TPM) that uses these queries to identify text passages relevant to a question, informed by an index where documents have been processed; and an Answer Processing Module (APM) which extracts the answer. First, in the QPM, instead of processing the questions directly, we automatically generated five attitude-evoking questions from each question by relying on regular expressions, as shown in Fig. 1. The rationale for generating attitude-evoking questions stems from the belief, supported by prior research in opinion-based Q/A (Yu & Hatzivassiloglou, 2003), that tweets expressing attitudes are more likely to be identified by attitude-evoking questions than by complex questions. The questions processed in the QPM consist of (a) a general question asking about confidence in the HPV or COVID-19 vaccine, i.e. Q1: “How confident are you in the safety of the HPV/COVID-19 vaccine?” and (b) a set of 18 questions from the Vaccine Confidence Repository (VCR), introduced by Rossen et al. (2019). The entire list of VCR questions that were used is available in the supplemental material.

The VCR questions concern five major belief themes resulting from the analysis of the content expressed in anti-vaccination sites (Kata, 2010, 2012). The question belief themes are: (T1) vaccines are unsafe and unnatural; (T2) vaccines are ineffective; (T3) there is redundant vaccination; (T4) parents should be free to choose whether or not to vaccinate their children and (T5) vaccination is a conspiracy. For each belief theme, three or four questions were formulated. Table 1 lists one question from theme T2 (vaccines are ineffective) inquiring about confidence in the COVID-19 vaccine, as well as a question from T2 inquiring about confidence in the HPV vaccine, along with the attitude-evoking questions generated from them. There were 95 (19 × 5) attitude-evoking questions generated for each vaccine.
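The regular-expression step of the QPM can be illustrated with a minimal sketch. The pattern and the five attitude stems below are hypothetical, chosen only to show the mechanism; the templates actually used are those illustrated in Table 1, and the VCR questions would need patterns of their own.

```python
import re

# Hypothetical attitude stems (assumption): not the templates used in the study.
ATTITUDE_TEMPLATES = [
    "Do you trust {topic}?",
    "Do you doubt {topic}?",
    "Are you worried about {topic}?",
    "Do you believe in {topic}?",
    "Are you afraid of {topic}?",
]

# Hypothetical pattern (assumption) matching "How confident are you in <topic>?"
CONFIDENCE_PATTERN = re.compile(
    r"^How confident are you in (?P<topic>.+?)\?\s*$", re.IGNORECASE
)


def attitude_questions(question):
    """Rewrite one confidence question into five attitude-evoking questions."""
    match = CONFIDENCE_PATTERN.match(question.strip())
    if match is None:
        raise ValueError("question does not match the expected pattern")
    topic = match.group("topic")
    return [template.format(topic=topic) for template in ATTITUDE_TEMPLATES]


generated = attitude_questions(
    "How confident are you in the safety of the HPV vaccine?"
)
for q in generated:
    print(q)
```

With five templates per question, 19 input questions yield the 95 attitude-evoking questions per vaccine mentioned above.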

Table 1 Examples of automatically generated attitude-evoking questions

The TPM uses the attitude-evoking questions resulting from the QPM to find relevant tweets for them, based on a relevance model implementing the BM25 vector ranking model (Beaulieu et al., 1997). The collection that was searched for relevant tweets was obtained by using the Twitter streaming API for each vaccine. For the HPV vaccine, we used the Twitter historical API with the following query “(human papillomavirus vaccination) OR (human papillomavirus vaccine) OR gardasil OR cervarix OR (hpv vaccine) OR (hpv vaccination) OR (cervical vaccine) OR (cervical vaccination) lang:en”, obtaining 1,833,380 total tweets, with 969,372 retweets and 864,008 original tweets from 625,354 total authors. These tweets were authored in the time frame starting on January 1st, 2008, and ending on May 1st, 2021 (~ 13 years). A large fraction of these tweets were duplicates, likely due to spam bots, and required filtering. Locality Sensitive Hashing (LSH) (Das et al., 2007) is a well-known method used to remove near-duplicate documents in large collections. We performed LSH, with term trigrams, 100 permutations, and a Jaccard threshold of 50%, on our original tweets collection to produce the collection \(C_{T}^{HPV}\) = 422,078 unique original tweets. The tweets from \(C_{T}^{HPV}\) were authored by \(A_{T}^{HPV}\) = 192,487 users. Using the same methodology, we used the query “(covid OR coronavirus) vaccine lang:en” with the Twitter streaming API to retrieve, for the COVID-19 vaccines, a collection of 19,021,575 total tweets, with 9,888,104 retweets and 9,133,471 original tweets from 4,382,289 total users. Near-duplicate removal reduced it to the collection \(C_{T}^{COVID - 19}\) = 5,865,046 unique original tweets, authored by \(A_{T}^{COVID - 19}\) = 2,268,358 users in the time span January 17th, 2021–July 21st, 2021 (~ 6 months).
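The near-duplicate filtering described above can be sketched with MinHash signatures, on which LSH schemes for Jaccard similarity are commonly built. This is an illustrative sketch rather than the pipeline used in the study: the hash family, the whitespace tokenizer and the all-pairs comparison are assumptions, and the banding step that lets LSH avoid comparing every pair is omitted. Only the term trigrams, the 100 permutations and the 0.5 threshold come from the text.

```python
import random
import zlib

NUM_PERM = 100         # number of permutations, as stated in the text
THRESHOLD = 0.5        # Jaccard threshold, as stated in the text
PRIME = (1 << 61) - 1  # modulus of the hash family (assumption)

_rng = random.Random(0)
HASH_PARAMS = [
    (_rng.randrange(1, PRIME), _rng.randrange(0, PRIME)) for _ in range(NUM_PERM)
]


def term_trigrams(text):
    """Set of word trigrams; very short texts fall back to a single shingle."""
    terms = text.lower().split()
    if len(terms) < 3:
        return {" ".join(terms)}
    return {" ".join(terms[i:i + 3]) for i in range(len(terms) - 2)}


def minhash_signature(shingles):
    """One minimum per hash function, simulating NUM_PERM permutations."""
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingles]
    return [min((a * x + b) % PRIME for x in ids) for a, b in HASH_PARAMS]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(tweets):
    """Keep a tweet only if it is not a near-duplicate of an earlier kept one."""
    kept, signatures = [], []
    for tweet in tweets:
        sig = minhash_signature(term_trigrams(tweet))
        if all(estimated_jaccard(sig, other) < THRESHOLD for other in signatures):
            kept.append(tweet)
            signatures.append(sig)
    return kept


sample = [
    "the hpv vaccine is safe and effective for teens",
    "the hpv vaccine is safe and effective for teens",
    "covid boosters are now available at local pharmacies today",
]
unique_tweets = deduplicate(sample)
print(len(unique_tweets))
```

The quadratic loop in `deduplicate` is only workable for small samples; at the scale of millions of tweets an LSH index over the signatures would be needed.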
We used Lucene (lucene.apache.org) to index in \(I^{HPV}\) the tweets from \(C_{T}^{HPV}\) and in \(I^{COVID - 19}\) the tweets from \(C_{T}^{COVID - 19}\). For each of the attitude-evoking questions pertaining to the confidence in the HPV vaccine, a set of ranked tweets from \(C_{T}^{HPV}\) was retrieved. The ranking was produced by the scoring function of the BM25 relevance model, operating on the index \(I^{HPV}\). Similarly, for each of the attitude-evoking questions pertaining to the confidence in the COVID-19 vaccine, a set of ranked tweets from \(C_{T}^{COVID - 19}\) was retrieved, using \(I^{COVID - 19}\).
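To make the ranking step concrete, the following is a minimal sketch of BM25 scoring in plain Python. It is not Lucene's implementation: the whitespace tokenizer and the parameter values k1 = 1.2 and b = 0.75 are assumptions (common defaults), since the text does not report the settings used, and the function name `bm25_rank` is hypothetical.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # common default parameters (assumption, not reported in the text)


def tokenize(text):
    """Naive lowercase whitespace tokenizer (assumption)."""
    return text.lower().split()


def bm25_rank(query, documents, top_k=300):
    """Return (document index, BM25 score) pairs, best first."""
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = K1 * (1 - B + B * len(doc) / avg_len)
            score += idf * tf[term] * (K1 + 1) / (tf[term] + length_norm)
        scored.append((score, idx))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(idx, score) for score, idx in scored[:top_k]]


tweets = [
    "the hpv vaccine is safe",
    "i love pizza",
    "is the covid vaccine safe for kids",
]
ranked = bm25_rank("is the hpv vaccine safe", tweets)
print(ranked)
```

The default `top_k=300` mirrors the cut-off of the top 300 ranked tweets considered per question in the APM.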

In the APM, from the ranked list of tweets retrieved by the relevance model, we (1) considered only the top 300 ranked tweets, and (2) merged all the top ranked tweets retrieved for all attitude-evoking questions, aiming to judge their relevance. The judgements were performed by two experts in question answering. A total of 1523 tweets for the HPV vaccine and 2388 tweets for the COVID-19 vaccine were judged as being relevant by two researchers from the Human Language Technology Research Institute at the University of Texas at Dallas. Cohen’s Kappa score was 0.81, which indicates strong agreement between annotators (0.8–0.9) (Zapf et al., 2016). As shown in Fig. 1, tweets that were judged to be truly relevant were categorized with respect to the attitude of the tweet authors towards the predication of the question. The same human judges who evaluated the relevance also judged whether the tweet author (a) is against; (b) doubts or (c) accepts the predication of the question. This was essential for the process of inferring the hesitancy framings, which is the goal of the APM. Hesitancy framings were inferred from tweets that shared the same attitude towards a question. We were inspired by work in query-based summarization (Baumel et al., 2016; Yulianti et al., 2018), in which an abstractive summary is created to highlight the most informative aspects of multiple documents that answer a query. In our case, the tweets had the role of documents, and considering that the tweets were already retrieved based on the processing of attitude-evoking questions, two computational linguistics experts selected the discourse units that are shared by a set of tweets, from which the framing was generated, informed by the pyramid method. The pyramid method (Nenkova & Passonneau, 2004) is an empirically grounded method for content selection that quantifies the centrality of viewpoints.
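For readers unfamiliar with the agreement statistic reported above, Cohen's Kappa for two judges can be computed as in the sketch below. The two label lists are made-up toy data, not the study's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement between two judges, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each judge labelled at random with their own label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy relevance judgements (1 = relevant, 0 = irrelevant); illustrative only.
judge_a = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
judge_b = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(judge_a, judge_b)
print(round(kappa, 3))
```

In this toy case the judges agree on 8 of 10 items, but because chance agreement is 0.52 the Kappa value is noticeably lower than the raw agreement rate.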

Table 2 illustrates examples of the attitudes assigned to tweets relevant to a question inquiring about the safety of COVID-19 vaccines as well as hesitancy framings inferred using the pyramid method. In generating the framing A, first, a discourse unit from Tweet 1 was selected, while a second discourse unit was selected from Tweet 3, to infer the framing, similarly to an abstractive summary. The linguists also have inspected all other discourse units of the tweets, deciding whether they are (a) central to the issues discussed across all tweets sharing the same attitude and (b) offering the same response to the inquiry question before selecting them to be used in the framing. When a discourse unit expressed new content, it was selected for the inference of the framing. In this way, Step 1 produced \(framings^{COVID - 19}\) and \(framings^{HPV}\), thus informing the answer to RQ1.

Table 2 Example of question used to inquire about the confidence in COVID-19 vaccines, answered by relevant tweets

Step 2: Scaling-up the discovery of hesitancy framings

While hesitancy framings were inferred from tweets that were relevant to the questions from the VCR, we expect that many other tweets from the collection \(C_{T}^{HPV}\) = 422,078 unique tweets may evoke any of the \(framings^{HPV}\), and similarly, many tweets from the collection \(C_{T}^{COVID - 19}\) = 5,865,046 unique tweets may evoke any of the \(framings^{COVID - 19}\). Therefore, in Step 2 we aimed to discover all tweets that evoke any of the framings identified in Step 1. For this purpose, each framing was used as a query, to retrieve tweets that are deemed relevant to the framing. The retrieved tweets were judged by three language experts as being relevant or irrelevant, with inter-judge agreement computed using the Kappa score, yielding a score of 0.82 for relevant tweets for the HPV vaccine framings and 0.84 for relevant tweets for the COVID-19 vaccine framings. Then we used a Natural Language Processing (NLP) method taking advantage of Deep Learning to recognize all tweets evoking any vaccine hesitancy framing. The NLP method, detailed in (Weinzierl & Harabagiu, 2021), uses a supervised learning framework. This entails that we have divided the judged tweets into a training set, a validation set, and a testing set. For the HPV vaccine, the training set contains 4128 tweets, out of which 3703 tweets evoked a framing. For the same vaccine, the validation set had 459 tweets, out of which 424 tweets evoked a framing, while the testing set had 1147 tweets, out of which 1024 evoked a framing. For the COVID-19 vaccines, the training set contains 7604 tweets, out of which 6684 tweets evoked a framing, the validation set had 845 tweets, out of which 748 tweets evoked a framing and the testing set had 2113 tweets, out of which 1838 evoked a framing.

The training set of tweets informed the construction of a fully connected graph for each framing \(F_{i}\) (FCG-\(F_{i}\)), which was bootstrapped through link prediction to discover additional tweets evoking \(F_{i}\). The bootstrapping of FCG-\(F_{i}\) took advantage of the fact that a deep learning representation of the graph FCG-\(F_{i}\) can be learned with knowledge embedding models, e.g. TransE (Bordes et al., 2013), TransD (Ji et al., 2015), TransMS (Yang et al., 2019) or TuckER (Balazevic et al., 2019). When knowledge embedding models learn to represent each node and each relation from a graph, they rely on a link scoring function, which we have used to predict whether a tweet evokes a framing \(F_{i}\), and thus should be included in the graph FCG-\(F_{i}\). In addition, as we detail in (Weinzierl & Harabagiu, 2021), we designed a neural architecture that considered the language of each tweet, not only its neural representation, when predicting whether a tweet should be linked in each FCG-\(F_{i}\). This method allowed us to discover that there were 282,651 tweets in \(C_{T}^{HPV}\) that evoked some \(framings^{HPV}\) and 1,256,369 tweets in \(C_{T}^{COVID - 19}\) that evoked some \(framings^{COVID - 19}\).
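As an illustration of what a link scoring function looks like, the sketch below implements the generic TransE score, in which a link is plausible when the head embedding translated by the relation embedding lands near the tail embedding. It is not the architecture of (Weinzierl & Harabagiu, 2021): the roles assigned to head and tail, the three-dimensional embeddings and the acceptance threshold are all assumptions made for the example.

```python
import math


def transe_score(head, relation, tail):
    """Generic TransE score: negative L2 distance of (head + relation) from tail.

    Higher (closer to zero) means the link is considered more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Toy embeddings (made up); real embeddings are learned from the graph.
tweet_in_graph = [0.9, 0.1, 0.4]      # a tweet already linked in FCG-F_i
same_framing = [-0.5, 0.2, 0.1]       # hypothetical "evokes the same framing" relation
candidate_close = [0.5, 0.3, 0.4]     # candidate tweet likely evoking F_i
candidate_far = [-0.8, 0.9, -0.6]     # candidate tweet unlikely to evoke F_i

LINK_THRESHOLD = -0.5                 # hypothetical decision threshold

for name, candidate in [("close", candidate_close), ("far", candidate_far)]:
    score = transe_score(tweet_in_graph, same_framing, candidate)
    print(name, round(score, 3), score >= LINK_THRESHOLD)
```

Other knowledge embedding models named above replace this distance with their own scoring functions, while the thresholding idea stays the same.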

In Step 2, as illustrated in Fig. 1, the scaling-up of the discovery of hesitancy framings also involves the automatic identification of the stance. To discover the stance of each tweet addressing any of the framings, we relied on a second NLP method using deep learning, detailed in (Weinzierl et al., 2021), which stacks several layers of lexico-syntactic, semantic, and emotion Graph Attention Networks (GATs) (Velickovic et al., 2018) to learn and refine all the possible interactions between these different linguistic phenomena, before classifying a tweet as (a) agreeing; (b) disagreeing or (c) having no stance towards the framing of interest. Stance discovery was made possible by the stance judgements produced by the three language experts who judged whether tweets from the \(C_{T}^{HPV}\) or \(C_{T}^{COVID - 19}\) collection were relevant to \(framings^{HPV}\) or \(framings^{COVID - 19}\). Whenever a tweet evoked a framing, it received a probability distribution [\(p^{Accept}\), \(p^{Reject}\), \(p^{No Stance}\)] with respect to the framing. Using the method detailed in (Weinzierl et al., 2021), we discovered that there were 137,261 tweets that accepted some framing from \(framings^{HPV}\) and 54,946 tweets that rejected some framing from \(framings^{HPV}\). Similarly, we found that there were 877,481 tweets that accepted some framing from \(framings^{COVID - 19}\) and 447,716 tweets that rejected some framing from \(framings^{COVID - 19}\).
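The per-tweet stance distribution can be illustrated with a small sketch. It assumes, as is common but not stated in the text, that the classifier ends in a softmax over three raw scores; the scores themselves are made-up numbers, and `predict_stance` is a hypothetical helper name.

```python
import math

STANCES = ("Accept", "Reject", "No Stance")


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_stance(scores):
    """Return the most probable stance and the full distribution."""
    probs = softmax(scores)
    best = max(range(len(STANCES)), key=probs.__getitem__)
    return STANCES[best], dict(zip(STANCES, probs))


# Made-up raw scores for one (tweet, framing) pair.
label, distribution = predict_stance([2.0, 0.5, -1.0])
print(label, {k: round(v, 3) for k, v in distribution.items()})
```

Counting the tweets whose most probable stance is Accept or Reject, per framing, gives tallies of the kind reported above.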

Step 3: Deriving the ontological commitments of hesitancy framings

The hesitancy framings identified in Step 1 were first categorized by language experts as (1) expressing misinformation; (2) evoking issues of trust in vaccines; (3) pertaining to civil rights or (4) expressing morality issues. The decision of whether a framing contained misinformation was based on finding evidence on the Web, as retrieved by search engines, that the framing expressed known misconceptions or conspiracy theories. In addition, whenever flawed reasoning was observed, the framing was categorized as misinformation. One researcher with expertise in Web search and an expert on Public Health independently judged which framings contain misinformation. The two researchers adjudicated their differences and decided that out of the 64 framings inferred for the confidence in the HPV vaccines, 21 of them (33%) expressed misinformation. Similarly, out of the 113 framings inferred for the safety of COVID-19 vaccines, misinformation was present in 38 of them (34%). Table 3 illustrates some examples of misinformation.

Table 3 Examples of misinformation expressed in vaccine hesitancy framings

Examples H-M1 to H-M3, pertaining to the HPV vaccine, articulate misinformation about the effects of the vaccine on the immune system. Each framing articulates a different nuance of these effects. However, all three framings share the theme of the vaccine’s effects on the immune system, an observation that motivated us to derive a taxonomy of misinformation based on the themes and the concerns raised by the misinformation articulated throughout framings. Similarly, the framings C-M1 to C-M3 cover the theme that the COVID-19 vaccines are unnecessary. Given these observations, all framings expressing misinformation were organized in a misinformation taxonomy, by inspecting the common themes and concerns they addressed. The Misinformation Taxonomy has three layers of abstraction: themes → concerns → framings → tweets. Given that in Step 2 we automatically recognized all tweets that evoke framings that express misinformation, the next layer of abstraction groups all framings that addressed the same concern, and finally all concerns that share the same theme, the highest level of abstraction. This allowed us to answer RQ2.
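The layered organization themes → concerns → framings → tweets can be pictured as a nested data structure. The sketch below is only an illustration: the class names, concern labels and tweet identifiers are hypothetical placeholders, while the theme and the framing labels H-M1 to H-M3 follow the example discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class Framing:
    framing_id: str
    tweet_ids: list = field(default_factory=list)  # tweets evoking this framing


@dataclass
class Concern:
    name: str
    framings: list = field(default_factory=list)


@dataclass
class Theme:
    name: str
    concerns: list = field(default_factory=list)

    def tweet_count(self):
        """Total number of tweets evoking any framing under this theme."""
        return sum(len(f.tweet_ids) for c in self.concerns for f in c.framings)


# Placeholder concerns and tweet identifiers (assumptions for illustration).
immune_theme = Theme(
    "Effects of the vaccine on the immune system",
    [
        Concern("placeholder concern A", [Framing("H-M1", [101, 102]), Framing("H-M2", [103])]),
        Concern("placeholder concern B", [Framing("H-M3", [104, 105, 106])]),
    ],
)
print(immune_theme.tweet_count())
```

Aggregating tweet counts upward in this way shows how prominent each concern and theme is in the collection.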

While expressing misinformation is also seeding mistrust in the vaccines, to our surprise, many other framings addressed the issue of trust in the safety of vaccines, although not expressing any misinformation. As with judging misinformation, two researchers (one public health expert and a sociolinguist expert) made independent judgements about whether a framing is eroding or increasing trust in vaccine safety or does not convey any trust issue. The inter-judge agreement was computed using the Kappa score, yielding a score of 0.8 for trust expressed about the HPV vaccine and 0.82 for trust expressed about the COVID-19 vaccines. After adjudicating the judgements, we found that there were 21 hesitancy framings that increase trust and 20 hesitancy framings that erode this trust in the HPV vaccine. Similarly, 27 framings were found to increase the trust while 25 framings eroded trust in COVID-19 vaccines.

Table 4 illustrates examples of both forms of trust in the HPV or the COVID-19 vaccines. We derived two separate taxonomies for trust in the vaccines: a taxonomy of eroding trust and a taxonomy of building trust in the vaccines. Like the derivation of the Misinformation Taxonomy, common themes were first identified, which were further categorized by the concerns they raised, generating the three levels of abstraction: themes → concerns → framings → tweets.

Table 4 Examples of framings that erode or increase the trust in vaccine confidence

In Step 3 of our methodology, additional ontological commitments were produced for hesitancy framings that showcase vaccine literacy or its absence. For example, framings HT + 1 and HT + 2 illustrated in Table 4 were coded as showcasing vaccine literacy, whereas framings HT-1 and HT-2 display a lack of vaccine literacy. In this study, we relied on the definition of vaccine literacy reported in Biasio et al. (2021), which considers vaccine literacy as the competence to find, understand and use health and vaccination information. We found that 17 of the hesitancy framings addressing the HPV vaccine showcased vaccine literacy, while 15 displayed a lack of literacy. Similarly, 27 of the framings used for the COVID-19 vaccine showcased vaccine literacy, while 21 displayed a lack of literacy.

A small number of framings identified in Step 1 addressed civil rights issues. Of all the framings inferred for the HPV vaccine, 12 addressed civil rights issues, while of all the framings inferred for the COVID-19 vaccine, 28 addressed civil rights issues. Examples of framings that were categorized as expressing civil rights are listed in Table 5. The ontological commitments that were considered for the framings addressing civil rights encoded two possible situations: (1) framings implying that vaccination should be prioritized over civil rights (e.g., framing C.CR.3 from Table 5); and (2) framings implying that civil rights should always be prioritized (e.g., framing H.CR.3 from Table 5).

Table 5 Examples of framings that address civil rights

The final categorization concerned moral issues highlighted by the framings. Examples of such framings are provided in Table 5. Nevertheless, we sought to reveal all the Moral Foundations (MFs) implied in each framing. Previous work (Johnson & Goldwasser, 2018, 2019) has shown that there are correlations between stances towards framings and the moral convictions that justify the stances. To further explore the correlation between MFs and hesitancy framings, a computational linguist and an expert in public health independently assigned MFs to all the framings for the HPV and the COVID-19 vaccines. The inter-judge agreement was computed using the Kappa score, yielding a score of 0.89 for the HPV vaccine and 0.85 for the COVID-19 vaccines. These annotations enabled us to answer question RQ4.

Step 4: Discovering vaccine hesitancy profiles

The previous three steps of our methodology provided information that allowed us to generate a representation of the users involved in the discourse about COVID-19 vaccines and of the users participating in the discourse about the HPV vaccine. A vectorial representation was produced for each Twitter user that evoked \(framings^{HPV}\) or \(framings^{COVID - 19}\) in any of their tweets. This Vector User Representation (VUR) has entries for (a) the themes from the misinformation taxonomy; (b) the themes from the taxonomy for building trust or from the taxonomy for eroding trust; (c) a quantification of vaccine literacy or the lack thereof; (d) a quantification of the impact of civil rights on vaccination; and (e) a quantification of each of the MFs, as illustrated in Fig. 2. The values of the VUR are computed as: (1) values \(v_{Theme}\) quantifying the conceptualization of misinformation or trust taxonomy themes in each user’s tweets; (2) values \(v_{Literacy}^{ + / - }\) quantifying the vaccine literacy (or lack thereof) of the framings referred to by users in their tweets; (3) values \(v_{CR}^{ + / - }\) quantifying a user’s preference for vaccination mandates over civil rights (+) or for the respect of civil rights regardless of public health circumstances (−); and (4) values \(v_{MR}^{i}\) quantifying the support of each of the MFs. Central to the computation of these four types of values is the quantification of each \(Framing_{X}\) evoked by a user in their tweets as a value \(v_{Framing}^{X}\).
To compute \(v_{Framing}^{X}\) we note that any tweet \(t\) authored by the same user has: (1) a \(Framing_{X}\) that it evokes; and (2) a stance that reflects whether the tweet \(t\) (a) accepts the \(Framing_{X}\), quantified by the probability \(p_{X}^{Accept} \left( t \right)\); (b) rejects the \(Framing_{X}\), quantified by the probability \(p_{X}^{Reject} \left( t \right)\); or (c) has no stance towards \(Framing_{X}\), quantified by the probability \(p_{X}^{No\,Stance} \left( t \right)\). The distribution (\(p_{X}^{Accept} \left( t \right)\), \(p_{X}^{Reject} \left( t \right)\), \(p_{X}^{No\,Stance} \left( t \right)\)) is produced by the automatic stance detection from Step 2 of the methodology. We assigned \(v_{Framing}^{X} \left( t \right) = \max \left( {p_{X}^{Accept} \left( t \right), p_{X}^{Reject} \left( t \right)} \right)\), preferring a quantification provided by the dominant stance. Moreover, when the dominant stance was the rejection of the \(Framing_{X}\), we changed the polarity, i.e. \(v_{Framing}^{X} \left( t \right) \leftarrow - v_{Framing}^{X} \left( t \right)\), such that positive values assigned to \(Framing_{X}\) are interpreted as acceptance of the framing, whereas negative values represent rejection of the framing. Furthermore, the quantification of each \(v_{Theme}\) is based on the observation that framings that are mapped into the misinformation or trust taxonomies can be ontologically characterized by some \(Theme_{Y}\) from one of these taxonomies, illustrated in Tables 6, 7, 8 and 9. If a user generates only one tweet \(t\) that evokes \(Framing_{X}\), then \(v_{Theme}^{Y} = v_{Framing}^{X} \left( t \right)\). However, when a user generates multiple tweets \(t_{i}\) that evoke the same \(Framing_{X}\), then \(v_{Theme}^{Y}\) is computed as the average of \(v_{Framing}^{X} \left( {t_{i} } \right)\).
Moreover, if the same user generates tweets that refer to multiple framings \(Framing_{j}\) categorized under the \(Theme_{Y}\) of one of the taxonomies, the value \(v_{Theme}^{Y}\) becomes the sum of the framing values \(v_{Framing}^{j} \left( {t_{i} } \right)\). Because of this, for some themes the values in the user representation may fall outside the interval [−1, +1]. The values \(v_{Literacy}^{ + }\) result from taking the average value of all \(v_{Framing}^{X} \left( t \right)\) for framings that were annotated as exhibiting vaccine literacy, while \(v_{Literacy}^{ - }\) results from taking the average value of all \(v_{Framing}^{X} \left( t \right)\) for framings that showcase a lack of vaccine literacy, given every tweet \(t\) of a user. Finally, the values \(v_{MR}^{i}\), for each of the \(i = 1, \ldots ,9\) Moral Foundations (MFs), are computed by taking the average value of all \(v_{Framing}^{X} \left( t \right)\), for each framing that was annotated with the \(MF_{i}\). When the VURs for all users were generated, hesitancy profiles were discerned by using the k-Means clustering algorithm (Lloyd, 1982), experimenting with \(K = 2, \ldots ,10\) possible clusters. The final number of clusters was determined by the Elbow method (Thorndike, 1953). We found \(K = 5\) to be optimal for both HPV and COVID-19 vaccine hesitancy profiles.
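The signed framing value and the theme aggregation described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, framing and theme identifiers, and probabilities are hypothetical, and the sketch assumes that rejection counts as the dominant stance only when its probability strictly exceeds that of acceptance (the text does not specify how ties are handled). Literacy, civil rights and MF entries are omitted.

```python
from collections import defaultdict
from statistics import mean


def framing_value(p_accept, p_reject):
    """Signed value of a framing for one tweet.

    Takes max(p_accept, p_reject) and flips the sign when rejection dominates.
    Assumption: ties are treated as acceptance.
    """
    if p_reject > p_accept:
        return -p_reject
    return p_accept


def theme_values(user_tweets, framing_to_theme):
    """Theme entries of one user's vector representation.

    user_tweets: iterable of (framing_id, p_accept, p_reject), one per tweet.
    framing_to_theme: maps a framing id to its taxonomy theme (hypothetical ids).
    """
    per_framing = defaultdict(list)
    for framing_id, p_accept, p_reject in user_tweets:
        per_framing[framing_id].append(framing_value(p_accept, p_reject))

    themes = defaultdict(float)
    for framing_id, values in per_framing.items():
        theme = framing_to_theme.get(framing_id)
        if theme is not None:
            # Average over tweets evoking the same framing,
            # then sum over the framings that fall under the same theme.
            themes[theme] += mean(values)
    return dict(themes)


# Made-up example: two tweets accept framing F1, one tweet accepts F2,
# one tweet rejects F3; F1 and F2 share a theme.
tweets = [("F1", 0.9, 0.05), ("F1", 0.7, 0.2), ("F2", 0.9, 0.05), ("F3", 0.1, 0.8)]
mapping = {"F1": "theme_A", "F2": "theme_A", "F3": "theme_B"}
print(theme_values(tweets, mapping))
```

In this toy input the entry for `theme_A` is the sum of two averaged framing values and so exceeds +1, which is the situation the text notes when it says values may fall outside [−1, +1].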

Fig. 2 Vector User Representation for Discovery of Hesitancy Profiles

Table 6 A Taxonomy of Misinformation about the HPV vaccine
Table 7 A Taxonomy of Misinformation about the COVID-19 Vaccines
Table 8 The Taxonomy for building Trust in HPV vaccines and the Taxonomy for eroding Trust in the HPV vaccines
Table 9 The Taxonomy for building Trust in COVID-19 vaccines and the Taxonomy for eroding Trust in the COVID-19 vaccines

Results

Step 1 of our method produced 113 \(framings^{COVID - 19}\) and 64 \(framings^{HPV}\). They answer RQ1:

  • RQ1 How is confidence in the HPV and the COVID-19 vaccines framed in the Twitter discourse?

A more nuanced answer to RQ1 is provided by the categorization of \(framings^{COVID - 19}\) and \(framings^{HPV}\) performed in Step 3 of the methodology, yielding a more in-depth understanding of how confidence in the two vaccines is framed. Figure 3 illustrates the distribution of the framing categories across all the themes covered by the VCR questions (Rossen et al., 2019). Surprisingly, for the HPV vaccine, misinformation framings were inferred across all question themes, except for Theme 4, where civil rights dominated. We expected to find a lot of misinformation in the framings answering the questions from Theme 5, but we were startled to find plenty of misinformation answering questions from Themes 1, 2 and 3. Moreover, the framings answering the general theme also contained misinformation, indicating that misinformation is pervasive in the framing of confidence in the HPV vaccine. As shown in Fig. 3, misinformation (indicated in red shading) is also present in framings answering questions about confidence in the COVID-19 vaccines. We also noticed that there is a higher percentage of framings that erode trust (yellow shading) in the COVID-19 vaccines than in the HPV vaccine. Surprisingly, there is a substantial percentage of framings that build trust across both vaccines. For both vaccines, framings involving civil rights issues were inferred as answers mostly to questions from Theme 4 about parents’ right to decide whether to vaccinate their children. Vaccine literacy (blue shading) seems to be present in most framings except those of Theme 5 (vaccine conspiracy theories) in the case of COVID-19.

Fig. 3 Distribution of Hesitancy Framings Categories

The results of Step 2 of the method, which aimed to scale up the discovery of framings across the entire tweet collections discussing the vaccines, were evaluated using the following metrics: Precision (\(P\)), Recall (\(R\)) and \(F1\)-measure. \(P\) is the proportion of tweets correctly identified as evoking a framing among all tweets that the system reported in Weinzierl and Harabagiu (2021) automatically pinpointed. \(R\) is the proportion of tweets identified by that system as evoking a framing among all of the tweets that were judged to do so, and \(F1 = 2PR/\left( {P + R} \right)\). We measured \(P =\) 80.1%, \(R =\) 83.2%, and \(F1 =\) 81.6% when evaluating on the test collection for the HPV vaccine, and \(P =\) 71.5%, \(R =\) 75.8%, and \(F1 =\) 73.6% when evaluating on the test collection for COVID-19 vaccines. In Step 2 we also recognized the stance of the tweet authors towards the framing they evoke. The stance detection system we used, detailed in Weinzierl et al. (2021), recognized the “Accept” stance with \(F1 =\) 86.5% and the “Reject” stance with \(F1 =\) 70.5% for the \(framings^{HPV}\), while for the \(framings^{COVID - 19}\) it recognized the “Accept” stance with \(F1 =\) 87.6% and the “Reject” stance with \(F1 =\) 71.5%. These results indicate that both automatic systems performed well, although the “Reject” stance was recognized less reliably than the “Accept” stance.
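As a quick check of how the three metrics relate, the sketch below recomputes \(F1\) from the reported \(P\) and \(R\) values. The function names are hypothetical, and the count-based helpers are included only to make the definitions concrete; no tweet counts from the study are assumed.

```python
def precision(true_positives, false_positives):
    """Share of system-flagged tweets that truly evoke a framing."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of tweets judged to evoke a framing that the system found."""
    return true_positives / (true_positives + false_negatives)


def f1_score(p, r):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Precision and recall as reported in the text, in percent.
reported = {"HPV": (80.1, 83.2), "COVID-19": (71.5, 75.8)}
for vaccine, (p, r) in reported.items():
    print(f"{vaccine}: F1 = {f1_score(p, r):.1f}%")
# prints "HPV: F1 = 81.6%" and "COVID-19: F1 = 73.6%"
```

Both recomputed values match the \(F1\) scores reported in the text.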

The results of Step 3 involve the creation of several taxonomies, including the Misinformation Taxonomies, illustrated in Tables 6 and 7, and the Trust Taxonomies, illustrated in Tables 8 and 9. Ten misinformation themes were discovered in the Misinformation Taxonomy for the HPV vaccine, illustrated in Table 6, while nine misinformation themes were discovered for COVID-19 vaccination, illustrated in Table 7. An asterisk next to a concern denotes that the concern is common across vaccines. Although higher order themes were similar across vaccines (with HPV vaccination having one additional promiscuity theme), concerns across vaccines differed in number and content: of the 33 concerns, only 7 (21%) were shared across vaccines. This suggests that misinformation is tailored to vaccine-specific worries. The Misinformation Taxonomy derived for the HPV vaccine as well as the Misinformation Taxonomy derived for the COVID-19 vaccine answer the question:

  • RQ2 What specific misinformation about the HPV and COVID-19 vaccines is propagated on Twitter?

It is also important to compare the number of framings expressing misinformation in both taxonomies, as framings represent the lowest level of abstraction, not shown in Tables 6 and 7. Misinformation was expressed by 21 framings for the HPV vaccine and by 38 framings for the COVID-19 vaccine.

Two different Trust taxonomies discovered for each vaccine provided the answers to:

  • RQ3 What trust issues are associated with the HPV and COVID-19 vaccines in Twitter conversations?

Table 8 illustrates these taxonomies for the HPV vaccine, whereas Table 9 illustrates both taxonomies for the COVID-19 vaccine. Trust in the HPV vaccine is characterized by 6 themes expressing 21 different concerns, while the taxonomy that encodes erosion of trust in the HPV vaccine used 18 concerns distributed across 8 different themes. When comparing the taxonomy encoding knowledge that builds trust in the HPV vaccine with the same taxonomy for the COVID-19 vaccine, we observed only 4 common concerns, which are marked with * in Tables 8 and 9. However, the same comparison on the trust eroding taxonomies leads to the observation that there is only one shared concern, marked with * in Tables 8 and 9.

Table 10 Twitter hesitancy profiles discovered for the HPV vaccine

In Step 3 we also produced annotations of the implied Moral Foundations (MFs), answering the research question:

  • RQ4 What moral dimensions characterize the confidence in the HPV and COVID-19 vaccines on Twitter?

Overall, there were 111 moral foundations (MFs) annotated in the 64 confidence framings for the HPV vaccine and 230 MFs annotated in the 113 confidence framings used for the COVID-19 vaccines. Interestingly, 1 framing was coded with 4 MFs, 32 with 3 MFs, 95 with 2 MFs and 48 framings with one MF. The predominant MF in the confidence framings for the HPV vaccine was a tie between Authority and Subversion, while the predominant MF in the framings for the COVID-19 vaccine was Harm. This indicates that the most common way in which a vaccine is morally framed in public discourse shifts depending on the vaccine. The framings that expressed misinformation mostly implied the MF Subversion. The framings that conveyed trust erosion in vaccines predominantly conveyed the MFs Subversion and Harm, while the framings that built trust in the vaccine implied the MFs Care and Authority. Framings that involved civil rights issues predominantly implied the MF Fairness, while framings that involved literacy issues predominantly implied the MF Care.

The results of Step 4 are the vaccine hesitancy profiles discovered for each of the two vaccines. Table 10 lists the hesitancy profiles and their characteristics for the HPV vaccine, while Table 11 lists those for the COVID-19 vaccines. The hesitancy profiles answer the research question:

  • RQ5 What hesitancy profiles can be discerned from Twitter for the HPV and COVID-19 vaccines?

Table 11 Twitter Hesitancy Profiles for the COVID-19 Vaccine

Five hesitancy profiles were derived for the HPV vaccine, interpreted by the prototypical vector user representations, available in Table 10, generated as the centroid of each cluster corresponding to a profile. Similarly, five hesitancy profiles were recognized for the COVID-19 vaccine, with interpretations made possible by their corresponding prototypical vector user representations, available in Table 11.
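To make the notion of a prototypical representation concrete, the sketch below runs a plain Lloyd-style k-means in pure Python on made-up three-entry vectors and reports, for each candidate K, the within-cluster sum of squares that the Elbow method inspects. The function names, toy vectors, random initialization and iteration cap are all assumptions for illustration; the study's own implementation and the actual VUR dimensions are not reproduced here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, iterations=50, seed=0):
    """Lloyd-style k-means; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # assumed initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


# Made-up user vectors (e.g. one theme entry, one literacy entry, one MF entry).
users = [
    [0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [1.0, 0.7, 0.8],
    [-0.9, -0.8, -0.7], [-0.8, -0.7, -0.9], [-1.0, -0.9, -0.6],
]
for k in range(1, 4):
    centers, inertia = k_means(users, k)
    print(k, round(inertia, 3))
```

Each returned centroid plays the role of a prototypical vector for its cluster, and the value of K after which the sum of squares stops dropping sharply is the "elbow".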

Discussion

Qualitatively, vaccine confidence for both HPV and COVID-19 was expressed in framings covering a range of themes, from general vaccine safety issues to individual-level concerns (unsafe, adverse effects, ingredients, overwhelming the immune system), to vaccine development, testing and transparency concerns, to questioning vaccine efficacy, whether vaccinating is necessary, and alternatives. We therefore uncover not only vaccine confidence themes on social media, which may have been recognized in prior literature (Dunn et al., 2017; Islam et al., 2020; Shapiro et al., 2017; Wawrzuta et al., 2021), but also users’ stance toward those vaccine confidence themes across millions of users at scale.

Quantitatively, we inferred a larger number of hesitancy framings for the COVID-19 vaccine (113 framings) than for the HPV vaccine (64 framings), perhaps because we operated on a larger number of tweets in the \(C_{T}^{COVID - 19}\) collection (5,865,046 unique tweets), which is an order of magnitude larger than the number of tweets in the \(C_{T}^{HPV}\) collection (422,078 unique tweets). But we also believe that the quantitative differences may be explained by the question-answering framework we designed to find the vaccine confidence framings, illustrated in Fig. 1. We noticed that the experts judged a larger number of tweets relevant to the COVID-19 vaccine questions than they did for the HPV vaccine questions. This provides a second, and perhaps better, explanation for why we obtained a different number of framings for the two vaccines, highlighting the finding that in Twitter discourse, people have a greater number of vaccine confidence issues for the COVID-19 vaccine than for the HPV vaccine.

The vaccine hesitancy framings that we collected in \(framings^{COVID - 19}\) and \(framings^{HPV}\) allowed us to discover not only that the distribution of framing categories varies between the vaccines, but also that Twitter discourse about vaccine confidence is impacted not only by misinformation, but also by the erosion of trust in vaccines. Some of the questions from VCR invite confidence framings that rely on misinformation. For example, Q11: “Homeopathic medicines are an effective alternative to conventional vaccines” produces as answers only misinformation framings, as shown in Fig. 3. However, other questions, such as Q9: “The more people who get vaccinated the greater the protection against disease”, are answered by framings that either build or erode trust in vaccines, clearly showcasing vaccine literacy problems. Not surprisingly, framings answering the question Q13: “It is important that people are able to make their own decisions about vaccination” are dominated by civil rights issues for both vaccines. Another interesting observation derived from the analysis of Fig. 3 is that framings relying on misinformation were inferred as answers to 11 questions when considering the HPV vaccine, while for the COVID-19 vaccine, misinformation was present in the framings answering 14 questions. Hence misinformation plays a role in answering more VCR questions for the COVID-19 vaccine than for the HPV vaccine.

While much interest has been shown in identifying misinformation on social media platforms, relatively few studies have addressed the problem of identifying the specific misinformation that is propagated on Twitter or other social media platforms (Luo et al., 2019; Margolis et al., 2019; Massey et al., 2020; Reiter et al., 2018). Typically, known misinformation can be identified through methods such as that of Weinzierl and Harabagiu (2021), by relying on Wikipedia web pages or similar sources that collect debunked specific misinformation. However, our method of identifying vaccine hesitancy framings produced an interesting byproduct, namely the discovery of framings that contained specific misinformation, which we further organized in misinformation taxonomies specific to the HPV or the COVID-19 vaccines. It is also important to note that the two misinformation taxonomies comprise typologies that discovered a greater number of misinformation themes than the typology of misinformation reported in Jamison et al. (2020), which was adapted from Kata’s ontology. Interestingly, most of the misinformation concerns in the misinformation taxonomies that we derived are vaccine-specific, and so are the framings. This reflects the generative power of Misinformers and the tailoring of the misinformation to vaccines. But vaccine confidence is not impacted only by misinformation, as we have seen in Fig. 3. From the \(framings^{HPV}\), 21.8% increased trust in the safety of vaccines (n = 21/96) while 20.8% of framings eroded trust in the safety of the HPV vaccine (n = 20/96). From the \(framings^{COVID - 19}\), 24.1% increased trust in the safety of the vaccines (n = 27/112) while 22.3% eroded trust in the safety of vaccines (n = 25/112). This motivated our decision to derive two different trust taxonomies for each vaccine. To our knowledge, this is the first time that trust in vaccines has been analyzed in terms of either the erosion of trust or the increase in trust.
Moreover, the empirically derived taxonomies of trust in the COVID-19 and HPV vaccines reveal a number of findings. First, Twitter discourse on vaccination expressed vaccine attitudes in relation to trust to a substantial degree, and in approximately equal proportions with respect to building or eroding trust across vaccines. Second, trust in vaccines was expressed across both HPV and COVID-19 vaccines at the individual level (e.g., confidence in vaccine over natural immunity), the family level (e.g., vaccination protects families), and the system level (e.g., vaccine prevents cancer regardless of pharma profit, government provides and makes transparent vaccine information). Thus, a social-ecological framework contextualized trust at multiple levels on Twitter (Latkin et al., 2021).

Qualitatively, some of the trust themes were also recognized in recent research on COVID-19 vaccination and trust by Latkin et al. (2021). Trust in the vaccine literature has commonly been measured with single items (Larson et al., 2018) and has often been operationalized by measuring source credibility (Larson et al., 2018; Sutton et al., 2020), but has rarely been operationalized or measured as perceived motivations or erosion of trust or as moral values that align. Only recently has one trust scale been developed that measures parental confidence in source credibility and trust in various sources but also measures the norms that vaccination is important for children and is a protective measure that all teenagers should receive (Frew et al., 2019). It is important to remember that the hesitancy framings involving trust are unsolicited, as opposed to survey results. Twitter framings that erode trust focus on instilling fear or casting doubt, de-motivating vaccination at the individual level, but also draw attention to the failure and incompetence of institutional systems and experts. This has been highlighted in recreancy and social capital theory, which explains loss of trust and credibility in contentious public health disasters by emphasizing institutional failures of responsibility (Freudenburg, 1993).

It is important to note that our usage of the implied Moral Foundations (MFs) is also new in its application to understanding vaccine confidence and hesitancy framings on social media. More importantly, we have associated each implied MF with a stance. The stance of each author of a tweet towards the framing they evoke is transferred to the MFs implied by the same framing. Based on this observation, clear moral attitudes emerge within the hesitancy profiles across both the HPV and COVID-19 vaccines. Promoters of both the HPV and COVID-19 vaccines tend to strongly accept framings that espouse Care, Authority, Loyalty, and Fairness, and to reject Subversion. Alternatively, Misinformers of both the HPV and COVID-19 vaccines tend to adopt framings in stark moral contrast to those of Promoters: their moral foundations of Betrayal, Harm, and Subversion oppose the Promoters’ moral stances of Loyalty, Care, and Authority, respectively. Misinformers tend to have much stronger moral stances than Promoters, which indicates that morality plays a key role in the motivation of those spreading misinformation at scale. A similar pattern is found when comparing the Skeptic profiles, where moral foundations of Subversion, Betrayal, and Harm are adopted towards both the HPV and COVID-19 vaccines. In contrast, we find slightly differing moral profiles across vaccines when comparing the Trusters of the HPV vaccine to the Ambivalent of the COVID-19 vaccine. The two groups share the adoption of the Authority and Fairness moral foundations, but the HPV vaccine Trusters adopt Care, while the COVID-19 vaccine Ambivalent adopt Harm. The Debunkers share the rejection of Subversion but differ across other moral foundations. HPV vaccine Debunkers tend to reject Degradation, Subversion, and Authority equally, while COVID-19 vaccine Debunkers focus much more on rejecting Subversion, with a secondary focus on Harm and Fairness.

Vaccine hesitancy profiles as person-centered audience segmentation for targeted campaigns

HPV and COVID-19 vaccine hesitancy profiles highlight a constellation of accept and reject stances across various vaccine hesitancy framings, which will inform future messaging campaigns. The potential range of messaging targets spans from inoculating against specific misinformation, to tapping into moral frameworks, to, importantly, bolstering trust or debunking messaging that erodes trust. Importantly, vaccine stance identified from social media is valuable because it reflects unsolicited attitudes toward vaccines, in contrast to survey research (Hornik et al., 2020). Although five profiles were discerned for each vaccine, there are substantial differences both quantitatively, in the relative size of the profiles, and qualitatively, in how the profiles distinguish users.

Of interest to public health interventionists is strategically targeting profile members whose stance suggests that their vaccine attitudes are amenable to change, or alternatively, whose vaccine attitudes may already be positive but need strengthening (Roskos-Ewoldsen et al., 2002). With this goal in mind, Promoters (21%) and Debunkers (32%), who together make up more than half of the HPV vaccine users, express to a large degree support for vaccination through their high motivation to vaccinate, their trust, their vaccine literacy, their support for mandating vaccination, and, in the case of Promoters, their appeal to moral frames of care, authority and loyalty. These users may respond to authoritative appeals, mandating vaccination, and trust appeals that emphasize the importance of public health. Among COVID-19 vaccine profiles whose stance may also already be positive but in need of strengthening, on the other hand, Promoters make up a much smaller subgroup (9%) while Debunkers make up 35%. Bolstering positive vaccine attitudes may be achieved with trust messaging that emphasizes the importance of vaccinating for public health (i.e., the collective), and by motivating vaccination through the moral values of care (preventing harm), authority, loyalty, and fairness for Promoters. By contrast, for Debunkers, who make up a substantial subgroup (almost 200,000 users) and reject moral framings, morality messaging should be avoided. An emphasis on moral messages with this subgroup may boomerang (Fishbein et al., 2002).

Of greater interest are profiles whose framing-stance scores suggest that their users are ambivalent, on the fence, or skeptical, and whose members are more likely to be unvaccinated and to hold vaccine attitudes amenable to change. Among HPV vaccine profiles, Trusters comprised the second largest subgroup (28%) after Debunkers and make up a relatively substantial group in size (nearly 20,000). The pool of HPV vaccine users overall was smaller than that of COVID-19 vaccine users, who represent a vastly larger population of Twitter users. Trusters are accepting of a range of vaccine themes that build trust (e.g., motivated to vaccinate, trust in the role of public health, doctors, science, and the effects of the vaccine). These findings on frame-stance scores suggest that Trusters are likely to respond favorably to messages that build trust. These users are vaccine literate, and their motivation to vaccinate can possibly be tapped through moral value appeals of care, authority and fairness. Avoiding messaging that emphasizes vaccine mandates is warranted for this subgroup, given Trusters’ weak yet existing stance on civil rights above all, irrespective of public health circumstances. Findings from our study therefore inform not only messaging that users may respond to but also messaging that should be avoided in order to prevent potential iatrogenic message effects (Fishbein et al., 2002; Moos, 2005).

In comparison to Trusters among HPV vaccine profiles, the Ambivalent, who make up the largest subgroup (48%) of COVID-19 vaccine hesitancy profiles with 267,087 users, also reveal a weak yet existing frame-stance score across trust themes in COVID-19 vaccination. These users truly are on the fence, both accepting and rejecting trust framings, and their motivation to vaccinate needs to be strengthened. This subgroup may benefit from significantly bolstering trust and motivation coupled with inoculating against misinformation and utilizing moral appeals of authority and preventing harm. Both Trusters (HPV) and the Ambivalent (COVID-19) are ripe for receiving inoculation messages against misinformation across vaccine safety, effectiveness, the testing process, transparency, ingredients, and adverse reactions. Similar misinformation domains have been recognized in the literature across HPV and COVID-19, yet never in this social media vaccine frame-stance context at this scale (Calo et al., 2021; Head et al., 2018; Jamison et al., 2020; Kim et al., 2020; Loomba et al., 2021; Massey et al., 2020; Sundstrom et al., 2021; Van der Linden et al., 2016; Zimet et al., 2013).

A smaller subgroup of vaccine profiles, the Skeptics (13% for HPV; 7% for COVID-19), exhibit frame-stance scores that are accepting of most misinformation and erosion of trust framings. These users lack vaccine literacy and are strongly de-motivated to vaccinate; the moral values of subversion, harm and betrayal resonate with them, as does the primacy of civil rights irrespective of public health circumstances. Reaching these users presents more challenges. Still, the vaccine stances of these subgroups are not as extreme as those of Misinformers, who widely distrust vaccination and actively propagate misinformation. The Skeptics exhibit weak frame-stance scores on many fronts, suggesting that these users could be targeted to shift vaccine attitudes.

Strengths and limitations

Strengths of the methodology for discovering vaccine hesitancy framings presented in this paper include (1) the Q/A framework that was used as a starting point to identify framings; (2) the discovery at scale of tweets that evoke the hesitancy framings; and (3) the identification of the stance of the tweet authors towards vaccine confidence framings. These sophisticated natural language processing methods have the advantage of operating at the pragmatic level of language processing, in contrast with topic processing methods, which operate at the lexical level. The framings are insightful because they enabled us to identify misinformation, trust in vaccines, civil rights and morality issues that were discussed. The framings also revealed the vaccine literacy of Twitter users. Moreover, by discovering framings at scale, this method considers the viewpoint of 138,779 Twitter users regarding their confidence in the HPV vaccine and of 665,798 Twitter users regarding their confidence in the COVID-19 vaccines (i.e., users who specifically expressed their stance toward HPV or COVID-19 vaccine confidence). By identifying the stance of tweets towards each framing we have used a more advanced form of affect processing of the language in tweets than the one afforded by sentiment analysis. This is because sentiment analysis operates at the lexical level, considering the positive, negative, or neutral orientation of words to infer the sentiment of a tweet. In contrast, stance identification considers the interaction of lexical, syntactic and semantic features of a tweet’s language with emotions to derive the attitude of a tweet towards a specific framing. It results in the identification of subgroups of users, which is more informative for public health campaign design. For example, a tweet may have positive sentiment, but its stance may be rejecting a given framing.

The discovery of the hesitancy profiles, which is unique to the method presented in this paper, is another notable strength, made possible by (a) the recognition of \(framings^{HPV}\) and \(framings^{COVID - 19}\) at scale; (b) the identification of the stance of tweet authors evoking any of these framings; and (c) the representation of the users authoring these tweets, which takes into account the ontological commitments of the hesitancy framings. The method described in this paper also has some important limitations. First, we do not know whether \(framings^{HPV}\) and \(framings^{COVID - 19}\) represent all the framings of vaccine confidence for the two types of vaccines. To address this issue, we would need to consider additional questions that address vaccine confidence and determine whether new framings are inferred for the new questions. Second, the Misinformation and Trust Taxonomies have the same limitation of completeness, which may impact the completeness of the hesitancy profiles. A third limitation of the method for discovering hesitancy profiles derives from our exclusive focus on vaccine confidence, while hesitancy should also account for vaccine convenience and complacency.

While the interpretation of the hesitancy profiles is insightful, future work will need to test and validate these profiles both against user vaccination status and for profile members' responsiveness to strategic messaging. Six vaccine-relevant value frameworks characterize these vaccine profiles for HPV and COVID-19 vaccination, two voluntary and underutilized vaccines shown to be safe and effective. Derived from millions of tweets and unsolicited vaccine attitudes expressed on Twitter, these factors are first steps toward identifying and characterizing the complexity of vaccine hesitancy profiles at such scale. These profiles show promise for reaching a vastly larger number of unvaccinated people with more precise and strategically targeted messaging. The misinformation and trust taxonomies that informed the vaccine profiles shed light on nuanced differences and similarities among subgroups and between vaccines with regard to stance on the moral foundations, trust, and misinformation dimensions that contribute to vaccine attitudes and, importantly, inform which vaccine attitudes may be accessible and amenable to change for each subgroup. Prior research has demonstrated the importance of only some of the hesitancy framing categories in characterizing hesitancy profiles. Loomba et al. (2021), for example, were the first to quantify the impact of misinformation exposure on vaccine hesitancy.

Conclusion

In this paper we have presented a method capable of identifying how confidence in the HPV vaccine is framed in a collection of 422,078 unique tweets and how confidence in the COVID-19 vaccines is framed in a collection of 5,865,046 unique tweets. The categorization of these hesitancy framings enabled the derivation of misinformation and trust taxonomies as well as the analysis of vaccine literacy, the implied moral foundations and the tension between vaccine mandates and civil rights, which allowed us to discover several profiles of hesitancy for each vaccine. The discovery of these profiles was made possible by (a) the automatic recognition of all tweets in these collections that evoke any of the hesitancy framings; and (b) the automatic identification of the stance the tweet authors have towards the evoked framings.

This novel methodology sheds light on what has been known but rarely modelled in such a detailed, in-depth manner, namely the heterogeneity that makes up vaccine attitude profiles. Furthermore, this modeling approach captures user stance toward vaccine framings, which uncovers attitude orientation and informs messaging that can tap into the vaccine attitudes that may be accessible (Roskos-Ewoldsen et al., 2002). These results begin to disentangle the complex factors shaping vaccine attitudes. Such a person- or user-centered approach to characterizing vaccine hesitancy profiles also recognizes the importance of uncovering subgroups with similar vaccine hesitancy stances across multiple ontological dimensions. The patterns of vaccine hesitancy framings across multiple value frameworks inform public health messaging approaches that can effectively reach profiles where there is promise to shift or bolster vaccine attitudes.