Deep learning for topical trend discovery in online discourse about Pre-Exposure Prophylaxis (PrEP)

Edinger, Andy; Valdez, Danny; Walsh-Buhi, Eric; Bollen, Johan

doi:10.1007/s10461-022-03779-2


Original Paper
Published: 02 August 2022

Volume 27, pages 443–453, (2023)
AIDS and Behavior

Andy Edinger¹,
Danny Valdez ORCID: orcid.org/0000-0002-2355-9881²,
Eric Walsh-Buhi¹ &
…
Johan Bollen^2,3


Abstract

Pre-Exposure Prophylaxis (PrEP) interventions are increasingly prevalent on social media. The resulting social media data can be mined for insights about PrEP that may not be as apparent in surveys, including personal musings about PrEP and barriers/facilitators to PrEP uptake. This study explores online discourse about PrEP using an interdisciplinary public health and computational informatics approach. We collected N = 4,020 tweets using Twitter’s Application Programming Interface (API). These data underwent a three-step neural network/deep learning process to identify clusters within these tweets and relative similarity/dissimilarity between clusters. We identified 25 distinct clusters from our original collection of tweets. These clusters represent general information about PrEP, how PrEP is communicated among diverse groups, and potential pockets of misinformation and disinformation regarding PrEP. Specific clusters of interest include discussions of medication side effects, social perception of PrEP usage, and concerns with costs and barriers to access of PrEP interventions. Our approach revealed diverse ways PrEP is contextualized online. Importantly, this information can be leveraged to identify points of possible intervention for disinformation and misinformation about PrEP.


Introduction

Pre-Exposure Prophylaxis (PrEP) is a daily oral or injectable medication for groups at highest risk of contracting the Human Immunodeficiency Virus (HIV), including men who have sex with men (MSM), transgender women, and people who inject drugs [1]. When taken as prescribed, PrEP is shown to reduce HIV-1 infection by 75% or more when coupled with safe-sex and/or clean needle sharing practices [2, 3]. To date, about 1 million adults globally have begun PrEP uptake with representative adoption rates across gender, race, and ethnicities [4].

Prior to PrEP’s advent, HIV awareness, prevention, and outreach materialized as public health education campaigns on HIV mitigation strategies [2]. Though the success of these programs varied, PrEP’s Food and Drug Administration (FDA) approval during the social media era also introduced novel mediums to rapidly disseminate information about HIV and HIV prophylaxis as prevention [5]. Indeed, digital and e-health (broad categories of multidisciplinary public health science at the intersection of technology and healthcare) have played a pivotal role in creating online and mobile knowledge and awareness campaigns to spread information about PrEP and its benefits rapidly [1, 5, 6, 7]. Many of these campaigns are naturally tailored to appeal to groups at higher risk of HIV exposure, including younger audiences (ages 18-29) and racially diverse MSM [5, 8]. The success of social media campaigns led to an 880% increase in PrEP adoption among key groups since 2012 [9].

A natural consequence of online, digital health, e-health, and mHealth interventions and marketing campaigns is a paper trail of online discourse among social media users sharing and/or discussing PrEP in diverse contexts. This allows, beyond obvious analyses of intervention effectiveness online, the opportunity to mine these data for deeper, niche discussions about PrEP that may not be apparent in surveys or data derived from interventions themselves, including potential misinformation about PrEP usage and safety, ongoing lawsuits against Gilead Sciences, the manufacturer of the two FDA-approved PrEP medications (Truvada® and Descovy®), among other potential determinants and barriers to PrEP uptake [10].

Social Media Mining and Public Health

Beyond PrEP, social media’s inescapable role in the public lexicon has shifted approaches to studying health behavior. Over two-thirds of the US population use at least one social media platform daily, resulting in billions of unique data points that are spontaneous, open-source, diachronic, and open-ended [11]. A multitude of studies have demonstrated that markers of health behavior can be meaningfully extracted from social media data collected en masse. This includes extracting mental health markers, such as cognitive distortions [12], subjective well-being indicators from individual social media timelines [13], constructing ego-networks from friendship lists [14], and even simply identifying common topics or themes within millions of tweets (i.e., posts on Twitter), or postings on other social media platforms [15].

To bridge the fields of computational informatics and public health, specific calls for interdisciplinary collaborations between these fields have been proposed. Valdez et al. (2021) highlighted several strategies for bridging computational informatics and public health with Natural Language Processing (NLP) methods. Valdez, Patterson, and Prochnow (2021) have also called for the harmonious application of public health and social network theory to lend context and/or qualitative prediction to social media studies. Collectively, this body of work argues that computational informatics methods coupled with public health frameworks can yield rich and nuanced findings about a given event, including tracking online discourse amid medical innovations and ascertaining belief systems for them.

Social media mining and PrEP: an interdisciplinary analysis

As an opportunity to study information dissemination and synthesis from a joint public health/computer science perspective, PrEP’s evolution from a medical novelty into a life-saving necessity and LGBTQIA+ cultural phenomenon may be particularly impactful. As a semi-novel yet revolutionary medication poised to continue reshaping HIV mitigation, insights from such an interdisciplinary analysis can identify salient discussion points about PrEP, gaps in PrEP knowledge, and points of intervention from a policy perspective. Therefore, the purpose of this study is to explore online discourse about PrEP using an interdisciplinary public health and computational informatics approach. Our study is guided by two research questions:

1. Can we meaningfully consolidate PrEP related tweets into emerging themes or ideas?
2. How can interdisciplinary frameworks add additional nuance to online conversations about PrEP and other Public health topics more broadly?

Insights from this study will contribute to reshaping our approach to mining social media data for medical novelties. By leveraging tools and strategies from multiple fields, in this case public health and computational informatics, we will ultimately glean a deeper understanding of how medical novelties are meaningfully communicated in online spaces, both in positive and negative contexts.

Methods

Data

Data germane to this study were collected from tweets posted between June 1, 2018 and May 31, 2021. All data were obtained via a data repository that continuously queries Twitter’s Application Programming Interface (API). Twitter’s API allows developers to query and archive an estimated 1% of total daily tweet volume for a given search term. As these tweets are only available in real-time, this 1% sample constitutes the maximum data available for analysis. However, this rate of collection remains the standard for data derived from Twitter. For our study, specifically, we collected a series of tweets containing key words relating to Pre-Exposure Prophylaxis (PrEP), PrEP usage, and associated PrEP medications: Truvada, Descovy, #PrEP, “pre-exposure prophylaxis”, #truvada, #descovy, #truvadaprep, #descovyprep, #truvadaforprep, #descovyforprep. We filtered our data to remove duplicate and non-English tweets. Our total collection of tweets (i.e., a corpus) was a random sample of N = 4,020 tweets, deemed to be representative of PrEP-related discourse yet analyzable by our domain experts. All data were saved into a single repository where they were scrubbed of identifying information. Our use of these data conformed to the Institutional Review Board (IRB) standards for data security and privacy for secondary data analysis.
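The keyword matching and duplicate removal described above could be sketched roughly as follows. This is an illustration only, not the collection code used in the study: the function names `normalize` and `filter_tweets`, the choice to treat tweets as duplicates when they match after lower-casing and whitespace collapsing, and the sample tweets are all assumptions. Language filtering and random sampling are not shown.

```python
import re

# Search terms listed in the Data section, lower-cased for matching.
SEARCH_TERMS = [
    "truvada", "descovy", "#prep", "pre-exposure prophylaxis",
    "#truvada", "#descovy", "#truvadaprep", "#descovyprep",
    "#truvadaforprep", "#descovyforprep",
]


def normalize(text):
    """Lower-case and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def filter_tweets(tweets):
    """Keep tweets containing any search term, dropping repeated copies."""
    seen = set()
    kept = []
    for text in tweets:
        key = normalize(text)
        if key in seen:
            continue  # already kept an equivalent tweet
        if any(term in key for term in SEARCH_TERMS):
            seen.add(key)
            kept.append(text)
    return kept


# Hypothetical example tweets, not drawn from the study corpus.
sample = [
    "Starting #PrEP next week",
    "starting  #prep next week",
    "Truvada costs are too high",
    "Nothing relevant here",
]
print(filter_tweets(sample))
```

Substring matching of this kind is deliberately strict, which is in line with the high-precision, low-recall trade-off noted in the Limitations.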

Analyses

We leveraged three broad classes of computational informatics methods and algorithms to analyze our data: (1) Sentence Bidirectional Encoder Representations from Transformers (S-BERT); (2) Principal Component Analysis (PCA) with Uniform Manifold Approximation and Projection (UMAP); and (3) K-means Clustering. These methods have been extensively used to analyze social media data (see Karisani & Karisani (2021) for an example of S-BERT and social media mining). These tools have also been applied in public health contexts [19].

Gauging tweet similarity with Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is an extension of the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT). The BERT approach uses neural networks to detect and map patterns in large-scale text data [20, 21]. BERT is trained with large-scale text data from which it learns numerical representations of text semantics by analyzing matching sequences of words [22]. The resulting representations then allow quantitative comparisons of the similarity of two texts.

Likewise, as shown in Fig. 1, S-BERT generates a numerical vector representation of each tweet (taken as a “sentence”) that represents its semantics. This vector can be numerically compared to the similarly generated S-BERT vectors of other tweets using standard distance or similarity metrics, e.g., cosine similarity. The latter is commonly used to gauge the degree of alignment between two vectors. Although cosine similarity can in general range from −1 to 1, it is interpreted here on a scale from zero (representing orthogonality, i.e., dissimilarity) to one (representing collinearity, i.e., similarity).

Provided S-BERT vectors represent the semantics of two respective tweets, we can thus determine the degree of semantic similarity of these tweets by calculating the cosine similarity of their respective S-BERT vectors. For example, a tweet “I am concerned about the side-effects of my PrEP treatment” would be translated to a specific 384 × 1 vector representing its content (384 × 1 is the default dimensionality for pre-trained S-BERT), while another tweet “I don’t think my PrEP pills are safe because of the effects it has on my mood” would be translated to another 384 × 1 vector. The cosine similarity between the two respective vectors is 0.624, indicating they are moderately well-aligned, and thus similar in meaning (both describe concerns about PrEP treatment), regardless of whether they use or do not use the exact same wording. On the other hand, the cosine similarity between “I am concerned about the side-effects of my PrEP treatment” and “I can’t afford the costs of my monthly PrEP treatment” would be 0.502 (the two tweets describe different concerns about PrEP treatment).
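The comparison step described above can be illustrated with a minimal sketch of cosine similarity. The three-dimensional vectors below are made-up stand-ins for 384-dimensional S-BERT embeddings, and the variable names are hypothetical; the sketch does not reproduce the 0.624 or 0.502 values reported above, which depend on the actual pre-trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional stand-ins for sentence embeddings (made-up values).
side_effects = [1.0, 0.0, 1.0]
mood_safety = [1.0, 1.0, 1.0]
unrelated = [0.0, 1.0, 0.0]

print(cosine_similarity(side_effects, mood_safety))  # partially aligned
print(cosine_similarity(side_effects, unrelated))    # orthogonal vectors
```

Because the measure depends only on the angle between vectors, two tweets of very different length can still score as similar if their embeddings point in a similar direction.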

Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). The S-BERT vectors that are retrieved for each tweet are highly dimensional (D = 384) numerical indicators of the semantics of their content, and thus need to be projected to a lower-dimensional space (D = 2) for visualization. PCA and UMAP are common techniques for dimensionality reduction that were employed to facilitate analysis of sentence embeddings [23, 24, 25]. PCA extracts the principal components, a set of uncorrelated variables ordered so that each successive component captures the largest remaining share of the variance between data points. The original data can be projected onto the most significant components, thereby optimally retaining the most significant variation of the original data in a new, lower-dimensional space, assigning to each data point a new coordinate in this reduced space. Similarly, UMAP reduces data to a 2-dimensional visualization that preserves the similarities between each data point and its nearest neighbors. The operation of UMAP can be modified according to two parameters, namely the number of neighbors of a data point it takes into consideration and how tightly clustered neighboring data points are visualized in the resulting graph. By varying these parameters, one may control how much local versus global structure is preserved in the final projection. Here we apply a PCA to the 384-dimensional S-BERT vectors retrieved for each PrEP tweet such that they can be positioned in a lower-dimensional semantic space spanned by the respective PCA components that explain the greatest amount of variance in the original data, followed by a UMAP procedure that positions each tweet in a two-dimensional visualization.
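To make the PCA step more concrete, the following is a small, self-contained sketch that finds the leading principal components by power iteration and projects the data onto them. It is an assumption-laden illustration rather than the implementation used in the study: the helper names (`center`, `covariance`, `top_component`, `pca`) and the toy data are hypothetical, a library implementation would normally be used in practice, and the subsequent UMAP step is not shown.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=500):
    """Power iteration: direction of largest variance and that variance."""
    d = len(cov)
    # Asymmetric start vector; assumed not orthogonal to the leading direction.
    vec = [1.0 / (i + 1) for i in range(d)]
    start_norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / start_norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance left to explain
            return vec, 0.0
        vec = [x / norm for x in nxt]
    variance = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
    return vec, variance


def pca(rows, n_components):
    """Project rows onto the leading components, found one at a time."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        vec, val = top_component(cov)
        components.append(vec)
        variances.append(val)
        # Deflation: remove the variance already explained by this component.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, variances


# Toy 3-dimensional data that varies mostly along its first coordinate.
points = [[2.0, 0.0, 1.0], [4.0, 0.1, 1.0], [6.0, -0.1, 1.0], [8.0, 0.0, 1.0]]
projected, variances = pca(points, 2)
print(variances)
```

The variances come out in decreasing order, which is the property that lets a small number of leading components stand in for the full embedding.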

K-Means Clustering. We divide the tweets in the UMAP visualization into a set of visually distinct clusters using k-means clustering. The k-means clustering algorithm [26] partitions a dataset into k number of highly dense sets of data points by adjusting a set of cluster centers such that the assigned clusters minimize the distance between the cluster center and the data points that are assigned to the cluster. K-means clustering requires the value k to be specified, which in the case of two-dimensional data may be determined by visually or qualitatively analyzing the data set for natural visually compelling groupings, such as distinct groupings of data points on a plot or logical divisions of topics in a set of text samples. Integrating dimensionality reduction (PCA and UMAP) with k-means clustering supports the visual analysis of (any) topical clusters occurring within the set of tweets analyzed.
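The assignment and update steps of k-means described above can be sketched with a plain implementation of Lloyd's algorithm. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `kmeans`, the random initialization from data points, the fixed seed, and the toy coordinates are all hypothetical. In the study the input would be the two-dimensional UMAP coordinates and k would be 25.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centers) for a list of coordinate lists."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers


# Two visually distinct groups of toy 2-D points.
demo_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
demo_labels, demo_centers = kmeans(demo_points, k=2)
print(demo_labels)
```

Because k-means assigns every point to some cluster, small outlying groups can end up with clusters of their own, depending on the initialization and the choice of k.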

Procedure

The first step in our process consisted of retrieving S-BERT vectors for each tweet. These high-dimensional (n = 384) sentence vectors were then reduced to two dimensions using a combination of PCA and UMAP such that each tweet could be placed on a 2-dimensional map according to their semantic similarity allowing for a visual analysis of the data. A wide range of parameter values were tested for PCA and UMAP, producing two-dimensional mappings of the corpus for visual analysis. These two-dimensional maps were subject to a k-means clustering procedure which assigned each tweet to a cluster, thereby codifying the visual clustering of the set of 4,020 tweets in the map to an explicit partitioning.

The approach that was determined to be the most effective used PCA to reduce the initial S-BERT embeddings to 40 dimensions. We then further reduced to 2 dimensions using UMAP with parameters of 20 nearest neighbors and a minimum distance of 0.1. These parameter values direct UMAP to prioritize local groupings of data points more heavily than global structure, and were best able to preserve the structure of topical clusters out of all parameters tested.

Visual analysis of this data indicated that roughly 25 distinct topical clusters were present, which informed the application of k-means clustering to partition the data set into 25 clusters. Two of our authors, serving in part as domain experts, analyzed the resulting tweet clusters that the k-means algorithm identified. They independently generated topic summaries of the content of the tweet clusters they examined. See Fig. 2. Their summaries were compared by the lead author of this study, and overlap between summaries was deemed as sufficient agreement for interpretation.

Results

We mined n = 4,020 tweets that matched the mentioned PrEP relevant terms and translated their semantics in a visualization using S-BERT, UMAP, and K-Means clustering. Broadly, we identified several observations regarding the myriad contexts in which PrEP is discussed online. We present our findings briefly without comment.

RQ1: Can we meaningfully consolidate PrEP related tweets into themes or ideas?

Using the data processing pipeline shown in Fig. 1, we identified 25 unique themes that occurred among the N = 4,020 tweets in our sample. Tables I and II outline all themes, delineated by the name of each theme, a brief definition, and a list of ten words most associated with each theme. Though most clusters were clear and interpretable, we observed one cluster with a string of unclear terms and phrases (see Cluster 18). Lists of associated terms were generated by frequency of terms within each cluster, and the presence of apparently non-topical terms such as “I” or “re-tweet” is a natural artifact of this process.
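The frequency-based term lists described above could be produced along the following lines. This is a sketch under assumptions: whitespace tokenization, lower-casing, the function name `top_terms`, and the example tweets are all hypothetical, and no stop-word removal is applied, which is why common words can surface among the top terms.

```python
from collections import Counter


def top_terms(tweets, labels, n_terms=10):
    """Map each cluster label to its n_terms most frequent lower-cased tokens."""
    counts = {}
    for text, label in zip(tweets, labels):
        counts.setdefault(label, Counter()).update(text.lower().split())
    return {
        label: [term for term, _ in counter.most_common(n_terms)]
        for label, counter in counts.items()
    }


# Hypothetical tweets and cluster assignments, not drawn from the study corpus.
example_tweets = [
    "PrEP cost is too high",
    "I worry about PrEP cost",
    "side effects of PrEP worry me",
]
example_labels = [0, 0, 1]
terms = top_terms(example_tweets, example_labels, n_terms=2)
print(terms)
```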

Table I List of Clusters and Associated Words per Cluster [1–13]


Table II List of Clusters and Associated Words Per Cluster [14–25]


Cluster content spanned a wide range of relevant topics, reflecting a diversity of contexts that included the quotidian preoccupations of PrEP users. For example, clusters emerged relative to general PrEP uptake and PrEP information. We also identified several topics believed to discuss some of the side effects associated with PrEP including chronic fatigue syndrome and renal/hepatic issues. As a likely consequence of reported side effects, we also observed topics related to lawsuits against Gilead Sciences, the maker of Truvada, and Truvada alternatives (i.e., Descovy, a monthly injectable PrEP medication). Lastly, we observed several topics related to condomless sex or MSM hookups among those on an active PrEP regimen.

RQ2: How can interdisciplinary frameworks add additional nuance to online conversations about PrEP and other Public health topics more broadly?

By leveraging the visualization tools described above we mapped the corpus of tweets into 25 topical clusters, as depicted in Fig. 3. These clusters represent bodies of tweets that are semantically and contextually similar, and are placed in the map such that their position reflects their relative similarity to other clusters. As a result of the K-means clustering algorithm, several outlying groups of tweets were unavoidably assigned their own clusters. These clusters (6, 10, 12, 13, and 20) comprised fewer than 10 tweets each.

Clusters in the center (i.e., topics 25, 11, 19, 2, and others) are general PrEP topics, which in some capacities are similar to all clusters located in the vector map. For example, cluster 25 contains tweets discussing Truvada costs and cluster 11 discusses PrEP medication relative to other methods of HIV prevention. Clusters further from the center are typically responses to news or specific events related to PrEP medications.

Close proximity among clusters indicates tweets aligned with themes identified in our analysis that are similar or overlap in content. For example, clusters related to PrEP cost, PrEP generic alternatives, and Gilead price gouging are likely to have close vector representation and appear close to one another in the vector map given the likely similarity of these bodies of tweets.

Distal clusters (e.g. Topics 9 and 7 compared with 21 and 1) indicate bodies of tweets that are semantically and contextually different from each other. For example, the previously mentioned clusters regarding PrEP costs and generic alternatives are most distal from topics/themes about condomless sex and MSM hookups.
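One way to make the notions of proximal and distal clusters explicit is to rank pairs of clusters by the distance between their centroids in the two-dimensional map. The sketch below is an illustrative assumption, not an analysis reported in the study: the function names, the toy coordinates, and the cluster labels are hypothetical, and distances in a projection that favors local structure should be read only as an approximate guide.

```python
import math
from itertools import combinations


def centroids(points, labels):
    """Mean 2-D position of the points assigned to each cluster label."""
    grouped = {}
    for p, lab in zip(points, labels):
        grouped.setdefault(lab, []).append(p)
    return {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in grouped.items()
    }


def ranked_cluster_pairs(points, labels):
    """Cluster pairs as (distance, a, b), sorted from closest to most distant."""
    cents = centroids(points, labels)
    pairs = [
        (math.dist(cents[a], cents[b]), a, b)
        for a, b in combinations(sorted(cents), 2)
    ]
    return sorted(pairs)


# Toy map coordinates with hypothetical cluster labels 1, 2 and 3.
map_points = [[0, 0], [0, 2], [1, 1], [10, 10]]
map_labels = [1, 1, 2, 3]
ranking = ranked_cluster_pairs(map_points, map_labels)
print(ranking[0], ranking[-1])
```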

Discussion

The purpose of this study is to explore online discourse about PrEP using an interdisciplinary public health and computational informatics approach. By analyzing a diverse array of PrEP related tweets, we uncovered several allusions to PrEP promotion efforts and how PrEP is positively and negatively contextualized online.

PrEP as a Medical and Cultural Social Media Phenomenon

PrEP altered the scientific community’s approach to preventing HIV exposure and transmission [27]. As one of the first medications approved during the social media era [28], scientists leveraged such online spaces to promote PrEP uptake and adherence. A natural consequence of using online mediums for information dissemination is the diachronic and public domain nature of these data [29]. Over time online information and interventions for PrEP have created a nine-year trail of online discourse including how PrEP is broadly communicated online.

Most clusters uncovered by our analysis aligned with PrEP information dissemination. This includes topics related to general information about PrEP, medical costs, and various options for PrEP including Truvada (a once-daily oral medication) and Descovy (a once monthly injectable medication). Tweets associated with these clusters suggest sincere efforts to promote and disseminate medically accurate information about PrEP (e.g., TWEET: Talking to your doctor about HIV prevention treatment can be intimidating– but it doesn’t have to be). Similar tweets also addressed ways insurers, companies, and/or advocacy groups could mitigate the cost of PrEP (e.g., TWEET: PrEP to prevent HIV is expensive. We are here to help!). We also identified several topics that compared Descovy and Truvada and the associated pros and cons for each medicine (e.g., TWEET: Truvada, a once daily oral pill; or Descovy, a once monthly injection: Which is right for you?).

We also highlighted several clusters that expressly referred to how PrEP is communicated among the MSM community. These cultural aspects about PrEP are reflected in our findings, which yielded two topics related to MSM hookups, LGBTQIA+ identity, and condomless sex (Topic 15 (MSM community and PrEP) and Topic 16 (Condomless sex and MSM hookups)). Indeed, since PrEP’s inception the prophylactic treatment forged new identities among gay and bisexual men with regard to casual dating, hookups, and perceptions of PrEP users and non-users [30]. For example, users of popular MSM internet hookup sites (i.e., Grindr, Scruff, and others) are increasingly disclosing HIV serostatus and PrEP use as part of their bios or personal profiles [31]. Yet, disclosing one’s serostatus or PrEP use may come at the cost of social stigmatization among others who choose not to follow these practices. Indeed, evidence suggests there is a sharp divide in attitudes and perceptions of MSM who use PrEP, versus those who do not, and discrepancy among each person’s choice. For example, tweets associated with these topics alluded to PrEP use disclosure and how PrEP users perceive themselves and others (TWEET: If it’s not queer shaming, it’s slut shaming à la “Truvada whore” getting bandied around amongst our own like we don’t have a modern medical miracle sitting in our goddamn laps). These tweets, and others, suggest a certain degree of skepticism in sexual practices adopted by PrEP users, or even uncertainty about the safety of PrEP to reduce HIV infection. This observation is further supported by increases in STIs among MSM engaging in unprotected sex (i.e., barebacking), prompting concerns of drug resistant STI strains [32]. Ongoing research on identity formation should examine the effects of PrEP on gay and bisexual social circles, including how PrEP regimens alter personal social networks in a hookup context.

Nuanced PrEP Topics Reveal Social and Medical Barriers Inhibiting Uptake

Since the FDA approved PrEP in 2012, an estimated 1 million global (and 100,000 US) adults regularly use PrEP to prevent HIV-1 infection [33]. Yet, estimates indicate that PrEP remains underutilized among all eligible groups [34]. In the United States, only one quarter of eligible adults (i.e., MSM, trans women, and people who inject drugs) are on a PrEP regimen [35]. Global estimates highlight similarly low uptake rates in Sub-Saharan Africa, Asia, and Latin America [36, 37, 38]. Due to poor uptake and adherence, there have been efforts to identify barriers that may inhibit PrEP use and access. Though barriers are numerous, and often context and country-specific, recurring concerns among US and global populations include: lack of knowledge/awareness about HIV and PrEP, perceived HIV risk, social stigma, healthcare mistrust, lack of access, and financial burden, among others [39].

Several topics along the periphery of our vector map alluded to complications and barriers of PrEP uptake. Topics along the periphery suggest these conversations are not the unilateral focus of the corpus but represent side conversations tangentially related to central topics located at the center of the vector map (i.e., PrEP information dissemination). Some barriers alluded to in these topics are highly documented including associated PrEP costs and insurance coverage. However, amongst topics alluding to documented barriers to PrEP use, we also identified additional barriers that are well-known in legal circles yet somewhat empirically understudied, including PrEP effectiveness, PrEP side effects, exorbitant costs and price-gouging, and lawsuits against Gilead Sciences, the maker of Truvada. These topics strongly suggest that at least some discourse about PrEP is framed negatively, particularly calling into question the long-term effects of continued PrEP uptake. These concerns are not new and are highly documented [40]. Indeed, beginning in 2019, several state and federal lawsuits (and one class action lawsuit) were filed against Gilead Sciences. These lawsuits allege Gilead knew, or should have known, the active medication in Truvada could lead to serious side-effects including bone density loss, renal damage, and liver failure if not carefully monitored by medical providers [41]. Financial complaints similarly allege Gilead’s patent on Truvada (which would prevent the creation of a generic alternative) only served profiteering purposes by increasing the monthly cost of PrEP to between $1200 and $2000 USD [10, 42], though generic alternatives for Truvada have since become available.

Independently, PrEP related concerns and/or barriers identified in our model should not affect PrEP uptake. However, given low PrEP uptake and adherence, there is evidence these controversies may be adversely affecting PrEP adoption and maintenance [43]. Indeed, persistent concerns in e-health and digital health science are short and long-term effects of misinformation, disinformation, and ideological echo chambers on individual health beliefs and behaviors [44]. Social media’s sordid history of ideological polarization further supports these concerns with regard to information seeking and dissemination among likely groups. Regarding PrEP, in 2019, an influx of social media ads likely targeting anti-PrEP groups began seeking plaintiffs in lawsuits targeted at Gilead. A study on the effects of such ads concluded that nearly half of participants who viewed the ads would either never start PrEP or discontinue their current PrEP regimen [10]. This, coupled with conflicted perceptions of PrEP users among MSM suggests that tailored messaging is needed to mitigate the effects of misinformation and disinformation campaigns. Future studies should continue studying online PrEP discourse, including identifying sources of misinformation and how to counter it. Interventionists should also consider leveraging pockets of disinformation as viable insights/sources for interventions promoting accurate medical information.

Insights into Interdisciplinary Public Health & Computational Informatics Collaborations

Public health has historically borrowed methods, tools, and algorithms from computer science and computational informatics to mine social media data. As shown in our study, the synergistic use of public health frameworks with computer science tools uncovered nuanced portrayals about PrEP. This includes PrEP’s evolution from medical novelty into a cultural phenomenon in addition to controversies that may be weaponized via misinformation campaigns.

Collectively these findings suggest that data derived from social media can provide a more comprehensive portrait of PrEP and other medical interventions more broadly. However, limited understanding of all available tools and how to best apply them may create uncertainty about the validity of these data. In many scientific disciplines, including public health, social media data bears an unfortunate reputation for being unreliable and non-efficacious. Mistrust of social media data may stem from the radical departure of social media analytics from traditional quantitative (i.e., multiple regression) and qualitative (i.e., focus-groups, interviews) analyses; or the secondary data collection nature of scraping social media data that may be days, months, and in some cases, years old. However, the uniqueness of social media data necessitates methods to facilitate extraction, analysis, and synthesis of individual posts, timelines, and collections of timelines. Inherently, these methods must also understand the time-variant nature of social media and how that data contributes to a larger narrative, regardless of the data’s age. Refined algorithms in Computer Science, and other similarly technical fields, have afforded the opportunity to visualize the scope, scale, and precision of social media data, including methods undertaken herein. Indeed, the combined S-BERT, UMAP, and K-means approach can group similar tweets in a corpus into clusters that represent pockets of dialogue that, when mapped by vector representation, illustrate the semantic and contextual ways medical necessities are communicated online.

This study has timely implications for computational informatics and public health. From an informatics perspective, this study contributes to a body of research on social media discourse and misinformation. From a public health perspective there are numerous implications associated with our findings, including how these data can be leveraged even further for interventions. First, our analysis identified pockets of possible misinformation (and how it may impact PrEP uptake and maintenance). These bodies of tweets capitalized on controversies associated with Gilead Sciences, possibly creating concerns about Truvada’s safety, effectiveness, and unintended consequences of PrEP use. We also identified PrEP uptake as a form of judgement among people with differing views, for example, MSM PrEP users versus non PrEP users and conflicts between them. This is not the only instance of communicative tensions online regarding medical interventions; COVID-19 vaccination status has played out similarly. While these insights are, by themselves, informative, we can increase granularity by leveraging metadata associated with tweets and clusters of tweets. Indeed, by leveraging metadata, we can determine whether tweets sharing information about Truvada’s adverse effects come from social media users or bot accounts, defined as automated accounts that are not operated by humans. If these tweets originate from individual users, then it is possible to mine individual timelines to determine personal characteristics about that user, including markers of mental illness and other patterns of social media behavior that may be facilitating problematic online posting habits. We can also examine the relative popularity of these tweets determined by the frequency with which they are shared among social networks. From a public health standpoint, these represent potential intervention targets, namely groups that may be misinformed about Truvada, to promote accurate PrEP information.

Ethical Considerations for Mining Social Media Data Among Vulnerable Populations

Although social media analysis represents an important potential resource for developing insight into attitudes and behaviors relating to public health concerns, caution must be urged when undertaking such investigations. Ethical concerns related to social media mining, including consent, exploiting data, and unknowingly scraping personal social media feeds, represent persistent challenges in this area of research [45]. These challenges are particularly noteworthy for at-risk and vulnerable populations. We must remember that the serostatus of individuals who are at risk of or currently living with HIV is a deeply personal and sensitive characteristic. As such, studies relating to PrEP interventions, and studies involving vulnerable populations more broadly, must take the utmost care to ensure the anonymity and security of personal data. This study adhered closely to ethical principles for social media mining, including anonymizing data and deleting all personally identifiable account information. We highly encourage the adoption of these practices for any study involving social media data related to sensitive topics, including HIV dialogue. We also strongly discourage the analysis of only a select few tweets or accounts with deep learning tools, as more data typically afford greater degrees of anonymity and accuracy of post classification.

Limitations

This work is subject to limitations we hope to address in future research. First, our study was limited by Twitter’s API, which allows users to collect only 1% of total tweet volume for a given search query. This threshold resulted in a relatively small sample of N = 4,020 tweets, which were likely intended for a highly specific population (i.e., men who have sex with men, people who inject drugs, or other persons at risk of contracting HIV). Because PrEP is intended for limited populations, it likely generated less social media discourse than medical interventions intended for the general population, such as the COVID-19 vaccine. Second, our study relied on tweets matching a limited set of PrEP-related search terms, thus excluding a wide range of tweets that may be relevant to PrEP but did not contain these specific terms. In other words, our inclusion criterion favored high precision, admitting only tweets that exactly matched a small set of PrEP-relevant terms, but likely yielded low recall. The outcomes of this analysis can, however, point to more sophisticated methods that identify a wider range of PrEP-relevant tweets and that adapt to the changing landscape of online communities of interest. Third, we did not perform a full qualitative analysis of tweet content. Future studies should consider validating our findings by conducting a full qualitative analysis of this corpus.
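The precision and recall trade-off described above can be made concrete with a small worked example. The sketch below is illustrative only: the search terms and the labeled tweets are invented for this purpose, and the study's actual query terms and data are not reproduced here.

```python
import re

# Hypothetical search terms (the study's actual terms are not listed here).
SEARCH_TERMS = ["prep", "truvada"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SEARCH_TERMS)) + r")\b", re.IGNORECASE
)

# Invented toy tweets, each hand-labeled as relevant to PrEP or not.
tweets = [
    ("Started PrEP last month, no side effects so far", True),
    ("Truvada copay went up again", True),
    ("My daily HIV prevention pill is so expensive", True),  # relevant, no term
    ("Talked to my doctor about Descovy today", True),       # relevant, no term
    ("Great game last night", False),
]

# Keep only tweets containing one of the exact search terms.
matched = [is_relevant for text, is_relevant in tweets if PATTERN.search(text)]

true_positives = sum(matched)
total_relevant = sum(is_relevant for _, is_relevant in tweets)

precision = true_positives / len(matched)   # share of matched tweets that are relevant
recall = true_positives / total_relevant    # share of relevant tweets that were matched

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy set every matched tweet is relevant, so precision is 1.0, but only two of the four relevant tweets contain a search term, so recall is 0.5. The two missed tweets discuss PrEP without naming it, which is the kind of content an exact-match query cannot capture.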

Conclusions

Our findings demonstrate that interdisciplinary collaboration between computational informatics and public health can provide insight into discourse surrounding complex issues such as PrEP. Social media data contain a wealth of information regarding public attitudes towards health issues, but extracting the nuances of these narratives requires the analysis of large amounts of unstructured data. By developing a research framework that uses deep learning neural networks and pattern recognition tools to prepare data for qualitative analysis grounded in public health research, we were able to distill large data corpora into more coherent topical groupings for exploratory interpretation. The findings of this study indicate a need for deeper analysis of PrEP discourse on social media, as well as an opportunity to extend our research framework towards better understanding other public health issues.

References

McCormack S, Dunn DT, Desai M, Dolling DI, Gafos M, Gilson R, et al. Pre-exposure prophylaxis to prevent the acquisition of HIV-1 infection (PROUD): effectiveness results from the pilot phase of a pragmatic open-label randomised trial. The Lancet. 2016 Jan 2;387(10013):53–60.
Riddell J IV, Amico KR, Mayer KH. HIV Preexposure Prophylaxis: A Review. JAMA. 2018 Mar 27;319(12):1261–8.
Volk JE, Marcus JL, Phengrasamy T, Blechinger D, Nguyen DP, Follansbee S, et al. No New HIV Infections With Increasing Use of HIV Preexposure Prophylaxis in a Clinical Practice Setting. Clin Infect Dis. 2015 Nov 15;61(10):1601–3.
Huang YLA, Tao G, Smith DK, Hoover KW. Persistence With Human Immunodeficiency Virus Pre-exposure Prophylaxis in the United States, 2012–2017. Clin Infect Dis. 2021 Feb 1;72(3):379–85.
Patel VV, Ginsburg Z, Golub SA, Horvath KJ, Rios N, Mayer KH, et al. Empowering With PrEP (E-PrEP), a Peer-Led Social Media–Based Intervention to Facilitate HIV Preexposure Prophylaxis Adoption Among Young Black and Latinx Gay and Bisexual Men: Protocol for a Cluster Randomized Controlled Trial. JMIR Res Protoc. 2018 Aug 28;7(8):e11375.
Dehlin JM, Stillwagon R, Pickett J, Keene L, Schneider JA. #PrEP4Love: An Evaluation of a Sex-Positive HIV Prevention Campaign. JMIR Public Health Surveill. 2019 Jun 17;5(2):e12822.
Keene L, Dehlin J, Pickett J, Berringer K, Little I, Tsang A, et al. #PrEP4Love: success and stigma following release of the first sex-positive PrEP public health campaign. Cult Health Sex. 2020 Mar 26;23.
Walsh-Buhi E, Houghton RF, Lange C, Hockensmith R, Ferrand J, Martinez L. Pre-exposure Prophylaxis (PrEP) Information on Instagram: Content Analysis. JMIR Public Health Surveill. 2021 Jul 27;7(7):e23876.
AIDSVu. Mapping PrEP: First Ever Data on PrEP Users Across the U.S. [Internet]. AIDSVu. 2018 [cited 2021 Nov 14]. Available from: https://aidsvu.org/prep/.
Grov C, Westmoreland DA, D’Angelo AB, Pantalone DW. How Has HIV Pre-Exposure Prophylaxis (PrEP) Changed Sex? A Review of Research in a New Era of Bio-behavioral HIV Prevention. J Sex Res. 2021 Sep 2;58(7):891–913.
Jaidka K, Giorgi S, Schwartz HA, Kern ML, Ungar LH, Eichstaedt JC. Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Proc Natl Acad Sci. 2020;117(19):10165–71.
Bathina KC, ten Thij M, Lorenzo-Luaces L, Rutter LA, Bollen J. Individuals with depression express more distorted thinking on social media. Nat Hum Behav. 2021 Feb 11;1–9.
ten Thij M, Bathina K, Rutter LA, Lorenzo-Luaces L, van de Leemput IA, Scheffer M, et al. Depression alters the circadian pattern of online activity. Sci Rep. 2020 Oct 14;10(1):17272.
Bollen J, Gonçalves B, van de Leemput I, Ruan G. The happiness paradox: your friends are happier than you. EPJ Data Sci. 2017 May 18;6(1):4.
Valdez D, ten Thij M, Bathina K, Rutter LA, Bollen J. Social Media Insights Into US Mental Health During the COVID-19 Pandemic: Longitudinal Analysis of Twitter Data. J Med Internet Res. 2020;22(12):e21418.
Valdez D, Picket AC, Young BR, Golden S. On Mining Words: The Utility of Topic Models in Health Education Research and Practice. Health Promot Pract. 2021 May 1;22(3):309–12.
Valdez D, Patterson M, Prochnow T. The importance of interdisciplinary frameworks in social media mining: An exploratory approach between Computational informatics and Social Network Analysis (SNA). Health Behav Res [Internet]. 2021 Aug 19;4(2). Available from: https://newprairiepress.org/hbr/vol4/iss2/4.
Karisani P, Karisani N. Semi-Supervised Text Classification via Self-Pretraining. ArXiv210915300 Cs [Internet]. 2021 Sep 30 [cited 2021 Nov 15]; Available from: http://arxiv.org/abs/2109.15300.
Roshanzamir A, Aghajan H, Soleymani Baghshah M. Transformer-based deep neural network language models for Alzheimer’s disease risk assessment from targeted speech. BMC Med Inform Decis Mak. 2021 Mar 9;21(1):92.
Alfeo AL, Cimino MGCA, Vaglini G. Technological troubleshooting based on sentence embedding with deep transformers. J Intell Manuf. 2021 Aug 1;32(6):1699–710.
Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. ArXiv190810084 Cs [Internet]. 2019 Aug 27 [cited 2021 Nov 15]; Available from: http://arxiv.org/abs/1908.10084.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2017 [cited 2021 Nov 15]. Available from: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
Ali M, Borgo R, Jones MW. Concurrent time-series selections using deep learning and dimension reduction. Knowl-Based Syst. 2021 Dec 5;233:107507.
Diaz-Papkovich A, Anderson-Trocmé L, Gravel S. A review of UMAP in population genetics. J Hum Genet. 2021;66(1):85–91.
Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc Math Phys Eng Sci. 2016 Apr 13;374(2065):20150202.
Likas A, Vlassis N, Verbeek JJ. The global k-means clustering algorithm. Pattern Recognit. 2003 Feb;36(2):451–61.
Eakle R, Venter F, Rees H. Pre-exposure prophylaxis (PrEP) in an era of stalled HIV prevention: Can it change the game? Retrovirology. 2018 Apr 2;15(1):29.
Kudrati SZ, Hayashi K, Taggart T. Social Media & PrEP: A Systematic Review of Social Media Campaigns to Increase PrEP Awareness & Uptake Among Young Black and Latinx MSM and Women. AIDS Behav [Internet]. 2021 May 3 [cited 2021 Nov 15]; Available from: https://doi.org/10.1007/s10461-021-03287-9.
Cougnon LA, de Viron L. Covid-19 and social media: a diachronic discourse analysis for the modeling of linguistic patterns during crises. In 2020 [cited 2021 Nov 15]. Available from: https://dial.uclouvain.be/pr/boreal/object/boreal:235420.
García-Iglesias J. “PrEP is like an adult using floaties”: meanings and new identities of PrEP among a niche sample of gay men. Cult Health Sex. 2020 Oct 1;1–14.
Medina MM, Crowley C, Montgomery MC, Tributino A, Almonte A, Sowemimo-Coker G, et al. Disclosure of HIV serostatus and pre-exposure prophylaxis use on internet hookup sites among men who have sex with men. AIDS Behav. 2019 Jul;23(7):1681–8.
Scott HM, Klausner JD. Sexually transmitted infections and pre-exposure prophylaxis: challenges and opportunities among men who have sex with men in the US. AIDS Res Ther. 2016 Jan 19;13:5.
Celum C, Baeten J. PrEP for HIV Prevention: Evidence, Global Scale-up, and Emerging Options. Cell Host Microbe. 2020 Apr;27(4):502–6.
van Dijk M, de Wit JBF, Guadamuz TE, Martinez JE, Jonas KJ. Slow Uptake of PrEP: Behavioral Predictors and the Influence of Price on PrEP Uptake Among MSM with a High Interest in PrEP. AIDS Behav. 2021 Aug 1;25(8):2382–90.
Hannaford A, Lipshie-Williams M, Starrels JL, Arnsten JH, Rizzuto J, Cohen P, et al. The Use of Online Posts to Identify Barriers to and Facilitators of HIV Pre-exposure Prophylaxis (PrEP) Among Men Who Have Sex with Men: A Comparison to a Systematic Review of the Peer-Reviewed Literature. AIDS Behav. 2018 Apr;22(4):1080–95.
Assaf RD, Konda KA, Torres TS, Vega-Ramirez EH, Elorreaga OA, Diaz-Sosa D, et al. Are men who have sex with men at higher risk for HIV in Latin America more aware of PrEP? PLOS ONE. 2021 Aug 13;16(8):e0255557.
Mugo NR, Ngure K, Kiragu M, Irungu E, Kilonzo N. PrEP for Africa: What we have learnt and what is needed to move to program implementation. Curr Opin HIV AIDS. 2016 Jan;11(1):80–6.
Zablotska I, Grulich AE, Phanuphak N, Anand T, Janyam S, Poonkasetwattana M, et al. PrEP implementation in the Asia-Pacific region: opportunities, implementation and barriers. J Int AIDS Soc. 2016 Oct 18;19(7Suppl 6):21119.
Mayer KH, Agwu A, Malebranche D. Barriers to the Wider Use of Pre-exposure Prophylaxis in the United States: A Narrative Review. Adv Ther. 2020 May 1;37(5):1778–811.
D’Angelo AB, Westmoreland DA, Carneiro PB, Johnson J, Grov C. Why Are Patients Switching from Tenofovir Disoproxil Fumarate/Emtricitabine (Truvada) to Tenofovir Alafenamide/Emtricitabine (Descovy) for Pre-Exposure Prophylaxis? AIDS Patient Care STDs. 2021 Aug 1;35(8):327–34.
Chan L, Asriel B, Eaton EF, Wyatt CM. Potential Kidney Toxicity from the Antiviral Drug Tenofovir: New Indications, New Formulations, and a New Prodrug. Curr Opin Nephrol Hypertens. 2018 Mar;27(2):102–12.
Ddaaki W, Strömdahl S, Yeh PT, Rosen JG, Jackson J, Nakyanjo N, et al. Qualitative Assessment of Barriers and Facilitators of PrEP Use Before and After Rollout of a PrEP Program for Priority Populations in South-central Uganda. AIDS Behav. 2021 Nov 1;25(11):3547–62.
Thomann M, Grosso A, Zapata R, Chiasson MA. ‘WTF is PrEP?’: attitudes towards pre-exposure prophylaxis among men who have sex with men and transgender women in New York City. Cult Health Sex. 2018 Jul 3;20(7):772–86.
Liu Y, Yu K, Wu X, Qing L, Peng Y. Analysis and Detection of Health-Related Misinformation on Chinese Social Media. IEEE Access. 2019;7:154480–9.
Norval C, Henderson T. Contextual Consent: Ethical Mining of Social Media for Health Research [Internet]. arXiv; 2017 [cited 2022 May 15]. Available from: http://arxiv.org/abs/1701.07765.


Funding

AE was partially funded by the National Science Foundation NRT grant 1735095, “Interdisciplinary Training in Complex Networks and Systems.” Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Department of Applied Health Science, Indiana University School of Public Health, 47405, Bloomington, IN, USA
Andy Edinger & Eric Walsh-Buhi
Luddy School of Informatics and Computer Engineering, Indiana University, 47405, Bloomington, IN, USA
Danny Valdez & Johan Bollen
Department of Psychological and Brain Sciences, Indiana University, 47405, Bloomington, IN, USA
Johan Bollen

Contributions

AE collected/analyzed the data and wrote technical aspects of the manuscript; DV coded data and wrote the initial draft of the manuscript; EWB provided expert content support and reviewed/edited drafts of the manuscript; and JB conceptualized the study, oversaw the execution of analyses, and edited the manuscript.

Corresponding author

Correspondence to Danny Valdez.

Ethics declarations

Conflict of interest

Not Applicable.

Ethical Review

This study was deemed exempt by the Institutional Review Board (IRB) given the secondary nature of data collection and analysis.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

About this article

Cite this article

Edinger, A., Valdez, D., Walsh-Buhi, E. et al. Deep learning for topical trend discovery in online discourse about Pre-Exposure Prophylaxis (PrEP). AIDS Behav 27, 443–453 (2023). https://doi.org/10.1007/s10461-022-03779-2

Accepted: 25 June 2022
Published: 02 August 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10461-022-03779-2
