1 Introduction

Social media have become an integral part of society in recent years. Alongside the benefits this has brought, it has also given rise to a number of serious problems: the speed and sheer number of interactions exceed users’ ability to monitor and understand the content they are exposed to, resulting in threats such as the pervasive diffusion of fake news and of biased or toxic content such as hate speech. A common way to address such challenges is the adoption of powerful natural language processing approaches, triggered by the paradigm shift that transformer-based models (such as BERT [35]) have brought about. The adoption of common benchmark collections has been another major driver in this context. Several of those datasets focus on the detection of a single threat (e.g. fake news detection [112] or hate speech detection [78]). Others try to unify existing text data collections, e.g. for the classification of toxic content [105, 126].

Such benchmarks have also increasingly been utilized in a growing number of shared tasks and competitions (with leaderboards), primarily led by the machine learning (ML) community. However, much of the work in this area remains confined to a purely academic classification scenario and is not put to use in a practical context. Perhaps more importantly, it has been observed that the performance levels reported on common benchmarks do not necessarily reflect how well the algorithms will work in a realistic use case: systems are often brittle, and performance does not transfer easily to different domains, datasets or even variations of the same dataset [21].

Instead of adopting a well-controlled setting with no real user involvement, we aim to address an actual practical use case that does not lend itself to being modelled around existing benchmark collections. Our starting point is the observation that social media users often have a limited understanding of the platforms and their algorithms and, more importantly, of the effects of their actions on others’ experiences and on the proliferation of toxic phenomena [66, 123]. We present a framework that serves as a machine-learning-based social media education tool and aims at integrating solutions to the above-mentioned problems directly into the users’ social media experience [96]. The user’s feed is automatically augmented with additional information on the content and on the underlying social network producing it, as can be seen in Fig. 2. Machine learning is used to trigger personalized and contextualized educational experiences that raise users’ awareness of social media and its threats. At the same time, autonomous evaluation is encouraged by highlighting the principles and limits of the involved algorithmic components. The ultimate objective is to educate and empower social media users. Fig. 1 gives a high-level view of the educational framework we are proposing.

Fig. 1 Conceptual view of our proposed framework. Social Media Analysis shows the tool that provides additional information while browsing the feed; Education represents the educational activities (example shown here: machine learning models’ limitations, described in more detail in Sect. 5.3)

Fig. 2 Screenshot of social media content analysis results inside our Twitter demo interface. Here, other posts of the user connected to the first tweet as well as sentiment and emotion analysis results are displayed. The buttons under each post allow showing or hiding this additional information

In this paper, we start by discussing threats arising from social media, then present trends in how the research community addresses such issues, and finally contextualize these developments in a practical use case taken from the COURAGE project.

2 Social Media Threats

Threats occurring on social media cover a broad range of categories due to the vast amount of multifaceted content on such platforms. Addressing them raises crucial ethical and practical issues, such as preserving freedom of speech and keeping users collectively satisfied while dealing with the conflicts generated by their differing opinions and contrasting interests; left unaddressed, these threats negatively influence both users and society.

Critical cases include the spread of fake news, biased content and the growing trend of hate practices (which is not a new phenomenon on the internet [26, 39, 109]). Even though social media platforms have policies against hate speech, discrimination and violent or racist content, these threats remain present on the platforms [19, 51], underlining the need for raising awareness among users.

Before presenting ways to counteract these issues in general, and how we do so with our approach in the COURAGE project, we give a brief overview of the categories of social media threats, grouping them into (1) content-based, (2) algorithmic, (3) dynamics-induced, and (4) cognitive and socio-emotional threats.

The transitions between these types of threats are fluid, making clear distinctions hard to draw. When describing these issues, our focus lies on teenagers, who are heavily affected by bullying [86, 116], addiction [111, 117], body stereotypes and other threats [31, 79, 97]. This is also the reason why the COURAGE project aims at supporting this exceptionally vulnerable group of society.

2.1 Content-Based Threats

Content-based threats are very common for all types of media, including classical outlets, but they are especially crucial in the context of social media platforms.

Examples of textual threats include toxic content [63, 65], fake news/disinformation [12, 32] and bullying [44].

However, content is not limited to text but can also appear in the form of images or videos, as is dominant on platforms like Instagram and TikTok. Such user-created video and image content can convey any sort of message (verbally, non-verbally, textually or by other visual means) and can be the source of a range of threats on social media. Concrete examples are the propagation of beauty stereotypes via image data [125] or hyper-realistic videos and images showing people saying and doing things that never happened [25, 132], so-called “deep fakes”. Fig. 3 demonstrates how hard it can be to distinguish real images from fake ones. In general, images can be misleading due to manipulation or because the context of the depicted event is missing.

Fig. 3 Real but potentially misleading images (a and c) and DeepFake/manipulated images (b and d)

Given the importance of this category of threats, much research is focused on the development of dedicated detection systems as we will discuss in Sect. 4.

2.2 Algorithmic Threats

Besides the content itself, additional threats are caused by the automatic algorithms used on social media platforms. These lead to the selective exposure of digital media users to news sources [110], risking the formation of closed, polarised group structures, so-called ‘filter bubbles’ [40, 94] and ‘echo chambers’ [34, 42]. Another undesired network condition is information gerrymandering [114], where users are exposed to unbalanced neighbourhood configurations. Especially in decision-making settings such as elections, gerrymandering can bias the outcome of a vote: in simulated two-party elections in which the opposing groups are equally popular, this selective presentation lets one “party” win up to 60 percent of the time. This phenomenon highlights the relevance of network structure and information exposure in decision-making settings.

2.3 Dynamics-induced Threats

Another type of threat arises from the dynamics of social media, induced by the extended and fast-paced interaction between algorithms, common social tendencies and stakeholders’ interests [9, 83]. This may lead to an escalating acceptance of toxic beliefs [93, 114], making users’ opinions susceptible to phenomena such as the diffusion of hateful content. In addition, these dynamics can lead to large-scale outbreaks of fake news [34, 130].

2.4 Cognitive and Socio-emotional Threats

A substantial body of work exists on analyzing the mechanisms of content propagation on social media. However, modeling the effects of users’ emotional and cognitive states and traits on the propagation of malicious content remains a major challenge, especially considering the significant contribution of their cognitive limits [5, 99].

Such cognitive factors refer to the users’ limited attention and error-prone information processing [131] that may be worsened by the emotional features of the messages [22, 67]. Moreover, the lack of non-verbal communication and limited social presence [48, 107] lead to carelessness and misbehavior as the users perceive themselves as anonymous [36, 103]. Consequently, they do not feel judged or exposed [133] and deindividualize themselves and others [76].

Another recently recognized threat in this category is digital addiction [6, 90], which has several harmful consequences, such as unconscious and hasty user actions [4, 7]. Some of these are especially relevant for teenagers, affecting their school performance and mood [1]. In the last few years, it has become clear that recognizing addiction to social media cannot be based on the “connection time” criterion alone but must also take into account how people behave [89, 118]. As with other behavioral addictions, a crucial role may be played by the environmental structure [64, 95].

2.5 Limited Social Media Literacy

Finally, the common lack of digital literacy among teenagers [82] has a strong impact on the escalation of other threats, for example by favoring the spread of content-based threats and engagement in toxic dynamics [136]. This underlines the need to educate young people in dealing with social media threats and demonstrates the importance of automatic tools that support users in their behavior on such platforms.

Teenagers also show over-reliance on algorithmic recommendations and a lack of awareness of their unwitting consumption of toxic content. This reduces their ability to make their own choices and leads to increasingly dangerous behavior [14, 127].

3 Related Work

Supporting users on social media means helping them make the right decisions for themselves and for other people using such platforms. Strategies developed in the context of the behavioral and cognitive sciences offer a well-founded framework to address these issues. In particular, nudging [119] and boosting [56] are two paradigms that have both been developed to minimize risk and harm. They do so in a way that makes use of behavioral patterns and is as unintrusive as possible, something particularly important in contexts like social media.

Nudging [119] is a behavioral-public-policy approach aiming to push people towards more beneficial decisions through the “choice architecture” of people’s environment (e.g., default settings). In a way, the machine learning-based recommender systems integrated into social media platforms already define a choice architecture that reduces the amount of content users have to interact with; however, such recommendations are not aimed at improving users’ choices in terms of collective wellbeing [95].

Some approaches have exploited machine learning tools to support user interactions with social media. Kyza et al. [69] propose a solution based on a web browser plugin that uses AI to support citizens dealing with misinformation by showing measures of tweets’ credibility and employing a nudging mechanism that blurs out low-credibility tweets according to the user’s preferences. While their study uses a fact-checked dataset, it shows that such an AI-based tool may deter social media users from liking and spreading misinformation. Another work [10] proposes a browser plugin that extends Instagram with the results of reverse image search algorithms to help users contextualize and detect fake images.

Other forms of nudging are warning lights and information nutrition labels, which offer the potential to reduce harm and risks in web search (e.g. [138]).

While nudges are particularly suitable for integration in social media interfaces as they may not add additional cognitive load on the users, their limitation is that they do not typically teach any competencies, i.e. when a nudge is removed, the user will behave as before (and will not have learned anything). This is where boosts come in as an alternative approach. Boosts are interventions designed to improve people’s competence in making their own choices [56].

The critical difference between boosting and nudging is that boosting does not assume that people are merely “irrational” and therefore need to be nudged toward better decisions; instead, it assumes that new competencies can be acquired without too much time and effort, although this may be hindered by stress and other sources of reduced cognitive resources. Both approaches fit well into the overall approach proposed here. Nudges offer a way to push content to users and make them aware of it. Boosting is a particularly promising paradigm to strengthen online users’ competencies and counteract the challenges of the digital world; it also appears well suited for addressing misinformation and false information, among others. Both paradigms help us educate online users rather than imposing rules, restrictions or suggestions on them, and they have great potential as general pathways to minimize and address harm in the modern online world [66, 74].

In particular, we refer to the concept of “media literacy”, which [13] defines as the “ability of a citizen to access, analyze, and produce information for specific outcomes”. Several definitions have been proposed in the literature, highlighting the importance of approaching the media critically, also in light of the propagation of fake news and other toxic content as well as the influence that media can have on other citizens [24, 123].

In this paper, we present a multi-modal approach leveraging machine learning methodologies to support users and their education. At the same time, algorithms and automation have taken control of many media processes such as content generation, recommendation and filtering. Today, algorithms and machine learning are used for user tracking and profiling, targeted advertising and behaviour engineering. They have played a role in the dissemination of disinformation and misinformation as well as in influencing political opinion. The need to understand algorithm-based media calls for new educational methodologies. In particular, [123] points out the necessity of combining media literacy with computing education specific to these mechanisms to allow users to cope with the changing media landscape, and [29] note that interactivity is a positive factor influencing the efficacy of digital media literacy.

For example, it is important to find methodologies to explain and educate about how machine learning components affect our decisions, directly or by shaping our choice and information architecture, in particular in social contexts [72]. It is also crucial to show the limits of such algorithms and the trade-offs we should consider between our competencies and theirs [30].

4 Threat Detectors and Content Analyzers

The great variety of social media threats (as described in Sect. 2) poses challenging problems, and researchers are studying how to identify such threats automatically. One way of bringing the community together to work on social media threats are dedicated workshops, e.g. [68, 91]. As introduced at the beginning, another way are shared tasks. Examples include hate speech detection at SemEval 2019 [15] or Evalita 2020 [108] as well as toxic comment detection at GermEval 2021 [106] or toxic span detection at SemEval 2021 [98].

Solutions proposed to counteract threats on social media are usually framed as classification tasks and commonly solved using deep learning. Depending on the type of threat, the input can include textual, visual or network signals. We present methods and models that have been developed as part of this project and that we use for threat detection in our proposed framework. This includes (1) classifying textual content, (2) analyzing visual content and (3) revealing network structures like echo chambers. The general architecture is flexible so that new classifiers can easily be added or replaced in a plug-and-play fashion, as illustrated by the sketch below.
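To make this concrete, the following minimal sketch shows one way such a plug-and-play architecture could be organized. The class and method names (`ContentAnalyzer`, `AnalyzerRegistry`, `analyze`) are illustrative assumptions and do not reflect the actual COURAGE codebase; a real detector would wrap a trained model instead of the toy keyword rule used here.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ContentAnalyzer(ABC):
    """Common interface every threat detector implements."""

    name: str = "base"

    @abstractmethod
    def analyze(self, post: Dict[str, Any]) -> Dict[str, Any]:
        """Return analysis results (labels, scores) for a single post."""


class AnalyzerRegistry:
    """Keeps the set of active detectors; they can be added or swapped at runtime."""

    def __init__(self) -> None:
        self._analyzers: Dict[str, ContentAnalyzer] = {}

    def register(self, analyzer: ContentAnalyzer) -> None:
        self._analyzers[analyzer.name] = analyzer  # replaces an analyzer with the same name

    def analyze_post(self, post: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
        return {name: a.analyze(post) for name, a in self._analyzers.items()}


class ToxicityAnalyzer(ContentAnalyzer):
    name = "toxicity"

    def analyze(self, post: Dict[str, Any]) -> Dict[str, Any]:
        # Toy keyword rule standing in for a trained classifier.
        text = post.get("text", "").lower()
        return {"toxic": any(w in text for w in ("idiot", "stupid"))}


registry = AnalyzerRegistry()
registry.register(ToxicityAnalyzer())
print(registry.analyze_post({"text": "You are so stupid!"}))
```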

4.1 Text-Based Detectors

Since a large share of social media threats take a textual form, we first present text-based detectors, categorized by the type of threat they address.

4.1.1 Hate Speech and Toxic Content

An approach to profiling hate speech spreaders on Twitter was submitted to CLEF 2021 and features runs for multiple languages [3]. For English, a pretrained BERT model was fine-tuned, while for Spanish a language-agnostic BERT-based sentence embedding model was used without fine-tuning; a minimal fine-tuning sketch is shown below.
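As an illustration of the English setup, the sketch below fine-tunes a pretrained BERT model for binary spreader classification with Hugging Face Transformers. The checkpoint name, label scheme and single training step are assumptions for illustration; the exact configuration and data handling of the cited submission are not reproduced here.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = not a spreader, 1 = spreader (assumed labels)
)

# In author profiling, one "document" is typically a concatenation of a user's tweets.
texts = ["tweet one ... tweet two ... tweet three ...",
         "another user's concatenated tweets ..."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss is computed internally
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```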

Transformer models are widely adopted for text classification tasks, and [57] use them to generate text representations for their submission to the Evalita 2020 shared task on hate speech detection.

Transformer models, as used for hate speech detection, were also applied to identifying irony in social media [122]. Ensembles of transformer models and automatic augmentation of the training data were proposed. Using the common SemEval 2018 Task 3 benchmark collection, the authors demonstrate that such models are well suited for use in ensemble classifiers for the task at hand.

However, other methods have been introduced as well, for example an approach based on graph machine learning [134]. The participation in the HASOC [87] campaign aimed at examining the suitability of Graph Convolutional Networks (GCNs), owing to their capability to integrate flexible contextual priors, as a computationally efficient alternative to more expensive and relatively data-hungry methods such as fine-tuning transformer models. Specifically, the combination of two text-to-graph strategies based on different language modeling objectives was explored and compared to a fine-tuned BERT model.

Another graph-based method in the context of hate speech detection, more specifically sexism detection, was introduced in [135]. It builds on Graph Convolutional Networks (GCNs), exploring different edge creation strategies as well as combining graph embeddings from different GCNs through ensemble methods. In addition, different GCN models and text-to-graph strategies are explored; a simplified sketch of such a text-to-graph pipeline is given below.
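The toy sketch below (using PyTorch Geometric) shows the general shape of such a pipeline: documents and vocabulary words become nodes, document-word occurrences become edges, and a two-layer GCN produces logits for the document nodes. Edge weighting, the specific edge creation strategies and the ensemble step of the cited work are omitted, and the example documents are placeholders.

```python
import torch
from torch_geometric.nn import GCNConv

docs = ["this is an offensive post", "this is a harmless post"]   # placeholder texts
vocab = sorted({w for d in docs for w in d.split()})
word_id = {w: i + len(docs) for i, w in enumerate(vocab)}          # document nodes come first

edges = []
for doc_idx, doc in enumerate(docs):
    for w in doc.split():
        edges.append((doc_idx, word_id[w]))
        edges.append((word_id[w], doc_idx))   # add both directions (undirected graph)

edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
num_nodes = len(docs) + len(vocab)
x = torch.eye(num_nodes)                      # one-hot node features


class TextGCN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, num_classes)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)


model = TextGCN(num_nodes, 16, 2)
logits = model(x, edge_index)      # logits for every node
print(logits[: len(docs)])         # only the document-node logits are used for classification
```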

Despite the success achieved by these efforts, the robustness of such systems is still limited. They often cannot generalize to new datasets or resist attacks such as word injection [45, 58]. Some recent models can generalise the task while maintaining similar results across different platforms and languages under certain conditions [135]. In general, robustness is important because small changes can have a considerable impact on system performance, making it challenging to apply these approaches in the dynamic contexts of social media.

4.1.2 Fake News and Misinformation

To detect fake news, an approach was proposed that applies automatic text summarization to compress the original input documents before classifying them with a transformer model. Promising performance was reported on the utilized dataset, and the system also established a new state-of-the-art benchmark performance on the commonly used FakeNewsNet dataset [52]. The sketch below illustrates the summarize-then-classify idea.
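The rough sketch uses the Transformers `pipeline` API. The default summarization model is used, and the classifier checkpoint name is a placeholder for a model fine-tuned on a fake news dataset; it is not the system from [52].

```python
from transformers import pipeline

summarizer = pipeline("summarization")  # default summarization checkpoint
classifier = pipeline(
    "text-classification",
    model="your-org/fake-news-classifier",  # placeholder name, not a real checkpoint
)

article = "Long news article text that would exceed the classifier's input length ..."

# Compress the article first, then classify the summary.
summary = summarizer(article, max_length=128, truncation=True)[0]["summary_text"]
print(classifier(summary))  # e.g. [{'label': 'FAKE', 'score': 0.93}]
```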

Other recent methods apply ensembles of different models for fake news detection with a focus on transformer models [120].

In general, fake news detection datasets have frequently been proposed as part of shared tasks, and we make use of them, as for example in [121] or [71]. While [121] apply automatic text summarization, similarly to [52], and combine this information with automatic machine translation, [71] introduce an approach based on text graphs and graph attention convolution. Although submissions were very competitive, the approach by [121] proved highly competitive, winning the German cross-lingual fake news detection challenge at the CLEF 2022 “CheckThat!” lab [121].

4.1.3 User Beliefs and Opinions

We also use models to extract user-related properties, beliefs and opinions as well as sentiments and emotions. Inferring and interpreting human emotions [102] includes distinguishing between sentiment analysis, i.e. the polarity of content (e.g. [49, 50, 70]), and emotion recognition (e.g. [2, 16]). In comparison, opinion extraction aims at discovering users’ interests and their corresponding opinions [129]. Similarly, the positive aspects of social media interaction, crucial for estimating “collective social well-being”, could be extracted; they have so far attracted less attention, but see [28, 128].

As a lot of work in this area is going on in the NLP community, we are mainly relying on methods proposed in the literature. We use models for sentiment prediction in English [101], German [47], Italian [18] and Spanish [101]. In addition, we use models for the detection of emotions in Italian [18], Spanish [11] and English [75].
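One way to wire up such per-language models is sketched below with the Transformers `pipeline` API. The model identifiers are placeholders standing in for the language-specific checkpoints cited above, not the exact models used in the project.

```python
from transformers import pipeline

# Placeholder checkpoint names, one sentiment model per supported language.
SENTIMENT_MODELS = {
    "en": "placeholder/english-sentiment",
    "de": "placeholder/german-sentiment",
    "it": "placeholder/italian-sentiment",
    "es": "placeholder/spanish-sentiment",
}


# In practice the pipelines would be loaded once and cached, not created per call.
def analyze_sentiment(text: str, lang: str) -> dict:
    clf = pipeline("text-classification", model=SENTIMENT_MODELS[lang])
    return clf(text)[0]   # e.g. {'label': 'negative', 'score': 0.97}


print(analyze_sentiment("I really enjoyed this article!", "en"))
```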

4.2 Visual Content

One way of identifying threats in image or video data is to use textual cues related to such postings, for example associated user comments [77], transcriptions of a video’s audio obtained via speech-to-text models [54, 137], or text located in the images themselves [12, 41, 60].

Other methods operate directly on the image data. Regarding the threats arising from beauty stereotypes [125] (e.g. to learn whether someone’s feed is predominantly occupied by posts of users promoting a specific body type), we have developed a body mass index (BMI) detector based on a convolutional neural network that partly makes use of OpenFace [8], an open source face recognition model. It identifies a person’s face within an image and predicts the BMI from this cutout.

We also provide a gender predictor (again based on OpenFace [8]) that identifies the gender of people present in an image, and an object detection component based on YOLOv3 [104] that provides further contextual information about the scene displayed in an image; both rely on convolutional neural networks. These tools provide metadata about the image that can be used as features for the detection of hate speech [33], violent content [37] and other threats; a rough sketch of this metadata extraction step is shown below.
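The sketch illustrates the general idea only: detect and crop faces, then hand the crops to the face-based predictors. It uses a simple OpenCV Haar cascade as a stand-in for OpenFace, and `bmi_model` and `gender_model` are hypothetical callables representing the CNN-based predictors described above.

```python
import cv2

# Haar cascade face detector shipped with OpenCV (stand-in for OpenFace).
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def extract_face_crops(image_path: str):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [image[y:y + h, x:x + w] for (x, y, w, h) in faces]


def image_metadata(image_path: str, bmi_model, gender_model) -> dict:
    """Collect per-image metadata that downstream threat detectors can use as features."""
    crops = extract_face_crops(image_path)
    return {
        "num_faces": len(crops),
        "bmi": [bmi_model(crop) for crop in crops],        # hypothetical predictor
        "gender": [gender_model(crop) for crop in crops],  # hypothetical predictor
    }
```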

Approaches to counteract threats like the previously mentioned “deep fakes” include the usage of deep neural networks for the detection of artifacts resulting from the production of such content (for videos see for example [20, 53, 62, 88, 115], for images see [27, 46, 59]). Such artifacts are for example related to image blending, the environment, behavioural anomalies, as well as audiovisual synchronization issues [85].

To improve the understanding of which image features make content misleading and of correlations between user characteristics and interpretations of visual content, we propose a partly crowd-sourced image annotation schema. The features we consider are inspired by criteria used by fact-checking institutions such as the IFCN network [43] and include a mixture of objective and subjective concepts. For the crowd-sourced annotation, we also account for annotator characteristics using different scales such as [23, 100, 113].

4.3 Echo Chambers and Information Gerrymandering

Another function of our tool provides support for echo chamber identification and thus helps counteract the algorithm-based social media threats introduced in Sect. 2.2. As there is no standard approach for the detection of echo chambers [84], we adapt commonly used ideas. We first apply language models for topic identification to the user’s feed and to the timeline posts of users connected to them in a one-hop neighborhood. In addition, we run sentiment detectors on these data. If we identify a large proportion of posts with homogeneous topics and sentiment (more than 85% of the considered posts), we assume the user to be located in an echo chamber, i.e. virtually surrounded by like-minded people. Note, however, that no information is usually available on the actual feed presented to the specific user by the platform; we assume that content shared by most of their connections has a high chance of being presented. We thus present this aggregated information on neighborhood posts to the companion users to help them evaluate the quality of their feed’s sources and gain a clearer view of the presence of social media-specific phenomena such as echo chambers and filter bubbles, which are difficult for users to detect while affecting their experience. A simplified version of this heuristic is sketched below.
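In the simplified sketch, a user is flagged if more than a given fraction of the posts in their one-hop neighbourhood share the same topic and sentiment. The 0.85 threshold mirrors the proportion mentioned above, and `topic_of` and `sentiment_of` are placeholders for the topic and sentiment models.

```python
from collections import Counter


def in_echo_chamber(posts, topic_of, sentiment_of, threshold=0.85):
    """Flag an echo chamber if one (topic, sentiment) pair dominates the neighbourhood posts."""
    if not posts:
        return False
    labels = [(topic_of(p), sentiment_of(p)) for p in posts]
    dominant_count = Counter(labels).most_common(1)[0][1]
    return dominant_count / len(labels) > threshold
```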

5 Educational Activities and Boosting

In this section, we present the educational activities that complement the companion interface’s nudging functionalities with a boosting component. They aim at raising users’ media literacy [61, 123]; in other words, they focus on improving students’ understanding of social media dynamics and the underlying computational mechanisms, their awareness of the associated threats, and strategies for using social media conscientiously.

5.1 Narrative Scripts

One of our educational activities integrates image classifiers within the educational approach of narrative scripts [55]. Narrative scripts combine elements of computer-supported collaborative learning script mechanisms with storytelling techniques within a simulated social media platform.

The integration of machine learning tools can further assist learning scenarios covering topics related to body image stereotypes, social media algorithms and filter bubbles. Specifically, students can engage with fictional scenarios explaining the functionality of machine learning algorithms and participate in games demonstrating their effect. The objective of this work is to provide a hands-on experience of how social media algorithms work.

5.2 Education about Echo Chambers

The goal of a second activity is to increase the perception of social media influence and of the possible impact of the distortions produced by echo chambers and filter bubbles. We opt for a game-oriented strategy that motivates the students and gives them the opportunity to experience the consequences of information personalisation on decision-making. The game is framed as a repeated estimation task in which the “wisdom of crowds” [17, 73, 92] is leveraged to simulate a bias (towards the correct or the wrong direction) of the information filtering system [73]. During this activity, participants estimate the number of dots in an image and can revise their answer once, after being shown an aggregation of the other participants’ answers; a minimal simulation of this setup is sketched below.
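The following minimal simulation conveys the mechanism behind the game: participants produce noisy estimates, are shown a (possibly biased) aggregate of the others’ answers and revise their estimate toward it. All parameter values (noise level, bias factor, revision weight) are illustrative assumptions, not those used in the actual study.

```python
import random


def run_round(true_count=500, n_participants=30, bias=1.3, weight=0.4, seed=0):
    rng = random.Random(seed)
    first = [rng.gauss(true_count, 100) for _ in range(n_participants)]   # initial guesses
    shown_aggregate = bias * sum(first) / len(first)                       # biased "crowd" signal
    revised = [(1 - weight) * g + weight * shown_aggregate for g in first]
    return sum(first) / n_participants, sum(revised) / n_participants


initial_mean, revised_mean = run_round()
print(f"initial mean: {initial_mean:.0f}, revised mean: {revised_mean:.0f}")
```

With a bias factor above 1, the revised group mean drifts away from the true count, which is exactly the distortion the activity aims to make tangible for students.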

The intuition is that direct exposure to the consequences of echo chambers and filter bubbles, i.e. seeing how a biased aggregation distorts users’ unbiased opinions together with an explanation of this effect, pushes students to become more aware of these mechanisms and their effects. Results from a first study with around 50 students (including a baseline in which the estimation task’s results are not shown) confirm that explaining the consequences of information personalisation on their performance during the task increases the students’ awareness [72].

5.3 Awareness of Model Misclassifications

To educate teenagers about the limitations of machine learning models (as used in our companion), we provide a third activity consisting of an additional web page with example prediction results and statistical diagrams showing the models’ average performance. Our objective is to foster the students’ competence in dealing with predictions made by automatic systems, which is, generally speaking, a boosting activity [56]. The sketch below shows how the content of such a page could be generated.
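The sketch combines a few example predictions, including a misclassification worth discussing, with standard per-class performance figures. The emotion labels and examples are made up for illustration and are not taken from the actual model outputs.

```python
from sklearn.metrics import classification_report

# (text, gold label, model prediction) triples; purely illustrative.
examples = [
    ("I finally passed my exam!", "joy", "joy"),
    ("Nobody ever listens to me.", "sadness", "anger"),   # a misclassification to discuss
    ("That spider was terrifying.", "fear", "fear"),
]

y_true = [gold for _, gold, _ in examples]
y_pred = [pred for _, _, pred in examples]

# Per-class precision/recall/F1 that could back the statistical diagrams on the page.
print(classification_report(y_true, y_pred, zero_division=0))
```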

A part of this interface can be seen in Fig. 4. We plan to use it in upcoming experiments to see whether this has positive effects on the social media literacy of teenagers.

Fig. 4 Sample predictions of the English emotion prediction model for education

5.4 Trust in AI and Reliance on Machine Learning

A fourth activity focuses on reliance on machine learning algorithms. We investigate the role of trust in AI and of reliance on labelling systems used to decorate visual content. Labelling doubtful content to reduce the spread of misinformation has been proven to be a helpful tool for increasing users’ capabilities to deal with fake news on social media [38, 81], but people’s trust in machine learning algorithms can also play a role in the misinterpretation of visual content.

We label multiple images with the output of multiple predictors. More specifically, we used the BMI, gender and object detectors presented in Sect. 4.2 to label a set of images showing people. In addition, human annotators were asked to annotate the same set of images with the same kind of information the models produce.

The hypothesis is that people who trust AI more will be more prone to rely on content mislabelled by AI. We present both sets of labels (human- and AI-generated) to the participants and ask them to select those they consider more correct. In the experimental condition, the participants are shown the labeling methods along with the sets of labels, while this information is not given in the baseline condition (Fig. 5).

Fig. 5 Screenshot of the Trust in AI study (control condition). Participants were requested to select the prediction they trust more

Participants in both conditions are asked to answer a survey on trust in AI [80, 124]. We plan to compare the selection behavior in the two conditions to understand the role of trust in AI in users’ image label selection.

6 Conclusion

Big challenges arise from social media usage, especially for vulnerable groups of society such as teenagers, as we have summarized in this work. Methods for addressing these threats have been proposed, and we integrate support for multimodal content analysis and for otherwise invisible network-based threats directly into the user’s feed.

However, it remains an open question to which extent the analysis and visualization of content lead to more threat awareness among users of social media platforms. As a next step, we plan to conduct controlled user studies together with schools (in Italy, Spain and Germany) to find out how our augmented feed affects the way teenagers perceive users’ attitudes and content, e.g. posts, on such platforms.

In addition, several challenges remain in providing efficient, extensive and reliable machine learning-based user support tools. It is thus important to complement nudging interfaces supported by machine learning, such as our companion, with boosting educational activities that guide students in learning to leverage these tools to develop their own critical attitude toward social media interactions instead of over-relying on them.

7 Ethical Considerations

With the use of personal data and the involvement of vulnerable subjects (e.g. school children), ethical and privacy concerns arise. We strictly follow the corresponding guidelines of our institutions, and ethical approval has been obtained before running any experiments.

We also need to stress that any individual user data (e.g. extracted from the user’s social media feed) is only being used in the interaction with that specific user.