Introduction

The rise of digitization and the establishment of social media as a major medium of content production and reproduction have led to new paradigms of journalism and news spreading. The rapid changes of the last 20 years have created an environment of pluralism without borders, in which many threats are also lurking. One of these threats is the rapid spread of misinformation and disinformation. It has been reported that fake news spreads up to six times faster than credible information [1]. This phenomenon represents a major concern, firstly, for media organizations and professionals and, secondly, for law enforcement agencies (LEAs), since the rapid spread of disinformation can severely threaten several aspects of society. According to the European Commission, the spread of both disinformation and misinformation can have a range of harmful consequences, such as threatening our democracies, polarizing debates, and putting the health, security, and environment of EU citizens at risk [2].

As the practices of misinformation and disinformation evolve, it is of utmost importance to design, develop, and deploy innovative technologies and solutions to tackle such phenomena. In this light, numerous approaches have emerged that take advantage of machine learning (ML) to address this problem from different viewpoints. Even though, from a technical perspective, different solutions for fake news detection and misinformation identification exist, such as transfer learning, multi-task learning, reinforcement learning, and online learning, no universal solution addressing all aspects of the issue has been developed so far. Almost every solution targets the problem within a specific topic or narrow domain and is based on a limited dataset.

The purpose of this study is to present an approach that combines and evaluates the results of different machine learning prediction models in a common environment named the “Meta-Detection Toolset.” This solution relies on the calculation of a meta-score through weight-based voting among different “prediction models,” referred to herein as “verification services.” The weights of the verification services are constantly updated based on an annotation procedure performed by the end users of the toolset. This turns the current solution into a lifelong learning approach that is future-proof and adaptable, as the machine learning models may improve or deteriorate over time and may perform better or worse for different topics or styles of writing.

The remainder of this study is structured as follows: Section “Related Work” contains related works concerning natural language processing applications (e.g., topic selection and language modeling) and lifelong learning studies. Section “Meta-Detection Toolset” presents in detail the proposed Meta-Detection Toolset, and, finally, Section “Conclusion: Future Work” concludes the article and paves the way for future updates of the presented toolset.

Related Work

Lifelong learning (LL), or continuous learning (CL), is an emerging trend in computer science as well as in artificial intelligence. Thus, in the last few years, there has been an upward trend in studies focused on producing systems and solutions based on the concept of LL. The vast majority of dis/misinformation-fighting tools are based on machine learning and deep learning algorithms. A comparative analysis of six available state-of-the-art fake news detection tools was conducted by Giełczyk et al. [3]. This comparison was feasible because the datasets used were labeled, which is rare in real-life conditions. The use of LL comes as an answer to minimizing the need for expensive and scarce labeled data. In the domain of dis/misinformation and the trustworthiness of news and articles, LL is at a nascent stage; thus, this section presents notable LL studies relevant to other fields of text analysis and natural language processing.

Topic identification is an application that can be enabled by LL approaches. More specifically, ML-based models called “topic models” extract hidden structures and correlations from a collection of documents in order to classify similar documents under a common topic. Each topic contains sets of common or contextually related words or characteristics [4]. In this light, Chen et al. [5] proposed a lifelong topic model based on non-negative matrix factorization, the NMF-lifelong topic model (NMF-LTM). In extensive experiments on public corpora, the method of Chen et al. [5] showed better performance than competing methods. In the same direction, Xu et al. [6] proposed the lifelong learning topic (LLT) model, which tries to lift the limitations that arise when word co-occurrences in a dataset are limited. The LLT model is based on the notion of lifelong learning and expands the discovered topic knowledge by learning new word embeddings based on the topics generated in previous iterations. Another interesting approach to topic modeling and learning was taken by Zhang et al. [7]. More specifically, Zhang et al. combined a generative adversarial network (GAN) with lifelong learning in their solution, named the lifelong knowledge-enhanced adversarial neural topic model (LKATM). LKATM discovers topics in documents by using a knowledge extractor that relies on knowledge distillation and data augmentation to transfer prior topic knowledge.

Apart from topic identification, language modeling is another task where LL is exploited to offer state-of-the-art solutions. In this context, Sun et al. [8] proposed LAMOL, a framework for language modeling for lifelong language learning. Specifically, LAMOL is a language model that learns to solve tasks and, at the same time, generates training samples. The dynamic representations for imbalanced lifelong learning (DRILL) solution was presented by Ahrens et al. [9] and mainly focuses on addressing dataset limitations. In particular, DRILL is defined as a novel lifelong learning architecture for open-domain sequence classification. DRILL is a hybrid architectural and rehearsal-based continuous learning method that utilizes meta-learning and a self-organizing neural architecture in order to adapt to new, unseen data while avoiding catastrophic forgetting.

Chen et al. [10] proposed an LL solution for sentiment classification based on product reviews. Their approach focused on negative or positive reviews for various products, with each product representing a different task for the proposed LL learner. He et al. [11] studied and demonstrated the applicability of an LL model to Weibo rumor detection. Their approach aimed to cope with the rapid changes in online news and rumors as well as the limited availability of data.

To the best of our knowledge and at the time of writing the current study, there is no dedicated LL method or approach to the trustworthiness of news or articles. Thus, the related works presented in this section constitute LL solutions and approaches concerning applications relevant to text and language processing in other domains, such as rumor detection, sentiment detection, language modeling, and topic detection.

Meta-Detection Toolset

The proposed Meta-Detection Toolset (MDT) engages different verification services. These diverse verification services serve as predictors of credibility for a given piece of content (typically, an article provided in the form of a URL or text). Based on the integration and implementation of a weighted majority algorithm [12], equal weights are initially assigned to each verification service. During the continuous training process, the weight assigned to a verification service is automatically adjusted according to the accuracy of its predictions. Verification services with more correct predictions during the training phase receive higher weights, thus playing a more significant role when the MDT calculates the credibility of a certain article. The verification services that the MDT can host may vary, ranging, for instance, from BERT-based models to sentiment and stylometric analysis models (Fig. 27.1). Moreover, the input that the MDT can process may come from diverse sources, such as news sites or social media posts (Twitter, Telegram, etc.), as shown in Fig. 27.1.
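
The weighted voting step described above can be sketched as follows. This is a minimal illustration, not the actual MDT interface: the service names, the 0/1 encoding of predictions (1 = legitimate, 0 = fake), and the normalization of the score are assumptions for the sake of the example.

```python
def meta_score(predictions, weights):
    """Combine per-service credibility predictions (1 = legitimate,
    0 = fake) into a single weighted meta-score in [0, 1]."""
    total = sum(weights[service] for service in predictions)
    return sum(weights[s] * p for s, p in predictions.items()) / total

# Initially, every verification service carries an equal weight.
weights = {"bert_model": 1.0, "sentiment": 1.0, "stylometric": 1.0}
preds = {"bert_model": 1, "sentiment": 0, "stylometric": 1}

# With equal weights, two of three services voting "legitimate"
# yields a meta-score of 2/3.
score = meta_score(preds, weights)
```

As the weights diverge during training, better-performing services pull the meta-score toward their own prediction.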

Fig. 27.1
A schematic diagram of the MDT presents the verification services provided, including toxic language detection, topic modeling, bot detection, clickbait detection, spam detection, sentiment analysis, stylometric analysis, and author attribution, for URL, file, or text upload, Twitter feeds, and Telegram posts.

Potential verification services that can be hosted in the MDT

End users, for example, fact-checkers, also play an active role in the training process. More specifically, end users can insert their credibility evaluation of specific articles (i.e., indicating whether a specific piece of content represents legitimate or fake news). These user evaluations are provided in the form of ground-truth labels (legitimate/fake), stored in a database, and utilized during the continuous training phase to update the weights assigned to each verification service. Thus, a growing number of annotations leads to improved verification results of the Meta-Detection Toolset.

The accumulated experience of the toolset leads to the generation of a model that extensively utilizes contemporary AI technologies for combating the spread of dis/misinformation on the web and in social media. This model comprises multiple specialized verification services and is able to combine them in order to evaluate truthfulness based on a complex scoring mechanism. This AI-based process is called Meta-Detection and achieves continuous improvement through annotation processes performed by specialized end users. In the context of the Meta-Detection Toolset, an integrated management environment for the verification services has been developed, in which the Meta-Detection scores are also determined according to the annotations provided by fact-checkers. More specifically, for a given article, a ground-truth label annotation (legitimate/fake) is provided by certified fact-checkers.

As shown in Fig. 27.2, data ingestion can be achieved either at the end users’ side over the HTTPS protocol or by using data connectors (Kafka topics and/or REST APIs). Then, the input data can be consumed by the various verification services integrated into or connected with the toolset. Following the completion of the verification services’ computation processes, the prediction results are sent to the MDT, where they are combined in order to compute a meta-score that reflects the credibility of the digital content. The meta-score results are made available through REST API endpoints and/or Kafka topics.

Fig. 27.2
A schematic diagram of the Meta-Detection Toolset presents sources 1 to N, Kafka topics, REST APIs, end users, the MDT UI, and web server services, including verification services, meta-score engines, and advanced analytics.

High-level architecture of the Meta-Detection Toolset

As shown in Figs. 27.2 and 27.3, the evaluations of different verification services are combined by the MDT. The annotation process performed by the fact-checkers helps to identify which verification services perform better compared to the rest. These annotations are provided through the Meta-Detection Toolset user interface and better-performing verification services are provided with higher weights. In this way, the MDT enables the knowledge retention from previous evaluations and is capable of updating the weights in a continuous way, leading to a continuous learning paradigm. Last but not least, the system is constantly expanding by evaluating new pieces of content and recalculating the weights based on the experts’ feedback. This entire process is depicted in Fig. 27.3.
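
A minimal sketch of the annotation-driven weight update is given below, following the classic weighted majority rule [12]: services whose prediction contradicts the fact-checkers' ground-truth label are penalized by a multiplicative factor. The penalty value `BETA = 0.5` and the data layout are illustrative assumptions, not the MDT's actual parameters.

```python
BETA = 0.5  # assumed penalty factor; any 0 < BETA < 1 works

def update_weights(weights, predictions, ground_truth):
    """Return updated weights after one annotated article: keep the
    weight of services that agreed with the ground-truth label and
    multiply the weight of mispredicting services by BETA."""
    return {
        service: (w if predictions[service] == ground_truth else w * BETA)
        for service, w in weights.items()
    }

weights = {"bert_model": 1.0, "sentiment": 1.0, "stylometric": 1.0}
preds = {"bert_model": 1, "sentiment": 0, "stylometric": 1}

# Fact-checker annotates the article as legitimate (1); only the
# "sentiment" service mispredicted, so only its weight is reduced.
weights = update_weights(weights, preds, ground_truth=1)
```

Repeating this update over a stream of annotated articles concentrates the voting power on the services that have historically agreed with the fact-checkers, which is the knowledge-retention behavior described above.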

Fig. 27.3
A schematic diagram of the continuous learning process of the MDT presents the experts' input in searching for evaluation results, getting verification service results, retrieving weights, MDT evaluation, and updating the weights to update past knowledge or insert new knowledge, as well as retrieving past knowledge.

Continuous learning process of MDT

Conclusion: Future Work

The work presented in this study (Section “Meta-Detection Toolset”) combines the prediction results of various dis/misinformation prediction models and computes a meta-score that reflects the credibility of the digital content, aiming to achieve continuous improvement based on annotation processes. Through this solution, an aggregation of different ML prediction models is implemented in order to provide more trustworthy insights into the content credibility of news articles. Thus, end users are provided with a reliable indicative score of the credibility of the content under evaluation.

Future steps involve the expansion of the Meta-Detection Toolset to integrate more verification services that can work on different data formats, such as pictures, video, and voice, in order to assess their credibility. In addition to the legitimate/fake annotations, future steps of the Meta-Detection Toolset focus on enabling end users to also annotate the type of news included in a URL (e.g., political news and sports) or even insert a news category annotation of their own choice. For each category of news (including user-defined categories), a distinct set of verification service weights will be calculated. This aims to improve the predictions of the MDT as more annotations arrive over time. It will also enhance the proposed solution with the ability to learn new tasks (credibility evaluation of additional content categories) that are initially unknown to it.
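
The planned per-category weight sets can be sketched as follows: each news category (including user-defined ones) keeps its own weight vector, created lazily with equal initial weights when a category is first encountered, so a new category constitutes a new task starting from scratch. The category names and service names are illustrative assumptions.

```python
SERVICES = ["bert_model", "sentiment", "stylometric"]

category_weights = {}  # one independent weight set per news category

def weights_for(category):
    """Return the weight set for a news category, creating it with
    equal initial weights if the category is new to the system."""
    if category not in category_weights:
        category_weights[category] = {s: 1.0 for s in SERVICES}
    return category_weights[category]

# Annotations for "political" articles would only ever update this set,
# leaving the "sports" set (and any user-defined category) untouched.
w_politics = weights_for("political")
w_sports = weights_for("sports")
```

Under this scheme, the annotation-driven weight update is applied only to the weight set of the annotated article's category, so a service can dominate the vote for one category while carrying little weight in another.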