Abstract
Misleading information spread on social networks is often supported by activists who promote it and by bots that amplify its visibility. Useful and timely mechanisms for assessing credibility in social media have become indispensable. Efforts to tackle this problem in Spanish are growing: recent years have witnessed many attempts to develop methods for detecting fake news, rumors, stances, and bots on the Spanish social web. This work presents a systematic review of the literature on these efforts in the Spanish language. It identifies pending tasks for this community and challenges that require coordination among the leading investigators on the subject.
References
Abonizio HQ, de Morais JI, Tavares GM, Barbon Junior S (2020) Language-independent fake news detection: English, Portuguese, and spanish mutual features. Future Internet 12(5):87
Agirrezabal M (2020) KU-CST at the profiling fake news spreaders shared task. In: Working Notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Al-Zoubi A, Faris H, Alqatawna J, Hassonah M (2018) Evolving Support Vector Machines using Whale Optimization Algorithm for spam profiles detection on online social networks in different lingual contexts. Knowl-Based Syst 153:91–104
Almendros Cuquerella C, Cervantes Rodríguez C (2018) CriCa Team: MultiModal Stance detection in tweets on Catalan 1Oct Referendum (MultiStanceCat). In: Proceedings of the third workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval) colocated with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN), Sevilla, Spain, volume 2150 of CEUR Workshop Proceedings, pp 167–172
Ambrosini L, Nicolò G (2017) Neural models for StanceCat shared task at IberEval 2017. In: CEUR-WS, Conference of 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval 2017, vol 1881, pp 210–216
Aragón ME, Jarquín-Vásquez HJ, Montes-y-Gómez M, Escalante HJ, Pineda LV, Gómez-Adorno H, Posadas-Durán JP, Bel-Enguix G (2020) Overview of MEX-A3T at iberlef 2020: fake news and aggressiveness analysis in Mexican Spanish. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), Málaga, Spain, 23 September 2020, volume 2664 of CEUR Workshop Proceedings, pp 222–235. CEUR-WS.org
Arce-Cardenas S, Fajardo-Delgado D, Carmona MÁÁ (2020) Tecnm at MEX-A3T 2020: fake news and aggressiveness analysis in Mexican Spanish. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), Málaga, Spain, 23 September 2020, volume 2664 of CEUR Workshop Proceedings, pp 265–272. CEUR-WS.org
Ashraf S, Javed O, Adeel M, Rao H, Nawab M (2019) Bots and gender prediction using language independent stylometry-based approach notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th Working Notes of Conference and Labs of the Evaluation Forum. CLEF, vol 2380
Bacciu A, Morgia M, Mei A, Nemmi E, Neri V, Stefa J (2019) Bot and gender detection of twitter accounts using distortion and LSA notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th Working Notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Bakhteev O, Ogaltsov A, Ostroukhov P (2020) Fake news spreader detection using neural tweet aggregation. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Barbieri F (2017) Shared task on stance and gender detection in tweets on catalan independence—LaSTUS system Description. In: CEUR-WS Conference of 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 217–221
Barrón-Cedeño A, Elsayed T, Nakov P, Da San Martino G, Hasanain M, Suwaileh R, Haouari F (2020) CheckThat! at CLEF 2020: enabling the automatic identification and verification of claims in social media. In: Conference of 42nd European Conference on IR Research, ECIR, in Lecture Notes in Computer Science, 12036 LNCS. Springer, pp 499–507
Basile V, Bosco C, Fersini E, Nozza D, Patti V, Pardo FMR, Rosso P, Sanguinetti M (2019) Semeval-2019 task 5: multilingual detection of hate speech against immigrants and women in twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2019, Minneapolis, MN, USA, 6–7 June 2019. Association for Computational Linguistics, pp 54–63
Bello HRM, Heilmann L, Ronan E (2020) Detecting fake news spreaders with behavioural, lexical and psycholinguistic features. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Boididou C, Papadopoulos S, Zampoglou M, Apostolidis L, Papadopoulou O, Kompatsiaris Y (2018) Detection and visualization of misleading content on Twitter. Int J Multimed Inf Retrieval 7(1):71–86
Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist (TACL) 5:135–146
Bolonyai F, Buda J, Katona E (2019) Bot or not: a two-level approach in author profiling notebook for PAN at CLEF 2019. In: CEUR-WS Conference of 20th Working Notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Bounaama R, Abderrahim M (2019) Tlemcen university: bots and gender profiling task notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th Working Notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Brereton P, Kitchenham BA, Budgen D, Turner M, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Softw 80(4):571–583
Buda J, Bolonyai F (2020) An ensemble model using n-grams and statistical features to identify fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Bugueño M, Mendoza M (2020) Learning to combine classifiers outputs with the transformer for text classification. Intell Data Anal 24(S1):15–41
Caled D, Silva M (2019) FTR-18: collecting rumours on football transfer news. In: CEUR-WS, Conference on Information and Knowledge Management Workshops, CIKM, vol 2482
Cardaioli M, Cecconello S, Conti M, Pajola L, Turrin F (2020) Fake news spreaders profiling through behavioural analysis. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Castillo C, Mendoza M, Poblete B (2011) Information credibility on twitter. In: Proceedings of the 20th international conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011, pp 675–684
Castillo S, Allende-Cid H, Palma W, Alfaro R, Ramos H, Gonzalez C, Elortegui C, Santander P (2019) Detection of bots and cyborgs in Twitter: a study on the Chilean Presidential Election in 2017. In: Conference of 11th international conference on Social Computing and Social Media, SCSM 2019, held as part of the 21st International Conference on Human–Computer Interaction, HCI, in Lecture Notes in Computer Science, LNCS, vol 11578. Springer, pp 311–323
Cegarra-Navarro J-G, Martelo-Landroguez S (2020) The effect of organizational memory on organizational agility: testing the role of counter-knowledge and knowledge application. J Intellect Capital 21(3):459–479
Cer D, Yang Y, Kong S, Hua N, Limtiaco N, John RS, Constant N, Guajardo-Cespedes M, Yuan S, Tar C, Strope B, Kurzweil R (2018) Universal sentence encoder for English. In: Proceedings of the 2018 conference on Empirical Methods in Natural Language Processing, EMNLP 2018: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018. Association for Computational Linguistics, pp 169–174
Chung CK, Pennebaker JW (2012) Linguistic inquiry and word count (liwc): pronounced luke . . . and other useful facts
Clark K, Luong M, Le QV, Manning CD (2020) ELECTRA: pre-training text encoders as discriminators rather than generators. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020
Congosto M, Basanta-Val P, Sanchez-Fernandez L (2017) T-Hoarder: a framework to process Twitter data streams. J Netw Comput Appl 83:28–39
Cresci S (2020) A decade of social bot detection. Commun ACM 63(10):72–83
Cruz FL, Troyano JA, Pontes B, Ortega FJ (2014) Building layered, multilingual sentiment lexicons at synset and lemma levels. Expert Syst Appl 41(13):5984–5994
Cücük D, Can F (2020) Stance detection: a survey. ACM Comput Surv 53(1):1–37
Das KA, Baruah A, Barbhuiya FA, Dey K (2020) Ensemble of ELECTRA for profiling fake news spreaders. In: Cappellato L, Eickhoff C, Ferro N, Névéol A (eds) Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Davis CA, Varol O, Ferrara E, Flammini A, Menczer F (2016) Botornot: a system to evaluate social bots. In: Proceedings of the 25th international conference on World Wide Web, WWW 2016, Montreal, Canada, 11–15 April 2016, Companion volume, pp 273–274
Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019, volume 1 (Long and Short Papers), pp 4171–4186
Espinosa D, Gómez-Adorno H, Sidorov G (2019) Bots and gender profiling using character bigrams notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Espinosa DY, Gómez-Adorno H, Sidorov G (2020a) Profiling fake news spreaders using character and words n-grams. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Espinosa MS, Centeno R, Rodrigo Á (2020b) Analyzing user profiles for detection of fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Fagni T, Tesconi M (2019) Profiling twitter users using autogenerated features invariant to data distribution notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Fernández JL, Ramírez JAL (2020) Approaches to the profiling fake news spreaders on twitter task in English and Spanish. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Fernquist J (2019) A four feature types approach for detecting bot and gender of twitter users notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Gallagher E, Suárez-Serrato P, Velazquez Richards E (2019) Socialbots whitewashing contested elections; a case study from Honduras. Adv Intell Syst Comput 797:547–552
Gamallo P, Almatarneh S (2019) Naive-Bayesian classification for bot detection in twitter notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
García D, Larriba Flor A (2017) Stance detection at IberEval 2017: a biased representation for a biased problem. In: CEUR-WS, Conference of 2nd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 204–209
Germani F, Biller-Adorno N (2020) The anti-vaccination infodemic on social media: a behavioral analysis. Lancet Digit Health 2(10):504–505
Giachanou A, Ghanem B (2019) Bot and gender detection using textual and stylistic information notebook for pan at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Giachanou A, Rosso P (2020) The battle against online harmful information: the cases of fake news and hate speech. In: CIKM ’20: the 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, 19–23 October 2020. ACM, pp 3503–3504
Giachanou A, Rosso P, Crestani F (2019) Leveraging emotional signals for credibility detection. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, 21–25 July 2019. ACM, pp 877–880
Giglou HB, Razmara J, Rahgouy M, Sanaei M (2020) Lsaconet: a combination of lexical and conceptual features for analysis of fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Gishamer F (2019) Using hashtags and pos-tags for author profiling notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
González J-A, Pla F, Hurtado L (2017) ELiRF-UPV at IberEval 2017: stance and gender detection in tweets. In: CEUR-WS, Conference of 2nd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 193–198
González J, Hurtado L, Pla F (2018) ELiRF-UPV at MultiStanceCat 2018. In: Proceedings of the third workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval) colocated with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN), Sevilla, Spain, volume 2150 of CEUR Workshop Proceedings, pp 173–179
Goubin R, Lefeuvre D, Alhamzeh A, Mitrović J, Egyed-Zsigmond E, Ghemmogne Fossi L (2019) Bots and gender profiling using a multi-layer architecture notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Graells-Garrido E, Baeza-Yates R, Lalmas M (2020) Every colour you are: stance prediction and turnaround in controversial issues. In: 12th ACM Conference on Web Science, pp 174–183
Gómez V, Kappen H, Litvak N, Kaltenbrunner A (2013) A likelihood-based framework for the analysis of discussion threads. World Wide Web 16(5–6):645–675
HaCohen-Kerner Y, Manor N, Goldmeier M (2019) Bots and gender profiling of tweets using word and character N-grams notebook for PAN at CLEF
Halvani O, Marquardt P (2019) An unsophisticated neural bots and gender profiling system notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Hashemi A, Zarei MR, Moosavi MR, Taheri M (2020) Fake news spreader identification in twitter using ensemble modeling. notebook for PAN at CLEF 2020. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Howard J, Ruder S (2018) Universal language model fine-tuning for text classification. In: Proceedings of the 56th annual meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15–20 July 2018, volume 1: Long Papers. Association for Computational Linguistics, pp 328–339
Ikae C, Savoy J (2020) Unine at PAN-CLEF 2020: profiling fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Jimenez-Villar V, Sánchez-Junquera J, Montes-Y-Gómez M, Villaseñor-Pineda L, Ponzetto S (2019) Bots and gender profiling using masking techniques notebook for pan at clef 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Johansson F (2019) Supervised classification of twitter accounts based on textual content of tweets notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Khaund T, Al-Khateeb S, Tokdemir S, Agarwal N (2018) Analyzing social bots and their coordination during natural disasters. In: Conference of 11th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction conference and Behavior Representation in Modeling and Simulation, SBP-BRiMS, in Lecture Notes in Computer Science, LNCS, vol 10899. Springer, pp 207–212
Kollanyi B, Howard PN, Woolley SC (2016) Bots and automation over twitter during the first U.S. election. Data Memo 2016.4. Oxford, UK: Project on Computational Propaganda
Koloski B, Pollak S, Skrlj B (2020) Multilingual detection of fake news spreaders via sparse matrix factorization. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Labadie R, Castro-Castro D, Bueno RO (2020) Fusing stylistic features with deep-learning methods for profiling fake news spreader. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Lai M, Cignarella A, Farías D (2017) ITACOS at IberEval2017: detecting stance in Catalan and Spanish tweets. In: CEUR-WS, Conference of 2nd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 185–192
Lai M, Cignarella A, Hernández Farías D, Bosco C, Patti V, Rosso P (2020) Multilingual stance detection in social media political debates. Comput Speech Lang 63:101075
Lichouri M, Abbas M, Benaziz B (2020) Profiling fake news spreaders on twitter based on TFIDF features and morphological process. Notebook for PAN at CLEF 2020. In: Working Notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Liu H, Singh P (2004) Conceptnet—a practical commonsense reasoning tool-kit. BT Technol J 22:211–226
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized BERT pretraining approach. CoRR arXiv:1907.11692
López Á, Martí P (2020) Profiling fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
López-Santillán R, González-Gurrola L, Montes-Y-Gómez M, Ramírez-Alonso G, Prieto-Ordaz O (2019) An evolutionary approach to build user representations for profiling of bots and humans in twitter notebook for PaN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Ma J, Gao W, Joty SR, Wong K (2020) An attention-based rumor detection model with tree-structured recursive neural networks. ACM Trans Intell Syst Technol (ACM-TIST) 11(4):42:1-42:28
Magallón Rosa R (2019) Verificado Mexico 2018. Disinformation and fact-checking on electoral campaign [Verificado México (2018) Desinformación y fact-checking en campaña electoral]. Revista de Comunicacion 18(1):234–258
Majumder S, Das D (2020) Detecting fake news spreaders on twitter using universal sentence encoder. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Manna R, Pascucci A, Monti J (2020) Profiling fake news spreaders through stylometry and lexical features. unior NLP @pan2020. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Mendoza M, Poblete B, Castillo C (2010) Twitter under crisis: can we trust what we rt? In: Proceedings of the 1st workshop on Social Media Analytics, SOMA 2010, Washington, USA, 28 June 2010, pp 71–79
Mendoza M, Tesconi M, Cresci S (2020) Bots in social and interaction networks: detection and impact estimation. ACM Trans Inf Syst (TOIS) 39(1):1–32
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J, (2013) Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems, 2013 Proceedings of a meeting held December 5–8, 2013. Lake Tahoe, Nevada, United States, pp 3111–3119
Mohammad S, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Comput Intell 29(3):436–465
Molina-González MD, Martínez-Cámara E, Martín-Valdivia MT, Perea-Ortega JM (2013) Semantic orientation for polarity classification in Spanish reviews. Expert Syst Appl 40(18):7250–7257
Montañés R, Aznar R, Nogueras S, Segura P, Langarita R, Meléndez E, Peña P, Del Hoyo R (2018) Social media monitoring [Monitorizacion de Social Media]. Procesamiento de Lenguaje Natural 61:177–180
Oliveira R, De Andrade C, Figuerêdo J, Rocha-Junior J, Calumby R, Da Conceição Silva I, Da Silva Neto A (2019) Bot and gender identification: textual analysis of tweets notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Onose C, Nedelcu C-M, Cercel D-C, Trausan-Matu S (2019) A hierarchical attention network for bots and gender profiling notebook for PaN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Pardo FMR, Giachanou A, Ghanem B, Rosso P (2020) Overview of the 8th author profiling task at PAN 2020: profiling fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings
Pastor-Galindo J, Zago M, Nespoli P, Bernal SL, Celdrán AH, Pérez MG, Valiente JAR, Pérez GM, Mármol FG (2020a) Spotting political social bots in twitter: a use case of the 2019 Spanish general election. IEEE Trans Netw Serv Manag 17(4):2156–2170
Pastor-Galindo J, Zago M, Nespoli P, Bernal SL, Celdrán AH, Pérez MG, Valiente JAR, Pérez GM, Mármol FG (2020b) Twitter social bots: the 2019 Spanish general election data. Data Brief 32:106047
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pp 1532–1543
Petrik J, Chuda D (2019) Bots and gender profiling with convolutional hierarchical recurrent neural network notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Pimentel B, Portugal R (2020) Fake news in Spanish: towards the building of a corpus based on Twitter. Commun Comput Inf Sci (CCIS) 1070:333–339
Pinnaparaju N, Indurthi V, Varma V (2020) Identifying fake news spreaders in social media. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Pizarro J (2019) Using N-grams to detect Bots on Twitter Notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th Working Notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Pizarro J (2020) Using n-grams to detect fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Polignano M, De Pinto M, Lops P, Semeraro G (2019) Identification of bot accounts in Twitter using 2D CNNs on user-generated contents notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Posadas-Durán J-P, Gomez-Adorno H, Sidorov G, Escobar J (2019) Detection of fake news in a new corpus for the Spanish language. J Intell Fuzzy Syst 36(5):4868–4876
Przybyła P (2019) Detecting bot accounts on twitter by measuring message predictability notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Rangel F, Rosso P (2019) Overview of the 7th author profiling task at Pan 2019: bots and gender profiling in twitter. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Russo I (2020) Sadness and fear: classification of fake news spreaders content on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Salazar ME, Tenorio AG, Naranjo ZL (2020) Evaluation of the precision of the binary classification models for the identification of true or false news in Costa Rica. Revista Iberica de Sistemas e Tecnologias de Informacao (RISTI) 2020(E38):156–170
Saralegi X, Vicente IS (2013) Elhuyar at tweet-norm 2013. In: Proceedings of the tweet normalization workshop co-located with 29th conference of the Spanish Society for Natural Language Processing (SEPLN 2013), Madrid, Spain, 20 September 2013, pp 64–68
Segura-Bedmar I (2018) LABDA’s early steps toward multimodal stance detection. In: Proceedings of the third workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval) colocated with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN), Sevilla, Spain, volume 2150 of CEUR Workshop Proceedings, pp 180–186
Shashirekha HL, Balouchzahi F (2020) Ulmfit for twitter fake news spreader profiling. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Shashirekha HL, Anusha MD, Prakash NS (2020) Ensemble model for profiling fake news spreaders on twitter. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Shrestha A, Spezzano F, Joy A (2020) Detecting fake news spreaders in social networks via linguistic and personality features. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Speer R, Chin J, Havasi C (2016) Conceptnet 5.5: an open multilingual graph of general knowledge. CoRR, arXiv:1612.03975
Srinivasarao M, Manu S (2019) Bots and gender profiling using character and word N-grams notebook for PAN at CLEF 2019. In: Conference of 20th working notes of CLEF Conference and Labs of the Evaluation Forum, vol 2380
Suárez-Serrato P, Richards E. Velázquez, Yazdani M (2018) Socialbots supporting human rights. In: AIES—Proceedings AAAI/ACM Conference on AI, Ethics, and Society. Association for Computing Machinery, Inc, Conference of 1st AAAI/ACM—AI, Ethics, and Society, AIES, pp 290–296
Swami S, Khandelwal A, Shrivastava M, Akhtar S (2017) LTRC IIITH at IBEREVAL 2017: stance and gender detection in tweets on catalan independence. In: CEUR-WS, Conference of 2nd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 199–203
Swire-Thompson B, Lazer D (2020) Public health and online misinformation: challenges and recommendations. Annu Rev Public Health 41(1):433–451
Sánchez-Casado N, Cegarra-Navarro J, Tomaseti-Solano E (2015) Linking social networks to utilitarian benefits through counter-knowledge. Online Inf Rev 39(2):179–196
Taulé M, Martí M, Rangel F, Rosso P, Bosco C, Patti V (2017) Overview of the task on stance and gender detection in tweets on catalan independence at IberEval 2017. In: CEUR-WS, Conference of 2nd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 157–177
Taulé M, Rangel F, Martí M Antònia, Rosso P (2018) Overview of the task on multimodal stance detection in Tweets on catalan #1Oct referendum. In: CEUR-WS, Conference of 3rd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 2150, pp 149–166
Tiedemann J (2012) Parallel data, tools and interfaces in OPUS. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, 23–25 May 2012. European Language Resources Association (ELRA), pp 2214–2218
Valarezo-Cambizaca L-M, Rodríguez-Hidalgo C (2019) Innovation in journalism as an antidote to fake news [La innovación en el periodismo como antídoto ante las fake news]. RISTI Revista Iberica de Sistemas e Tecnologias de Informacao E20:24–35
Valencia A Valencia, Adorno H, Rhodes C, Pineda G (2019) Bots and gender identification based on stylometry of tweet minimal structure and n-grams model notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Van Halteren H (2019) Bot and gender recognition on tweets using feature count deviations Notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Varol O, Ferrara E, Davis CA, Menczer F, Flammini A (2017) Online human-bot interactions: detection, estimation, and characterization. In: Proceedings of the eleventh International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec, Canada, 15–18 May 2017, pp 280–289
Velazquez Richards E, Gallagher E, Suárez-Serrato P (2019) Boostnet: bootstrapping detection of socialbots, and a case study from Guatemala. In: Conference of 33rd National Forum of Statistics, FNE 2018 and 13th Latin-American Congress of Statistical Societies, CLATSE, vol 301. Springer, pp 145–154
Villatoro-Tello E, Ramírez-de-la-Rosa G, Kumar S, Parida S, Motlícek P (2020) Idiap and UAM participation at MEX-A3T evaluation campaign. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), Málaga, Spain, 23 September 2020, volume 2664 of CEUR Workshop Proceedings. CEUR-WS.org, pp 252–257
Vinayakumar R, Kumar S Sachin, Premjith B, Prabaharan P, Soman K (2017) Deep stance and gender detection in tweets on catalan independence@Ibereval 2017. In: CEUR-WS, Conference of 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 222–229
Vogel I, Jiang P (2019) Bot and gender identification in Twitter using word and character n-grams notebook for PAN at CLEF 2019. In: CEUR-WS, Conference of 20th working notes of Conference and Labs of the Evaluation Forum, CLEF, vol 2380
Vogel I, Meghana M (2020) Fake news spreader detection on twitter using character n-grams. In: Working notes of CLEF 2020—Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, volume 2696 of CEUR Workshop Proceedings. CEUR-WS.org
Volkova S, Bell E (2017) Identifying effective signals to predict deleted and suspended accounts on Twitter across languages. In: Proceedings of the 11th International Conference on Web and Social Media, ICWSM. AAAI Press, pp 290–298
Vosoughi S, Roy D, Aral S (2018) The spread of true and false news online. Science 359(6380):1146–1151
Wojatzki M, Zesch T (2017) Neural, non-neural and hybrid stance detection in tweets on catalan independence. In: CEUR-WS, Conference of 2nd workshop on Evaluation of Human Language Technologies for Iberian Languages, IberEval, vol 1881, pp 178–184
Yang Y, Cer D, Ahmad A, Guo M, Law J, Constant N, Ábrego GH, Yuan S, Tar C, Sung Y, Strope B, Kurzweil R (2020) Multilingual universal sentence encoder for semantic retrieval. In: Proceedings of the 58th annual meeting of the Association for Computational Linguistics: System Demonstrations, ACL 2020, Online, 5–10 July 2020. Association for Computational Linguistics, pp 87–94
Zaizar-Gutiérrez D, Fajardo-Delgado D, Carmona M Á Á (2020) Itcg’s participation at MEX-A3T 2020: aggressive identification and fake news detection based on textual features for Mexican Spanish. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020) co-located with 36th Conference of the Spanish Society for Natural Language Processing (SEPLN 2020), Málaga, Spain, 23 September 2020, volume 2664 of CEUR Workshop Proceedings. CEUR-WS.org, pp 258–264
Zhang X, Ghorbani AA (2020) An overview of online fake news: characterization, detection, and discussion. Inf Process Manag 57(2):102025
Zhou X, Zafarani R (2020) A survey of fake news: fundamental theories, detection methods, and opportunities. ACM Comput Surv (CSUR) 53(5):1–40
Zotova E, Agerri R, Nuñez M, Rigau G (2020) Multilingual stance detection: the catalonia independence corpus
Zubiaga A, Kochkina E, Liakata M, Procter R, Lukasik M (2016) Stance classification in rumours as a sequential task exploiting the tree structure of social media conversations. In: COLING 2016, 26th international conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 11–16 December 2016, Osaka, Japan, pp 2438–2448
Zubiaga A, Aker A, Bontcheva K, Liakata M, Procter R (2018) Detection and resolution of rumours in social media: a survey. ACM Comput Surv 51(2):32:1-32:36
Acknowledgements
Mr. Mendoza acknowledges funding from the Millennium Institute for Foundational Research on Data. Mr. Mendoza was also funded by ANID PIA/APOYO AFB180002 and ANID FONDECYT 1200211.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Review planning
The review planning step starts by defining the search keywords that retrieve the first body of literature. These keywords were combined using logical AND and OR connectors to control the coverage of documents matched by the search system. The search strings cover the three variants of the problem studied here: stance, rumors, and bots. To locate results from the Spanish-speaking community, we added the keyword Spanish to each of these terms. We also included Twitter as a keyword, since it is the social media platform that concentrates the most cited studies in English (Zhang and Ghorbani 2020). To avoid restricting the search results to English publications, we also used these search strings in Spanish. The set of search strings used in the review is shown in Table 1.
The search in Scopus was restricted to works published since 2009, ruling out works on rumors unrelated to the explosion of this phenomenon in social media. The works were also restricted to two subject areas: Computer Science and Engineering. In this way, the retrieved papers include works on automatic detection methods, which are the focus of this study.
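As an illustration of how such search strings can be assembled, the following minimal sketch combines each task term with the language and platform keywords and then adds the year and subject-area restrictions. The exact strings are those of Table 1, which this sketch does not reproduce: the Spanish keyword renderings, the use of AND between all three terms, the helper names, and the Scopus advanced-search form (`TITLE-ABS-KEY`, `PUBYEAR`, `SUBJAREA`) are assumptions made for the example.

```python
# Task terms named in the text. The Spanish renderings are illustrative
# assumptions; the strings actually used are listed in Table 1.
TASK_TERMS = {
    "en": ["stance", "rumor", "bot"],
    "es": ["postura", "rumor", "bot"],
}
# Keyword added to locate work from the Spanish-speaking community.
LANGUAGE_TERM = {"en": "Spanish", "es": "español"}
PLATFORM_TERM = "Twitter"


def build_search_strings():
    """Return one boolean search string per (query language, task) pair."""
    strings = []
    for lang, tasks in TASK_TERMS.items():
        for task in tasks:
            strings.append(
                f'"{task}" AND "{LANGUAGE_TERM[lang]}" AND "{PLATFORM_TERM}"'
            )
    return strings


def to_scopus_query(search_string, first_year=2009, areas=("COMP", "ENGI")):
    """Add the year and subject-area restrictions (assumed Scopus syntax)."""
    area_clause = " OR ".join(f"SUBJAREA({area})" for area in areas)
    return (
        f"TITLE-ABS-KEY({search_string}) "
        f"AND PUBYEAR > {first_year - 1} AND ({area_clause})"
    )


if __name__ == "__main__":
    for s in build_search_strings():
        print(to_scopus_query(s))
```

Keeping the keyword lists separate from the query template makes it easy to run the same template in both English and Spanish, which is the point of the bilingual search described above.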
The review planning process also considers the definition of inclusion/exclusion criteria. These criteria are subsequently used in the literature screening stage, during which the content of the works retrieved during the search phase is reviewed. The works that meet the inclusion criteria and do not match any exclusion criteria are included within the literature’s definitive body. We first define a list of exclusion criteria with four items:
- Exclusion criterion 1 (ExCr1): When an article appears in more than one search, it is considered only once. Accordingly, articles repeated in the search results are eliminated, as are versions of the same work published in different media (duplication by media).
- Exclusion criterion 2 (ExCr2): Articles written in a language other than Spanish or English are not considered.
- Exclusion criterion 3 (ExCr3): Reviews (ExCr3-a), editorials (ExCr3-b), notes and errata (ExCr3-c), and conference reviews (ExCr3-d) are not considered.
- Exclusion criterion 4 (ExCr4): Articles whose title or abstract does not refer to the study (semantic mismatch) are discarded.
The inclusion criteria consider two items:
- Inclusion criterion 1 (InCr1): Three sections of the work are reviewed: abstract, introduction, and conclusion. We verify whether the work focuses on solving any of the tasks studied here in Spanish.
- Inclusion criterion 2 (InCr2): If applying inclusion criterion 1 yields no conclusive evidence, the full article is read. If the work does not address any of the tasks in the Spanish language, the paper is discarded.
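The criteria above can be read as a two-step filter: exclusion first, then inclusion. The sketch below is one possible reading of that procedure, not the tooling used in the review. The `Record` fields, the title-based duplicate check, and the function names are hypothetical; in particular, detecting versions of the same work published in different media would need fuzzier matching than the exact title comparison assumed here.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_LANGUAGES = {"es", "en"}  # ExCr2
EXCLUDED_TYPES = {"review", "editorial", "note", "erratum", "conference review"}  # ExCr3


@dataclass
class Record:
    """Hypothetical record for one retrieved document."""
    title: str
    language: str
    doc_type: str
    on_topic: bool                         # title/abstract refers to the study (ExCr4)
    key_sections_verdict: Optional[bool]   # InCr1 result; None means inconclusive
    full_text_verdict: bool = False        # InCr2 result after reading the full article


def apply_exclusion(records):
    """Keep the records that match none of the four exclusion criteria."""
    seen_titles, kept = set(), []
    for record in records:
        key = record.title.strip().lower()
        if key in seen_titles:                          # ExCr1: duplicate
            continue
        seen_titles.add(key)
        if record.language not in ALLOWED_LANGUAGES:    # ExCr2
            continue
        if record.doc_type in EXCLUDED_TYPES:           # ExCr3
            continue
        if not record.on_topic:                         # ExCr4
            continue
        kept.append(record)
    return kept


def passes_inclusion(record):
    """InCr1 decides when conclusive; otherwise fall back to the full read (InCr2)."""
    if record.key_sections_verdict is not None:
        return record.key_sections_verdict
    return record.full_text_verdict


def screen(records):
    """Run both steps and return the records that enter the body of literature."""
    return [r for r in apply_exclusion(records) if passes_inclusion(r)]
```

Running exclusion before inclusion mirrors the order described in the text: the cheap checks on metadata, title, and abstract discard most documents before any section of a paper has to be read.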
1.2 Literature search and screening
The search for papers was carried out during 2020. By applying the search strings to Scopus, we retrieved a total of 4506 documents. This first body of literature was examined by applying the exclusion and inclusion criteria defined in this study. Figure 7 shows how many documents were removed after applying the criteria. The reduction of the initial set is notable. A total of 4360 documents were eliminated using the exclusion criteria. The remaining 146 documents were analyzed using the inclusion criteria. The first inclusion criterion was validated in 102 documents, of which 67 also matched the second inclusion criterion. As a result, the first body of literature comprises 67 documents.
Fig. 7 Exclusion/inclusion criteria applied to the documents detected in this SLR. The study considered two stages, the first based on the documents identified using search strings and the second based on the articles that cited the first body of literature. A total of 94 documents met the exclusion/inclusion criteria. Finally, after reviewing the selected documents’ references, 3 more papers that satisfied the exclusion/inclusion criteria were added to the survey.
The second body of literature was created by analyzing the works that cite the first body of literature. These citing works provide an important source of papers connected to the survey subject that were not detected using the search strings. A total of 194 documents were identified in this process, which were reduced to 69 after applying the exclusion criteria and to 27 after applying the inclusion criteria.
Both stages of the systematic review made it possible to identify a total of 94 documents. We then exhaustively reviewed the references of these documents, looking for works related to the subject of this survey that had not been detected in the previous two stages. In this last process, three more papers were identified, which passed the exclusion criteria and matched both inclusion criteria. In total, the SLR allowed the identification of 97 works related to the subject of this survey. The total number of papers per task is shown in Fig. 8.
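The counts reported in this section can be checked with a short tally. The sketch below only restates the numbers given in the text; the variable and function names are chosen for the example.

```python
# Counts reported in the text for each screening stage.
STAGE_1 = {
    "retrieved": 4506,
    "removed_by_exclusion": 4360,
    "passed_incr1": 102,
    "passed_incr2": 67,
}
STAGE_2 = {"retrieved": 194, "after_exclusion": 69, "after_inclusion": 27}
FROM_REFERENCES = 3


def tally():
    """Recompute the intermediate and final totals from the stage counts."""
    after_exclusion_1 = STAGE_1["retrieved"] - STAGE_1["removed_by_exclusion"]
    two_stage_total = STAGE_1["passed_incr2"] + STAGE_2["after_inclusion"]
    return {
        "stage 1 after exclusion": after_exclusion_1,          # 146
        "two-stage total": two_stage_total,                    # 94
        "final total": two_stage_total + FROM_REFERENCES,      # 97
    }


if __name__ == "__main__":
    for label, count in tally().items():
        print(f"{label}: {count}")
```

The recomputed values agree with the text: 146 documents remain after the first exclusion step, the two stages together yield 94 documents, and the reference review brings the total to 97.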
1.3 Acronyms
- Systematic literature review: SLR
- Bag-of-Words: BOW
- Part-of-Speech: POS
- Term Frequency-Inverse Document Frequency: TF-IDF
- Latent Semantic Analysis: LSA
- Universal Language Model Fine-Tuning: ULMFiT
- Singular Value Decomposition: SVD
- Recurrent Neural Network: RNN
- Bidirectional Encoder Representations from Transformers: BERT
- Supervised Autoencoder: SAE
- Pointwise Mutual Information: PMI
- Affective Norms for English Words: AFINN
- Linguistic Inquiry and Word Count: LIWC
- Named Entity Recognition: NER
- Global Vectors for word representation: GloVe
- Support Vector Machines: SVM
- Random Forests: RF
- Logistic Regression: LR
- Convolutional Neural Networks: CNN
- Long Short-Term Memory: LSTM
- Adaptive Boosting: ADABOOST
- Feed-Forward Neural Networks: FFNN
- Multinomial Naive Bayes: MNB
- Document frequency selection: DF
- Frequently co-occurring entropy: FCE
- Information Gain: IG
- Whale optimization: WO
- Genetic algorithms: GA
- Particle swarm optimization: PSO
- Hierarchical Attention Networks: HAN
- Bidirectional LSTM: Bi-LSTM
- Gated Recurrent Unit: GRU
- Spanish Billion Word Corpus and Embeddings: SBWCE
- The Catalonia Independence Corpus: CIC
Cite this article
Providel, E., Mendoza, M. Misleading information in Spanish: a survey. Soc. Netw. Anal. Min. 11, 36 (2021). https://doi.org/10.1007/s13278-021-00746-y
DOI: https://doi.org/10.1007/s13278-021-00746-y
Keywords
- Misleading information
- Rumor verification
- Stance classification
- Bot detection
- Fake news