Intellectual dark web, alt-lite and alt-right: Are they really that different? a multi-perspective analysis of the textual content produced by contrarians

Matos, Breno; Lima, Rennan C.; Almeida, Jussara M.; Gonçalves, Marcos A.; Santos, Rodrygo L. T.

doi:10.1007/s13278-023-01187-5

Original Article
Published: 25 January 2024

Volume 14, article number 32, (2024)
Social Network Analysis and Mining

Breno Matos¹,
Rennan C. Lima¹,
Jussara M. Almeida¹,
Marcos A. Gonçalves¹ &
…
Rodrygo L. T. Santos¹

Abstract

Contrarian groups, notably Intellectual Dark Web, Alt-lite, and Alt-right, are present across the Web, ranging from fringe websites to mainstream social media. Such a massive presence raises major concerns, as contrarians often engage in the spread of conspiracy theories and hate speech toward particular groups of people. Historically, there is a general sense that these groups exhibit different degrees of extremism, with Alt-right standing out as the most extremist one. In particular, prior work often takes participation in Alt-right communities as a proxy for radicalization. Yet, to what extent are these groups really different? While most previous analyses have focused on a content consumption (i.e., viewer) standpoint, no prior work has analyzed these groups (i.e., contrarians) from a content production perspective. Are there significant differences in the content produced by them? Toward tackling this question, we here analyze the textual data associated with videos shared by the three aforementioned groups. Specifically, we analyze 14 years of content produced by contrarians on YouTube with data from 355,000 videos. First, we assess the degree of toxicity of the content created by each contrarian group, comparing them to one another and, for control purposes, against traditional media content. The results show that all contrarian groups have a more skewed toxicity distribution than traditional media. Yet, all three groups exhibit very similar textual toxicity properties. Further analyses based on psycholinguistic properties and semantic (text) classification reinforce the observation that there is indeed great similarity among the content created by all three contrarian groups. These results suggest that, despite the different definitions, the three contrarian groups are much more similar, in terms of the content produced and shared by them, than the general wisdom (and literature) seems to suggest. Moreover, we also identify a significant temporal increase in content toxicity in all three groups, corroborating prior observations regarding the escalation in the harmfulness of online speech over the years.

Notes

1. https://www.perspectiveapi.com/.
2. https://github.com/brenomatos/contrarians.
3. https://mediabiasfactcheck.com/.
4. Further details on Perspective’s attributes are available at https://developers.perspectiveapi.com/s/about-the-api-attributes-and-languages.
5. Interestingly, differences between Media and the contrarian groups are less noticeable for the Threat attribute, which might be due to the nature of the news content often broadcast by the Media channels.
6. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html.
7. https://perspectiveapi.com/how-it-works/.
8. https://huggingface.co/prajjwal1/bert-tiny.
9. https://huggingface.co/docs/transformers/main_classes/trainer.

References

ADL (2017) Glossary terms alt-right. https://www.adl.org/resources/glossary-terms/alt-right
ADL (2019) From alt right to alt lite: naming the hate. https://www.adl.org/resources/backgrounders/from-alt-right-to-alt-lite-naming-the-hate
Alharthi R (2021) Recognizing hate-prone characteristics of online hate speech targets. In: Companion publication of the 13th ACM web science conference 2021, WebSci ’21 companion, pp 153–155, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450385251. https://doi.org/10.1145/3462741.3466676
Ali S, Saeed MH, Aldreabi E, Blackburn J, De Cristofaro E, Zannettou S, and Stringhini G (2021) Understanding the effect of deplatforming on social networks. In: Proceedings of the 13th ACM web science conference 2021, WebSci ’21, pp 187–195, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450383301. https://doi.org/10.1145/3447535.3462637
Arnold NA, Steer B, Hafnaoui I, H A Parada G, Mondragón RJ, Cuadrado F, and Clegg RG (2021) Moving with the times: Investigating the alt-right network gab with temporal interaction graphs. In: Proceedings of the ACM on human–computer interaction, 5(CSCW21). https://doi.org/10.1145/3479591
Atkinson DC (2018) Charlottesville and the alt-right: a turning point? Politics Groups Identities 6(2):309–315. https://doi.org/10.1080/21565503.2018.1454330
Bartlett J, Miller C (2012) The edge of violence: towards telling the difference between violent and non-violent radicalization. Terror Political Violence 24(1):1–21
Borum R (2011) Radicalization into violent extremism I: a review of social science theories. J Strateg Secur 4(4):7–36
Bryant LV (2020) The youtube algorithm and the alt-right filter bubble. Open Inf Sci 4(1):85–90. https://doi.org/10.1515/opis-2020-0007
Caetano J, Guimarães S, Araújo MMR, Silva M, Reis JCS, Silva APC, Benevenuto F, Almeida JM (2022) Characterizing early electoral advertisements on twitter: a Brazilian case study. In: Hopfgartner F, Jaidka K, Mayr P, Jose J, Breitsohl J (eds) Social informatics. Springer International Publishing, Cham, pp 257–272
Chipidza W (2021) The effect of toxicity on Covid-19 news network formation in political subcommunities on reddit: an affiliation network approach. Int J Inf Manag 61:102397. https://doi.org/10.1016/j.ijinfomgt.2021.102397
Coleman A (2022) Fact-checkers label youtube a “major conduit of online disinformation”. https://www.bbc.com/news/technology-59967190
Dalgaard-Nielsen A (2010) Violent radicalization in Europe: What we know and what we do not know? Stud Confl Terror 33(9):797–814
Das S (2023) Laughing bodies and the tickle machine: understanding the youtube pipeline through alt-right humour. J Cult Res, pp 1–15
de Andrade CM, Belém FM, Cunha W, França C, Viegas F, Rocha L, Gonçalves MA (2023) On the class separability of contextual embeddings representations—or “the classifier does not matter when the (text) representation is so good!’’. Inf Process Manag 60(4):103336. https://doi.org/10.1016/j.ipm.2023.103336
Devlin J, Chang M-W, Lee K, and Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
Finlayson A (2021) Neoliberalism, the alt-right and the intellectual dark web. Theory Cult Soc 38(6):167–190
Grant N (2022) Youtube may have misinformation blind spots, researchers say. https://www.nytimes.com/2022/11/05/technology/youtube-misinformation.html
Guimarães SS, Reis JCS, Ribeiro FN, and Benevenuto F (2020) Characterizing toxicity on facebook comments in Brazil. In WebMedia, WebMedia ’20, pp 253–260, New York, NY, USA. ACM. ISBN 9781450381963. https://doi.org/10.1145/3428658.3430974
Hafez M, Mullins C (2015) The radicalization puzzle: a theoretical synthesis of empirical approaches to homegrown extremism. Stud Confl Terror 38(11):958–975
Hoseini M, Melo P, Benevenuto F, Feldmann A, and Zannettou S (2023) On the globalization of the qanon conspiracy theory through telegram. In: Proceedings of the 15th ACM web science conference 2023, WebSci ’23, pp 75–85, New York, NY, USA. Association for Computing Machinery. ISBN 9798400700897. https://doi.org/10.1145/3578503.3583603
Hosseini H, Kannan S, Zhang B, and Poovendran R (2017) Deceiving Google’s perspective api built for detecting toxic comments. arXiv:1702.08138
Hosseinmardi H, Ghasemian A, Clauset A, Mobius M, Rothschild DM, and Watts DJ (2021) Examining the consumption of radical content on youtube. In: Proceedings of the national academy of sciences, 118(32): e2101967118. https://doi.org/10.1073/pnas.2101967118. https://www.pnas.org/doi/abs/10.1073/pnas.2101967118
Ingram M (2018) Most Americans say they have lost trust in the media. https://www.cjr.org/the_media_today/trustin-media-down.php
Jiao X, Yin Y, Shang L, Jiang X, Chen X, Li L, Wang F, and Liu Q (2020) TinyBERT: Distilling BERT for natural language understanding. In EMNLP, pp 4163–4174. ACL. https://doi.org/10.18653/v1/2020.findings-emnlp.372. https://aclanthology.org/2020.findings-emnlp.372
Kelsey D (2020) Archetypal populism: The “Intellectual Dark Web” and the “Peterson Paradox”, pp 171–198. Springer International Publishing, Cham. ISBN: 978-3-030-55038-7. https://doi.org/10.1007/978-3-030-55038-7_7
King M, Taylor DM (2011) The radicalization of homegrown Jihadists: a review of theoretical models and social psychological evidence. Terror Political Violence 23(4):602–622
Anti-Defamation League (2017) Funding hate: How white supremacists raise their money. New York
Lees A, Tran VQ, Tay Y, Sorensen J, Gupta J, Metzler D, and Vasserman L (2022) A new generation of perspective API: efficient multilingual character-level transformers. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, KDD ’22, pp 3197–3207, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450393850. https://doi.org/10.1145/3534678.3539147
Lewis R (2018) Alternative influence: broadcasting the reactionary right on Youtube
Li HO-Y, Bailey A, Huynh D, Chan J (2020) Youtube as a source of information on Covid-19: a pandemic of misinformation? BMJ Glob Health 5(5):e002604
Lima L, Reis JC, Melo P, Murai F, and Benevenuto F (2020) Characterizing (UN) moderated textual data in social systems. In: ASONAM, pp 430–434. IEEE
Malagoli LG, Stancioli J, Ferreira CHG, Vasconcelos M, Couto da Silva AP, and Almeida JM (2021) A look into covid-19 vaccination debate on twitter. In: 13th ACM web science conference 2021, WebSci ’21, pp 225–233, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450383301. https://doi.org/10.1145/3447535.3462498
Mamié R, Horta Ribeiro M, and West R (2021) Are anti-feminist communities gateways to the far right? Evidence from reddit and youtube. In: Proceedings of the ACM WebSci ’21
Manifest I The intellectual dark web. Undisclosed. https://web.archive.org/web/20190407170300/http://intellectualdark.website/
Marantz A (2017) The alt-right branding war has torn the movement in two. https://www.newyorker.com/news/news-desk/the-alt-rightbranding-war-has-torn-the-movement-in-two
Matos B, Lima RC, Almeida JM, Gonçalves MA, Santos RLT (2022) On the presence of abusive language in MIS/disinformation. In: Hopfgartner F, Jaidka K, Mayr P, Jose J, Breitsohl J (eds) Social informatics. Springer International Publishing, Cham, pp 292–304 (ISBN 978-3-031-19097-1)
McCauley C, Moskalenko S (2008) Mechanisms of political radicalization: pathways toward terrorism. Terrorism and political violence 20(3):415–433
McClernan N (2019) Steven pinker’s right-wing, alt-right & hereditarian connections
Milmo D (2022) Youtube is major conduit of fake news, factcheckers say. https://www.theguardian.com/technology/2022/jan/12/youtube-is-major-conduit-of-fake-news-factcheckers-say
Mittos A, Zannettou S, Blackburn J, De Cristofaro E (2020) “and we will fight for our race!” a measurement study of genetic testing conversations on reddit and 4chan. In: Proceedings of the international AAAI ICWSM 14:452–463
Moffitt B (2023) What was the ‘alt’ in alt-right, alt-lite, and alt-left? on ‘alt’ as a political modifier. Political Stud 00323217221150871. https://doi.org/10.1177/00323217221150871
Morstatter F, Shao Y, Galstyan A, and Karunasekera S (2018) From alt-right to alt-rechts: Twitter analysis of the 2017 german federal election. In Proceedings of the WWW ’18, WWW ’18, pp 621–628, Republic and Canton of Geneva, CHE. WWW ’18. ISBN: 9781450356404. https://doi.org/10.1145/3184558.3188733
Nagle A (2017) Kill All Normies: Online Culture Wars From 4Chan And Tumblr To Trump And The Alt-Right. Zero Books, Alresford, GBR. 1785355430
Neumann PR (2013) Options and strategies for countering online radicalization in the United States. Stud Confl Terror 36(6):431–459
Newman N, Fletcher R, Schulz A, Andi S, Robertson CT, and Nielsen RK (2021) Reuters institute digital news report 2021. Reuters Institute for the study of Journalism
Niu S, Mai C, McKim KG, and McCrickard S (2021) #teamtrees: Investigating how youtubers participate in a social media campaign. In: Proceedings of the ACM on human–computer interaction, 5(CSCW21). https://doi.org/10.1145/3479593
Nouh M, Nurse JR, and Goldsmith M (2019) Understanding the radical mind: identifying signals to detect extremist content on twitter. In: 2019 IEEE international conference on intelligence and security informatics (ISI), pp 98–103. https://doi.org/10.1109/ISI.2019.8823548
Obadimu A, Mead E, Hussain MN, and Agarwal N (2019) Identifying toxicity within Youtube video comment text data. In: SBP-BRiMS ’19, pp 214–223. Springer
O’Malley RL, Holt K, and Holt TJ (2022) An exploration of the involuntary celibate (incel) subculture online. https://doi.org/10.1177/0886260520959625. PMID: 32969306
Ottoni R, Cunha E, Magno G, Bernardina P, Meira Jr W, and Almeida V (2018) Analyzing right-wing Youtube channels: Hate, violence and discrimination. In: Proceedings of ACM WebSci, WebSci ’18, pp 323–332, New York, NY, USA. ACM. ISBN: 9781450355636. https://doi.org/10.1145/3201064.3201081
Pennebaker JW, Boyd RL, Jordan K, and Blackburn K (2015) The development and psychometric properties of liwc2015. Technical report
Resende G, Melo P, Reis JCS, Vasconcelos M, Almeida JM, and Benevenuto F (2019) Analyzing textual (MIS)information shared in Whatsapp groups. In: Proceedings of the 10th ACM conference on web science, WebSci ’19, pp 225–234, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450362023. https://doi.org/10.1145/3292522.3326029
Ribeiro MH, Ottoni R, West R, Almeida VA, and Meira Jr W (2020) Auditing radicalization pathways on Youtube. In: Proceedings of ACM FAT*, pp 131–141
Ribeiro MH, Blackburn J, Bradlyn B, De Cristofaro E, Stringhini G, Long S, Greenberg S, Zannettou S (2021) The evolution of the manosphere across the web. Proc ICWSM 15(1):196–207
Ribeiro MH, Jhaver S, Zannettou S, Blackburn J, Stringhini G, De Cristofaro E, and West R (2021b) Do platform migrations compromise content moderation? Evidence from r/the_donald and r/incels. In: Proceedings of the ACM on human–computer interaction, 5(CSCW2). https://doi.org/10.1145/3476057
Roose K (2019a) The making of a Youtube radical. The New York Times, 8
Roose K (2019b) The making of a Youtube radical. https://www.nytimes.com/interactive/2019/06/08/technology/youtube-radical.html
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Rye E, Blackburn J, and Beverly R (2020) Reading in-between the lines: an analysis of dissenter. In: Proceedings of the ACM internet measurement conference, IMC ’20, pp 133–146, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450381383. https://doi.org/10.1145/3419394.3423615
Sap M, Card D, Gabriel S, Choi Y, Smith AN (2019) The risk of racial bias in hate speech detection. In: Proc ACL
Sellars A (2016) Defining hate speech. Berkman Klein Center Research Publication 2016–20:16–48
Tang L, Fujimoto K, Amith MT, Cunningham R, Costantini RA, York F, Xiong G, Boom JA, Tao C (2021) “Down the rabbit hole” of vaccine misinformation on youtube: network exposure study. J Med Internet Res 23(1):e23262. https://doi.org/10.2196/23262
Tausczik YR, Pennebaker JW (2010) The psychological meaning of words: Liwc and computerized text analysis methods. J Lang Soc Psychol 29(1):24–54
Thorburn J, Torregrosa J, and Panizo Á (2018) Measuring extremism: validating an alt-right twitter accounts dataset. In: IDEAL 2018, pp 9–14. ISBN: 978-3-030-03496-2
Ul Rehman Z, Abbas S, Khan MA, Mustafa G, Fayyaz H, Hanif M, and Saeed MA (2021) Understanding the language of ISIS: an empirical approach to detect radical content on twitter using machine learning. Comput Mater Continua, 66(2)
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(86):2579–2605
Viegas F, Alvim MS, Canuto SD, Rosa T, Gonçalves MA, Rocha L (2020) Exploiting semantic relationships for unsupervised expansion of sentiment lexicons. Inf Syst 94:101606. https://doi.org/10.1016/j.is.2020.101606
Viegas F, Cunha W, Gomes C, Júnior APDS, Rocha L, and Gonçalves MA (2020b) Cluhtm—semantic hierarchical topic modeling based on cluwords. In: D. Jurafsky, J. Chai, N. Schluter, and J. R. Tetreault, editors, Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020, pp 8138–8150. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.724
Vig J (2019) A multiscale visualization of attention in the transformer model. In: Proceedings of the 57th annual meeting of the association for computational linguistics: system demonstrations, pp 37–42, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-3007. https://www.aclweb.org/anthology/P19-3007
Weiss B, and Winter D (2018) Meet the renegades of the intellectual dark web. The New York Times, 8
Willingham A (2018) Middle school teacher secretly ran white supremacist podcast, says it was satire. CNN News
Winter A (2019) Online Hate: from the Far-Right to the ‘Alt-Right’ and from the Margins to the Mainstream, pp 39–63. Springer International Publishing, Cham. ISBN: 978-3-030-12633-9. https://doi.org/10.1007/978-3-030-12633-9_2
Wolfowicz M, Litmanovitz Y, Weisburd D, Hasisi B (2020) A field-wide systematic review and meta-analysis of putative risk and protective factors for radicalization outcomes. J Quant Criminol 36:407–447
Zannettou S, Bradlyn B, De Cristofaro E, Kwak H, Sirivianos M, Stringhini G, and Blackburn J (2018) What is gab: a bastion of free speech or an alt-right echo chamber. In: Proceedings of WWW ’18, WWW ’18, pp 1007–1014, Republic and Canton of Geneva, CHE. WWW ’18. ISBN: 9781450356404. https://doi.org/10.1145/3184558.3191531
Zannettou S, Elsherief M, Belding E, Nilizadeh S, and Stringhini G (2020) Measuring and characterizing hate speech on news websites. In: Proceedings of the 12th ACM conference on web science, WebSci ’20, pp 125–134, New York, NY, USA. Association for Computing Machinery. ISBN: 9781450379892. https://doi.org/10.1145/3394231.3397902
Zighed DA, Lallich S, and Muhlenbach F (2002) Separability index in supervised learning. In: PKDD, volume 2, pp 475–487. Springer

Acknowledgements

We thank Ribeiro et al. (2020) for kindly sharing the dataset with us. This work was partially supported by the authors’ individual grants from CNPq, CAPES, and FAPEMIG.

Author information

Authors and Affiliations

Universidade Federal de Minas Gerais, P.O. Box 1212, Belo Horizonte, Brazil
Breno Matos, Rennan C. Lima, Jussara M. Almeida, Marcos A. Gonçalves & Rodrygo L. T. Santos

Contributions

All authors contributed to the study conception and design. Material preparation and data collection were performed by [BM]. Analyses for RQ1 and RQ2 were performed by [BM], while analyses for RQ3 were performed by [RCL]. The first draft of the manuscript was written by [BM] and [RCL], and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Breno Matos.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

This appendix provides additional information on some of the techniques used in this paper.

A.1 Perspective attributes

This section gives more detail on Perspective API. Perspective API scores a text according to the impact it may have on a reader and is widely used in the literature, as mentioned in Sect. 3 (Methodology). Perspective can provide scores for many attributes; in this work, we used it to infer scores for "toxicity," "severe toxicity," "insult," "profanity," "threat," and "inflammatory." Table 7 displays the definition of each attribute. Although Perspective also lets users request experimental attributes (i.e., attributes not yet thoroughly tested), we restricted our analysis to attributes used in production. Further details on additional attributes are available in Perspective’s documentation.^{Footnote 7}

Finally, although Perspective’s implementation is not open-source, their team has released information on how the current system was trained and deployed, including the pretraining of the model (Lees et al. 2022).
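As an illustration of how per-attribute scores might be requested and read, the following is a minimal sketch, not the authors' collection code. It assumes the JSON request and response layout of Perspective's `comments:analyze` method (`comment`, `requestedAttributes`, `attributeScores`, `summaryScore`); the helper names `build_request` and `extract_scores` and the sample response are hypothetical, and no network call is made.

```python
import json

# Attributes named in the text, written in Perspective's request spelling.
ATTRIBUTES = [
    "TOXICITY", "SEVERE_TOXICITY", "INSULT",
    "PROFANITY", "THREAT", "INFLAMMATORY",
]


def build_request(text, attributes=ATTRIBUTES, language="en"):
    """Build the JSON body for one scoring request (hypothetical helper)."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {name: {} for name in attributes},
    }


def extract_scores(response):
    """Map each attribute in a response to its summary score (hypothetical helper)."""
    return {
        name: body["summaryScore"]["value"]
        for name, body in response.get("attributeScores", {}).items()
    }


# Hand-written response for illustration only; these are not real API outputs.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.05, "type": "PROBABILITY"}},
    }
}

payload = build_request("An example video title")
print(json.dumps(payload, indent=2))
print(extract_scores(sample_response))
```

In an actual run, the payload would be sent to the API with a valid key, and the returned summary scores would be collected per video and per attribute.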

Table 7 Perspective’s definitions of analyzed attributes

A.2 LIWC

The foundation of Linguistic Inquiry and Word Count (LIWC) stems from extensive scientific research spanning decades, showcasing the capacity of language to offer profound insights into individuals’ psychological states, encompassing emotions, cognitive styles, and social concerns. While some connections are straightforward, like the use of positive words indicating happiness, such as "happy," "excited," and "elated," many relationships between verbal expression and psychology are less apparent. For instance, higher social standing and confidence are linked to elevated use of "you" words and reduced use of "me" words. LIWC relies on decades of empirical research and provides specialized means to comprehend, elucidate, and quantify psychological, social, and behavioral phenomena.

LIWC is a text analysis program that analyzes individual or multiple language files quickly and efficiently. It is designed to be transparent and flexible, allowing users to explore word use in various forms. LIWC is used in research to analyze the ways people use words when communicating, which can provide rich information about their beliefs, fears, thinking patterns, social relationships, and personalities. Further details on how LIWC was built are available in its documentation (Pennebaker et al. 2015). The extensive research employed in developing LIWC motivated us to use it in our methodology. In our work, we employed LIWC to automatically analyze each word of an input text, assigning it to a psycholinguistic class. LIWC then calculates the overall frequency of each of its categories in the input text. We relied on the frequency report returned by LIWC for the analyses of our second research question, applying only minor pre-processing, namely the removal of URLs and the conversion of all text inputs to lowercase.
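The pre-processing and category counting described above can be sketched as follows. This is only an approximation under stated assumptions: the real LIWC dictionary is proprietary, so `TOY_LEXICON`, its category labels, the regular expressions, and the function names are all hypothetical, and the frequencies are expressed as percentages of total words.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; the real LIWC dictionary is far larger.
TOY_LEXICON = {
    "happy": ["posemo"],
    "excited": ["posemo"],
    "hate": ["negemo", "anger"],
    "you": ["pronoun"],
    "me": ["pronoun"],
}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z']+")


def preprocess(text):
    """Remove URLs, then convert the text to lowercase."""
    return URL_PATTERN.sub(" ", text).lower()


def category_frequencies(text, lexicon=TOY_LEXICON):
    """Return each category's share of the words in `text`, in percent."""
    words = WORD_PATTERN.findall(preprocess(text))
    if not words:
        return {}
    counts = Counter()
    for word in words:
        for category in lexicon.get(word, []):
            counts[category] += 1
    return {cat: 100.0 * n / len(words) for cat, n in counts.items()}


freqs = category_frequencies("You make me HAPPY https://example.com/x")
print(freqs)
```

A word may belong to several categories at once (as "hate" does in the toy lexicon), so the percentages need not sum to 100.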

A.3 Embeddings

The first step of fine-tuning is to prepare the training data. The training data is very skewed (see Table 1), with the Media category containing the most entries. To avoid learning biases, we employed an under-sampling strategy to build a balanced subsample of the training set prior to the classification analysis. The sampling strategy randomly selects 17k entries from each category, a number based on the size of the smallest category (Alt-right), resulting in 68k entries for fine-tuning. Rather than focusing on the final accuracy of the classifier, which could benefit from more data, our main interest is in evaluating the model’s capability of discriminating among the categories under similar conditions. The model consists of a classification layer over the BERT Tiny pre-trained model,^{Footnote 8} which is chosen over Vanilla BERT due to resource limitations. Vanilla BERT has slightly superior classification effectiveness compared to BERT Tiny (Jiao et al. 2020), but has a much higher training cost.
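The under-sampling step can be illustrated with a short sketch. The function name `balanced_subsample`, the seed, and the toy category sizes below are assumptions for illustration; only the idea of drawing the same number of random entries from every category, bounded by the smallest one, comes from the text.

```python
import random
from collections import Counter, defaultdict


def balanced_subsample(entries, per_class=None, seed=0):
    """Draw the same number of random entries from every category.

    `entries` is a list of (text, label) pairs. By default the per-category
    sample size is the size of the smallest category.
    """
    by_label = defaultdict(list)
    for text, label in entries:
        by_label[label].append((text, label))
    smallest = min(len(items) for items in by_label.values())
    k = smallest if per_class is None else min(per_class, smallest)
    rng = random.Random(seed)
    sample = []
    for label in sorted(by_label):
        sample.extend(rng.sample(by_label[label], k))
    rng.shuffle(sample)  # avoid long runs of a single category
    return sample


# Toy sizes for illustration; the paper samples 17k entries per category.
toy_sizes = {"Media": 50, "Alt-lite": 20, "IDW": 15, "Alt-right": 8}
toy_entries = [
    (f"{label} text {i}", label)
    for label, size in toy_sizes.items()
    for i in range(size)
]
balanced = balanced_subsample(toy_entries)
print(Counter(label for _, label in balanced))
```

With four categories and 17k entries drawn from each, the same procedure yields the 68k fine-tuning entries mentioned above.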

We employ a five-fold cross-validation procedure to assess the classification model’s discriminative capability. Data is split into five partitions, with four used for training and one for testing. The procedure is repeated five times with different training/test partitions, and the reported results are averages over the five test partitions. The model is trained for five epochs with a maximum input size of 512 tokens, the standard maximum for BERT-like model implementations, and a batch size of 16 entries, the largest allowed by our resource restrictions. Other parameters are the defaults of HuggingFace’s Trainer,^{Footnote 9} representing standard values. We use the [CLS] token output to capture contextual embedding representations for all entries of the balanced dataset sample. BERT represents a sentence as a sequence of hidden states, which must be reduced to a single vector for downstream tasks. Therefore, BERT prepends a [CLS] token (short for “classification”) to each sentence, and we use the straightforward method of taking the hidden state corresponding to this first token.
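The evaluation protocol and the [CLS] pooling can be sketched in plain Python. This is a schematic under assumptions, not the authors' training code: `five_fold_splits`, `cross_validate`, `cls_embedding`, and the dummy scoring function are hypothetical names, and the actual fine-tuning of the classifier is replaced by a caller-supplied `evaluate` function.

```python
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


def cross_validate(n_items, evaluate, n_folds=5, seed=0):
    """Average the score returned by `evaluate(train, test)` over all folds."""
    scores = [
        evaluate(train, test)
        for train, test in five_fold_splits(n_items, n_folds, seed)
    ]
    return sum(scores) / len(scores)


def cls_embedding(hidden_states):
    """Reduce a sequence of per-token vectors to the first ([CLS]) vector."""
    return hidden_states[0]


def dummy_evaluate(train, test):
    """Stand-in scorer; a real run would fine-tune on `train`, score on `test`."""
    return len(train) / (len(train) + len(test))


print(cross_validate(20, dummy_evaluate))
print(cls_embedding([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]))
```

Every entry appears in exactly one test partition, so the averaged score reflects the whole balanced sample.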

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Matos, B., Lima, R.C., Almeida, J.M. et al. Intellectual dark web, alt-lite and alt-right: Are they really that different? a multi-perspective analysis of the textual content produced by contrarians. Soc. Netw. Anal. Min. 14, 32 (2024). https://doi.org/10.1007/s13278-023-01187-5

Received: 17 July 2023
Accepted: 17 December 2023
Published: 25 January 2024
DOI: https://doi.org/10.1007/s13278-023-01187-5
