1 Introduction

The COVID pandemic has been a pervasive issue since 2020, featuring as an almost inescapable topic in various types of discourse. Emerging in late 2019, the new virus had spread globally by early 2020 and drastically changed all aspects of our lives, e.g. how we conduct ourselves, how we dress, and how we socialize. Furthermore, it changed healthcare, politics, the economy, education, the work environment, and communication. The virus’s unknown origins, mechanisms, means of spreading, short- and long-term effects on health, and other potential impacts confronted us with a complex and obscure topic replete with uncertainties. To make sense of it, we have, among other coping mechanisms and adaptive strategies, frequently resorted to metaphors, much as we do when faced with other abstract, obscure and/or complex concepts. According to Conceptual Metaphor Theory (CMT; Johnson, 1987; Lakoff & Johnson, 1980) and its many ensuing developments (e.g. Semino, 2016; Gentner et al. 2001; Gibbs, 1994, 2017; Kövecses, 2005; Kövecses, 2020; Steen, 2011), metaphors are not only figures of speech on the linguistic level but can also act as powerful tools for various communicative and/or cognitive goals. They enable us to present a topic (domain) that may be difficult to understand by drawing an analogy to a more understandable, familiar topic (domain). They also allow us to implicitly express emotions and opinions through the metaphor’s implications: for example, one could succinctly say COVID wave and mean a large number of COVID infections that are uncontrollable, hard to treat and contain, and as such undesirable.

Because COVID is a global phenomenon that has trickled into people’s personal lives, investigating their (changing) attitudes and reactions to the arrival of the virus, its development, and the measures taken against it would provide insights into how people experienced the situation and could potentially serve as a guide for health, media and political communication in the future. The COVID pandemic has, unsurprisingly, been investigated in numerous studies of social media relating to the spread of information and misinformation, sentiment analysis, human mobility tracking and others (a comprehensive review of studies, datasets and unresolved issues is provided by Huang et al., 2022). Because metaphors are such a powerful tool for representation and understanding, they are also a worthwhile avenue for research. However, apart from Wicke and Bolognesi (2021), Wicke and Bolognesi (2020), and Abdo et al. (2020) for English, posts by the general public on social media have not yet been sufficiently analysed for COVID-related metaphors, especially in less-resourced languages such as Slovene.

To shed light on the metaphorical language of social media users and, foremost, to investigate potential differences between users of different languages and countries, our work focuses on the language of Twitter communication. The dataset consists of more than 2,000 tweets in Slovene and English, which are complemented with more than 4,000 annotations. The choice of Twitter data was guided by several reasons. First, Twitter is a frequent choice among researchers, primarily because of the ease of access to data. Although Facebook is the most popular platform with the highest number of users globally, access to information through its application programming interface (API) is much more restricted. Access to Twitter data, on the other hand, is made easy and straightforward through the Twitter API, which allows developers and researchers to retrieve tweets with all associated metadata. Secondly, as a microblogging service, Twitter is used primarily as a textual medium, whereas other platforms may feature and encourage more visual content such as videos and images. The posts (tweets) are limited in length (now allowing up to 280 characters), which makes them brief and comparable with each other. For this study, we applied for Academic access, which allows for fetching up to 10 million tweets. However, Twitter’s Terms and Conditions stipulate that data collected through the API can only be redistributed in the form of Tweet IDs.

2 Background

Metaphor has traditionally been recognized as a figure of speech on the level of language. In that capacity, metaphor equals (re-)naming, transferring the name of one thing to another on the basis of some similarity. In the contemporary view based on Conceptual Metaphor Theory (CMT; Lakoff & Johnson, 1980; 2003), metaphor is seen not just as a linguistic but as a cognitive phenomenon. CMT posits that metaphorical expressions (on the language level) stem from certain recurring patterns, called conceptual metaphors. Metaphor in CMT is defined as a primarily cognitive device that maps concepts and structures from a source domain, which is typically more concrete and familiar, to a target domain, which is typically more abstract and unfamiliar. In this way, it allows us to understand one domain of experience in terms of another. Metaphor as a cognitive phenomenon manifests itself in linguistic metaphors (or in other forms such as visual metaphors; Steen, 2018). Thus, metaphorical expressions are regarded as surface manifestations of underlying conceptual metaphors. For example, the following sentences contain metaphorical expressions that can be regarded as stemming from the conceptual metaphor LOVE is a JOURNEY:

  • They are at a crossroads in their relationship.

  • This relationship isn’t going anywhere.

  • They’re in a dead-end relationship.

Our basic experiences of journeys and trips allow us to easily understand the situations referred to by the examples. We realize that we have to decide where to turn or how to proceed at a crossroads, we know progression happens by going somewhere, and we know that there is no way forward in a dead-end street.

CMT has been an extremely influential view that has boosted metaphor research in the past few decades. However, metaphor research still faces many challenges. For one, the field long lacked a robust procedure for identifying metaphors in language. For English, a group of researchers developed MIP (Metaphor Identification Procedure; Pragglejaz Group, 2007), which later evolved into the more detailed MIPVU (MIP Vrije Universiteit; Steen, 2010). However, such identification approaches require a lot of time and effort from annotators, as they involve reading the whole text, separating it into lexical units and only then deciding for each unit whether it is metaphorical or not. To reduce the effort of manual annotation, researchers can apply more targeted approaches from corpus linguistics that involve searching for only a specified set of words in the corpus, or use automatic computational methods. Another open problem, at the core of metaphor analysis, is the ascription of source and target domains that form conceptual metaphors. The term “domain” is not clearly defined by the originators of CMT; in cognitive linguistics, however, it is defined as “a coherent area of conceptualization relative to which semantic units may be characterized” (Langacker, 1987, p. 488). The concept can seem quite similar to that of a lexical field. However, as Cameron (2003) notes, contrary to the latter, domains are “not just a collection of concepts or entities” but also encompass the various meaningful relations between the entities. That is, while lexical fields group words and phrases on a linguistic, lexical level, the concepts the words evoke are grouped and interconnected on a much richer conceptual and cognitive level.
It is often unclear at which precise level to formulate these domains that construe conceptual metaphors (Cameron, 2003; Kövecses, 2017; Kövecses, 2020), and some metaphor researchers may instead use other, more specific conceptual constructs such as mental space, scene, frame, script, schema etc. In this study, the metaphorical analysis is made on the general level of domains as defined by Kövecses (2020). That is, we selected broad conceptual domains, i.e. WAR, STORM, TSUNAMI and MONSTER, and captured them via the proxy of a lexical field, a group of lexical units that evoke those domains. At present, we do not try to identify or distinguish between particular frames or other conceptual constructions that may be instantiated via metaphors, which involve conceptually richer information, with specific roles and relations.

3 Related work

In this section, we present previous work that relates to three main aspects of our study: (1) metaphor identification, (2) existing metaphor datasets, and (3) studies of COVID-related metaphors in particular.

3.1 Metaphor identification approaches

Linguistic and conceptual metaphor identification approaches in non-annotated corpora vary in their methods and scope. First, we can differentiate between what Brdar et al. (2020) call ‘census’ and ‘sampling’ approaches. Census approaches are bottom-up: they start from the text and identify metaphorically used words, either manually or automatically. A completely manual approach, such as the MIPVU procedure (Steen, 2010), involves careful reading of texts in their entirety, separating each text into lexical units, and deciding for each unit whether it is used metaphorically or not. This approach is only feasible for smaller corpora or when a large number of annotators can be enlisted. Sampling approaches, in contrast, adopt a top-down perspective. Here, some sort of filtering is applied to texts, either by looking for examples based on metaphorical signals (Goatly, 1997) or by limiting the search to selected conceptual metaphors (or domains) (Stefanowitsch, 2006), and the results are supplemented with manual annotation. The search can target source domain vocabulary, target domain vocabulary, or sentences (or other units) containing lexical items from both target and source domains. The last option is considered to provide a good balance between coverage, accuracy, time and effort compared to other manual or semi-automatic approaches (Stefanowitsch, 2006, p. 4).
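The third sampling strategy, retrieving units that contain lexical items from both the source and the target domain, can be illustrated with a minimal sketch. The two word lists and the example sentences are hypothetical toy data, not the lexical fields used in any of the cited studies, and matching on raw word forms (without lemmatization) is a simplifying assumption. The retrieved sentences are only candidates and would still need manual annotation for metaphoricity.

```python
import re

# Hypothetical toy lexical fields; real studies use far larger, curated lists.
SOURCE_WAR = {"war", "battle", "fight", "enemy", "frontline"}
TARGET_COVID = {"covid", "coronavirus", "virus", "pandemic"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_candidate(sentence, source=SOURCE_WAR, target=TARGET_COVID):
    """Return True if the sentence contains items from both lexical fields."""
    tokens = set(tokenize(sentence))
    return bool(tokens & source) and bool(tokens & target)


# Invented example sentences, for illustration only.
sentences = [
    "We will win the battle against the virus.",
    "The war in the region continues.",
    "Covid cases are rising again.",
]

# Only the first sentence mentions both a WAR item and a COVID item.
candidates = [s for s in sentences if is_candidate(s)]
print(candidates)
```

Requiring both fields filters out literal uses of source vocabulary in unrelated contexts (the second sentence) as well as non-figurative mentions of the target (the third).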

From a computational perspective, many efforts have been made, especially in English, to develop methods to identify metaphors (or figurative language in general) through more automatic means. Extensive reviews of metaphor processing are provided in Shutova (2011) and Rai and Chakraverty (2020). Earlier approaches include those using hand-coded knowledge (Fass, 1991), language resources (Gedigian et al., 2006; Krishnakumaran & Zhu, 2007), psycholinguistic features such as abstractness of words (Turney et al. 2011), similarity- or relatedness-based clustering (Birke & Sarkar, 2006; Shutova et al. 2010), and topic modelling (Broadwell et al., 2013; Heintz et al., 2013; Strzalkowski et al., 2013). From the development of deep learning with neural networks, the field of automatic metaphor identification has shifted its focus to supervised methods that involve training neural models on metaphor-annotated datasets (e.g. Choi et al., 2021; Do Dinh & Gurevych, 2016; Haagsma & Bjerva, 2016; Liu et al. 2020; Rai et al. 2016; Zayed et al. 2020b). Computational approaches to metaphor identification also differ in their level of processing, which can be carried out on the level of words, relations, specific constructions, or sentences. In the first case, metaphoricity is ascribed to individual words, so the task usually involves labelling every token in the text (as in e.g. Choi et al., 2021; Do Dinh & Gurevych, 2016). In relation-level approaches, groups of syntactically related words are considered, usually containing expressions from both source and target domains. Most approaches, such as those by Shutova et al. (2010) and Shutova et al. (2016) tackle VERB-NOUN relations where the verb is metaphorical, and others such as Tsvetkov et al. (2014), Turney et al. (2011), Gutiérrez et al. (2016), Bizzoni et al. (2017) focus on ADJ-NOUN relations where the adjective is metaphorical. Some address both relation types (Rei et al. 2017; Zayed et al. 2018; Zayed et al., 2020b). 
However, there are also other common constructional patterns identified in corpus studies (Sullivan, 2013), including copula constructions (NOUN is NOUN, e.g. COVID is war), prepositional constructions (NOUN of NOUN, e.g. wave of poverty), and domain constructions where the noun is metaphorical (ADJ NOUN, e.g. political monster). These have attracted only a few computational endeavours (Dodge et al. 2015; Krishnakumaran & Zhu, 2007; Rai & Chakraverty, 2017) despite the usefulness of constructions in determining conceptual domains (Sullivan, 2013).
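To make the three constructional patterns concrete, the following sketch scans a part-of-speech-tagged token sequence for adjacent NOUN is NOUN, NOUN of NOUN and ADJ NOUN sequences. It is an illustration under strong assumptions: the input is taken to be already tagged (the tags here are assigned by hand), the function name and pattern labels are hypothetical, and matching on surface adjacency ignores the syntactic relations that a dependency parser would provide. A match is a candidate construction only; it says nothing about whether the expression is metaphorical.

```python
def find_constructions(tagged):
    """Scan (word, POS) pairs for three simple surface patterns."""
    matches = []
    for i, (word, pos) in enumerate(tagged):
        # ADJ NOUN: two adjacent tokens, e.g. "political monster".
        if pos == "ADJ" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            matches.append(("ADJ-NOUN", f"{word} {tagged[i + 1][0]}"))
        # NOUN is NOUN / NOUN of NOUN: three adjacent tokens.
        if pos == "NOUN" and i + 2 < len(tagged) and tagged[i + 2][1] == "NOUN":
            link = tagged[i + 1][0].lower()
            phrase = f"{word} {tagged[i + 1][0]} {tagged[i + 2][0]}"
            if link == "is":
                matches.append(("COPULA", phrase))
            elif link == "of":
                matches.append(("NOUN-of-NOUN", phrase))
    return matches


# Hand-tagged versions of the three examples given in the text.
copula = [("COVID", "NOUN"), ("is", "AUX"), ("war", "NOUN")]
prepositional = [("a", "DET"), ("wave", "NOUN"), ("of", "ADP"), ("poverty", "NOUN")]
domain = [("political", "ADJ"), ("monster", "NOUN")]

for example in (copula, prepositional, domain):
    print(find_constructions(example))
```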

While the field of figurative language processing has made considerable progress in English and other well-resourced languages, low-resourced languages such as Slovene unfortunately lag far behind. We are aware of only a few (semi-)automatic approaches. Although not specifically addressing metaphors, Škvorc et al. (2021) construct MICE, a neural model trained to discern figurative from literal usage of idiomatic phrases. More recently, Zwitter et al. (2022) investigate adapting the MICE model through transfer learning and use it to identify sentences containing metaphors in a corpus of migration-related news. In a semi-automatic approach, Brglez et al. (2021) looked for COVID is WAR metaphors in news discourse by extending the lexical field of WAR using word embeddings, thus capturing a wider set of items from the source domain. Computational metaphor processing in Slovene is thus still in its early stages, one reason being the lack of linguistic resources. In the next section, we discuss the availability of datasets in both English and Slovene.

3.2 Metaphor datasets

There is only a small number of metaphor datasets available that can be used either for large-scale linguistic analysis or to train deep-learning-based models for automatic metaphor identification. The subsections below describe the existing English and Slovene metaphor datasets.

3.2.1 English datasets

The largest and most widely used corpus, especially for metaphor identification, is the Vrije Universiteit Amsterdam Metaphor Corpus (VUAMC; Steen, 2010). It comprises 190,000 words of English text from four registers and identifies linguistic metaphors at the word level across various parts of speech (verbs, adjectives, nouns, adverbs, and prepositions). However, it has certain limitations for metaphor analysis, as it deals with word-level metaphors only. As opposed to relation-level approaches (Zayed et al., 2018), which try to capture both the source domain and the target domain in one phrase or syntactic relation, VUAMC only contains annotations for metaphoric expressions of the source domain. It does not relate them to their possible referents (expressions of the target domain) and is not annotated with conceptual domains. A few English corpora or datasets do account for phrase-level metaphors and/or conceptual domains. One group of research outputs includes five studies (Dodge et al., 2015; Gordon et al., 2015; Levin et al., 2014; Mohler et al., 2016; Shaikh et al., 2014) under the umbrella of an IARPA project that focus on metaphors related to societal issues and governance. Levin et al. (2014) create lists of conventional conceptual metaphors from previous literature and research on metaphor, in which they enumerate various syntactic patterns and lexical markers. This allows them to identify around 7500 English sentences (as well as sentences in Russian, Spanish and Farsi). Mohler et al. (2016) introduce the LLC datasets, compiled either manually or automatically, which focus on relation-level metaphors. Their approach focuses on so-called metaphoric constructions, which are syntactically related terms within a sentence that could relate to a source and a target domain.
For 80,100 such pairs, they provide metaphoricity, polarity and intensity ratings, as well as domain mappings for approximately 20,000 metaphoric pairs, in English as well as in Spanish, Russian and Farsi. The free dataset is reduced to around 9,000 annotated pairs (available upon request). Similarly, Gordon et al. (2015) design an annotation scheme and annotate around 1,500 sentences in detail for ontological categories, frames, frame elements, and affective polarity by combining manual and automatic methods. Shaikh et al. (2014) present one of the rare computational approaches that deal with metaphor in a context larger than a sentence. Based on a selected target topic of interest (such as Democracy), they identify 189,862 relevant passages (English) and assign metaphoricity and affect ratings to verbs, adjectives and nouns in the context window using topic modelling, dependency parsing, corpus analysis, WordNet and conceptual resources. They also assign various proto-source domains to the metaphors found. Another large resource stemming from the same line of research (although it is not a text corpus that can be used as a dataset) is the MetaNet Wiki repository (Dodge et al., 2015), which was also constructed from known conceptual metaphors, on the basis of lexical sets and syntactic constructions. A separate endeavour to annotate metaphors with conceptual domains is that of Shutova and Teufel (2010), who ascribe source and target domains to verbs in 761 sentences from various domains and genres (BNC), with 164 verb metaphors altogether.

There are other approaches that try to capture syntactic constructions and deal with metaphor on the level of relations, which would allow easier identification of source and target domain terms. However, the studies below do not (yet) try to assign conceptual metaphors. This line of research usually focuses on one to three constructions, the most common are VERB-NOUN and ADJECTIVE-NOUN constructions, in which the verb or the adjective is metaphorical and the noun acts as the target domain referent. Turney et al. (2011), Tsvetkov et al. (2014), and Gutiérrez et al. (2016) collect adjective-noun constructions in which the adjective is literal or metaphoric. The sets include 1768 and 8592 adjective-noun pairs, respectively. Shutova (2010) constructs a small set of 62 verbs with verb-subject and verb-object relations. To develop automatic metaphor identification, Shutova et al. (2016) adapt the MOH metaphor corpus (Mohammad et al. 2016) to MOH-X with explicit relations of metaphoric verbs to either a subject or a direct object, resulting in 647 verb-noun pairs out of which 316 are labelled metaphorical. Another more recent dataset, also going into the domain of user-generated texts, is Zayed’s tweet dataset (Zayed et al., 2019). It contains around 2,500 tweets with metaphoric verbs paired with their object. Some studies have also adapted the previous existing datasets to fit phrase-level approaches. Parde and Nielsen (2018) created a dataset of phrase-level metaphors sampled from the VUA corpus to provide novelty annotations. It contains around 18,000 metaphoric word pairs (containing V, ADJ, ADV, N, and PP metaphors). Zayed et al. (2020a) extend the dataset by Tsvetkov et al. (2014) with 1,800 tweets to provide context for the original ADJ-N metaphor pairs, and determine the subject or object relation for metaphoric verbs in 6000 sentences taken from VUAMC and TroFi (Birke & Sarkar, 2007).

Another subdomain of metaphor datasets concerns user-generated content on social media. Apart from Zayed et al. (2019) and Zayed et al. (2020a), few social media datasets exist that involve annotations for metaphors or other types of figurative language in computer-mediated user discourse. The dataset of Ghosh et al. (2015) and Li et al. (2014) was constructed for a SemEval-2015 task and is split into various figurative categories based on the hashtags of the tweets and expansion with LSA. Of these, 2000 tweets are labelled as metaphoric. Jang et al. (2014) annotate posts from an online breast cancer support group, a forum for gang members and a forum for online course participants (altogether 314 sentences). The work was continued by Jang et al. (2015), which resulted in around 2500 annotated posts with literal and metaphoric uses of 7 selected words. In a more specific vein of research, Yadav et al. (2020) construct a dataset of 3738 depressive tweets, which they annotate primarily for depression symptoms but also for sarcasm and metaphor.

3.2.2 Slovene metaphor datasets

Slovene, a language spoken by approximately 2 million people, is a less-resourced language, which is also reflected in the limited availability of metaphor datasets. Currently, only two metaphor corpora have been published, one released in 2020 and the other in 2022. The KOMET corpus (Antloga, 2020a) was developed to parallel the effort of the VUAMC corpus (Steen, 2010) in English, and is thus similar in size, genre makeup and annotation schema: it contains around 200,000 words from journalistic, fiction and on-line texts. In addition, it contains semantic/conceptual annotations and a separate label for metonymy. The metaphorically used words are (for the most part) annotated with one of 67 semantic frames. However, the corpus is annotated at the word level, meaning that no connection is made between the expression of the source domain (metaphor) and its target (the expression the metaphor refers to). The more recently released G-KOMET corpus (Antloga & Donaj, 2022) is an upgrade of KOMET, as it extends the genre coverage to include spoken texts. Similar to the VUAMC, both were designed as general corpora not specific to a particular topic, and thus allow metaphor analysis on a broader level.

3.3 Research relating to metaphors on COVID

The COVID pandemic has been a difficult, continuous and ever-evolving issue. From a point of view of linguistic and social studies, such events often produce interesting metaphors which give insight into how such situations are experienced and understood. A very common and conventional conceptual metaphor for disease-related events is ILLNESS IS WAR, attested in linguistic studies on Zika (Ribeiro et al. 2018), SARS (Chiang & Duann, 2007; Ibrahim, 2007; Wallis & Nerlich, 2005), AIDS and cancer (Sontag, 1977). The WAR domain is nowadays frequently used for a wide array of topics (Flusberg, Matlock, & Thibodeau, 2018), such as politics, sports, and societal issues. The metaphorical framing of the COVID pandemic and its various developments has also already elicited a lot of linguistic studies: in media (e.g. Brglez et al., 2021; Busso & Tordini, in press; Fernández-Pedemonte et al. 2020; Kalinin, 2021; Zhang et al. 2022), political discourse and health communication (e.g. Castro Seixas, 2021; Charteris-Black, 2021; Papamanoli & Kaniklidou, 2022), children’s books (Muelas-Gil, 2022), scientific articles (Dar, 2021), and, to an extent, also user-generated content such as Twitter (Abdo et al., 2020; Wicke & Bolognesi, 2020, 2021). The studies mostly focus on and reaffirm the predominance of the conceptual metaphor ILLNESS IS WAR. On the other hand, some linguists and social scientists show that many other alternative frames are possible, if not also more suitable (Hanne, 2022; Olza et al. 2021; Pérez-Sobrino et al. 2022; Semino, 2021; Wicke & Bolognesi, 2020).

A large proportion of these studies have investigated communication about COVID by politicians or the media, while less attention has been paid to the linguistic expression and understanding of COVID by the general public, i.e. in user-generated content on social media, with few exceptions. Colak (2022) asked Turkish users of Facebook, Instagram and Twitter to provide a post completing the prompt “COVID-19 is like _ because _”. They collected 125 responses and 84 valid metaphors covering a wide array of source domains. COVID was most frequently presented as an unwanted relative, love, an ex-partner, gossip, and cancer. The authors note that they did not observe the metaphor most frequently used by media and politicians at that time, namely “war” or “struggle”. Using the same prompt, Gök Uslu and Kara (2022) collected 210 responses from Turkish participants and identified 43 different metaphors spanning a wide array of domains. Among the 7 subcategories of metaphors, the most frequent one frames the virus as something deadly/dangerous. They also observe differences in the use of particular frames depending on gender and medical history.

Among large-scale social media studies that collected data indirectly, Abdo et al. (2020) analyse 14 days of Twitter data collected at the start of the pandemic using keywords such as “Corona”, “Coronavirus”, “COVID-19” and their synonyms. Part of their study detects metaphors by comparing the lexis of tweets to the MetaNet repository (Dodge et al., 2015) of known conceptual metaphors. The most frequently detected metaphor is DISEASE TREATMENT IS WAR. However, the focus of the paper is not on metaphors, so the metaphor identification procedure is not clearly explained; moreover, the authors do not investigate other, non-conventionalized metaphors. Wicke and Bolognesi (2020) collect English tweets published in a 14-day period in March and April 2020 using a set of COVID-related hashtags. To balance the corpus and make it more representative of the general population, they retain only the first tweet of a user per day. Using LDA topic modelling, they determine the most prevalent themes in the corpus. By compiling a list of 91 war-related words, they identify around 5.0% of tweets that contain WAR framing. By classifying tweets into the LDA-discovered topics, they conclude that the WAR frame is mostly used to talk about the treatment and proposed measures, but lacks presence in topics that refer to more social or personal aspects. They also explore the use of the alternative frames STORM, MONSTER and TSUNAMI but find these to be much less frequent in their data. However, their study only looks at the “surface” layer of words, collecting frequency data of potentially metaphorical seed words without assessing their actual metaphoricity in context.

In a subsequent study, Wicke and Bolognesi (2021) include a somewhat larger time span of tweets, namely from March 20 to July 1, 2020. They investigate the temporal change of topics related to COVID-19, the sentiment, subjectivity and figurative framing using the WAR frame. In the part focused on figurative framing, they analyse the frequency of war-related lexis overall and in three intervals. Their study finds that the distribution is not constant and that the use of the WAR frame slowly diminishes over time. On the other hand, in the last interval, they also see a rise of specific war-related words but determine that these are mostly used literally, relating to real-world violent events in the US.

A limitation of most studies of metaphors in COVID discourse so far, including in particular the discourse on Twitter, is that they are restricted to the initial phases of the pandemic, i.e. based on data produced in 2020. In our study, we are also interested in the overall development of metaphors through time (or at least over a wider time frame), and expect to see metaphors evolving, dying, adapting, emerging, becoming more or less popular etc. Studies have also often been limited to just one language and, more often than not, to only one country. As the studies show, metaphor frequency and the selection of particular conceptual domains can depend on several factors, from the time period relative to the course of events, the type of discourse, and individual personal factors such as gender and medical history, to country and culture. For instance, the use of metaphors for the SARS epidemic was different in the countries where the epidemic had stronger effects than in those that experienced it from a distance (Wallis & Nerlich, 2005). It has also been shown that certain frames are generally more likeable than others, so the frames used for COVID differ depending on the specific country context (Brugman et al., 2022). War-related metaphors were more or less avoided in Germany (Jaworska, 2020; Paulus, 2020) as well as in New Zealand, where the government communication relied more on the frames of LEVEL, BUBBLE and TEAM (Kearns, 2021). Our study takes inspiration from the work by Wicke and Bolognesi (2020, 2021), who took a quantitative approach and used semi-supervised methods to analyse the use and pervasiveness of different conceptual domains in framing the COVID pandemic on Twitter.
We complement past findings by overcoming some of the limitations of previous studies: the advantages of our approach are its multilinguality (in particular the inclusion of Slovene as a less-resourced language), a larger number of conceptual domains, a focus on user-generated content, a wider time span of data, and a distinction between different English-speaking countries. Additionally, our relation-level approach focuses on metaphorical expressions where the metaphor is conveyed by an adjective or a noun. Although prepositions and verbs are the parts of speech responsible for the largest portion of metaphors according to corpus studies (Antloga, 2020b; Cameron, 2003; Krennmayr & Steen, 2017), they have also been found to be less novel, i.e. more conventional (Do Dinh et al. 2018), and less deliberate (Reijnierse et al. 2019).

4 Methodology

In this section, we describe the methodology of compiling the dataset.

Sections 4.1 and 4.2 describe the collection and filtering of tweets for English and Slovene, respectively. In Sect. 4.3, we describe the various issues related to data normalization and cleaning. In Sect. 4.4, we address linguistic processing including tokenization, sentence segmentation as well as lemmatization and part-of-speech tagging with automatic linguistic pipelines. Finally, in Sect. 4.5, we describe our approach of extracting and annotating metaphoric expressions from the dataset.

4.1 English data collection

We employed the publicly available GeoCOV19Tweets Dataset (Lamsal, 2020a; 2021). This is a filtered sample from the original, larger COV19Tweets Dataset (Lamsal, 2020a; 2021) that contains only tweets with geolocation information. It consists of IDs of tweets that contained any of the COVID-19-related keywords or hashtags upon their publication on the Twitter platform, starting from March 19, 2020. The initial set of keywords contained the words “corona, #corona, coronavirus, #coronavirus” but was later expanded to include 46 different keywords or hashtags. The cut-off point for our dataset is January 31, 2022. We use the Hydrator (2020) tool to hydrate the tweets via the Twitter API, i.e. to retrieve the full text of each tweet and its associated metadata, such as user ID, geolocation, information on retweets and likes, and time of creation. Out of the 463,903 IDs provided, 401,452 (86.54%) could be retrieved; the rest had already been removed.

In the next step, we process the dataset to ensure approximately equal contributions by different users, following Wicke and Bolognesi (2021), in order to balance more productive and less productive users. We discard retweets and keep only one tweet per user per day according to the user ID information provided with each tweet. We then divide the English dataset by country, as identified by the tweet metadata. There are more than 200 countries present in the dataset, but the vast majority of countries contribute less than 1% of the total number of tweets. The distribution of countries is depicted in Fig. 1.
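The user-balancing step can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the dictionary keys (`user_id`, `created_at`, `is_retweet`) are hypothetical stand-ins for the corresponding tweet metadata, timestamps are assumed to be ISO 8601 strings in a single time zone, and keeping the earliest tweet of the day is an assumption borrowed from the first-tweet rule of Wicke and Bolognesi (2020).

```python
from datetime import datetime


def balance_users(tweets):
    """Drop retweets and keep the earliest tweet per (user, calendar day)."""
    kept = {}
    ordered = sorted(tweets, key=lambda t: datetime.fromisoformat(t["created_at"]))
    for tweet in ordered:
        if tweet["is_retweet"]:
            continue
        day = datetime.fromisoformat(tweet["created_at"]).date()
        # setdefault keeps the first tweet seen for this user and day.
        kept.setdefault((tweet["user_id"], day), tweet)
    return list(kept.values())


# Invented sample records, for illustration only.
sample_tweets = [
    {"id": 1, "user_id": "a", "created_at": "2020-03-19T08:00:00", "is_retweet": False},
    {"id": 2, "user_id": "a", "created_at": "2020-03-19T21:30:00", "is_retweet": False},
    {"id": 3, "user_id": "a", "created_at": "2020-03-20T09:00:00", "is_retweet": False},
    {"id": 4, "user_id": "b", "created_at": "2020-03-19T07:00:00", "is_retweet": True},
    {"id": 5, "user_id": "b", "created_at": "2020-03-19T10:00:00", "is_retweet": False},
]

# Tweet 2 is a second post by user "a" on the same day; tweet 4 is a retweet.
print([t["id"] for t in balance_users(sample_tweets)])
```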

Fig. 1 Proportion of tweets by country in the GeoCOV19Tweets dataset

Table 1 English subdataset sizes

We keep only the 9 most productive countries, all of which also have English as an official language: the United States, United Kingdom, Canada, India, Australia, South Africa, Philippines, Nigeria, and Ireland. After applying these steps, the English part consists of a total of 259,450 COVID-related tweets. The size of individual country-specific subdatasets is listed in Table 1.
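As an illustration of the country-based selection, the sketch below computes each country's share of tweets and keeps only the tweets from the most productive countries. The `country` key and the function names are hypothetical, and the sketch selects by tweet count alone; the additional criterion that English be an official language would have to be checked separately.

```python
from collections import Counter


def top_countries(tweets, n):
    """Return the n most frequent countries with their share of all tweets."""
    counts = Counter(t["country"] for t in tweets)
    total = sum(counts.values())
    return [(country, count / total) for country, count in counts.most_common(n)]


def keep_countries(tweets, countries):
    """Keep only tweets whose country is in the given collection."""
    allowed = set(countries)
    return [t for t in tweets if t["country"] in allowed]


# Invented sample records, for illustration only.
sample = (
    [{"country": "United States"}] * 3
    + [{"country": "United Kingdom"}] * 2
    + [{"country": "France"}]
)

shares = top_countries(sample, 2)
selected = keep_countries(sample, [country for country, _ in shares])
print(shares)
print(len(selected))
```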

4.2 Slovene data collection

For Slovene, at the time of our study, no consistent public Twitter stream of Slovene-only tweet IDs was available. A straightforward option we investigated was to use the Twitter API and apply a filter to retrieve all the tweets with the Slovene language tag ‘sl’. However, this produced very noisy results, yielding a large proportion of non-Slovene tweets. Instead, we accessed tweets obtained by Marko Plahuta (Footnote 7), a Slovene data analyst who has been regularly collecting tweets from user IDs recognized as Slovene (Kamenarič & Vorkapić, 2022). We then hydrated the tweet IDs using Hydrator to get the tweet text and associated metadata. The Slovene dataset covers the timespan from March 1, 2020, to January 21, 2022. The stream yielded 14,754,609 tweets by users recognized as Slovene from that period that could be hydrated. However, we noticed that many of these users also write complete tweets in other languages (especially English or German). In a further step, we therefore kept only tweets that are language-tagged as Slovene, which reduced the total to 9,974,340 tweets. Then, to narrow down the dataset to COVID-related tweets only, we collected the COVID keywords that were used to populate the English dataset by Lamsal (2021) and extended the original list with manual translations into Slovene. The list, shown in Table 2, contains 137 keywords and 31 phrases related to COVID. There are two reasons for the much larger number of keywords. First, Slovene users are prone to code switching, i.e. mixing Slovene and English (Reher & Fišer, 2018), and use international topic markers for such a global phenomenon, which is why we decided to keep a large portion of words from the original English set (e.g. ‘covid’, ‘vaccine’, ‘pandemic’). Second, several naming variants are possible for some of the keywords. For example, ‘social distancing’ does not have a single established equivalent in Slovene and can be translated in various ways: ‘socialna distanca’, ‘socialno distanciranje’, ‘socialna razdalja’, ‘medsebojna razdalja’, ‘ohranjanje razdalje’.

Table 2 Keywords and phrases used to filter the Slovene COVID-only tweets

Another issue with filtering Slovene tweets based on a keyword search is the highly inflectional nature of Slovene and the presence of diacritics. For instance, the basic dictionary form of ‘coronavirus’ in Slovene would be koronavirus. However, with six cases and three grammatical numbers, the word may appear in various forms in running text, such as koronavirusa, koronavirusom, koronavirusi. Additionally, some of the keywords we search for contain letters with a caron (č, š, ž), but social media users sometimes replace them with plain letters (c, s, z) (Footnote 8). To match all these forms, we take two measures. First, the keywords and phrases are lemmatized, and the words in the tweets are lemmatized and normalized with the CLASSLA (Footnote 9) library (Ljubešić & Dobrovoljc, 2019), so that matching is done between lemmas. Second, we allow matches between words with and without diacritics (e.g. the word pljucnica in the tweet matches pljučnica in the list of keywords). After gathering all COVID-related tweets, we discard retweets and keep one tweet per user per day according to the user ID provided in the metadata (as was also done to balance the English dataset). This reduces the size to 350,177 tweets.
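The diacritic-insensitive matching of lemmas can be illustrated with a short sketch. It assumes that the tweet has already been lemmatized (for example by CLASSLA) and only shows the comparison step for single-word keywords; the function names are hypothetical, and multi-word phrases are not handled here.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks, so that 'č' becomes 'c', 'š' becomes 's', etc."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def is_covid_related(lemmas, keywords):
    """True if any lemma matches a keyword, ignoring case and diacritics."""
    normalized_keywords = {strip_diacritics(k.lower()) for k in keywords}
    return any(strip_diacritics(l.lower()) in normalized_keywords for l in lemmas)

keywords = {"pljučnica", "koronavirus"}
print(is_covid_related(["imeti", "pljucnica"], keywords))  # True
```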

4.3 Data cleaning and normalization

Tweets are a form of computer-mediated communication (CMC), a discourse type known for unconventional and non-standard language (Fišer et al., 2018). Apart from non-standard language, such as the use of slang, dialect words, misspelled words, or shortened words, Twitter also features special semiotic signs (e.g. hashtags, emoticons, emojis) and references (e.g. user mentions, URLs). Processing Twitter corpora is thus not a straightforward task and requires calculated decisions, depending on the task at hand.

The data cleaning steps described below are taken for several reasons. Cleaning and normalization are carried out to filter out non-relevant information and tend to improve performance in several NLP tasks (Kaufmann, 2010; Satapathy et al., 2017; van der Goot & Çetinoğlu, 2021). The process can include various steps and depends on the ultimate goal. It can include lexical normalization (e.g. Baldwin et al., 2015), for instance unfolding acronyms and abbreviations to their full form (lol, 2mrw, srsly to laughing out loud, tomorrow, seriously) or deduplicating repeated characters from elongated words (coooool, aaaaaah to cool, aah). For sentiment analysis (e.g. Agarwal et al., 2011; Murshed et al., 2021), data preprocessing may also involve removing stopwords and replacing emoticons with associated emotions in text form. Some researchers opt for complete removal of emojis and newline characters (Alash & Al-Sultany, 2020). Authors may remove just the hashtag symbol from words or remove these instances altogether. Our main aim is to facilitate the identification of relation-level metaphors, meaning syntactic constructions such as NOUN-NOUN or ADJECTIVE-NOUN that contain a metaphorically used word, for example covid monster. To that end, the tweets have to be pre-processed in a way to allow for more accurate linguistic processing from tokenization, sentence segmentation, and lemmatization to part-of-speech tagging. This helps to attain a more comprehensive and accurate identification of metaphors later on. Secondly, we also consider the aspect of using the tweets as training data for a neural model. In such cases, it is customary to anonymize users, remove uninformative content, and reduce the size of the vocabulary (the number of unique tokens) by either removing rare tokens or replacing them with uniform tokens, such as replacing all the various emojis with \(\langle\)EMOJI\(\rangle\).

To clean and normalize the texts, we employ our own custom preprocessing script (code available on GitHub) based on regular expressions. In the first step, we replace certain less frequent tokens with special tokens. Although in some studies and uses of tweets, these less frequent tokens are completely removed, we find retaining them necessary as they can also perform syntactically relevant functions (Arhar Holdt, 2018). Emojis, for instance, can be used instead of sentence-ending punctuation.

We apply the following substitutions, examples of which are depicted in Table 3:

  • anonymize user mentions by changing @user to \(\langle\)USER\(\rangle\)

  • replace links with \(\langle\)URL\(\rangle\)

  • replace isolated numbers (Footnote 10) with \(\langle\)NUMBER\(\rangle\)

  • replace hashtag symbols (#) with \(\langle\)HASHTAG\(\rangle\)

  • replace emojis with \(\langle\)EMOJI\(\rangle\)

  • replace other symbols and non-Latin scripts with \(\langle\)SYMBOL\(\rangle\)

With regard to punctuation marks, we first separate them from the rest of the text by adding a whitespace character before and after. This step is applied because users on Twitter may use text shortening strategies, for instance omitting the space after a comma (Goli et al., 2016). Then, we remove all punctuation that does not mark clause or sentence boundaries, that is, everything except the comma, dot, question mark, exclamation mark, semi-colon and colon. We also deduplicate repeating punctuation (e.g. three dots ‘\(\ldots\)’ to one dot ‘.’). The preprocessing script then removes repeated whitespace and newline characters, compresses elongated words by removing character duplicates (aaaaaah to ah), and contracts dot-separated abbreviations (U.S. to US). We do not apply other lexical normalization steps related to abbreviations and non-standard spelling.
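The substitutions and punctuation rules above can be approximated with a few regular expressions. The sketch below is a simplified illustration, not the script published with the study. Several details are assumptions: the emoji ranges are rough, a number counts as "isolated" when it is not attached to letters, runs of three or more identical letters are reduced to one, and apostrophes and hyphens are removed like other punctuation. The \(\langle\)SYMBOL\(\rangle\) replacement is omitted, and the special tokens are written with plain angle brackets.

```python
import re
import unicodedata

KEPT_PUNCTUATION = ",.?!;:"
PLACEHOLDER = re.compile(r"(<(?:USER|URL|NUMBER|HASHTAG|EMOJI)>)")

def _clean_chunk(chunk):
    """Apply punctuation and elongation rules to text outside special tokens."""
    chunk = chunk.replace("\u2026", ".")                     # ellipsis character to dot
    chunk = re.sub(r"([^\W\d_])\1{2,}", r"\1", chunk)        # aaaaaah -> ah
    chunk = re.sub(r"([,.?!;:])\1+", r"\1", chunk)           # ... -> .
    chunk = re.sub(r"([,.?!;:])", r" \1 ", chunk)            # pad kept punctuation
    return "".join(                                          # drop other punctuation
        ch for ch in chunk
        if ch in KEPT_PUNCTUATION or not unicodedata.category(ch).startswith("P")
    )

def normalize(text):
    text = re.sub(r"https?://\S+", " <URL> ", text)
    text = re.sub(r"@\w+", " <USER> ", text)
    text = re.sub(r"#(?=\w)", " <HASHTAG> ", text)
    text = re.sub("[\U0001F300-\U0001FAFF\u2600-\u27BF]", " <EMOJI> ", text)
    # Contract dot-separated abbreviations such as U.S. -> US.
    text = re.sub(r"\b(?:[A-Za-z]\.){2,}", lambda m: m.group(0).replace(".", ""), text)
    # Numbers not attached to letters (covid19 is left untouched).
    text = re.sub(r"(?<!\w)\d+(?:[.,]\d+)*(?!\w)", " <NUMBER> ", text)
    parts = PLACEHOLDER.split(text)
    text = "".join(p if PLACEHOLDER.fullmatch(p) else _clean_chunk(p) for p in parts)
    text = re.sub(r"\s*[\r\n]\s*", " \n ", text)             # collapse newline runs
    text = re.sub(r"[ \t]+", " ", text)                      # collapse spaces
    return text.strip()

print(normalize("@bob Stay safe!!! U.S. cases: 100 #covid19 https://t.co/x"))
# <USER> Stay safe ! US cases : <NUMBER> <HASHTAG> covid19 <URL>
```

Splitting the text on the special tokens before cleaning punctuation keeps their angle brackets intact without needing variable-width lookbehind expressions.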

Table 3 Comparison of the original tweet and the tweet text after normalization

Special attention is paid to hashtags, that is, hashtagged words. Studies for both English (Wikström, 2014) and Slovene (Michelizza, 2018) show that hashtags perform several functions. One of the most frequent ones is categorizing, i.e. providing a topic to which the tweet belongs at the start or the end of the tweet. Moreover, they can also perform other functions, for instance, to emphasise words or to mark their expressiveness. In these cases, the hashtagged words are embedded in the syntax, that is, they are part of the sentences of the tweet text as shown in the example in Table 4.

Table 4 Example of a tweet using hashtags in different functions

These hashtagged words can also be part of metaphoric expressions. We treat one or two consecutive hashtagged words that are followed by other, non-hashtagged words in a sentence as ‘ordinary words’, and in these cases remove the \(\langle\)HASHTAG\(\rangle\) tokens completely. In other cases, where the hashtags appear after the sentence, meaning they are not followed by ‘ordinary’ words, or where there is a sequence of three or more hashtags, we keep the \(\langle\)HASHTAG\(\rangle\) tokens to act as boundaries between the hashtagged words. This rule helps to improve both recall (extracting more metaphoric expressions from sentences) and accuracy (avoiding the extraction of subsequent hashtagged words that are not syntactically related and should not be considered metaphor candidates).

4.4 Linguistic processing

The English tweets were processed with the Stanza pipeline (Qi et al., 2020). For Slovene, we use the CLASSLA pipeline (Ljubešić & Dobrovoljc, 2019), which includes two main models, one for standard and another for non-standard language. As Twitter discourse, especially in Slovene, frequently contains non-standard spelling, colloquialisms, slang etc., we use the non-standard model. The authors of the Stanza pipeline report F1 scores of 0.95 on UPOS tagging, 0.99 on tokenization, 0.81 on sentence segmentation, and 0.97 on lemmatization. The documentation for the CLASSLA fork for processing non-standard Slovene does not provide detailed evaluation metrics but does note that for standard Slovene, the F1 scores amount to 0.99 for sentence segmentation, 0.99 for lemmatization, and 0.98 for UPOS tagging. Inaccuracies at these processing stages can thus propagate to later steps. Usually, the pipelines are used for the full processing of blocks of text or documents, from tokenization and sentence splitting to lemmatization and other tagging tasks. However, because our dataset contains special tokens (such as \(\langle\)HASHTAG\(\rangle\)), the tweets cannot be passed to the pipeline as they are, because this would result in unwanted splitting (“\(\langle\)”, “HASHTAG”, “\(\rangle\)”). To avoid this, we skip the pipeline's tokenization step and tokenize the preprocessed text on whitespace. We also design rules for sentence segmentation that are tailored to the text of tweets. We consider the following as separators:

  • \n line feed and/or \r carriage return

  • Dot [.], question mark [?], exclamation mark [!], ellipsis [\(\ldots\)], pipe [|]

  • Sequence of three or more hashtagged words as a marker of a new sentence
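The separators listed above can be turned into a small rule-based splitter over whitespace tokens. The following is a sketch under stated assumptions, not the segmentation code used in the study: special tokens are written as `<HASHTAG>`, sentence-final punctuation stays attached to the sentence it closes, and a run of three or more hashtagged words is placed in a sentence of its own start (the text that follows the run, if any, continues that sentence).

```python
import re

SEPARATORS = {".", "?", "!", "\u2026", "|"}
LINE_BREAKS = {"\n", "\r"}
HASHTAG = "<HASHTAG>"

def hashtag_run_length(tokens, start):
    """Count consecutive '<HASHTAG> word' pairs beginning at index start."""
    count, i = 0, start
    while (i + 1 < len(tokens) and tokens[i] == HASHTAG
           and tokens[i + 1] != HASHTAG
           and tokens[i + 1] not in LINE_BREAKS):
        count += 1
        i += 2
    return count

def split_sentences(text):
    """Split normalized tweet text into sentences, each a list of tokens."""
    tokens = re.findall(r"[\r\n]|\S+", text)   # keep line breaks as tokens
    sentences, current = [], []

    def flush():
        if current:
            sentences.append(list(current))
            current.clear()

    i = 0
    while i < len(tokens):
        token = tokens[i]
        run = hashtag_run_length(tokens, i) if token == HASHTAG else 0
        if token in LINE_BREAKS:
            flush()
            i += 1
        elif run >= 3:
            flush()                             # the run starts a new sentence
            current.extend(tokens[i:i + 2 * run])
            i += 2 * run
        else:
            current.append(token)
            if token in SEPARATORS:
                flush()
            i += 1
    flush()
    return sentences
```

Token lists of this shape correspond to the pre-tokenized input that Stanza documents for its `tokenize_pretokenized=True` option; whether the study relied on this exact option is not stated in the text.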

4.5 Domain-driven metaphor extraction

Our method to extract metaphors follows the general methodology of corpus-based linguistics (Stefanowitsch, 2006) in that it starts off with a top-down approach, searching for particular lexical items of interest. Our initial approach consists of searching for tweets containing lexical items from the source domains used in ILLNESS (and COVID) metaphors. We use the sets already assembled by Wicke and Bolognesi (2020), which contain lexical units characterizing the conceptual domains of WAR, STORM, MONSTER and TSUNAMI. These four conceptual domains were chosen for their propensity to be used as source domains for metaphors related to the domain of ILLNESS. One can find equivalent examples in both languages, for instance: fight against HIV/boj proti HIV (WAR), clouds of depression/oblaki depresije (STORM), evil disease/zla bolezen (MONSTER), wave of infections/val okužb (TSUNAMI). However, unlike Wicke and Bolognesi, we lemmatize the seed words (unifying, for example, singular and plural forms) and discard some items that do not have a clear semantic domain associated with them (words such as force, surface) as well as overly specific words (such as thucydides). We also deduplicate the words that appear as lexical units in two domains (this was the case for the domains STORM and TSUNAMI, which share words such as wave, flood). Our seed set thus consists of 204 seed words for English (79 WAR, 42 STORM, 35 TSUNAMI, 48 MONSTER). We manually translate the English seed set into a comparable Slovene set of 225 seed words (85 WAR, 38 STORM, 41 TSUNAMI, 61 MONSTER) (Footnote 11). Both seed word lists are available in Appendix A. We then search the linguistically annotated Slovene and English tweets for any of the seed words or phrases, and thus narrow the data down to tweets that contain potential metaphors from the four conceptual domains.
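This first, tweet-level filter amounts to a lemma lookup per source domain. The sketch below uses a tiny illustrative subset of seed words, chosen from the examples in the text and from the domain names themselves; it is not the list in Appendix A, and the function name is hypothetical.

```python
# Illustrative subset only; the full seed lists are given in Appendix A.
SEED_WORDS = {
    "WAR": {"war", "fight", "battle"},
    "STORM": {"storm", "cloud"},
    "TSUNAMI": {"tsunami", "wave"},
    "MONSTER": {"monster", "evil"},
}

def source_domains(lemmas, seed_words=SEED_WORDS):
    """Return the source domains whose seed words occur among the tweet's lemmas."""
    present = {lemma.lower() for lemma in lemmas}
    return sorted(domain for domain, seeds in seed_words.items() if present & seeds)

print(source_domains(["the", "second", "wave", "of", "infection"]))  # ['TSUNAMI']
```

A tweet is kept as a candidate when the returned list is non-empty; this says nothing yet about whether the seed word is used metaphorically.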

Table 5 Frequency of tweets containing a lexical unit (LU) from one of the four metaphorical source domains

Table 5 shows the overall number of tweets found through this simple search for lexical units. Table 6 shows the distribution of the four conceptual domains in all the sampled datasets. The WAR domain is the most prevalent, with some countries more inclined to also use alternative domains (the UK, Australia and Ireland use fewer lexical units from the WAR domain than average, and the latter two also make more use of the TSUNAMI domain than the other country subsets). Although the numbers obtained with this approach approximate those in the study by Wicke and Bolognesi (2020), the expressions have to be examined in context to determine actual metaphoricity (or, conversely, mere literal use of the seed words). During manual annotation of a sample, we observe that in tweets collected only on the basis of the presence of (potentially metaphoric) seed words, the seed words are often used literally. Furthermore, we find cases where the tweet as a whole is not COVID-related but mentions COVID-related terms exclusively in the hashtags (as a contextual element). Both of these phenomena can be observed in the following tweet, where the seed words ‘rain’ and ‘water’ are used literally, and the COVID keyword ‘covid19’ is only used as an extra contextual marker (emphasis added):

Villagers standing by road eroded by rain water at Budura village near Bhergaon in Udalguri district on Saturday...#flood #road #floodwash #devastation #assamflood #covid19 #bhergaon #udalguri #imagesbyshajid\(\ldots\) https://t.co/PbNoPsQNAA

Table 6 Relative distribution of tweets over the four metaphorical source domains per language/country

Thus, as a second approach, we decide to only look for phrases that better signal the metaphorical usage of the seed word in relation to COVID. We search for relation-level metaphors in various constructions (Sullivan, 2013), similar to Dodge et al. (2015). Our approach consists of looking for phrases that contain both one of the seed words, which relate to the source domains (WAR, STORM, TSUNAMI, MONSTER), and one of the keywords used to populate the datasets, which relate to the target domain (COVID). The constructions have to satisfy the following regular expression pattern:

figure a

The above expression matches consecutive POS tags and specifies that the sequence must contain at least two words, a noun (NOUN) preceded by either an adjective (ADJ), a noun (NOUN) or a proper noun (PROPN). It also allows extra elements in between such as prepositions (ADP) and determiners (DET), and the possessive particle ’s (PART), thus capturing larger or more complete phrases. The advantage of this approach is clearly shown in Table 7, where only looking for ADJ-NOUN expressions would miss the metaphoric example second wave of the pandemic. The whole process, from the hydration of tweets to the semi-automatic metaphor extraction, is sketched out in Fig. 2 below.
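The matching of POS sequences and the weak annotation rule can be sketched as follows. The regular expression in this sketch is reconstructed only from the prose description above and should not be read as the exact expression used in the study. Each UPOS tag is mapped to a single character so that match offsets coincide with token indices; the names `candidate_phrases` and `weak_label` are hypothetical.

```python
import re

# One character per UPOS tag; every other tag becomes 'x'.
TAG_CODES = {"ADJ": "A", "NOUN": "N", "PROPN": "P",
             "ADP": "a", "DET": "d", "PART": "p"}

# One or more ADJ/NOUN/PROPN heads, each optionally followed by ADP/DET/PART,
# ending in a NOUN (so a match always spans at least two words).
PHRASE_PATTERN = re.compile(r"(?:[ANP][adp]*)+N")

def candidate_phrases(tagged):
    """tagged: list of (token, lemma, upos). Returns matching sub-lists."""
    codes = "".join(TAG_CODES.get(upos, "x") for _, _, upos in tagged)
    return [tagged[m.start():m.end()] for m in PHRASE_PATTERN.finditer(codes)]

def weak_label(phrase, seeds, keywords):
    """True if the phrase holds both a source-domain seed and a COVID keyword."""
    lemmas = {lemma.lower() for _, lemma, _ in phrase}
    return bool(lemmas & seeds) and bool(lemmas & keywords)

tagged = [("the", "the", "DET"), ("second", "second", "ADJ"),
          ("wave", "wave", "NOUN"), ("of", "of", "ADP"),
          ("the", "the", "DET"), ("pandemic", "pandemic", "NOUN"),
          ("hit", "hit", "VERB")]
for phrase in candidate_phrases(tagged):
    print(" ".join(tok for tok, _, _ in phrase),
          weak_label(phrase, {"wave"}, {"pandemic"}))
# second wave of the pandemic True
```

Note that a tag-level pattern cannot tell the possessive ’s apart from other particles tagged PART, so this reconstruction may admit slightly more sequences than intended.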

Table 7 Weak phrase-level metaphor annotation rule

Fig. 2 Visualization of the methodology

The process yields a much more constrained dataset: altogether 2001 potentially metaphoric phrases in 1960 English tweets, and 3375 such phrases in 3324 Slovene tweets. This corresponds to approximately 1% of all COVID-related tweets and to only around 15% of the tweets retrieved by the initial approach, which relied on source domain lexical items alone (Table 8).

Table 8 Subcorpora sizes

5 Manual annotation

Table 9 Examples of the manual annotation procedure
Table 10 Examples of the manual annotation procedure
Table 11 Examples of the manual annotation procedure

To validate our dataset, we also perform manual annotation. To reduce the size and still ensure good coverage, we construct a list of unique phrases, based on their lemmatized forms. For English, we find 1363 such phrases, and for Slovene, we find 1877. We randomly sample one tweet per unique phrase. Additionally, to obtain more balanced data and to uncover any other metaphoric expressions not covered by our approach, we also output all other constructions in that same tweet that satisfy the regular expression POS pattern. For both English and Slovene, the procedure was carried out by one person with a background in linguistics and metaphor research. Examples of completed annotation are illustrated in Table 9.
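The sampling of one tweet per unique lemmatized phrase can be illustrated with a short sketch. The input format (pairs of lemmatized phrase and tweet ID), the function name and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_one_tweet_per_phrase(hits, seed=0):
    """hits: iterable of (lemmatized_phrase, tweet_id) pairs.
    Returns a dict mapping each unique phrase to one randomly chosen tweet ID."""
    by_phrase = defaultdict(list)
    for phrase, tweet_id in hits:
        by_phrase[phrase].append(tweet_id)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {phrase: rng.choice(sorted(ids))
            for phrase, ids in sorted(by_phrase.items())}

hits = [("second wave of pandemic", 11), ("second wave of pandemic", 12),
        ("battle against covid", 13)]
sample = sample_one_tweet_per_phrase(hits)
print(len(sample))  # 2
```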

During annotation, the annotator first checked the validity of the POS-annotation. There were a few cases where the verbs were wrongly labelled as nouns or adjectives by the linguistic processing pipelines. In some cases, the automatically extracted phrase had to be corrected. As shown in the example in Table 9, list of corona warriors was shortened to corona warriors because the extra word was not necessary for the disambiguation between literal and figurative use.

Metaphoricity was ascribed to phrases that were clearly metaphorical from the provided context of the tweet (Footnote 12). Additionally, as our focus was COVID-related metaphors, the positive metaphor label ‘y’ was only given to metaphoric expressions directly related to the pandemic. As shown in the example in Table 9, the otherwise metaphoric expression struggle for workers rights was given a negative metaphor label ‘n’, together with an additional label (“not covid rel”) in the Comments column. Moreover, some phrases that were extracted from tweets were metaphors but did not include items from both domains. Such an example is the phrase shelter in the time of storm provided in Table 10. The phrase is metaphorical in the context of the tweet but does not demonstrate its metaphoricity on its own, that is, it could be used literally in another context. In such cases, we still labelled the phrase with a positive ‘y’ label, and supplemented it with a label ‘not relation level’ in the Comments column. Other interesting cases include reverse metaphors (Campbell & Katz, 2006), where the source and target domains are reversed (Table 11). For instance, in pandemic of violence, pandemic refers to the source domain of ILLNESS and violence to the target domain of VIOLENCE. This is in line with findings by Busso and Tordini (in press) in their analysis of Italian newspapers in the first few months of the pandemic, in which they find frequent examples of the domains of HEALTH and ILLNESS being used as a metaphorical source to frame the topics of ECONOMY and SOCIETY. They assert this overlap of conceptual domains mirrors “the intersection across the different aspects of the Covid pandemic: social, economic, and sanitary” (Busso & Tordini, 2021, p. 23). All these additional labels in the Comments column allow for easy modifications of the dataset for the desired use, for instance, including metaphors not related to COVID or excluding those that are not clearly metaphorical on the relation level.

6 Results and discussion

The results of the manual annotation are listed in Tables 12 (English data) and 13 (Slovene data). Both show that our procedure of extracting metaphors on the level of relations was reasonably accurate. In English, 69.45% of potentially metaphoric phrases, i.e. those containing lexical items from both source and target domains (S + T), were indeed confirmed as metaphoric. In Slovene, 81.21% of such phrases were metaphoric.

Table 12 Annotation on the English subcorpora, by phrase components
Table 13 Annotations on the Slovene subcorpus, by phrase components

The approach targeting a larger span of words also proved to be beneficial for identifying additional metaphoric lexis in such phrases, such as eye in eye of the coronavirus storm, dark in dark evil Corona, labirint [eng. labyrinth] in labirint boja z virusom [eng. labyrinth of the fight with the virus], or senca [eng. shadow] in senca boja z virusom [eng. shadow of the fight with the virus].

Moreover, as a result of the manual annotation process we discovered other metaphorical phrases not anticipated by our weak annotation rule, that is, those that did not contain lexical items from both the target and any of the source domains. These include, for instance, frontline workers, game-changer in English and zlom zdravstvenih sistemov, kolaboranti virusa, eksplozija okužb [eng. collapse of healthcare systems, collaborators of the virus, explosion of infections] in Slovene. In the Slovene example below, our POS-matching rule finds the underlined constructions ‘prvem valu’, ‘preklicu epidemije’, ‘odprtju mej podprtju’, ‘hrvaškega turizma’, ‘eksplozija okužb’, ‘sejo vlade’, ‘logike za boj proti epidemiji’. The weak annotation rule predicts that a metaphor is present in the last phrase, because it contains both a seed word ‘boj’ [fight] and a COVID keyword ‘epidemija’ [epidemic]. As a result of the manual annotation, however, we were able to also discover other metaphors, namely ‘prvem valu’ [first wave] and ‘eksplozija okužb’ [explosion of infections].

“<USER> Podpiral sem <USER> v prvem valu. Po preklicu epidemije , odprtju mej podprtju hrvaškega turizma, je bilo jasno da bo eksplozija okužb. Zdaj se obnaša kot, da bi pred vsako sejo vlade zaužili LSD. Zmedeno , brez logike za boj proti epidemiji. Mora vedet, da je zato odgovorna”

<USER> I supported <USER> in the first wave. After revocation of the epidemic , opening borders support of Croatian tourism, it was clear there was to be an explosion of infections. Now she acts as if before each government meeting they took LSD . Confused , without logic for the fight against the epidemic. She has to know, she is responsible for this

In the next example, the POS-matching rule finds the underlined constructions ‘vrhu lestvic’, ‘napačnih krajih RTV’, ‘prizorišča bojev proti Covid19’, ‘prazen bazen’, ‘sedanje koalicije’ [top of the charts, wrong places RTV, sites of fights against Covid19, empty pool, current coalition]. Only ‘bojev proti Covid19’ [fights against Covid19] would be detected by our weak annotation rule, while the manual annotation also discovered ‘vrhu lestvic’ (umrlih za covidom) [top of the charts (of covid deaths)], a metaphorical and sarcastic framing of the pandemic in the domain of GAMES or COMPETITION.

“<USER> bori se , a žal neuspešno smo na vrhu lestvic umrlih za covidom . \n Ali pa napačnih krajih RTV , STA , policija , NPU . niso prizorišča bojev proti Covid19, mar ne ? \n torej ima skakanje v prazen bazen enak efekt kot vladanje sedanje koalicije ! ”

<USER> they fight , but unfortunately without success we are at the top of the charts of covid deaths . \n Or at wrong places RTV, STA, the police, NPU. are not sites of fights against Covid19, right ? \n so jumping into an empty pool has the same effect that the administration of the current coalition !

Another example of a metaphorical use of the domain of GAMES/COMPETITION can be observed in the English tweet below. Three POS-matching structures can be found in the tweet: ‘healthier hand’, ‘potential game changer’ and ‘battle against covid’. The weak annotation rule predicts metaphoricity for ‘battle against covid’, because it contains a seed word ‘battle’ and a COVID keyword ‘covid’, while the manual annotation allowed us to also detect the metaphoric phrase ‘game changer’.

“ktla5news Could ya not find a healthier hand to demonstrate this potential game changer in the battle against covid? \n <HASHTAG> ugh. \n <HASHTAG> washyourhandsyoufilthyanimal at The Beautiful City Of Temecula! <URL>”

Examples of JOURNEY metaphors can be observed in the following two tweets, one Slovene and one English:

“Dokler ne bo tudi vladna stran prevzela del krivde nase v tej korona katastrofi, do takrat bomo capljali na mestu s tendenco nazadovanja . ”

Until the government side accepts a part of the guilt in this corona disaster, we will toddle in place with a tendency to regress.

“Earlier today, we delivered cartons of 3crownsmilk to NCDC office to serve health workers on the front line. We appreciate all the work of the NCDC as we take active steps to win the war against Covid <NUMBER> \n Lagos. <URL>”

Altogether, 65 additional metaphors were found in the English data, and 80 in Slovene. The most frequently recurring themes, found in both languages, can be grouped into the following four categories:

  • GAME/CHALLENGE/SPORTS: game changer, grim record, challenging conditions; konec neslavnih rekordov [end of inglorious records], na vrhu lestvic umrlih za covidom [at the top of the charts of covid deaths]; država na največji preizkušnji [the country before the biggest test/challenge], ekipno sodelovanje [team collaboration], del ekipe [part of the team]

  • JOURNEY/TRAVELLER: bold and drastic steps, active steps, journey of coronavirus; prosta pot [free/unhindered passage], vrnitev epidemije [return of the epidemic], bistveni premiki [important motions], bližnjice [shortcuts]

  • EXPERIMENT/GAMBLE: guinea pigs; korona eksperiment [corona experiment], eksperiment mehkega ukrepanja [soft action-taking experiment], vlečenje slamice [drawing a straw]

  • SUPERNATURAL/FICTION: super heroes, healthcare heroes, live action heroes without capes; čudežno cepivo [miracle vaccine], italijanski scenarij [the Italian scenario], bajke tipa Gates [Gates-like fables], heroji dela [work heroes]

When analysing phrases containing any of the source domain terms (those that contain both source and target domain lexis and those that contain only source domain lexis), the metaphoricity percentage depends on the particular domain involved. As shown in Tables 14 and 15, most of the extracted phrases that contained words from the source domains of WAR and TSUNAMI were in fact used metaphorically. Phrases containing WAR-related lexical items were labelled as metaphors in 80.91% of cases in English, and in 77.63% of cases in Slovene. Phrases potentially stemming from the source domain of TSUNAMI were labelled metaphoric in 74.52% of cases in English, and in 93.08% of cases in Slovene. In contrast, lexical units from the domains of STORM and MONSTER were mostly used literally. Only 19.21% of extracted phrases containing a lexical item from STORM were labelled metaphoric in English and 40.91% in Slovene. For the domain of MONSTER, around 30% of extracted phrases were metaphorical.
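The per-domain percentages are simple ratios of phrases labelled ‘y’ to all annotated phrases in a domain. A minimal sketch of this calculation is given below; the input format (pairs of domain and label) and the function name are assumptions, and the rows in the example are toy values, not data from the study.

```python
from collections import Counter

def metaphoricity_by_domain(rows):
    """rows: iterable of (domain, label) pairs with label 'y' or 'n'.
    Returns the percentage of 'y' labels per domain, rounded to two decimals."""
    totals, positives = Counter(), Counter()
    for domain, label in rows:
        totals[domain] += 1
        if label == "y":
            positives[domain] += 1
    return {d: round(100 * positives[d] / totals[d], 2) for d in totals}

# Toy rows for illustration only.
toy_rows = [("WAR", "y"), ("WAR", "n"), ("STORM", "n")]
print(metaphoricity_by_domain(toy_rows))  # {'WAR': 50.0, 'STORM': 0.0}
```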

Table 14 Annotation on the English subcorpora, by conceptual domain
Table 15 Annotations on the Slovene subcorpus, by conceptual domain

With regard to the syntactic type of constructions identified as metaphors, the most common were NOUN-NOUN metaphors in English, and NOUN-preposition-NOUN in Slovene. This is in line with previous studies concluding that the most frequent word classes among metaphors are verbs, nouns and prepositions, followed by adjectives and adverbs (Antloga, 2020b; Cameron, 2003; Krennmayr & Steen, 2017). We are not able to provide specific counts of each of the types for two reasons. First, linguistic processing pipelines produce errors, inter alia, in part-of-speech annotation, especially in non-standard text such as tweets. Second, many longer constructions were later shortened by hand, but their linguistic annotations (POS sequences) were not corrected.

Despite the promising results, our work has some limitations. Firstly, the presented dataset is a result of annotation from one person. Due to the inherent interpretative subjectivity that comes with annotating metaphors, the procedure requires clear guidelines and trained annotators which we were unable to promptly procure for the purposes of this experiment. We thus recognize that further annotation campaigns with a greater number of annotators could show different results. Secondly, our approach was limited to target expressions containing adjectives and nouns, which is why it naturally did not capture all metaphoric expressions nor all possible conceptual domains used for COVID-related metaphors. Although we uncovered some additional metaphoric expressions outside the conceptual domains of WAR, STORM, MONSTER or TSUNAMI, those were not annotated with their respective domains or even more specific conceptual frames. Nevertheless, with the out-of-domain metaphors discovered through manual annotation, the dataset can be further supplemented with conceptual annotations, and can also be used as training data for automatic identification approaches. This could provide even more data and a wider array of alternative metaphoric framings, and allow for a more detailed comparison between the metaphors used in different countries. Lastly, the main focus of the study was to investigate user-generated content on social media. However, we also came across tweets that would not constitute strictly user-generated content, that is, tweets written by public entities such as companies and media outlets. We could not distinguish between those as this is not (yet) provided in the metadata of the retrieved tweets.

7 Conclusion

While computer processing of metaphors has garnered quite a lot of attention in English and other more resourced languages, less-resourced languages such as Slovene have been far less explored. Our study brings a new Slovene and English resource that can be used either as a corpus for further mono-lingual or cross-lingual metaphor analysis or as a training dataset to create new models for automatic metaphor identification. The creation of this dataset also furthers research into relation-level metaphoric expressions by extending the scope from merely ADJ-NOUN and VERB-NOUN to other constructions. In our domain-driven metaphor extraction approach, we uncovered various metaphoric phrases that map the source domains of WAR, STORM, MONSTER and TSUNAMI to the topic of COVID. The manual annotation also allowed us to find additional interesting metaphoric expressions that were not anticipated by the initial method. With the out-of-domain metaphors discovered through manual annotation, the dataset can be used as training data for automatic identification approaches and potentially discover other relevant metaphors and conceptual domains. The methodology described in the article can be applied to other languages which do not have existing tools for automatic metaphor identification, and express metaphors in similar or comparable syntactic structures. It does, however, postulate the existence of a linguistic pre-processing pipeline (tokenization, part-of-speech annotation and lemmatization), and, especially for the informal Twitter setting, some method of normalization of non-standard language.

There are a few limitations to the study presented here. First, the manual annotation of metaphors was only performed by one person, so the annotations of this highly subjective phenomenon are not definitive or indisputable. To resolve this issue and corroborate our results, the annotation would have to be carried out again with at least three trained annotators. Secondly, the study was limited to identifying only certain syntactic constructions. However, metaphors can also be expressed through verbs, adverbs and prepositions. To capture a wider set of metaphorical expressions and consequently provide more evidence of metaphorical framing through the various source domains, our approach could be expanded to include phrases involving other parts of speech. Moreover, our metaphor extraction methodology was limited to four potential source domains. To uncover other topics and the associated lexis in the tweets, a possible avenue to explore would be to apply topic modelling techniques such as LDA (Blei et al., 2003) or Top2Vec (Angelov, 2020).

In our future work, we plan to take steps in several directions. The initial aim, which led to the construction of this dataset, was to compare the metaphorical framing of COVID in different parts of the world. With the annotated dataset, we can automatically find other instances of the unique phrases recognized as metaphors in the full English and Slovene COVID datasets. Thus, we plan to make a detailed comparison between different countries and languages with regard to metaphor frequency, source domains, and the periods in which the metaphors were used. In the future, we also plan to train a neural model on the labelled TCMeta dataset to automatically identify relation-level metaphors.