Irony Detection in a Multilingual Context

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12036)

Abstract

This paper proposes the first multilingual (French, English and Arabic) and multicultural (Indo-European languages vs. less culturally close languages) irony detection system. We employ both feature-based models and neural architectures using monolingual word representations. We compare the performance of these systems with state-of-the-art systems to identify their capabilities. We show that these monolingual models, trained separately on different languages using multilingual word representations or text-based features, can open the door to irony detection in languages that lack annotated data for irony.

Keywords

Irony detection · Social media · Multilingual embeddings

1 Motivations

Figurative language makes use of figures of speech to convey non-literal meaning [2, 16]. It encompasses a variety of phenomena, including metaphor, humor, and irony. We focus here on irony and use it as an umbrella term that covers satire, parody and sarcasm.

Irony detection (ID) has gained relevance recently, due to its importance for extracting information from texts. For example, to go beyond the literal matches of user queries, Veale enriched information retrieval with new operators to enable the non-literal retrieval of creative expressions [40]. Also, the performance of sentiment analysis systems decreases drastically when they are applied to ironic texts [5, 19]. Most related work concerns English [17, 21], with some efforts in French [23], Portuguese [7], Italian [14], Dutch [26], Hindi [37], Spanish variants [31] and Arabic [11, 22]. Bilingual ID with one model per language has also been explored, for example English-Czech [32] and English-Chinese [38], but not from a cross-lingual perspective.

In social media, such as Twitter, specific hashtags (#irony, #sarcasm) are often used as gold labels to detect irony in a supervised learning setting. Although recent studies pointed out the issue of false-alarm hashtags in self-labeled data [20], ID via hashtag filtering provides researchers with positive examples with high precision. On the other hand, systems are not able to detect irony in languages where such filtering is not always possible. Multilingual prediction (relying either on machine translation or on multilingual embedding methods) is a common solution for tackling under-resourced languages [6, 33]. While multilinguality has been widely investigated in information retrieval [27, 34] and several NLP tasks (e.g., sentiment analysis [3, 4] and named entity recognition [30]), no one has explored it for irony.

We aim here to bridge the gap by tackling ID in tweets from both multilingual (French, English and Arabic) and multicultural perspectives (Indo-European languages whose speakers share quite the same cultural background vs. less culturally close languages). Our approach does not rely either on machine translation or parallel corpora (which are not always available), but rather builds on previous corpus-based studies that show that irony is a universal phenomenon and many languages share similar irony devices. For example, Karoui et al. [24] concluded that their multi-layer annotated schema, initially used to annotate French tweets, is portable to English and Italian, observing relatively the same tendencies in terms of irony categories and markers. Similarly, Chakhachiro [8] studies irony in English and Arabic, and shows that both languages share several similarities in the rhetorical (e.g., overstatement), grammatical (e.g., redundancy) and lexical (e.g., synonymy) usage of irony devices. The next step now is to show to what extent these observations are still valid from a computational point of view. Our contributions are:
  I. A new freely available corpus of Arabic tweets manually annotated for irony detection (see footnote 1).

  II. Monolingual ID: We propose both feature-based models (relying on language-dependent and language-independent features) and neural models to measure to what extent ID is language dependent.

  III. Cross-lingual ID: We experiment with cross-lingual word representations by training on one language and testing on another to measure how culture-dependent the proposed models are. Our results are encouraging and open the door to ID in languages that lack annotated data for irony.

2 Data

Arabic dataset (Ar = 11,225 tweets). Our starting point was the corpus built by [22], which we extended to different political issues and events related to the Middle East and Maghreb that took place between 2011 and 2018. Tweets were collected using a set of predefined keywords (which targeted specific political figures or events) and may or may not contain Arabic ironic hashtags (see footnote 2). The collection process resulted in a set of 6,809 ironic tweets (I) vs. 15,509 non-ironic tweets (NI) written in standard (formal) Arabic and in different Arabic language varieties: Egyptian, Gulf, Levantine, and Maghrebi dialects.

To investigate the validity of using the original tweet labels, a sample of 3,000 I and 3,000 NI tweets was manually annotated by two native Arabic speakers, which resulted in 2,636 I vs. 2,876 NI. The inter-annotator agreement using Cohen’s Kappa was 0.76, while the agreement score between the annotators’ labels and the original labels was 0.6. As these agreements are relatively good given the difficulty of the task, we added 5,713 instances sampled from the original unlabeled dataset to our manually labeled part. The added tweets were manually checked to remove duplicates, very short tweets, and tweets that depend on external links, images or videos to be understood.
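The agreement figures above rely on Cohen's Kappa, which corrects the raw agreement rate for agreement expected by chance. The following is a minimal sketch of the standard formula, not the authors' script; the function name `cohen_kappa` and the toy labels are illustrative assumptions.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one single label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy labels (hypothetical, not taken from the corpus): I = ironic, NI = not ironic.
annotator_1 = ["I", "I", "NI", "NI", "I", "NI", "I", "NI"]
annotator_2 = ["I", "NI", "NI", "NI", "I", "NI", "I", "I"]
print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

In this toy case the annotators agree on 6 of 8 items while chance agreement is 0.5, so the kappa is 0.5.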

French dataset (Fr = 7,307 tweets). We rely on the corpus used for the DEFT 2017 French shared task on irony [5], which consists of tweets related to a set of topics discussed in the media between 2014 and 2016 and containing topic keywords and/or French irony hashtags (#ironie, #sarcasme). Tweets were annotated by three annotators (after removing the original labels) with a reported Cohen’s Kappa of 0.69.

English dataset (En = 11,225 tweets). We use the corpus built by [32], which consists of 100,000 tweets collected using the hashtag #sarcasm. It was used as a benchmark in several works [13, 18]. We sliced a subset of approximately 11,200 tweets to match the sizes of the other languages’ datasets.

Table 1 shows the tweet distribution in all corpora. Across the three languages, we keep a similar number of instances for the train and test sets so that the cross-lingual experiments are fair as well (see Sect. 4). For French, we use the original dataset without any modification, keeping the same number of records for train and test to better compare with state-of-the-art results. For the class distribution (ironic vs. non-ironic), we do not choose a specific ratio but use the distribution resulting from the random shuffling process.
Table 1. Tweet distribution in all corpora.

      # Ironic   # Not-Ironic   Train    Test
Ar    6,005      5,220          10,219   1,006
Fr    2,425      4,882          5,843    1,464
En    5,602      5,623          10,219   1,006

3 Monolingual Irony Detection

It is important to note that our aim is not to outperform state-of-the-art models in monolingual ID but to investigate which of the monolingual architectures (neural or feature-based) can achieve results comparable with existing systems. The results can show which kinds of features work better in the monolingual setting and can be employed to detect irony in a multilingual setting. In addition, comparing them with the multilingual results can show to what extent ID is language dependent. Two models were built, as explained below. Prior to learning, basic preprocessing steps were performed for each language (e.g., removing foreign characters, ironic hashtags, mentions, and URLs).
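The preprocessing steps listed above can be sketched with a few regular expressions. This is an illustration under assumptions, not the authors' pipeline: the hashtag list `IRONY_HASHTAGS` and the function `preprocess` are hypothetical, and the removal of foreign characters is left out because it depends on the target language's script.

```python
import re

# Assumed hashtag list; the paper removes "ironic hashtags" without listing them all.
IRONY_HASHTAGS = {"#irony", "#sarcasm", "#ironie", "#sarcasme"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def preprocess(tweet):
    """Strip URLs, mentions and irony hashtags, then normalise whitespace."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    # Drop only the label-bearing hashtags; other hashtags are kept as-is.
    text = HASHTAG_RE.sub(
        lambda m: " " if m.group(0).lower() in IRONY_HASHTAGS else m.group(0),
        text,
    )
    return " ".join(text.split())

print(preprocess("@user Great, another Monday #sarcasm #work https://t.co/abc"))
```

Removing the irony hashtags matters because they were used to collect and label the data, so leaving them in would let a classifier read the label directly from the text.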

Feature-Based Models. We used state-of-the-art features that have been shown to be useful in ID: some of them are language-independent (e.g., punctuation marks, positive and negative emoticons, quotations, personal pronouns, tweet’s length, named entities) while others are language-dependent and rely on dedicated lexicons (e.g., negation, opinion lexicons, opposition words). Several classical machine learning classifiers were tested with several feature combinations; among them, Random Forest (RF) achieved the best result with all features.
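To make the language-independent features more concrete, here is a small sketch that maps a tweet to a handful of surface features. The feature names, the emoticon lists and the function `surface_features` are assumptions chosen for illustration; they cover only part of the feature set described above, and the resulting dictionaries would still need to be vectorised before being passed to a classifier such as a random forest.

```python
import re

# Tiny illustrative emoticon lists (assumptions, not the paper's resources).
POSITIVE_EMOTICONS = (":)", ":-)", ":D", ";)")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")

def surface_features(tweet):
    """Map a tweet to a dict of language-independent surface features."""
    tokens = tweet.split()
    return {
        "n_exclamation": tweet.count("!"),
        "n_question": tweet.count("?"),
        "has_ellipsis": int("..." in tweet),
        "has_quotation": int(bool(re.search(r'"[^"]+"', tweet))),
        "n_pos_emoticons": sum(tweet.count(e) for e in POSITIVE_EMOTICONS),
        "n_neg_emoticons": sum(tweet.count(e) for e in NEGATIVE_EMOTICONS),
        "n_tokens": len(tokens),
        "n_chars": len(tweet),
    }

features = surface_features('What a "great" day!!! :)')
print(features)
```

Because none of these features depends on a lexicon, the same extractor can be applied unchanged to tweets in any of the three languages.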

Neural Model with Monolingual Embeddings. We used a Convolutional Neural Network (CNN) whose structure is similar to the one proposed by [25]. For the embeddings, we relied on AraVec [36] for Arabic, FastText [15] for French, and Word2vec Google News [29] for English (see footnote 3). For the three languages, the size of the embeddings is 300 and the embeddings were fine-tuned during the training process. The CNN was tuned on 20% of the training corpus using the Hyperopt library (see footnote 4).
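Holding out 20% of the training corpus for tuning can be sketched as a seeded shuffle followed by a split. This is only an assumption about how such a hold-out might be drawn; the function `holdout_split` and the seed are hypothetical, and the hyperparameter search itself (done with Hyperopt in the paper) is not shown.

```python
import random

def holdout_split(examples, validation_fraction=0.2, seed=13):
    """Shuffle a copy of `examples` and split off a validation share."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Placeholder ids standing in for the 10,219 training tweets of Table 1.
train_part, validation_part = holdout_split(range(10219))
print(len(train_part), len(validation_part))
```

Fixing the seed keeps the validation set identical across tuning runs, so that different hyperparameter settings are compared on the same tweets.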

Results. Table 2 shows the results obtained with the train-test configuration for each language. For English, our results, in terms of macro F-score (F), are not comparable to those of [32, 39], as we used 11% of the original dataset. For French, our scores are in line with the state of the art (cf. the best system in the irony shared task, which achieved \(F=78.3\) [5]). For Arabic, our results outperform those reported in [22] (\(A=71.7\)) and are comparable to those recently reported in the shared task on irony detection in Arabic tweets [11, 12] (\(F=84.4\)). Overall, the results show that the semantic information captured by the embedding space is more productive than standard surface and lexicon-based features.
Table 2. Results of the monolingual experiments (in percentage) in terms of accuracy (A), precision (P), recall (R), and macro F-score (F).

       Arabic                    French                    English
       A     P     R     F       A     P     R     F       A     P     R     F
RF     68.0  67.0  82.0  68.0    68.5  71.7  87.3  61.0    61.2  60.0  70.0  61.0
CNN    80.5  79.1  84.9  80.4    77.6  68.2  59.6  73.5    77.9  74.6  84.7  77.8
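For readers unfamiliar with the macro F-score reported in Table 2, the sketch below computes it as the unweighted mean of the per-class F1 scores. The paper does not spell out the formula, so this is the standard definition; the function `macro_f1`, the label names and the toy predictions are illustrative assumptions.

```python
def macro_f1(gold, predicted, labels=("I", "NI")):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy gold labels and predictions (hypothetical): I = ironic, NI = not ironic.
gold = ["I", "I", "I", "NI", "NI", "NI"]
predicted = ["I", "I", "NI", "NI", "NI", "I"]
print(round(100 * macro_f1(gold, predicted), 1))
```

Averaging over both classes gives the ironic and non-ironic classes equal weight, which is useful when the class distribution is unbalanced, as in the French corpus.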

4 Cross-lingual Irony Detection

We use the previous CNN architecture with bilingual embeddings and the RF model with surface features (e.g., use of personal pronouns, presence of interjections, emoticons or specific punctuation; see footnote 5) to verify which pairs of the three languages (a) have similar ironic pragmatic devices and (b) use similar text-based patterns in the narrative of ironic tweets. As continuous word embedding spaces exhibit similar structures across (even distant) languages [28], we use a multilingual word representation which aims to learn a linear mapping from a source to a target embedding space. Many methods have been proposed to learn this mapping, such as supervision from parallel data and bilingual dictionaries [28] or unsupervised methods relying on monolingual corpora [1, 10, 41]. For our experiments, we use Conneau et al.’s approach, as it showed superior results with respect to the literature [10]. We perform several experiments by training on one language (\(lang_1\)) and testing on another one (\(lang_2\)) (henceforth \(lang_1\rightarrow lang_2\)). We get 6 configurations, plus two others to evaluate how irony devices are expressed cross-culturally, i.e. in European vs. non-European languages. In each experiment, we held out 20% of the training data to validate the model before testing. Table 3 presents the results.
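The linear mapping described above can be made concrete with a short worked formulation. This is a sketch under assumptions: \(X\) and \(Y\) are \(d\times n\) matrices (notation introduced here, not in the paper) whose columns hold the source- and target-language vectors of \(n\) word pairs assumed to be translations of each other; in an unsupervised method such as that of [10], such pairs are induced rather than given. Constraining the mapping \(W\) to be orthogonal gives the classical Procrustes problem, which has a closed-form solution via singular value decomposition:

```latex
\[
W^{\star} \;=\; \operatorname*{arg\,min}_{W \in \mathbb{R}^{d \times d},\; W^{\top} W = I} \lVert W X - Y \rVert_{F}
\;=\; U V^{\top},
\qquad \text{where } U \Sigma V^{\top} = \operatorname{SVD}\!\left(Y X^{\top}\right).
\]
```

A source-language word vector \(x\) is then represented in the target space as \(W^{\star}x\), so that tweets from both languages can be fed to the same classifier.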
Table 3. Results of the cross-lingual experiments.

                                 CNN                         RF
Train\(\,\rightarrow \,\)Test    A     P     R     F         A      P     R     F
Ar\(\,\rightarrow \,\)Fr         60.1  37.2  26.6  51.7      47.03  29.9  43.9  46.0
Fr\(\,\rightarrow \,\)Ar         57.8  62.9  45.7  57.3      51.11  61.1  24.0  54.0
Ar\(\,\rightarrow \,\)En         48.5  26.5  17.9  34.1      49.67  49.7  66.2  50.0
En\(\,\rightarrow \,\)Ar         56.7  57.7  62.3  56.4      52.5   58.6  38.5  53.0
Fr\(\,\rightarrow \,\)En         53.0  67.9  11.0  42.9      52.38  52.0  63.6  52.0
En\(\,\rightarrow \,\)Fr         56.7  33.5  29.5  50.0      56.44  74.6  52.7  58.0
(En/Fr)\(\,\rightarrow \,\)Ar    62.4  66.1  56.8  62.4      55.08  56.7  68.5  62.0
Ar\(\,\rightarrow \,\)(En/Fr)    56.3  33.9  09.5  42.7      59.84  60.0  98.7  74.6

From a semantic perspective, despite the linguistic and cultural differences between Arabic and French, the CNN results show high performance compared to the other language pairs when we train on one of these two languages and test on the other. The same holds for the French and English pair, although the results are considerably lower when we train on French. We observe a similar drop when we train on Arabic and test on English. A possible explanation is that the Arabic and French tweets are quite informal and contain many dialect words that may not exist in the pretrained embeddings we used, unlike the English ones (lower embedding coverage ratio), which makes it harder for the CNN to learn a clear semantic pattern. Another point is the presence of Arabic dialects, where some dialect words may not exist in the multilingual pretrained embedding model that we used. On the other hand, from the text-based perspective, the results show that text-based features can help when the semantic aspect yields weak detection; this is the case for the Ar\(\,\rightarrow \,\)En configuration. It is worth mentioning that the highest result we get in this experiment is from the En\(\,\rightarrow \,\)Fr pair, as both languages use Latin characters. Finally, when investigating the relatedness between European vs. non-European languages (cf. (En/Fr)\(\,\rightarrow \,\)Ar), we obtain results similar to those obtained in the monolingual experiment (macro F-score 62.4 vs. 68.0), and the best results are achieved by Ar\(\,\rightarrow \,\)(En/Fr). This shows that there are pragmatic devices in common between both sides and, in a similar way, similar text-based patterns in the narrative of the ironic tweets.
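The embedding coverage ratio invoked in this explanation can be estimated as the share of token occurrences that have a pretrained vector. The sketch below is a hedged illustration: the function `embedding_coverage`, the whitespace tokenisation and the toy vocabulary are assumptions, not the procedure used in the paper.

```python
def embedding_coverage(tweets, vocabulary):
    """Share of token occurrences that have a vector in `vocabulary`."""
    tokens = [token.lower() for tweet in tweets for token in tweet.split()]
    if not tokens:
        return 0.0
    covered = sum(token in vocabulary for token in tokens)
    return covered / len(tokens)

# Hypothetical vocabulary and tweets, for illustration only.
vocabulary = {"so", "fat", "i", "love", "mondays"}
tweets = ["I love mondays", "so phat"]
print(embedding_coverage(tweets, vocabulary))
```

Here four of the five tokens are covered; a phonetic spelling such as "phat" falls outside the vocabulary and contributes nothing to the semantic representation of the tweet.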

5 Discussions and Conclusion

This paper proposes the first multilingual ID system for tweets. We show that simple monolingual architectures (either neural or feature-based) trained separately on each language can be successfully used in a multilingual setting, provided that a cross-lingual word representation or basic surface features are available. Our monolingual results are comparable to the state of the art for the three languages. The CNN architecture trained on cross-lingual word representations shows that irony has a certain similarity across the languages we targeted despite the cultural differences, which confirms that irony is a universal phenomenon, as already shown in previous linguistic studies [9, 24, 35]. The manual analysis of the tweets commonly misclassified across the languages in the multilingual setup shows that classification errors are due to three main factors. (1) First, the absence of context: writers did not provide sufficient information to capture the ironic sense even in the monolingual setting, as in an Arabic tweet translated as (Let’s start again, get off get off Mubarak!!), where the writer mocks the Egyptian revolution, as the current president “Sisi” is viewed as one of Mubarak’s fellows. (2) Second, the presence of out-of-vocabulary (OOV) terms due to the weak coverage of the multilingual embeddings, which makes the system fail to generalize when the set of OOV words unseen during training is large. We found tweets in all three languages written in a very informal way, where some characters of the words were deleted, duplicated or written phonetically (e.g. phat instead of fat). (3) Third, the difficulty of dealing with the Arabic language. Arabic tweets are often characterized by non-diacritised text, a large variety of unstandardized dialectal Arabic (recall that our dataset has 4 main varieties, namely Egyptian, Gulf, Levantine, and Maghrebi), the presence of transliterated words (e.g. the word table is transliterated as tabla), and finally linguistic code switching between Modern Standard Arabic and several dialects, and between Arabic and other languages like English and French. We found that some tweets contain only words from one of the varieties, and most of these words do not exist in the Arabic embedding model. For example, in an Arabic tweet translated as (Since many days Mubarak didn’t die .. is he sick or what? #Egypt), only the words for day, Mubarak, and he exist in the embeddings. Clearly, considering only these three available words, we are not able to understand the context or the ironic meaning of the tweet.

To conclude, our multilingual experiments confirmed that the door is open towards multilingual approaches for ID. Furthermore, our results showed that ID can be applied to languages that lack annotated data. Our next step is to experiment with other languages such as Hindi and Italian.

Footnotes

  1.

  2. All of these words are synonyms that mean “Irony”.

  3. Other available pretrained embedding models have also been tested.

  4.

  5. To avoid language dependencies, we rely on surface features only, discarding those that require external semantic resources or morpho-syntactic parsing.

Notes

Acknowledgment

The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMIS-FAKEnHATE (PGC2018-096212-B-C31).

References

  1. Artetxe, M., Labaka, G., Agirre, E., Cho, K.: Unsupervised neural machine translation. arXiv preprint (2017)
  2. Attardo, S.: Irony as relevant inappropriateness. J. Pragmat. 32(6), 793–826 (2000)
  3. Balahur, A., Turchi, M.: Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis. Comput. Speech Lang. 28(1), 56–75 (2014)
  4. Barnes, J., Klinger, R., Schulte im Walde, S.: Bilingual sentiment embeddings: joint projection of sentiment across languages. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2483–2493. Association for Computational Linguistics (2018)
  5. Benamara, F., Grouin, C., Karoui, J., Moriceau, V., Robba, I.: Analyse d’opinion et langage figuratif dans des tweets: présentation et résultats du Défi Fouille de Textes DEFT2017. In: Actes de DEFT@TALN2017, Orléans, France (2017)
  6. Bikel, D., Zitouni, I.: Multilingual Natural Language Processing Applications: From Theory to Practice, 1st edn. IBM Press, Armonk (2012)
  7. Carvalho, P., Sarmento, L., Silva, M.J., Oliveira, E.D.: Clues for detecting irony in user-generated contents: oh...!! it’s “so easy”;-). In: Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, pp. 53–56. ACM (2009)
  8. Chakhachiro, R.: Translating irony in political commentary texts from English into Arabic. Babel 53(3), 216–240 (2007)
  9. Colston, H.L.: Irony as indirectness cross-linguistically: on the scope of generic mechanisms. In: Capone, A., García-Carpintero, M., Falzone, A. (eds.) Indirect Reports and Pragmatics in the World Languages. PPPP, vol. 19, pp. 109–131. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-78771-8_6
  10. Conneau, A., Lample, G., Ranzato, M., Denoyer, L., Jégou, H.: Word translation without parallel data. arXiv preprint (2017)
  11. Ghanem, B., Karoui, J., Benamara, F., Moriceau, V., Rosso, P.: IDAT@FIRE2019: overview of the track on irony detection in Arabic tweets. In: Proceedings of the 11th Forum for Information Retrieval Evaluation, pp. 10–13. ACM (2019)
  12. Ghanem, B., Karoui, J., Benamara, F., Moriceau, V., Rosso, P.: IDAT@FIRE2019: overview of the track on irony detection in Arabic tweets. In: Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings, Kolkata, India, vol. 2517, pp. 380–390. CEUR-WS.org (2019)
  13. Ghanem, B., Rangel, F., Rosso, P.: LDR at SemEval-2018 task 3: a low dimensional text representation for irony detection. In: Proceedings of the 12th International Workshop on Semantic Evaluation, pp. 531–536 (2018)
  14. Gianti, A., Bosco, C., Patti, V., Bolioli, A., Caro, L.D.: Annotating irony in a novel Italian corpus for sentiment analysis. In: Proceedings of the 4th Workshop on Corpora for Research on Emotion Sentiment and Social Signals, Istanbul, Turkey, pp. 1–7 (2012)
  15. Grave, E., Bojanowski, P., Gupta, P., Joulin, A., Mikolov, T.: Learning word vectors for 157 languages. CoRR abs/1802.06893 (2018)
  16. Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J.L. (eds.) Speech Acts. Syntax and Semantics, vol. 3, pp. 41–58. Academic Press, New York (1975)
  17. Hee, C.V., Lefever, E., Hoste, V.: SemEval-2018 task 3: irony detection in English tweets. In: Proceedings of The 12th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT, New Orleans, Louisiana, 5–6 June 2018, pp. 39–50 (2018)
  18. Hernández Farías, D.I., Bosco, C., Patti, V., Rosso, P.: Sentiment polarity classification of figurative language: exploring the role of irony-aware and multifaceted affect features. In: Gelbukh, A. (ed.) CICLing 2017. LNCS, vol. 10762, pp. 46–57. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77116-8_4
  19. Hernández Farías, D.I., Patti, V., Rosso, P.: Irony detection in twitter: the role of affective content. ACM Trans. Internet Technol. (TOIT) 16(3), 19 (2016)
  20. Huang, H.H., Chen, C.C., Chen, H.H.: Disambiguating false-alarm hashtag usages in tweets for irony detection. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (2018)
  21. Huang, Y.-H., Huang, H.-H., Chen, H.-H.: Irony detection with attentive recurrent neural networks. In: Jose, J.M., et al. (eds.) ECIR 2017. LNCS, vol. 10193, pp. 534–540. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56608-5_45
  22. Karoui, J., Benamara, F., Moriceau, V.: SOUKHRIA: towards an irony detection system for Arabic in social media. In: Third International Conference on Arabic Computational Linguistics, ACLING 2017, 5–6 November 2017, Dubai, United Arab Emirates, pp. 161–168 (2017)
  23. Karoui, J., Benamara, F., Moriceau, V., Aussenac-Gilles, N., Belguith, L.H.: Towards a contextual pragmatic model to detect irony in tweets. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (Volume 2: Short Papers), ACL-IJCNLP 2015, pp. 644–650 (2015)
  24. Karoui, J., Benamara, F., Moriceau, V., Patti, V., Bosco, C., Aussenac-Gilles, N.: Exploring the impact of pragmatic phenomena on irony detection in tweets: a multilingual corpus study. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 262–272. Association for Computational Linguistics (2017)
  25. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751. Association for Computational Linguistics (2014)
  26. Liebrecht, C., Kunneman, F., van den Bosch, A.: The perfect solution for detecting sarcasm in tweets #not. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 29–37. ACL, New Brunswick (2013)
  27. Litschko, R., Glavaš, G., Ponzetto, S.P., Vulić, I.: Unsupervised cross-lingual information retrieval using monolingual data only. In: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, pp. 1253–1256 (2018)
  28. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. CoRR abs/1301.3781 (2013)
  29. Mikolov, T., Yih, W.T., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751 (2013)
  30. Ni, J., Florian, R.: Improving multilingual named entity recognition with Wikipedia entity type mapping. CoRR abs/1707.02459 (2017)
  31. Ortega-Bueno, R., Rangel, F., Hernández Farías, D., Rosso, P., Montes-y Gómez, M., Medina Pagola, J.E.: Overview of the task on irony detection in Spanish variants. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019), Co-located with 34th Conference of the Spanish Society for Natural Language Processing (SEPLN 2019). CEUR-WS.org (2019)
  32. Ptáček, T., Habernal, I., Hong, J.: Sarcasm detection on Czech and English twitter. In: Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014: Technical Papers, pp. 213–223 (2014)
  33. Ruder, S.: A survey of cross-lingual embedding models. CoRR abs/1706.04902 (2017)
  34. Sasaki, S., Sun, S., Schamoni, S., Duh, K., Inui, K.: Cross-lingual learning-to-rank with shared representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pp. 458–463 (2018)
  35. Sigar, A., Taha, Z.: A contrastive study of ironic expressions in English and Arabic. Coll. Basic Educ. Res. J. 12(2), 795–817 (2012)
  36. Soliman, A.B., Eissa, K., El-Beltagy, S.R.: AraVec: a set of Arabic word embedding models for use in Arabic NLP. In: Third International Conference on Arabic Computational Linguistics, ACLING 2017, 5–6 November 2017, Dubai, United Arab Emirates, pp. 256–265 (2017)
  37. Swami, S., Khandelwal, A., Singh, V., Akhtar, S.S., Shrivastava, M.: A corpus of English-Hindi code-mixed tweets for sarcasm detection. In: 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) (2018)
  38. Tang, Y., Chen, H.: Chinese irony corpus construction and ironic structure analysis. In: Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014: Technical Papers, 23–29 August 2014, Dublin, Ireland, pp. 1269–1278 (2014)
  39. Tay, Y., Luu, A.T., Hui, S.C., Su, J.: Reasoning with sarcasm by reading in-between. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1010–1020 (2018)
  40. Veale, T.: Creative language retrieval: a robust hybrid of information retrieval and linguistic creativity. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol. 1, pp. 278–287 (2011)
  41. Wada, T., Iwata, T.: Unsupervised cross-lingual word embedding by multilingual neural language models. arXiv preprint (2018)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain
  2. AUSY R&D, Paris, France
  3. IRIT, CNRS, Université de Toulouse, Toulouse, France
