Data augmentation and transfer learning for cross-lingual Named Entity Recognition in the biomedical domain Brayan Stiven Lancheros, Gloria Corpas Pastor, Ruslan Mitkov Original Paper Open access 10 May 2024
Features in extractive supervised single-document summarization: case of Persian news Hosein Rezaei, Seyed Amid Moeinzadeh Mirhosseini, Mohamad Saraee Original Paper Open access 08 May 2024
Mismatching-aware unsupervised translation quality estimation for low-resource languages Fatemeh Azadi, Heshaam Faili, Mohammad Javad Dousti Original Paper 05 May 2024
Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing Abubakr H. Ombabi, Wael Ouarda, Adel M. Alimi Original Paper 27 April 2024
ArEntail: manually-curated Arabic natural language inference dataset from news headlines Rasha Obeidat, Yara Al-Harahsheh, Maram Gharaibeh Original Paper 22 April 2024
Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text Shankar Biradar, Sunil Saumya, Arun Chauhan Original Paper 16 April 2024
Depression symptoms modelling from social media text: an LLM driven semi-supervised learning approach Nawshad Farruque, Randy Goebel, Osmar R. Zaïane Original Paper Open access 04 April 2024
A morphologically annotated longitudinal corpus of spoken Czech child–adult interactions Anna Chromá, Jakub Sláma, Jolana Treichelová Original Paper 30 March 2024
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis Mojca Brglez, Omnia Zayed, Paul Buitelaar Original Paper Open access 30 March 2024
A longitudinal multi-modal dataset for dementia monitoring and diagnosis Dimitris Gkoumas, Bo Wang, Maria Liakata Original Paper Open access 30 March 2024
DILLo: an Italian lexical database for speech-language pathologists Federica Beccaria, Angela Cristiano, Gloria Gagliardi Original Paper Open access 23 March 2024
"Approaches to sentiment analysis of Hungarian political news at the sentence level" Orsolya Ring, Martina Katalin Szabó, István Üveges Original Paper Open access 23 March 2024
Introducing the 3MT_French dataset to investigate the timing of public speaking judgements Beatrice Biancardi, Mathieu Chollet, Chloé Clavel Original Paper Open access 23 March 2024
VeLeRo: an inflected verbal lexicon of standard Romanian and a quantitative analysis of morphological predictability Borja Herce, Bogdan Pricop Project Notes Open access 23 March 2024
An aligned corpus of Spanish bibles Gerardo Sierra, Gemma Bel-Enguix, Núria Bel Original Paper Open access 15 March 2024
Computational approaches to Portuguese: introduction to the special issue Diana Santos, Thiago Alexandre Salgueiro Pardo Editorial 06 March 2024 Pages: 1 - 6
SOLD: Sinhala offensive language dataset Tharindu Ranasinghe, Isuri Anuradha, Marcos Zampieri Original Paper Open access 06 March 2024
Infectious risk events and their novelty in event-based surveillance: new definitions and annotated corpus François Delon, Gabriel Bédubourg, Marc Tanti Original Paper 05 March 2024
Semantic search as extractive paraphrase span detection Jenna Kanerva, Hanna Kitti, Filip Ginter Original Paper Open access 01 February 2024
A new methodology for automatic creation of concept maps of Turkish texts Merve Bayrak, Deniz Dal Original Paper 28 January 2024
Large scale annotated dataset for code-mix abusive short noisy text Paras Tiwari, Sawan Rai, C. Ravindranath Chowdary Original Paper 25 January 2024
A flexible tool for a qualia-enriched FrameNet: the FrameNet Brasil WebTool Tiago Timponi Torrent, Ely Edison da Silva Matos, Vanessa Maria Ramos Lopes Paiva Original Paper 22 January 2024
NewsCom-TOX: a corpus of comments on news articles annotated for toxicity in Spanish Mariona Taulé, Montserrat Nofre, Xavier Bonet Original Paper Open access 17 January 2024
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning Kiran Babu Nelatoori, Hima Bindu Kommanti Original Paper 13 January 2024
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus Helena Bermúdez-Sabel, Francesca Dell’Oro, Paola Marongiu Project Notes 06 January 2024
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata Kerenza Doxolodeo, Adila Alfa Krisnadhi Original Paper Open access 03 January 2024
KurdiSent: a corpus for kurdish sentiment analysis Soran Badawi, Arefeh Kazemi, Vali Rezaie Original Paper 02 January 2024
Syntactic annotation for Portuguese corpora: standards, parsers, and search interfaces Pablo Faria, Charlotte Galves, Catarina Magro Original Paper 26 December 2023 Pages: 301 - 346
Linguistic annotation of Byzantine book epigrams Colin Swaelens, Ilse De Vos, Els Lefever Original Paper 13 December 2023
Democratizing neural machine translation with OPUS-MT Jörg Tiedemann, Mikko Aulamo, Sami Virpioja Original Paper Open access 13 December 2023
When MIPVU goes to no man’s land: a new language resource for hybrid, morpheme-based metaphor identification in Hungarian Gábor Simon, Tímea Bajzát, Eszter Szlávich Original Paper Open access 09 December 2023
EmoTwiCS: a corpus for modelling emotion trajectories in Dutch customer service dialogues on Twitter Sofie Labat, Thomas Demeester, Véronique Hoste Original Paper Open access 08 December 2023
Resources building for sentiment analysis of content disseminated by Tunisian medias in social networks Emna Fsih, Rahma Boujelbane, Lamia Hadrich Belguith Original Paper 02 December 2023
A corpus of Persian literary text Shahab Raji, Malihe Alikhani, Matthew Stone Original Paper Open access 23 November 2023
A corpus of English learners with Arabic and Hebrew backgrounds Omaima Abboud, Batia Laufer, Shuly Wintner Project Notes 20 November 2023
The Reading Everyday Emotion Database (REED): a set of audio-visual recordings of emotions in music and language Jia Hoong Ong, Florence Yik Nam Leung, Fang Liu Original Paper Open access 20 November 2023
Brazilian Portuguese corpora for teaching and translation: the CoMET project Stella E. O. Tagnin Project Notes 16 November 2023 Pages: 347 - 361
Automatic genre identification: a survey Taja Kuzman, Nikola Ljubešić Survey Open access 16 November 2023
A multilingual, multimodal dataset of aggression and bias: the ComMA dataset Ritesh Kumar, Shyam Ratan, Akanksha Bansal Original Paper 16 November 2023
Correction: The DELAD initiative for sharing language resources on speech disorders Alice Lee, Nicola Bessell, Satu Saalasti Correction Open access 06 November 2023
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI Ishan Tarunesh, Somak Aditya, Monojit Choudhury Original Paper 04 November 2023
Building the VisSE Corpus of Spanish SignWriting Antonio F. G. Sevilla, Alberto Díaz Esteban, José María Lahoz-Bengoechea Original Paper 26 October 2023
Text augmentation for semantic frame induction and parsing Saba Anwar, Artem Shelmanov, Chris Biemann Original Paper Open access 21 October 2023
A new corpus of geolocated ASR transcripts from Germany Steven Coats Project Notes Open access 21 October 2023
Beyond plain toxic: building datasets for detection of flammable topics and inappropriate statements Nikolay Babakov, Varvara Logacheva, Alexander Panchenko Original Paper 21 October 2023
NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese Sidney Evaldo Leal, Magali Sanches Duran, Sandra Maria Aluísio Survey 17 October 2023 Pages: 73 - 110
A semi-supervised method to generate a persian dataset for suggestion classification Leila Safari, Zanyar Mohammady Original Paper 29 September 2023
NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links Natalia Loukachevitch, Ekaterina Artemova, Alexey Yandutov Original Paper 21 September 2023
A survey and study impact of tweet sentiment analysis via transfer learning in low resource scenarios Manoel Veríssimo dos Santos Neto, Nádia Félix F. da Silva, Anderson da Silva Soares Original Paper 14 September 2023 Pages: 133 - 174
An eye-tracking-with-EEG coregistration corpus of narrative sentences Stefan L. Frank, Anna Aumeistere Original Paper Open access 29 August 2023