Multilingual Media Monitoring and Text Analysis – Challenges for Highly Inflected Languages

  • Ralf Steinberger
  • Maud Ehrmann
  • Júlia Pajzs
  • Mohamed Ebrahim
  • Josef Steinberger
  • Marco Turchi
Conference paper

DOI: 10.1007/978-3-642-40585-3_3

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8082)
Cite this paper as:
Steinberger R., Ehrmann M., Pajzs J., Ebrahim M., Steinberger J., Turchi M. (2013) Multilingual Media Monitoring and Text Analysis – Challenges for Highly Inflected Languages. In: Habernal I., Matoušek V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg

Abstract

We present the highly multilingual news analysis system Europe Media Monitor (EMM), which gathers an average of 175,000 online news articles per day in tens of languages, categorises the news items and extracts named entities and various other types of information from them. We also give an overview of EMM’s text mining tool set, focusing on how the software deals with highly inflected languages such as those of the Slavic and Finno-Ugric language families. The questions we address are: How can extraction patterns be adapted to such languages? How can extracted named entities be de-inflected? And will document categorisation benefit from lemmatising the texts?

Keywords

multilinguality, text mining, information extraction, text classification, inflection, Slavic and Finno-Ugric languages, media monitoring

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ralf Steinberger (1)
  • Maud Ehrmann (2)
  • Júlia Pajzs (3)
  • Mohamed Ebrahim (4)
  • Josef Steinberger (5)
  • Marco Turchi (6)
  1. European Commission - Joint Research Centre, IPSC-GlobeSec, Ispra, Italy
  2. Department of Computer Science, Sapienza University of Rome, Rome, Italy
  3. Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary
  4. Cognizant SetCon, Munich, Germany
  5. Faculty of Applied Sciences, Department of Computer Science and Engineering, NTIS Centre, University of West Bohemia, Pilsen, Czech Republic
  6. Human Language Technology group, Fondazione Bruno Kessler, Trento, Italy

Personalised recommendations