Found in Translation

  • Marco Turchi
  • Ilias Flaounas
  • Omar Ali
  • Tijl De Bie
  • Tristan Snowsill
  • Nello Cristianini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5782)

Abstract

We present a complete working system that gathers multilingual news items from the Web, translates them into English, categorises them by topic and geographic location, and presents them to the end user in a uniform way. Currently, the system crawls 560 news outlets, in 22 different languages, from the 27 European Union countries. Data gathering is based on RSS crawlers, machine translation on Moses, and text categorisation on SVMs. The system also displays, on a map of Europe, statistics on the amount of attention devoted to the various topics in each of the 27 EU countries. The integration of Support Vector Machines, Statistical Machine Translation, Web Technologies and Computer Graphics delivers a complete system in which modern Statistical Machine Learning is used at multiple levels and is a crucial enabling part of the resulting functionality.
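
The flow described in the abstract (gather RSS items, translate, categorise, aggregate per country) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not the authors' implementation: the inline sample feed, the toy word lexicon that stands in for Moses, and the keyword rules that stand in for trained SVM classifiers are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample feed; a real crawler would fetch this from an outlet's RSS URL.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example outlet</title>
<item><title>Elezioni: il governo vince il voto</title></item>
<item><title>Calcio: la squadra vince la partita</title></item>
</channel></rss>"""

def parse_rss_titles(rss_text):
    """Extract the titles of all <item> elements from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]

# Toy word-for-word lexicon: a stand-in for a statistical MT system such as Moses.
TOY_LEXICON = {
    "elezioni": "elections", "governo": "government", "voto": "vote",
    "calcio": "football", "squadra": "team", "partita": "match",
    "vince": "wins", "il": "the", "la": "the",
}

def translate(text):
    """Lower-case, drop colons, and map each word through the toy lexicon."""
    words = text.lower().replace(":", " ").split()
    return " ".join(TOY_LEXICON.get(word, word) for word in words)

# Keyword rules: a stand-in for trained SVM topic classifiers.
TOPIC_KEYWORDS = {
    "politics": {"elections", "government", "vote"},
    "sport": {"football", "team", "match"},
}

def categorise(english_text):
    """Return the sorted list of topics whose keywords occur in the text."""
    tokens = set(english_text.split())
    return sorted(t for t, keywords in TOPIC_KEYWORDS.items() if tokens & keywords)

def topic_attention(feeds_by_country):
    """Return {country: {topic: fraction of that country's items tagged with it}}."""
    attention = {}
    for country, feeds in feeds_by_country.items():
        counts, total = Counter(), 0
        for rss_text in feeds:
            for title in parse_rss_titles(rss_text):
                total += 1
                counts.update(categorise(translate(title)))
        attention[country] = {t: c / total for t, c in counts.items()} if total else {}
    return attention

# Per-country topic shares, the kind of statistic that could be drawn on a map.
stats = topic_attention({"IT": [SAMPLE_RSS]})
print(stats)
```

Keeping each stage as a separate function means the toy translator and the keyword rules could be replaced by real components without changing the aggregation step.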

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Marco Turchi (1)
  • Ilias Flaounas (2)
  • Omar Ali (1)
  • Tijl De Bie (1)
  • Tristan Snowsill (1)
  • Nello Cristianini (1, 2)
  1. Department of Engineering Mathematics, Queen’s Building, Canada
  2. Department of Computer Science, Merchant Venturers Building, Bristol University, Bristol, United Kingdom
