Building Support Tools for Russian-Language Information Extraction

Du, Mian; von Etter, Peter; Kopotev, Mikhail; Novikov, Mikhail; Tarbeeva, Natalia; Yangarber, Roman

doi:10.1007/978-3-642-23538-2_48

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6836)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

There is currently a paucity of publicly available NLP tools to support the analysis of Russian-language text. This is especially true of higher-level applications, such as Information Extraction. We present work on tools for information extraction from Russian text in the domain of on-line news. At the lower level, we employ the AOT toolkit for natural language processing, which provides modules for morphological analysis and partial syntactic chunking. Since the outputs of both lower-level modules contain unresolved ambiguity, we synthesize the outputs and pass the result into a pre-existing English-language analysis pipeline. We describe how the information extraction system is adapted for multi-lingual support, including extensions to the ontologies and to the pattern matching mechanism. While this is work in progress, we present an end-to-end pipeline for event extraction from Russian-language news.
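
The abstract does not say how the two ambiguous outputs are synthesized. As a rough illustration only, one plausible approach is to keep, for each token, the morphological readings whose part of speech agrees with what the chunker proposes, and to fall back to the full set of readings when the two disagree. Everything in the sketch below (the `Reading` type, the `synthesize` function, the tag names, and the input shapes) is a hypothetical assumption and not the AOT toolkit's API or the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Reading:
    """One candidate analysis of a token (hypothetical structure)."""
    lemma: str
    pos: str  # coarse part-of-speech tag, e.g. "NOUN" or "VERB"


def synthesize(
    morph: List[List[Reading]],
    chunk_pos: Dict[int, Set[str]],
) -> List[List[Reading]]:
    """Merge two ambiguous analyses of the same sentence.

    morph     -- per-token candidate readings from a morphological analyzer
    chunk_pos -- token index -> POS tags permitted by the chunker;
                 tokens outside any chunk are simply absent
    """
    merged = []
    for i, readings in enumerate(morph):
        allowed = chunk_pos.get(i)
        if allowed is None:
            # The chunker says nothing about this token: keep all readings.
            merged.append(list(readings))
            continue
        kept = [r for r in readings if r.pos in allowed]
        # If the two sources contradict each other, preserve the ambiguity
        # instead of discarding every reading.
        merged.append(kept or list(readings))
    return merged


# "цены стали расти" ("prices began to rise"): "стали" is either a form of
# the verb "стать" or of the noun "сталь" ("steel").
morph = [
    [Reading("цена", "NOUN")],
    [Reading("стать", "VERB"), Reading("сталь", "NOUN")],
    [Reading("расти", "VERB")],
]
chunk_pos = {1: {"VERB"}, 2: {"VERB"}}  # assumed verb-group chunk

result = synthesize(morph, chunk_pos)
```

In this toy sentence the assumed verb-group chunk rules out the noun reading of "стали", while the first token, which lies outside the chunk, passes through unchanged. Falling back to all readings on disagreement leaves the remaining ambiguity for later pipeline stages to resolve.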

Author information

Authors and Affiliations

Department of Computer Science, University of Helsinki, Finland
Mian Du, Peter von Etter, Mikhail Kopotev, Mikhail Novikov, Natalia Tarbeeva & Roman Yangarber

Editor information

Editors and Affiliations

Department of Computer Sciences, University of West Bohemia, Univerzitní 22, 306 14, Pilsen, Czech Republic
Ivan Habernal
Faculty of Applied Sciences, Dept. of Computer Science and Engineering, University of West Bohemia, Univerzitni 8, 306 14, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Du, M., von Etter, P., Kopotev, M., Novikov, M., Tarbeeva, N., Yangarber, R. (2011). Building Support Tools for Russian-Language Information Extraction. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_48

DOI: https://doi.org/10.1007/978-3-642-23538-2_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)