Texterra: A framework for text analysis

Turdakov, D. Yu.; Astrakhantsev, N. A.; Nedumov, Ya. R.; Sysoev, A. A.; Andrianov, I. A.; Mayorov, V. D.; Fedorenko, D. G.; Korshunov, A. V.; Kuznetsov, S. D.

doi:10.1134/S0361768814050090

Published: 21 September 2014

Programming and Computer Software, Volume 40, pages 288–295 (2014)

Abstract

This paper describes a framework for fast text analysis developed as part of the Texterra project. Texterra provides a scalable solution for fast text processing based on novel methods that exploit knowledge extracted from the Web and from text documents. Details of the project, use cases, and evaluation results are presented for the developed tools.

Author information

Authors and Affiliations

Institute for System Programming, Russian Academy of Sciences, ul. A. Solzhenitsyna 25, Moscow, 109004, Russia
D. Yu. Turdakov, N. A. Astrakhantsev, Ya. R. Nedumov, A. A. Sysoev, I. A. Andrianov, V. D. Mayorov, D. G. Fedorenko, A. V. Korshunov & S. D. Kuznetsov

Corresponding author

Correspondence to D. Yu. Turdakov.

Additional information

Original Russian Text © D.Yu. Turdakov, N.A. Astrakhantsev, Ya.R. Nedumov, A.A. Sysoev, I.A. Andrianov, V.D. Mayorov, D.G. Fedorenko, A.V. Korshunov, S.D. Kuznetsov, 2014, published in Proceedings of the Institute for System Programming of RAS, 2014, vol. 26, issue 1, pp. 421–438.

About this article

Cite this article

Turdakov, D.Y., Astrakhantsev, N.A., Nedumov, Y.R. et al. Texterra: A framework for text analysis. Program Comput Soft 40, 288–295 (2014). https://doi.org/10.1134/S0361768814050090

Received: 18 March 2014
Published: 21 September 2014
Issue Date: September 2014
DOI: https://doi.org/10.1134/S0361768814050090