Texterra: A framework for text analysis

Programming and Computer Software

Abstract

This paper describes a framework for fast text analysis developed as part of the Texterra project. Texterra provides a scalable solution for fast text processing based on novel methods that exploit knowledge extracted from the Web and from text documents. Details of the project, use cases, and evaluation results for the developed tools are presented.

Author information

Corresponding author

Correspondence to D. Yu. Turdakov.

Additional information

Original Russian Text © D.Yu. Turdakov, N.A. Astrakhantsev, Ya.R. Nedumov, A.A. Sysoev, I.A. Andrianov, V.D. Mayorov, D.G. Fedorenko, A.V. Korshunov, S.D. Kuznetsov, 2014, published in Proceedings of the Institute for System Programming of RAS, 2014, vol. 26, issue 1, pp. 421–438.

About this article

Cite this article

Turdakov, D.Y., Astrakhantsev, N.A., Nedumov, Y.R. et al. Texterra: A framework for text analysis. Program Comput Soft 40, 288–295 (2014). https://doi.org/10.1134/S0361768814050090

  • DOI: https://doi.org/10.1134/S0361768814050090
