Abstract
This paper reviews existing algorithms for preliminary text processing and proposes new ones that improve its quality: a deduction-inversion architecture for text decomposition, a modified bidirectional inference algorithm, and morphological analysis based on preliminary part-of-speech tagging.
Additional information
Original Russian Text © V.A. Yatsko, M.S. Starikov, E.V. Larchenko, T.N. Vishnyakov, 2009, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2009, No. 11, pp. 24–30.
About this article
Cite this article
Yatsko, V.A., Starikov, M.S., Larchenko, E.V. et al. The algorithms for preliminary text processing: Decomposition, annotation, morphological analysis. Autom. Doc. Math. Linguist. 43, 336–343 (2009). https://doi.org/10.3103/S0005105509060041