Abstract
The paper presents a service-oriented online engine for processing and clustering texts in Polish. The engine, designed according to the Web-Oriented Architecture paradigm, makes it possible to run a large number of language tools (such as a tagger, a named entity recognizer, or a feature extractor) and clustering tools (such as CLUTO or R) from almost any type of application, including HTML/JavaScript ones. It supports the construction of complex workflows, not only simple chains of tools. To meet high-availability requirements, the engine is deployed in a private cloud.
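The difference between a simple chain and a complex workflow can be illustrated with a small sketch. The abstract does not describe the engine's actual interface, so everything below is an assumption made for illustration: the function `run_workflow`, the step names, and the toy lambdas standing in for a tagger, a named entity recognizer, and a feature extractor are all hypothetical. The sketch treats a workflow as a directed acyclic graph in which one tool's output may feed several downstream tools, and runs the steps in dependency order.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# A step receives the outputs of its upstream steps, keyed by step name.
Step = Callable[[Dict[str, Any]], Any]


def run_workflow(steps: Dict[str, Step],
                 deps: Dict[str, List[str]],
                 source: Any) -> Dict[str, Any]:
    """Run steps in dependency order.

    Steps without dependencies receive the raw input under the key "source".
    """
    results: Dict[str, Any] = {}
    graph = {name: deps.get(name, []) for name in steps}
    # static_order() yields every step after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        upstream = deps.get(name, [])
        if upstream:
            inputs = {d: results[d] for d in upstream}
        else:
            inputs = {"source": source}
        results[name] = steps[name](inputs)
    return results


# Hypothetical toy tools; real taggers and recognizers are far more involved.
steps: Dict[str, Step] = {
    "tagger": lambda inp: inp["source"].split(),
    "ner": lambda inp: [t for t in inp["tagger"] if t[:1].isupper()],
    "features": lambda inp: len(inp["tagger"]),
    "merge": lambda inp: {"entities": inp["ner"], "n_tokens": inp["features"]},
}

# "tagger" fans out to two tools whose results are joined again in "merge",
# which a linear chain of tools could not express.
deps: Dict[str, List[str]] = {
    "ner": ["tagger"],
    "features": ["tagger"],
    "merge": ["ner", "features"],
}

results = run_workflow(steps, deps, "Adam Mickiewicz wrote Pan Tadeusz")
print(results["merge"])
```

In this sketch the dependency map is the only thing that distinguishes a chain from a branching workflow: a chain is the special case in which every step has exactly one predecessor.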
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Walkowiak, T. (2015). Web Based Engine for Processing and Clustering of Polish Texts. In: Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J. (eds) Theory and Engineering of Complex Systems and Dependability. DepCoS-RELCOMEX 2015. Advances in Intelligent Systems and Computing, vol 365. Springer, Cham. https://doi.org/10.1007/978-3-319-19216-1_49
DOI: https://doi.org/10.1007/978-3-319-19216-1_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19215-4
Online ISBN: 978-3-319-19216-1
eBook Packages: Engineering, Engineering (R0)