Abstract
In this work we present LG-Starship, a new framework for the analysis of Italian texts that could help linguists perform rapid text analysis. The framework performs both statistical and rule-based analysis. The idea is to build modular software that includes the basic algorithms needed for different kinds of analysis. The framework will include a Preprocessing Module, a POS Tagging and Lemmatization Module, a Statistic Module, a Semantic Module based on Distributional Analysis algorithms, and a Syntactic Module, which analyzes the syntactic structures of a selected sentence and tags the verbs and their arguments with semantic labels. The objective of the framework is to build an “all-in-one” NLP platform that allows any kind of user to perform both basic and advanced text analysis.
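The abstract describes LG-Starship as a set of modules chained into one platform. The following is a minimal Python sketch of that modular idea under stated assumptions; it is not the LG-Starship code. The function names (`preprocess`, `statistics`, `make_distributional`, `run_pipeline`), the dict-based document that each module enriches, and the regex tokenizer are all hypothetical. The semantic module shows distributional analysis in its simplest form: it counts co-occurrences within a fixed window and compares words by cosine similarity. POS tagging, lemmatization and the syntactic module would need trained models or lexical resources, so they are left out here, although they could plug into the same module interface.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, List

# Hypothetical design: a document is a plain dict that each module enriches.
Document = Dict[str, object]
Module = Callable[[Document], Document]


def preprocess(doc: Document) -> Document:
    """Hypothetical preprocessing module: lowercase and split into word tokens."""
    text = str(doc["text"]).lower()
    # \w is Unicode-aware for str patterns, so accented Italian letters are kept.
    doc["tokens"] = re.findall(r"\w+", text)
    return doc


def statistics(doc: Document) -> Document:
    """Hypothetical statistic module: raw token frequencies."""
    doc["freq"] = Counter(doc["tokens"])
    return doc


def make_distributional(window: int = 2) -> Module:
    """Hypothetical semantic module: co-occurrence vectors in a symmetric window."""
    def distributional(doc: Document) -> Document:
        tokens: List[str] = doc["tokens"]
        vectors: Dict[str, Counter] = defaultdict(Counter)
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[target][tokens[j]] += 1
        doc["vectors"] = dict(vectors)
        return doc
    return distributional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def run_pipeline(text: str, modules: List[Module]) -> Document:
    """Apply each module in order to a fresh document."""
    doc: Document = {"text": text}
    for module in modules:
        doc = module(doc)
    return doc


if __name__ == "__main__":
    doc = run_pipeline(
        "Il gatto mangia il pesce. Il cane mangia la carne.",
        [preprocess, statistics, make_distributional(window=1)],
    )
    print(doc["freq"]["il"])  # 3
    vec = doc["vectors"]
    print(round(cosine(vec["gatto"], vec["cane"]), 3))  # 1.0
    print(cosine(vec["gatto"], vec["carne"]))  # 0.0
```

In the toy sentence, “gatto” and “cane” occur in identical one-word contexts (“il” … “mangia”), so their vectors coincide, which is the intuition behind distributional semantics. Keeping every module as a function from document to document is one simple way to let users pick only the analyses they need.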
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Maisto, A. (2020). LG-Starship: A Framework for Text Analysis. In: Barolli, L., Amato, F., Moscato, F., Enokido, T., Takizawa, M. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2020. Advances in Intelligent Systems and Computing, vol 1150. Springer, Cham. https://doi.org/10.1007/978-3-030-44038-1_38
DOI: https://doi.org/10.1007/978-3-030-44038-1_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44037-4
Online ISBN: 978-3-030-44038-1
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)