LG-Starship: A Framework for Text Analysis

  • Conference paper
Web, Artificial Intelligence and Network Applications (WAINA 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1150)

Abstract

In this work we present LG-Starship, a new framework for the analysis of Italian texts that is intended to help linguists perform rapid text analysis. The framework performs both statistical and rule-based analysis. The idea is to build modular software that includes the basic algorithms needed for different kinds of analysis. The framework will include a Preprocessing Module, a POS Tagging and Lemmatization Module, a Statistic Module, a Semantic Module based on Distributional Analysis algorithms, and a Syntactic Module, which analyzes the syntactic structure of a selected sentence and tags the verb and its arguments with semantic labels. The objective of the framework is to build an “all-in-one” platform for NLP that allows any kind of user to perform basic and advanced text analysis.
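
To make the modular idea concrete, the following is a minimal sketch of how independent analysis steps might be chained. It is not the LG-Starship implementation: the function names (preprocess, frequencies, cooccurrences, analyze), the regex tokenizer, and the window size are all assumptions made for illustration. Window-based co-occurrence counting stands in here for one simple form of distributional analysis, and the POS tagging, lemmatization, and syntactic steps are omitted.

```python
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Hypothetical preprocessing step: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def frequencies(tokens):
    """Hypothetical statistics step: raw token frequencies."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Hypothetical distributional step: count neighbours within a symmetric window."""
    counts = defaultdict(Counter)
    for i, token in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[token][tokens[j]] += 1
    return counts


def analyze(text, window=2):
    """Run the illustrative steps in sequence and collect their outputs."""
    tokens = preprocess(text)
    return {
        "tokens": tokens,
        "frequencies": frequencies(tokens),
        "cooccurrences": cooccurrences(tokens, window),
    }


if __name__ == "__main__":
    result = analyze("Il gatto mangia il pesce")
    print(result["frequencies"].most_common(3))
    print(dict(result["cooccurrences"]["gatto"]))
```

Because each step only consumes the output of the previous one, a step could be swapped out (for example, replacing the regex tokenizer with a language-specific one) without touching the others, which is the property a modular design aims for.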

Author information

Corresponding author

Correspondence to Alessandro Maisto.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Maisto, A. (2020). LG-Starship: A Framework for Text Analysis. In: Barolli, L., Amato, F., Moscato, F., Enokido, T., Takizawa, M. (eds) Web, Artificial Intelligence and Network Applications. WAINA 2020. Advances in Intelligent Systems and Computing, vol 1150. Springer, Cham. https://doi.org/10.1007/978-3-030-44038-1_38
