Automatic Text Classification Through Point of Cultural Interest Digital Identifiers

  • Maria Carmela Catone
  • Mariacristina Falco
  • Alessandro Maisto
  • Serena Pelosi
  • Alfonso Siano
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 96)


This work addresses the automatic classification and representation of unstructured texts in the Cultural Heritage domain. The methodology relies on machine-readable dictionaries of terminological simple words and multiword expressions. The paper discusses the design and population of a domain ontology that interacts closely with the electronic dictionaries and a network of local grammars. A Maximum Entropy (Max-Ent) classifier, based on the ontology schema, assigns to each analyzed text an object identifier related to the semantic dimension of the text. In this process, the unstructured texts are processed with the semantically annotated dictionaries in order to uncover the underlying structure that facilitates classification. The final goal is the automatic attribution of Point of Cultural Interest digital identifiers (POIds) to texts on the basis of the semantic features extracted from them through NLP strategies.
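The pipeline summarized above (dictionary lookup, semantic features, Max-Ent classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dictionary, the semantic tags, the training sentences, and the identifiers `POI-001` and `POI-002` are all hypothetical, and naive substring matching stands in for the electronic dictionaries and local grammars. The classifier is a plain multinomial logistic regression, which is the standard formulation of a Max-Ent model, trained here with stochastic gradient ascent.

```python
import math
from collections import Counter

# Hypothetical toy dictionary: term (simple word or multiword expression) -> semantic tag.
DICTIONARY = {
    "doric temple": "ARCHITECTURE",
    "amphitheatre": "ARCHITECTURE",
    "column": "ARCHITECTURE",
    "fresco": "PAINTING",
    "altarpiece": "PAINTING",
    "oil on canvas": "PAINTING",
}

def extract_features(text):
    """Count semantic tags of dictionary entries found in the text (naive substring match)."""
    lowered = text.lower()
    feats = Counter()
    for term, tag in DICTIONARY.items():
        feats[tag] += lowered.count(term)
    feats["BIAS"] = 1  # constant feature acting as an intercept
    return feats

def predict_proba(weights, feats, labels):
    """Softmax over per-label linear scores, i.e. the Max-Ent conditional distribution."""
    scores = {y: sum(weights[y].get(f, 0.0) * v for f, v in feats.items()) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train_maxent(examples, epochs=200, lr=0.5):
    """Fit the weights by stochastic gradient ascent on the conditional log-likelihood."""
    labels = sorted({y for _, y in examples})
    weights = {y: {} for y in labels}
    data = [(extract_features(text), y) for text, y in examples]
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, feats, labels)
            for y in labels:
                # Gradient: observed indicator minus predicted probability.
                err = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[y][f] = weights[y].get(f, 0.0) + lr * err * v
    return weights, labels

def classify(weights, labels, text):
    """Return the most probable (hypothetical) identifier for a text."""
    probs = predict_proba(weights, extract_features(text), labels)
    return max(probs, key=probs.get)

# Hypothetical training texts labelled with made-up identifiers.
TRAIN = [
    ("The Doric temple has a column on each side.", "POI-001"),
    ("The amphitheatre stands near a ruined column.", "POI-001"),
    ("A fresco and an altarpiece decorate the chapel.", "POI-002"),
    ("The portrait is an oil on canvas.", "POI-002"),
]

if __name__ == "__main__":
    weights, labels = train_maxent(TRAIN)
    print(classify(weights, labels, "A large fresco covers the wall."))
```

In this sketch the classifier never sees raw words, only counts of semantic tags, which mirrors the idea of classifying on structure recovered through annotated dictionaries rather than on surface tokens.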



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Maria Carmela Catone (1)
  • Mariacristina Falco (1)
  • Alessandro Maisto (1, corresponding author)
  • Serena Pelosi (1)
  • Alfonso Siano (1)

  1. University of Salerno, Fisciano, Italy
