Characterizing User-Generated Text Content Mining: A Systematic Mapping Study of the Portuguese Language

  • Ellen Souza (corresponding author)
  • Dayvid Castro
  • Douglas Vitório
  • Ingryd Teles
  • Adriano L. I. Oliveira
  • Cristine Gusmão
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 444)


Unstructured data accounts for more than 80% of enterprise data and is growing exponentially at an annual rate of 60%. Text mining refers to the process of discovering new, previously unknown, and potentially useful information from a variety of unstructured data, including user-generated text content (UGTC). Portuguese is one of the most widely spoken languages in the world and the second most frequent language on Twitter. Given this, the goal of this work is to map the landscape of current studies that apply text mining to UGTC in Portuguese. The systematic mapping study method was applied to search for, select, and extract data from the included studies. Our manual and automated searches retrieved 6075 studies published up to 2014, of which 35 were included in the study. Text classification accounts for 79% of all text mining tasks. Naïve Bayes is the main classifier, and Twitter is the main data source.


Keywords: Text mining · Text classification · Opinion mining · User-generated content · Portuguese language
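The abstract reports Naïve Bayes as the most frequently used classifier in the mapped studies. The following Python code is a purely illustrative sketch, not the method of any included study. It implements a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. The class name `NaiveBayesTextClassifier`, the regex tokenizer, and the four Portuguese training sentences with their `pos`/`neg` labels are hypothetical toy data chosen only for demonstration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on word characters; in Python 3, \w also matches
    # accented letters such as "á", "ó", and "ã".
    return re.findall(r"\w+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, doc, label):
        # Unnormalized log posterior: log P(c) + sum of log P(w | c).
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        v = len(self.vocab)
        for token in tokenize(doc):
            if token not in self.vocab:
                continue  # skip words never seen during training
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (self.total_words[label] + v))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_score(doc, c))


# Hypothetical toy training data (Portuguese product reviews).
train_docs = [
    "adorei o produto, entrega rápida",
    "ótimo atendimento, recomendo",
    "produto chegou quebrado, péssimo",
    "atendimento horrível, não recomendo",
]
labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesTextClassifier().fit(train_docs, labels)
print(model.predict("entrega rápida e ótimo produto"))         # pos
print(model.predict("produto quebrado, péssimo atendimento"))  # neg
```

The code sums log probabilities rather than multiplying raw probabilities, which avoids floating-point underflow on longer texts. A practical pipeline for Portuguese UGTC might also add preprocessing such as stopword removal or stemming, which this sketch omits.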




Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ellen Souza (1), corresponding author
  • Dayvid Castro (1)
  • Douglas Vitório (1)
  • Ingryd Teles (1)
  • Adriano L. I. Oliveira (2)
  • Cristine Gusmão (3)
  1. MiningBR Research Group, Federal Rural University of Pernambuco (UFRPE), Serra Talhada, PE, Brazil
  2. Centro de Informática, Federal University of Pernambuco (CIn-UFPE), Recife, PE, Brazil
  3. Programa de Pós-Graduação em Engenharia Biomédica (Graduate Program in Biomedical Engineering), Centro de Tecnologia e Geociências, Federal University of Pernambuco (CTG-UFPE), Recife, PE, Brazil
