Swarm optimization clustering methods for opinion mining

Natural Computing

Abstract

Supervised machine learning and opinion lexicons are the most common approaches to opinion mining, but they require considerable effort to prepare the training data and to build the lexicon, respectively. In this paper, a novel unsupervised clustering approach is proposed for opinion mining. Three swarm algorithms based on Particle Swarm Optimization (PSO) are evaluated on three corpora with different levels of complexity with respect to size, number of opinions, domains, languages, and class balance. The K-means and Agglomerative clustering algorithms, as well as the Artificial Bee Colony and Cuckoo Search swarm-based algorithms, were selected for comparison. The proposed swarm-based algorithms achieved better accuracy when the word bigram feature model was used for pre-processing and the Global Silhouette was used as the optimization function, and on datasets with two classes (positive and negative). Although the swarm-based algorithms obtained lower results on datasets with three classes, they remain competitive, considering that the opinion clustering approach requires neither labeled data nor opinion lexicons.
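The abstract names three ingredients: word-bigram features, the Global Silhouette as the fitness to maximize, and a PSO search over cluster assignments. The following is a minimal sketch that combines them under several assumptions not stated in the abstract: whitespace tokenization, cosine distance between bigram-count vectors, Global Silhouette taken as the mean of the per-cluster average silhouettes, and a plain global-best continuous PSO whose particles encode assignments as "random keys" (document i joins the cluster with the largest key). The function names `bigrams`, `cosine_distance`, `global_silhouette` and `pso_cluster`, the random-key decoding and the parameter values are hypothetical illustrations, not the three PSO variants evaluated in the paper.

```python
import math
import random
from collections import Counter


def bigrams(text):
    """Word-bigram counts for one document (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # no bigrams (e.g. one-word text): treat as maximally distant
    return max(0.0, 1.0 - dot / (na * nb))


def global_silhouette(dist, labels, k):
    """Assumed GS definition: mean of the k per-cluster average silhouettes.

    Returns -1.0 when some cluster is empty, so degenerate solutions lose.
    """
    clusters = [[i for i, c in enumerate(labels) if c == j] for j in range(k)]
    if any(not members for members in clusters):
        return -1.0
    per_cluster = []
    for j, members in enumerate(clusters):
        scores = []
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of i's own cluster
            a = sum(dist[i][m] for m in members if m != i) / (len(members) - 1)
            # b: smallest mean distance from i to any other cluster
            b = min(sum(dist[i][m] for m in other) / len(other)
                    for o, other in enumerate(clusters) if o != j)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        per_cluster.append(sum(scores) / len(scores))
    return sum(per_cluster) / k


def pso_cluster(docs, k=2, particles=10, iters=50,
                w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO maximizing GS; particles are random-key encodings."""
    if k < 2:
        raise ValueError("need at least two clusters")
    rng = random.Random(seed)
    vecs = [bigrams(d) for d in docs]
    n, dim = len(docs), len(docs) * k
    dist = [[cosine_distance(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    def decode(x):
        # Document i joins the cluster whose key x[i*k + j] is largest.
        return [max(range(k), key=lambda j: x[i * k + j]) for i in range(n)]

    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pfit = [global_silhouette(dist, decode(p), k) for p in pos]
    g = max(range(particles), key=lambda p: pfit[p])
    gbest, gfit = pbest[g][:], pfit[g]

    for _ in range(iters):
        for p in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[p][d] = (w * vel[p][d]
                             + c1 * r1 * (pbest[p][d] - pos[p][d])
                             + c2 * r2 * (gbest[d] - pos[p][d]))
                pos[p][d] += vel[p][d]
            fit = global_silhouette(dist, decode(pos[p]), k)
            if fit > pfit[p]:  # update personal best, then global best
                pbest[p], pfit[p] = pos[p][:], fit
                if fit > gfit:
                    gbest, gfit = pos[p][:], fit
    return decode(gbest), gfit


if __name__ == "__main__":
    opinions = ["very good product", "very good service",
                "very bad product", "very bad service"]
    print(pso_cluster(opinions, k=2))
```

Random keys let the standard continuous velocity update drive a discrete assignment, and the -1 penalty for empty clusters keeps the swarm away from trivial one-cluster solutions. No labels or lexicon are used at any point, which is the property the abstract emphasizes; the paper's own PSO variants may encode particles and compute distances differently.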

Acknowledgements

Ellen Souza and Alisson Silva are supported by FACEPE (Pernambuco Research Foundation). Adriano L. I. Oliveira, Diego Santos, and Gustavo Oliveira are supported by CNPq (Brazilian Council for Scientific and Technological Development).

Author information

Corresponding author

Correspondence to Ellen Souza.

About this article

Cite this article

Souza, E., Santos, D., Oliveira, G. et al. Swarm optimization clustering methods for opinion mining. Nat Comput 19, 547–575 (2020). https://doi.org/10.1007/s11047-018-9681-2

  • DOI: https://doi.org/10.1007/s11047-018-9681-2
