A comprehensive study of domain-specific emoji meanings in sentiment classification

  • Original Paper
  • Journal: Computational Management Science

Abstract

The inclusion of emojis when solving natural language processing problems (e.g., text‐based emotion detection, sentiment classification, topic analysis) has been shown to improve the quality of the results. However, the existing literature focuses only on the general meaning conveyed by emojis and has not examined emojis in the context of investor sentiment classification. This article provides a comprehensive study of the impact that including emojis can have on predicting stock investors’ sentiment. We find that a classifier incorporating domain-specific emoji vectors, which capture the syntax and semantics of emojis in the financial context, can improve the accuracy of investor sentiment classification. Moreover, when domain-specific emoji vectors are considered, daily time series of investor sentiment show additional marginal explanatory power for returns and volatility. Further, a cluster analysis comparing domain-specific and domain-independent emoji vectors reveals different natural groupings of emojis, reflecting the special meanings emojis acquire in the financial domain. Finally, domain-specific emoji vectors can be used to build significantly better emoji sentiment lexicons. Given the importance of domain-specific emojis in classifying investor sentiment in social media data, we have developed an emoji lexicon that other researchers can use.
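As a hedged illustration of how emoji vectors can be turned into lexicon scores, the sketch below scores each token by its mean cosine similarity to positive seed terms minus its mean similarity to negative seed terms, in the spirit of the seed-based semantic orientation of Turney and Littman (2003). The three-dimensional vectors, the token names and the seed words "bullish"/"bearish" are hypothetical stand-ins; the paper's own lexicon-construction procedure is not shown here and may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def seed_polarity(vectors, pos_seeds, neg_seeds):
    """Score = mean similarity to positive seeds minus mean similarity to negative seeds."""
    scores = {}
    for token, vec in vectors.items():
        pos = sum(cosine(vec, vectors[s]) for s in pos_seeds) / len(pos_seeds)
        neg = sum(cosine(vec, vectors[s]) for s in neg_seeds) / len(neg_seeds)
        scores[token] = pos - neg
    return scores


# Hypothetical toy embeddings standing in for domain-specific (financial) vectors.
toy_vectors = {
    "bullish": [1.0, 0.1, 0.0],
    "bearish": [-1.0, 0.1, 0.0],
    ":rocket:": [0.9, 0.2, 0.1],
    ":bear:": [-0.8, 0.3, 0.0],
}
lexicon = seed_polarity(toy_vectors, pos_seeds=["bullish"], neg_seeds=["bearish"])
```

Because the scoring function is independent of how the vectors were trained, running it once on domain-specific and once on domain-independent embeddings is one simple way to see how an emoji's lexicon entry shifts when its financial-context meaning is ignored.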

Figs. 1–5

Notes

  1. https://stocktwits.com/.

  2. Emojis were officially introduced to the Stocktwits platform in August 2016.

  3. Loughran and McDonald (2011) report that three quarters of the words classified as negative by the Harvard Dictionary are not typically considered negative in financial contexts.

  4. The main categories of emojis are people and facial expressions, animals and nature, food and drinks, activities, travel and places, objects, symbols, and flags.

  5. The measures of classification performance are discussed in Sect. 3.3.1.

  6. The formula to measure the level of investor sentiment is discussed in Sect. 3.3.3.

  7. We used the scikit-learn library to implement all of these classifiers with their default parameters.

  8. Note that the goal of this research is not to produce state-of-the-art results in investor sentiment analysis; our aim is to test whether domain-specific emoji vectors improve the performance of investor sentiment classification.

  9. https://github.com/lpolech/hiervis.

  10. Realized volatility is measured using the 5-min sub-sampled intra-day volatility measure available at https://realized.oxford-man.ox.ac.uk/.

  11. Available at http://www.policyuncertainty.com/us_daily.html.

  12. Available at http://www.cboe.com/vix.

  13. Available at https://www.philadelphiafed.org/research-and-data/real-time-center/business-conditions-index.

  14. https://github.com/lpolech/emoji_sentiment.
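Notes 5, 6 and 10 refer to measures whose exact definitions are given elsewhere in the paper. As a hedged illustration only, the sketch below implements common textbook forms of three such quantities: the Matthews correlation coefficient (a classification measure discussed by Boughorbel et al. 2017), the log bullishness index of Antweiler and Frank (2004), and plain realized variance from one grid of 5-minute prices (its square root is realized volatility). None of these is claimed to be the paper's exact formula; in particular, the sub-sampled measure in note 10 additionally averages over shifted 5-minute grids, which this sketch omits. The label strings "bullish"/"bearish" and all function names are assumptions.

```python
import math


def matthews_corrcoef(y_true, y_pred, positive="bullish"):
    """Binary Matthews correlation coefficient; any label other than `positive` is negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def bullishness(n_bull, n_bear):
    """Antweiler-Frank style log bullishness for one day's bullish/bearish message counts."""
    return math.log((1 + n_bull) / (1 + n_bear))


def realized_variance(prices):
    """Sum of squared log returns between consecutive intraday prices (a single 5-min grid)."""
    return sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))


labels = ["bullish", "bearish", "bullish", "bearish"]
preds = ["bullish", "bearish", "bearish", "bearish"]
print(matthews_corrcoef(labels, preds))  # ≈ 0.577, i.e. 1/sqrt(3)
```

MCC is shown rather than accuracy alone because it stays informative when bullish and bearish messages are imbalanced, which is the concern Boughorbel et al. (2017) address.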

References

  • Aalborg HA, Molnár P, de Vries JE (2019) What can explain the price, volatility and trading volume of bitcoin? Finance Res Lett 29:255–265. https://doi.org/10.1016/j.frl.2018.08.010

  • Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of internet stock message boards. J Finance 59(3):1259–1294. https://doi.org/10.1111/j.1540-6261.2004.00662.x

  • Atkins A, Niranjan M, Gerding E (2018) Financial news predicts stock market volatility better than close price. J Finance Data Sci 4(2):120–137. https://doi.org/10.1016/j.jfds.2018.02.002

  • Baker M, Wurgler J (2006) Investor sentiment and the cross-section of stock returns. J Finance 61(4):1645–1680. https://doi.org/10.1111/j.1540-6261.2006.00885.x

  • Baker M, Wurgler J (2007) Investor sentiment in the stock market. J Econ Perspect 21(2):129–152. https://doi.org/10.1257/jep.21.2.129

  • Barbieri F, Kruszewski G, Ronzano F, Saggion H (2016) How cosmopolitan are emojis?: exploring emojis usage and meaning over different languages with distributional semantics. In: Proceedings of the 24th ACM international conference on multimedia. Association for Computing Machinery, Amsterdam, pp 531–535

  • Bishop CM (2006) Pattern recognition and machine learning, 1st edn. Springer, New York

  • Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS ONE 12(6):1–17. https://doi.org/10.1371/journal.pone.0177678

  • Brown GW, Cliff MT (2004) Investor sentiment and the near-term stock market. J Empir Finance 11(1):1–27. https://doi.org/10.1016/j.jempfin.2002.12.001

  • Cavallo M, Demiralp ÇA (2019) Clustrophile 2: guided visual clustering analysis. IEEE Trans Visual Comput Graph 25(1):267–276. https://doi.org/10.1109/TVCG.2018.2864477

  • Chau F, Deesomsak R, Koutmos D (2016) Does investor sentiment really matter? Int Rev Financ Anal 48:221–232. https://doi.org/10.1016/j.irfa.2016.10.003

  • Cookson JA, Niessner M (2020) Why don’t we agree? Evidence from a social network of investors. J Finance 75(1):173–228. https://doi.org/10.1111/jofi.12852

  • Corsi F (2009) A simple approximate long-memory model of realized volatility. J Financ Econom 7(2):174–196. https://doi.org/10.1093/jjfinec/nbp001

  • Da Z, Engelberg J, Gao P (2015) The sum of all FEARS investor sentiment and asset prices. Rev Financ Stud 28(1):1–32. https://doi.org/10.1093/rfs/hhu072

  • Danesi M (2016) The semiotics of emoji: the rise of visual language in the age of the internet, 1st edn. Bloomsbury Academic, London

  • Das SR, Chen MY (2007) Yahoo! for Amazon: sentiment extraction from small talk on the web. Manage Sci 53(9):1375–1388. https://doi.org/10.1287/mnsc.1070.0704

  • De Long JB, Shleifer A, Summers LH, Waldmann RJ (1990) Noise trader risk in financial markets. J Polit Econ 98(4):703–738

  • De Vries NJ, Olech ŁP, Moscato P (2019) Introducing clustering with a focus in marketing and consumer analysis. In: De Vries NJ, Moscato P (eds) Business and consumer analytics: new ideas. Springer, Berlin, pp 154–175

  • Deng L, Wiebe J, Choi Y (2014) Joint inference and disambiguation of implicit sentiments via implicature constraints. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers. Dublin City University and Association for Computational Linguistics, Dublin, pp 79–88

  • Deveikyte J, Geman H, Piccari C, Provetti A (2020) A sentiment analysis approach to the prediction of market volatility. arXiv preprint arXiv:2012.05906

  • Dimson T (2015) Emojineering part 1: machine learning for emoji trends. Instagram Eng Blog 30:52

  • Eisner B, Rocktäschel T, Augenstein I, Bosnjak M, Riedel S (2016) emoji2vec: learning emoji representations from their description. In: Proceedings of the fourth international workshop on natural language processing for social media. Association for Computational Linguistics, Austin, pp 48–54

  • Esuli A, Sebastiani F (2006) SENTIWORDNET: a publicly available lexical resource for opinion mining. In: Proceedings of the fifth international conference on language resources and evaluation (LREC'06). European Language Resources Association (ELRA), Genoa, pp 417–422

  • Felbo B, Mislove A, Søgaard A, Rahwan I, Lehmann S (2017) Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In: Proceedings of the 2017 conference on empirical methods in natural language processing. Association for Computational Linguistics, Copenhagen, pp 1615–1625

  • Fernández-Gavilanes M, Juncal-Martínez J, García-Méndez S, Costa-Montenegro E, González-Castaño FJ (2018) Creating emoji lexica from unsupervised sentiment analysis of their descriptions. Expert Syst Appl 103:74–91. https://doi.org/10.1016/j.eswa.2018.02.043

  • Godin F, Vandersmissen B, De Neve W, Van de Walle R (2015) Multimedia Lab @ ACL WNUT NER shared task: named entity recognition for Twitter microposts using distributed word representations. In: Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing, pp 146–153

  • Goldman E (2018) Emojis and the law. Wash Law Rev 93(3):1227–1291

  • Grabowski P (2016) Could a smiley make you buy? How using emoji in marketing affects conversions [AdEspresso’s experiment]. AdEspresso blog. https://adespresso.com/blog/emoji-marketing-affects-conversions/

  • Gupta S, Singh R, Singh J (2020) A hybrid approach for enhancing accuracy and detecting sarcasm in sentiment analysis. In: 2020 IEEE international conference on computing, power and communication technologies (GUCON), 2–4 Oct 2020

  • Hamilton WL, Clark K, Leskovec J, Jurafsky D (2016) Inducing domain-specific sentiment lexicons from unlabeled corpora. In: Proceedings of the 2016 conference on empirical methods in natural language processing. Association for Computational Linguistics, Austin, pp 595–605

  • Hovy D (2015) Demographic factors improve classification performance. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers). Association for Computational Linguistics, Beijing, pp 752–762

  • Kamps J, Marx M, Mokken RJ, de Rijke M (2004) Using WordNet to measure semantic orientations of adjectives. In: Proceedings of the fourth international conference on language resources and evaluation (LREC'04). European Language Resources Association, Lisbon, pp 1115–1118

  • Katarya R, Meena SK (2021) Machine learning techniques for heart disease prediction: a comparative study and analysis. Health Technol 11(1):87–97. https://doi.org/10.1007/s12553-020-00505-7

  • Keim D, Andrienko G, Fekete J-D, Görg C, Kohlhammer J, Melançon G (2008) Visual analytics: definition, process, and challenges. In: Kerren A, Stasko JT, Fekete J-D, North C (eds) Information visualization: human-centered issues and perspectives. Springer, Berlin, pp 154–175

  • Kho SJ, Padhee S, Bajaj G, Thirunarayan K, Sheth A (2019) Domain-specific use cases for knowledge-enabled social media analysis. In: Agarwal N, Dokoohaki N, Tokdemir S (eds) Emerging research challenges and opportunities in computational social network analysis and mining. Springer International Publishing, Cham, pp 233–246

  • Kim S-H, Kim D (2014) Investor sentiment from internet message postings and the predictability of stock returns. J Econ Behav Organ 107:708–729. https://doi.org/10.1016/j.jebo.2014.04.015

  • Kim N, Lučivjanská K, Molnár P, Villa R (2019) Google searches and stock market activity: evidence from Norway. Finance Res Lett 28:208–220. https://doi.org/10.1016/j.frl.2018.05.003

  • Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: Proceedings of the 31st international conference on machine learning, vol 32. Beijing

  • Lebduska L (2014) Emoji, emoji, what for art thou? Harlot: A revealing look at the arts of persuasion 1(12)

  • Lerner JS, Li Y, Valdesolo P, Kassam KS (2015) Emotion and decision making. Annu Rev Psychol 66(1):799–823. https://doi.org/10.1146/annurev-psych-010213-115043

  • Liang W-L (2016) Sensitivity to investor sentiment and stock performance of open market share repurchases. J Bank Finance 71:75–94. https://doi.org/10.1016/j.jbankfin.2016.06.003

  • Liang C, Tang L, Li Y, Wei Y (2020) Which sentiment index is more informative to forecast stock market volatility? Evidence from China. Int Rev Financ Anal 71:101552. https://doi.org/10.1016/j.irfa.2020.101552

  • Linderman GC, Steinerberger S (2017) Clustering with t-SNE, provably. arXiv preprint arXiv:1706.02582

  • Liu K-L, Li W-J, Guo M (2012) Emoticon smoothed language models for Twitter sentiment analysis. In: Proceedings of the twenty-sixth AAAI conference on artificial intelligence. AAAI Press, Toronto, pp 1678–1684

  • Ljubešić N, Fišer D (2016) A global analysis of emoji usage. In: Proceedings of the 10th web as corpus workshop. Association for Computational Linguistics, Berlin, pp 82–89

  • Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J Finance 66(1):35–65. https://doi.org/10.1111/j.1540-6261.2010.01625.x

  • Mahmoudi N, Docherty P, Moscato P (2018) Deep neural networks understand investors better. Decis Support Syst 112:23–34. https://doi.org/10.1016/j.dss.2018.06.002

  • McCulloch G, Gawne L (2018) Emoji grammar as beat gestures. In: Proceedings of the 1st international workshop on emoji understanding and applications in social media (Emoji2018). CEUR workshop proceedings, Stanford

  • Miah Y, Prima CNE, Seema SJ, Mahmud M, Shamim Kaiser M (2021) Performance comparison of machine learning techniques in identifying dementia from open access clinical datasets. In: Saeed F, Al-Hadhrami T, Mohammed F, Mohammed E (eds) Advances on Smart and Soft Computing. Advances in Intelligent Systems and Computing, vol 1188. Springer, Singapore. https://doi.org/10.1007/978-981-15-6048-4_8

  • Mian GM, Sankaraguruswamy S (2012) Investor sentiment and stock market response to earnings news. Acc Rev 87(4):1357–1384. https://doi.org/10.2308/accr-50158

  • Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems. Curran Associates Inc., Lake Tahoe, pp 3111–3119

  • Milanović S, Marković N, Pamučar D, Gigović L, Kostić P, Milanović SD (2021) Forest fire probability mapping in Eastern Serbia: logistic regression versus random forest method. Forests 12(1):5

  • Miller H, Thebault-Spieker J, Chang S, Johnson I, Terveen L, Hecht B (2016) "Blissfully happy" or "ready to fight": varying interpretations of emoji. In: Proceedings of the 10th international conference on web and social media. AAAI Press, Cologne, pp 259–268

  • Mohammadi M, Rashid TA, Karim SHT, Aldalwie AHM, Tho QT, Bidaki M, Rahmani AM, Hosseinzadeh M (2021) A comprehensive survey and taxonomy of the SVM-based intrusion detection systems. J Netw Comput Appl 178:102983. https://doi.org/10.1016/j.jnca.2021.102983

  • Naeem MA, Farid S, Faruk B, Shahzad SJH (2020) Can happiness predict future volatility in stock markets? Res Int Bus Finance 54:101298. https://doi.org/10.1016/j.ribaf.2020.101298

  • Novak PK, Smailović J, Sluban B, Mozetič I (2015) Sentiment of emojis. PLoS ONE 10(12):1–22. https://doi.org/10.1371/journal.pone.0144296

  • Olech ŁP, Paradowski M (2016) Hierarchical Gaussian mixture model with objects attached to terminal and non-terminal dendrogram nodes. In: Burduk R, Jackowski K, Kurzyński M, Woźniak M, Żołnierek A (eds) Proceedings of the 9th international conference on computer recognition systems CORES 2015. Springer International Publishing, Wroclaw, pp 191–201

  • Olech ŁP, Spytkowski M, Kwaśnicka H, Michalewicz Z (2021) Hierarchical data generator based on tree-structured stick breaking process for benchmarking clustering methods. Inf Sci 554:99–119. https://doi.org/10.1016/j.ins.2020.12.020

  • Oliveira N, Cortez P, Areal N (2013) On the predictability of stock market behavior using StockTwits sentiment and posting volume. In: Correia L, Reis LP, Cascalho J (eds) Progress in artificial intelligence. Springer, Berlin, pp 355–365

  • Oliveira N, Cortez P, Areal N (2016) Stock market sentiment lexicon acquisition using microblogging data and statistical measures. Decis Support Syst 85:62–73. https://doi.org/10.1016/j.dss.2016.02.013

  • Pavalanathan U, Eisenstein J (2016) More emojis, less :) the competition for paralinguistic function in microblog writing. First Monday. https://doi.org/10.5210/fm.v21i11.6879

  • Prakash KB, Kanagachidambaresan GR (2021) Introduction to TensorFlow package. In: Prakash KB, Kanagachidambaresan GR (eds) Programming with TensorFlow: solution for edge computing applications. Springer International Publishing, Cham, pp 1–4. https://doi.org/10.1007/978-3-030-57077-4_1

  • Rao D, Ravichandran D (2009) Semi-supervised polarity lexicon induction. In: Proceedings of the 12th conference of the European Chapter of the Association for computational linguistics. Association for Computational Linguistics, Athens, pp 675–682

  • Reis PMN, Pinho C (2020) A new European investor sentiment index (EURsent) and its return and volatility predictability. J Behav Exp Finance 27:100373. https://doi.org/10.1016/j.jbef.2020.100373

  • Renault T (2017) Intraday online investor sentiment and return patterns in the U.S. stock market. J Bank Finance 84:25–40. https://doi.org/10.1016/j.jbankfin.2017.07.002

  • San Vicente I et al (2014) Simple, robust and (almost) unsupervised generation of polarity lexicons for multiple languages. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics. Association for Computational Linguistics, Gothenburg, pp 88–97. https://doi.org/10.3115/v1/E14-1010

  • Seok SI, Cho H, Ryu D (2019) Firm-specific investor sentiment and the stock market response to earnings news. N Am J Econ Finance 48:221–240. https://doi.org/10.1016/j.najef.2019.01.014

  • Shaham U, Steinerberger S (2017) Stochastic neighbor embedding separates well-separated clusters. arXiv preprint arXiv:1702.02670

  • Shynkevich Y, McGinnity TM, Coleman S, Belatreche A (2015) Predicting stock price movements based on different categories of news articles. In: 2015 IEEE symposium series on computational intelligence, 7–10 Dec 2015

  • Spytkowski M, Kwaśnicka H (2012) Hierarchical clustering through Bayesian inference. In: Nguyen NT, Hoang K, Jȩdrzejowicz P (eds) Computational collective intelligence. Technologies and applications. ICCCI 2012. Lecture Notes in Computer Science, vol 7653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34630-9_53

  • Spytkowski M, Olech ŁP, Kwaśnicka H (2016) Hierarchy of groups evaluation using different F-score variants. In: Nguyen NT, Trawiński B, Fujita H, Hong T-P (eds) Intelligent information and database systems. Springer, Berlin, pp 654–664

  • Stambaugh RF, Yu J, Yuan Y (2012) The short of it: investor sentiment and anomalies. J Financ Econ 104(2):288–302. https://doi.org/10.1016/j.jfineco.2011.12.001

  • Turney PD, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst 21(4):315–346. https://doi.org/10.1145/944012.944013

  • van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15:3221–3245

  • van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  • Vidakovic B (2013) Engineering biostatistics: an introduction using MATLAB and WinBUGS, 1st edn. Wiley, Hoboken

  • Weiss GM (2004) Mining with rarity: a unifying framework. ACM SIGKDD Explor Newsl 6(1):7–19

  • Widdows D, Dorow B (2002) A graph model for unsupervised lexical acquisition. In: Proceedings of the 19th international conference on computational linguistics. Association for Computational Linguistics, Sinica, Taipei, pp 1–7

  • Wijeratne S, Balasuriya L, Sheth A, Doran D (2017) A semantics-based measure of emoji similarity. In: Proceedings of the international conference on web intelligence. ACM, New York, pp 646–653

  • Willoughby JF, Liu S (2018) Do pictures help tell the story? An experimental test of narrative and emojis in a health text message intervention. Comput Hum Behav 79:75–82. https://doi.org/10.1016/j.chb.2017.10.031

  • Wu Q-W, Xia J-F, Ni J-C, Zheng C-H (2021) GAERF: predicting lncRNA-disease associations by graph auto-encoder and random forest. Brief Bioinform. https://doi.org/10.1093/bib/bbaa391

  • Yang Y, Eisenstein J (2015) Putting things in context: community-specific embedding projections for sentiment analysis. arXiv preprint arXiv:1511.06052

  • Yu J, Yuan Y (2011) Investor sentiment and the mean–variance relation. J Financ Econ 100(2):367–381. https://doi.org/10.1016/j.jfineco.2010.10.011

  • Zhao G, Liu Z, Chao Y, Qian X (2020) CAPER: context-aware personalized emoji recommendation. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2020.2966971

Acknowledgements

We would like to thank Zbigniew Michalewicz for his contribution during various stages of the paper preparation. Additionally, we are grateful to the editor and the anonymous reviewers for careful reading of the manuscript and their insightful critical observations that have helped improve our paper. We also express our sincere gratitude to Stocktwits® for providing their data.

Author information

Corresponding author

Correspondence to Nader Mahmoudi.

Appendix: Details of Hk++

Hk++ (Olech and Paradowski 2016) adopts the Gaussian mixture model (GMM), in which clusters (mixtures) are defined as Gaussian distributions (Bishop 2006). In a GMM, clustering with \(n\) clusters is described by the following probability density function (pdf):

$$G\left(\psi ,\mu ,\varSigma \right)=\sum _{i=1}^{n}{\psi }_{i}N\left({\mu }_{i},{\varSigma }_{i}\right)$$
(16)

where \(\mu \) is the set of all cluster centers (\({\mu }_{i}\in \mu \)), \(\varSigma \) is the set of all cluster covariance matrices (\({\varSigma }_{i}\in \varSigma \)) and \(\psi \) is the set of mixture weights (\({\psi }_{i}\in \psi \)) such that \(0\le {\psi }_{i}\le 1\) and \(\sum _{i=1}^{n}{\psi }_{i}=1\). \(N\left({\mu }_{i},{\varSigma }_{i}\right)\) is the multivariate Gaussian distribution of the \(i\)-th mixture with center \({\mu }_{i}\) and covariance matrix \({\varSigma }_{i}\). Hk++ uses multiple GMMs and recursively (breadth-first) arranges them in a hierarchical structure. This structure is an object cluster hierarchy in which every parent together with its immediate children is represented by an extended GMM pdf. In the extended version, the GMM models the child nodes together with an additional background component \(N\left({\mu }_{B},{\varSigma }_{B}\right)\) that represents the parent node:

$$G\left(\alpha ,\psi ,\mu ,{\mu }_{B},\varSigma ,{\varSigma }_{B}\right)=\alpha N\left({\mu }_{B},{\varSigma }_{B}\right)+\left(1-\alpha \right)G\left(\psi ,\mu ,\varSigma \right)$$
(17)

where the balance between a parent and its children is controlled by the parameter \(\alpha \), with \(0\le \alpha \le 1\); \({\mu }_{B}\) is the center of the parent node, \({\varSigma }_{B}\) is the covariance matrix of the parent node, and the remaining variables are defined as in Eq. (16). The role of the background component is to capture sparsely distributed data, so that the child clusters can be discovered on a filtered (denoised) set of data points (Olech and Paradowski 2016). Model parameters are estimated using the expectation–maximization (EM) algorithm (Bishop 2006). Each iteration consists of two steps: reassignment of points to mixtures (expectation step), and recalculation of \({\mu }_{i}\) and \({\varSigma }_{i}\) for every mixture except the background component, based on the updated assignment (maximization step). These steps are repeated until the pdf converges or a predefined number of iterations is reached. In the implemented visual clustering procedure, the Hk++ parameters, including the number of iterations, are set interactively by the human operator.
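To make Eq. (17) and the EM loop concrete, here is a minimal NumPy sketch of fitting one parent node's children. It assumes full covariance matrices, keeps \(\alpha \), \(\psi \) and the background component fixed (only \({\mu }_{i}\) and \({\varSigma }_{i}\) are re-estimated, matching the description above), and adds a small ridge `reg` to each covariance for numerical stability. The function names, the ridge and the stopping rule on the log-likelihood are assumptions; this is not the authors' Hk++ implementation, and the recursive construction of the hierarchy is omitted.

```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Solve cov @ z = diff^T instead of forming the explicit inverse.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (maha + logdet + d * np.log(2.0 * np.pi)))


def component_densities(X, alpha, psi, mus, sigmas, mu_b, sigma_b):
    """Weighted terms of Eq. (17): column 0 is the background, columns 1..n the children."""
    cols = [alpha * gaussian_pdf(X, mu_b, sigma_b)]
    cols += [(1.0 - alpha) * p * gaussian_pdf(X, m, s) for p, m, s in zip(psi, mus, sigmas)]
    return np.column_stack(cols)


def fit_children(X, alpha, psi, mus, sigmas, mu_b, sigma_b, n_iter=100, tol=1e-6, reg=1e-6):
    """EM over child means/covariances; background, alpha and psi stay fixed."""
    d = X.shape[1]
    mus = [np.asarray(m, dtype=float) for m in mus]
    sigmas = [np.asarray(s, dtype=float) for s in sigmas]
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: soft assignment (responsibilities) of each point to each component.
        dens = component_densities(X, alpha, psi, mus, sigmas, mu_b, sigma_b)
        total = np.maximum(dens.sum(axis=1, keepdims=True), np.finfo(float).tiny)
        resp = dens / total
        ll = float(np.log(total).sum())  # log-likelihood at the current parameters
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: update each child's mean and covariance; the background is untouched.
        for k in range(len(mus)):
            r = resp[:, k + 1]
            nk = r.sum()
            if nk < 1e-12:
                continue  # keep an (almost) empty child's previous parameters
            mus[k] = r @ X / nk
            diff = X - mus[k]
            sigmas[k] = (r[:, None] * diff).T @ diff / nk + reg * np.eye(d)
    return mus, sigmas, ll


# Toy node: two dense blobs plus sparse uniform noise meant for the background component.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([-3.0, 0.0], 0.5, size=(200, 2)),
    rng.normal([3.0, 0.0], 0.5, size=(200, 2)),
    rng.uniform(-8.0, 8.0, size=(50, 2)),
])
# Background = the parent node, summarized here by the mean and covariance of all its points.
mus, sigmas, ll = fit_children(
    X, alpha=0.1, psi=[0.5, 0.5],
    mus=[[-1.0, 1.0], [1.0, -1.0]], sigmas=[np.eye(2), np.eye(2)],
    mu_b=X.mean(axis=0), sigma_b=np.cov(X, rowvar=False),
)
```

Because the broad background component absorbs most of the scattered noise points, the child means settle near the two dense blobs rather than being dragged toward the outliers, which is the denoising role the text attributes to the background component.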

Cite this article

Mahmoudi, N., Olech, Ł.P. & Docherty, P. A comprehensive study of domain-specific emoji meanings in sentiment classification. Comput Manag Sci 19, 159–197 (2022). https://doi.org/10.1007/s10287-021-00407-7

  • DOI: https://doi.org/10.1007/s10287-021-00407-7
