
Empirical Software Engineering, Volume 23, Issue 3, pp 1352–1382

Sentiment Polarity Detection for Software Development

  • Fabio Calefato
  • Filippo Lanubile
  • Federico Maiorano
  • Nicole Novielli

Abstract

Sentiment analysis is increasingly used to study software developers’ emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, and therefore misclassify technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers’ communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of lexicon- and keyword-based features, as well as semantic features based on word embedding. Compared with a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassification of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
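
As a rough illustration of how lexicon-based, keyword-based, and embedding-based features might be concatenated into a single feature vector for a polarity classifier, consider the minimal sketch below. Everything in it is an assumption made for illustration: the toy lexicon, the keyword list, the two-dimensional word vectors, and the function names are invented here and are not taken from Senti4SD or its lab package.

```python
from typing import Dict, List

# Toy resources, invented for illustration only
# (NOT the lexicon, keywords, or embedding space used by Senti4SD).
POSITIVE = {"great", "thanks", "love"}
NEGATIVE = {"hate", "awful", "annoying"}
KEYWORDS = ["thanks", "error", "hate"]
EMBEDDINGS: Dict[str, List[float]] = {
    "thanks": [0.9, 0.1],
    "great": [0.8, 0.2],
    "hate": [-0.9, 0.3],
    "error": [0.0, -0.1],
}
DIM = 2  # dimensionality of the toy embedding space


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:()\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def lexicon_features(tokens: List[str]) -> List[float]:
    """Count tokens found in the positive and negative word lists."""
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return [float(pos), float(neg)]


def keyword_features(tokens: List[str]) -> List[float]:
    """One binary presence flag per keyword, in KEYWORDS order."""
    present = set(tokens)
    return [1.0 if kw in present else 0.0 for kw in KEYWORDS]


def semantic_features(tokens: List[str]) -> List[float]:
    """Average the vectors of known tokens; zeros if none are known."""
    vectors = [EMBEDDINGS[tok] for tok in tokens if tok in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(DIM)]


def extract_features(text: str) -> List[float]:
    """Concatenate the three feature groups into one flat vector."""
    tokens = tokenize(text)
    return (
        lexicon_features(tokens)
        + keyword_features(tokens)
        + semantic_features(tokens)
    )
```

A vector produced by the hypothetical `extract_features` could then be passed to any supervised classifier trained on posts annotated for polarity. Averaging word vectors is only one simple way to derive a semantic feature and is used here for brevity.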

Keywords

Sentiment Analysis; Communication Channels; Stack Overflow; Word Embedding; Social Software Engineering

Notes

Acknowledgements

This work is partially supported by the project ‘EmoQuest - Investigating the Role of Emotions in Online Question & Answer Sites’, funded by the Italian Ministry of Education, University and Research (MIUR) under the program “Scientific Independence of young Researchers” (SIR). The computational work has been executed on the IT resources made available by two projects, ReCaS and PRISMA, funded by MIUR under the program “PON R&C 2007–2013”. We thank Pierpaolo Basile for insightful discussions and helpful comments and the annotators involved in the gold standard building.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Dipartimento Jonico, University of Bari “A. Moro”, Taranto, Italy
  2. Dipartimento di Informatica, University of Bari “A. Moro”, Bari, Italy
