Community Smell Occurrence Prediction on Multi-Granularity by Developer-Oriented Features and Process Metrics

Huang, Zi-Jie; Shao, Zhi-Qing; Fan, Gui-Sheng; Yu, Hui-Qun; Yang, Xing-Guang; Yang, Kang

doi:10.1007/s11390-021-1596-1

Regular Paper
Published: 31 January 2022

Volume 37, pages 182–206, (2022)

Journal of Computer Science and Technology

Zi-Jie Huang¹, Zhi-Qing Shao¹, Gui-Sheng Fan¹,², Hui-Qun Yu¹,³, Xing-Guang Yang¹ & Kang Yang¹

Abstract

Community smells are sub-optimal developer community structures that hinder productivity. Prior studies performed smell prediction and provided refactoring guidelines from a top-down perspective to help community shepherds. However, refactoring smells also requires bottom-up effort from every developer, and supportive measures and guidelines for individual developers are not available at a fine-grained level. Since recent work revealed that developers’ personalities and working states could influence the emergence and variation of community smells, we build prediction models using experience, sentiment, and development process features of developers. The models consider three smells (Organizational Silo, Lone Wolf, and Bottleneck) and two related classes (smelly developer and smelly quitter). We predict the five classes at the individual granularity, and we also forecast the number of smelly developers at the community granularity. The proposed models achieve F-measures ranging from 0.73 to 0.92 in individual-wide within-project, time-wise, and cross-project prediction, and a mean R² of 0.68 in community-wide Smelly Developer prediction. We also use SHAP (SHapley Additive exPlanations) to assess feature importance and explain our predictors. In conclusion, we suggest that developers with heavy workloads foster more frequent communication in a straightforward and polite way to build healthier communities, and we recommend that community shepherds use the forecasting model for refactoring planning.
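The two evaluation measures reported in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of the F-measure (harmonic mean of precision and recall) and the coefficient of determination R²; the function names and the toy data are hypothetical and are not taken from the paper, whose actual evaluation pipeline is not shown in this preview.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-measure (F1) for the positive class, encoded as label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive prediction: F-measure is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical individual-level labels: 1 = smelly developer, 0 = not smelly.
actual_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical community-level counts of smelly developers per time window.
actual_counts = [4, 6, 5, 9]
forecast_counts = [5, 6, 4, 8]

print(f"F-measure: {f_measure(actual_labels, predicted_labels):.2f}")
print(f"R^2: {r_squared(actual_counts, forecast_counts):.2f}")
```

The F-measure applies to the individual-level classification tasks, while R² applies to the community-level forecast of a count, which is why the abstract reports them separately.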

Author information

Zhi-Qing Shao helped with the original idea and theoretical design. Gui-Sheng Fan contributed significantly to the implementation and writing. Both guided the work of the paper and contributed equally.

Authors and Affiliations

Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
Zi-Jie Huang, Zhi-Qing Shao, Gui-Sheng Fan, Hui-Qun Yu, Xing-Guang Yang & Kang Yang
Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai, 200237, China
Gui-Sheng Fan
Shanghai Engineering Research Center of Smart Energy, Shanghai, 200237, China
Hui-Qun Yu

Corresponding authors

Correspondence to Zhi-Qing Shao or Gui-Sheng Fan.

Supplementary Information

ESM 1 (PDF 865 kb)

About this article

Cite this article

Huang, ZJ., Shao, ZQ., Fan, GS. et al. Community Smell Occurrence Prediction on Multi-Granularity by Developer-Oriented Features and Process Metrics. J. Comput. Sci. Technol. 37, 182–206 (2022). https://doi.org/10.1007/s11390-021-1596-1

Received: 18 May 2021
Accepted: 10 January 2022
Published: 31 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11390-021-1596-1
