Coherence of comments and method implementations: a dataset and an empirical investigation

  • Anna Corazza
  • Valerio Maggio
  • Giuseppe Scanniello


In this paper, we present the results of a manual assessment of the coherence between the comments and the implementations of 3636 methods in three open source software applications implemented in Java (for one of these applications, we considered two subsequent versions). The results of this assessment have been collected in a dataset that we have made publicly available on the Web. The dataset was created following a protocol that is detailed in this paper. We present that protocol so that researchers can evaluate the quality of our dataset and to ease possible future extensions. Another contribution of this paper is a preliminary investigation of the effectiveness of a Vector Space Model (VSM) with the tf-idf weighting scheme in discriminating between coherent and non-coherent methods. We observed that lexical similarity alone is not sufficient for this distinction, while encouraging results were obtained by applying a Support Vector Machine (SVM) classifier to the whole vector space.
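To make the lexical-similarity idea concrete, the following is a minimal sketch of how a tf-idf vector space and a cosine similarity between a comment and a method body might be computed. It is not the authors' implementation: the tokenizer, the smoothed idf formula, and the two comment/body pairs are assumptions introduced purely for illustration, and the SVM step mentioned in the abstract is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", spaced.lower()) if t]


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict: term -> weight) per document.

    Assumed weighting: raw term frequency times a smoothed idf,
    idf = log((1 + N) / (1 + df)) + 1, which is always positive.
    """
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical comment/body pairs (not taken from the dataset).
comments = [
    "Return the sum of the left and right operand.",
    "Close the network connection.",
]
bodies = [
    "int sum = leftOperand + rightOperand; return sum;",
    "return cache.size();",
]

# Comments and bodies share one vector space so their weights are comparable.
vectors = tfidf_vectors(comments + bodies)
similarities = [
    cosine(vectors[i], vectors[len(comments) + i]) for i in range(len(comments))
]

for comment, score in zip(comments, similarities):
    print(f"{score:.3f}  {comment}")
```

In this toy input the first pair shares several terms while the second shares none, so the second score is zero. A score of this kind captures only vocabulary overlap: a comment can reuse the words of a method body and still describe different behaviour, which is consistent with the observation above that lexical similarity alone does not separate coherent from non-coherent methods.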


Keywords: Comment coherence, Maintenance, Experimental protocol, Dataset, Lexical information, Classification



We would like to thank the annotators of our dataset and the reviewers for their valuable and constructive comments and suggestions.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Anna Corazza (1)
  • Valerio Maggio (2)
  • Giuseppe Scanniello (3)

  1. Department of Electrical Engineering and Information Technologies, University of Naples “Federico II”, Naples, Italy
  2. Fondazione Bruno Kessler, Trento, Italy
  3. Department of Mathematics, Information Technology, and Economics, University of Basilicata, Potenza, Italy
