
Coherence of comments and method implementations: a dataset and an empirical investigation


In this paper, we present the results of a manual assessment of the coherence between the comments and the implementations of 3636 methods in three open-source Java applications (for one of these applications, we considered two subsequent versions). The results of this assessment have been collected in a dataset that we made publicly available on the Web. The creation of this dataset follows a protocol detailed in this paper; we present that protocol both to let researchers evaluate the quality of our dataset and to ease possible future extensions. Another contribution of this paper is a preliminary investigation of the effectiveness of a Vector Space Model (VSM) with the tf-idf weighting schema in discriminating coherent from non-coherent methods. We observed that lexical similarity alone is not sufficient for this distinction, while encouraging results were obtained by applying a Support Vector Machine (SVM) classifier to the whole vector space.
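As a rough illustration of the lexical-similarity baseline the abstract describes, the sketch below builds a tf-idf vector space over comments and method bodies and scores each comment/implementation pair by cosine similarity. The toy data, tokenization, and helper names are illustrative assumptions, not the paper's actual pipeline (which additionally applies an SVM classifier to the whole vector space).

```python
# Illustrative sketch, NOT the paper's implementation: tf-idf vectors over
# comments and method bodies, with cosine similarity as the coherence signal.
import math
import re
from collections import Counter

def tokenize(text):
    # Split on non-letters and lowercase: a rough stand-in for the
    # identifier splitting/normalization steps used in this line of work.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def tfidf_vectors(docs):
    """Return one sparse tf-idf weight dict per document."""
    token_lists = [tokenize(d) for d in docs]
    df = Counter(t for toks in token_lists for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy (comment, implementation) pairs: the first is lexically coherent,
# the second shares no vocabulary between comment and code.
pairs = [
    ("sort the users by name", "users sort key name"),
    ("close the connection", "cache clear"),
]
vecs = tfidf_vectors([text for pair in pairs for text in pair])
scores = [cosine(vecs[2 * i], vecs[2 * i + 1]) for i in range(len(pairs))]
print(scores)
```

A single cosine threshold on such scores is exactly the kind of purely lexical criterion the paper found insufficient on its own.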




Notes

  1. The comment right before or after the definition of a method, a class, an abstract class, and so on.

  3. In our case, an annotator is a person who annotates software by associating coherence information with methods.

  10. Such an approach is usually referred to as macroaveraging (Manning et al. 2008).


  1. Antoniol, G., Canfora, G., Casazza, G., & De Lucia, A. (2000). Information retrieval models for recovering traceability links between code and documentation. In Proceedings of the international conference on software maintenance (pp. 40–51): IEEE Computer Society.

  2. Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.


  3. Binkley, D., Lawrie, D., Pollock, L., Hill, E., & Vijay-Shanker, K. (2013). A dataset for evaluating identifier splitters, IEEE Computer Society.

  4. Bishop, C. M. (2006). Pattern recognition and machine learning (information science and statistics), Springer-Verlag New York, Inc., Secaucus.

  5. Campbell, C., & Ying, Y. (2011). Learning with support vector machines, Morgan and Claypool.

  6. Caprile, B., & Tonella, P. (2000). Restructuring program identifier names. In Proceedings of international conference on software maintenance (pp. 97–107): IEEE Computer Society.

  7. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.


  8. Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.


  9. Corazza, A., Di Martino, S., & Maggio, V. (2012). LINSEN: an efficient approach to split identifiers and expand abbreviations. In Proceedings of international conference on software maintenance (pp. 233–242): IEEE Computer Society.

  10. Corazza, A., Di Martino, S., Maggio, V., & Scanniello, G. (2011). Investigating the use of lexical information for software system clustering. In Proceedings of European conference on software maintenance and reengineering (pp. 35–44): IEEE Computer Society.

  11. Corazza, A., Maggio, V., & Scanniello, G. (2015). On the coherence between comments and implementations in source code. In Proceedings of EUROMICRO conference on software engineering and advanced applications (pp. 76–83): IEEE Computer Society.

  12. de Souza, S. C. B., Anquetil, N., & de Oliveira, K. M. (2005). A study of the documentation essential to software maintenance. In Proceedings of the international conference on design of communication: documenting & designing for pervasive information (pp. 68–75): ACM.

  13. DeLine, R., Khella, A., Czerwinski, M., & Robertson, G. (2005). Towards understanding programs through wear-based filtering. In Proceedings of the 2005 ACM symposium on Software visualization, SoftVis ’05 (pp. 183–192): ACM.

  14. Dit, B., Revelle, M., Gethers, M., & Poshyvanyk, D. (2013). Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25 (1), 53–95.


  15. Fluri, B., Wursch, M., & Gall, H. (2007). Do code and comments co-evolve? on the relation between source code and comment changes. In Proceedings of the working conference on reverse engineering (pp. 70–79): IEEE Computer Society.

  16. Fowler, M. (1999). Refactoring: improving the design of existing code. Boston: Addison-Wesley Longman Publishing Co., Inc.


  17. Freund, R. J., & Wilson, W. J. (2003). Statistical methods, 2nd edn. Academic Press.

  18. Jiang, Z. M., & Hassan, A. E. (2006). Examining the evolution of code comments in PostgreSQL. In Diehl, S., Gall, H., & Hassan, A. E. (Eds.) Proceedings of mining software repositories (pp. 179–180): ACM.

  19. Keyes, J. (2002). Software engineering handbook: Taylor & Francis.

  20. Kuhn, A., Ducasse, S., & Gîrba, T. (2007). Semantic clustering identifying topics in source code. Information & Software Technology, 49(3), 230–243.


  21. LaToza, T. D., Venolia, G., & DeLine, R. (2006). Maintaining mental models: a study of developer work habits. In Proceedings of the 28th international conference on software engineering, ICSE ’06 (pp. 492–501): ACM.

  22. Lawrie, D., Binkley, D., & Morrell, C. (2010). Normalizing source code vocabulary. In Proceedings of working conference on reverse engineering (pp. 3–12): IEEE Computer Society.

  23. Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. New York: Cambridge University Press.


  24. McMillan, C., Grechanik, M., Poshyvanyk, D., Fu, C., & Xie, Q. (2012). Exemplar: a source code search engine for finding highly relevant applications. IEEE Transactions on Software Engineering, 38(5), 1069–1087.


  25. Robillard, M. P., Coelho, W., & Murphy, G. C. (2004). How effective developers investigate source code: an exploratory study. IEEE Transactions on Software Engineering, 30(12), 889–903.


  26. Roehm, T., Tiarks, R., Koschke, R., & Maalej, W. (2012). How do professional developers comprehend software?. In Proceedings of the 2012 international conference on software engineering, ICSE 2012 (pp. 255–265). Piscataway, NJ, USA: IEEE Press.

  27. Salviulo, F., & Scanniello, G. (2014). Dealing with identifiers and comments in source code comprehension and maintenance: Results from an ethnographically-informed study with students and professionals. In Proceedings of International Conference on Evaluation and Assessment in Software Engineering (pp. 423–432): ACM Press.

  28. Scanniello, G., Marcus, A., & Pascale, D. (2015). Link analysis algorithms for static concept location: an empirical assessment. Empirical Software Engineering, 20 (6), 1666–1720.


  29. Singer, J., Lethbridge, T., Vinson, N., & Anquetil, N. (1997). An examination of software engineering work practices. In Proceedings of the conference of the centre for advanced studies on collaborative research (p. 21): IBM Press.

  30. Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge. IEEE Transactions on Software Engineering, 10(5), 595–609.


  31. Steidl, D., Hummel, B., & Jürgens, E. (2013). Quality analysis of source code comments. In Proceedings of international conference on program comprehension (pp. 83–92): IEEE Computer Society.

  32. Tan, L., Yuan, D., Krishna, G., & Zhou, Y. (2007). iComment: bugs or bad comments? In Proceedings of the symposium on operating systems principles: ACM.

  33. Tan, S. H., Marinov, D., Tan, L., & Leavens, G. T. (2012). @tcomment: Testing javadoc comments to detect comment-code inconsistencies. In Proceedings of international conference on software testing (pp. 260–269): IEEE Computer Society.

  34. Van Der Maaten, L. (2014). Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(1), 3221–3245.


  35. Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.


  36. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M., Regnell, B., & Wesslén, A. (2012). Experimentation in software engineering. Springer.



Acknowledgements

We would like to thank the annotators of our dataset and the reviewers for their valuable and constructive comments and suggestions.

Author information



Corresponding author

Correspondence to Anna Corazza.


About this article


Cite this article

Corazza, A., Maggio, V. & Scanniello, G. Coherence of comments and method implementations: a dataset and an empirical investigation. Software Qual J 26, 751–777 (2018).



Keywords

  • Comment coherence
  • Maintenance
  • Experimental protocol
  • Dataset
  • Lexical information
  • Classification