Classifying Python Code Comments Based on Supervised Learning

Part of the Lecture Notes in Computer Science book series (LNISA, volume 11242)


Code comments are a valuable data source for understanding programmers' needs and the underlying implementation. Previous work has shown that code comments enhance the reliability and maintainability of code, and that engineers use them both to explain their own code and to help other developers understand its intent. In this paper, we study comments from seven open-source Python projects and derive a taxonomy through an iterative process. To characterize the comments, we apply an automated approach that uses supervised learning algorithms to classify code comments according to their intentions. Our study finds a consistent pattern across the Python projects: the Summary category covers about 75% of comments. Finally, we evaluate two supervised learning classifiers and find that the Decision Tree classifier outperforms the Naive Bayes classifier in both accuracy and runtime in our experiments.
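As a rough illustration of classifying comments by intention, the sketch below implements a small multinomial Naive Bayes classifier, one of the two classifier families the abstract names, using only the Python standard library. It is not the authors' implementation: the preview does not show their features, training data, or full taxonomy. Only the Summary category comes from the abstract; the Todo and License labels, the example comments, and all function and class names here are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


class NaiveBayesCommentClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, comments, labels):
        # Count how many comments carry each label (for the class priors).
        self.class_counts = Counter(labels)
        # Count word occurrences per label (for the word likelihoods).
        self.word_counts = defaultdict(Counter)
        for text, label in zip(comments, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, comment):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Work in log space to avoid underflow on long comments.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(comment):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled comments; only "Summary" is a category from the abstract.
train = [
    ("Return the parsed configuration as a dict", "Summary"),
    ("Compute the checksum of the given file", "Summary"),
    ("Load the user record from the database", "Summary"),
    ("TODO: handle unicode file names", "Todo"),
    ("FIXME: this breaks on empty input", "Todo"),
    ("Licensed under the Apache License, Version 2.0", "License"),
    ("Copyright 2018 Example Authors, all rights reserved", "License"),
]
clf = NaiveBayesCommentClassifier().fit(
    [text for text, _ in train], [label for _, label in train]
)

for comment in (
    "Return the checksum of the file",
    "TODO: handle empty input",
    "Licensed under the MIT License",
):
    print(comment, "->", clf.predict(comment))
```

With a training set this small the classifier is easily swayed by common words such as "the", so a realistic experiment would need many more labeled comments per category and a held-out test set to measure accuracy.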


  • Code comments classification
  • Supervised learning
  • Python

Supported by the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20140611), the Natural Science Foundation of China (Grant Nos. 61272080, 61403187).

DOI: 10.1007/978-3-030-02934-0_4

Corresponding author

Correspondence to Lei Xu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, J., Xu, L., Li, Y. (2018). Classifying Python Code Comments Based on Supervised Learning. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science, vol 11242. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02933-3

  • Online ISBN: 978-3-030-02934-0

  • eBook Packages: Computer Science, Computer Science (R0)