Characterization of Source Code Defects by Data Mining Conducted on GitHub

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 9159)

Abstract

Coding errors are unavoidable in software systems because of frequent source changes, tight deadlines, and inaccurate specifications. It is therefore important to have tools that help find these errors. One way to support bug prediction is to analyze the characteristics of previously reported errors and use them to identify errors that are not yet known. This paper aims to characterize known coding errors.

The popularity of source code hosting services such as GitHub is increasing rapidly. They provide a variety of services, the most important of which are version control and bug tracking. Version control systems store all versions of the source code, and bug tracking systems provide a unified interface for reporting errors. Bug reports can be used to identify the faulty and the previously fixed source code parts, so the bugs can be characterized by static source code metrics or by other quantitatively measured properties computed from the gathered data.
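The linking step described above can be illustrated with a minimal sketch. It is not the authors' tooling, which this abstract does not describe; it only assumes that fixing commits mention the bug report with one of GitHub's closing keywords (for example "Fixes #12"). The function name `link_commits_to_issues` and the sample commit data are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: fixing commits reference the bug report with a GitHub
# closing keyword (close/fix/resolve and their variants) followed by
# "#<issue number>".
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def link_commits_to_issues(commits):
    """Map each referenced issue number to the commit SHAs that mention it."""
    links = defaultdict(list)
    for sha, message in commits:
        for issue in CLOSING_REF.findall(message):
            links[int(issue)].append(sha)
    return dict(links)


# Hypothetical (sha, message) pairs, invented for illustration only.
commits = [
    ("a1b2c3", "Fixes #12: guard against null parser state"),
    ("d4e5f6", "Refactor build script"),
    ("0718ab", "resolve #12 and close #30"),
]

print(link_commits_to_issues(commits))
```

Once commits are linked to bug reports in this way, the source code elements changed by those commits could be measured before and after the fix; how the paper actually does this is not stated here.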

We chose GitHub as the basis for data collection and selected 13 Java projects for analysis. The result is a database that characterizes the bugs of the examined projects and can therefore be used, among other things, to improve the automatic detection of software defects.

Keywords

  • Bug database
  • GitHub
  • Data mining

Corresponding author

Correspondence to Zoltán Tóth.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gyimesi, P., Gyimesi, G., Tóth, Z., Ferenc, R. (2015). Characterization of Source Code Defects by Data Mining Conducted on GitHub. In: , et al. Computational Science and Its Applications -- ICCSA 2015. ICCSA 2015. Lecture Notes in Computer Science, vol 9159. Springer, Cham. https://doi.org/10.1007/978-3-319-21413-9_4

  • DOI: https://doi.org/10.1007/978-3-319-21413-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21412-2

  • Online ISBN: 978-3-319-21413-9

  • eBook Packages: Computer Science, Computer Science (R0)