Cross-project smell-based defect prediction

  • Data analytics and machine learning
Soft Computing

Abstract

Defect prediction is a technique introduced to optimize the testing phase of the software development pipeline by predicting which components of the software may contain defects. A classifier is trained on a set of features measured on each component of the target software project and then predicts whether each component is likely to be defective. However, when no defect information is available for the target project, an alternative approach is needed: the classifier is trained on data from external projects. This approach is called cross-project defect prediction. Bad code smells are a category of features that have previously been explored in defect prediction and have been shown to be good predictors of defects. Code smells are patterns of poor development practice in the code that indicate flaws in its design and implementation. Although they have been studied in the context of defect prediction, they have not been studied as features for cross-project defect prediction. In our experiment, we train defect prediction models for 100 projects to evaluate the predictive performance of bad code smells. We implemented four cross-project approaches known in the literature and compared the performance of 37 code smells against 56 code metrics commonly used for defect prediction. The results show that cross-project defect prediction models trained with code smells significantly improved ROC AUC by \(6.50\%\) compared with models trained with code metrics.
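The cross-project setting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not reproduce any of the four approaches used in the paper: it assumes one simple pooling strategy (for each target project, train on all other projects and score the target), uses a from-scratch logistic regression as a stand-in classifier, and computes ROC AUC as the probability that a randomly chosen defective component receives a higher score than a randomly chosen clean one. The project names, feature values, and helper functions (`train_logistic`, `cross_project_auc`, and so on) are hypothetical; in practice the features would be smell or metric values and would usually be normalized per project.

```python
import math
from typing import Dict, List, Tuple

# One component = (feature vector, defect label). The features stand in for
# smell counts or code metrics; label 1 = defective, 0 = clean (hypothetical).
Dataset = List[Tuple[List[float], int]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: List[float]) -> float:
    """Defect probability for one component; w[-1] is the bias term."""
    # zip stops at len(x), so the bias in w[-1] is added separately.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(data: Dataset, epochs: int = 500, lr: float = 0.1) -> List[float]:
    """Stand-in classifier: logistic regression fit by batch gradient descent."""
    n = len(data[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in data:
            err = predict_proba(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad)]
    return w


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as P(defective score > clean score), with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both defective and clean components")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_project_auc(projects: Dict[str, Dataset]) -> Dict[str, float]:
    """Assumed pooling strategy: for each target, train on every other project."""
    results = {}
    for target, target_data in projects.items():
        # Only the external projects' labels are used for training.
        train = [row for name, rows in projects.items() if name != target for row in rows]
        w = train_logistic(train)
        scores = [predict_proba(w, x) for x, _ in target_data]
        results[target] = roc_auc(scores, [y for _, y in target_data])
    return results


# Hypothetical toy data: two features per component, three projects.
TOY_PROJECTS: Dict[str, Dataset] = {
    "alpha": [([0, 1], 0), ([1, 0], 0), ([3, 4], 1), ([4, 3], 1)],
    "beta": [([0, 0], 0), ([1, 1], 0), ([4, 4], 1), ([5, 3], 1)],
    "gamma": [([1, 0], 0), ([0, 2], 0), ([3, 3], 1), ([5, 5], 1)],
}

if __name__ == "__main__":
    for project, auc in cross_project_auc(TOY_PROJECTS).items():
        print(f"{project}: ROC AUC = {auc:.2f}")
```

Keeping the loop fixed and swapping only the feature vectors (smell counts versus code metrics) is one way to compare two feature sets under the same cross-project strategy.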

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Availability of data and material

The data sets generated and analyzed during the current study are available in the public Zenodo repository at https://zenodo.org/record/4697491.

Funding

This work was supported by the Cyber Security Research Center at the Ben-Gurion University of the Negev.

Author information

Contributions

Conceptualization: BSM, MK; Funding acquisition: MK; Investigation: BSM, MK; Methodology: BSM, MK; Supervision: MK; Visualization: BSM; Writing—original draft: BSM; Writing—review & editing: BSM, MK.

Corresponding author

Correspondence to Bruno Sotto-Mayor.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing interests or personal relationships that could have influenced the work reported in this paper.

Code availability

The software developed during the current study is available in the public repository at https://github.com/Bruno81930/smells.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Sotto-Mayor, B., Kalech, M. Cross-project smell-based defect prediction. Soft Comput 25, 14171–14181 (2021). https://doi.org/10.1007/s00500-021-06254-7


  • DOI: https://doi.org/10.1007/s00500-021-06254-7
