Adversarial domain adaptation for cross-project defect prediction

Empirical Software Engineering

Abstract

Cross-Project Defect Prediction (CPDP) is an attractive approach to locating defects in projects with little labeled data (target projects) by reusing a prediction model trained on other projects with sufficient data (source projects). However, previous models may not fully capture the semantic features of programs because their feature extraction models are ill-suited to the task. In addition, when transfer learning methods are used to match the two feature distributions, researchers may fail to consider the relationship between the decision boundary and the target project data, which leads to the misclassification of target samples that lie near the boundary. To address these drawbacks, we propose a novel Adversarial Domain Adaptation (ADA) model for CPDP. Specifically, we leverage a Long Short-Term Memory network with an attention mechanism to extract semantic features that better represent programs. Then, we train two classifiers to correctly classify source samples and to detect ambiguous target instances that degrade prediction accuracy. Next, we treat the classifiers as a discriminator and the feature extraction model as a generator, and train them adversarially to capture the relationship between the decision boundary and the target data. Because the classifiers account for this relationship, they should attain better performance. Extensive experiments on two benchmark datasets verify the effectiveness of the proposed ADA method. Experimental and statistical results show that ADA significantly outperforms state-of-the-art baseline methods.
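The idea of using two classifiers to find ambiguous target instances can be illustrated with a small sketch. The abstract does not say how disagreement between the classifiers is measured, so the L1 distance between their softmax outputs, the threshold of 0.2, the logit values, and the function names below are all assumptions made for illustration, not the authors' implementation. The sketch only shows how target samples on which the two classifiers disagree (and which therefore likely lie near the decision boundary) could be flagged.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discrepancy(p1, p2):
    """Mean absolute difference between two probability vectors
    (an assumed disagreement measure, not taken from the paper)."""
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)

def flag_ambiguous(logits_c1, logits_c2, threshold=0.2):
    """Flag target samples on which the two classifiers disagree
    by more than a hypothetical threshold."""
    flags = []
    for l1, l2 in zip(logits_c1, logits_c2):
        d = discrepancy(softmax(l1), softmax(l2))
        flags.append(d > threshold)
    return flags

# Hypothetical [clean, defective] logits from two classifiers
# for three unlabeled target-project samples.
logits_c1 = [[2.0, -2.0], [0.3, -0.3], [-1.5, 1.5]]
logits_c2 = [[1.8, -1.8], [-0.4, 0.4], [-1.7, 1.7]]

flags = flag_ambiguous(logits_c1, logits_c2)
print(flags)  # only the second sample is flagged as ambiguous
```

In an adversarial setup of this kind, the classifiers would typically be updated to increase such a disagreement on target data while the feature extractor is updated to reduce it; whether ADA uses exactly this objective is not stated in the abstract.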

Data Availability Statements

The PROMISE dataset used in this work is available at https://doi.org/10.1145/1868328.1868342. The AEEEM dataset used in this work is available at https://doi.org/10.1109/MSR.2010.5463279. The implementation of the proposed ADA model is available from the corresponding author on reasonable request.

Notes

  1. https://pypi.org/project/javalang/0.13.0/

Acknowledgements

This work was supported by the Science and Technology Planning Project of Guangzhou (Grant No. 202102020637).

Author information

Corresponding author

Correspondence to Siyu Jiang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Leandro L. Minku.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, H., Wu, G., Ma, L. et al. Adversarial domain adaptation for cross-project defect prediction. Empir Software Eng 28, 127 (2023). https://doi.org/10.1007/s10664-023-10371-2

  • DOI: https://doi.org/10.1007/s10664-023-10371-2
