Data augmentation using MG-GAN for improved cancer classification on gene expression data

Abstract

Gene expression datasets used in molecular biology studies of cancer typically contain very few samples. Medical data are difficult and expensive to obtain because of privacy constraints, yet the accuracy of a classifier depends greatly on the quality and quantity of its input data. Data augmentation has been used to address this small-sample-size problem. Because synthetic samples are a sensitive matter when classifying cancer from gene expression data, this paper investigates data augmentation using a generative adversarial network (GAN). A GAN consists of two blocks, a generator and a discriminator, that work in a collaborative yet adversarial way. This paper proposes the modified generator GAN (MG-GAN), in which the generator is fed with original data and multivariate noise to generate data with a Gaussian distribution. Because the generated data lie within the latent space, the saddle point is reached faster. GANs have been widely used for data augmentation on image datasets; to the best of our knowledge, this is the first attempt to use a GAN for augmentation on a gene expression dataset. The proposed MG-GAN was compared with KNN and a basic GAN. Compared with KNN and GAN, MG-GAN improves classification accuracy by 18.8% and 11.9%, respectively. The loss value of the error function for MG-GAN is drastically reduced, from 0.6978 to 0.0082, ensuring sensitivity of the generated data. The improved classification accuracy and the reduced loss value make MG-GAN better suited to critical applications with sensitive data.
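The abstract does not give the MG-GAN architecture, so the following is only a minimal sketch of the two ideas it names: a generator input built from original data plus multivariate Gaussian noise, and the standard adversarial losses of a basic GAN. The function names, the additive mixing, the `noise_scale` parameter, and the use of the sample covariance are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def generator_input(real_batch, rng, noise_scale=0.1):
    """Mix real samples with multivariate Gaussian noise (hypothetical scheme)."""
    n_samples, n_genes = real_batch.shape
    # Assumption: noise covariance is estimated from the real batch itself;
    # a small ridge term keeps the matrix positive definite.
    cov = np.cov(real_batch, rowvar=False) + 1e-6 * np.eye(n_genes)
    noise = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_samples)
    return real_batch + noise_scale * noise

def bce_loss(d_out, target):
    """Binary cross-entropy between discriminator outputs and 0/1 targets."""
    eps = 1e-12
    d_out = np.clip(d_out, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(d_out)
                          + (1.0 - target) * np.log(1.0 - d_out)))

def discriminator_loss(d_real, d_fake):
    """Discriminator wants real -> 1 and generated -> 0."""
    return (bce_loss(d_real, np.ones_like(d_real))
            + bce_loss(d_fake, np.zeros_like(d_fake)))

def generator_loss(d_fake):
    """Generator wants the discriminator to label generated data as real."""
    return bce_loss(d_fake, np.ones_like(d_fake))

# Toy stand-in for an expression matrix: 8 samples x 5 genes.
rng = np.random.default_rng(0)
real = rng.normal(size=(8, 5))
z = generator_input(real, rng)
```

In this sketch `z` has the same shape as the real batch and would be passed to a generator network, which is omitted here. For real expression data, where genes far outnumber samples, estimating a full covariance matrix is expensive and ill-conditioned, so a diagonal covariance may be a more practical assumption.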

Funding

There are no funding agencies involved in this research.

Author information

Corresponding author

Correspondence to Poonam Chaudhari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was not applicable, as the study did not involve human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

About this article

Cite this article

Chaudhari, P., Agrawal, H. & Kotecha, K. Data augmentation using MG-GAN for improved cancer classification on gene expression data. Soft Comput 24, 11381–11391 (2020). https://doi.org/10.1007/s00500-019-04602-2

Keywords

  • Data augmentation
  • Generative adversarial network
  • Gene expression dataset
  • Cancer detection
  • Modified generator GAN
  • Multivariate noise
  • Gaussian distribution
  • Latent space
  • Saddle point