Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets

  • Original research article
  • Interdisciplinary Sciences: Computational Life Sciences

Abstract

Copy number variation (CNV) is an important genetic driver of cancer formation and progression, which makes CNV-based intelligent classification feasible. However, current machine learning and deep learning methods face several challenges, such as the design of base classifier combination schemes in ensemble methods and the selection of neural network layers, and these often result in low accuracy. Therefore, an adaptive bilinear dynamic cascade model (Adap-BDCM) is developed to improve the accuracy and applicability of intelligent classification on CNV datasets. In this model, a feature selection module mitigates the interference of redundant information, and a bilinear model based on a gated attention mechanism extracts more informative deep fusion features. Furthermore, an adaptive base classifier selection scheme removes the need to design base classifier combinations manually and broadens the applicability of the model. Lastly, a novel feature fusion scheme with an attribute recall submodule helps the model avoid getting stuck in local optima and losing valuable information. Extensive experiments show that Adap-BDCM achieves the best performance in cancer classification, stage prediction, and recurrence prediction on CNV datasets. This study can help physicians make diagnoses faster and more accurately.
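
The abstract does not give the exact form of the gated bilinear model, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes two feature views are projected into a shared space, combined by an element-wise product (a common low-rank bilinear form), and rescaled by a sigmoid gate. The function names, the weight matrices `U`, `V`, `G`, and the toy inputs are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def gated_bilinear_fusion(x, y, U, V, G):
    """Fuse two feature views with a gated low-rank bilinear interaction.

    U and V project x and y into a shared k-dimensional space; the
    element-wise product of the projections is a low-rank bilinear form.
    G maps the concatenated input to a sigmoid gate in (0, 1) that
    rescales each fused feature. (Assumed formulation, for illustration.)
    """
    hx = [math.tanh(a) for a in matvec(U, x)]
    hy = [math.tanh(b) for b in matvec(V, y)]
    bilinear = [a * b for a, b in zip(hx, hy)]
    gate = [sigmoid(g) for g in matvec(G, x + y)]
    return [g * b for g, b in zip(gate, bilinear)]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = [1.0, 0.0, -1.0, 2.0]  # toy stand-in for one view of selected CNV features
y = [0.0, 1.0, 1.0]        # toy stand-in for a second view
k = 5                      # size of the fused representation
U = random_matrix(k, len(x), rng)
V = random_matrix(k, len(y), rng)
G = random_matrix(k, len(x) + len(y), rng)

fused = gated_bilinear_fusion(x, y, U, V, G)
print(len(fused))
```

In this sketch each fused feature is bounded in magnitude by 1, because it is the product of two tanh outputs and a gate value between 0 and 1. In a cascade model, a vector like `fused` would be passed to the base classifiers of the next level; how Adap-BDCM does this is described in the full article and the linked repository, not here.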

Code Availability

The source code for the Adap-BDCM model and the CNV dataset for cancer classification are archived in a GitHub repository (https://github.com/junhonga/Adap-BDCM).

Acknowledgements

This work was supported by the Fundamental Research Program of Shanxi Province (General Program) (Grant Nos. 202303021211082 and 202303021211025).

Author information

Contributions

Concept and design: Liancheng Jiang and Liye Jia; data collection and analysis: Yizhen Wang; drafting of the article: Liancheng Jiang, Liye Jia, and Junhong Yue; critical revision of the article for important content: Liancheng Jiang, Liye Jia, Yizhen Wang, Yongfei Wu, and Junhong Yue. All authors approved the final article.

Corresponding author

Correspondence to Junhong Yue.

Ethics declarations

Competing Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethics statement

Not applicable.

Informed consent

Not applicable.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, L., Jia, L., Wang, Y. et al. Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets. Interdiscip Sci Comput Life Sci (2024). https://doi.org/10.1007/s12539-024-00635-w

  • DOI: https://doi.org/10.1007/s12539-024-00635-w