
A pattern recognition model to distinguish cancerous DNA sequences via signal processing methods

  • Methodologies and Application
Soft Computing

Abstract

Cancer is a life-threatening disease caused by changes in the structure of the cell's genetic components, and DNA sequences are among the most important factors in its formation and spread. Signal processing is one of the fields that has been developed over the last two decades for the analysis of DNA sequences. In this research, a hybrid model combining the discrete Fourier transform (DFT) and an anti-notch digital filter is used to distinguish cancerous samples from non-cancerous ones. In other words, a pattern recognition model is designed to discriminate cancerous cell samples based on features of the protein-coding regions of DNA sequences. Computational and statistical techniques are used in the feature extraction and feature selection stages. Owing to its simplicity, the proposed model avoids conventional challenges such as high computational complexity and excessive memory usage. The case studies were tested with the smallest possible number of features, depending on the nature of the features. Based on the experimental results and the relationships among the features, a support vector machine (SVM) classifier is proposed to discriminate between the two classes. The extracted features and the classification results show good discrimination between cancerous and non-cancerous samples. One of the main advantages of the proposed model is that its performance is independent of the data length. Evaluation and validation results indicate high accuracy and precision for the proposed method, which underlines the genetic-mutation nature of cancer.
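The DFT and anti-notch stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation, whose details are not given here. It assumes the Voss mapping (one binary indicator sequence per base), measures the period-3 component at ω0 = 2π/3 that is characteristic of protein-coding regions, and uses a second-order allpass-derived anti-notch (bandpass) filter with an assumed pole radius r = 0.992. The function names `voss_indicators`, `period3_dft_power`, `anti_notch_coeffs`, `anti_notch_gain` and `anti_notch_power` are hypothetical.

```python
import cmath
import math

BASES = "ACGT"
OMEGA0 = 2 * math.pi / 3  # period-3 frequency associated with coding regions


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss mapping)."""
    seq = seq.upper()
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in BASES}


def period3_dft_power(seq):
    """Sum over bases of |X_b(2*pi/3)|^2; equals the DFT power at k = N/3
    when the length N is a multiple of 3."""
    total = 0.0
    for u in voss_indicators(seq).values():
        x = sum(un * cmath.exp(-1j * OMEGA0 * n) for n, un in enumerate(u))
        total += abs(x) ** 2
    return total


def anti_notch_coeffs(r=0.992, omega0=OMEGA0):
    """Coefficients of the assumed allpass-derived anti-notch filter
        H(z) = (1 - a)/2 * (1 - z^-2) / (1 - b(1 + a) z^-1 + a z^-2),
    with a = r^2 and b = cos(omega0). Its gain is exactly 1 at omega0
    and exactly 0 at DC and at the Nyquist frequency."""
    a = r * r
    c1 = math.cos(omega0) * (1 + a)
    g = (1 - a) / 2
    return g, c1, a


def anti_notch_gain(omega, r=0.992, omega0=OMEGA0):
    """Magnitude response |H(e^{j*omega})| of the filter above."""
    g, c1, a = anti_notch_coeffs(r, omega0)
    z1 = cmath.exp(-1j * omega)  # z^-1 evaluated on the unit circle
    return abs(g * (1 - z1 ** 2) / (1 - c1 * z1 + a * z1 ** 2))


def anti_notch_power(seq, r=0.992, omega0=OMEGA0):
    """Filter each indicator sequence and return the per-position total
    output power sum_b y_b[n]^2 (a period-3 energy track)."""
    g, c1, a = anti_notch_coeffs(r, omega0)
    power = [0.0] * len(seq)
    for u in voss_indicators(seq).values():
        x1 = x2 = y1 = y2 = 0.0  # state: x[n-1], x[n-2], y[n-1], y[n-2]
        for n, x in enumerate(u):
            # y[n] = g(x[n] - x[n-2]) + c1*y[n-1] - a*y[n-2]
            y = g * (x - x2) + c1 * y1 - a * y2
            power[n] += y * y
            x2, x1 = x1, x
            y2, y1 = y1, y
    return power
```

For a perfectly period-3 input such as `"ATG" * 100`, the A, T and G indicators each contribute |X_b(2π/3)|² = 100², so `period3_dft_power` returns 3 × 100² = 30000 (up to rounding), while `"ACGT" * 75` has no energy at 2π/3. Turning such spectral or filter-output measures into selected features and training an SVM on them would be further steps that this sketch does not attempt to reproduce.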

Figures 1–17 (images not included in this preview)


Author information

Correspondence to Amin Khodaei.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Khodaei, A., Feizi-Derakhshi, MR. & Mozaffari-Tazehkand, B. A pattern recognition model to distinguish cancerous DNA sequences via signal processing methods. Soft Comput 24, 16315–16334 (2020). https://doi.org/10.1007/s00500-020-04942-4

  • DOI: https://doi.org/10.1007/s00500-020-04942-4
