Abstract
Gene identification, the computational task of locating protein-coding regions in a given DNA sequence, is one of the most extensively studied problems in bioinformatics. Conventional statistical methods for identifying protein-coding regions in DNA sequences are expensive and time-consuming. Over the past two decades, interest in digital signal processing (DSP) based solutions to this problem has grown. A class of infinite impulse response (IIR) anti-notch filters (ANF) has previously been used for this purpose. However, the prediction accuracy of these filters remains limited by their inherent nonlinear phase delay and the resulting distortion. This paper addresses the problem with a bidirectional filtering approach. The results support the idea: bidirectional filtering improves the prediction performance of the conventional ANF by 12–17%.
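The idea of bidirectional filtering can be illustrated with a minimal sketch. It is not the authors' implementation: the filter below is a generic second-order all-pass-based anti-notch section centred near the period-3 frequency 2π/3, and the function names, the pole radius `r = 0.9`, the zero initial state, and the impulse test signal are all assumptions made for illustration. The sequence is filtered once forward, then the output is reversed, filtered again, and reversed back, so the phase delays of the two passes cancel.

```python
import math

def anti_notch(x, r=0.9, w0=2 * math.pi / 3):
    """One causal pass of a second-order IIR anti-notch section.

    H(z) = (1 - r^2)/2 * (1 - z^-2) / (1 - 2 r cos(w0) z^-1 + r^2 z^-2)
    Hypothetical parameters: r is the pole radius, w0 the target frequency.
    The filter starts from a zero initial state.
    """
    a1 = -2.0 * r * math.cos(w0)
    a2 = r * r
    b0 = (1.0 - a2) / 2.0  # normalizes the peak gain to roughly one
    y = []
    for n, xn in enumerate(x):
        x2 = x[n - 2] if n >= 2 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(b0 * (xn - x2) - a1 * y1 - a2 * y2)
    return y

def bidirectional(x, r=0.9, w0=2 * math.pi / 3):
    """Forward pass, then a pass over the time-reversed output."""
    forward = anti_notch(x, r, w0)
    backward = anti_notch(forward[::-1], r, w0)
    return backward[::-1]

# Illustrative input: a unit impulse in the middle of the sequence.
center = 200
impulse = [0.0] * (2 * center + 1)
impulse[center] = 1.0

one_pass = anti_notch(impulse)     # response appears only after the impulse
two_pass = bidirectional(impulse)  # response is symmetric about the impulse
```

In the single-pass output the response is smeared to one side of the impulse, which is the phase delay the abstract refers to. In the two-pass output the response is centred on the impulse, at the cost of a squared magnitude response and of processing the whole sequence offline.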
Data availability
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Code availability
The code related to this work can be made available from the corresponding author on reasonable request.
Abbreviations
- TBP: Three base periodicity
- STDFT: Short time discrete Fourier transform
- ANF: Anti-notch filter
- CSANF: Conjugate suppression anti-notch filter
- HSANF: Harmonic suppression anti-notch filter
- MA: Moving average filter
- P-3: Period-3
- ROC: Receiver operating characteristics curve
- AUC: Area under the ROC curve
Funding
No funding was received for conducting this study.
Ethics declarations
Conflicts of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Cite this article
Singh, A.K., Srivastava, V.K. Bidirectional filtering approach for the improved protein coding region identification in eukaryotes. Netw Model Anal Health Inform Bioinforma 11, 13 (2022). https://doi.org/10.1007/s13721-022-00358-2
DOI: https://doi.org/10.1007/s13721-022-00358-2