Abstract
As knowledge of genetics and genome elements grows, so does the demand for bioinformatics tools to analyze these data. Riboswitches are genetic elements, usually located in the untranslated regions of mRNAs, that regulate gene expression. Their interaction with antibiotics has also been suggested recently, implying a role in antibiotic effects and resistance. Building on a previously published sequential block finding algorithm, we report here a new block location-based feature extraction (BLBFE) strategy, which uses the locations of family-specific sequential blocks on riboswitch sequences as features. For comparison, we also evaluated other feature extraction strategies: mono- and dinucleotide frequencies, k-mer, DAC, DCC, DACC, PC-PseDNC-General and SC-PseDNC-General. KNN, LDA, naïve Bayes, PNN and decision tree classifiers, each with V-fold cross-validation, were applied to every feature set, and their performance was compared in terms of accuracy, sensitivity, specificity and F-score. The proposed strategy classified riboswitches with an average correct classification rate (CCR) of 90.8%, an average accuracy of 96.1%, an average sensitivity of 90.8%, an average specificity of 97.52% and an average F-score of 90.69%. These results indicate that BLBFE can classify and discriminate riboswitch families with high CCR, accuracy, sensitivity, specificity and F-score values.
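The idea of using block locations as features, and the way the reported measures relate to each other, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the encoding (1-based start of the first occurrence of each block, 0 when the block is absent) and the one-vs-rest averaging of the measures are not taken from the paper, and the example blocks and labels are invented. Under this reading, the average accuracy counts true negatives for every family and can therefore exceed the CCR, while the average sensitivity equals the CCR when the families are equally represented.

```python
from typing import Dict, List, Sequence


def block_location_features(seq: str, blocks: Sequence[str]) -> List[int]:
    """Hypothetical encoding: 1-based start of the first occurrence of each
    block in the sequence, or 0 if the block does not occur."""
    # str.find returns -1 when the block is absent, so adding 1 gives 0.
    return [seq.find(block) + 1 for block in blocks]


def averaged_measures(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """CCR plus one-vs-rest accuracy, sensitivity, specificity and F-score,
    each averaged over the classes (an assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f_score = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        per_class.append(((tp + tn) / n, sens, spec, f_score))
    acc, sens, spec, f_score = (sum(col) / len(labels) for col in zip(*per_class))
    ccr = sum(t == p for t, p in pairs) / n
    return {"ccr": ccr, "accuracy": acc, "sensitivity": sens,
            "specificity": spec, "f_score": f_score}


if __name__ == "__main__":
    # Invented sequence and blocks, for illustration only.
    print(block_location_features("GGACUAAGGCU", ["ACUA", "GGCU", "UUUU"]))
    # Invented labels for three balanced families.
    truth = ["A", "A", "B", "B", "C", "C"]
    guess = ["A", "B", "B", "B", "C", "A"]
    print(averaged_measures(truth, guess))
```

The resulting feature vectors could then be passed to any of the classifiers named above; the classifier and cross-validation steps are left out of this sketch.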
Ethics declarations
Conflict of interest
All authors (5 authors) declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Golabi, F., Shamsi, M., Sedaaghi, M.H. et al. Development of a new oligonucleotide block location-based feature extraction (BLBFE) method for the classification of riboswitches. Mol Genet Genomics 295, 525–534 (2020). https://doi.org/10.1007/s00438-019-01642-z
DOI: https://doi.org/10.1007/s00438-019-01642-z