i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation

Hasan, Md. Mehedi; Manavalan, Balachandran; Shoombuatong, Watshara; Khatun, Mst. Shamima; Kurata, Hiroyuki

doi:10.1007/s11103-020-00988-y

i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation

Published: 05 March 2020

Volume 103, pages 225–234, (2020)
Cite this article

Plant Molecular Biology Aims and scope Submit manuscript

Md. Mehedi Hasan^1,2,
Balachandran Manavalan³,
Watshara Shoombuatong⁴,
Mst. Shamima Khatun¹ &
…
Hiroyuki Kurata ORCID: orcid.org/0000-0003-4254-2214^1,5

1291 Accesses
58 Citations
Explore all metrics

Abstract

DNA N⁶-methyladenine (6 mA) is one of the most vital epigenetic modifications and involved in controlling the various gene expression levels. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6 mA plays an essential role for understanding molecular mechanisms. Because the experimental approaches are time-consuming and costly, it is desirable to develop a computation model for rapidly and accurately identifying 6 mA. To the best of our knowledge, we first proposed a computational model named i6mA-Fuse to predict 6 mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented the five encoding schemes, i.e., mononucleotide binary, dinucleotide binary, k-space spectral nucleotide, k-mer, and electron–ion interaction pseudo potential compositions, to build the five, single-encoding random forest (RF) models. The i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five, single encoding-based RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performances with AUCs of 0.982 and 0.978 and with MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, the MBE and EIIP contributed to 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contribute to 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction for DNA 6 mA identification, the i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.

Key message

The existing prediction models are not suitable to identify 6mA in the Rosaceae genome because the existing algorithms are species-specific. Thus, a novel predictor is desired to be established to identify 6mA sites in the Rosaceae genome. To the best of our knowledge, we first propose a computation model named i6mA-Fuse (Identification of N6-MethylAdenine sites by Fusing multiple feature representation) to predict 6mA sites from the Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

DNA barcoding, an effective tool for species identification: a review

Article 29 October 2022

RNA-Seq Data Analysis in Galaxy

A survey of best practices for RNA-seq data analysis

Article Open access 26 January 2016

References

Basith S, Manavalan B, Shin TH, Lee G (2019) SDM6A: a web-based integrative machine-learning framework for predicting 6mA sites in the rice genome. Mol Ther Nucleic Acids 18:131–141. https://doi.org/10.1016/j.omtn.2019.08.011
Article CAS PubMed PubMed Central Google Scholar
Basith S, Manavalan B, Shin TH, Lee G (2020) Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening. Med Res Rev. https://doi.org/10.1002/med.21658
Article PubMed Google Scholar
Boopathi V, Subramaniyam S, Malik A, Lee G, Manavalan B, Yang DC (2019) mACPpred: a support vector machine-based meta-predictor for identification of anticancer peptides. Int J Mol Sci. https://doi.org/10.3390/ijms20081964
Article PubMed PubMed Central Google Scholar
Charoenkwan P, Shoombuatong W, Lee HC, Chaijaruwanich J, Huang HL, Ho SY (2013) SCMCRYS: predicting protein crystallization using an ensemble scoring card method with estimating propensity scores of P-collocated amino acid pairs. PLoS ONE 8:e72368. https://doi.org/10.1371/journal.pone.0072368
Article CAS PubMed PubMed Central Google Scholar
Chen W, Lin H, Chou KC (2015) Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol BioSyst 11:2620–2634. https://doi.org/10.1039/c5mb00155b
Article CAS PubMed Google Scholar
Chen W, Lv H, Nie F, Lin H (2019a) i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics 35:2796–2800. https://doi.org/10.1093/bioinformatics/btz015
Article CAS PubMed Google Scholar
Chen W, Tang H, Ye J, Lin H, Chou KC (2016) iRNA-PseU: identifying RNA pseudouridine sites. Mol Ther Nucleic Acids 5:e332. https://doi.org/10.1038/mtna.2016.37
Article CAS PubMed PubMed Central Google Scholar
Chen Z et al (2019b) Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief Bioinform. https://doi.org/10.1093/bib/bbz112
Article PubMed Google Scholar
Chou KC (2011) Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 273:236–247. https://doi.org/10.1016/j.jtbi.2010.12.024
Article CAS PubMed Google Scholar
Chou KC (2019) Advance in predicting subcellular localization of multi-label proteins and its implication for developing multi-target drugs. Curr Med Chem. https://doi.org/10.2174/0929867326666190507082559
Article PubMed Google Scholar
Ding H, Yang W, Tang H, Feng PM, Huang J, Chen W, Lin H (2016) PHYPred: a tool for identifying bacteriophage enzymes and hydrolases. Virol Sin 31:350–352. https://doi.org/10.1007/s12250-016-3740-6
Article PubMed Google Scholar
Du K et al (2019) Epigenetically modified N(6)-methyladenine inhibits DNA replication by human DNA polymerase eta. DNA Repair 78:81–90. https://doi.org/10.1016/j.dnarep.2019.03.015
Article CAS PubMed Google Scholar
Feng P, Yang H, Ding H, Lin H, Chen W, Chou KC (2019) iDNA6mA-PseKNC: identifying DNA N(6)-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics 111:96–102. https://doi.org/10.1016/j.ygeno.2018.01.005
Article CAS PubMed Google Scholar
Frank E, Hall M, Trigg L, Holmes G, Witten IH (2004) Data mining in bioinformatics using Weka. Bioinformatics 20:2479–2481. https://doi.org/10.1093/bioinformatics/bth261
Article CAS PubMed Google Scholar
Fu L, Niu B, Zhu Z, Wu S, Li W (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152. https://doi.org/10.1093/bioinformatics/bts565
Article CAS PubMed PubMed Central Google Scholar
Hasan MM, Khatun MS, Kurata H (2019a) Large-scale assessment of bioinformatics tools for lysine succinylation sites. Cells. https://doi.org/10.3390/cells8020095
Article PubMed PubMed Central Google Scholar
Hasan MM, Manavalan B, Khatun MS, Kurata H (2019b) i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome. Int J Biol Macromol. https://doi.org/10.1016/j.ijbiomac.2019.12.009
Article PubMed Google Scholar
Hasan MM, Manavalan B, Khatun MS, Kurata H (2019c) Prediction of S-nitrosylation sites by integrating support vector machines and random forest. Mol Omics 15:451–458. https://doi.org/10.1039/c9mo00098d
Article CAS PubMed Google Scholar
Hasan MM, Rashid MM, Khatun MS, Kurata H (2019d) Computational identification of microbial phosphorylation sites by the enhanced characteristics of sequence information. Sci Rep 9:8258. https://doi.org/10.1038/s41598-019-44548-x
Article CAS PubMed PubMed Central Google Scholar
Huang Q, Zhang J, Wei L, Guo F, Zou Q (2020) 6mA-RicePred: a method for identifying DNA N(6)-methyladenine sites in the rice genome based on feature fusion. Front Plant Sci 11:4. https://doi.org/10.3389/fpls.2020.00004
Article PubMed PubMed Central Google Scholar
Jia C, Yang Q, Zou Q (2018) NucPosPred: predicting species-specific genomic nucleosome positioning via four different modes of general PseKNC. J Theor Biol 450:15–21. https://doi.org/10.1016/j.jtbi.2018.04.025
Article CAS PubMed Google Scholar
Khatun MS, Hasan MM, Kurata H (2019a) PreAIP: computational prediction of anti-inflammatory peptides by integrating multiple complementary features. Front Genet 10:129. https://doi.org/10.3389/fgene.2019.00129
Article CAS PubMed PubMed Central Google Scholar
Khatun S, Hasan M, Kurata H (2019b) Efficient computational model for identification of antitubercular peptides by integrating amino acid patterns and properties. FEBS Lett 593:3029–3039. https://doi.org/10.1002/1873-3468.13536
Article CAS PubMed Google Scholar
Li F et al (2019) DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz721
Article PubMed PubMed Central Google Scholar
Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2:18–22
Google Scholar
Liu B, Liu F, Wang X, Chen J, Fang L, Chou KC (2015) Pse-in-One: a web server for generating various modes of pseudo components of DNA RNA, and protein sequences. Nucleic Acids Res 43:W65–71. https://doi.org/10.1093/nar/gkv458
Article CAS PubMed PubMed Central Google Scholar
Liu B, Fang L, Long R, Lan X, Chou KC (2016) iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics 32:362–369. https://doi.org/10.1093/bioinformatics/btv604
Article CAS PubMed Google Scholar
Liu B, Li K, Huang DS, Chou KC (2018a) iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach. Bioinformatics 34:3835–3842. https://doi.org/10.1093/bioinformatics/bty458
Article CAS PubMed Google Scholar
Liu X, Lai W, Zhang N, Wang H (2018b) Predominance of N(6)-methyladenine-specific DNA fragments enriched by multiple immunoprecipitation. Anal Chem 90:5546–5551. https://doi.org/10.1021/acs.analchem.8b01087
Article CAS PubMed Google Scholar
Liu ZY et al (2019) MDR: an integrative DNA N6-methyladenine and N4-methylcytosine modification database for Rosaceae. Hortic Res 6:78. https://doi.org/10.1038/s41438-019-0160-4
Article CAS PubMed PubMed Central Google Scholar
Lv H et al (2019a) iDNA6mA-Rice: a computational tool for detecting N6-methyladenine sites in rice. Front Genet 10:793. https://doi.org/10.3389/fgene.2019.00793
Article CAS PubMed PubMed Central Google Scholar
Lv Z, Jin S, Ding H, Zou Q (2019b) A random forest sub-Golgi protein classifier optimized via dipeptide and amino acid composition features. Front Bioeng Biotechnol 7:215. https://doi.org/10.3389/fbioe.2019.00215
Article PubMed PubMed Central Google Scholar
Manavalan B, Shin TH, Lee G (2018) DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest. Oncotarget 9:1944–1956. https://doi.org/10.18632/oncotarget.23099
Article PubMed Google Scholar
Manavalan B, Shin TH, Kim MO, Lee G (2018) AIPpred: sequence-based prediction of anti-inflammatory peptides using random forest. Front Pharmacol 9:276. https://doi.org/10.3389/fphar.2018.00276
Article CAS PubMed PubMed Central Google Scholar
Manavalan B, Basith S, Shin TH, Wei L, Lee G (2018a) mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics. https://doi.org/10.1093/bioinformatics/bty1047
Article Google Scholar
Manavalan B, Basith S, Shin TH, Lee DY, Wei L, Lee G (2019a) 4mCpred-EL: an ensemble learning framework for identification of DNA N(4)-methylcytosine sites in the mouse. Genome Cells. https://doi.org/10.3390/cells8111332
Article PubMed Google Scholar
Manavalan B, Basith S, Shin TH, Wei L, Lee G (2019b) AtbPpred: a robust sequence-based prediction of anti-tubercular peptides using extremely randomized trees. Comput Struct Biotechnol J 17:972–981. https://doi.org/10.1016/j.csbj.2019.06.024
Article CAS PubMed PubMed Central Google Scholar
Manavalan B, Basith S, Shin TH, Wei L, Lee G (2019c) mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics 35:2757–2765. https://doi.org/10.1093/bioinformatics/bty1047
Article CAS PubMed Google Scholar
Manavalan B, Basith S, Shin TH, Wei L, Lee G (2019d) Meta-4mCpred: a sequence-based meta-predictor for accurate DNA 4mC site prediction using effective feature representation. Mol Ther Nucleic Acids 16:733–744. https://doi.org/10.1016/j.omtn.2019.04.019
Article CAS PubMed PubMed Central Google Scholar
McIntyre ABR, Alexander N, Grigorev K, Bezdan D, Sichtig H, Chiu CY, Mason CE (2019) Single-molecule sequencing detection of N6-methyladenine in microbial reference materials. Nat Commun 10:579. https://doi.org/10.1038/s41467-019-08289-9
Article CAS PubMed PubMed Central Google Scholar
O'Brown ZK, Greer EL (2016) N6-methyladenine: a conserved and dynamic DNA mark. Adv Exp Med Biol 945:213–246. https://doi.org/10.1007/978-3-319-43624-1_10
Article CAS PubMed PubMed Central Google Scholar
Qianfei Huang F, Zhang Z, Wei L, Guo F, Zou Q (2020) 6mA-RicePred: a method for identifying DNA N6-methyladenine sites in the rice genome based on feature fusion. Front Plant Sci. https://doi.org/10.3389/fpls.2020.00004
Article PubMed PubMed Central Google Scholar
Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W (2019) Meta-iAVP: a sequence-based meta-predictor for improving the prediction of antiviral peptides using effective feature representation. Int J Mol Sci. https://doi.org/10.3390/ijms20225743
Article PubMed PubMed Central Google Scholar
Shoombuatong W, Schaduangrat N, Pratiwi R, Nantasenamat C (2019) THPep: a machine learning-based approach for predicting tumor homing peptides. Comput Biol Chem 80:441–451. https://doi.org/10.1016/j.compbiolchem.2019.05.008
Article CAS PubMed Google Scholar
Su R, Hu J, Zou Q, Manavalan B, Wei L (2019) Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform. https://doi.org/10.1093/bib/bby124
Article PubMed Google Scholar
Sun S, Wang C, Ding H, Zou Q (2020) Machine learning and its applications in plant molecular studies. Brief Funct Genom 19:40–48. https://doi.org/10.1093/bfgp/elz036
Article Google Scholar
Vacic V, Iakoucheva LM, Radivojac P (2006) Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments. Bioinformatics 22:1536–1537. https://doi.org/10.1093/bioinformatics/btl151
Article CAS PubMed Google Scholar
Wang X, Yan R (2018) RFAthM6A: a new tool for predicting m(6)A sites in Arabidopsis thaliana. Plant Mol Biol 96:327–337. https://doi.org/10.1007/s11103-018-0698-9
Article CAS PubMed Google Scholar
Wei L, Su R, Luan S, Liao Z, Manavalan B, Zou Q, Shi X (2019) Iterative feature representations improve N4-methylcytosine site prediction. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz408
Article PubMed PubMed Central Google Scholar
Win TS, Malik AA, Prachayasittikul V, Wikberg SJE, Nantasenamat C, Shoombuatong W (2017) HemoPred: a web server for predicting the hemolytic activity of peptides. Future Med Chem 9:275–291. https://doi.org/10.4155/fmc-2016-0188
Article CAS PubMed Google Scholar
Xiong J, Ye TT, Ma CJ, Cheng QY, Yuan BF, Feng YQ (2019) N 6-Hydroxymethyladenine: a hydroxylation derivative of N6-methyladenine in genomic DNA of mammals. Nucleic Acids Res 47:1268–1277. https://doi.org/10.1093/nar/gky1218
Article CAS PubMed Google Scholar
Xu ZC, Feng PM, Yang H, Qiu WR, Chen W, Lin H (2019) iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz358
Article PubMed PubMed Central Google Scholar
Yang W, Zhu XJ, Huang J, Ding H, Lin H (2019) A brief survey of machine learning methods in protein sub-Golgi localization. Curr Bioinform 14:234–240
Article CAS Google Scholar
Yang H, Yang W, Dao FY, Lv H, Ding H, Chen W, Lin H (2019) A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae. Brief Bioinform. https://doi.org/10.1093/bib/bbz123
Article PubMed Google Scholar
Yu H, Dai Z (2019) SNNRice6mA: a deep learning method for predicting DNA N6-methyladenine sites in rice genome. Front Genet 10:1071. https://doi.org/10.3389/fgene.2019.01071
Article PubMed PubMed Central Google Scholar
Zhang G et al (2015) N6-methyladenine DNA modification in Drosophila. Cell 161:893–906. https://doi.org/10.1016/j.cell.2015.04.018
Article CAS PubMed Google Scholar
Zhang Q et al (2018) N(6)-methyladenine DNA methylation in Japonica and Indica rice genomes and its association with gene expression, plant development, and stress responses. Mol Plant 11:1492–1508. https://doi.org/10.1016/j.molp.2018.11.005
Article CAS PubMed Google Scholar
Zhang Y et al (2019) PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz629
Article PubMed PubMed Central Google Scholar
Zhou Y, Zeng P, Li YH, Zhang Z, Cui Q (2016) SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res 44:e91. https://doi.org/10.1093/nar/gkw104
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work is supported by the Grant-in-Aid for JSPS Research Fellow (19F19377) from Japan Society for the Promotion of Science (JSPS), partially supported from Japan Society for the Promotion of Science by Grant-in-Aid for Scientific Research (B) (19H04208) and by the developing key technologies for discovering and manufacturing pharmaceuticals used for next-generation treatments and diagnoses both from the Ministry of Economy, Trade and Industry, Japan (METI) and from Japan Agency for Medical Research and Development (AMED).

Author information

Authors and Affiliations

Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
Md. Mehedi Hasan, Mst. Shamima Khatun & Hiroyuki Kurata
Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
Md. Mehedi Hasan
Department of Physiology, Ajou University School of Medicine, Suwon, 443380, Korea
Balachandran Manavalan
Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
Watshara Shoombuatong
Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
Hiroyuki Kurata

Authors

Md. Mehedi Hasan
View author publications
You can also search for this author in PubMed Google Scholar
Balachandran Manavalan
View author publications
You can also search for this author in PubMed Google Scholar
Watshara Shoombuatong
View author publications
You can also search for this author in PubMed Google Scholar
Mst. Shamima Khatun
View author publications
You can also search for this author in PubMed Google Scholar
Hiroyuki Kurata
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

MMH and HK conceived the project. MMH and KMS collected and analyzed the datasets. MMH drafted the manuscript. HK, MMH, MB, SW and KMS thoroughly revised the manuscript. All authors approved and read the final manuscript.

Corresponding author

Correspondence to Hiroyuki Kurata.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 164 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hasan, M.M., Manavalan, B., Shoombuatong, W. et al. i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation. Plant Mol Biol 103, 225–234 (2020). https://doi.org/10.1007/s11103-020-00988-y

Download citation

Received: 06 February 2020
Accepted: 29 February 2020
Published: 05 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11103-020-00988-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation

Abstract

Key message

Access this article

Similar content being viewed by others

DNA barcoding, an effective tool for species identification: a review

RNA-Seq Data Analysis in Galaxy

A survey of best practices for RNA-seq data analysis

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Additional information

Publisher's Note

Electronic supplementary material

Supplementary file1 (DOCX 164 kb)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation

Abstract

Key message

Access this article

Similar content being viewed by others

DNA barcoding, an effective tool for species identification: a review

RNA-Seq Data Analysis in Galaxy

A survey of best practices for RNA-seq data analysis

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Additional information

Publisher's Note

Electronic supplementary material

Supplementary file1 (DOCX 164 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation