DNase I hypersensitive sites (DHSs) are hallmarks of chromatin zones containing transcriptional regulatory elements, making them critical in understanding the regulatory mechanisms of gene expression. Although large amounts of DHSs in the plant genome have been identified by high-throughput techniques, current DHSs obtained from experimental methods cover only a fraction of plant species and cell processes. Furthermore, these experimental methods are both time-consuming and expensive. Hence, it is urgent to develop automated computational means to efficiently and accurately predict DHSs in the plant genome. Recently, several methods have been proposed to predict the DHSs. However, all these methods took a lot of time to build the model, making them inappropriate for data with massive volume. In the present work, a new ensemble extreme learning machine (ELM)-based model called pDHS-ELM was proposed to predict the DHSs in the plant genome by fusing two different modes of pseudo-nucleotide composition. Here, two kinds of features including reverse complement kmer and pseudo-nucleotide composition were used to represent the DHSs. The ELM model was used to build the base classifiers. Then, an ensemble framework was employed to combine the outputs of these base classifiers. When applied to DHSs in Arabidopsis thaliana and rice (Oryza sativa) genome, the proposed method could obtain accuracies up to 88.48 and 87.58%, respectively. Compared with the state-of-the-art techniques, pDHS-ELM achieved higher sensitivity, specificity, and Matthew’s correlation coefficient with much less training and test time. By employing pDHS-ELM, we identified 42,370 and 103,979 DHSs in A. thaliana and rice genome, respectively. The predicted DHSs were depleted of bulk nucleosomes and were tightly associated with transcription factors. Approximately 90% of the predicted DHSs were overlapped with transcription factors. Meanwhile, we demonstrated that the predicted DHSs were also associated with DNA methylation, nucleosome positioning/occupancy, and histone modification. This result suggests that pDHS-ELM can be considered as a new promising and powerful tool for transcriptional regulatory elements analysis. Our pDHS-ELM tool is available from the following website https://github.com/shanxinzhang/pDHS-ELM/.
DNase I hypersensitive sites Extreme learning machine Plant Prediction Ensemble
This is a preview of subscription content, log in to check access.
This work was supported by the Fundamental Research Funds for the Central Universities (No. JUSRP115A27).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Huang GB, Zhu QY et al (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501CrossRefGoogle Scholar
Huang GB, Wang DH et al (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2(2):107–122CrossRefGoogle Scholar
Huang GB, Zhou H et al (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B 42(2):513–529CrossRefGoogle Scholar
Jia J, Liu Z et al (2016) pSuc-Lys: predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 394:223–230CrossRefPubMedGoogle Scholar
Jiang J (2015) The ‘dark matter’ in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin. Curr Opin Plant Biol 24:17–23CrossRefPubMedGoogle Scholar
Jin C, Zang C et al (2009) H3.3/H2A.Z double variant-containing nucleosomes mark ‘nucleosome-free regions’ of active promoters and other regulatory regions. Nat Genet 41(8):941–945CrossRefPubMedPubMedCentralGoogle Scholar
Jin W, Tang Q et al (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528(7580):142–146PubMedPubMedCentralGoogle Scholar
Kabir M, Yu D-J (2017) Predicting DNase I hypersensitive sites via un-biased pseudo trinucleotide composition. Chemom Intell Lab Syst 167:78–84CrossRefGoogle Scholar
Lan Y, Soh YC et al (2009) Ensemble of online sequential extreme learning machine. Neurocomputing 72(13):3391–3395CrossRefGoogle Scholar
Liu N, Wang H (2010) Ensemble based extreme learning machine. IEEE Signal Process Lett 17(8):754–757CrossRefGoogle Scholar
Liu B, Liu F et al (2015a) Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res 43(W1):W65–W71CrossRefPubMedPubMedCentralGoogle Scholar
Liu G, Xing Y et al (2015b) Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 382:15–22CrossRefPubMedGoogle Scholar
Liu B, Long R et al (2016) iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics 32(16):2411–2418CrossRefPubMedGoogle Scholar
Zhang W, Zhang T et al (2012b) Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell 24(7):2719–2731CrossRefPubMedPubMedCentralGoogle Scholar