Soft Computing, Volume 23, Issue 19, pp 9175–9188

Nucleosome positioning based on generalized relative entropy

  • Mengye Lu
  • Shuai Liu

Abstract

Nucleosome positioning plays a significant role in various biological processes. With the development of high-throughput techniques, many methods and software tools have been developed for nucleosome positioning. Although these achieve high accuracy (Acc), identifying the key factors that determine nucleosome positioning at lower time complexity remains unresolved. Therefore, a novel method that combines generalized relative entropy with the self-similarity of DNA sequences is proposed for predicting nucleosome positioning in the human, worm, fly and yeast genomes. Experimental results show that the prediction Acc on these datasets reached 87.78%, 87.98%, 83.36% and 100%, respectively. Furthermore, five-nucleotide and six-nucleotide sequences were found to be the determinant factors in nucleosome positioning.
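
The abstract does not define generalized relative entropy, so the following is only a minimal sketch of the general idea of comparing k-nucleotide frequency profiles. It substitutes the ordinary (Kullback–Leibler) relative entropy for the paper's generalized form, and the function names, the pseudocount smoothing and the choice of `k` are assumptions made for illustration rather than the authors' method.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_distribution(seq, k, pseudocount=1.0):
    """Smoothed frequency distribution over all 4**k DNA k-mers.

    The pseudocount is an illustrative assumption; it keeps every
    probability positive so the logarithm below is always defined.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing symbols other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def relative_entropy(p, q):
    """Ordinary Kullback-Leibler relative entropy D(p || q) in bits.

    This stands in for the generalized relative entropy named in the
    abstract, whose exact form is not given there.
    """
    return sum(p[m] * log2(p[m] / q[m]) for m in p)


if __name__ == "__main__":
    # Toy sequences, not real nucleosome or linker data.
    query = kmer_distribution("ACGTACGTAAGGCTTAGC", k=2)
    reference = kmer_distribution("TTTTTTAAAAAATTTTAA", k=2)
    print(relative_entropy(query, reference))
```

In a setting like the one the abstract describes, `k` would be 5 or 6, and the divergences of a query sequence from nucleosome-forming and linker reference profiles could serve as features for a classifier such as a random forest or a support vector machine.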

Keywords

Nucleosome positioning; Generalized relative entropy; Random forest; Support vector machines

Notes

Acknowledgements

This research was funded by the National Natural Science Foundation of China (Grant No. 61502254), the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (Grant No. NJYT-18-B10), and the Open Funds of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education (Grant No. 93K172018K07).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. College of Computer Science, Inner Mongolia University, Hohhot, China
  2. Inner Mongolia Key Laboratory of Social Computing and Data Processing, Inner Mongolia University, Hohhot, China
