Abstract
A growing number of reported results on nucleosome positioning and histone modifications have shown that DNA structure plays a well-established role in splicing. In this study, we introduced a set of DNA geometric flexibility parameters derived from molecular dynamics (MD) simulations to examine the structural organization around splice sites at the DNA level. The resulting flexibility/stiffness profiles around splice sites indicated that the physical-geometric deformation of DNA can serve as an alternative way to describe the splice junction region. Using structural flexibility as a discriminatory parameter, we developed a hybrid computational model for predicting potential splice sites, which achieved better prediction performance when evaluated on a benchmark dataset. Our results showed that the mechanical deformability of a splice junction is closely correlated with both splice site strength and the structural information in its flanking sequences.
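The abstract describes building flexibility/stiffness profiles around splice sites from DNA geometric flexibility parameters. The following is a minimal sketch of that kind of profile, assuming that each overlapping dinucleotide step is assigned one flexibility value, that per-step values are smoothed with a sliding window, and that profiles are averaged over equal-length sequences aligned at the splice junction. The table `TOY_STEP_VALUES`, the window size, and all function names are hypothetical placeholders; they are not the MD-derived parameters, settings, or code used in the study.

```python
from itertools import product

# Hypothetical placeholder values (1.0 .. 2.5 in ACGT order), NOT the MD-derived
# flexibility parameters used in the study. Substitute a real dinucleotide table.
TOY_STEP_VALUES = {
    a + b: 1.0 + 0.1 * i for i, (a, b) in enumerate(product("ACGT", repeat=2))
}


def step_profile(seq, step_values):
    """Return one flexibility value per overlapping dinucleotide step.

    Raises KeyError if the sequence contains a symbol outside A/C/G/T.
    """
    seq = seq.upper()
    return [step_values[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smoothed_profile(seq, step_values, window=3):
    """Sliding-window mean of the per-step values (window measured in steps)."""
    steps = step_profile(seq, step_values)
    if window < 1 or window > len(steps):
        raise ValueError("window must be between 1 and the number of steps")
    return [sum(steps[i:i + window]) / window
            for i in range(len(steps) - window + 1)]


def average_profile(seqs, step_values, window=3):
    """Position-wise mean profile over sequences aligned at the splice site."""
    profiles = [smoothed_profile(s, step_values, window) for s in seqs]
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all sequences must have the same length")
    return [sum(column) / len(column) for column in zip(*profiles)]


if __name__ == "__main__":
    # Two toy donor-site windows aligned on the intronic GT dinucleotide.
    donors = ["CAGGTAAGT", "AAGGTGAGT"]
    print(average_profile(donors, TOY_STEP_VALUES, window=3))
```

Averaging position by position only makes sense if every sequence is aligned at the same junction coordinate, which is why `average_profile` rejects unequal lengths rather than padding them.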
Abbreviations
- MD: Molecular dynamics
- snRNP: Small nuclear ribonucleoproteins
- PCWM: Position-correlation weight matrix
- SVM: Support vector machine
- SFs: Score functions
Acknowledgments
We would like to thank Dr. Xin Wang for his critical reading and editing of this manuscript. This work was supported by the High-level Scientific Research Foundation Award for introduction of talent, Inner Mongolia University (No. 115115); the Specialized Research Fund for the Doctoral Program of Higher Education (20131501120009); and the Natural Science Foundation of Inner Mongolia Autonomous Region (No. 2013MS0503).
Additional information
Responsible Editor: Tatsuo Fukagawa.
Electronic supplementary material
ESM 1 (DOC, 1152 kb)
About this article
Cite this article
Zuo, Y., Zhang, P., Liu, L. et al. Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome. Chromosome Res 22, 321–334 (2014). https://doi.org/10.1007/s10577-014-9414-z