Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome

Chromosome Research

Abstract

A growing number of reported results on nucleosome positioning and histone modifications show that DNA structure plays a well-established role in splicing. In this study, a set of DNA geometric flexibility parameters derived from molecular dynamics (MD) simulations was introduced to examine the structural organization around splice sites at the DNA level. The resulting flexibility/stiffness profiles around splice sites indicated that DNA physical-geometric deformation can be used as an alternative way to describe the splice junction region. Using structural flexibility as a discriminatory parameter, we developed a hybrid computational model for predicting potential splice sites, and it achieved better prediction performance when evaluated on the benchmark dataset. Our results showed that the mechanical deformability of a splice junction is closely correlated with both the splice site strength and the structural information in its flanking sequences.
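As a rough illustration of what a flexibility profile around a splice site might look like computationally, the sketch below averages a per-dinucleotide-step parameter over a sliding window. It is only a minimal sketch under stated assumptions: the function name `flexibility_profile`, the window size, and the lookup table passed in are hypothetical, and no MD-derived parameter values from the study are reproduced here.

```python
from typing import Dict, List


def flexibility_profile(seq: str, table: Dict[str, float], window: int = 3) -> List[float]:
    """Sliding-window mean of a dinucleotide-step parameter along a DNA sequence.

    `table` maps each dinucleotide step (e.g. "AC") to a flexibility or
    stiffness value. The caller supplies it; this sketch ships no real values.
    """
    seq = seq.upper()
    # One value per overlapping dinucleotide step: positions (0,1), (1,2), ...
    steps = [table[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if window < 1 or window > len(steps):
        raise ValueError("window must be between 1 and the number of steps")
    # Average each run of `window` consecutive steps.
    return [
        sum(steps[i:i + window]) / window
        for i in range(len(steps) - window + 1)
    ]


# Usage with a toy table (placeholder numbers, not published parameters):
toy_table = {"AC": 1.0, "CG": 2.0, "GT": 3.0}
profile = flexibility_profile("ACGT", toy_table, window=2)
```

A sequence of length n has n - 1 dinucleotide steps, so the profile has n - window entries. Aligning many such profiles on the splice junction and averaging them position by position would give a mean profile of the kind the abstract describes.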

Abbreviations

MD: Molecular dynamics

snRNP: Small nuclear ribonucleoproteins

PCWM: Position-correlation weight matrix

SVM: Support vector machine

SFs: Score functions

Acknowledgments

We would like to thank Dr. Xin Wang for his critical reading and editing of this manuscript. This work was supported by the High-level Scientific Research Foundation Award for introduction of talent, Inner Mongolia University (No. 115115); the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20131501120009); and the Natural Science Foundation of Inner Mongolia Autonomous Region (No. 2013MS0503).

Author information

Corresponding authors

Correspondence to Yongchun Zuo or Qianzhong Li.

Additional information

Responsible Editor: Tatsuo Fukagawa.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (DOC 1152 kb)

About this article

Cite this article

Zuo, Y., Zhang, P., Liu, L. et al. Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome. Chromosome Res 22, 321–334 (2014). https://doi.org/10.1007/s10577-014-9414-z

  • DOI: https://doi.org/10.1007/s10577-014-9414-z
