
An Information Integration Approach for Classifying Coding and Non-Coding Genomic Data

  • Conference paper
The Proceedings of the Second International Conference on Communications, Signal Processing, and Systems

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 246))

Abstract

Reliable methods for classifying coding and non-coding transcripts in large-scale genomic data will help researchers annotate novel RNA transcripts. In this manuscript we explored some of the distinguishing properties of these two classes of transcripts, such as features of their secondary structures, differential expression scores obtained from typical RNA-seq experiments, and G+C content scores. We trained two classification methods, Conditional Random Forest (CRF) and Support Vector Machines (SVMs), on features extracted from the genomic data, and applied the trained models to a test set composed of the two classes of transcripts drawn from three well-known annotation sources. The results revealed important characteristics of the extracted features with respect to the classification problem. A comparative analysis shows that our method outperforms two existing state-of-the-art methods, CPC (Coding Potential Calculator) and PORTRAIT, in classifying transcripts from the test dataset.
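As an illustration of the simplest feature named in the abstract, the G+C content of a transcript can be computed as the fraction of G and C bases among its unambiguous nucleotides. The sketch below is only an assumption about how such a score might be derived; the function name `gc_content` and the handling of ambiguous bases are hypothetical and are not taken from the paper.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    seq = sequence.upper()
    # Count only unambiguous bases (DNA or RNA alphabet) so that N or gap
    # characters do not skew the ratio. This choice is an assumption.
    valid = sum(seq.count(base) for base in "ACGTU")
    if valid == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / valid


if __name__ == "__main__":
    # Two of the four bases are G or C.
    print(gc_content("ATGC"))
```

A score of this kind would be one column in a feature table alongside the structural and expression features before any classifier is trained.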

Author information

Corresponding authors

Correspondence to Ashis Kumer Biswas or Jean X. Gao.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Biswas, A.K., Zhang, B., Wu, X., Gao, J.X. (2014). An Information Integration Approach for Classifying Coding and Non-Coding Genomic Data. In: Zhang, B., Mu, J., Wang, W., Liang, Q., Pi, Y. (eds) The Proceedings of the Second International Conference on Communications, Signal Processing, and Systems. Lecture Notes in Electrical Engineering, vol 246. Springer, Cham. https://doi.org/10.1007/978-3-319-00536-2_125

  • DOI: https://doi.org/10.1007/978-3-319-00536-2_125

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-00535-5

  • Online ISBN: 978-3-319-00536-2

  • eBook Packages: Engineering, Engineering (R0)
