Random Forest in Splice Site Prediction of Human Genome

  • Conference paper
XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016

Part of the book series: IFMBE Proceedings (IFMBE, volume 57)

Abstract

With the rapid growth in the volume of available DNA sequence data, gene identification has become an important task in bioinformatics. Detecting genes requires accurate prediction of splice sites, i.e. exon-intron boundaries. Moreover, in biology, where structures such as splice sites are described by a large number of features, feature selection is an important step toward classification: it provides useful biological knowledge and allows faster and better classification. Feature selection techniques can be divided into two groups: feature ranking and feature-subset selection. This paper investigates the performance of combining a support vector machine (SVM) with two different feature-ranking methods, F-score and Random Forest feature ranking, and compares them in splice site detection for the human genome. A new classification method based on Random Forest for splice site prediction is also presented.
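To make the feature-ranking idea concrete, the following is a minimal sketch of F-score ranking over one-hot encoded sequence windows. The abstract does not state the encoding or the exact formula used, so both are assumptions here: the encoding is a plain four-indicator one-hot scheme, and the F-score follows the form commonly attributed to Chen and Lin (2006). The function names and the toy sequences are hypothetical, and the sketch covers only the ranking step, not the SVM or Random Forest classifiers.

```python
from statistics import mean, variance

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four indicators per position."""
    return [1.0 if base == n else 0.0 for base in seq.upper() for n in NUCLEOTIDES]


def f_scores(pos, neg):
    """Return one F-score per feature column.

    Assumed definition: squared distance of each class mean from the overall
    mean, divided by the sum of the two within-class sample variances.
    """
    scores = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        overall = mean(p + n)
        between = (mean(p) - overall) ** 2 + (mean(n) - overall) ** 2
        within = variance(p) + variance(n)  # sample variances (n - 1 denominator)
        if within > 0:
            scores.append(between / within)
        else:
            # Choice made for this sketch: a feature that is constant within
            # both classes is either perfectly separating or uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(scores):
    """Feature indices ordered from most to least discriminative."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Made-up toy windows: "true" sites carry GT at positions 2-3, decoys do not always.
true_sites = [one_hot(s) for s in ("AGGTAA", "CGGTGA", "TGGTAC", "AAGTCG")]
decoys = [one_hot(s) for s in ("AGCTAA", "CCGAGA", "TGATAC", "ATGCCG")]

ranking = rank_features(f_scores(true_sites, decoys))
top_features = ranking[:5]  # indices that would be passed on to a classifier
print(top_features)
```

Each feature index `j` maps back to window position `j // 4` and nucleotide `NUCLEOTIDES[j % 4]`. In a pipeline of the kind the abstract describes, only the top-ranked columns would be kept as classifier input; how many to keep is a tuning decision that this sketch does not address.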

The original version of this chapter was inadvertently published with an incorrect chapter pagination (512–517) and DOI (10.1007/978-3-319-32703-7_99). The page range and the DOI have been reassigned: the correct page range is 518–523 and the correct DOI is 10.1007/978-3-319-32703-7_100. The erratum to this chapter is available at DOI: 10.1007/978-3-319-32703-7_260.

Author information

Corresponding author

Correspondence to Elham Pashaei.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pashaei, E., Ozen, M., Aydin, N. (2016). Random Forest in Splice Site Prediction of Human Genome. In: Kyriacou, E., Christofides, S., Pattichis, C. (eds) XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016. IFMBE Proceedings, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-32703-7_100

  • DOI: https://doi.org/10.1007/978-3-319-32703-7_100

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32701-3

  • Online ISBN: 978-3-319-32703-7

  • eBook Packages: Engineering, Engineering (R0)
