Abstract
With the rapid growth in the volume of available DNA sequence data, gene identification has become an important task in bioinformatics. Detecting genes requires accurate prediction of splice sites, i.e., exon-intron boundaries. Moreover, in biological problems such as splice site prediction, where structures are described by a large number of features, feature selection is an important step toward classification: it provides useful biological knowledge and allows for faster and better classification. Feature selection techniques can be divided into two groups: feature ranking and feature-subset selection. This paper compares the performance of a support vector machine (SVM) combined with two different feature ranking methods, namely F-score and Random Forest feature ranking, in splice site detection for the human genome. A new classification method based on Random Forest for splice site prediction is also presented.
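To make the idea of feature ranking concrete, the following is a minimal sketch of F-score ranking for a two-class problem. It assumes the commonly used F-score definition (squared distance of each class mean from the overall mean, divided by the sum of the per-class sample variances); the abstract does not state which variant the authors use. The function names `f_scores` and `rank_features`, the toy feature matrix, and the labels are all hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column.

    X is a list of equal-length numeric rows; y holds binary labels
    (1 = true splice site, 0 = decoy). This labelling is an assumption.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        # Numerator: how far each class mean lies from the overall mean.
        between = ((mean(col_pos) - mean(col_all)) ** 2
                   + (mean(col_neg) - mean(col_all)) ** 2)
        # Denominator: sample variances (n - 1 denominator) within each class.
        within = variance(col_pos) + variance(col_neg)
        # A feature with no within-class spread gets score 0 here by convention.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def rank_features(X, y):
    """Return feature indices sorted from most to least discriminative."""
    scores = f_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 separates the classes well,
# feature 1 is mostly noise, feature 2 is constant.
X = [
    [1.0, 0.0, 1.0],
    [0.9, 1.0, 1.0],
    [1.1, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.1, 0.0, 1.0],
    [-0.1, 1.0, 1.0],
]
y = [1, 1, 1, 0, 0, 0]

scores = f_scores(X, y)
ranking = rank_features(X, y)
print(scores, ranking)
```

In a pipeline of the kind the abstract describes, the top-ranked features would then be passed to a classifier such as an SVM. A Random Forest ranking would replace `f_scores` with importances taken from a trained forest, which this sketch does not attempt.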
The original version of this chapter was inadvertently published with an incorrect page range (512–517) and DOI (10.1007/978-3-319-32703-7_99). The page range and the DOI have been reassigned. The correct page range is 518–523 and the correct DOI is 10.1007/978-3-319-32703-7_100. The erratum to this chapter is available at DOI: 10.1007/978-3-319-32703-7_260
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Pashaei, E., Ozen, M., Aydin, N. (2016). Random Forest in Splice Site Prediction of Human Genome. In: Kyriacou, E., Christofides, S., Pattichis, C. (eds) XIV Mediterranean Conference on Medical and Biological Engineering and Computing 2016. IFMBE Proceedings, vol 57. Springer, Cham. https://doi.org/10.1007/978-3-319-32703-7_100
DOI: https://doi.org/10.1007/978-3-319-32703-7_100
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32701-3
Online ISBN: 978-3-319-32703-7
eBook Packages: Engineering, Engineering (R0)