Using Over-Represented Tetrapeptides to Predict Protein Submitochondria Locations
- 354 Downloads
The mitochondrion is a key organelle of eukaryotic cell that provides the energy for cellular activities. Correctly identifying submitochondria locations of proteins can provide plentiful information for understanding their functions. However, using web-experimental methods to recognize submitochondria locations of proteins are time-consuming and costly. Thus, it is highly desired to develop a bioinformatics method to predict the submitochondria locations of mitochondrion proteins. In this work, a novel method based on support vector machine was developed to predict the submitochondria locations of mitochondrion proteins by using over-represented tetrapeptides selected by using binomial distribution. A reliable and rigorous benchmark dataset including 495 mitochondrion proteins with sequence identity ≤25 % was constructed for testing and evaluating the proposed model. Jackknife cross-validated results showed that the 91.1 % of the 495 mitochondrion proteins can be correctly predicted. Subsequently, our model was estimated by three existing benchmark datasets. The overall accuracies are 94.0, 94.7 and 93.4 %, respectively, suggesting that the proposed model is potentially useful in the realm of mitochondrion proteome research. Based on this model, we built a predictor called TetraMito which is freely available at http://lin.uestc.edu.cn/server/TetraMito.
KeywordsSubmitochondria location Tetrapeptide Binomial distribution Support vector machine
We are grateful to Dr. Loris Nanni for his help. This work was supported by the National Nature Scientific Foundation of China (No. 61202256, 61100092), the Project of Education Department in Sichuan (12ZA112), the Fundamental Research Funds for the Central Universities (ZYGX2012J113) and the Scientific Research Startup Foundation of UESTC.
- Fan GL, Li QZ (2012) Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition. Amino Acids 43:545–555Google Scholar
- Fan RE, Chen PH, Lin CJ (2005) Working set selection using the second order information for training SVM. J Mach Learn Res 6:1889–1918Google Scholar