Predicting the Subcellular Localization of Proteins with Multiple Sites Based on Multiple Features Fusion
Protein subcellular localization prediction is an important task in bioinformatics. It provides important clues for studying protein function and for targeted drug discovery. Traditional experimental techniques for determining protein subcellular locations are mostly costly and time-consuming. Over the last two decades, a great many machine learning algorithms and protein subcellular location predictors have been developed to address this problem. However, most of these algorithms can handle only single-location proteins. As techniques progress, more and more proteins with two or more subcellular locations are being found, and studying such proteins is particularly significant because they have extremely useful implications for both basic biological research and drug discovery. Improving prediction accuracy requires extracting more feature information. In this paper, we fuse several feature extraction methods to extract feature information simultaneously and use the multi-label k nearest neighbors (ML-KNN) algorithm to predict protein subcellular locations. The best overall accuracy we obtained is 66.1568% on dataset S1, used in constructing Gpos-mPLoc, and 59.9206% on dataset S2, used in constructing Virus-mPLoc.
Keywords: N-terminal signals; pseudo amino acid composition; physicochemical properties; amino acid index distribution; multi-label k nearest neighbor
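
The prediction scheme summarized in the abstract (fused features fed to ML-KNN) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that fusion means column-wise concatenation of precomputed feature blocks, that neighbors are found by Euclidean distance, and that the standard ML-KNN maximum a posteriori rule with smoothing parameter s is used. The names fuse_features, MLKNN, k and s, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_features(*blocks):
    """Fuse feature blocks by column-wise concatenation (assumed fusion rule)."""
    return np.hstack([np.asarray(b, dtype=float) for b in blocks])


def _neighbor_label_counts(X_query, X_train, Y_train, k, exclude_self=False):
    # Euclidean distance between every query row and every training row.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    if exclude_self:
        np.fill_diagonal(d, np.inf)  # a training point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    # counts[i, l] = number of the k neighbors of query i that carry label l
    return Y_train[idx].sum(axis=1)


class MLKNN:
    """Minimal ML-KNN sketch: per-label MAP decision from neighbor label counts."""

    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=int)
        m, q = Y.shape
        k, s = self.k, self.s
        self.X, self.Y = X, Y
        # Smoothed prior probability that a protein carries each label.
        self.prior1 = (s + Y.sum(axis=0)) / (2 * s + m)
        counts = _neighbor_label_counts(X, X, Y, k, exclude_self=True)
        c1 = np.zeros((q, k + 1))  # label present, j neighbors share it
        c0 = np.zeros((q, k + 1))  # label absent, j neighbors carry it
        for l in range(q):
            for j in range(k + 1):
                c1[l, j] = np.sum((Y[:, l] == 1) & (counts[:, l] == j))
                c0[l, j] = np.sum((Y[:, l] == 0) & (counts[:, l] == j))
        # Smoothed likelihoods of observing j labeled neighbors.
        self.like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
        self.like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        counts = _neighbor_label_counts(X, self.X, self.Y, self.k)
        labels = np.arange(self.Y.shape[1])
        p1 = self.prior1 * self.like1[labels, counts]
        p0 = (1 - self.prior1) * self.like0[labels, counts]
        return (p1 > p0).astype(int)


# Toy data: two hypothetical one-column feature blocks and two location labels.
block_a = [[0], [0], [1], [1], [10], [10], [11], [11], [5], [5], [6], [6]]
block_b = [[0], [1], [0], [1], [10], [11], [10], [11], [20], [21], [20], [21]]
X_train = fuse_features(block_a, block_b)
Y_train = np.array([[1, 0]] * 4 + [[0, 1]] * 4 + [[1, 1]] * 4)

model = MLKNN(k=3, s=1.0).fit(X_train, Y_train)
X_test = fuse_features([[0.5], [10.5], [5.5]], [[0.5], [10.5], [20.5]])
predictions = model.predict(X_test)
```

Each row of predictions is a 0/1 vector over the location labels, so a protein with several sites simply has more than one entry set to 1. In practice the fused blocks would hold descriptors such as those listed in the keywords, and blocks with different scales would probably need normalization before concatenation.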