A Support Vector Machine based Framework for Protein Membership Prediction
The support vector machine (SVM) is a key algorithm for learning from biological data, including tasks such as protein membership prediction. Predicting structural information for a protein from its sequence alone is possible, but the extreme complexity of the data demands purpose-built kernels such as the state-of-the-art profile kernel, which exploits a very large feature space. Such a large representation, together with the enormous databases used in proteomics, increases processing time, which must be reduced to an acceptable level. Given the current computing paradigm, implementations of such systems should take advantage of parallelization and concurrency. A machine learning architecture that combines binary SVM models with a neural network (NN) is proposed to handle the large multiclass problem of protein superfamily prediction. It is parallelized through a multi-agent strategy built on JADE (Java Agent DEvelopment Framework) to reduce the total processing time needed to obtain a prediction for a new query protein. The efficiency of the algorithm and the advantages of the parallelization are shown.
Keywords: Protein, SVM, NN, Parallelization, JADE
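
The architecture described in the abstract (independent binary SVM models evaluated concurrently, with a second stage that combines their outputs into one multiclass prediction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the model names, weights, and feature vector are hypothetical; a linear decision function stands in for the profile kernel; a plain softmax stands in for the trained neural network; and a Python thread pool stands in for the JADE agents.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical one-vs-rest models: (weights, bias) per superfamily.
# The paper uses SVMs with a profile kernel; a linear score
# f(x) = w . x + b keeps this sketch self-contained.
BINARY_MODELS = {
    "superfamily_A": ([1.0, -0.5], 0.0),
    "superfamily_B": ([-1.0, 0.5], 0.0),
    "superfamily_C": ([0.2, 0.2], -1.0),
}


def svm_score(model, features):
    """Decision value of one binary model for the query features."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias


def softmax(scores):
    """Stand-in for the NN stage: turn decision values into class weights."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, max_workers=3):
    """Score every binary model concurrently, then combine the outputs."""
    names = list(BINARY_MODELS)
    # Each worker plays the role of one agent evaluating one binary model.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(
            pool.map(lambda n: svm_score(BINARY_MODELS[n], features), names)
        )
    probs = softmax(scores)
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best], dict(zip(names, probs))


if __name__ == "__main__":
    label, probabilities = predict([2.0, 0.0])
    print(label, probabilities)
```

Because the binary models do not depend on one another, scoring them is the step that parallelizes naturally; only the combining stage has to wait for all of the scores.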