Prediction and analysis of cell-penetrating peptides using pseudo-amino acid composition and random forest models
Cell-penetrating peptides are short peptides that can traverse cell membranes and thereby facilitate the cellular uptake of various molecular cargoes, which gives them the potential to become powerful drug delivery systems. Correctly identifying peptides as cell-penetrating or non-cell-penetrating would accelerate this application. In this study, we determined which features are important for distinguishing cell-penetrating from non-cell-penetrating peptides and built a predictive model based on the key features extracted from this analysis. The investigated peptides were retrieved from a previous study, and each was encoded as a numeric vector by the pseudo-amino acid composition method according to six properties of amino acids: amino acid frequency, codon diversity, electrostatic charge, molecular volume, polarity, and secondary structure. The minimum redundancy maximum relevance and incremental feature selection methods were then employed to analyze these features, and some were found to be key determinants of cell penetration. In parallel, an optimal random forest prediction model was built. We hope that our findings will provide new resources for the study of cell-penetrating peptides.
Keywords: Cell-penetrating peptide; Pseudo-amino acid composition; Minimum redundancy maximum relevance; Incremental feature selection; Random forest
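To make the encoding step described in the abstract more concrete, the following is a minimal sketch of a classic type-1 pseudo-amino acid composition vector: 20 normalized amino acid frequencies followed by a few sequence-order correlation terms. It is an illustration under stated assumptions, not the study's implementation. The abstract does not specify the exact variant or parameters, so the function name `pseaac`, the parameters `lam` and `weight`, and their defaults are hypothetical. The two property tables are stand-ins as well: Kyte-Doolittle hydropathy in place of polarity, and a crude formal charge in place of electrostatic charge, rather than the property values used by the authors.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Stand-in property 1: Kyte-Doolittle hydropathy (NOT the study's polarity values).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Stand-in property 2: crude formal charge (an illustrative simplification).
CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def standardize(table):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    mu = mean(table.values())
    sd = pstdev(table.values())
    return {aa: (value - mu) / sd for aa, value in table.items()}


PROPERTIES = [standardize(HYDROPATHY), standardize(CHARGE)]


def pseaac(seq, lam=2, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    lam    -- number of sequence-order correlation tiers (hypothetical default)
    weight -- weight of the correlation terms relative to the frequencies
    """
    seq = seq.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    n = len(seq)
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]

    # Tier-k correlation: mean squared property difference between residues
    # that are k positions apart, averaged over all properties.
    thetas = []
    for k in range(1, lam + 1):
        pair_terms = [
            mean((prop[seq[i]] - prop[seq[i + k]]) ** 2 for prop in PROPERTIES)
            for i in range(n - k)
        ]
        thetas.append(mean(pair_terms))

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Example: encode the TAT (48-60) peptide, a well-known cell-penetrating peptide.
tat_vector = pseaac("GRKKRRQRRRPPQ")
```

Vectors of this kind, one per peptide, would then form the feature matrix that a feature-ranking step and a random forest classifier operate on. Because the components are normalized by a common denominator, each vector sums to 1 regardless of peptide length.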
This study was supported by the National Basic Research Program of China (2011CB510101, 2011CB510102), the National Natural Science Foundation of China (61202021, 31371335, 61373028), the Innovation Program of Shanghai Municipal Education Commission (12YZ120, 12ZZ087), and the Shanghai Educational Development Foundation (12CG55).
Conflict of interest
The authors declare that they have no conflict of interest.