CUP Classification Based on a Tree Structure with MiRNA Feature Selection
Previous research on identifying the tissue of origin of cancers from miRNAs has achieved low sensitivity. To address this, we adopt a decision tree structure to build a new SVM-based model for classifying a variety of cancers of unknown primary origin (CUP). We use the information gain based feature selection method provided by Weka to select miRNAs, and combine them with previously recognized features to determine the most useful miRNAs. We then design a layer-by-layer classification tree based on the expression levels of the selected miRNAs. At each node of the tree, classification is performed by a polynomial kernel SVM classifier, which is well suited to binary classification problems. In our experiments, the overall sensitivity on the test set reached 87%, and the sensitivity of identifying the metastatic samples in the test set increased by 9%. Ten-fold cross-validation of the model shows that the sensitivity on the test set is not lower than the sensitivity on the training set, indicating that the model generalizes well. Because the feature selection step is general-purpose, the approach is also adaptable to other areas.
Keywords: miRNAs, SVM, CUP, feature selection, sensitivity
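The information gain based ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's pipeline or Weka's implementation: the median split used to discretize expression values, the function names, the miRNA names, and the toy values are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of one feature after a median split.

    The median split is an assumption of this sketch; other
    discretization schemes would give different gains.
    """
    cut = median(values)
    low = [y for x, y in zip(values, labels) if x <= cut]
    high = [y for x, y in zip(values, labels) if x > cut]
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - remainder


def rank_features(expression, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in expression.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Toy data: hypothetical miRNA names and made-up expression values.
expression = {
    "miR-A": [0.1, 0.2, 0.9, 1.0],  # cleanly separates the two classes
    "miR-B": [0.5, 0.9, 0.4, 1.0],  # mixes the classes on both sides of the split
}
labels = ["liver", "liver", "lung", "lung"]
ranking = rank_features(expression, labels)
```

In a setup like the one described, the top-ranked features would then be passed to the binary classifier at each tree node; that step is not shown here.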