CUP Classification Based on a Tree Structure with MiRNA Feature Selection

  • Xiaoxue Zhang
  • Dunwei Wen
  • Ke Wang
  • Yinan Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8265)


Because previous miRNA-based methods identify the tissue of origin of cancers with low sensitivity, we adopt a decision tree structure to build a new SVM-based model for classifying Cancer of Unknown Primary Origin (CUP). We use an information-gain-based feature selection method provided by Weka to select miRNAs, and combine them with previously recognized features to determine the most useful miRNAs. We then design a layer-by-layer classification tree based on the expression levels of the selected miRNAs, and use a polynomial-kernel SVM classifier, which is more effective for binary classification problems, at each node of the tree. In our experiments, the overall sensitivity on the test set reached 87%, and the sensitivity for identifying metastatic samples in the test set increased significantly, by 9%. Ten-fold cross-validation shows that the sensitivity on the test set is not lower than that on the training set, indicating that the model generalizes well. In addition, the use of a general feature selection method makes the approach adaptable to other areas.
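The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the miRNA names, tissue labels, expression values, and cut points are hypothetical; each feature is split at its median as a simplifying assumption (the discretization used with Weka is not stated here); and the simple threshold rules at the tree nodes only stand in for trained polynomial-kernel SVMs.

```python
import math
import statistics
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Entropy reduction obtained by splitting the samples at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def rank_features(expression: Dict[str, List[float]], labels: Sequence[str]) -> List[str]:
    """Rank features by information gain, highest first (median split is an assumption)."""
    def gain(name: str) -> float:
        values = expression[name]
        return information_gain(values, labels, statistics.median(values))
    return sorted(expression, key=gain, reverse=True)


@dataclass
class Node:
    """Internal tree node: a binary classifier sends a sample left (False) or right (True)."""
    decide: Callable[[Dict[str, float]], bool]
    left: Union["Node", str]   # subtree or leaf label
    right: Union["Node", str]  # subtree or leaf label


def classify(tree: Union[Node, str], sample: Dict[str, float]) -> str:
    """Walk the tree layer by layer until a leaf (a tissue label) is reached."""
    while isinstance(tree, Node):
        tree = tree.right if tree.decide(sample) else tree.left
    return tree


def sensitivity(true: Sequence[str], predicted: Sequence[str], cls: str) -> float:
    """Per-class sensitivity: TP / (TP + FN) for class `cls`."""
    tp = sum(t == cls and p == cls for t, p in zip(true, predicted))
    fn = sum(t == cls and p != cls for t, p in zip(true, predicted))
    return tp / (tp + fn)


# Hypothetical toy data: two miRNAs measured on four samples.
expression = {"miR-A": [9.1, 8.7, 1.2, 0.9], "miR-B": [5.0, 1.0, 5.2, 1.1]}
labels = ["liver", "liver", "lung", "lung"]
ranking = rank_features(expression, labels)

# Hypothetical two-layer tree; the lambdas stand in for trained binary SVMs.
tree = Node(
    decide=lambda s: s["miR-A"] > 5.0,
    left=Node(decide=lambda s: s["miR-B"] > 3.0, left="breast", right="lung"),
    right="liver",
)

print(ranking)
print(classify(tree, {"miR-A": 1.0, "miR-B": 5.0}))
```

In this toy data miR-A separates the two classes perfectly and miR-B not at all, so miR-A ranks first. In a fuller version each `decide` callable would wrap a binary classifier trained on the selected miRNAs for that node.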


Keywords: miRNAs, SVM, CUP, feature selection, sensitivity





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Xiaoxue Zhang (1)
  • Dunwei Wen (2)
  • Ke Wang (1)
  • Yinan Yang (1)
  1. College of Communication Engineering, Jilin University, China
  2. School of Computing and Information Systems, Athabasca University, Canada
