Discovery of Knowledge Patterns in Lymphographic Clinical Data through Data Mining Methods and Techniques

  • Shomona Gracia Jacob
  • R. Geetha Ramani
  • P. Nancy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 178)

Abstract

Data mining is the process of extracting knowledge by discovering new patterns in large datasets. Classification is a data mining task that generalizes a known structure so that it can be applied to new data. Medical investigation, including disease prediction and malady categorization, is a dominant area of present-day research. In this paper, our focus is to design an efficient classifier that is trained to classify oncogenic data. The Lymphography dataset is used to train the classifier by means of feature selection and classification algorithms. Feature selection is a supervised method that attempts to select a subset of the predictor features based on information gain. The Lymphography dataset comprises 18 predictor attributes and 148 instances, with a class label that takes four distinct values. This paper reports the performance of sixteen classification algorithms on the Lymphography dataset for multi-class categorization of medical data. Furthermore, our work examines the performance of four feature selection algorithms and their impact on classification accuracy. Our results show that the Random Tree algorithm and Quinlan’s C4.5 algorithm give 100 percent classification accuracy, both with all the predictor features and with the feature subset selected by the Fisher Filtering feature selection algorithm. Moreover, the ReliefF feature selection algorithm improves the accuracy of the Radial Basis Function classifier by 1.35%. The C4.5 algorithm also offers more efficient classification, since the decision tree it generates is smaller than the one generated by the Random Tree algorithm.
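As an illustration of ranking predictor features by information gain, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_features`) and the toy data are hypothetical and chosen only for this example, and the features are assumed to be categorical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting
    on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return feature indices sorted by decreasing information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[i] for r in rows], labels)
             for i in range(n_features)]
    return sorted(range(n_features), key=lambda i: gains[i], reverse=True)


# Toy data (hypothetical, not taken from the Lymphography dataset):
# feature 0 determines the class, feature 1 carries no information.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]
ranking = rank_features(rows, labels)
```

A feature subset could then be formed by keeping the top-ranked indices; how many to keep is a design choice that this sketch leaves open.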

Keywords

Data Mining, Lymphography, Feature Selection, Classification, Machine Learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Shomona Gracia Jacob (1)
  • R. Geetha Ramani (1)
  • P. Nancy (1)
  1. Department of Computer Science and Engineering, Rajalakshmi Engineering College, Chennai, India
