Attribute Clustering and Dimensionality Reduction Based on In/Out Degree of Attributes in Dependency Graph

  • Asit Kumar Das
  • Jaya Sil
  • Santanu Phadikar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7076)

Abstract

Mining useful information from huge datasets requires appropriate tools and techniques for organizing and evaluating the data. However, the ultra-high dimensionality of such data poses serious challenges for data mining research. The method proposed in this paper introduces a new strategy for dimensionality reduction: attribute clustering based on the dependency graph of the attributes. Information gain, an established measure of uncertainty that quantifies the information contained in a system, is calculated for each attribute and expresses the dependency relationships between the attributes in the graph. The underlying principles select an optimum set of attributes, called a reduct, that can classify the dataset as well as the full attribute set can. The rate of dimension reduction on datasets from the UCI repository is measured and compared with existing methods, and the classification accuracy on the reduced dataset is calculated with various classifiers to measure the effectiveness of the method.
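The abstract does not give the construction of the dependency graph, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes discrete attributes, computes information gain as entropy minus conditional entropy, and draws a directed edge from attribute a to attribute b when a removes at least a chosen fraction of b's uncertainty. The edge rule, the `threshold` parameter, the function names, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given) for two equal-length sequences of discrete values."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    # Weighted average of the target's entropy within each group of `given`.
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())


def information_gain(target, given):
    """Reduction in uncertainty about `target` once `given` is known."""
    return entropy(target) - conditional_entropy(target, given)


def dependency_degrees(columns, threshold):
    """Build a directed dependency graph and count in/out degrees.

    Assumed (hypothetical) edge rule: a -> b when
    information_gain(b, a) / entropy(b) >= threshold.
    Constant attributes (zero entropy) receive no incoming edges.
    """
    edges = set()
    for a, b in permutations(columns, 2):
        h_b = entropy(columns[b])
        if h_b > 0 and information_gain(columns[b], columns[a]) / h_b >= threshold:
            edges.add((a, b))
    in_deg = {name: 0 for name in columns}
    out_deg = {name: 0 for name in columns}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    return edges, in_deg, out_deg


# Toy data (made up): b is fully determined by a; c is independent of both.
columns = {
    "a": [0, 0, 1, 1, 2, 2, 3, 3],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges, in_deg, out_deg = dependency_degrees(columns, threshold=0.9)
```

Because raw information gain between two attributes is symmetric, the sketch normalizes by the entropy of the target attribute to obtain a direction. How the paper itself orients edges, and how it uses the resulting in/out degrees to cluster attributes and pick a reduct, cannot be determined from the abstract alone.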

Keywords

Feature Selection, Dimensionality Reduction, Information Gain, Dependency Graph, Conditional Entropy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Asit Kumar Das (1)
  • Jaya Sil (1)
  • Santanu Phadikar (2)
  1. Department of Computer Science and Technology, Bengal Engineering and Science University, Howrah, India
  2. Department of Computer Science and Engineering, West Bengal University of Technology, Kolkata, India
