Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled)

  • Conference paper
Proceedings of the Third International Conference on Soft Computing for Problem Solving

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 258)

Abstract

Data mining uncovers hidden, previously unknown, and potentially useful information in large amounts of data. Compared with traditional statistical and machine learning techniques for data analysis, data mining emphasizes providing a convenient and complete environment for the analysis, and it has become a popular technology for analyzing complex data. Clustering is one of its core techniques. It is highly desirable to perform cluster analysis on large data sets that mix numeric, categorical, ordinal, binary, nominal, and ratio-scaled attributes. However, most available clustering algorithms are effective for numeric data rather than for mixed data sets. This paper therefore presents a new clustering algorithm for such mixed data sets, obtained by modifying a common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Our comparison and analysis indicate that the GA-based clustering algorithm is feasible for high-dimensional data sets with mixed data values of the kind obtained in real life.
Core Idea of Our Paper: We describe a technique for estimating the cost function metrics from databases with mixed numeric, categorical, and other attribute types by using an uncertain grade-of-membership clustering model together with the efficiency of the genetic algorithm. This technique can be applied to the problem of opportunity analysis for business decision-making. The general approach could be adapted to many other applications in which a decision agent needs to assess the value of items from a set of opportunities with respect to a reference set representing its business.
Numeric attributes are processed directly instead of being generalized. A prototype was developed for experiments with synthetic and real data sets, and its results were compared with those of the traditional approaches. The results confirmed the feasibility of the framework and the superiority of the extended techniques.
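
The abstract does not give the modified cost function or the GA operators, so the following is only a minimal sketch under stated assumptions. It assumes a k-prototypes-style cost: squared Euclidean distance to the cluster mean for numeric attributes, plus a weight `gamma` times the number of mismatches to the cluster mode for categorical attributes. The GA encodes one cluster label per object and uses tournament selection, one-point crossover, random-reset mutation, and elitism. The names `mixed_cost`, `ga_cluster`, `gamma`, and the toy data are hypothetical and not taken from the paper. Ordinal, binary, and ratio-scaled attributes would first have to be mapped to one of the two attribute kinds.

```python
import random


def mixed_cost(numeric, categorical, labels, k, gamma=1.0):
    """Assumed within-cluster cost for mixed data (not the paper's exact form).

    Numeric part: squared Euclidean distance to the cluster mean.
    Categorical part: gamma * number of mismatches to the cluster mode.
    """
    n_num, n_cat = len(numeric[0]), len(categorical[0])
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing
        means = [sum(numeric[i][j] for i in members) / len(members)
                 for j in range(n_num)]
        modes = []
        for j in range(n_cat):
            values = [categorical[i][j] for i in members]
            modes.append(max(set(values), key=values.count))
        for i in members:
            total += sum((numeric[i][j] - means[j]) ** 2 for j in range(n_num))
            total += gamma * sum(categorical[i][j] != modes[j]
                                 for j in range(n_cat))
    return total


def ga_cluster(numeric, categorical, k, pop_size=30, generations=60,
               mutation_rate=0.05, gamma=1.0, seed=0):
    """Minimize mixed_cost with a simple label-encoded genetic algorithm."""
    rng = random.Random(seed)
    n = len(numeric)

    def cost(labels):
        return mixed_cost(numeric, categorical, labels, k, gamma)

    # Each chromosome assigns one cluster label to every object.
    population = [[rng.randrange(k) for _ in range(n)]
                  for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: always keep the best chromosome
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign a gene to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop
        best = min(population, key=cost)
    return best, cost(best)


# Toy data (hypothetical): one numeric and one categorical attribute.
toy_numeric = [[0.0], [2.0], [10.0], [12.0]]
toy_categorical = [["a"], ["a"], ["b"], ["b"]]

if __name__ == "__main__":
    labels, value = ga_cluster(toy_numeric, toy_categorical, k=2)
    print(labels, value)
```

Because of elitism, the best cost in the population never increases from one generation to the next. The weight `gamma` controls how strongly categorical mismatches count relative to numeric distances, and a suitable value depends on the scale of the numeric attributes.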

Acknowledgments

The authors would like to thank the reviewers for their valuable suggestions. They would also like to thank rev. HOD-CSE, Prof. Dr. R. Radhakrishnan, for his involvement and valuable suggestions on soft computing in the early stage of this paper. We also thank our friends, family, and seniors for their motivation and encouragement. Last but not least, we thank Almighty God, without whose grace this paper would not have succeeded.

Author information

Corresponding author

Correspondence to Rohit Rastogi.

Copyright information

© 2014 Springer India

About this paper

Cite this paper

Rastogi, R., Agarwal, S., Sharma, P., Kaul, U., Jain, S. (2014). Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled). In: Pant, M., Deep, K., Nagar, A., Bansal, J. (eds) Proceedings of the Third International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 258. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1771-8_11

  • DOI: https://doi.org/10.1007/978-81-322-1771-8_11

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1770-1

  • Online ISBN: 978-81-322-1771-8

  • eBook Packages: Engineering, Engineering (R0)
