Abstract
Data mining uncovers hidden, previously unknown, and potentially useful information in large amounts of data. Compared with traditional statistical and machine learning techniques for data analysis, data mining emphasizes providing a convenient and complete environment for the analysis. It has become a popular technology for analyzing complex data, and clustering is one of its core techniques. In data mining and data clustering, it is highly desirable to perform cluster analysis on large data sets with mixed numeric, categorical, ordinal, ratio-scaled, binary, and nominal values. However, most available clustering algorithms are effective for numeric data rather than for mixed data sets. This paper therefore presents a new clustering algorithm for such mixed data sets, obtained by modifying the common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Comparison and analysis indicate that the GA-based clustering algorithm is feasible for high-dimensional data sets with mixed values of the kind found in real-life data. Core idea of our paper: we describe a technique for estimating the cost function metrics from databases of mixed numeric, categorical, and other types by using an uncertain grade-of-membership clustering model together with the efficiency of a genetic algorithm. This technique can be applied to the problem of opportunity analysis for business decision-making. The general approach could be adapted to many other applications in which a decision agent needs to assess the value of items from a set of opportunities with respect to a reference set representing its business.
Numeric attributes are processed directly rather than generalized. A prototype may be developed for experiments on synthetic and real data sets and for comparison with the results of traditional approaches. The results confirmed the feasibility of the framework and the superiority of the extended techniques.
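The abstract does not state the modified cost function, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a k-prototypes-style cost: the squared Euclidean distance to the cluster mean on numeric attributes (the trace of the within-cluster dispersion matrix) plus a weight `gamma` times the number of mismatches with the cluster mode on categorical attributes. The names `mixed_cost` and `ga_cluster`, the label-per-object chromosome encoding, and all GA parameters are hypothetical choices made for this example.

```python
import random


def mixed_cost(numeric, categorical, labels, k, gamma=1.0):
    """Assumed mixed-data cost: numeric within-cluster scatter plus
    gamma * categorical mismatches against each cluster's mode."""
    n_num = len(numeric[0])
    n_cat = len(categorical[0])
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        # Cluster centre: mean for numeric, mode for categorical attributes.
        means = [sum(numeric[i][j] for i in members) / len(members)
                 for j in range(n_num)]
        modes = []
        for j in range(n_cat):
            values = [categorical[i][j] for i in members]
            modes.append(max(sorted(set(values)), key=values.count))
        for i in members:
            total += sum((numeric[i][j] - means[j]) ** 2 for j in range(n_num))
            total += gamma * sum(categorical[i][j] != modes[j]
                                 for j in range(n_cat))
    return total


def ga_cluster(numeric, categorical, k, pop_size=30, generations=60,
               mutation_rate=0.05, gamma=1.0, seed=0):
    """Minimise mixed_cost with a simple GA (hypothetical settings).
    A chromosome is a list holding one cluster label per object."""
    rng = random.Random(seed)
    n = len(numeric)

    def cost(labels):
        return mixed_cost(numeric, categorical, labels, k, gamma)

    population = [[rng.randrange(k) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: reassign a gene to a random cluster label.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    best = min(population, key=cost)
    return best, cost(best)


# Toy usage: two numeric attributes and one categorical attribute per object.
numeric_data = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                [8.0, 8.2], [7.9, 8.1], [8.3, 7.7]]
categorical_data = [["red"], ["red"], ["red"],
                    ["blue"], ["blue"], ["blue"]]
best_labels, best_cost = ga_cluster(numeric_data, categorical_data, k=2)
print(best_labels, best_cost)
```

Here `gamma` balances the numeric and categorical contributions and would need tuning against the scale of the numeric attributes. A label-per-object encoding is simple, but it lets one-point crossover mix parents whose cluster labels are permuted, so a centre-based encoding may well converge faster.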
Acknowledgments
The authors would like to thank the reviewers for their valuable suggestions. They would also like to thank rev. HOD-CSE, Prof. Dr. R. Radhakrishnan, for his involvement and valuable suggestions on soft computing in the early stage of this paper. We would like to thank our friends, family, and seniors for their motivation and encouragement. Last but definitely not least, we thank almighty God, without whose grace this paper would not have succeeded.
Copyright information
© 2014 Springer India
About this paper
Cite this paper
Rastogi, R., Agarwal, S., Sharma, P., Kaul, U., Jain, S. (2014). Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled). In: Pant, M., Deep, K., Nagar, A., Bansal, J. (eds) Proceedings of the Third International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 258. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1771-8_11
DOI: https://doi.org/10.1007/978-81-322-1771-8_11
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1770-1
Online ISBN: 978-81-322-1771-8
eBook Packages: Engineering, Engineering (R0)