Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled)

Rastogi, Rohit; Agarwal, Saumya; Sharma, Palak; Kaul, Uarvarshi; Jain, Shilpi

doi:10.1007/978-81-322-1771-8_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 258)

Abstract

Data mining discloses hidden, previously unknown, and potentially useful information from large amounts of data. Compared with traditional statistical and machine learning techniques for data analysis, data mining emphasizes providing a convenient and complete environment for data analysis, and it has become a popular technology for analyzing complex data. Clustering is one of the core data mining techniques. In data mining and data clustering, it is highly desirable to perform cluster analysis on large data sets with mixed numeric, categorical, ordinal, ratio-scaled, binary, and nominal values. However, most existing clustering algorithms are effective for numeric data rather than for mixed data sets. To address this, this paper presents a new clustering algorithm for such mixed data sets by modifying a common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Comparison and analysis indicate that the GA-based clustering algorithm is feasible for high-dimensional, real-life data sets with mixed data values.

Core Idea of Our Paper: In this paper, we describe a technique for estimating the cost function metrics from databases with mixed numeric, categorical, and other attribute types, using an uncertain grade-of-membership clustering model together with the efficiency of a genetic algorithm. This technique can be applied to opportunity analysis for business decision-making. The general approach could be adapted to many other applications in which a decision agent must assess the value of items from a set of opportunities with respect to a reference set representing its business.
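The abstract does not spell out the modified cost function. As a rough illustration only, the sketch below assumes a k-prototypes-style cost (after Huang). This cost has two parts:

- The trace of the within-cluster dispersion matrix over the numeric attributes. This equals the sum of squared distances from each point to its cluster mean.
- A hypothetical weight `gamma` times the number of categorical mismatches against each cluster's mode.

A simple GA then searches over cluster-label assignments, using binary tournament selection, one-point crossover, random-reassignment mutation, and elitism.

The names `mixed_cost` and `genetic_clustering`, the parameter defaults, and the label encoding are illustrative assumptions, not the authors' method. Under the same assumptions, ordinal attributes could be rank-normalized and appended to the numeric part, and binary attributes treated as categorical.

```python
import random
from collections import Counter


def mixed_cost(labels, num_rows, cat_rows, k, gamma=1.0):
    """Assumed k-prototypes-style cost: trace of the within-cluster dispersion
    matrix over numeric attributes plus gamma * categorical mismatches to modes."""
    cost = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing
        # Numeric part: trace(W_c) = sum of squared distances to the cluster mean.
        dims = len(num_rows[0])
        mean = [sum(num_rows[i][d] for i in members) / len(members) for d in range(dims)]
        cost += sum((num_rows[i][d] - mean[d]) ** 2 for i in members for d in range(dims))
        # Categorical part: members that disagree with the per-attribute mode.
        for a in range(len(cat_rows[0])):
            mode_count = Counter(cat_rows[i][a] for i in members).most_common(1)[0][1]
            cost += gamma * (len(members) - mode_count)
    return cost


def genetic_clustering(num_rows, cat_rows, k, gamma=1.0, pop_size=30,
                       generations=200, mutation_rate=0.05, seed=0):
    """Hypothetical GA: each chromosome is a list of cluster labels, one per record."""
    rng = random.Random(seed)
    n = len(num_rows)

    def cost(ind):
        return mixed_cost(ind, num_rows, cat_rows, k, gamma)

    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best individual forward unchanged
        while len(new_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(pop, 2), key=cost)
            p2 = min(rng.sample(pop, 2), key=cost)
            # One-point crossover on the label strings.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            # Mutation: reassign each gene to a random cluster with small probability.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    # Two well-separated synthetic groups with two numeric and two categorical attributes.
    num = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [8.0, 9.0], [8.2, 8.8], [7.9, 9.1]]
    cat = [["red", "yes"], ["red", "yes"], ["red", "no"],
           ["blue", "no"], ["blue", "no"], ["blue", "no"]]
    labels, best = genetic_clustering(num, cat, k=2)
    print(labels, round(best, 4))
```

The label-based encoding makes crossover trivial but suffers from label switching: the same partition has several encodings. Encoding cluster centers instead is a common alternative.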
To process numeric attributes directly instead of generalizing them, a prototype was developed and tested in experiments on synthetic and real data sets, and its results were compared with those of traditional approaches. The results confirmed the feasibility of the framework and the superiority of the extended techniques.
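One way to use numeric attributes directly, without generalizing (binning) them, is to compare each attribute on its own scale. The sketch below swaps in Gower's general dissimilarity coefficient rather than the authors' formulation. Each attribute type contributes a term in [0, 1]:

- Numeric and ordinal attributes (ordinal given as integer ranks) contribute range-normalized absolute differences.
- Ratio-scaled attributes are log-transformed first. This is one common textbook treatment and assumes strictly positive values.
- Categorical, nominal, and binary attributes contribute 0 for a match and 1 for a mismatch. Binary attributes are treated as symmetric.

The function names and type labels are hypothetical.

```python
import math

# Hypothetical type labels handled by simple matching.
NOMINAL_TYPES = {"categorical", "nominal", "binary"}


def attribute_ranges(rows, types):
    """Observed range of each numeric-like attribute (0.0 for nominal types)."""
    ranges = []
    for j, t in enumerate(types):
        if t in NOMINAL_TYPES:
            ranges.append(0.0)
            continue
        # Ratio-scaled values are compared on a log scale (assumes values > 0).
        col = [math.log(row[j]) if t == "ratio" else row[j] for row in rows]
        ranges.append(max(col) - min(col))
    return ranges


def gower_dissimilarity(x, y, types, ranges):
    """Mean per-attribute dissimilarity in [0, 1] between records x and y."""
    total = 0.0
    for xv, yv, t, r in zip(x, y, types, ranges):
        if t in NOMINAL_TYPES:
            d = 0.0 if xv == yv else 1.0
        else:
            if t == "ratio":
                xv, yv = math.log(xv), math.log(yv)
            # "numeric" and "ordinal" values are used as-is, never binned.
            d = abs(xv - yv) / r if r > 0 else 0.0
        total += d
    return total / len(types)


if __name__ == "__main__":
    types = ["numeric", "ordinal", "categorical", "binary", "ratio"]
    rows = [
        (25.0, 2, "red", 1, 10.0),
        (35.0, 3, "red", 0, 100.0),
        (45.0, 1, "blue", 1, 1000.0),
    ]
    ranges = attribute_ranges(rows, types)
    print(gower_dissimilarity(rows[0], rows[1], types, ranges))
```

Because every term lies in [0, 1], no single attribute type dominates the sum. A dissimilarity of this kind could, under these assumptions, replace squared Euclidean distance inside a GA clustering cost.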

Similar content being viewed by others

Interblend fusing of genetic algorithm-based attribute selection for clustering heterogeneous data set

Article 03 December 2018

Improved Clustering for Categorical Data with Genetic Algorithm

An Improved Genetic Clustering Algorithm for Categorical Data

Acknowledgments

We would like to thank the reviewers for their valuable suggestions. We would also like to thank rev. HOD-CSE, Prof. Dr. R. Radhakrishnan, for his involvement and valuable suggestions on soft computing in the early stages of this paper. We thank our friends, family, and seniors for their motivation and encouragement. Last but not least, we thank the Almighty, without whose grace this paper would not have succeeded.

Author information

Authors and Affiliations

Computer Science and Engineering Department, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Rohit Rastogi, Saumya Agarwal, Palak Sharma, Uarvarshi Kaul & Shilpi Jain

Corresponding author

Correspondence to Rohit Rastogi.

Editor information

Editors and Affiliations

Department of Paper Technology, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Millie Pant
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Kusum Deep
Department of Mathematics and Computer Science, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Department of Applied Mathematics, South Asian University, New Delhi, India
Jagdish Chand Bansal

About this paper

Cite this paper

Rastogi, R., Agarwal, S., Sharma, P., Kaul, U., Jain, S. (2014). Unsupervised Classification of Mixed Data Type of Attributes Using Genetic Algorithm (Numeric, Categorical, Ordinal, Binary, Ratio-Scaled). In: Pant, M., Deep, K., Nagar, A., Bansal, J. (eds) Proceedings of the Third International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 258. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1771-8_11

DOI: https://doi.org/10.1007/978-81-322-1771-8_11
Published: 04 March 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1770-1
Online ISBN: 978-81-322-1771-8
eBook Packages: Engineering, Engineering (R0)