Clustering Methodology in Mixed Data Sets

  • Jacobo Gerardo González León
  • Miguel Félix Mata Rivera
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1053)

Abstract

One of the most challenging tasks in data analysis is finding clusters in mixed data sets, which combine numerical and categorical variables and lack a labeled variable to serve as a guide. Such clusters can summarize all the variables of a data set into a single one, making information easier to find than when a separate summary is generated for each variable. In this research, a methodology for clustering mixed data sets is proposed, which yields better results than the methods applied in the state of the art.

Keywords

Clustering, Ensemble methods, Mixed data set

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Instituto Politécnico Nacional, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Mexico City, Mexico
