PSS: New Parametric Based Clustering for Data Category

Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 457)

Abstract

This paper proposes a new clustering technique for categorical data called Parametric Soft Set (PSS). It is based on a statistical distribution, namely the multivariate multinomial distribution function. The probability of categorical data with binary values can be calculated with the binomial distribution; its generalization, the multinomial distribution function, handles categorical data with multivariate values. First, the data are represented as a multi soft set in which every object in each soft set has a probability. The probability of each object is calculated by the cluster joint distribution function, which follows the multivariate multinomial distribution function. Each object is then assigned to the cluster for which its probability is highest. The first experiment estimates the parameters of data drawn from a random multivariate mixture distribution. The second experiment evaluates processing time, purity, and Rand index on benchmark datasets. The experimental results show that the proposed approach improves processing time by up to 92.96%. It also performs better in terms of purity, Rand index, and mean error of the estimated parameters.
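The assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' PSS implementation: it assumes the cluster weights and per-attribute category probabilities are already known, and it treats attributes as independent within a cluster, which is one common reading of a multivariate multinomial model. The names `object_log_prob` and `assign_clusters`, the toy attributes, and all numeric values are hypothetical.

```python
import math

def object_log_prob(obj, weight, cat_probs):
    """Log joint probability of one object under one cluster.

    Assumes attributes are independent given the cluster, so the joint
    probability is weight * product of per-attribute category probabilities.
    All probabilities are assumed strictly positive.
    """
    logp = math.log(weight)
    for j, value in enumerate(obj):
        logp += math.log(cat_probs[j][value])
    return logp

def assign_clusters(data, weights, params):
    """Assign each object to the cluster with the highest joint probability."""
    labels = []
    for obj in data:
        scores = [object_log_prob(obj, weights[k], params[k])
                  for k in range(len(weights))]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels

# Hypothetical toy data: two categorical attributes, two clusters.
weights = [0.5, 0.5]
params = [
    [{"red": 0.8, "blue": 0.2}, {"small": 0.7, "large": 0.3}],  # cluster 0
    [{"red": 0.1, "blue": 0.9}, {"small": 0.2, "large": 0.8}],  # cluster 1
]
data = [("red", "small"), ("blue", "large"), ("red", "large")]

labels = assign_clusters(data, weights, params)
print(labels)  # [0, 1, 0]
```

Working in log space avoids numerical underflow when an object has many attributes. In a full method the parameters would be estimated from the data rather than fixed in advance as they are here.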

Keywords

  • Clustering
  • Categorical data
  • Multi soft set
  • Multinomial distribution function


Author information

Corresponding author

Correspondence to Iwan Tri Riyadi Yanto.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Yanto, I.T.R., Deris, M.M., Senan, N. (2022). PSS: New Parametric Based Clustering for Data Category. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_2
