Fast Hard Clustering Based on Soft Set Multinomial Distribution Function

Yanto, Iwan Tri Riyadi; Setiyowati, Ririn; Deris, Mustafa Mat; Senan, Norhalina

doi:10.1007/978-3-031-00828-3_1

Iwan Tri Riyadi Yanto¹⁴,¹⁷, Ririn Setiyowati¹⁵, Mustafa Mat Deris¹⁶ & Norhalina Senan¹⁷

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 457)

Included in the following conference series:

International Conference on Soft Computing and Data Mining

Abstract

Clustering categorical data remains difficult because similarity between categorical objects is hard to measure. Several approaches have been proposed, and centroid-based approaches were recently introduced to reduce the complexity of computing similarity for categorical data. However, those techniques still incur high computational times. In this paper, we propose a clustering technique for categorical data based on soft set theory and the multinomial distribution, called Hard Clustering using Soft Set based on Multinomial Distribution Function (HCSS). The data are represented as a multi-soft set in which every soft set has its own probability of belonging to each cluster. First, the correctness of the approach is proved mathematically. Then, experiments on benchmark datasets evaluate processing time, purity, and Rand index. The results show that the proposed approach reduces processing time by up to 95.03% compared with baseline techniques, without compromising purity or Rand index.
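
The two quality measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions (purity as the share of objects in the majority class of their cluster, Rand index as the share of object pairs on which two partitions agree); the function names `purity` and `rand_index` are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of objects that fall in the majority class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    # For each cluster, count the objects carrying its most frequent true label.
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of object pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    # A pair agrees when it is grouped together in both partitions
    # or separated in both partitions.
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Both measures lie in [0, 1] and equal 1 when the clustering reproduces the reference classes exactly. The pairwise loop in `rand_index` is quadratic in the number of objects, which is acceptable for a small illustration but not for large datasets.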

Abbreviations

\(S\): Information system (information table)
\(S_{\{0,1\}}\): Information system with values in {0, 1}
\(U\): Universe
\(|U|\): Cardinality of \(U\)
\(u\): Object of \(U\)
\(A\): Set of attributes (variables)
\(a\): Subset of attributes
\(E\): Parameters in a soft set
\(i\): Index \(i\)
\(j\): Index \(j\)
\(k\): Index \(k\)
\(l\): Index \(l\)
\(e\): Subset of parameters
\(V\): Domain (value set)
\(V_a\): Domain (value set) of variable \(a\)
\(f\): Information function
\(F\): Parameter mapping function
\(y\): Object
\(P(U)\): Power set of \(U\)
\((F,A)\): Soft set
\(F(a)\): Soft set of parameter \(a\)
\(C_{(F,E)}\): Class of soft sets
\(P\): Probability
\(p_i\): Probability for each trial \(i\)
\(f(x,a_k)\): Probability mass function
\(n_i, N_i\): Number of trials \(i\)
\(\lambda\): Probability of the multinomial distribution
\(C_k\): Cluster \(k\)
\(K\): Number of clusters
\(z_{ik}\): Indicator function
\(CML(z,\lambda)\): Conditional maximum likelihood function
\(\text{Maximize } L_{CML}(z,\lambda)\): Maximization of the log-likelihood function
\(L_{CML}(z,\lambda,w_1,w_2)\): Lagrange function
\(w_1\): Lagrange multiplier for constraint 1
\(w_2\): Lagrange multiplier for constraint 2
HCSS: Hard Clustering using Soft Set based on Multinomial Distribution Function
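
To make the notation above concrete, the following is a rough sketch of how a multi-soft set and a hard, likelihood-based assignment might fit together. It is not the authors' HCSS procedure, whose update rules are not given in this excerpt; it is a generic alternating scheme under stated assumptions: each attribute/value pair maps to the set of objects having it (in the spirit of \(F(a)\)), `lam` plays the role of \(\lambda\), the returned labels play the role of the indicator \(z_{ik}\), and the smoothing constant `alpha` is an added assumption to keep logarithms finite. The names `multi_soft_set` and `hard_cluster` are hypothetical.

```python
import math
import random


def multi_soft_set(data):
    """Map each (attribute index, value) pair to the set of object indices having it."""
    soft = {}
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            soft.setdefault((j, value), set()).add(i)
    return soft


def hard_cluster(data, k, n_iter=20, seed=0, alpha=1.0):
    """Alternate between estimating per-cluster value probabilities and
    reassigning every object to its most likely cluster (hard assignment)."""
    rng = random.Random(seed)
    soft = multi_soft_set(data)
    n_attrs = len(data[0])
    # Number of distinct values observed for each attribute.
    domain_size = [len({v for (j, v) in soft if j == a}) for a in range(n_attrs)]
    labels = [rng.randrange(k) for _ in data]  # random initial partition
    for _ in range(n_iter):
        sizes = [labels.count(c) for c in range(k)]
        # lam[c][(j, v)]: smoothed probability of value v of attribute j in cluster c.
        lam = [{} for _ in range(k)]
        for (j, v), members in soft.items():
            for c in range(k):
                count = sum(1 for i in members if labels[i] == c)
                lam[c][(j, v)] = (count + alpha) / (sizes[c] + alpha * domain_size[j])
        # Assign each object to the cluster with the highest log-likelihood.
        new_labels = []
        for row in data:
            scores = [sum(math.log(lam[c][(j, v)]) for j, v in enumerate(row))
                      for c in range(k)]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
    return labels
```

Calling `hard_cluster(rows, 2)` on a list of equal-length tuples of categorical values returns one cluster index per row. Like other alternating schemes, the result depends on the random initial partition, so several seeds may be worth trying.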

Author information

Authors and Affiliations

Department of Information Systems, University Ahmad Dahlan, Yogyakarta, Indonesia
Iwan Tri Riyadi Yanto
Department of Mathematics, Universitas Sebelas Maret, Jalan Ir. Sutami 36A, Kentingan, Surakarta, Indonesia
Ririn Setiyowati
Faculty of Applied Science and Technology, Universiti Tun Hussein Onn Malaysia, 86400, Parit Raja, Batu Pahat, Johor, Malaysia
Mustafa Mat Deris
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400, Parit Raja, Batu Pahat, Johor, Malaysia
Iwan Tri Riyadi Yanto & Norhalina Senan

Corresponding author

Correspondence to Iwan Tri Riyadi Yanto.

Editor information

Editors and Affiliations

Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Rozaida Ghazali
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Nazri Mohd Nawi
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Mustafa Mat Deris
School of Information Technology Faculty of Science, Engineering and Built Environment, Deakin University, Geelong, VIC, Australia
Jemal H. Abawajy
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Nureize Arbaiy

About this paper

Cite this paper

Yanto, I.T.R., Setiyowati, R., Deris, M.M., Senan, N. (2022). Fast Hard Clustering Based on Soft Set Multinomial Distribution Function. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_1

DOI: https://doi.org/10.1007/978-3-031-00828-3_1
Published: 04 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00827-6
Online ISBN: 978-3-031-00828-3
eBook Packages: Intelligent Technologies and Robotics (R0)
