An integration of fuzzy association rules and WordNet for document clustering

Chen, Chun-Ling; Tseng, Frank S. C.; Liang, Tyne

doi:10.1007/s10115-010-0364-2

Regular Paper
Published: 27 November 2010

Volume 28, pages 687–708, (2011)
Knowledge and Information Systems

Chun-Ling Chen¹, Frank S. C. Tseng² & Tyne Liang¹

Abstract

With the rapid growth in the number of text documents, document clustering techniques are emerging as a means of efficient document retrieval and better document browsing. Recently, several methods have been proposed that cluster documents using frequent itemsets derived from association rule mining, in order to address the problems of high dimensionality, scalability, accuracy, and meaningful cluster labels. To improve the quality of document clustering results, we propose an effective Fuzzy Frequent Itemset-based Document Clustering (F²IDC) approach that combines fuzzy association rule mining with the background knowledge embedded in WordNet. A term hierarchy generated from WordNet is applied to discover generalized frequent itemsets as candidate cluster labels for grouping documents. We have conducted experiments to evaluate our approach on the Classic4, Re0, R8, and WebKB datasets. Our experimental results show that the proposed approach indeed provides more accurate clustering results than prior influential clustering methods presented in the recent literature.
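
The idea of generalized fuzzy frequent itemsets can be illustrated with a minimal Python sketch. It is not the F²IDC algorithm itself, whose details the abstract does not give. Everything in it is an assumption made for illustration: the toy `HYPERNYMS` map stands in for a WordNet-derived term hierarchy, the ramp-shaped `membership` function and its `full` parameter are hypothetical, fuzzy support is taken as the mean over documents of the minimum membership within an itemset, and candidates are enumerated by brute force rather than with an Apriori-style search.

```python
from itertools import combinations

# Hypothetical toy hierarchy standing in for WordNet hypernym links.
HYPERNYMS = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}

# Toy corpus: each document is a map from term to term frequency.
DOCS = [
    {"dog": 3, "car": 1},
    {"cat": 3},
    {"dog": 1, "cat": 2},
    {"car": 3, "bus": 2},
]


def generalize(term_counts):
    """Return a copy of the counts with each term's hypernym added.

    A hypernym receives the summed frequency of its child terms.
    """
    out = dict(term_counts)
    for term, count in term_counts.items():
        parent = HYPERNYMS.get(term)
        if parent is not None:
            out[parent] = out.get(parent, 0) + count
    return out


def membership(count, full=3):
    """Assumed ramp membership: 0 at count 0, rising to 1 at count >= full."""
    return min(count / full, 1.0)


def fuzzy_support(itemset, docs):
    """Mean over documents of the minimum membership across the itemset."""
    total = 0.0
    for doc in docs:
        total += min(membership(doc.get(term, 0)) for term in itemset)
    return total / len(docs)


def frequent_itemsets(docs, min_support=0.5, max_size=2):
    """Enumerate itemsets (brute force) whose fuzzy support meets the threshold."""
    generalized = [generalize(d) for d in docs]
    vocab = sorted({term for d in generalized for term in d})
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(vocab, size):
            support = fuzzy_support(itemset, generalized)
            if support >= min_support:
                result[itemset] = support
    return result


if __name__ == "__main__":
    for itemset, support in frequent_itemsets(DOCS).items():
        print(itemset, round(support, 3))
```

On this toy corpus with `min_support=0.5`, no individual leaf term is frequent, but the generalized term `animal` is, which is the kind of candidate cluster label that a term hierarchy makes available.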

Author information

Frank S. C. Tseng
Present address: Department of Information Management, National Kaohsiung 1st University of Science and Technology, 1, University Road, YenChao, Kaoshiung County, 824, Taiwan, ROC

Authors and Affiliations

Department of Computer Science, National Chiao Tung University, HsinChu, 300, Taiwan, ROC
Chun-Ling Chen & Tyne Liang

Corresponding author

Correspondence to Frank S. C. Tseng.

About this article

Cite this article

Chen, CL., Tseng, F.S.C. & Liang, T. An integration of fuzzy association rules and WordNet for document clustering. Knowl Inf Syst 28, 687–708 (2011). https://doi.org/10.1007/s10115-010-0364-2

Received: 27 August 2009
Revised: 21 March 2010
Accepted: 04 November 2010
Published: 27 November 2010
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10115-010-0364-2
