The Dictionary-Based Quantified Conceptual Relations for Hard and Soft Chinese Text Clustering

Hu, Yi; Lu, Ruzhan; Chen, Yuquan; Liu, Hui; Zhang, Dongyi

doi:10.1007/978-3-540-73351-5_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Abstract

In this paper we present a new text similarity measure for text clustering that combines the cosine measure with quantified conceptual relations by linear interpolation. These relations are derived from dictionary entries and the words in their definitions, and are quantified under the assumption that an entry and its definition are equivalent in meaning. Such relations are regarded as “knowledge” for text clustering. Within the framework of the k-means algorithm, the new interpolated similarity significantly improves the performance of the clustering system when optimizing hard and soft criterion functions. Our results suggest that introducing conceptual knowledge from an unstructured dictionary into the similarity measure is a promising direction for text clustering.
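The abstract does not give the formula, so the following is only a minimal sketch of what a linear interpolation between the cosine measure and a relation-based similarity might look like. The weight `lam`, the `relations` table (pairs of terms mapped to a strength in [0, 1]), and the particular form of `conceptual_similarity` are all assumptions made for illustration, not the authors' definitions.

```python
import math


def norm(vec):
    """Euclidean norm of a sparse term-weight vector (dict: term -> weight)."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(u, v):
    """Standard cosine measure between two sparse term-weight vectors."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    return dot / (nu * nv)


def conceptual_similarity(u, v, relations):
    """Hypothetical relation-based similarity (an assumption, not the paper's formula).

    Sums weight products over pairs of *different* terms that are linked in
    `relations`, normalises by the vector norms, and clamps the result to 1.0.
    """
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    total = 0.0
    for a, wa in u.items():
        for b, wb in v.items():
            if a == b:
                continue  # identical terms are already covered by the cosine part
            strength = relations.get((a, b), relations.get((b, a), 0.0))
            total += wa * wb * strength
    return min(1.0, total / (nu * nv))


def interpolated_similarity(u, v, relations, lam=0.5):
    """Linear interpolation: lam * cosine + (1 - lam) * conceptual similarity."""
    return lam * cosine(u, v) + (1.0 - lam) * conceptual_similarity(u, v, relations)


# Toy example: the two texts share no term, so the cosine measure alone is 0,
# but a (made-up) dictionary relation links their terms.
doc_a = {"bank": 1.0}
doc_b = {"finance": 1.0}
toy_relations = {("bank", "finance"): 0.8}
score = interpolated_similarity(doc_a, doc_b, toy_relations, lam=0.5)
```

With `lam=1.0` the function reduces to the plain cosine measure, so the interpolation weight controls how much the dictionary-derived relations contribute. A k-means implementation could then use `interpolated_similarity` wherever it would otherwise compare a text with a cluster centroid.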

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Yi Hu, Ruzhan Lu, Yuquan Chen & Hui Liu
Network Management Center, Politics College of Xi’an, Xi’an, China
Dongyi Zhang

Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Cite this paper

Hu, Y., Lu, R., Chen, Y., Liu, H., Zhang, D. (2007). The Dictionary-Based Quantified Conceptual Relations for Hard and Soft Chinese Text Clustering. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_9

DOI: https://doi.org/10.1007/978-3-540-73351-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science, Computer Science (R0)
