Abstract
In this paper we present a new text similarity measure for text clustering that combines the cosine measure with quantified conceptual relations by linear interpolation. These relations are derived from dictionary entries and the words in their definitions, and they are quantified under the assumption that an entry and its definition are equivalent in meaning. Relations of this kind are regarded as “knowledge” for text clustering. Within the framework of the k-means algorithm, the new interpolated similarity significantly improves the performance of the clustering system in terms of optimizing hard and soft criterion functions. Our results suggest that introducing conceptual knowledge from an unstructured dictionary into the similarity measure has the potential to benefit future work on text clustering.
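The abstract does not state the interpolation formula, so the following is only a minimal sketch of how a linearly interpolated similarity of this kind could look. It assumes the form sim = lam * cosine + (1 - lam) * concept_sim, where concept_sim is the cosine between documents after their terms are projected onto the words of their dictionary definitions. The function names, the `relations` mapping, the relation strengths, and the weight `lam` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand(vec, relations):
    """Project a document onto definition words.

    `relations` maps a dictionary entry to {definition_word: strength}.
    The strengths are placeholders for the paper's quantified relations.
    """
    out = Counter()
    for term, weight in vec.items():
        for related, strength in relations.get(term, {}).items():
            out[related] += weight * strength
    return out


def interpolated_similarity(a, b, relations, lam=0.5):
    """Assumed form: lam * surface cosine + (1 - lam) * concept-level cosine."""
    concept_sim = cosine(expand(a, relations), expand(b, relations))
    return lam * cosine(a, b) + (1 - lam) * concept_sim


# Toy data: two documents with no shared surface term, but whose terms
# share a word in their (hypothetical) dictionary definitions.
relations = {
    "car": {"vehicle": 1.0},
    "automobile": {"vehicle": 1.0},
}
doc_a = {"car": 1.0}
doc_b = {"automobile": 1.0}

print(cosine(doc_a, doc_b))
print(interpolated_similarity(doc_a, doc_b, relations, lam=0.5))
```

In this toy case the plain cosine is zero because the documents share no term, while the interpolated score is positive because both terms are defined through the same word. A similarity function of this shape could be plugged into a k-means loop in place of the plain cosine, with `lam` tuned on held-out data.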
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, Y., Lu, R., Chen, Y., Liu, H., Zhang, D. (2007). The Dictionary-Based Quantified Conceptual Relations for Hard and Soft Chinese Text Clustering. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_9
DOI: https://doi.org/10.1007/978-3-540-73351-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science, Computer Science (R0)