Abstract
Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Good cluster labels help users decide whether a given cluster is relevant to their information needs. To address the poor readability of document clustering results, this paper proposes a machine-learning-based method for automatically labeling document clusters. First, the application of clustering description extraction to the construction of a topic digital library is introduced. Then, the descriptions produced by five models are analyzed, and their performance is compared.
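The abstract does not specify which features or learners the five models use. As a rough illustration only, the sketch below frames cluster labeling as ranking each cluster's candidate terms with a linear scoring function over a few simple features: how many of the cluster's documents contain the term, how many documents outside the cluster contain it, and the term's share of the cluster's tokens. The feature set, the function names, and the hypothetical `WEIGHTS` (a stand-in for a trained model such as a linear SVM) are all assumptions, not the authors' method.

```python
# Hypothetical feature weights standing in for a trained linear model
# (for example, a linear SVM); the values are illustrative, not learned.
WEIGHTS = {"df_in": 1.0, "df_out": -1.0, "tf_share": 0.5}


def term_features(term, cluster, others):
    """Compute simple features for one candidate term of one cluster.

    `cluster` and `others` are lists of documents; each document is a
    list of tokens. `others` holds the documents of all other clusters.
    """
    # Fraction of the cluster's documents that contain the term.
    df_in = sum(term in doc for doc in cluster) / len(cluster)
    # Fraction of documents outside the cluster that contain the term.
    df_out = sum(term in doc for doc in others) / len(others) if others else 0.0
    # Share of all tokens in the cluster that are this term.
    tokens = [tok for doc in cluster for tok in doc]
    tf_share = tokens.count(term) / len(tokens)
    return {"df_in": df_in, "df_out": df_out, "tf_share": tf_share}


def score(features, weights=WEIGHTS):
    """Linear score: dot product of feature values and weights."""
    return sum(weights[name] * value for name, value in features.items())


def label_clusters(clusters, k=2):
    """Return the k highest-scoring terms of each cluster as its label."""
    labels = []
    for i, cluster in enumerate(clusters):
        # Documents in every other cluster serve as the contrast set.
        others = [doc for j, c in enumerate(clusters) if j != i for doc in c]
        candidates = {tok for doc in cluster for tok in doc}
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(
            candidates,
            key=lambda t: (-score(term_features(t, cluster, others)), t),
        )
        labels.append(ranked[:k])
    return labels
```

For example, with the two toy clusters `[["svm", "kernel", "margin"], ["svm", "margin", "training"]]` and `[["library", "digital", "topic"], ["digital", "library", "archive"]]`, `label_clusters(..., k=2)` returns `[["margin", "svm"], ["digital", "library"]]`: terms that occur in every document of a cluster and nowhere else rank highest. In a learned setting, the weights would instead be fitted on clusters with known good labels.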
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, C., Wang, H., Liu, Y., Xu, H. (2009). Document Clustering Description Extraction and Its Application. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_37
DOI: https://doi.org/10.1007/978-3-642-00831-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)