Document Clustering Description Extraction and Its Application

Zhang, Chengzhi; Wang, Huilin; Liu, Yao; Xu, Hongjiao

doi:10.1007/978-3-642-00831-3_37

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Good labels help users determine whether a given cluster is relevant to their information needs. To address the poor readability of document clustering results, this paper proposes a machine-learning-based method for automatically labeling document clusters. It first introduces the application of clustering description extraction to the construction of topic digital libraries. It then analyzes the descriptive results of five models and compares their performance.
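
To make the labeling task concrete, the following is a minimal sketch of a simple frequency-based baseline. It is not the machine-learning method proposed in the paper, whose features and models are not given in this abstract. The function name `label_clusters`, the scoring formula (in-cluster term frequency weighted by a log-scaled inverse cluster frequency), and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def label_clusters(clusters, top_k=2):
    """Return the top_k highest-scoring terms for each cluster.

    clusters: dict mapping a cluster id to a list of tokenized documents.
    Score (an illustrative assumption, not the paper's method):
    term frequency inside the cluster * log((1 + n_clusters) / cluster frequency).
    """
    # Count how often each term occurs within each cluster.
    term_counts = {
        cid: Counter(tok for doc in docs for tok in doc)
        for cid, docs in clusters.items()
    }
    # Count in how many clusters each term appears at all.
    cluster_freq = Counter()
    for counts in term_counts.values():
        cluster_freq.update(counts.keys())

    n = len(clusters)
    labels = {}
    for cid, counts in term_counts.items():
        scores = {
            term: tf * math.log((1 + n) / cluster_freq[term])
            for term, tf in counts.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[cid] = ranked[:top_k]
    return labels


# Hypothetical toy clusters; each document is already tokenized.
example = {
    "c1": [["svm", "kernel", "margin"], ["svm", "margin", "text"]],
    "c2": [["library", "digital", "text"], ["digital", "library", "topic"]],
}
labels = label_clusters(example)
print(labels)
```

Terms that occur in every cluster, such as "text" in the toy data, receive a low weight and are unlikely to be chosen as labels. A learned model, as the abstract proposes, would replace this fixed score with one trained on labeled examples.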

Author information

Authors and Affiliations

Department of Information Management, Nanjing University of Science & Technology, Nanjing, 210093, China
Chengzhi Zhang
Institute of Scientific & Technical Information of China, Beijing, 100038, China
Chengzhi Zhang, Huilin Wang, Yao Liu & Hongjiao Xu

Editor information

Editors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Wenjie Li
Division of Information and Communication Sciences, Macquarie University, NSW 2109, Sydney, Australia
Diego Mollá-Aliod

About this paper

Cite this paper

Zhang, C., Wang, H., Liu, Y., Xu, H. (2009). Document Clustering Description Extraction and Its Application. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_37

DOI: https://doi.org/10.1007/978-3-642-00831-3_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)
