Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Abstract

Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Good labels help users decide whether a given cluster is relevant to their information needs. To address the poor readability of document clustering results, a machine-learning method for automatically labeling document clusters is proposed. The application of clustering description extraction to the construction of topic digital libraries is introduced first. The descriptions produced by five models are then analyzed, and the models' performance is compared.
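The abstract does not say which five models are compared, so the sketch below is only an illustrative baseline for the general task of cluster labeling and not the authors' method. It assumes documents are already tokenized and clustered. It scores each term by its coverage within a cluster, weighted by how few clusters contain it. The function name `label_clusters`, the scoring formula, and the toy data are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def label_clusters(clusters, top_k=2):
    """Pick top_k descriptive terms per cluster (hypothetical baseline).

    clusters: dict mapping a cluster id to a list of tokenized documents,
              where each document is a list of strings.
    """
    # Per-cluster document frequency: in how many of the cluster's
    # documents does each term occur?
    df_in = {
        cid: Counter(term for doc in docs for term in set(doc))
        for cid, docs in clusters.items()
    }
    n_clusters = len(clusters)
    # In how many clusters does each term occur at all?
    cluster_freq = Counter(term for counts in df_in.values() for term in counts)

    labels = {}
    for cid, counts in df_in.items():
        n_docs = len(clusters[cid])
        # Coverage inside the cluster times an IDF-like penalty for terms
        # that are spread across many clusters.
        scores = {
            term: (count / n_docs) * math.log((1 + n_clusters) / cluster_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[cid] = ranked[:top_k]
    return labels


if __name__ == "__main__":
    toy = {
        "a": [["svm", "kernel", "text"], ["svm", "margin", "text"]],
        "b": [["library", "digital", "text"], ["library", "catalog", "text"]],
    }
    print(label_clusters(toy, top_k=1))
```

In the toy data, `text` appears in every document of both clusters, so the cross-cluster penalty ranks it below terms specific to one cluster. A supervised approach such as the one the abstract describes would instead learn which candidate terms make good labels from annotated examples.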

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, C., Wang, H., Liu, Y., Xu, H. (2009). Document Clustering Description Extraction and Its Application. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_37

  • DOI: https://doi.org/10.1007/978-3-642-00831-3_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00830-6

  • Online ISBN: 978-3-642-00831-3

  • eBook Packages: Computer Science (R0)
