Document Clustering Description Extraction and Its Application

  • Chengzhi Zhang
  • Huilin Wang
  • Yao Liu
  • Hongjiao Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5459)

Abstract

Document clustering description is the problem of labeling the clusters produced by clustering a document collection. Good labels help users decide whether a given cluster is relevant to their information needs. To address the poor readability of document clustering results, this paper proposes a machine-learning method for automatically labeling document clusters. It first introduces the application of clustering description extraction to the construction of topic digital libraries. It then analyzes the descriptive results of five models and compares their performance.

Keywords

clustering description; document clustering; machine learning; topic digital library

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Chengzhi Zhang (1, 2)
  • Huilin Wang (2)
  • Yao Liu (2)
  • Hongjiao Xu (2)
  1. Department of Information Management, Nanjing University of Science & Technology, Nanjing, China
  2. Institute of Scientific & Technical Information of China, Beijing, China
