Text Pre-processing for Document Clustering

  • Seemab Latif
  • Mary McGee Wood
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5039)


The processing performance of vector-based document clustering methods is improved if automatic summarisation is used in addition to established forms of text pre-processing.


Keywords: Document Clustering, Automatic Summarisation, Document Pre-processing


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Seemab Latif (1)
  • Mary McGee Wood (1)
  1. School of Computer Science, The University of Manchester, United Kingdom
