Phrase Clustering Without Document Context

  • Eric SanJuan
  • Fidelia Ibekwe-SanJuan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3936)


We applied different clustering algorithms to the task of clustering multi-word terms so that the clusters reflect a human-built ontology. Clustering was done without the usual document co-occurrence information. Our clustering algorithm, CPCL (Classification by Preferential Clustered Link), is based on general lexico-syntactic relations that require neither prior domain knowledge nor a training set. Results show that CPCL performs well in terms of cluster homogeneity and adapts well to large, sparse matrices.


Keywords: Cluster Algorithm; Editing Distance; Hierarchical Algorithm; Cluster Methodology; Term Length
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Eric SanJuan (1)
  • Fidelia Ibekwe-SanJuan (2)
  1. URI, INIST-CNRS; LITA, University of Metz, France
  2. URSIDOC, University of Lyon 3, France
