Phrase Clustering Without Document Context
We applied several clustering algorithms to the task of grouping multi-word terms so that the resulting clusters reflect a human-built ontology. Clustering was performed without the usual document co-occurrence information. Our clustering algorithm, CPCL (Classification by Preferential Clustered Link), is based on general lexico-syntactic relations that require neither prior domain knowledge nor a training set. Results show that CPCL performs well in terms of cluster homogeneity and adapts well to large, sparse matrices.
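The abstract does not spell out CPCL's linking rules. The sketch below is only a rough illustration of the general idea of clustering terms from their internal structure instead of from document co-occurrence. It uses one hypothetical, deliberately simplified lexico-syntactic relation: a term is linked to another when the second extends the first by prepending modifiers. This assumes English head-final noun phrases. The sketch then groups linked terms into connected components with union-find. It is not the CPCL algorithm, and the names `is_left_expansion` and `cluster_terms` are illustrative only.

```python
from collections import defaultdict


def is_left_expansion(short: str, long: str) -> bool:
    # Hypothetical relation: `long` is `short` with extra modifiers prepended,
    # e.g. "gene expression" -> "differential gene expression".
    # Assumes head-final English noun phrases.
    s, l = short.lower().split(), long.lower().split()
    return len(l) > len(s) and l[-len(s):] == s


def cluster_terms(terms: list[str]) -> list[list[str]]:
    # Union-find over term indices. Links come only from the term strings
    # themselves, never from document co-occurrence.
    parent = list(range(len(terms)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    for i in range(len(terms)):
        for j in range(len(terms)):
            if i != j and is_left_expansion(terms[i], terms[j]):
                union(i, j)

    # Collect the connected components and return them in a stable order.
    groups = defaultdict(list)
    for i, t in enumerate(terms):
        groups[find(i)].append(t)
    return sorted(sorted(g) for g in groups.values())


terms = [
    "gene expression", "differential gene expression", "protein expression",
    "term extraction", "automatic term extraction", "clustering",
]
print(cluster_terms(terms))
# [['automatic term extraction', 'term extraction'], ['clustering'],
#  ['differential gene expression', 'gene expression'], ['protein expression']]
```

Taking connected components is the simplest possible grouping, equivalent to single-link clustering on a binary relation. On large vocabularies it can chain loosely related terms together, so a more selective linking strategy would be needed in practice.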
Keywords: Cluster Algorithm, Editing Distance, Hierarchical Algorithm, Cluster Methodology, Term Length