Abstract
Cluster analysis is an unsupervised learning technique that is widely used for topic discovery from text. The research presented here proposes a novel unsupervised learning approach based on the aggregation of clusterings produced by different clustering techniques. By examining and combining two different clusterings of a document collection, the aggregation aims to reveal a better structure of the data, rather than one imposed or constrained by the clustering method itself. Once clusters of documents are formed, a process called topic extraction picks terms from the feature space (i.e., the vocabulary of the whole collection) to describe the topic of each cluster. It is proposed at this stage to recompute the term weights according to the revealed cluster structure. The work further investigates the adaptive setup of the parameters required by the clustering and aggregation techniques. Finally, a topic accuracy measure is developed and used alongside the F-measure: the topic accuracy measure evaluates the extracted topics, and the F-measure evaluates the clustering quality, before and after the aggregation. Experimental evaluation shows that the aggregation can improve both the clustering quality and the topic accuracy over the individual clustering techniques.
This work was partially funded by an NSERC strategic grant.
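To make the clustering-quality evaluation concrete, the following is a minimal sketch of the overall F-measure in a form commonly used for document clustering: each reference class is matched to the cluster that gives it the highest F-score, and the per-class scores are averaged with weights proportional to class size. The abstract does not spell out the exact formula used in the chapter, so this formulation is an assumption, and the function name `clustering_f_measure` and its arguments are hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Overall F-measure of a flat clustering against reference classes.

    Assumed formulation (not taken from the chapter itself): for each
    class, take the best F-score over all clusters, then average these
    best scores weighted by class size.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    class_sizes = Counter(labels_true)              # n_i: documents per class
    cluster_sizes = Counter(labels_pred)            # n_j: documents per cluster
    overlap = Counter(zip(labels_true, labels_pred))  # n_ij: shared documents

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, F-score is zero
            precision = n_ij / n_j
            recall = n_ij / n_i
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        total += (n_i / n) * best
    return total
```

Under this formulation, a clustering that reproduces the reference classes exactly scores 1, and the same function can be applied to each individual clustering and to the aggregated one to compare them.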
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ayad, H., Kamel, M. (2002). Topic Discovery from Text Using Aggregation of Different Clustering Methods. In: Cohen, R., Spencer, B. (eds) Advances in Artificial Intelligence. Canadian AI 2002. Lecture Notes in Computer Science, vol 2338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47922-8_14
DOI: https://doi.org/10.1007/3-540-47922-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43724-6
Online ISBN: 978-3-540-47922-2
eBook Packages: Springer Book Archive