Automatic Taxonomy Generation: Issues and Possibilities

  • Raghu Krishnapuram
  • Krishna Kummamuru
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2715)


Automatic taxonomy generation organizes text documents into a labeled hierarchy that is not known in advance. The main issues are (i) how to identify documents that have similar content, (ii) how to discover the hierarchical structure of the topics and subtopics, and (iii) how to find appropriate labels for each of the topics and subtopics. In this paper, we review several approaches to automatic taxonomy generation to provide insight into the issues involved. We also describe how fuzzy hierarchies can overcome some of the problems associated with traditional crisp taxonomies.


Marginal Likelihood; Document Cluster; Concept Hierarchy; Vocabulary Term; Meta Search Engine
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Raghu Krishnapuram (1)
  • Krishna Kummamuru (1)
  1. Block I, IIT, IBM India Research Lab, New Delhi, India
