Building, Testing, and Applying Concept Hierarchies

  • Chapter
Advances in Information Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 7)

Abstract

This chapter presents a means of automatically deriving a hierarchical organization of concepts from a set of documents without the use of training data or standard clustering techniques. Salient words and phrases are first extracted from the documents; these terms are then organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When the hierarchy is generated from a set of retrieved documents, a user browsing the menus gains an overview of their content in a manner distinct from existing techniques. The methods used to build the structure are simple and appear to be effective. The formation and presentation of the hierarchy are described, along with a study of some of its properties, including a preliminary experiment which indicates that users may find the hierarchy a more efficient means of locating relevant documents than the classic method of scanning a ranked document list.
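
The abstract does not spell out the subsumption test, so the following is only a minimal sketch of one plausible reading: a term x is taken to subsume a term y when x occurs in nearly every document that contains y, while y does not occur in every document that contains x. The function name `subsumption_pairs`, the input layout, the 0.8 threshold, and the toy documents are all assumptions made for illustration, not details taken from the chapter.

```python
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=0.8):
    """Return (parent, child) pairs where parent is judged to subsume child.

    doc_terms: dict mapping a document id to the set of terms it contains.
    threshold: assumed cut-off for P(parent | child); 0.8 is illustrative.
    """
    # Invert the input: term -> set of documents that contain it.
    postings = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    for parent, child in permutations(sorted(postings), 2):
        both = len(postings[parent] & postings[child])
        p_parent_given_child = both / len(postings[child])
        p_child_given_parent = both / len(postings[parent])
        # Parent covers (nearly) all of child's documents, but not the reverse.
        if p_parent_given_child >= threshold and p_child_given_parent < 1.0:
            pairs.append((parent, child))
    return pairs


# Hypothetical toy collection: four documents and their extracted terms.
docs = {
    "d1": {"retrieval", "query"},
    "d2": {"retrieval", "query", "expansion"},
    "d3": {"retrieval", "index"},
    "d4": {"retrieval"},
}
pairs = subsumption_pairs(docs)
```

On this toy input the general term "retrieval" ends up above "query", "index" and "expansion", and "query" ends up above "expansion". A menu built from such pairs would presumably keep only the most specific parent of each term, so that "expansion" appears under "query" rather than directly under "retrieval"; that pruning step is not shown here.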

Copyright information

© 2002 Kluwer Academic Publishers

About this chapter

Cite this chapter

Sanderson, M., Lawrie, D. (2002). Building, Testing, and Applying Concept Hierarchies. In: Croft, W.B. (eds) Advances in Information Retrieval. The Information Retrieval Series, vol 7. Springer, Boston, MA. https://doi.org/10.1007/0-306-47019-5_9

  • DOI: https://doi.org/10.1007/0-306-47019-5_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-7812-9

  • Online ISBN: 978-0-306-47019-6

  • eBook Packages: Springer Book Archive
