Abstract
A means of automatically deriving a hierarchical organization of concepts from a set of documents, without the use of training data or standard clustering techniques, is presented. Salient words and phrases are extracted from the documents and then organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When the hierarchy is generated from a set of retrieved documents, a user browsing the menus gains an overview of the documents' content in a manner distinct from existing techniques. The methods used to build the structure are simple and appear to be effective. The formation and presentation of the hierarchy are described along with a study of some of its properties, including a preliminary experiment, which indicates that users may find the hierarchy a more efficient means of locating relevant documents than the classic method of scanning a ranked document list.
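The abstract names subsumption but does not define it. The following is a minimal sketch under one common formulation, which is an assumption here rather than something stated in this abstract: term x subsumes term y when x appears in most of the documents that contain y, while y does not appear in every document that contains x. The function name `subsumption_pairs`, the 0.8 threshold, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    doc_terms: list of sets of terms, one set per document.
    threshold: assumed cut-off for P(parent | child); not taken from the chapter.
    """
    # Map each term to the set of document indices that contain it.
    postings = defaultdict(set)
    for doc_id, terms in enumerate(doc_terms):
        for term in terms:
            postings[term].add(doc_id)

    pairs = []
    for x, y in permutations(sorted(postings), 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])  # share of y's documents that contain x
        p_y_given_x = both / len(postings[x])  # share of x's documents that contain y
        # x is the more general term: it covers (nearly) all of y's documents,
        # but y does not cover all of x's documents.
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return pairs


# Toy collection: each document is reduced to its set of salient terms.
docs = [
    {"energy", "solar", "panel"},
    {"energy", "solar"},
    {"energy", "wind"},
    {"energy", "wind", "turbine"},
    {"energy"},
]

pairs = subsumption_pairs(docs)
for parent, child in pairs:
    print(f"{parent} > {child}")
```

In this toy collection the general term "energy" ends up above every other term, and "solar" and "wind" end up above "panel" and "turbine" respectively. Turning such pairs into menus would still require pruning redundant ancestor links, which this sketch leaves out.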
Copyright information
© 2002 Kluwer Academic Publishers
About this chapter
Cite this chapter
Sanderson, M., Lawrie, D. (2002). Building, Testing, and Applying Concept Hierarchies. In: Croft, W.B. (eds) Advances in Information Retrieval. The Information Retrieval Series, vol 7. Springer, Boston, MA. https://doi.org/10.1007/0-306-47019-5_9
DOI: https://doi.org/10.1007/0-306-47019-5_9
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7812-9
Online ISBN: 978-0-306-47019-6
eBook Packages: Springer Book Archive