Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval
Expert Network (ExpNet) is our new approach to automatic categorization and retrieval of natural language texts. We use a training set of texts with expert-assigned categories to construct a network which approximately reflects the conditional probabilities of categories given a text. The input nodes of the network are words in the training texts, the nodes on the intermediate level are the training texts, and the output nodes are categories. The links between nodes are computed from statistics of the word distribution and the category distribution over the training set. ExpNet is used for relevance ranking of candidate categories of an arbitrary text in text categorization, and for relevance ranking of documents via categories in text retrieval. We have evaluated ExpNet in categorization and retrieval on a document collection from the MEDLINE database, and observed recall and precision comparable to the Linear Least Squares Fit (LLSF) mapping method, and significantly better than the other methods tested. Computationally, ExpNet has O(N log N) time complexity, which is far more efficient than the cubic complexity of the LLSF method. The simplicity of the model, the high recall-precision rates, and the efficient computation together make ExpNet preferable as a practical solution for real-world applications.
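The three-layer network described above can be read as a weighted vote: a new text activates the training texts it resembles, and each training text passes its activation on to the categories an expert assigned to it. The sketch below illustrates that flow for category ranking only, and it rests on assumptions this abstract does not state: word-to-text links are taken as cosine similarities between log-TF times IDF vectors, and text-to-category links are taken as P(c|d) split evenly over the categories assigned to training text d. The function names (`tfidf_vectors`, `weigh`, `cosine`, `rank_categories`) and the toy data are hypothetical, and this is not presented as the exact ExpNet formulation.

```python
import math
from collections import Counter, defaultdict


def weigh(tokens, idf):
    """Log-TF x IDF weights (assumed weighting scheme, not taken from the paper)."""
    tf = Counter(tokens)
    # Words never seen in training have no input node, so they are dropped.
    return {w: (1.0 + math.log(c)) * idf[w] for w, c in tf.items() if w in idf}


def tfidf_vectors(texts):
    """Vectorize the training texts; returns the vectors and the IDF table."""
    tokenized = [t.lower().split() for t in texts]
    n_docs = len(tokenized)
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    # The +1 keeps words that occur in every training text from getting zero weight.
    idf = {w: math.log(n_docs / df) + 1.0 for w, df in doc_freq.items()}
    return [weigh(toks, idf) for toks in tokenized], idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_categories(query, train_texts, train_categories):
    """Score each category as sum over training texts d of sim(query, d) * P(c|d)."""
    vectors, idf = tfidf_vectors(train_texts)
    q = weigh(query.lower().split(), idf)
    scores = defaultdict(float)
    for doc_vec, cats in zip(vectors, train_categories):
        sim = cosine(q, doc_vec)  # word layer -> training-text layer
        if sim == 0.0 or not cats:
            continue
        for c in cats:
            # Training-text layer -> category layer; assumes P(c|d) = 1 / |cats(d)|.
            scores[c] += sim / len(cats)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy training set with expert-assigned categories.
train_texts = [
    "myocardial infarction chest pain",
    "chest pain angina",
    "asthma wheezing cough",
]
train_categories = [
    ["Myocardial Infarction", "Chest Pain"],
    ["Angina Pectoris", "Chest Pain"],
    ["Asthma"],
]

if __name__ == "__main__":
    # "Chest Pain" ranks first; "Asthma" shares no words with the query and gets no score.
    for category, score in rank_categories("acute chest pain", train_texts, train_categories):
        print(f"{category}: {score:.3f}")
```

This sketch compares the query against every training text and sorts the categories at the end; it is meant only to show how activation flows through the three layers and does not attempt to reproduce the paper's complexity analysis.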