Abstract
Two basic approaches to the development of special topic document corpora are considered. Both techniques partition a general, heterogeneous parent corpus into distinct subsets of related documents. The first method partitions a document corpus with respect to the topical content explicitly defined by the documents it contains. The technique relies on a logico-syntactic analysis of the document text to extract topic-denoting phrases, and on a weighting function based on the complexity of the logical relational environment of the extracted phrases. The second method is a profile-directed partitioning of the document corpus induced by an externally defined thesaurus of phrases. The topic coverage of the profile depends only on the specific requirements of the user community for whom it was defined. Any of a number of weighting functions can be applied to the phrases; the choice usually depends on the corpus itself. This technique is useful where text analysis is either impractical or impossible.
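The second, profile-directed method can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the substring matching, the inverse-document-frequency weighting, and the sample corpus and profile are all assumptions chosen only to show how an externally defined thesaurus of phrases, together with a corpus-dependent weighting function, can induce a partition of a parent corpus.

```python
import math
from collections import defaultdict


def phrase_weights(corpus, phrases):
    """Assumed weighting: a smoothed inverse document frequency,
    computed from the corpus itself. Any other function could be swapped in."""
    n = len(corpus)
    weights = {}
    for phrase in phrases:
        # Number of documents that contain the phrase (plain substring match).
        df = sum(1 for text in corpus.values() if phrase in text.lower())
        weights[phrase] = math.log((n + 1) / (df + 1)) + 1.0
    return weights


def partition(corpus, profile):
    """Assign each document to the profile topic with the highest total
    phrase weight; documents matching no phrase go to 'unassigned'."""
    all_phrases = {p for phrases in profile.values() for p in phrases}
    weights = phrase_weights(corpus, all_phrases)
    subsets = defaultdict(list)
    for doc_id, text in corpus.items():
        lowered = text.lower()
        scores = {
            topic: sum(weights[p] for p in phrases if p in lowered)
            for topic, phrases in profile.items()
        }
        best = max(scores, key=scores.get)
        subsets[best if scores[best] > 0 else "unassigned"].append(doc_id)
    return dict(subsets)


# Hypothetical parent corpus and user-community profile (lowercase phrases).
corpus = {
    "d1": "Heat transfer in turbine blades under thermal stress.",
    "d2": "Query languages for document retrieval systems.",
    "d3": "Annual budget report.",
}
profile = {
    "thermodynamics": ["heat transfer", "thermal stress"],
    "information retrieval": ["document retrieval", "query languages"],
}
result = partition(corpus, profile)
```

Because the profile is supplied from outside, the sketch needs no linguistic analysis of the text, which is the situation in which the abstract recommends this method. The first method would instead derive the phrases from the documents themselves.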
Copyright information
© 1974 Plenum Press, New York
About this chapter
Cite this chapter
Kasarda, A.J., Hillman, D.J. (1974). Special Topic Data Base Development. In: Tou, J.T. (eds) Information Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4684-2694-6_6
DOI: https://doi.org/10.1007/978-1-4684-2694-6_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4684-2696-0
Online ISBN: 978-1-4684-2694-6
eBook Packages: Springer Book Archive