Special Topic Data Base Development

Kasarda, Andrew J.; Hillman, Donald J.

doi:10.1007/978-1-4684-2694-6_6

Abstract

Two basic approaches to the development of special topic document corpora are considered. Both techniques result in a partitioning of a general heterogeneous parent corpus into distinct subsets of related documents. The first method is based on the partitioning of a document corpus with respect to the topical content explicitly defined by the documents contained in the corpus. The technique relies on a logicosyntactic analysis of the document text in order to extract topic-denoting phrases, and on a weighting function based on the complexity of the logical relational environment of the extracted phrases. The second method is based on a profile-directed partitioning of the document corpus induced by an externally defined thesaurus of phrases. The topic coverage of the profile depends only on the specific requirements of the user community for whom it was defined. Any one of a number of weighting functions can be applied to the phrases; the choice usually depends on the corpus itself. This technique is useful where text analysis is either impractical or not possible.
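
The second, profile-directed method can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `"unassigned"` bucket, the threshold, and the occurrence-count weighting are all assumptions chosen for illustration, since the abstract only says that any of a number of weighting functions may be used.

```python
def count_weight(phrase, text):
    """Assumed weighting function: raw occurrence count of the phrase in the text."""
    return text.count(phrase)


def partition_by_profile(corpus, profile, weight=count_weight, threshold=1):
    """Partition a corpus using an externally defined profile (hypothetical helper).

    corpus:  dict mapping document id -> document text
    profile: dict mapping topic name -> list of thesaurus phrases
    Each document goes to its single best-scoring topic, or to "unassigned"
    when no topic reaches the threshold, so the subsets are disjoint.
    """
    subsets = {topic: [] for topic in profile}
    subsets["unassigned"] = []
    for doc_id, text in corpus.items():
        lowered = text.lower()
        # Score every topic by summing the weights of its phrases in this document.
        scores = {
            topic: sum(weight(phrase.lower(), lowered) for phrase in phrases)
            for topic, phrases in profile.items()
        }
        # Ties go to the topic listed first in the profile.
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] >= threshold:
            subsets[best].append(doc_id)
        else:
            subsets["unassigned"].append(doc_id)
    return subsets


if __name__ == "__main__":
    corpus = {
        "d1": "Heat transfer in turbine blades",
        "d2": "Retrieval of documents by thesaurus phrases",
        "d3": "Unrelated note",
    }
    profile = {
        "thermal": ["heat transfer", "turbine"],
        "retrieval": ["document", "thesaurus"],
    }
    print(partition_by_profile(corpus, profile))
```

Plain substring matching stands in here for phrase lookup. Because no text analysis is performed, the sketch fits the case the abstract describes, where such analysis is impractical or not possible. A different weighting function can be passed through the `weight` parameter.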

Author information

Authors and Affiliations

Lehigh University, Bethlehem, Pennsylvania, USA
Andrew J. Kasarda & Donald J. Hillman

Editor information

Editors and Affiliations

Center for Informatics Research, University of Florida, Gainesville, Florida, USA
Julius T. Tou

About this chapter

Cite this chapter

Kasarda, A.J., Hillman, D.J. (1974). Special Topic Data Base Development. In: Tou, J.T. (ed.) Information Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4684-2694-6_6

DOI: https://doi.org/10.1007/978-1-4684-2694-6_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4684-2696-0
Online ISBN: 978-1-4684-2694-6
eBook Packages: Springer Book Archive
