Special Topic Data Base Development

  • Chapter
Information Systems

Abstract

Two basic approaches to the development of special topic document corpora are considered. Both techniques partition a general, heterogeneous parent corpus into distinct subsets of related documents. The first method partitions a document corpus with respect to the topical content explicitly defined by the documents contained in the corpus. The technique relies on a logico-syntactic analysis of the document text to extract topic-denoting phrases, and on a weighting function based on the complexity of the logical relational environment of the extracted phrases. The second method is a profile-directed partitioning of the document corpus, induced by an externally defined thesaurus of phrases. The topic coverage of the profile depends only on the specific requirements of the user community for whom it was defined. Any one of a number of weighting functions can be applied to the phrases, and the function used usually depends on the corpus itself. This technique is useful where text analysis is either impractical or not possible.
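The second, profile-directed method can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the chapter: the function names `phrase_weights` and `partition_by_profile`, the representation of the thesaurus as a mapping from topic to lowercase phrases, plain substring matching, an inverse-document-frequency weight standing in for "a weighting function that depends on the corpus", and the rule that each document goes to its single highest-scoring topic.

```python
import math
from collections import defaultdict


def phrase_weights(corpus, phrases):
    """Assumed corpus-dependent weight: log(N / document frequency).

    Phrases that occur in no document get weight 0.0.
    """
    n_docs = len(corpus)
    weights = {}
    for phrase in phrases:
        df = sum(1 for text in corpus.values() if phrase in text.lower())
        weights[phrase] = math.log(n_docs / df) if df else 0.0
    return weights


def partition_by_profile(corpus, profile, threshold=0.0):
    """Assign each document to the profile topic with the highest score.

    corpus:  dict mapping document id -> document text
    profile: dict mapping topic -> list of lowercase phrases (the thesaurus)
    Documents whose best score does not exceed `threshold` stay unassigned.
    Ties are broken in favour of the topic listed first in the profile.
    """
    all_phrases = {p for phrases in profile.values() for p in phrases}
    weights = phrase_weights(corpus, all_phrases)

    subsets = defaultdict(list)
    for doc_id, text in corpus.items():
        lowered = text.lower()
        best_topic, best_score = None, threshold
        for topic, phrases in profile.items():
            # Sum the weights of the profile phrases found in this document.
            score = sum(weights[p] for p in phrases if p in lowered)
            if score > best_score:
                best_topic, best_score = topic, score
        if best_topic is not None:
            subsets[best_topic].append(doc_id)
    return dict(subsets)


if __name__ == "__main__":
    # Hypothetical toy corpus and user-community profile.
    corpus = {
        "d1": "Heat transfer in turbine blades",
        "d2": "Document retrieval with an inverted file",
        "d3": "Turbine blade fatigue and heat transfer models",
        "d4": "Annual budget report",
    }
    profile = {
        "thermal": ["heat transfer", "turbine"],
        "retrieval": ["document retrieval", "inverted file"],
    }
    print(partition_by_profile(corpus, profile))
```

No linguistic analysis of the text is performed in this sketch; the subsets are determined entirely by the external profile and the chosen weights, which is the property the abstract highlights for cases where text analysis is impractical.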

Copyright information

© 1974 Plenum Press, New York

About this chapter

Cite this chapter

Kasarda, A.J., Hillman, D.J. (1974). Special Topic Data Base Development. In: Tou, J.T. (ed.) Information Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4684-2694-6_6

  • DOI: https://doi.org/10.1007/978-1-4684-2694-6_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4684-2696-0

  • Online ISBN: 978-1-4684-2694-6

  • eBook Packages: Springer Book Archive
