Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts

Information Retrieval and Hypertext

Part of the book series: Information Retrieval and Hypertext (EPUB)

Abstract

Many kinds of texts are currently available in machine-readable form and are amenable to automatic processing. Because the available databases are large and cover many different subject areas, automatic aids must be provided to users interested in accessing the data. It has been suggested that links be placed between related pieces of text, connecting, for example, particular text paragraphs to other paragraphs covering related subject matter. Such a linked text structure, often called hypertext, makes it possible for the reader to start with particular text passages and use the linked structure to find related text elements [4, 19, 5, 12]. Unfortunately, until now, viable methods for automatically building large hypertext structures and for using such structures in a sophisticated way have not been available. Here we give methods for constructing text relation maps and for using text relations to access and use text databases. In particular, we outline procedures for determining text themes, traversing texts selectively, and extracting summary statements that reflect text content.
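The idea of linking paragraphs that cover related subject matter can be illustrated with a minimal sketch. It assumes, purely for illustration, that relatedness is measured as the cosine similarity of raw term-frequency vectors, in the spirit of the vector space model of reference 28; the function names, the sample paragraphs, and the 0.3 threshold are hypothetical and are not taken from the chapter.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relation_map(paragraphs, threshold=0.3):
    """Return (i, j, similarity) links for paragraph pairs at or above threshold.

    The threshold is an assumed illustrative value, not one from the source.
    """
    vectors = [term_vector(p) for p in paragraphs]
    links = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                links.append((i, j, round(sim, 3)))
    return links


# Hypothetical sample paragraphs: the first two share vocabulary, the third does not.
paragraphs = [
    "hypertext links connect related text paragraphs",
    "links between related paragraphs form a hypertext structure",
    "clustering algorithms group documents by topic",
]
links = relation_map(paragraphs)
print(links)
```

In this toy input only the first two paragraphs are linked, because they share four terms while the third shares none. A practical system would likely weight terms and remove common words before comparing passages.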

Vast amounts of text material are now available in machine-readable form for automatic processing. Here, approaches are outlined for manipulating and accessing texts in arbitrary subject areas in accordance with user needs. In particular, methods are given for determining text themes, traversing texts selectively, and extracting summary statements reflecting text content.

References

  1. S. Al-Hawamdeh and P. Willett, (1989). Paragraph-Based Near-Neighbor Searching in Full Text Documents. Electronic Publishing 2, 179-.

  2. M.H. Anderson, J. Nielsen and H. Rasmussen, (1989). A Similarity-Based Hypertext Browser for Reading the UNIX Network News, Hypermedia, 1, 255–265.

  3. M. Bernstein, (1990). An Apprentice that Discovers Hypertext Links. In A. Rizk, N. Streitz, and J. Andre, (Eds.) Proc. European Conference on Hypertext (ECHT), 212–223.

  4. M. Bernstein, J.D. Bolter, M. Joyce, and E. Mylonas, (1991). Architectures for Volatile Hypertext. In Proc. Hypertext-91, Association for Computing Machinery, New York, 246–260.

  5. J.D. Bolter, (1991). Writing Space — The Computer, Hypertext, and the History of Writing. L. Erlbaum Associates, Hillsdale, N.J.

  6. R.A. Botafogo, E. Rivlin and B. Shneiderman, (1992). Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics, ACM Transactions on Information Systems, 10, 142–180.

  7. C. Buckley, G. Salton, and J. Allan, (1993). Automatic Retrieval with Locality Information Using SMART, In D.K. Harman (Ed.), The First Text REtrieval Conference (TREC-1). National Institute of Standards and Technology Special Publication 500–207, Gaithersburg, Md. 20899, 59–72.

  8. C. Buckley, J. Allan, and G. Salton, (1994). Automatic Routing and Ad-hoc Retrieval Using SMART: TREC-2, In D.K. Harman (Ed.), The Second Text REtrieval Conference (TREC-2), National Institute of Standards and Technology Special Publication 500–215, Gaithersburg, Md. 20899, 45–55.

  9. M.H. Chignell, B. Nordhausen, J.F. Valdez, and J.A. Waterworth, (1991). The HEFTI Model of Text to Hypertext Conversion. Hypermedia 3, 187-.

  10. W.B. Croft, (1977). Clustering Large Files of Documents Using the Single Link Method. Journal of the American Society for Information Science 28, 341-.

  11. G. de Jong, (1982) in Strategies for Natural Language Processing, W.G. Lehnert and M.H. Ringle, (Eds.), L. Erlbaum Associates, Hillsdale, N.J., 149–176.

  12. P. Delaney and G.P. Landow, Eds., (1991). Hypermedia and Literary Studies. MIT Press, Cambridge, MA (USA).

  13. H.P. Edmundson and R.E. Wyllys, (1961). Automatic Abstracting and Indexing, Survey and Recommendations. Communications of the ACM 4, 226-.

  14. R. Furuta, C. Plaisant and B. Shneiderman, (1989). A Spectrum of Automatic Hypertext Constructions. Hypermedia 1, 179-.

  15. R.S. Gilyarevskii and M.M. Subbotin, (1993). Journal of the American Society for Information Science 44, 185-.

  16. P. Gloor, (1991). Cybermap: Yet Another Way of Navigating Hyperspace. In Proc. Hypertext-91, 107–121.

  17. C. Guinan and A.F. Smeaton, (1992). Information Retrieval from Hypertext Using Dynamically Planned Guided Tours. In Proc. ECHT-92, European Conference on Hypertext, 122–130.

  18. M.A. Hearst and C. Plaunt, (1993). Subtopic Structuring for Full-Length Document Access. In R. Korfhage, E. Rasmussen and P. Willett (Eds.), Proc. 16th ACM-SIGIR Conference, Pittsburgh (USA), 55–68.

  19. G.P. Landow, (1989). Hypertext in Literary Education. Computers and the Humanities, 23, 173-.

  20. H.P. Luhn, (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development 2, 159-.

  21. F. Murtagh, (1982). A Survey of Recent Advances in Hierarchical Clustering Algorithms. The Computer Journal 26, 354-.

  22. J. O’Connor, (1975). Retrieval of Answer Sentences and Answer Figures by Text Searching. Information Processing and Management, 11, 155-.

  23. J. O’Connor, (1980). Answer Passage Retrieval by Text Searching. Journal of the American Society for Information Science 32, 227-.

  24. C.D. Paice, (1990). Constructing Literature Abstracts by Computer. Information Processing and Management, 26, 171-.

  25. T.C. Rearick, (1991). In Hypertext/Hypermedia Handbook, J. Devlin and E. Berk, (Eds.) McGraw Hill, New York, 113–140.

  26. J.E. Rush, R. Salvador, and A. Zamora, (1964). Automatic Abstracting and Indexing — Production of Indicative Abstracts by Application of Contextual Inference and Syntactic Coherence Criteria. Journal of the American Society for Information Science 22, 260-.

  27. G. Salton, Ed., (1971). The Smart Retrieval System — Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, N.J.

  28. G. Salton, C.S. Yang, and A. Wong, (1975). A Vector Space Model for Automatic Indexing. Communications of the ACM 18, 613-.

  29. G. Salton and A. Wong, (1978). Generation and Search of Clustered Files. ACM Transactions on Database Systems 3, 321-.

  30. G. Salton, (1981). Automatic Text Processing — The Transformation, Analysis, and Retrieval of Information by Computer. Addison Wesley, Reading, MA.

  31. G. Salton, (1991). Developments in Automatic Text Retrieval. Science 253, 974-.

  32. G. Salton and C. Buckley, (1991). Global Text Matching for Information Retrieval. Science 253, 1012-.

  33. G. Salton and C. Buckley, (1991). Automatic Text Structuring and Retrieval: Experiments in Automatic Encyclopedia Searching. In A. Bookstein, Y. Chiaramella, G. Salton and V.V. Raghavan (Eds.), Proc. 14th ACM-SIGIR Conference, Chicago (USA), 21–30.

  34. G. Salton, C. Buckley and J. Allan, (1992). Automatic Structuring of Text Files. Electronic Publishing 5, 1-.

  35. G. Salton, J. Allan, and C. Buckley, (1993). Approaches to Passage Retrieval in Full Text Information Systems. In R. Korfhage, E. Rasmussen and P. Willett (Eds.), Proc. 16th ACM-SIGIR Conference, Pittsburgh (USA), 49–58.

© 1996 Kluwer Academic Publishers

Salton, G., Allan, J., Buckley, C., Singhal, A. (1996). Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts. In: Agosti, M., Smeaton, A.F. (eds) Information Retrieval and Hypertext. Information Retrieval and Hypertext. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1373-1_3

  • DOI: https://doi.org/10.1007/978-1-4613-1373-1_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8593-9

  • Online ISBN: 978-1-4613-1373-1

  • eBook Packages: Springer Book Archive
