Pyramidal Digest: An Efficient Model for Abstracting Text Databases

  • Conference paper
Database and Expert Systems Applications (DEXA 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)

Abstract

We present a novel model of automated composite text digest, the Pyramidal Digest. The model integrates traditional text summarization and text classification in that the digest not only serves as a “summary” but is also able to classify text segments of any given size, and answer queries relative to a context.

“Pyramidal” refers to the fact that the digest is created in at least three dimensions: scope, granularity, and scale. The Pyramidal Digest is defined recursively as a structure of extracted and abstracted features that are obtained gradually (from specific to general, and from large to small text segment size) through a combination of shallow parsing and machine learning algorithms. Three threads of learning take place: learning of characteristic relations, rhetorical relations, and lexical relations.

Our model provides a principle for efficiently digesting large quantities of text: progressive learning can digest text by abstracting its significant features. This approach scales, with complexity bounded by O(n log n), where n is the size of the text. It offers a standard and systematic way of collecting as many semantic features as possible that are reachable by shallow parsing. It enables readers to query beyond keyword matches.
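
As an illustration only, the sketch below shows one way a recursively defined digest over progressively smaller text segments can stay within an O(n log n) bound. It is not the authors' algorithm: the halving strategy, the `DigestNode` class, the `build_digest` function, and the use of term frequency as a stand-in for "abstracted features" are all assumptions made for this example. Each level of the recursion touches every token once, and halving yields about log n levels.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigestNode:
    """One node of a hypothetical pyramid: a token span plus its top terms."""
    start: int                      # index of the first token in the segment
    end: int                        # index one past the last token
    top_terms: List[str]            # stand-in for "abstracted features"
    children: List["DigestNode"] = field(default_factory=list)


def build_digest(tokens, start=0, end=None, min_size=4, k=3):
    """Digest tokens[start:end], then recurse on the two halves.

    Assumption: segments are split in half until they hold at most
    `min_size` tokens; the abstract does not say how segments are chosen.
    """
    if end is None:
        end = len(tokens)
    # Feature extraction for this segment: the k most frequent tokens.
    counts = Counter(tokens[start:end])
    top = [term for term, _ in counts.most_common(k)]
    node = DigestNode(start, end, top)
    if end - start > min_size:
        mid = (start + end) // 2
        node.children = [
            build_digest(tokens, start, mid, min_size, k),
            build_digest(tokens, mid, end, min_size, k),
        ]
    return node


def depth(node):
    """Number of levels in the pyramid rooted at `node`."""
    return 1 + max((depth(child) for child in node.children), default=0)


if __name__ == "__main__":
    words = "a b a c a d b e".split()
    root = build_digest(words)
    print(root.top_terms, depth(root))
```

A query "relative to a context" could then be answered by descending from the root into the child whose span covers the position of interest, reading progressively more specific features at each level.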

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chuang, W.T., Stott Parker, D. (2001). Pyramidal Digest: An Efficient Model for Abstracting Text Databases. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_36

  • DOI: https://doi.org/10.1007/3-540-44759-8_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42527-4

  • Online ISBN: 978-3-540-44759-7

  • eBook Packages: Springer Book Archive