Pyramidal Digest: An Efficient Model for Abstracting Text Databases

Chuang, Wesley T.; StottParker, D.

doi:10.1007/3-540-44759-8_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

We present a novel model of automated composite text digest, the Pyramidal Digest. The model integrates traditional text summarization and text classification in that the digest not only serves as a “summary” but is also able to classify text segments of any given size, and answer queries relative to a context.

“Pyramidal” refers to the fact that the digest is created in at least three dimensions: scope, granularity, and scale. The Pyramidal Digest is defined recursively as a structure of extracted and abstracted features that are obtained gradually – from specific to general, and from large to small text segment size – through a combination of shallow parsing and machine learning algorithms. Three threads of learning take place: the learning of characteristic relations, rhetorical relations, and lexical relations.

Our model provides a principle for efficiently digesting large quantities of text: progressive learning can digest text by abstracting its significant features. This approach scales, with complexity bounded by O(n log n), where n is the size of the text. It offers a standard and systematic way of collecting as many semantic features as possible that are reachable by shallow parsing. It enables readers to query beyond keyword matches.
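
The abstract gives no algorithmic detail, so the following is only a toy sketch and not the authors' method. It assumes a digest built by recursively halving a token sequence while doing work linear in the segment length at each node, which is one common way an O(n log n) bound can arise (about log n levels, each touching all n tokens once). The names `DigestNode`, `top_terms`, and `build_digest` are hypothetical, and simple frequent-term counting stands in for the paper's learned features.

```python
import heapq
from collections import Counter


class DigestNode:
    """One cell of a toy pyramid: a token segment plus placeholder features."""

    def __init__(self, start, end, features):
        self.start = start        # index of the first token in the segment
        self.end = end            # index one past the last token
        self.features = features  # assumed stand-in for learned features
        self.children = []        # digests of the two half-size segments


def top_terms(tokens, k=3):
    # Placeholder feature extractor (an assumption, not from the paper):
    # the k most frequent tokens, ties broken alphabetically.
    # Counting is linear in len(tokens); selection is linear for fixed k.
    counts = Counter(tokens)
    best = heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in best]


def build_digest(tokens, start=0, end=None, min_size=4):
    """Digest tokens[start:end], then recurse on each half of the segment."""
    if end is None:
        end = len(tokens)
    node = DigestNode(start, end, top_terms(tokens[start:end]))
    if end - start > min_size:
        mid = (start + end) // 2
        node.children = [
            build_digest(tokens, start, mid, min_size),
            build_digest(tokens, mid, end, min_size),
        ]
    return node


def depth(node):
    # Number of levels in the pyramid rooted at node.
    return 1 + max((depth(child) for child in node.children), default=0)
```

For example, `build_digest("a b a c a b d e".split(), min_size=2)` produces a three-level tree whose root features are `['a', 'b', 'c']`. Each level covers every token exactly once, so the total work follows the recurrence T(n) = 2T(n/2) + O(n).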

Author information

Authors and Affiliations

Computer Science Department, UCLA, Los Angeles, CA, 90095, USA
Wesley T. Chuang & D. StottParker

Editor information

Editors and Affiliations

University of Klagenfurt, IFI-IWAS, Universitaetsstr. 65, 9020, Klagenfurt, Austria
Heinrich C. Mayr
Faculty of Electrical Engineering, Czech Technical University, Technicka 2, 166 27, Prague 6, Czech Republic
Jiri Lazansky
School of Computer and Information Science, University of South Australia, Mawson Lakes Campus, Mawson Lakes, SA, 5095
Gerald Quirchmayr
Department of Information Systems, Technical University of Munich, Orleanstr. 34, 81667, Munich, Germany
Pavel Vogel

Cite this paper

Chuang, W.T., StottParker, D. (2001). Pyramidal Digest: An Efficient Model for Abstracting Text Databases. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_36

DOI: https://doi.org/10.1007/3-540-44759-8_36
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42527-4
Online ISBN: 978-3-540-44759-7
eBook Packages: Springer Book Archive