Abstract
We present a novel model for automated composite text digests, the Pyramidal Digest. The model integrates traditional text summarization and text classification: the digest not only serves as a “summary” but can also classify text segments of any given size and answer queries relative to a context.
“Pyramidal” refers to the fact that the digest is built along at least three dimensions: scope, granularity, and scale. The Pyramidal Digest is defined recursively as a structure of extracted and abstracted features that are obtained gradually, from specific to general and from large to small text segment size, through a combination of shallow parsing and machine learning algorithms. Three threads of learning take place: learning of characteristic relations, rhetorical relations, and lexical relations.
Our model provides a principle for efficiently digesting large quantities of text: progressive learning digests text by abstracting its significant features. The approach scales, with complexity bounded by O(n log n), where n is the size of the text. It offers a standard, systematic way of collecting as many semantic features as are reachable by shallow parsing, and it enables readers to query beyond keyword matches.
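To make the recursive, level-by-level structure and the O(n log n) bound concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy setting in which the only "features" are term counts and in which adjacent segments are merged pairwise into coarser segments. The names build_pyramid and top_terms are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import List


def top_terms(counts: Counter, k: int) -> List[str]:
    """Return the k most frequent terms; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def build_pyramid(segments: List[str]) -> List[List[Counter]]:
    """Build a toy pyramid of term counts (an illustrative assumption).

    Level 0 holds one Counter per input segment. Each higher level merges
    adjacent pairs from the level below, so segments grow coarser until a
    single Counter summarizes the whole text.
    """
    level = [Counter(segment.lower().split()) for segment in segments]
    pyramid = [level]
    while len(level) > 1:
        # Merge neighbours two at a time; an odd trailing segment is carried up.
        level = [sum(level[i:i + 2], Counter()) for i in range(0, len(level), 2)]
        pyramid.append(level)
    return pyramid
```

In this sketch each level touches every token at most once, and the number of levels grows logarithmically with the number of segments, so the total work is at most proportional to n log n for a text of n tokens. Calling top_terms on any node then gives a crude digest of that segment at that granularity.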
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chuang, W.T., Stott Parker, D. (2001). Pyramidal Digest: An Efficient Model for Abstracting Text Databases. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_36
DOI: https://doi.org/10.1007/3-540-44759-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42527-4
Online ISBN: 978-3-540-44759-7
eBook Packages: Springer Book Archive