Abstract
We consider assembling documents from a digital library of SGML documents. The assembly process consists of two parts: 1) finding interesting fragments, and 2) constructing a coherent document from them. We present a general document assembly framework. First, we describe a system for tailoring control engineering textbooks. Its assembly facilities are rather restricted, but the quality of the documents produced is high. Second, we address the problem of filtering and combining interesting information from a large heterogeneous document collection. The methods presented offer various ways to find the interesting document fragments. Moreover, the elements found in the fragments are mapped to generic elements with known semantics, such as sections, paragraph containers, paragraphs and strings. Hence, even arbitrary compositions can be formatted and printed.
This work was supported by the Finnish Technology Development Centre (TEKES).
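The two-part assembly process and the mapping to generic elements described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes XML parsed with Python's `xml.etree.ElementTree` as a stand-in for SGML, uses plain keyword matching in place of the paper's fragment-finding methods, and the element names in `GENERIC` as well as the functions `find_fragments`, `to_generic` and `assemble` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source-specific element names to generic
# elements with known semantics. The names are illustrative assumptions.
GENERIC = {
    "chapter": "section",
    "sect1": "section",
    "list": "parcontainer",
    "para": "paragraph",
    "p": "paragraph",
}


def find_fragments(doc, keyword):
    """Part 1: return the top-level children of doc whose text mentions keyword.

    Keyword matching is a simplification chosen for this sketch.
    """
    keyword = keyword.lower()
    return [child for child in doc
            if keyword in "".join(child.itertext()).lower()]


def to_generic(elem):
    """Copy elem, renaming every element to its generic counterpart.

    Elements without a known mapping are flattened into a generic
    'string' element that holds their text content.
    """
    tag = GENERIC.get(elem.tag)
    if tag is None:
        out = ET.Element("string")
        out.text = "".join(elem.itertext())
        return out
    out = ET.Element(tag)
    out.text = elem.text
    for child in elem:
        mapped = to_generic(child)
        mapped.tail = child.tail  # keep the text that follows the child
        out.append(mapped)
    return out


def assemble(docs, keyword):
    """Part 2: combine the matching fragments into one generic document."""
    result = ET.Element("document")
    for doc in docs:
        for fragment in find_fragments(doc, keyword):
            result.append(to_generic(fragment))
    return result


if __name__ == "__main__":
    textbook = ET.fromstring(
        "<book><chapter><para>PID control basics</para></chapter>"
        "<chapter><para>History</para></chapter></book>")
    report = ET.fromstring(
        "<report><sect1><p>Tuning a <em>PID</em> loop</p></sect1></report>")
    print(ET.tostring(assemble([textbook, report], "pid"), encoding="unicode"))
```

Because every element in the assembled result is one of a small set of generic types, a single formatter written against those types could handle fragments from differently structured sources, which is the point the abstract makes about arbitrary compositions.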
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ahonen, H., Heikkinen, B., Heinonen, O., Kilpeläinen, P. (1997). Assembling documents from digital libraries. In: Hameurlain, A., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 1997. Lecture Notes in Computer Science, vol 1308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022051
DOI: https://doi.org/10.1007/BFb0022051
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63478-2
Online ISBN: 978-3-540-69580-6
eBook Packages: Springer Book Archive