Abstract
Document management is critical for the distribution and preservation of knowledge. The aim is to discover, in a database of documents in paper, electronic, and Web-page formats, significant knowledge that can serve as meta-information for their content-based retrieval and management. This paper proposes processing solutions suitable for all three cases, each exploiting symbolic (first-order) learning techniques to automatically classify documents and their layout components according to their semantics. This makes it possible to tag the documents properly, with a view to Semantic Web development.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esposito, F., Ferilli, S., Basile, T.M.A., Di Mauro, N. (2004). Discovering Logical Structures in Digital Documents. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39985-8_60
DOI: https://doi.org/10.1007/978-3-540-39985-8_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21331-4
Online ISBN: 978-3-540-39985-8
eBook Packages: Springer Book Archive