Document Classification and Interpretation through the Inference of Logic-Based Models

Semeraro, Giovanni; Ferilli, Stefano; Fanizzi, Nicola; Esposito, Floriana

doi:10.1007/3-540-44796-2_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2163)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

We present a methodology for document processing that exploits logic-based machine learning techniques. Our claim is that information capture and indexing can benefit from identifying the document class and the specific function of each of its layout components. In particular, applying incremental and multistrategy machine learning techniques, rather than classical ones, allows the problem of information capture to be solved efficiently.
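The abstract does not spell out the representation used. As an illustration only, and assuming a Prolog-style encoding in which each document is a set of ground facts describing its layout and each learned model is a conjunctive clause, the sketch below shows how one clause mechanism could serve both tasks named above: deciding the document class and labelling the function of a layout component. The predicate names (`part_of`, `position_top`, `width_full`, `on_top`, `type_text`), the document identifiers and both rules are hypothetical. Coverage is tested by plain θ-subsumption via backtracking, a standard check in logic-based learning but not necessarily the exact one used in the authors' system; the incremental learning of the rules themselves is not shown.

```python
from typing import Dict, Iterator, List, Set, Tuple

# An atom is (predicate, arg1, arg2, ...); arguments are constants or variables.
Atom = Tuple[str, ...]
Substitution = Dict[str, str]


def is_var(term: str) -> bool:
    # Prolog convention (assumed here): variables start with an uppercase letter.
    return term[:1].isupper()


def match(body: List[Atom], facts: Set[Atom],
          theta: Substitution) -> Iterator[Substitution]:
    """Yield every substitution that maps all body atoms onto ground facts
    (plain theta-subsumption, found by depth-first backtracking)."""
    if not body:
        yield dict(theta)
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        # Predicate name and arity must agree.
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        extended = dict(theta)
        consistent = True
        for term, const in zip(first[1:], fact[1:]):
            if is_var(term):
                # Bind a fresh variable, or check it against an existing binding.
                if extended.setdefault(term, const) != const:
                    consistent = False
                    break
            elif term != const:
                consistent = False
                break
        if consistent:
            yield from match(rest, facts, extended)


def covers(body: List[Atom], facts: Set[Atom], binding: Substitution) -> bool:
    # A clause covers an example if at least one substitution exists.
    return next(match(body, facts, binding), None) is not None


# Hypothetical layout descriptions: d1 has a full-width top block above text.
DOC_A = {
    ("part_of", "d1", "b1"), ("part_of", "d1", "b2"), ("part_of", "d1", "b3"),
    ("position_top", "b1"), ("width_full", "b1"),
    ("on_top", "b1", "b2"), ("on_top", "b2", "b3"),
    ("type_text", "b2"), ("type_text", "b3"),
}
DOC_B = {
    ("part_of", "d2", "c1"), ("part_of", "d2", "c2"),
    ("position_top", "c1"), ("on_top", "c1", "c2"), ("type_text", "c2"),
}

# Hypothetical learned rules (bodies only):
#   some_class(D) :- part_of(D,B), position_top(B), width_full(B), on_top(B,C), type_text(C).
#   title(B)      :- part_of(D,B), position_top(B), width_full(B).
CLASS_RULE_BODY = [("part_of", "D", "B"), ("position_top", "B"), ("width_full", "B"),
                   ("on_top", "B", "C"), ("type_text", "C")]
TITLE_RULE_BODY = [("part_of", "D", "B"), ("position_top", "B"), ("width_full", "B")]

if __name__ == "__main__":
    print(covers(CLASS_RULE_BODY, DOC_A, {"D": "d1"}))  # True: d1 is classified
    print(covers(CLASS_RULE_BODY, DOC_B, {"D": "d2"}))  # False: c1 is not full width
    print(sorted(t["B"] for t in match(TITLE_RULE_BODY, DOC_A, {})))  # ['b1']
```

Because classification (does the clause cover the document?) and interpretation (which block binds to `B`?) rely on the same coverage test here, a single learned theory can support both. An incremental learner would revise such clause bodies as new examples arrive, which this sketch deliberately leaves out.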

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Bari, Via Orabona 4, 70125, Bari, Italy
Giovanni Semeraro, Stefano Ferilli, Nicola Fanizzi & Floriana Esposito

Editor information

Editors and Affiliations

Department of Computer Science, University of Crete, Leof. Knossou, P.O. Box 1470, 71409, Heraklion, Greece
Panos Constantopoulos
Foundation for Research and Technology - Hellas, Institute of Computer Science, Vassilika Vouton, P.O. Box 1385, 71110, Heraklion, Greece
Panos Constantopoulos
Department of Computer and Information Science, The Norwegian University of Science and Technology, 7491, Trondheim, Norway
Ingeborg T. Sølvberg

About this paper

Cite this paper

Semeraro, G., Ferilli, S., Fanizzi, N., Esposito, F. (2001). Document Classification and Interpretation through the Inference of Logic-Based Models. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_6

Published: 30 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42537-3
Online ISBN: 978-3-540-44796-2
eBook Packages: Springer Book Archive