Document Classification and Interpretation through the Inference of Logic-Based Models

  • Giovanni Semeraro
  • Stefano Ferilli
  • Nicola Fanizzi
  • Floriana Esposito
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2163)


We present a methodology for document processing that exploits logic-based machine learning techniques. Our claim is that information capture and indexing can benefit from identifying the document class and the specific function of each of its layout components. Indeed, applying incremental and multistrategy machine learning techniques, rather than classical ones, allows for an efficient solution to the problem of information capture.


Predictive Accuracy, Digital Library, Document Image, Document Processing, Layout Structure





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Giovanni Semeraro (1)
  • Stefano Ferilli (1)
  • Nicola Fanizzi (1)
  • Floriana Esposito (1)
  1. Dipartimento di Informatica, Università di Bari, Bari, Italy
