A Semantic-Based System for Querying Personal Digital Libraries

  • Luigi Cinque
  • Alessio Malizia
  • Roberto Navigli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)


The decreasing cost and increasing availability of new technologies are enabling people to create their own digital libraries. A central issue in personal digital libraries is allowing people to select information of interest from among the many digital formats available today (PDF, HTML, TIFF, etc.). Moreover, the growing availability of these on-line libraries, together with the advent of the so-called Semantic Web [1], is raising the demand for converting paper documents into digital, possibly semantically annotated, documents. These motivations drove us to design a new system that enables users to interact with and query documents independently of the digital format in which they are represented. To achieve this format independence, we treat every digital document in a digital library as an image. Our system attempts to automatically detect the layout of each document and recognize its geometric regions of interest. All the extracted information is then encoded with respect to a reference ontology, so that users can query their digital library either by typing free text or by browsing the ontology.
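The last two sentences outline a pipeline in which detected page regions are annotated with ontology concepts and then retrieved either by free text or by browsing the ontology. The Python sketch below illustrates only that final querying step, under loudly stated assumptions: regions are taken as already segmented and labeled, and the toy `ONTOLOGY` (a simple is-a hierarchy), the word-to-concept `LEXICON`, all concept names, and the `Region`, `ancestors`, `query_by_concept` and `query_by_text` helpers are hypothetical. It is not the authors' implementation, whose reference ontology and text-to-concept mapping are not described in this abstract.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy ontology: each concept maps to its parent (is-a link);
# the root has no parent. Concept names are illustrative, not from the paper.
ONTOLOGY = {
    "document_part": None,
    "title": "document_part",
    "author": "document_part",
    "body_text": "document_part",
    "figure": "document_part",
}

# Hypothetical lexicon mapping free-text words to ontology concepts.
LEXICON = {
    "title": "title",
    "heading": "title",
    "author": "author",
    "writer": "author",
    "text": "body_text",
    "figure": "figure",
    "picture": "figure",
    "image": "figure",
}


@dataclass
class Region:
    """A geometric region detected on a page image, annotated with a concept."""
    doc_id: str                       # source file, whatever its format
    bbox: tuple[int, int, int, int]   # (x, y, width, height) in pixels
    concept: str                      # ontology concept assigned to the region
    text: str = ""                    # recognized text, if any


def ancestors(concept: str) -> list[str]:
    """Return the concept followed by all of its ancestors, up to the root."""
    chain = []
    current = concept
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current]
    return chain


def query_by_concept(regions: list[Region], concept: str) -> list[Region]:
    """Ontology browsing: regions annotated with `concept` or any descendant."""
    return [r for r in regions if concept in ancestors(r.concept)]


def query_by_text(regions: list[Region], query: str) -> list[Region]:
    """Free-text search: map known words to concepts, then query each concept."""
    # dict.fromkeys removes duplicate concepts while keeping query order.
    concepts = dict.fromkeys(
        LEXICON[word] for word in query.lower().split() if word in LEXICON
    )
    hits: list[Region] = []
    for concept in concepts:
        for region in query_by_concept(regions, concept):
            if region not in hits:  # avoid listing the same region twice
                hits.append(region)
    return hits
```

For instance, given a title region, an author region and a figure region, the free-text query "show me the heading" would return only the title region (through the hypothetical lexicon entry heading → title), while browsing from the root concept `document_part` would return all three. Matching descendants of a concept is one simple way to let a broad concept chosen while browsing cover more specific annotations; a real system would need a much richer ontology and lexicon.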


References

  1. Berners-Lee, T.: Weaving the Web. Harper, San Francisco (1999)
  2. Smith, B., Welty, C.: Ontology: towards a new synthesis. In: Proc. of Formal Ontology in Information Systems (FOIS-2001), October 2001. ACM Press, New York (2001)
  3. Cinque, L., Levialdi, S., Malizia, A.: An Integrated System for the Automatic Segmentation and Classification of Documents. In: Proceedings of the International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA 2002), Crete, Greece, June 2002, pp. 491–496 (2002)
  4. Pavlidis, T.: Algorithms for Graphics and Image Processing. Computer Science Press, Rockville (1982)
  5. Miller, G.A.: WordNet: An On-line Lexical Database. International Journal of Lexicography 3(4) (1990)
  6. Pianta, E., Bentivogli, L., Girardi, C.: MultiWordNet: developing an aligned multilingual database. In: Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21–25 (2002)
  7. Nagy, G.: Twenty years of document image analysis in PAMI. IEEE Trans. Pattern Analysis and Machine Intelligence 22(1), 38–62 (2000)
  8. Navigli, R., Velardi, P.: Semantic Interpretation of Terminological Strings. In: Proc. 6th Int’l Conf. on Terminology and Knowledge Engineering (TKE 2002), INIST-CNRS, Vandoeuvre-lès-Nancy, France, pp. 95–100 (2002)
  9. Missikoff, M., Navigli, R., Velardi, P.: An Integrated Approach for Web Ontology Learning and Engineering. IEEE Computer, 60–63 (November 2002)
  10. International Conference on Document Analysis and Recognition (ICDAR 2003), Edinburgh, Scotland, UK, August 3–6 (2003)
  11. Spitz, L., Tombre, K.: Special issue: selected papers from the ICDAR 2001 conference. IJDAR 5(2-3), 87 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Luigi Cinque (1)
  • Alessio Malizia (1)
  • Roberto Navigli (1)

  1. Dept. of Computer Science, University “La Sapienza” of Rome, Rome, Italy
