Using Page Breaks for Book Structuring

  • Hervé Déjean
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7424)


We report on the XRCE participation in the Structure Extraction task of the INEX/ICDAR Book Structure Extraction 2011. We wanted to assess a simple method for structuring a book: using leading and trailing page whitespace. The detection of such large whitespace, occurring at the top of leading pages and at the bottom of trailing pages, is based on the detection of the type area zone. Evaluation shows, as expected, very good precision. Since this approach aims at detecting high-level book structures (parts, chapters), structures not marked by a page break are not detected, which lowers recall.
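
To make the idea concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes each page is given as a list of text-line extents (y_top, y_bottom), normalised to the page height with y growing downward. It estimates the type area as the median first-line top and last-line bottom over all pages, then flags pages whose text starts well below, or ends well above, that area. The function names and the min_gap threshold are hypothetical choices made for illustration.

```python
from statistics import median


def type_area(pages):
    """Estimate the type area's top and bottom edges (assumption: medians
    of the first-line top and last-line bottom over non-empty pages)."""
    tops = [min(y0 for y0, _ in lines) for lines in pages if lines]
    bottoms = [max(y1 for _, y1 in lines) for lines in pages if lines]
    return median(tops), median(bottoms)


def page_breaks(pages, min_gap=0.15):
    """Return (leading, trailing) lists of page indices.

    A page is 'leading' if its text starts more than min_gap of the type
    area height below the type area top, and 'trailing' if its text ends
    more than min_gap above the type area bottom. min_gap is a
    hypothetical threshold, not a value taken from the paper.
    """
    top, bottom = type_area(pages)
    height = bottom - top
    leading, trailing = [], []
    for index, lines in enumerate(pages):
        if not lines:
            continue  # blank pages are ignored in this sketch
        first = min(y0 for y0, _ in lines)
        last = max(y1 for _, y1 in lines)
        if first - top > min_gap * height:
            leading.append(index)
        if bottom - last > min_gap * height:
            trailing.append(index)
    return leading, trailing


# Toy input: page 1 ends early (trailing), page 2 starts late (leading).
pages = [
    [(0.10, 0.12), (0.50, 0.52), (0.88, 0.90)],
    [(0.10, 0.12), (0.40, 0.42)],
    [(0.35, 0.37), (0.88, 0.90)],
    [(0.10, 0.12), (0.88, 0.90)],
    [(0.10, 0.12), (0.88, 0.90)],
]
leading, trailing = page_breaks(pages)
```

A trailing page followed by a leading page is the pattern one would expect at a chapter or part boundary. Headings that fall in the middle of a full page produce no such gap, which is consistent with the lower recall noted above.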


Keywords: Anchor Point, Type Area, Structure Extraction, Full Page, Page Frame
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hervé Déjean
    • Xerox Research Centre Europe, Meylan, France
