Ground Truth for Layout Analysis Performance Evaluation

  • A. Antonacopoulos
  • D. Karatzas
  • D. Bridson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3872)


Over the past two decades a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for and/or evaluated using (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, no suitable dataset with ground truth exists that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise, etc.). The most significant impediment is the creation of ground truth that is accurate and flexible in its representation, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of this ground truth has been demonstrated through its use in two international page segmentation competitions (ICDAR2003 and ICDAR2005).


Keywords: Ground Truth; Document Image; Text Region; Document Type Definition; Connected Component Analysis
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • A. Antonacopoulos (1)
  • D. Karatzas (2)
  • D. Bridson (1)
  1. Pattern Recognition and Image Analysis (PRImA) Research Lab, School of Computing, Science and Engineering, University of Salford, Manchester, United Kingdom
  2. School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
