Using colour information to understand censorship cards of film archives

  • Oronzo Altamura
  • Margherita Berardi
  • Michelangelo Ceci
  • Donato Malerba
  • Antonio Varlaro
Original Paper

Abstract

Many European film archives are involved in the digitization of 20th-century historical paper documents. In the context of the IST project COLLATE, three of them were interested in the semi-automatic annotation of censorship cards and in their subsequent retrieval on the basis of both annotations and content. Processing censorship cards, the main subject of this paper, poses a number of challenges for many document image analysis (DIA) systems. Problems arise from the low layout quality and standard of such material, which introduces a considerable amount of noise into its description. The layout quality is often degraded by stamps, signatures, ink specks, manual annotations and similar marks that overlap the layout components involved in the understanding or annotation processes. To reduce the presence and the effect of this noise, we propose an improved version of the knowledge-based DIA system WISDOM++ that takes full advantage of colour information in all processing steps, namely image segmentation, layout analysis, document image classification and understanding. Experiments were conducted on a corpus of multi-format documents concerning rare historical film censorship, provided by the three film archives involved in the COLLATE project.

Keywords

  • Historical paper documents
  • Color image segmentation
  • Inductive learning from examples

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  • Oronzo Altamura (1)
  • Margherita Berardi (1)
  • Michelangelo Ceci (1)
  • Donato Malerba (1)
  • Antonio Varlaro (1)

  1. Dipartimento di Informatica, Università degli Studi, Bari, Italy
