MC-JBIG2: an improved algorithm for Chinese textual image compression

  • Kui Hu
  • Zhi Tang
  • Liangcai Gao
  • Yadong Mu
Full Paper


Standard JBIG2 algorithms for textual image compression are designed around the features of alphabetic scripts such as English and do not take into account the features of pictographic scripts such as Chinese. This work develops an improved algorithm, MC-JBIG2, that aims to improve the compression ratio for Chinese textual images. The proposed method first extracts multiple features from the characters in the images. A cascade of clusters then performs the pattern-matching task for the characters. Finally, a Monte Carlo strategy traverses the feasible space to optimize the parameters used in the cascade of clusters. Experimental results show that MC-JBIG2 outperforms existing representative JBIG2 algorithms and systems on Chinese textual images. MC-JBIG2 can also improve the compression ratio on Latin textual images; however, that improvement is not as stable as the improvement on Chinese ones.
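
The three steps named in the abstract (feature extraction, a cascade of matching stages, and Monte Carlo parameter search) can be illustrated with a minimal Python sketch. The abstract does not specify the features, the number of cascade stages, the clustering rule, or the objective, so everything below is an assumption made for illustration: glyphs are tuples of '0'/'1' strings, the features are size and black-pixel count, the cascade has three stages, the clustering is greedy, and `cost` is a toy proxy for compressed size. None of the names or constants come from the paper.

```python
import random

# Hypothetical representation: a glyph is a tuple of equal-length '0'/'1' strings.

def features(glyph):
    """Cheap per-glyph features (assumed): height, width, black-pixel count."""
    height, width = len(glyph), len(glyph[0])
    black = sum(row.count("1") for row in glyph)
    return height, width, black

def hamming(a, b):
    """Number of differing pixels between two glyphs of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def matches(a, b, max_black_diff, max_hamming):
    """Cascade of tests: cheap stages reject early, the costly stage runs last."""
    ha, wa, ba = features(a)
    hb, wb, bb = features(b)
    if (ha, wa) != (hb, wb):             # stage 1: bounding-box size
        return False
    if abs(ba - bb) > max_black_diff:    # stage 2: amount of ink
        return False
    return hamming(a, b) <= max_hamming  # stage 3: pixel-wise comparison

def cluster(glyphs, max_black_diff, max_hamming):
    """Greedy clustering; each cluster is a list of glyph indices and its
    first member serves as the representative (dictionary symbol)."""
    clusters = []
    for i, glyph in enumerate(glyphs):
        for members in clusters:
            if matches(glyphs[members[0]], glyph, max_black_diff, max_hamming):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def cost(glyphs, clusters, dict_cost=9.0, pixel_cost=2.0):
    """Toy proxy for compressed size (assumed): a fixed cost per dictionary
    symbol plus a cost per pixel that differs from the representative."""
    total = dict_cost * len(clusters)
    for members in clusters:
        rep = glyphs[members[0]]
        total += pixel_cost * sum(hamming(rep, glyphs[i]) for i in members[1:])
    return total

def monte_carlo_search(glyphs, trials=200, seed=0):
    """Sample cascade thresholds at random from an assumed feasible range
    and keep the pair with the lowest proxy cost."""
    rng = random.Random(seed)
    best_cost, best_params = None, None
    for _ in range(trials):
        params = (rng.randint(0, 8), rng.randint(0, 8))
        c = cost(glyphs, cluster(glyphs, *params))
        if best_cost is None or c < best_cost:
            best_cost, best_params = c, params
    return best_cost, best_params

if __name__ == "__main__":
    glyphs = [
        ("111", "101", "111"),  # a box
        ("111", "101", "110"),  # the same box with one pixel flipped
        ("010", "010", "010"),  # a vertical bar
    ]
    print(monte_carlo_search(glyphs))
```

In this toy setting, thresholds that are too strict keep near-duplicate glyphs apart and inflate the dictionary, while thresholds that are too loose merge unrelated glyphs and inflate the residual term. The random search simply looks for a setting between those extremes under the assumed cost.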


Textual image compression, JBIG2, Pattern matching, Clustering, Monte Carlo method





Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Institute of Computer Science and Technology, Peking University, Beijing, China
  2. Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
