Using Bags of Symbols for Automatic Indexing of Graphical Document Image Databases

  • Eugen Barbu
  • Pierre Héroux
  • Sébastien Adam
  • Éric Trupin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3926)


A database is only useful if it comes with a set of procedures for retrieving the elements relevant to users’ needs. Many information retrieval (IR) techniques have been developed for automatic indexing and retrieval in document databases. Most of them build indexes from the textual content of documents, and very few can handle graphical or image content without human annotation.

This paper describes an approach, similar to the bag-of-words technique, for automatically indexing graphical document image databases, along with different ways of querying the indexed databases. Working in an unsupervised manner, the approach proposes a set of automatically discovered symbols that can be combined with logical operators to build queries.
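The abstract does not say how the index or the queries are implemented, so the following is only a minimal sketch of the general idea, under stated assumptions: each document is already reduced to a bag of symbol identifiers with occurrence counts, and queries combine symbols with AND, OR and NOT over an inverted index. The symbol ids (`s1`, `s2`, `s3`), the document names and all function names are hypothetical, and the unsupervised symbol discovery step is not shown.

```python
from collections import defaultdict


def build_index(bags):
    """Map each symbol id to the set of documents whose bag contains it."""
    index = defaultdict(set)
    for doc_id, bag in bags.items():
        for symbol in bag:
            index[symbol].add(doc_id)
    return index


def query_and(index, *symbols):
    """Documents containing every listed symbol."""
    sets = [index.get(s, set()) for s in symbols]
    return set.intersection(*sets) if sets else set()


def query_or(index, *symbols):
    """Documents containing at least one listed symbol."""
    return set().union(*(index.get(s, set()) for s in symbols))


def query_not(index, all_docs, symbol):
    """Documents that do not contain the given symbol."""
    return set(all_docs) - index.get(symbol, set())


# Hypothetical bags of symbols: document id -> {symbol id: occurrence count}.
bags = {
    "plan_a": {"s1": 3, "s2": 1},
    "plan_b": {"s2": 4, "s3": 2},
    "plan_c": {"s1": 1, "s3": 5},
}

index = build_index(bags)
both = query_and(index, "s1", "s3")      # documents with s1 AND s3
either = query_or(index, "s1", "s2")     # documents with s1 OR s2
without = query_not(index, bags, "s2")   # documents with NOT s2
```

Because every query returns a plain set of document ids, results can be nested with ordinary set operations to express more complex logical combinations. The occurrence counts are kept in the bags but ignored here; a ranked retrieval scheme could use them as weights.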





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Eugen Barbu (1)
  • Pierre Héroux (1)
  • Sébastien Adam (1)
  • Éric Trupin (1)
  1. LITIS, Université de Rouen, Saint-Etienne du Rouvray, France
