Scan-to-XML: Using Software Component Algebra for Intelligent Document Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2390)

Abstract

The main objective of this paper is to experiment with a new approach to developing a high-level document analysis platform by composing existing components from a comprehensive library of state-of-the-art algorithms. Starting from the observation that document analysis is conducted as a layered pipeline, with each layer taking syntax as input and producing semantics as output, we introduce the concept of a Component Algebra as an approach to integrating different existing document analysis algorithms in a coherent and self-contained manner. The approach relies on XML for data representation and exchange on the one hand, and on a combination of scripting and compiled libraries on the other; our claim is that it can eventually lead to a universal representation for real-world document analysis algorithms.
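The layered-pipeline idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each component is a function from one XML document to another and that composition is plain left-to-right chaining. The names `Component`, `compose`, `split_tokens`, and `tag_indices` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# Assumption: a component maps one XML document to another.
Component = Callable[[ET.Element], ET.Element]


def compose(*components: Component) -> Component:
    """Chain components left to right into a single component."""
    def pipeline(doc: ET.Element) -> ET.Element:
        return reduce(lambda d, c: c(d), components, doc)
    return pipeline


def split_tokens(doc: ET.Element) -> ET.Element:
    """Toy lower layer: split raw text into <token> elements."""
    out = ET.Element("tokens")
    for word in (doc.text or "").split():
        ET.SubElement(out, "token").text = word
    return out


def tag_indices(doc: ET.Element) -> ET.Element:
    """Toy upper layer: label numeric tokens as index references."""
    out = ET.Element("tagged")
    for tok in doc.iter("token"):
        kind = "index" if tok.text.isdigit() else "word"
        ET.SubElement(out, "item", kind=kind).text = tok.text
    return out


# The composed pipeline is itself a component and can be composed further.
analyze = compose(split_tokens, tag_indices)

raw = ET.Element("raw")
raw.text = "12 piston 7 valve"
result = analyze(raw)
print(ET.tostring(result, encoding="unicode"))
```

Because every stage consumes and produces XML, the output of one layer can be fed to the next without any component knowing which component produced its input.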

The test case for this methodology is a fully automated method for generating a browsable, hyperlinked document from a simple scanned image. Our example is based on cutaway diagrams, which have the advantage of containing simple “browsing semantics”: they consist of a clearly identifiable legend containing index references, plus a drawing containing one or more occurrences of the same indices.
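The "browsing semantics" of a cutaway diagram can be sketched as a join between legend entries and index occurrences in the drawing. The XML layout below (`diagram`, `legend`, `entry`, `drawing`, `label`, and the `index`, `x`, `y` attributes) is an assumed format invented for this example, not the representation used in the paper, and the sample data is made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical recognition result: legend entries plus index labels
# located in the drawing.
SAMPLE = """
<diagram>
  <legend>
    <entry index="1">Piston</entry>
    <entry index="2">Valve</entry>
  </legend>
  <drawing>
    <label index="1" x="40" y="120"/>
    <label index="2" x="200" y="80"/>
    <label index="1" x="310" y="150"/>
  </drawing>
</diagram>
"""


def link_indices(diagram: ET.Element) -> ET.Element:
    """Attach every drawing occurrence of an index to its legend entry."""
    occurrences = defaultdict(list)
    for label in diagram.iterfind("./drawing/label"):
        occurrences[label.get("index")].append((label.get("x"), label.get("y")))

    links = ET.Element("links")
    for entry in diagram.iterfind("./legend/entry"):
        index = entry.get("index")
        link = ET.SubElement(links, "link", index=index, text=entry.text or "")
        for x, y in occurrences.get(index, []):
            ET.SubElement(link, "target", x=x, y=y)
    return links


links = link_indices(ET.fromstring(SAMPLE))
print(ET.tostring(links, encoding="unicode"))
```

Each `link` element carries the legend text and one `target` per occurrence, which is the information a viewer would need to make the legend and the drawing navigable in both directions.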

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lamiroy, B., Najman, L. (2002). Scan-to-XML: Using Software Component Algebra for Intelligent Document Generation. In: Blostein, D., Kwon, YB. (eds) Graphics Recognition Algorithms and Applications. GREC 2001. Lecture Notes in Computer Science, vol 2390. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45868-9_18

  • DOI: https://doi.org/10.1007/3-540-45868-9_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44066-6

  • Online ISBN: 978-3-540-45868-5

  • eBook Packages: Springer Book Archive
