Myths and facts about the efficient implementation of finite automata and lexical analysis

  • Klaus Brouwer
  • Wolfgang Gellerich
  • Erhard Ploedereder
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1383)


Finite automata and their application in lexical analysis play an important role in many parts of computer science, particularly in compiler construction. We measured 12 scanners using different implementation strategies and found that the execution time differed by a factor of 74. Our analysis of the algorithms, together with run-time statistics on cache misses and instruction frequency, reveals substantive differences in code locality and certain kinds of overhead typical of specific implementation strategies. Some of the traditional statements on writing “fast” scanners could not be confirmed. Finally, we suggest an improved scanner generator.


Keywords: Scanner, Lexical Analysis, Finite Automata, Run-time Efficiency



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Klaus Brouwer (1)
  • Wolfgang Gellerich (1)
  • Erhard Ploedereder (1)
  1. Department of Computer Science, University of Stuttgart, Stuttgart, Germany
