Donate | Contact | AMDG

PARSER GENERATOR
Home Information Customers Support About
Home LRSTAR DFASTAR
 DFASTAR Lexer Generator 1.0


Description
DFASTAR is a DFA lexer generator, which reads a lexical grammar (with regular expressions) and creates a DFA state machine in C/C++ which provides the fastest type of lexical analyzer.  When compiled, the lexers can recognize the tokens defined by your lexical grammar at a very-high speed.  The input to DFASTAR is LBNF grammar notation, which has the power of an LALR(1) grammar plus the addition of regular expressions in the right side of rules. 

Very-Fast Lexers
With the 'ts' option, DFASTAR creates small table-driven lexers, similar to FLEX but 2 times the speed of FLEX lexers.  With the 'c' option, DFASTAR creates direct-code lexers, similar to RE2C but 1% faster.  For more information about the difference between table-driven and direct-code lexers, see below.  For this test, all 4 lexers were created from the same C-language lexical specification and the input file was a very large C program. 

Small Lexers
With the 'ts' option, DFASTAR created a small table-driven lexer for the C language that is almost as small as the lexer created by FLEX.  With the 'c' option, DFASTAR created a direct-code lexer that is about 25% smaller than the lexer created by RE2C.  For this test, I used a C-language lexical specification which included all 36 keywords of the C language, the same as the speed test.  See the chart on the right.

Generation & Build Time
For the lexer-build-time test, I used a different lexical specification.  I used the one for the DB2 language, which has 550 keywords.  This is one of the largest computer languages I have encountered and it is a good stress test for lexer generators.  With the 'ts' option, the build time was 5 seconds, a little faster than FLEX lexers.  With the 'c' option the build time was 29 seconds, which was about 25% faster than RE2C lexers. 

Table-Driven Lexers
DFASTAR with the 'ts' option, creates small table-driven lexers, similar to those created by FLEX.  However, DFASTAR lexers are about 2 times the speed of lexers created by FLEX.  DFASTAR lexers are faster because they use a compressed-matrix data structure but they have the same small size of FLEX lexers.  Table-driven lexers are preferrable to direct-code lexers because they are more scalable and compile faster.  DFASTAR can create a lexer for a 250,000 word dictionary that compiles and runs very fast, whereas direct-code lexers cannot be compiled if the number of words exceeds 4,000.  For the DB2 programming language (with 550 keywords), the build time (compile and link) is only 5 seconds, compared to 29 seconds for a direct-code lexer.

Direct-Code Lexers
DFASTAR with the 'c' option, creates direct-code lexers, similar to those created by RE2C.  However, the DFASTAR lexer, in our C-language test case, was 1% faster, 25% smaller and compiled 25% faster.  Testing of both types of DFASTAR lexers has shown that direct-code lexers are 19% faster than table-driven lexers, if you need a line counter, and 11% faster if you don't need a line counter.  In a compiler front end, where the lexical analysis time is typically 20% or less, the direct-code lexer would be only 4% faster.  The big disadvantage of direct-code lexers is a much longer build time (compile and link), 29 seconds for the DB2 programming language (with 550 keywords).

Keywords and Identifiers
DFASTAR lexers can recognize keywords and identifiers, simultaneously.  This is faster than classifying all words as identifiers and doing a symbol-table lookup to discover that an identifier is a actually a keyword.  Note, you do not have to be concerned about the order of the rules in your lexical grammar, because DFASTAR is smart enough to figure out the difference between a keyword and an identifier, if the keywords are listed in the grammar.

Advanced Regular Expressions
DFASTAR reads an advanced lexical notation which is a combination of BNF grammar rules and regular expressions.  This permits more readable lexical grammars.  DFASTAR originated in the world of compiler construction tools and does a thorough job of checking for errors.  One nice feature is that you will be warned about ambiguities that other tools may not detect.

Error Messages in Visual Studio
One of the nice features of DFASTAR is that error messages displayed in Visual Studio provide the file name and line number of the error.  A double-click takes you to the error.

Test Information
All tests were done on a Dell Dimension 3000 desktop computer with a 3 GHz Pentium CPU and 2 GB of RAM.  Visual Studio C/C++ 2008 was used for compiling and linking with optimizations for speed.  The speed was measured for the lexer processing time in memory only and does not include the time required to load the large input file.  The C-language lexical grammar contains the 36 keywords of the C language.  This means that the lexers were recognizing both identifiers and keywords (no symbol table was used). 

 
© Copyright 2010 Paul B Mann