antlr - ANother Tool for Language Recognition
antlr [options] grammar_files
Antlr converts an extended form of context-free grammar
into a set of C functions which directly implement an efficient form of
deterministic recursive-descent LL(k) parser. Context-free grammars may be
augmented with predicates to allow semantics to influence parsing; this
allows a form of context-sensitive parsing. Selective backtracking is also
available to handle non-LL(k) and even non-LALR(k) constructs. Antlr
also produces a definition of a lexer which can be automatically converted
into C code for a DFA-based lexer by dlg. Hence, antlr serves
a function much like that of yacc; however, it is notably more
flexible and is more integrated with a lexer generator (antlr
directly generates dlg code, whereas yacc and lex are
given independent descriptions). Unlike yacc, which accepts LALR(1)
grammars, antlr accepts LL(k) grammars in an extended BNF notation,
which eliminates the need for precedence rules.
Like yacc grammars, antlr grammars can use
automatically-maintained symbol attribute values referenced as dollar
variables. Further, because antlr generates top-down parsers,
arbitrary values may be inherited from parent rules (passed like function
parameters). Antlr also has a mechanism for creating and manipulating
abstract-syntax-trees.
There are various other niceties in antlr, including the
ability to spread one grammar over multiple files or even multiple grammars
in a single file, the ability to generate a version of the grammar with
actions stripped out (for documentation purposes), and lots more.
- -ck n
- Use up to n symbols of lookahead when using compressed (linear
approximation) lookahead. This type of lookahead is very cheap to compute
and is attempted before full LL(k) lookahead, which is of exponential
complexity in the worst case. In general, the compressed lookahead can be
much deeper (e.g., -ck 10) than the full lookahead
(which usually must be less than 4).
- -CC
- Generate C++ output from both ANTLR and DLG.
- -cr
- Generate a cross-reference for all rules. For each rule, print a list of
all other rules that reference it.
- -e1
- Ambiguities/errors shown in low detail (default).
- -e2
- Ambiguities/errors shown in more detail.
- -e3
- Ambiguities/errors shown in excruciating detail.
- -fe file
- Rename err.c to file.
- -fh file
- Rename stdpccts.h header (turns on -gh) to file.
- -fl file
- Rename lexical output, parser.dlg, to file.
- -fm file
- Rename file with lexical mode definitions, mode.h, to file.
- -fr file
- Rename file which remaps globally visible symbols, remap.h, to
file.
- -ft file
- Rename tokens.h to file.
- -ga
- Generate ANSI-compatible code (default case). This has not been rigorously
tested to be ANSI X3J11 C compliant, but it is close. The normal output of
antlr currently compiles under K&R C, ANSI C, and
C++. This option does nothing, because antlr generates a bunch
of #ifdef's to do the right thing depending on the language.
- -gc
- Indicates that antlr should generate no C code, i.e., only perform
analysis on the grammar.
- -gd
- C code is inserted in each of the antlr generated parsing functions
to provide for user-defined handling of a detailed parse trace. The
inserted code consists of calls to the user-supplied macros or functions
called zzTRACEIN and zzTRACEOUT. The only argument is a
char * pointing to a C-style string which is the grammar rule
recognized by the current parsing function. If no definition is given for
the trace functions, upon rule entry and exit, a message will be printed
indicating that a particular rule has been entered or exited.
- -ge
- Generate an error class for each non-terminal.
- -gh
- Generate stdpccts.h for non-ANTLR-generated files to include. This
file contains all defines needed to describe the type of parser generated
by antlr (e.g. how much lookahead is used and whether or not trees
are constructed) and contains the header action specified by the
user.
- -gk
- Generate parsers that delay lookahead fetches until needed. Without this
option, antlr generates parsers which always have k tokens
of lookahead available.
- -gl
- Generate line info about grammar actions in the C parser, of the form
# line "file"
which gives the line number and file name of each action in the grammar. Error
messages from the C/C++ compiler then make more sense, as they
point into the grammar file rather than the resulting C file. Debugging is
easier as well, because you step through the grammar rather than the C file.
- -gs
- Do not generate sets for token expression lists; instead generate a
||-separated sequence of LA(1)==token_number. The
default is to generate sets.
- -gt
- Generate code for Abstract-Syntax Trees.
- -gx
- Do not create the lexical analyzer files (dlg-related). This option should
be given when the user wishes to provide a customized lexical analyzer. It
may also be used in make scripts to cause only the parser to be
rebuilt when a change not affecting the lexical structure is made to the
input grammars.
- -k n
- Set k of LL(k) to n; i.e., set the number of tokens of lookahead
(default 1).
- -o dir
- Directory where output files should go (default="."). This is
very nice for keeping the source directory clear of ANTLR and DLG
spawn.
- -p
- The complete grammar, collected from all input grammar files and stripped
of all comments and embedded actions, is listed to stdout. This is
intended to aid in viewing the entire grammar as a whole and to eliminate
the need to keep actions concisely stated so that the grammar is easier to
read. Hence, it is preferable to embed even complex actions directly in
the grammar, rather than to call them as subroutines, since the subroutine
call overhead will be saved.
- -pa
- This option is the same as -p except that the output is annotated
with the first sets determined from grammar analysis.
- -prc on
- Turn on the computation and hoisting of predicate context.
- -prc off
- Turn off the computation and hoisting of predicate context. This option
makes 1.10 behave like the 1.06 release with option -pr on. Context
computation is off by default.
- -rl n
- Limit the maximum number of tree nodes used by grammar analysis to
n. Occasionally, antlr is unable to analyze a grammar
submitted by the user. This rare situation can only occur when the grammar
is large and the amount of lookahead is greater than one. A nonlinear
analysis algorithm is used by PCCTS to handle the general case of LL(k)
parsing. The average complexity of analysis, however, is near linear due
to some fancy footwork in the implementation which reduces the number of
calls to the full LL(k) algorithm. An error message will be displayed, if
this limit is reached, which indicates the grammar construct being
analyzed when antlr hit a non-linearity. Use this option if
antlr seems to go out to lunch and your disk starts thrashing; try
n=10000 to start. Once the offending construct has been identified,
try to remove the ambiguity that antlr was trying to overcome with
large lookahead analysis. The introduction of (...)? backtracking blocks
eliminates some of these problems, because antlr does not
analyze alternatives that begin with (...)? (it simply backtracks, if
necessary, at run time).
- -w1
- Set low warning level. Do not warn if semantic predicates and/or (...)?
blocks are assumed to cover ambiguous alternatives.
- -w2
- Ambiguous parsing decisions yield warnings even if semantic predicates or
(...)? blocks are used. Warn if predicate context is computed and semantic
predicates incompletely disambiguate alternative productions.
- -
- Read grammar from standard input and generate stdin.c as the parser
file.
Antlr works... we think. There is no implicit guarantee of
anything. We reserve no legal rights to the software known as the
Purdue Compiler Construction Tool Set (PCCTS) — PCCTS is in the
public domain. An individual or company may do whatever they wish with
source code distributed with PCCTS or the code generated by PCCTS, including
the incorporation of PCCTS, or its output, into commercial software. We
encourage users to develop software with PCCTS. However, we do ask that
credit is given to us for developing PCCTS. By "credit", we mean
that if you incorporate our source code into one of your programs
(commercial product, research project, or otherwise) that you acknowledge
this fact somewhere in the documentation, research report, etc... If you
like PCCTS and have developed a nice tool with the output, please mention
that you developed it using PCCTS. As long as these guidelines are followed,
we expect to continue enhancing this system and expect to make other tools
available as they are completed.
- *.c
- output C parser.
- *.cpp
- output C++ parser when C++ mode is used.
- parser.dlg
- output dlg lexical analyzer.
- err.c
- token string array, error sets and error support routines. Not used in C++
mode.
- remap.h
- file that redefines all globally visible parser symbols. The use of the
#parser directive creates this file. Not used in C++ mode.
- stdpccts.h
- list of definitions needed by C files, not generated by PCCTS, that
reference PCCTS objects. This is not generated by default. Not used in C++
mode.
- tokens.h
- output #defines for tokens used and function prototypes for
functions generated for rules.