bisonc++input - Organization of bisonc++’s grammar
file(s)
Bisonc++ derives from bison++(1), originally derived
from bison(1). Like these programs bisonc++ generates a parser
for an LALR(1) grammar. Bisonc++ generates C++ code: an
expandable C++ class.
Refer to bisonc++(1) for a general overview. This manual
page covers the structure and organization of bisonc++’s
grammar file(s).
Bisonc++’s grammar file has the following generic
outline:
directives (see the next section)
%%
grammar rules
Grammar rules have the following generic form:
nonterminal:
production-rules
;
Production rules consist of zero or more sequences of terminal
tokens, nonterminal tokens and/or action blocks. When multiple production
rules are used they must be separated from each other by vertical bars.
Action blocks are C++ compound statements.
This manual page contains the following sections:
- o
- DESCRIPTION: this section;
- o
- DIRECTIVES: bisonc++’s grammar-specification
directives;
- o
- POLYMORPHIC SEMANTIC VALUES: how to use polymorphic semantic values
in parsers generated by bisonc++;
- o
- DOLLAR NOTATIONS: available $-shorthand notations with single,
union, and polymorphic semantic value types;
- o
- RESTRICTIONS ON TOKEN NAMES: name restrictions for user-defined
symbols;
- o
- OBSOLETE SYMBOLS: symbols available to bison(1), but not to
bisonc++;
- o
- USING SYMBOLIC TOKENS IN CLASSES OTHER THAN THE PARSER CLASS: how
to refer to tokens defined in the grammar;
- o
- EXAMPLE: an example of using bisonc++;
- o
- SEE ALSO: references to other programs and documentation;
- o
- AUTHOR: at the end of this man-page.
Starting with version 6.02.00 bisonc++ reserved identifiers
no longer end in two underscore characters, but in one. This modification
was necessary because according to the C++ standard identifiers
having two or more consecutive underscore characters are reserved by the
language. In practice this could require some minor modifications of
existing source files using bisonc++’s facilities, most likely
limited to changing Tokens__ into Tokens_ and changing
Meta__ into Meta_.
The complete list of affected names is:
- Enums:
DebugMode_, ErrorRecovery_, Return_, Tag_, Tokens_
- Enum values:
PARSE_ABORT_, PARSE_ACCEPT_, UNEXPECTED_TOKEN_,
sizeofTag_
- Type / namespace
designators:
Meta_, PI_, STYPE_
- Member functions:
clearin_, errorRecovery_, errorVerbose_, executeAction_,
lex_, lookup_, nextCycle_, nextToken_, popToken_, pop_, print_, pushToken_,
push_, recovery_, redoToken_, reduce_, savedToken_, shift_, stackSize_,
startRecovery_, state_, token_, top_, vs_
- Protected data
members:
d_acceptedTokens_, d_actionCases_, d_debug_, d_nErrors_,
d_requiredTokens_, d_val_, idOfTag_, s_nErrors_
Quite a few directives can be specified in the initial section of
the grammar specification file. If command-line options for directives are
available, then their specifications take precedence over the corresponding
directives in the grammar file. Once class header or implementation header
files exist, directives affecting those files are ignored.
Directives accepting a `filename’ do not accept path names,
i.e., they cannot contain directory separators (/); directives
accepting a ’pathname’ may contain directory separators. A
’pathname’ using blank characters should be surrounded by
double quotes.
Some directives may generate errors. This happens when their
specifications conflict with the contents of files bisonc++ cannot
modify (e.g., a parser class header file exists but doesn’t define a
namespace, while in a later run a %namespace directive was
provided).
To resolve such errors the offending directive could be omitted,
the existing file could be removed, or the existing file could be
hand-edited according to the directive’s specification.
- o
- %baseclass-header filename
- Filename defines the name of the file to contain the
parser’s base class. This class defines, e.g., the parser’s
symbolic tokens. Defaults to the name of the parser class plus the suffix
base.h. This directive is overruled by the
--baseclass-header (-b) command-line option.
- It is an error if this directive is used and an already existing parser
class header file does not contain #include
"filename".
- o
- %baseclass-preinclude pathname
- Pathname defines the path to the file preincluded by the
parser’s base-class header. See the description of the
--baseclass-preinclude option for details about this directive. By
default, bisonc++ surrounds header by double quotes.
However, when header itself is surrounded by angle brackets,
#include <header> is included.
- o
- %class-header filename
- Filename defines the name of the file to contain the parser class.
Defaults to the name of the parser class plus the suffix .h. This
directive is overruled by the --class-header (-c)
command-line option.
- It is an error if this directive is used and an already existing
implementation header file does not contain #include
"filename".
- o
- %class-name parser-class-name
- Declares the name of the parser class. It defines the name of the
C++ class that is generated. If no %class-name is specified
the default class name Parser is used.
- It is an error if this directive is used and an already existing
parser-class header file does not define class
`className’ and/or if an already existing implementation
header file does not define members of the class
`className’.
- o
- %debug
- Add debugging code to the generated parse function and its support
functions, which can show (on the standard output stream) the steps
performed by the parsing function while it parses input streams. When this
directive is specified then the parsing steps are shown by default. The
setDebug members can be used to suppress outputting these parsing
steps. #ifdef DEBUG macros are not used. Existing debugging code
can be removed by rerunning bisonc++ without specifying the
debug option or directive.
- o
- %default-actions(d)(off|quiet|warn|std)
- By default, bisonc++ adds a $$ = $1 action block to rules
not having final action blocks, but not to empty production rules. This
default behavior can also explicitly be configured using the
default-actions std option or directive.
- Bisonc++ also supports alternate ways of handling rules not having
final action blocks. When off is specified, bisonc++ does
not add $$ = $1 action blocks; when polymorphic semantic values are
used, then specifying
- - warn adds specialized action blocks, using the semantic types of
the first elements of the production rules, while issuing a warning;
- - quiet adds these action blocks without issuing warnings.
- When either warn or quiet is specified the types of $$ and
$1 must match. When bisonc++ detects type mismatches it issues
errors.
- o
- %error-verbose
- This directive can be specified to dump the parser’s state stack to
the standard output stream when the parser encounters a syntactic error.
The stack dump shows on separate lines a stack index followed by the state
stored at the indicated stack element. The first stack element is the
stack’s top element.
- o
- %expect number
- This directive specifies the exact number of shift/reduce and
reduce/reduce conflicts for which no warnings are to be generated. Details
of the conflicts are reported in the verbose output file (e.g.,
grammar.output). If the number of actually encountered conflicts
deviates from `number’, then this directive is ignored.
- o
- %filenames filename
- Filename is a generic filename that is used for all header files
generated by bisonc++. Options defining specific filenames are also
available (which then, in turn, overrule the name specified by this
directive). This directive is overruled by the --filenames
(-f) command-line option.
- o
- %flex
- When provided, the scanner member returning the matched text is called as
d_scanner.YYText(), and the scanner member returning the next
lexical token is called as d_scanner.yylex(). This directive is
only interpreted if the %scanner directive is also provided.
- o
- %implementation-header filename
- Filename defines the name of the file to contain the implementation
header. It defaults to the name of the generated parser class plus the
suffix .ih.
- The implementation header should contain all directives and declarations
that are only used by the parser’s member functions. It is
the only header file that is included by the source file containing
parse’s implementation. User defined implementation of other
class members may use the same convention, thus concentrating all
directives and declarations that are required for the compilation of other
source files belonging to the parser class in one header file.
- o
- %include pathname
- This directive is used to switch to pathname while processing a
grammar specification. Unless pathname defines an absolute
file-path, pathname is searched relative to the location of
bisonc++’s main grammar specification file (i.e., the
grammar file that was specified as bisonc++’s command-line
option). This directive can be used to split long grammar specification
files into shorter, meaningful units. After processing pathname,
processing continues beyond the %include pathname directive.
- o
- %left terminal ...
- Defines the names of symbolic terminal tokens that must be treated as
left-associative. I.e., in case of a shift/reduce conflict, a reduction is
preferred over a shift. Sequences of %left, %nonassoc,
%right and %token directives may be used to define the
precedence of operators. In expressions, the first used directive defines
the tokens having the lowest precedence, the last used defines the tokens
having the highest priority. See also %token below.
- o
- %locationstruct struct-definition
- Defines the organization of the location-struct data type LTYPE_.
This struct should be specified analogously to the way the parser’s
stacktype is defined using %union (see below). By default (if neither
locationstruct nor LTYPE_ is specified) the standard location struct (see
the next directive) is used.
- o
- %lsp-needed
- This directive results in bisonc++ generating a parser using the
standard location stack. This stack’s default type is:
struct LTYPE_
{
int timestamp;
int first_line;
int first_column;
int last_line;
int last_column;
char *text;
};
Bisonc++ does not provide the elements of the LTYPE_
struct with values. Action blocks of production rules may refer to the
location stack element associated with a production element using @
variables, like @1.timestamp, @3.text, @5. The rule’s
location struct itself may be referred to as either d_loc_ or
@@.
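Because bisonc++ does not fill the fields of LTYPE_, user code has to provide their values. The following is a minimal sketch of one way to do that, assuming the default struct shown above. The helper advance is hypothetical (it is not part of bisonc++), and so is the convention that last_line and last_column denote the position just beyond the matched text.

```cpp
#include <cassert>
#include <string>

// Same layout as the default location struct shown above.
struct LTYPE_
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};

// Hypothetical helper (not part of bisonc++): moves `loc` past `matched`,
// the text of the token most recently returned by the scanner.
// Lines and columns are counted from 1.
inline void advance(LTYPE_ &loc, std::string const &matched)
{
    assert(loc.last_line >= 1 && loc.last_column >= 1);

    // the new token starts where the previous one ended
    loc.first_line = loc.last_line;
    loc.first_column = loc.last_column;

    for (char ch: matched)
    {
        if (ch == '\n')             // a newline starts a new line at column 1
        {
            ++loc.last_line;
            loc.last_column = 1;
        }
        else
            ++loc.last_column;
    }

    ++loc.timestamp;                // in this sketch: number of tokens seen
}
```

Starting from LTYPE_{0, 1, 1, 1, 1, nullptr}, calling advance(loc, "abc") leaves the token at line 1, columns 1 up to (not including) 4. Where such a helper is called (e.g., from the parser's lex member) is left to the user.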
- o
- %ltype typename
- Specifies a user-defined token location type. If %ltype is used,
typename should be the name of an alternate (predefined) type
(e.g., size_t). It should not be used if a %locationstruct
specification is defined (see above). Within the parser class, this type
is available as the type `LTYPE_’. All text on the line
following %ltype is used for the typename specification. It
should therefore not contain comment or any other characters that are not
part of the actual type definition.
- o
- %namespace namespace
- Define all of the code generated by bisonc++ in the namespace
namespace. By default no namespace is defined. If this directive is
used the implementation header is provided with a commented out using
namespace declaration for the specified namespace. In addition, the
parser and parser base class header files also use the specified namespace
to define their include guard directives.
- It is an error if this directive is used and an already existing
parser-class header file and/or implementation header file does not define
namespace identifier.
- o
- %negative-dollar-indices
- Do not generate warnings when zero- or negative dollar-indices are used in
the grammar’s action blocks. Zero or negative dollar-indices are
commonly used to implement inherited attributes, and should normally be
avoided. When used, they can be specified like $-1, or like
$<type>-1, where type is empty; an STYPE_ tag;
or a field-name. However, note that in combination with the
%polymorphic directive (see below) only the $-i format can
be used.
- o
- %no-lines
- By default #line preprocessor directives are inserted just before
action statements in the file containing the parser’s parse
function. These directives are suppressed by the %no-lines
directive.
- o
- %nonassoc terminal ...
- Defines the names of symbolic terminal tokens that should be treated as
non-associative. I.e., in case of a shift/reduce conflict, a reduction is
preferred over a shift. Sequences of %left, %nonassoc, %right and
%token directives may be used to define the precedence of
operators. In expressions, the first used directive defines the tokens
having the lowest precedence, the last used defines the tokens having the
highest priority. See also %token below.
- o
- %parsefun-source filename
- Filename defines the name of the file to contain the parser member
function parse. Defaults to parse.cc. This directive is
overruled by the --parsefun-source (-p) command-line
option.
- o
- %polymorphic polymorphic-specification(s)
- Bison’s traditional way of handling multiple semantic values is to
use a %union specification (see below). Although %union is
supported by bisonc++, a polymorphic semantic value class is
preferred due to its improved type safety.
- The %polymorphic directive defines a polymorphic semantic value
class and can be used instead of a %union specification. Refer to
section POLYMORPHIC SEMANTIC VALUES below or to
bisonc++’s user manual for a detailed description of the
specification, characteristics, and use of polymorphic semantic
values.
- o
- %prec token
- Defines the precedence of a (non-empty) production rule. By default,
production rules have priorities that are equal to the priorities of their
first terminal tokens, or they receive the maximum possible priority if
they don’t contain terminal tokens. To change a production
rule’s default priority the %prec directive is used, which
assigns the directive’s token’s priority to the production
rule’s priority. A well known application of %prec is:
expression:
’-’ expression %prec UMINUS
{
...
}
Here, the default priority and precedence of the `-’ token as
the subtraction operator is overruled by the precedence and priority of
the UMINUS token, which is commonly defined as
%right UMINUS
(see below) following, e.g., the ’*’ and
’/’ operators.
- Refer to bisonc++’s user manual for a more elaborate
coverage of the %prec directive.
- o
- %print-tokens
- The print-tokens directive provides an implementation of the Parser
class’s print_ function displaying the current token value
and the text matched by the lexical scanner as received by the generated
parse function.
- o
- %prompt
- When adding debugging code (using the debug option or directive)
the debug information is displayed continuously while the parser processes
its input. When using the prompt directive the generated parser
displays a prompt (a question mark) at each step of the parsing process.
Caveat: when using this option the parser’s input cannot be
provided at the parser’s standard input stream.
- o
- %required-tokens number
- Following a syntactic error, require at least number successfully
processed tokens before another syntactic error can be reported. By
default number is zero.
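The effect of %required-tokens can be pictured as a counter that gates error reports. The class below is a hypothetical model, not the code generated by bisonc++. In particular, the choice to restart the count at every syntax error (reported or not) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the rule described above: after a syntax error,
// `required` tokens must be processed successfully before the next syntax
// error is reported.
class ErrorGate
{
    std::size_t d_required;     // the %required-tokens value
    std::size_t d_accepted;     // tokens accepted since the last error

    public:
        explicit ErrorGate(std::size_t required)
        :
            d_required(required),
            d_accepted(required)    // the very first error is reported
        {}

        void tokenAccepted()
        {
            ++d_accepted;
        }

        // returns true if this syntax error should be reported
        bool syntaxError()
        {
            bool const report = d_accepted >= d_required;
            d_accepted = 0;         // assumption: every error restarts the count
            return report;
        }
};
```

With ErrorGate gate(3), an error directly following a reported error is suppressed, and errors are reported again once three tokens have been accepted. With the default value zero every error is reported.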
- o
- %right terminal ...
- Defines the names of symbolic terminal tokens that should be treated as
right-associative. I.e., in case of a shift/reduce conflict, a shift is
preferred over a reduction. Sequences of %left, %nonassoc, %right
and %token directives may be used to define the precedence of
operators. In expressions, the first used directive defines the tokens
having the lowest precedence, the last used defines the tokens having the
highest priority. See also %token below.
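The difference between preferring a reduction (%left) and preferring a shift (%right) shows up in how a chain of identical operators is grouped. The sketch below is not a parser. It merely computes, for the '-' operator, the two groupings those choices lead to. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of  v[0] - v[1] - ... - v[n-1]  when '-' is left-associative:
// a reduction is performed as soon as possible, giving ((a - b) - c).
inline int leftAssoc(std::vector<int> const &v)
{
    assert(!v.empty());

    int result = v.front();
    for (std::size_t idx = 1; idx != v.size(); ++idx)
        result -= v[idx];
    return result;
}

// The same expression when '-' is right-associative: shifting is
// preferred, so the rightmost operands are combined first: (a - (b - c)).
inline int rightAssoc(std::vector<int> const &v)
{
    assert(!v.empty());

    int result = v.back();
    for (std::size_t idx = v.size() - 1; idx-- != 0; )
        result = v[idx] - result;
    return result;
}
```

For the operands 8, 3 and 2, leftAssoc computes (8 - 3) - 2 and rightAssoc computes 8 - (3 - 2).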
- o
- %scanner pathname
- Use pathname as the path name to the file pre-included in the
parser’s class header. See the description of the --scanner
option for details about this directive. Similar to the convention adopted
for this argument, pathname by default is surrounded by double
quotes. However, when the argument is surrounded by angle brackets,
#include <pathname> is included. This directive results in
the definition of a composed Scanner d_scanner data member
in the generated parser, and in the definition of an int lex()
member, returning d_scanner.lex().
- By specifying the %flex directive the function
d_scanner.yylex() is called. Any other function to call can be
specified using the --scanner-token-function option (or
%scanner-token-function directive).
- It is an error if this directive is used and an already existing parser
class header file does not include `pathname’.
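The composition described for %scanner can be sketched as follows. Both classes are simplified stand-ins: the Scanner shown here just replays a list of token values, and the Parser contains only the members mentioned above. In this sketch lex is public so that it can be called directly, which need not match the generated class.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class declared in the %scanner header.
class Scanner
{
    std::vector<int> d_tokens;
    std::size_t d_next = 0;

    public:
        explicit Scanner(std::vector<int> tokens)
        :
            d_tokens(std::move(tokens))
        {}

        int lex()               // next token value, 0 at the end of the input
        {
            return d_next < d_tokens.size() ? d_tokens[d_next++] : 0;
        }
};

// Simplified model of the relevant part of the generated parser class.
class Parser
{
    Scanner d_scanner;          // the composed scanner object

    public:
        explicit Parser(Scanner scanner)
        :
            d_scanner(std::move(scanner))
        {}

        int lex()               // forwards to the scanner's token function
        {
            return d_scanner.lex();
        }
};
```

With %flex, or with a %scanner-token-function specification, only the expression inside lex changes. The composition itself stays the same.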
- o
- %scanner-class-name scannerClassName
- Defines the name of the scanner class, declared by the pathname
header file that is specified at the scanner option or directive.
By default the class name Scanner is used.
- It is an error if this directive is used and either the scanner
directive was not provided, or the parser class interface in an already
existing parser class header file does not declare a scanner class
d_scanner object.
- o
- %scanner-matched-text-function function-call
- The scanner function returning the text that was matched by the lexical
scanner after its token function (see below) has returned. A complete
function call expression should be provided (including a scanner object,
if used). Example:
%scanner-matched-text-function myScanner.matchedText()
By specifying the %flex directive the function
d_scanner.YYText() is called.
- If the function call contains white space, the function call
should be surrounded by double quotes.
- o
- %scanner-token-function function-call
- The scanner function returning the next token, called from the generated
parser’s lex function. A complete function call expression
should be provided (including a scanner object, if used). Example:
%scanner-token-function d_scanner.lex()
If the function call contains white space scanner-token-function
should be surrounded by double quotes.
- It is an error if this directive is used and the scanner token function is
not called from the code in an already existing implementation
header.
- o
- %stack-expansion size
- Defines the number of elements to be
added to the generated parser’s semantic value stack when it must
be enlarged. By default 10 elements are added to the stack. This
option/directive is interpreted only once, and only if size at
least equals the default stack expansion size of 10.
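Growing a stack in fixed steps, as described for %stack-expansion, can be modeled as shown below. This is a hypothetical sketch rather than the generated code. The class name, its members, and the use of int values (the default semantic value type) are choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a value stack that is enlarged by a fixed number
// of elements whenever all of its elements are in use.
class ValueStack
{
    std::size_t d_expansion;        // the %stack-expansion size
    std::size_t d_used = 0;         // number of elements in use
    std::vector<int> d_values;      // storage for the semantic values

    public:
        explicit ValueStack(std::size_t expansion = 10)
        :
            d_expansion(expansion)
        {
            assert(expansion >= 10);    // smaller sizes are not interpreted
        }

        void push(int value)
        {
            if (d_used == d_values.size())      // full: enlarge the storage
                d_values.resize(d_values.size() + d_expansion);

            d_values[d_used++] = value;
        }

        int pop()
        {
            assert(d_used != 0);
            return d_values[--d_used];
        }

        std::size_t capacity() const
        {
            return d_values.size();
        }
};
```

With the default expansion size, the eleventh push enlarges the storage from 10 to 20 elements. A larger expansion size reduces the number of enlargements at the cost of unused elements.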
- o
- %start nonterminal
- The nonterminal nonterminal should be used as the grammar’s
start-symbol. If omitted, the first grammatical rule is used as the
grammar’s starting rule. All syntactically correct sentences must
be derivable from this starting rule.
- o
- %stype typename
- The type of the semantic value of nonterminal tokens. By default it is
int. %stype, %union, and %polymorphic are mutually
exclusive directives.
- Within the parser class, the semantic value type is available as the type
`STYPE_’. All text on the line following %stype is
used for the typename specification. It should therefore not
contain comment or any other characters that are not part of the actual
type definition.
- o
- %tag-mismatches on|off
- This directive is only interpreted when polymorphic semantic values are
used. When on is specified (which is used by default) the
parse member of the generated parser dynamically checks that the
tag that is used when calling a semantic value’s get member
matches the actual tag of the semantic value.
- If a mismatch is observed, then the parsing function aborts after
displaying a fatal error message. If this happens, and if the
option/directive debug was specified when bisonc++ created
the parser’s parsing function, then the program can be rerun,
specifying parser.setDebug(Parser::ACTIONCASES) before calling the
parsing function. As a result the case-entry numbers of the switch,
defined in the parser’s executeAction member, are inserted
into the standard output stream. The action case number reported just
before the program displays the fatal error message tells you in which of
the grammar’s action blocks the error was encountered.
- o
- %target-directory pathname
- Pathname defines the directory where generated files should be
written. By default this is the directory where bisonc++ is called.
This directive is overruled by the --target-directory command-line
option.
- o
- %thread-safe
- Only used with polymorphic semantic values, and then only required when
the parser is used in multiple threads: it ensures that each
thread’s polymorphic code only accesses its own parser’s
error counting variable.
- o
- %token terminal ...
- Defines the names of symbolic terminal tokens. Sequences of %left,
%nonassoc, %right and %token directives may be used to define
the precedence of operators. In expressions, the first used directive
defines the tokens having the lowest precedence, the last used defines the
tokens having the highest priority.
- NOTE: Symbolic tokens are defined as enum-values in the
parser’s base class. The names of symbolic tokens may not be equal
to the names of the members and types defined by bisonc++ itself
(see the next sections). This requirement is not enforced by
bisonc++, but compilation errors may result if this requirement is
violated.
- o
- %token-class classname
- Classname defines the name of the Tokens class that is
defined when the %token-path directive or option (see below) is
specified. If token-path isn’t specified then this directive
is ignored. By default the class name Tokens is used.
- o
- %token-namespace namespace
- If token-path is specified (see below) then namespace
defines the namespace of the Tokens class. By default no namespace
is used.
- o
- %token-path pathname
- Pathname defines the path name of the file to contain the struct
Tokens defining the enumeration Tokens_ containing the symbolic
tokens of the generated grammar. If this option is specified the
ParserBase class is derived from it, thus making the tokens
available to the generated parser class. The name of the struct
Tokens can be altered using the token-class directive or
option. By default (if token-path is not specified) the tokens are
defined as the enum Tokens_ in the ParserBase class. If
pathname doesn’t exist it is created by bisonc++. If
the file pathname already exists it is rewritten at each new run of
bisonc++.
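The class layout that results from %token-path can be pictured as follows. This is a sketch of the relationships described above, not a copy of generated files. The token names NUMBER and IDENTIFIER, the starting value 257, and the Scanner::classify member are assumptions made for the example.

```cpp
#include <cassert>

// Sketch of the contents of the file named by %token-path.
struct Tokens
{
    enum Tokens_
    {
        NUMBER = 257,           // assumed first token value
        IDENTIFIER,
    };
};

// The parser's base class is derived from Tokens, which makes the token
// names available to the parser class as well.
class ParserBase: public Tokens
{};

class Parser: public ParserBase
{};

// A class unrelated to the parser (e.g., a scanner) only needs Tokens.
struct Scanner
{
    // hypothetical member: maps a character to a token value
    static int classify(char ch)
    {
        return '0' <= ch && ch <= '9' ? Tokens::NUMBER : Tokens::IDENTIFIER;
    }
};
```

Code outside the parser can thus refer to Tokens::NUMBER without including the parser's headers, while Parser::NUMBER names the same enumerator.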
- o
- %type <type> nonterminal ...
- In combination with %polymorphic or %union: associate the
semantic value of a nonterminal symbol with a polymorphic semantic value
tag or union field defined by these directives.
- o
- %union union-definition
- Acts identically to the identically named bison and bison++
declaration. Bisonc++ generates a union, named STYPE_, as
its semantic type.
- o
- %weak-tags
- This directive is ignored unless the %polymorphic directive was
specified. It results in the declaration of enum Tag_ rather
than enum class Tag_. When in doubt, don’t use this
directive.
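The practical difference between enum class Tag_ and the enum Tag_ requested by %weak-tags is plain C++ and can be illustrated on its own. In the sketch below the namespaces strong and weak exist only to keep both variants in one translation unit. The tag names are those of the %polymorphic example used in this manual page.

```cpp
#include <cassert>

namespace strong                // the default: a strongly typed enumeration
{
    enum class Tag_ { INT, STRING, VECT };
}

namespace weak                  // the kind of declaration %weak-tags asks for
{
    enum Tag_ { INT, STRING, VECT };
}

// Strongly typed enumerators must be qualified by the enum's name and
// need an explicit conversion to obtain their numeric values.
inline int strongIndex(strong::Tag_ tag)
{
    return static_cast<int>(tag);
}

// Plain enumerators convert to int implicitly. They are also visible in
// the enclosing scope, where they may clash with other names.
inline int weakIndex(weak::Tag_ tag)
{
    return tag;
}
```

Here strong::Tag_::VECT must be written in full, whereas weak::VECT is accepted for the plain enumeration, which is why the strongly typed variant is the safer default.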
Like bison(1), bisonc++ by default uses int
semantic values, and also supports the %stype and %union
directives for using single-type or traditional C-type unions as
semantic values. These types of semantic values are covered in
bisonc++’s manual.
In addition, the %polymorphic directive can be specified to
generate a parser using `polymorphic’ semantic values. In this case
semantic values are specified as pairs, consisting of tags (which are
C++ identifiers), and C++ (pointer or value) type names. Tags
and type names are separated by colons. Multiple tag and type name
combinations are separated by semicolons, and an optional semicolon ends the
final tag/type pair.
Here is an example, defining three semantic values: an int,
a std::string and a std::vector<double>:
%polymorphic INT: int; STRING: std::string;
VECT: std::vector<double>
The identifier to the left of the colon is called the tag-identifier (or
simply tag), and the type name to the right of the colon is called the
type-name. Starting with bisonc++ version 4.12.00 the types no
longer have to provide default constructors.
When polymorphic type-names refer to types that have not yet been
declared by the parser’s base class header, then these types must be
(directly or indirectly) declared in a header file whose location is
specified using the %baseclass-preinclude directive.
%type directives are used to associate (non-)terminals with
semantic value types. E.g., after:
%polymorphic INT: int; TEXT: std::string
%type <INT> expr
the expr nonterminal returns int semantic values. In a rule like:
expr:
expr ’+’ expr
{
// Action block: C++ statements here.
}
symbols $$, $1, and $3 represent int values, and can be
used that way in the C++ action block.
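What an action block like $$ = $1 + $3 amounts to can be sketched with an explicit stack of int values. The function below is a hypothetical model of a single reduction of the rule shown above. The stack layout (one element per rule component, with the last component on top) is an assumption of this sketch and not a description of the generated parser's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of reducing  expr: expr '+' expr  on a stack of int
// semantic values. The three topmost elements belong to the rule's
// components: $1, the '+' token ($2), and $3.
inline void reducePlus(std::vector<int> &stack)
{
    assert(stack.size() >= 3);

    std::size_t const base = stack.size() - 3;

    int const dollar1 = stack[base];            // $1
    int const dollar3 = stack[base + 2];        // $3

    int const result = dollar1 + dollar3;       // $$ = $1 + $3;

    stack.resize(base);                         // pop the rule's components
    stack.push_back(result);                    // push the value of $$
}
```

Starting with the values 7, 2, 0 and 5 on the stack (0 being the unused value of the '+' token), one reduction leaves 7 and 7: the untouched bottom element and the sum of 2 and 5.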
Definitions and declarations
The %polymorphic directive adds the following definitions
and declarations to the generated base class header and parser source file
(if the %namespace directive was used then all declared/defined
elements are placed inside the namespace that is specified by the
%namespace directive):
- o
- All semantic value type identifiers are collected in a strongly typed
`Tag_’ enumeration. E.g.,
enum class Tag_
{
INT,
STRING,
VECT
};
- o
- An anonymous enum defining the symbolic constant sizeofTag_
equal to the number of tags in the Tag_ enumeration.
- o
- The namespace Meta_ contains almost all of the code implementing
polymorphic values.
The namespace Meta_ contains, among other classes, the class
SType. The parser’s semantic value type STYPE_ is equal
to Meta_::SType.
STYPE_ equals Meta_::SType
Meta_::SType provides the standard user interface for using
polymorphic semantic data types. It declares the following public
interface:
- o
- Constructors: Default, copy and move constructors. No data can be
retrieved from SType objects that were constructed by
SType’s default constructor, but they can accept values of
defined polymorphic types, which may then be retrieved from those
objects.
- o
- Operators: The standard overloaded assignment operators (copy and move
assignment operators) are available.
- In addition the members
SType &operator=(Type const &value)
and
SType &operator=(Type &&tmp)
are defined for each of the polymorphic semantic value types. Up to version
6.03.00 these members were defined as member templates, but this sometimes
resulted in awkward compilation errors: with member templates
Type must exactly match one of the defined polymorphic semantic
types, since Type is used to determine the appropriate
Meta_::Tag_ value. As a consequence, if, e.g., a polymorphic type
%polymorphic INT: int is defined, then an assignment like $$
= true fails, since the inferred type is bool and no
matching polymorphic type is available. Now that the assignment operators
are defined as plain member functions this problem is no longer
encountered, because the compiler may then apply standard type conversions.
Note that ambiguities may still be encountered. If, e.g.,
polymorphic types are defined for int and char and an
expression like $$ = 30U is used, the compiler cannot tell whether
$$ refers to the int or to the char semantic value. A
standard (static) cast, or explicitly calling the assign member
(see the next item), resolves this kind of ambiguity.
- When operator=(Type const &value) is used, the left-hand side
SType object receives a copy of value; when
operator=(Type &&tmp) is used, tmp is
move-assigned to the left-hand side SType object;
- o
- void assign<tag>(Args &&...args) The tag
template argument must be a Tag_ value. This member function
constructs a semantic value of the type matching tag from the
arguments that are passed to this member (zero arguments are OK if the
type associated with tag supports default construction). The
constructed value (not a copy of this value) is then stored in the
STYPE_ object for which assign has been called.
- As a Meta_::Tag_ value must be specified when using assign
the compiler can use the explicit tag to convert assign’s
arguments to an SType object of the type matching the specified
tag.
- The member assign can be used to store a specific polymorphic
semantic value in an STYPE_ object. It differs from the set of
operator=(Type) members in that assign accepts multiple
arguments to construct the requested SType value from, whereas the
operator= members only accept single arguments of defined
polymorphic types.
- To initialize an STYPE_ object with a default STYPE_ value,
direct assignment can be used (e.g., d_lval_ = STYPE_{}). To assign
a semantic value to a production rule using assign the _$$
notation must be used, as $$ is interpreted as the polymorphic
value type that is associated with the production rule:
_$$.assign<Tag_::CHAR>(30U);
- o
- DataType &get<tag>(), and DataType const
&get<tag>() const These members return references to the
object’s semantic values. The tag must be a Tag_
value: its specification tells the compiler which semantic value type it
must use.
- When the option/directive tag-mismatches on was specified then
get, when called from the generated parse function, performs
a run-time check to confirm that the specified tag corresponds to
the object’s actual Tag_ value. If a mismatch is observed, then
the parsing function aborts with a fatal error message. When shorthand
notations (like $$ and $1) are used in production
rules’ action blocks, then bisonc++ can determine the
correct tag, preventing the run-time check from failing.
- But once a fatal error is encountered, it can be difficult to
determine which action block generated the error. If this happens, then
consider regenerating the parser specifying the --debug option,
calling
parser.setDebug(Parser::ACTIONCASES)
before calling the parser’s parse function.
- Following this the case-entry numbers of the switch which is
defined in the parser’s executeAction member are inserted
into the standard output stream just before the matching statements are
executed. The action case number that’s reported just before the
program reports the fatal error tells you in which of the grammar’s
action blocks the error was encountered.
- o
- Tag_ tag() const The tag matching the semantic value’s
polymorphic type is returned. The returned value is a valid Tag_
value when the SType object’s valid member returns
true;
- By default, or after assigning a plain (default) STYPE_ object to
an STYPE_ object (e.g., using a statement like $$ =
STYPE_{}), valid returns false, and the tag
member returns Meta_::sizeofTag_.
- o
- bool valid() const
- The value true is returned if the object contains a semantic value.
Otherwise false is returned. Note that default STYPE_ values
can be assigned to STYPE_ objects, but they do not represent valid
semantic values. See also the previous description of the tag
member.
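The interface described above can be imitated with std::variant, which may help to see how the members relate to each other. The class below is a hypothetical model for the three types of the earlier %polymorphic example. The class generated by bisonc++ is implemented differently, and instead of aborting with a fatal error this sketch throws std::logic_error on a tag mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

enum class Tag_ { INT, STRING, VECT };
enum { sizeofTag_ = 3 };

// Hypothetical model of Meta_::SType (not the generated class).
class SType
{
    // std::monostate: the state after default construction (not valid)
    std::variant<std::monostate, int, std::string,
                 std::vector<double>> d_data;

    // alternative 0 is std::monostate, so tag values are shifted by one
    template <Tag_ tg>
    static constexpr std::size_t index = static_cast<std::size_t>(tg) + 1;

    public:
        SType() = default;

        // one plain (non-template) assignment operator per type
        SType &operator=(int value)
        {
            d_data = value;
            return *this;
        }
        SType &operator=(std::string const &value)
        {
            d_data = value;
            return *this;
        }
        SType &operator=(std::vector<double> const &value)
        {
            d_data = value;
            return *this;
        }

        template <Tag_ tg, typename ...Args>
        void assign(Args &&...args)     // constructs the value in place
        {
            d_data.emplace<index<tg>>(std::forward<Args>(args)...);
        }

        template <Tag_ tg>
        auto &get()                     // checked, like tag-mismatches on
        {
            if (tg != tag())
                throw std::logic_error("tag mismatch");
            return std::get<index<tg>>(d_data);
        }

        bool valid() const
        {
            return d_data.index() != 0;
        }

        Tag_ tag() const                // sizeofTag_ when not valid
        {
            return valid() ? static_cast<Tag_>(d_data.index() - 1)
                           : static_cast<Tag_>(sizeofTag_);
        }
};
```

A default constructed object is not valid and reports sizeofTag_ as its tag. After value = 12 its tag is Tag_::INT, and value.assign<Tag_::STRING>(3, 'x') constructs the string from two arguments, which the single-argument assignment operators cannot do.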
Inside action blocks dollar-notations can be used to retrieve and
assign values from/to the elements of production rules. Type directives are
used to associate dollar-notations with semantic types.
When %stype is specified (and with the default int
semantic value type) the following dollar-notations are available:
- o
- $$ =
- A value is assigned to the rule’s nonterminal’s semantic
value. The right-hand side (rhs) of the assignment expression must be an
expression of a type that can be assigned to the STYPE_ type.
- o
- $$(expr)
- Same as the previous dollar-notation: expr’s value is
assigned to the rule’s nonterminal’s semantic value.
- o
- _$$
- This refers to the semantic value of the rule’s nonterminal.
- o
- $$
- Same as the previous item: this refers to the semantic value of the
rule’s nonterminal.
- o
- $$.
- If STYPE_ is a class-type then this dollar-notation is shorthand
for the member selector operator, applied to the rule’s
nonterminal’s semantic value.
- o
- $$->
- If STYPE_ is a class-type then this dollar-notation is shorthand
for the pointer to member operator, applied to the rule’s
nonterminal’s semantic value.
- o
- _$1
- This refers to the current production rule’s first
component’s semantic value.
- o
- $1
- Same as the previous dollar-notation: this refers to the current
production rule’s first component’s semantic value.
- o
- $1.
- If STYPE_ is a class-type then this dollar-notation is shorthand
for the member selector operator, applied to the current production
rule’s first component’s semantic value.
- o
- $1->
- If STYPE_ is a class-type then this dollar-notation is shorthand
for the pointer to member operator, applied to the current production
rule’s first component’s semantic value.
- o
- _$-1
- This refers to the semantic value of a component in a production rule,
listed immediately before the current rule’s nonterminal ($-2
refers to a component used two elements before the current nonterminal,
etc.).
- o
- $-1
- Same as the previous item: this refers to the semantic value of a
component in a production rule, listed immediately before the current
rule’s nonterminal.
- o
- $-1.
- If STYPE_ is a class-type then this dollar-notation is shorthand
for the member selector operator, applied to the semantic value of some
production rule element, 1 element before the current rule’s
nonterminal.
- o
- $-1->
- If STYPE_ is a class-type then this dollar-notation is shorthand
for the pointer to member operator, applied to the semantic value of some
production rule element, 1 element before the current rule’s
nonterminal.
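As an illustration of what $$, $1 and $3 stand for with a single semantic value type, the following sketch models a reduction on a plain value stack. It is a simplified model under stated assumptions, not the code bisonc++ generates: SType stands in for STYPE_, the stack holds values only (the real parser stores state/value pairs), and reduceAdd is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SType = int;                  // stands in for STYPE_ (default: int)

// Models a reduction by
//      expression: expression '+' expression { $$ = $1 + $3; }
// The rule's three components occupy the top three stack slots.
SType reduceAdd(std::vector<SType> &stack)
{
    assert(stack.size() >= 3);

    std::size_t const first = stack.size() - 3;     // slot holding $1
    SType const dollar1 = stack[first];             // $1
    SType const dollar3 = stack[first + 2];         // $3 ($2 is the '+')
    SType const result = dollar1 + dollar3;         // $$ = $1 + $3

    stack.resize(first);            // pop the rule's three components
    stack.push_back(result);        // $$ becomes the new topmost value
    return result;
}
```

For example, starting from a stack holding 7, 2, 0, 3 (the 0 being the unused value of the '+' token), reduceAdd leaves 7, 5 on the stack.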
When %union is specified these dollar-notations are
available:
- o
- $$ =
- A value is assigned to the rule’s nonterminal’s semantic
value. If the rule’s nonterminal was associated with one of the
union’s field types, then the matching union field receives the
value of the assignment expression’s right-hand side. If no
association was defined then the variable representing the
nonterminal’s semantic value is a plain union (i.e., STYPE_)
variable.
- o
- $$(expr)
- Expr’s value is assigned to the rule’s
nonterminal’s plain union (i.e., STYPE_) type. Any
association that may have been defined between the nonterminal and a union
field is ignored.
- o
- _$$
- This refers to the rule’s nonterminal’s plain union (i.e.,
STYPE_) type. Any association that may have been defined between
the nonterminal and a union field is ignored.
- o
- $$
- This refers to the rule’s nonterminal’s semantic value. If
it was associated with one of the union’s types, then $$
refers to the associated union field. If no association was defined then
$$ represents a plain union (i.e., STYPE_) type of
variable.
- o
- $$.
- If the rule’s nonterminal’s semantic value was associated
with one of the union’s types, then $$. is shorthand for the
member selector operator, applied to the associated union field type. If
no association was defined then $$. is shorthand for the field
selector operator, applied to the nonterminal’s semantic
value’s plain union (i.e., STYPE_) type.
- o
- $$->
- If the rule’s nonterminal’s semantic value was associated
with one of the union’s types, then $$-> is shorthand for
the pointer to member operator, applied to the associated union field
type. If no association was defined then an error message is issued, as
the pointer to member operator is not defined for plain union types.
- o
- _$1
- This refers to the current production rule’s first
component’s plain union (STYPE_) value.
- o
- $1
- This shorthand refers to the semantic value of the production
rule’s first element. If it was associated with one of the
union’s types, then $1 refers to the associated union field.
If no association was defined then $1 represents a plain union
(i.e., STYPE_) type of variable.
- o
- $1.
- If the production rule’s first component’s semantic value
was associated with one of the union’s types, then $1. is
shorthand for the member selector operator, applied to the associated
union field type. If no association was defined then $1. is
shorthand for the field selector operator, applied to the first
component’s semantic value’s plain union (i.e.,
STYPE_) type.
- o
- $1->
- If the production rule’s first component’s semantic value
was associated with one of the union’s types, then $1->
is shorthand for the pointer to member operator, applied to the associated
union field type. If no association was defined then an error message is
issued, as the pointer to member operator is not defined for plain union
types.
- o
- _$-1
- This refers to the plain union (STYPE_) value of a component in a
production rule, listed immediately before the current rule’s
nonterminal ($-2 refers to a component used two elements before the
current nonterminal, etc.).
- o
- $-1
- Same: this refers to the plain union (STYPE_) value of a component
in a production rule, listed immediately before the current rule’s
nonterminal ($-2 refers to a component used two elements before the
current nonterminal, etc.).
- o
- $-1.
- This is shorthand for the field selector operator applied to the plain
union (STYPE_) value of some production rule element, 1 element
before the current rule’s nonterminal.
- o
- $-1->
- This shorthand refers to the pointer to member operator applied to the
plain union (STYPE_) value of some production rule element, 1
element before the current rule’s nonterminal. Its use results in
an error message, as the pointer to member operator is not defined for
plain union types.
- o
- $<field>-1
- This refers to the field union field of a component in a production
rule, listed immediately before the current rule’s nonterminal.
Note that the validity of the specified field for that particular
component cannot be verified by bisonc++.
- o
- $<field>-1.
- This refers to the member selector operator of the field union
field of a component in a production rule, listed immediately before the
current rule’s nonterminal. Note that the validity of the specified
field for that particular component cannot be verified by
bisonc++.
- o
- $<field>-1->
- This refers to the pointer to member operator
of the field union field of a component in a production rule,
listed immediately before the current rule’s nonterminal. Note that
the validity of the specified field for that particular component cannot
be verified by bisonc++.
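The difference between an associated union field and an explicitly named field can be sketched in plain C++. The union SType and its fields i and d are hypothetical, as are the helper functions; the sketch only models the effect of the notations and is not what bisonc++ emits.

```cpp
#include <cassert>

// Hypothetical %union with two fields; SType stands in for STYPE_
union SType
{
    int    i;
    double d;
};

// Models  $$ = value  for a nonterminal associated with field i:
// the associated union field receives the value.
void assignAssociated(SType &dollarDollar, int value)
{
    dollarDollar.i = value;
}

// Models an explicitly named field, as in $<d>-1: the field is selected
// by name, and nothing checks that d is the field that was last written.
double explicitField(SType const &semval)
{
    return semval.d;
}

// Small usage check tying both helpers together
void unionDemo()
{
    SType value;
    assignAssociated(value, 12);
    assert(value.i == 12);

    value.d = 2.5;                  // now d is the active field
    assert(explicitField(value) == 2.5);
}
```

Because the field selection is purely textual, selecting a field other than the one last written is the programmer's responsibility, which is why the validity of such a field cannot be verified.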
When %polymorphic is specified these dollar-notations can
be used:
- o
- $$ =
- A semantic value is assigned to the rule’s nonterminal’s
semantic value. The right-hand side (rhs) of the assignment expression
must be an expression of the type that is associated with $$. This
assignment operation assumes that the type of the rhs-expression equals
$$’s semantic value type. If the types don’t match the
compiler issues a compilation error when compiling parse.cc.
Casting the rhs to the correct value type is possible, but in that case
the function call operator (see the next item) is preferred, as it does
not require casting. If no semantic value type was associated with $$ then
the assignment $$ = STYPE_{} can be used.
- o
- $$(expr)
- A value is assigned to the rule’s nonterminal’s semantic
value. Expr must be of a type that can be statically cast to
$$’s semantic value type. The required static_cast is
generated by bisonc++ and doesn’t have to be specified for
expr.
- o
- _$$
- This refers to the rule’s nonterminal’s semantic value,
disregarding any polymorphic type that might have been associated with the
rule’s nonterminal.
- o
- $$
- If no polymorphic type was associated with the rule’s nonterminal
then this is shorthand for a reference to the rule’s plain
STYPE_ value. If a polymorphic value type was associated with the
rule’s nonterminal then this shorthand represents a reference to a
value of that particular type.
- o
- $$.
- If no polymorphic type was associated with the rule’s nonterminal
then this is shorthand for the member selector operator, applied to a
reference to the rule’s nonterminal’s STYPE_ value.
If a polymorphic value type was associated with the rule’s
nonterminal then this shorthand represents the member selector operator,
applied to a reference of that particular type.
- o
- $$->
- If no polymorphic type was associated with the rule’s nonterminal
then this is shorthand for the pointer to member operator, applied to a
reference to the rule’s nonterminal’s STYPE_ value.
If a polymorphic value type was associated with the rule’s
nonterminal then this shorthand represents the pointer to member operator,
applied to a reference of that particular type.
- o
- _$1
- This refers to the current production rule’s first
component’s generic STYPE_ value.
- o
- $1
- This shorthand refers to the semantic value of the production
rule’s first element. If it was associated with a polymorphic type,
then $1 refers to a value of that particular type. If no
association was defined then $1 represents a generic STYPE_
value.
- o
- $1.
- If the production rule’s first component’s semantic value
was associated with a polymorphic type, then $1. is shorthand for
the member selector operator, applied to the value of the associated
polymorphic type. If no association was defined then $1. is
shorthand for the member selector operator, applied to the first
component’s generic STYPE_ value.
- o
- $1->
- If the production rule’s first component’s semantic value
was associated with a polymorphic type, then $1-> is shorthand
for the pointer to member operator, applied to the value of the associated
polymorphic type. If no association was defined then $1-> is
shorthand for the pointer to member operator, applied to the first
component’s generic STYPE_ value.
- o
- _$-1
- This refers to the generic (STYPE_) value of a component in a
production rule, listed immediately before the current rule’s
nonterminal ($-2 refers to a component used two elements before the
current nonterminal, etc.).
- o
- $-1
- Same: this refers to the generic (STYPE_) value of a component in a
production rule, listed immediately before the current rule’s
nonterminal ($-2 refers to a component used two elements before the
current nonterminal, etc.).
- o
- $-1.
- This is shorthand for the member selector operator applied to the
generic STYPE_ value of some production rule element, 1 element
before the current rule’s nonterminal.
- o
- $-1->
- This is shorthand for the pointer to member operator applied to the
generic STYPE_ value of some production rule element, 1 element
before the current rule’s nonterminal.
- o
- $<tag>-1
- This shorthand represents a reference to the semantic value of the
polymorphic type associated with tag of some production rule
element, 1 element before the current rule’s nonterminal.
- If, when using the generated parser class’s parse function,
the polymorphic type of that element turns out not to match the type that
is associated with tag then a run-time fatal error results.
- If that happens, and the debug option/directive had been specified
when bisonc++ was run, then rerun the program after specifying
parser.setDebug(Parser::ACTIONCASES) to locate the parse
function’s action block where the fatal error was encountered.
- o
- $<tag>-1.
- This shorthand represents the member selector operator, applied to the
semantic value of the polymorphic type associated with tag of some
production rule element, 1 element before the current rule’s
nonterminal.
- If, when using the generated parser class’s parse function,
the polymorphic type of that element turns out not to match the type that
is associated with tag then a run-time fatal error results. The
procedure suggested at the previous ($<tag>-1) item for
solving such errors can be applied here as well.
- o
- $<tag>-1->
- This shorthand represents the pointer to member selector operator, applied
to the semantic value of the polymorphic type associated with tag
of some production rule element, 1 element before the current
rule’s nonterminal.
- If, when using the generated parser class’s parse function,
the polymorphic type of that element turns out not to match the type that
is associated with tag then a run-time fatal error results. The
procedure suggested at the previous ($<tag>-1) item for
solving such errors can be applied here as well.
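Two of the behaviors described above, the static_cast performed by $$(expr) and the run-time check performed by $<tag> notations, can be modelled in a few lines. This is a sketch under assumptions: Tag, SemVal, assignInt and getInt are hypothetical names, and the model throws std::logic_error where a parser generated by bisonc++ reports a fatal error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Tag { INT, TEXT };       // hypothetical polymorphic tags

struct SemVal                       // hypothetical semantic value
{
    Tag tag;
    int number;
    std::string text;
};

// Models $$(expr) for a nonterminal associated with INT: the expression
// is statically cast to the associated value type.
template <typename Expr>
void assignInt(SemVal &dollarDollar, Expr const &expr)
{
    dollarDollar.tag = Tag::INT;
    dollarDollar.number = static_cast<int>(expr);
}

// Models $<INT>-1: the tag is checked when the value is accessed.
int &getInt(SemVal &semval)
{
    if (semval.tag != Tag::INT)
        throw std::logic_error("semantic value does not hold an INT");
    return semval.number;
}

// Small usage check: a mismatching tag is refused, a cast value is stored
void polymorphicDemo()
{
    SemVal value{Tag::TEXT, 0, "abc"};
    bool refused = false;
    try
    {
        getInt(value);
    }
    catch (std::logic_error const &)
    {
        refused = true;
    }
    assert(refused);

    assignInt(value, 3.9);          // static_cast<int>(3.9) stores 3
    assert(getInt(value) == 3);
}
```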
To avoid collisions with names defined by the parser’s
(base) class, the following identifiers should not be used as token
names:
- o
- Identifiers ending in an underscore;
- o
- Any of the following identifiers: ABORT, ACCEPT, ERROR, clearin,
debug, or setDebug.
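The two restrictions can be expressed as a simple check. The function usableTokenName below is a hypothetical helper written for this illustration; bisonc++ itself does not offer such a function.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Returns true if name avoids both restrictions listed above.
// An empty string is not an identifier and is rejected as well.
bool usableTokenName(std::string const &name)
{
    static std::array<char const *, 6> const reserved =
    {{
        "ABORT", "ACCEPT", "ERROR", "clearin", "debug", "setDebug"
    }};

    if (name.empty() || name.back() == '_')
        return false;

    return std::none_of(reserved.begin(), reserved.end(),
        [&name](char const *word) { return name == word; });
}

// Small usage check
void tokenNameDemo()
{
    assert(usableTokenName("NUMBER"));
    assert(!usableTokenName("ACCEPT"));     // reserved identifier
    assert(!usableTokenName("value_"));     // ends in an underscore
}
```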
All DECLARATIONS and DEFINE symbols not listed above
but defined in bison++ are obsolete with bisonc++. In
particular, there is no %header{ ... %} section anymore. Also,
all DEFINE symbols related to member functions are now obsolete.
There is no need for these symbols anymore as they can simply be declared in
the class header file and defined elsewhere.
The tokens defined in the grammar files processed by
bisonc++ must usually also be available to the lexical scanner,
returning those tokens when certain regular expressions are matched. E.g., a
NUMBER token may be used in the grammar and the lexical scanner may
be expected to return that token when the input matches the [0-9]+
regular expression. To avoid circular dependencies among classes the tokens
can be written to a separate file using the token-path directive or
option. The location and name of this file are specified by the
token-path specification, and the file is generated from scratch at every
run of bisonc++. By default the grammar’s symbolic tokens are made
available in the class Tokens, and classes may refer to its tokens
using the Tokens class scope (e.g., Tokens::NUMBER).
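A class other than the parser can then refer to the tokens through the Tokens scope. The sketch below assumes a Tokens struct shaped like a generated tokens file (the enum values are assumptions) and a hypothetical helper tokenFor that classifies a single lexeme the way a scanner rule set might.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Shaped like a generated tokens file; the values are assumptions
struct Tokens
{
    enum Tokens_ { NUMBER = 257, EOLN, UNARY };
};

// Hypothetical helper (not part of bisonc++ or flexc++): returns the
// token for a non-empty lexeme.
int tokenFor(std::string const &lexeme)
{
    assert(!lexeme.empty());

    if (lexeme == "\n")
        return Tokens::EOLN;

    bool const allDigits = std::all_of(lexeme.begin(), lexeme.end(),
        [](unsigned char ch) { return std::isdigit(ch) != 0; });
    if (allDigits)
        return Tokens::NUMBER;

    // any other lexeme: its first character serves as the token
    return static_cast<unsigned char>(lexeme[0]);
}
```

Since tokenFor only depends on Tokens, it can be compiled without including the parser's headers, which is the point of keeping the tokens in a separate file.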
Before bisonc++ version 6.04.00 tokens were made available
by including the file parserbase.h, using a simple #define
suggesting that the tokens were in fact defined by the parser class itself.
Using this scheme lexical scanner specifications returned, e.g.,
Parser::NUMBER when [0-9]+ was matched. Unless the
token-path directive or option is used this approach is still
available, but its use is deprecated.
Using a fairly traditional example, we construct a simple
calculator below. The basic operators as well as parentheses can be used to
specify expressions, and each expression should be terminated by a newline.
The program terminates when a q is entered. Empty lines result in a
mere prompt.
First an associated grammar is constructed. When a syntactic error
is encountered all tokens are skipped until the next newline and a simple
message is printed using the default error function. It is assumed
that no semantic errors occur (in particular, no divisions by zero). The
grammar is decorated with actions performed when the corresponding
grammatical production rule is recognized. The grammar itself is rather
standard and straightforward, but note the first part of the specification
file, containing various other directives, among which the %scanner
directive, resulting in a composed d_scanner object as well as an
implementation of the member function int lex, and the
%token-path directive, defining the class Tokens in the file
../tokens/tokens.h. In this example, the Scanner class is
generated by flexc++(1). The details of constructing a class using
flexc++ are beyond the scope of this man-page, but
flexc++’s specification file is shown below.
Here is bisonc++’s input file:
%filenames parser
%scanner ../scanner/scanner.h
%token-path ../tokens/tokens.h
// lowest precedence
%token NUMBER // integral numbers
EOLN // newline
%left '+' '-'
%left '*' '/'
%right UNARY
// highest precedence
%%
expressions:
expressions evaluate
|
prompt
;
evaluate:
alternative prompt
;
prompt:
{
prompt();
}
;
alternative:
expression EOLN
{
cout << $1 << endl;
}
|
'q' done
|
EOLN
|
error EOLN
;
done:
{
cout << "Done.\n";
ACCEPT();
}
;
expression:
expression '+' expression
{
$$ = $1 + $3;
}
|
expression '-' expression
{
$$ = $1 - $3;
}
|
expression '*' expression
{
$$ = $1 * $3;
}
|
expression '/' expression
{
$$ = $1 / $3;
}
|
'-' expression %prec UNARY
{
$$ = -$2;
}
|
'+' expression %prec UNARY
{
$$ = $2;
}
|
'(' expression ')'
{
$$ = $2;
}
|
NUMBER
{
$$ = stoul(d_scanner.matched());
}
;
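To see which results the precedence and associativity directives of this grammar imply, the following sketch evaluates the same expressions with a hand-written recursive-descent evaluator. This is a different technique from the LALR(1) parser that bisonc++ generates; the class Calc and its members are hypothetical and only mirror the grammar's precedence levels. As in the grammar, divisions by zero are assumed not to occur.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written recursive-descent evaluator (NOT bisonc++'s output)
class Calc
{
    std::string d_text;
    std::size_t d_pos = 0;

    public:
        explicit Calc(std::string const &text): d_text(text) {}

        int expression()                // lowest level: '+' and '-'
        {
            int value = term();
            while (true)
            {
                char const op = peek();
                if (op != '+' && op != '-')
                    return value;
                ++d_pos;
                int const rhs = term();
                value = op == '+' ? value + rhs : value - rhs;
            }
        }

    private:
        char peek()                     // next non-blank char, or 0
        {
            while (d_pos < d_text.size() &&
                   (d_text[d_pos] == ' ' || d_text[d_pos] == '\t'))
                ++d_pos;
            return d_pos < d_text.size() ? d_text[d_pos] : '\0';
        }
        int term()                      // next level: '*' and '/'
        {
            int value = unary();
            while (true)
            {
                char const op = peek();
                if (op != '*' && op != '/')
                    return value;
                ++d_pos;
                int const rhs = unary();
                value = op == '*' ? value * rhs : value / rhs;
            }
        }
        int unary()                     // highest level: unary '-', '+'
        {
            char const op = peek();
            if (op != '-' && op != '+')
                return primary();
            ++d_pos;
            int const value = unary();
            return op == '-' ? -value : value;
        }
        int primary()                   // parenthesized expr. or number
        {
            if (peek() == '(')
            {
                ++d_pos;
                int const value = expression();
                if (peek() == ')')
                    ++d_pos;
                return value;
            }
            int value = 0;
            while (d_pos < d_text.size() &&
                   std::isdigit(static_cast<unsigned char>(d_text[d_pos])))
                value = value * 10 + (d_text[d_pos++] - '0');
            return value;
        }
};

// Small usage check
void calcDemo()
{
    assert(Calc("1 + 2 * 3").expression() == 7);    // '*' binds tighter
    assert(Calc("8 - 3 - 2").expression() == 3);    // left associative
}
```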
Bisonc++ processes this file, generating the following
files:
- o
- The parser’s base class, which should not be modified by the
programmer:
-
// hdr/includes
#ifndef ParserBase_h_included
#define ParserBase_h_included
#include <exception>
#include <vector>
#include <iostream>
// $insert preincludes
#include "../tokens/tokens.h"
// hdr/baseclass
namespace // anonymous
{
struct PI_;
}
// $insert parserbase
class ParserBase: public Tokens
{
public:
enum DebugMode_
{
OFF = 0,
ON = 1 << 0,
ACTIONCASES = 1 << 1
};
// $insert tokens
// $insert STYPE
using STYPE_ = int;
private:
// state semval
using StatePair = std::pair<size_t, STYPE_>;
// token semval
using TokenPair = std::pair<int, STYPE_>;
int d_stackIdx = -1;
std::vector<StatePair> d_stateStack;
StatePair *d_vsp = 0; // points to the topmost value stack
size_t d_state = 0;
TokenPair d_next;
int d_token;
bool d_terminalToken = false;
bool d_recovery = false;
protected:
enum Return_
{
PARSE_ACCEPT_ = 0, // values used as parse()’s return values
PARSE_ABORT_ = 1
};
enum ErrorRecovery_
{
UNEXPECTED_TOKEN_,
};
bool d_actionCases_ = false; // set by options/directives
bool d_debug_ = true;
size_t d_requiredTokens_;
size_t d_nErrors_; // initialized by clearin()
size_t d_acceptedTokens_;
STYPE_ d_val_;
ParserBase();
void ABORT() const;
void ACCEPT() const;
void ERROR() const;
STYPE_ &vs_(int idx); // value stack element idx
int lookup_() const;
int savedToken_() const;
int token_() const;
size_t stackSize_() const;
size_t state_() const;
size_t top_() const;
void clearin_();
void errorVerbose_();
void lex_(int token);
void popToken_();
void pop_(size_t count = 1);
void pushToken_(int token);
void push_(size_t nextState);
void redoToken_();
bool recovery_() const;
void reduce_(int rule);
void shift_(int state);
void startRecovery_();
public:
void setDebug(bool mode);
void setDebug(DebugMode_ mode);
};
// hdr/abort
inline void ParserBase::ABORT() const
{
throw PARSE_ABORT_;
}
// hdr/accept
inline void ParserBase::ACCEPT() const
{
throw PARSE_ACCEPT_;
}
// hdr/error
inline void ParserBase::ERROR() const
{
throw UNEXPECTED_TOKEN_;
}
// hdr/savedtoken
inline int ParserBase::savedToken_() const
{
return d_next.first;
}
// hdr/opbitand
inline ParserBase::DebugMode_ operator&(ParserBase::DebugMode_ lhs,
ParserBase::DebugMode_ rhs)
{
return static_cast<ParserBase::DebugMode_>(
static_cast<int>(lhs) & rhs);
}
// hdr/opbitor
inline ParserBase::DebugMode_ operator|(ParserBase::DebugMode_ lhs,
ParserBase::DebugMode_ rhs)
{
return static_cast<ParserBase::DebugMode_>(static_cast<int>(lhs) | rhs);
};
// hdr/recovery
inline bool ParserBase::recovery_() const
{
return d_recovery;
}
// hdr/stacksize
inline size_t ParserBase::stackSize_() const
{
return d_stackIdx + 1;
}
// hdr/state
inline size_t ParserBase::state_() const
{
return d_state;
}
// hdr/token
inline int ParserBase::token_() const
{
return d_token;
}
// hdr/vs
inline ParserBase::STYPE_ &ParserBase::vs_(int idx)
{
return (d_vsp + idx)->second;
}
#endif
- o
- The parser class parser.h itself. In the grammar specification
various member functions are used (e.g., done and prompt).
These functions are so small that they can very well be implemented
inline. Note that done calls ACCEPT to terminate further
parsing. ACCEPT and related members (e.g., ABORT) can be
called from any member called by parse. As a consequence, action
blocks could contain mere function calls, rather than several statements,
thus minimizing the need to rerun bisonc++ when an action is
modified.
- Once bisonc++ has created parser.h additionally required
members can be added to it (bisonc++ itself won’t modify
parser.h anymore once it is created), resulting in the following
final version:
-
// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:49:17 +0200
#ifndef Parser_h_included
#define Parser_h_included
// $insert baseclass
#include "parserbase.h"
// $insert scanner.h
#include "../scanner/scanner.h"
#undef Parser
class Parser: public ParserBase
{
// $insert scannerobject
Scanner d_scanner;
public:
int parse();
private:
void error(); // called on (syntax) errors
int lex(); // returns the next token from the
// lexical scanner.
void print(); // use, e.g., d_token, d_loc
void prompt();
void done();
// support functions for parse():
void executeAction_(int ruleNr);
void errorRecovery_();
void nextCycle_();
void nextToken_();
void print_();
void exceptionHandler(std::exception const &exc);
};
inline void Parser::prompt()
{
std::cout << "? " << std::flush;
}
inline void Parser::done()
{
std::cout << "Done\n";
ACCEPT();
}
#endif
- o
- The file ../tokens/tokens.h is generated because of the
%token-path directive. To avoid circular dependencies the tokens
are made available in a separate file, allowing classes used by the parser
to use the grammar’s tokens as well. Here is the file specifying
the grammar’s tokens:
-
#ifndef INCLUDED_TOKENS_
#define INCLUDED_TOKENS_
struct Tokens
{
// Symbolic tokens:
enum Tokens_
{
NUMBER = 257,
EOLN,
UNARY,
};
};
#endif
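The base class listing shows that ACCEPT and ABORT are implemented by throwing a Return_ value, which explains why they can be called from any member that is called by parse. The sketch below models that mechanism; MiniParser, its members and the enum Return are hypothetical, and the model assumes that parse catches the thrown value and returns it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum Return { PARSE_ACCEPT = 0, PARSE_ABORT = 1 };  // hypothetical names

class MiniParser
{
    std::string d_input;
    std::size_t d_handled = 0;

    public:
        explicit MiniParser(std::string const &input): d_input(input) {}

        int parse()
        {
            try
            {
                for (char ch: d_input)
                    handle(ch);
                return PARSE_ACCEPT;    // the model's choice at end of input
            }
            catch (Return ret)          // thrown by ACCEPT() or ABORT()
            {
                return ret;
            }
        }
        std::size_t handled() const { return d_handled; }

    private:
        void handle(char ch)            // plays the role of an action block
        {
            ++d_handled;
            if (ch == 'q')
                done();
            if (ch == '!')
                ABORT();
        }
        void done() { ACCEPT(); }       // nested call, still ends parse()
        void ACCEPT() const { throw PARSE_ACCEPT; }
        void ABORT() const { throw PARSE_ABORT; }
};

// Small usage check: parsing stops at the 'q', later input is not handled
void acceptDemo()
{
    MiniParser quitting("abqcd");
    assert(quitting.parse() == PARSE_ACCEPT);
    assert(quitting.handled() == 3);
}
```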
For the program no additional members had to be defined in the
class Parser. The member function parse is defined by
bisonc++ in the source file parse.cc, and it includes
parser.ih.
As cout is used in the grammar’s actions, a using
namespace std or comparable directive is required. It is specified in
parser.ih. Here is the implementation header declaring the standard
namespace:
// Generated by Bisonc++ V5.00.00 on Sun, 03 Apr 2016 17:51:26 +0200
// Include this file in the sources of the class Parser.
// $insert class.h
#include "parser.h"
inline void Parser::error()
{
std::cerr << "Syntax error\n";
}
// $insert lex
inline int Parser::lex()
{
return d_scanner.lex();
}
inline void Parser::print()
{
print_(); // displays tokens if --print was specified
}
inline void Parser::exceptionHandler(std::exception const &exc)
{
throw; // re-implement to handle exceptions thrown by actions
}
// Add here includes that are only required for the compilation
// of Parser’s sources.
// UN-comment the next using-declaration if you want to use
// in Parser’s sources symbols from the namespace std without
// specifying std::
using namespace std;
In the current context the member function parse’s
implementation is not very relevant (it should not be modified by the
programmer anyway). It is not shown here, but is available as
calculator/parser/parse.cc in the distribution’s demos/
directory after building the calculator using the build script provided
there.
The lexical scanner is generated by flexc++(1) from the
following specification file, using the command flexc++ lexer:
// see also regression/calculator/scanner
%interactive
%filenames scanner
%%
[ \t]+ // skip white space
\n return Tokens::EOLN;
[0-9]+ return Tokens::NUMBER;
. return matched()[0];
%%
Finally, here is the program’s main function:
#include "parser/parser.h"
int main()
{
Parser calculator;
return calculator.parse();
}
bison(1), bison++(1), bisonc++(1),
bisonc++api(3), bison.info (using texinfo), flexc++(1),
https://fbb-git.gitlab.io/bisoncpp/
Lakos, J. (2001) Large Scale C++ Software Design, Addison
Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison
Wesley.
Frank B. Brokken (f.b.brokken@rug.nl).