SMLYACC(1) General Commands Manual SMLYACC(1)

smlyacc - the parser generator for SML#

smlyacc [-s] [-p output_prefix] filename

SMLYacc is a parser generator in the style of ML-Yacc. It accepts ML-Yacc grammar files, but the programs it generates, and the way they are used, are not compatible with those of ML-Yacc. The generated programs can be compiled with the SML# compiler.

By default, for an input file X.grm, smlyacc generates three files. X.grm.sml contains the generated parser, X.grm.sig contains the signature of the tokens, and the optional X.grm.desc describes the LALR parser construction. To compile the generated program with SML#, you must write an interface file X.grm.smi yourself, based on the generated signature in X.grm.sig.

-s     Insert the token signature at the beginning of the generated .sml file instead of writing it to a separate .sig file.
-p output_prefix
       Set the prefix of the output file names. When output_prefix is X, smlyacc generates X.sml, X.sig, and X.desc. By default, the prefix is the input file name.

The following is a minimal example of an input file ex.grm:

%%
%term LPAREN | RPAREN | EOF
%nonterm start of word | exp of word
%pos int
%eop EOF
%name Example
%%
start : exp (exp)
exp : (0w0)
    | LPAREN exp RPAREN exp (exp1 + exp2)

Running smlyacc on this file,

smlyacc ex.grm

produces two files, ex.grm.sml and ex.grm.sig. Only ex.grm.sml needs to be compiled. To compile it, you must write the following interface file ex.grm.smi yourself:

_require "basis.smi"
_require local "ml-yacc-lib.smi"
_require local "./ex.grm.sig"
structure ExampleLrVals =
struct
  structure Parser =
  struct
    type token (= boxed)
    type stream (= ref)
    type result = word
    type pos = int
    type arg = unit
    val makeStream : {lexer : unit -> token} -> stream
    val consStream : token * stream -> stream
    val getStream : stream -> token * stream
    val sameToken : token * token -> bool
    val parse : {lookahead : int,
                 stream : stream,
                 error : string * pos * pos -> unit,
                 arg : arg}
                -> result * stream
  end
  structure Tokens =
  struct
    type pos = Parser.pos
    type token = Parser.token
    val EOF: pos * pos -> token
    val RPAREN: pos * pos -> token
    val LPAREN: pos * pos -> token
  end
end

The types of the token constructors (EOF, RPAREN, and LPAREN) are copied by hand from the generated signature file ex.grm.sig.

The parse function in the generated program is the parser. To invoke it, you need an imperative lexer function of type unit -> token, which returns the next token each time it is called. When SMLYacc is combined with SMLLex, this lexer is generated by SMLLex. Suppose that SMLLex generates a lexer with the following interface:

structure ExampleLex =
struct
  exception LexError
  val makeLexer : (int -> string)
                  -> unit -> ExampleLrVals.Parser.token
end

Typical code connecting SMLLex and SMLYacc looks like the following:

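val instream = TextIO.stdIn   (* read the source text from standard input *)
fun inputN n = TextIO.inputN (instream, n)
val lexer = ExampleLex.makeLexer inputN
val stream = ExampleLrVals.Parser.makeStream {lexer = lexer}
(* report a syntax error together with its start and end positions *)
fun errorFn (msg, left : int, right : int) =
    print (Int.toString left ^ "-" ^ Int.toString right ^ ": " ^ msg ^ "\n")
val (result, stream) =
    ExampleLrVals.Parser.parse
      {lookahead = 0, stream = stream,
       error = errorFn, arg = ()}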
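The lexer does not have to come from SMLLex. Any function of type unit -> token will work. The following is a minimal sketch, not part of smlyacc, of a hand-written lexer for ex.grm that reads a string of parentheses, together with a driver that runs the parser on it. It relies only on the ExampleLrVals interface declared in ex.grm.smi above. The names stringLexer and parseString are hypothetical. The file containing them would need its own .smi file that requires ex.grm.smi. Positions are character offsets, matching %pos int.

```sml
(* Hypothetical helper, not part of smlyacc: builds an imperative lexer of
   type unit -> token over an in-memory string, using the token
   constructors declared in ex.grm.smi. Positions are character offsets. *)
fun stringLexer (input : string) : unit -> ExampleLrVals.Parser.token =
    let
      val index = ref 0
      val len = size input
      fun next () =
          if !index >= len
          then ExampleLrVals.Tokens.EOF (len, len)  (* keep returning EOF at the end *)
          else
            let
              val i = !index
              val c = String.sub (input, i)
            in
              index := i + 1;
              case c of
                #"(" => ExampleLrVals.Tokens.LPAREN (i, i + 1)
              | #")" => ExampleLrVals.Tokens.RPAREN (i, i + 1)
              | _ =>
                if Char.isSpace c then next ()  (* skip whitespace *)
                else raise Fail ("unexpected character at offset "
                                 ^ Int.toString i)
            end
    in
      next
    end

(* Hypothetical driver: parses a string with the generated parser and
   reports syntax errors on standard error. *)
fun parseString (input : string) : word =
    let
      val stream =
          ExampleLrVals.Parser.makeStream {lexer = stringLexer input}
      fun error (msg, left : int, right : int) =
          TextIO.output (TextIO.stdErr,
                         Int.toString left ^ "-" ^ Int.toString right
                         ^ ": " ^ msg ^ "\n")
      val (result, _) =
          ExampleLrVals.Parser.parse
            {lookahead = 0, stream = stream, error = error, arg = ()}
    in
      result
    end
```

For example, parseString "(())()" returns 0w0, because every semantic action in ex.grm builds its value from 0w0 by addition. Skipping whitespace inside the lexer keeps layout tokens out of the grammar.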

SMLYacc is a derivative of ML-Yacc, which was originally developed by David R. Tarditi and Andrew W. Appel. When ML-Yacc was ported to SML#, its source code was restructured to replace functor applications with SML#'s separate compilation and linking. See the SML# documentation for the major changes from the original ML-Yacc.

smllex(1)
ML-Yacc User's Manual, available at https://www.smlnj.org/doc/ML-Yacc/
SML# Document, available at https://www.pllab.riec.tohoku.ac.jp/smlsharp/docs/