groff_out(5) | File Formats Manual | groff_out(5) |
groff_out - GNU roff intermediate output format
The fundamental operation of the troff(1) formatter is the translation of the groff(7) input language into a series of instructions concerned primarily with placing glyphs or geometric objects at specific positions on a rectangular page. In the following discussion, the term command refers to this intermediate output language, never to the groff(7) language intended for use by document authors. Intermediate output commands comprise several categories: glyph output; font, color, and text size selection; motion of the printing position; page advancement; drawing of geometric primitives; and device control commands, a catch-all for other operations. The last includes directives to start and stop output, identify the intended output device, and embed URL hyperlinks in supported output formats.
Because the front-end command groff(1) is a wrapper that normally runs the troff formatter to generate intermediate output and an output driver (“postprocessor”) to consume it, users normally do not encounter this language. The groff program's -Z option inhibits postprocessing such that this intermediate output is sent to the standard output stream as when troff is run manually.
groff's intermediate output facilitates the development of output drivers and other postprocessors by offering a common programming interface. It is an extension of the page description language developed by Brian Kernighan for AT&T device-independent troff circa 1980. Where a distinction is necessary, we will say “troff output” to describe the output of GNU troff, and “intermediate output” to denote the language accepted by the parser implemented in groff's internal C++ library used by most of its output drivers.
During the run of troff, the roff input is broken down into information about what is to be printed at which position on the intended device. The language of the intermediate output format can therefore be quite small. Its only elements are commands with or without arguments. There are commands for positioning and writing text, for drawing, and for device control.
Classical troff output had strange requirements regarding whitespace. The groff output parser, however, is lenient about whitespace, making it optional wherever possible. The whitespace characters (tab, space, and newline) always have a syntactical meaning. They are never printable because spacing within the output is always done by positioning commands.
Any sequence of space or tab characters is treated as a single syntactical space. It separates commands and arguments, but is required only where a command code and its arguments would otherwise run together ambiguously. Most often, this happens when variable-length command names, arguments, argument lists, or command clusters meet. Commands and arguments with a known, fixed length need not be separated by syntactical space.
A line break is a syntactical element, too. Every command argument can be followed by whitespace, a comment, or a newline character. Thus a syntactical line break is defined to consist of optional syntactical space that is optionally followed by a comment, and a newline character.
The normal commands, those for positioning and text, consist of a single letter taking a fixed number of arguments. For historical reasons, the parser allows stacking of such commands on the same line, but fortunately, in groff intermediate output, every command with at least one argument is followed by a line break, thus providing excellent readability.
The other commands, those for drawing and device control, have a more complicated structure; some recognize long command names, and some take a variable number of arguments. All D and x commands were therefore designed to require a syntactical line break after their last argument. Only one command, ‘x X’, has an argument that can stretch over several lines; all other commands must have all of their arguments on the same line as the command, i.e., the arguments may not be split by a line break.
Lines containing only spaces and/or a comment are treated as empty and ignored.
Some commands accept integer arguments that represent measurements, but the scaling units of the formatter's language are never used. Most commands assume a scaling unit of “u” (basic units), and others use “z” (scaled points); these are defined by the parameters specified in the device's DESC file; see groff_font(5) and, for more on scaling units, groff(7) and Groff: The GNU Implementation of troff, the groff Texinfo manual. Color-related commands use dimensionless integers.
Note that single characters can have the eighth bit set, as can the names of fonts and special characters (that is, glyphs). The names of glyphs and fonts can be of arbitrary length. A glyph that is to be printed will always be in the current font.
A string argument is always terminated by the next whitespace character (space, tab, or newline); an embedded # character is regarded as part of the argument, not as the beginning of a comment command. An integer argument is terminated by the next non-digit character, which is then regarded as the first character of the next argument or command.
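The termination rules above can be illustrated with a small tokenizer. The following Python sketch is not part of groff; the names split_commands and ARG_SHAPES are hypothetical, and the table covers only a subset of the simple commands. It assumes that a # seen where a command is expected starts a comment that runs to the end of the line.

```python
import re

# Argument shapes for a subset of the simple commands: "i" is an integer
# argument, "s" a string argument. This table is illustrative, not complete.
ARG_SHAPES = {
    "H": "i", "V": "i", "h": "i", "v": "i",
    "f": "i", "s": "i", "p": "i", "n": "ii",
    "t": "s", "C": "s", "w": "",
}

INTEGER = re.compile(r"-?[0-9]+")   # ends at the first non-digit character
STRING = re.compile(r"[^ \t\n]+")   # ends at the next whitespace; '#' is kept


def split_commands(line):
    """Split one line of stacked simple commands into (code, args) pairs."""
    pos, commands = 0, []
    while pos < len(line):
        char = line[pos]
        if char in " \t\n":          # syntactical space between commands
            pos += 1
            continue
        if char == "#":              # a comment runs to the end of the line
            break
        if char not in ARG_SHAPES:
            raise ValueError(f"command {char!r} is not covered by this sketch")
        pos += 1
        args = []
        for shape in ARG_SHAPES[char]:
            while pos < len(line) and line[pos] in " \t":
                pos += 1             # optional space before an argument
            match = (INTEGER if shape == "i" else STRING).match(line, pos)
            if match is None:
                raise ValueError(f"missing argument for {char!r}")
            text = match.group()
            args.append(int(text) if shape == "i" else text)
            pos = match.end()
        commands.append((char, tuple(args)))
    return commands


print(split_commands("V40H0thell wh24"))
```

Note that the space after thell is required because t takes a string argument, whereas V40H0 needs no separator because the integer 40 ends at the letter H.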
A correct intermediate output document consists of two parts, the prologue and the body.
The task of the prologue is to set the general device parameters using three exactly specified commands. The groff prologue is guaranteed to consist of the following three lines (in that order):
x T device
x res n h v
x init
with the arguments set as outlined in subsection “Device Control Commands” below. However, the parser for the intermediate output format is able to swallow additional whitespace and comments as well.
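A reader of intermediate output might check the prologue along the following lines. This Python sketch is a simplification under stated assumptions: read_prologue is a hypothetical name, only whole-line comments are skipped, and the x command and its subcommand are assumed to be separated by whitespace, which is stricter than the groff parser.

```python
from itertools import islice


def significant_lines(text):
    """Yield lines that are neither empty nor whole-line comments."""
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            yield line


def read_prologue(text):
    """Return (device, res, h, v) from the three prologue commands."""
    lines = list(islice(significant_lines(text), 3))
    if len(lines) < 3:
        raise ValueError("prologue is incomplete")
    fields = [line.split() for line in lines]
    for entry in fields:
        if len(entry) < 2 or entry[0] != "x":
            raise ValueError("prologue lines must be x commands")
    # Only the first letter of each subcommand word is significant.
    letters = [entry[1][0] for entry in fields]
    if letters != ["T", "r", "i"]:
        raise ValueError("expected x T, x res, x init in that order")
    if len(fields[0]) != 3 or len(fields[1]) != 5:
        raise ValueError("wrong number of arguments in prologue")
    device = fields[0][2]
    res, h, v = (int(word) for word in fields[1][2:5])
    return device, res, h, v


sample = "# prologue\nx T latin1\nx res 240 24 40\nx init\np1\n"
print(read_prologue(sample))
```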
The body is the main section for processing the document data. Syntactically, it is a sequence of any commands different from the ones used in the prologue. Processing is terminated as soon as the first x stop command is encountered; the last line of any groff intermediate output always contains such a command.
Semantically, the body is page oriented. A new page is started by a p command. Positioning, writing, and drawing commands are always done within the current page, so they cannot occur before the first p command. Absolute positioning (by the H and V commands) is done relative to the current page; all other positioning is done relative to the current location within this page.
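The difference between absolute and relative positioning can be sketched as follows. This Python fragment is only an illustration: track_position is a hypothetical name, the input is assumed to be already split into (code, integer) pairs, and the vertical position is assumed to restart at 0 on each new page. Commands that print text also advance the position by the width of the text, which depends on font metrics and is deliberately left out here.

```python
def track_position(commands):
    """Follow explicit motions; return (page, h, v) after each command."""
    page = None
    h = v = 0
    trace = []
    for code, n in commands:
        if code == "p":
            page, v = n, 0            # assumption: new page starts at v = 0
        elif page is None:
            raise ValueError("positioning before the first p command")
        elif code == "H":
            h = n                     # absolute, relative to the page
        elif code == "V":
            v = n
        elif code == "h":
            h += n                    # relative to the current location
        elif code == "v":
            v += n
        else:
            raise ValueError(f"command {code!r} is not covered by this sketch")
        trace.append((page, h, v))
    return trace


print(track_position([("p", 1), ("V", 40), ("H", 0), ("h", 24), ("v", 40)]))
```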
This section describes all intermediate output commands, the classical commands as well as the groff extensions.
The commands in this subsection have a command code consisting of a single character, taking a fixed number of arguments. Most of them are commands for positioning and text writing. These commands are lenient about whitespace. Optionally, syntactical space can be inserted before, after, and between the command letter and its arguments. All of these commands are stackable, i.e., they can be preceded by other simple commands or followed by arbitrary other commands on the same line. A separating syntactical space is necessary only when two integer arguments would clash or when the preceding command ends with a string argument.
Each graphics or drawing command in the intermediate output starts with the letter D followed by one or two characters that specify a subcommand; this is followed by a fixed or variable number of integer arguments that are separated by a single space character. A D command may not be followed by another command on the same line (apart from a comment), so each D command is terminated by a syntactical line break.
troff output follows the classical spacing rules (no space between command and subcommand, all arguments are preceded by a single space character), but the parser allows optional space between the command letters and makes the space before the first argument optional. As usual, each space can be any sequence of tab and space characters.
Some graphics commands can take a variable number of arguments. In this case, they are integers representing a size measured in basic units u. The h arguments stand for horizontal distances where positive means right, negative left. The v arguments stand for vertical distances where positive means down, negative up. All these distances are offsets relative to the current location.
Unless indicated otherwise, each graphics command directly corresponds to a similar groff \D escape sequence; see groff(7).
Unknown D commands are assumed to be device-specific. Their arguments are parsed as strings; all of this information is then sent to the postprocessor.
In the following command reference, the syntax element ⟨line-break⟩ means a syntactical line break as defined in subsection “Separation” above.
mg 0 0 65536
Df -1
This command is a groff extension.
Each device control command starts with the letter x followed by a space character (optional or arbitrary space/tab in groff) and a subcommand letter or word; each argument (if any) must be preceded by a syntactical space. All x commands are terminated by a syntactical line break; no device control command can be followed by another command on the same line (except a comment).
The subcommand is essentially a single letter, but to increase readability, it can be written as a word, i.e., an arbitrary sequence of characters terminated by the next tab, space, or newline character. All characters of the subcommand word but the first are simply ignored. For example, troff outputs the initialization command x i as x init and the resolution command x r as x res. Spellings such as x i_like_groff and x roff_is_groff are accepted as well and mean the same commands.
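The rule that only the first character of the subcommand word matters can be shown in a few lines of Python. This is a sketch only; x_subcommand is a hypothetical name, and splitting on whitespace is a simplification that would not preserve the spacing of arguments whose exact text matters, such as those of x X.

```python
def x_subcommand(line):
    """Return (subcommand letter, remaining arguments) for an x command."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "x":
        raise ValueError("not a device control command")
    # Everything after the first character of the word is ignored.
    return fields[1][0], fields[2:]


for line in ("x init", "x i_like_groff", "x res 240 24 40", "x roff_is_groff 240 24 40"):
    print(line, "->", x_subcommand(line))
```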
In the following, the syntax element ⟨line-break⟩ means a syntactical line break as defined in subsection “Separation” above.
In classical troff output, emitting a single glyph was mostly done by a very strange command that combined a horizontal move and the printing of a glyph. It had no command code, but was represented by a 3-character argument consisting of exactly two digits and a character.
In groff, arbitrary syntactical space may be added around and within this command. A separating space is obligatory only when a preceding command on the same line ends with an argument of variable length. In classical troff, large clusters of these and other commands were used, mostly without spaces; this made such output almost unreadable.
For modern high-resolution devices, this command does not make sense because the width of the glyphs can become much larger than two decimal digits. In groff, it is used only for output to the X75, X75-12, X100, and X100-12 devices. For others, the commands t and u provide greater functionality and superior troubleshooting capacity.
The roff postprocessors are programs whose task is to translate the intermediate output into actions that are sent to a device. A device can be a piece of hardware such as a printer, or a software file format suitable for graphical or text processing. The groff system provides facilities that make programming such postprocessors comparatively easy.
There is a library function that parses the intermediate output and sends the information obtained to the device via methods of a class with a common interface for each device. So a groff postprocessor must only redefine the methods of this class. For details, see the reference in section “Files” below.
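The division of labor described above, one parser that calls methods of a device class, can be mimicked in Python. The following sketch is an analogy, not groff's C++ interface: the names Device, Recorder, and drive and their methods are hypothetical, each command is assumed to sit on its own line, and only a handful of commands are dispatched.

```python
class Device:
    """Common interface; a concrete device overrides what it needs."""

    def begin_page(self, number):
        pass

    def move_to(self, axis, value):
        pass

    def text(self, word):
        pass


class Recorder(Device):
    """Toy device that records page starts and text words."""

    def __init__(self):
        self.events = []

    def begin_page(self, number):
        self.events.append(("page", number))

    def text(self, word):
        self.events.append(("text", word))


def drive(lines, device):
    """Parse one command per line and call the matching device method."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        code, rest = line[0], line[1:].strip()
        if code == "p":
            device.begin_page(int(rest))
        elif code == "t":
            device.text(rest)
        elif code in ("H", "V"):
            device.move_to(code, int(rest))
        elif code == "x" and rest.startswith("s"):
            break                     # x stop ends processing
        # all other commands are ignored in this sketch


recorder = Recorder()
drive(["x T latin1", "p1", "V40", "H0", "thell", "wh24", "tworld", "x stop", "tignored"], recorder)
print(recorder.events)
```

A new output format would then only need another subclass of Device; the parsing loop stays unchanged.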
This section presents the intermediate output generated from the same input for three different devices. The input is the sentence hell world fed into groff on the command line.
shell> echo "hell world" | groff -Z -T ps
x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s10000
V12000
H72000
thell
wh2500
tw
H96620
torld
n12000 0
x trailer
V792000
x stop
This output can be fed into the postprocessor grops(1) to get its representation as a PostScript file, or gropdf(1) to output directly to PDF.
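To relate the numbers in this output to familiar units, the following Python sketch converts them to points. It reads the first argument of x res as the number of basic units per inch; the sizescale value of 1000 is an assumption about the ps device's DESC file (see groff_font(5)), and the function names are hypothetical.

```python
from fractions import Fraction

RES = 72000        # from "x res 72000 1 1": basic units per inch
SIZESCALE = 1000   # assumption: "sizescale" value in the ps DESC file


def units_to_points(u, res=RES):
    """Convert basic units to points (72 points per inch)."""
    return Fraction(u * 72, res)


def scaled_points_to_points(z, sizescale=SIZESCALE):
    """Convert scaled points, as used by the s command, to points."""
    return Fraction(z, sizescale)


print("V12000  ->", units_to_points(12000), "pt from the top of the page")
print("H72000  ->", units_to_points(72000), "pt from the left edge")
print("wh2500  ->", float(units_to_points(2500)), "pt word space")
print("s10000  ->", scaled_points_to_points(10000), "pt type size")
print("V792000 ->", units_to_points(792000) / 72, "in page length")
```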
This is similar to the high-resolution device except that the positioning is done at a coarser scale. Some comments (lines starting with #) were added for clarification; they were not generated by the formatter.
shell> echo "hell world" | groff -Z -T latin1
# prologue
x T latin1
x res 240 24 40
x init
# begin a new page
p1
# font setup
x font 1 R
f1
s10
# initial positioning on the page
V40
H0
# write text 'hell'
thell
# inform about a space, and do it by a horizontal jump
wh24
# write text 'world'
tworld
# announce line break, but do nothing because ...
n40 0
# ... the end of the document has been reached
x trailer
V2640
x stop
This output can be fed into the postprocessor grotty(1) to get a formatted text document.
As a computer monitor has a very low resolution compared to modern printers, the intermediate output for the X devices can use the jump-and-write command with its 2-digit displacements.
shell> echo "hell world" | groff -Z -T X100
x T X100
x res 100 1 1
x init
p1
x font 5 TR
f5
s10
V16
H100
# write text with old-style jump-and-write command
ch07e07l03lw06w11o07r05l03dh7
n16 0
x trailer
V1100
x stop
This output can be fed into the postprocessor xditview(1x) or gxditview(1) for displaying in X.
Due to the obsolete jump-and-write command, the text clusters in the classical output are almost unreadable.
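Such a cluster becomes readable once it is split into its parts. The Python sketch below is a hedged illustration, not groff code: decode_cluster is a hypothetical name, and it handles only the elements that occur in the cluster shown for the X100 device, namely c followed by a glyph, w, h followed by an integer, and the jump-and-write form of two digits plus a glyph.

```python
def decode_cluster(cluster):
    """Split a command cluster into ("glyph", c), ("move", n), and
    ("word-space",) tuples."""
    ops = []
    i = 0
    while i < len(cluster):
        ch = cluster[i]
        if ch == "c":                         # c: print the next character
            ops.append(("glyph", cluster[i + 1]))
            i += 2
        elif ch == "w":                       # w: a word space follows
            ops.append(("word-space",))
            i += 1
        elif ch == "h":                       # h: relative move, integer arg
            j = i + 1
            while j < len(cluster) and cluster[j].isdigit():
                j += 1
            ops.append(("move", int(cluster[i + 1:j])))
            i = j
        elif ch.isdigit():                    # ddc: move by dd, then print c
            ops.append(("move", int(cluster[i:i + 2])))
            ops.append(("glyph", cluster[i + 2]))
            i += 3
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return ops


ops = decode_cluster("ch07e07l03lw06w11o07r05l03dh7")
for op in ops:
    print(op)
print("".join(op[1] for op in ops if op[0] == "glyph"))
```

The two digits are the horizontal distance from the previous glyph's position to the next one, which is why narrow letters such as l are followed by small values.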
The intermediate output language of the classical troff was first documented in [CSTR #97]. The groff intermediate output format is compatible with this specification except for the following features.
The differences between groff and classical troff are documented in groff_diff(7).
James Clark wrote an early version of this document, which described only the differences between AT&T device-independent troff's output format and that of GNU roff. The present version was completely rewritten in 2001 by Bernd Warken.
Groff: The GNU Implementation of troff, by Trent A. Fisher and Werner Lemberg, is the primary groff manual. You can browse it interactively with “info groff”.
“Troff User's Manual” by Joseph F. Ossanna, 1976 (revised by Brian W. Kernighan, 1992), AT&T Bell Laboratories Computing Science Technical Report No. 54, widely called simply “CSTR #54”, documents the language, device and font description file formats, and device-independent output format referred to collectively in groff documentation as “AT&T troff”.
“A Typesetter-independent TROFF” by Brian W. Kernighan, 1982, AT&T Bell Laboratories Computing Science Technical Report No. 97, provides additional insights into the device and font description file formats and device-independent output format.
grodvi(1), grohtml(1), grolbp(1), grolj4(1), gropdf(1), grops(1), and grotty(1) are groff postprocessors.
31 March 2024 | groff 1.23.0 |