Pod::Man(3perl) | Perl Programmers Reference Guide | Pod::Man(3perl) |
Pod::Man - Convert POD data to formatted *roff input
    use Pod::Man;
    my $parser = Pod::Man->new (release => $VERSION, section => 8);

    # Read POD from STDIN and write to STDOUT.
    $parser->parse_file (\*STDIN);

    # Read POD from file.pod and write to file.1.
    $parser->parse_from_file ('file.pod', 'file.1');
Pod::Man is a module to convert documentation in the POD format (the preferred language for documenting Perl) into *roff input using the man macro set. The resulting *roff code is suitable for display on a terminal using nroff(1), normally via man(1), or printing using troff(1). It is conventionally invoked using the driver script pod2man, but it can also be used directly.
By default (on non-EBCDIC systems), Pod::Man outputs UTF-8. Its output should work with the man program on systems that use groff (most Linux distributions) or mandoc (most BSD variants), but may result in mangled output on older UNIX systems. To choose a different, possibly more backward-compatible output mangling on such systems, set the "encoding" option to "roff" (the default in earlier Pod::Man versions). See the "encoding" option and "ENCODING" for more details.
See "COMPATIBILITY" for the versions of Pod::Man with significant backward-incompatible changes (other than constructor options, whose versions are documented below), and the versions of Perl that included them.
If the output contains characters that cannot be represented in this encoding, that is an error that will be reported as configured by the "errors" option. If error handling is other than "die", the unrepresentable character will be replaced with the Encode substitution character (normally "?").
If the "encoding" option is set to the special value "groff" (the default on EBCDIC systems), or if the Encode module is not available and the encoding is set to anything other than "roff", Pod::Man will translate all non-ASCII characters to "\[uNNNN]" Unicode escapes. These are not traditionally part of the *roff language, but are supported by groff and mandoc and thus by the majority of manual page processors in use today.
If the "encoding" option is set to the special value "roff", Pod::Man will do its historic transformation of (some) ISO 8859-1 characters into *roff escapes that may be adequate in troff and may be readable (if ugly) in nroff. This was the default behavior of versions of Pod::Man before 5.00. With this encoding, all other non-ASCII characters will be replaced with "X". It may be required for very old troff and nroff implementations that do not support UTF-8, but its representation of any non-ASCII character is very poor and often specific to European languages.
If the output file handle has a PerlIO encoding layer set, setting "encoding" to anything other than "groff" or "roff" will be ignored and no encoding will be done by Pod::Man. It will instead rely on the encoding layer to make whatever output encoding transformations are desired.
WARNING: The input encoding of the POD source is independent from the output encoding, and setting this option does not affect the interpretation of the POD input. Unless your POD source is US-ASCII, its encoding should be declared with the "=encoding" command in the source. If this is not done, Pod::Simple will attempt to guess the encoding and may be successful if it's Latin-1 or UTF-8, but it will produce warnings. See perlpod(1) for more information.
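As a minimal sketch of the special "groff" and "roff" values described above, the following renders the same POD twice. It assumes Pod::Man 5.00 or later, since older versions do not know the "encoding" option. The POD text, the page name "DEMO", and the render() helper are illustrative only. The POD is encoded to UTF-8 bytes before parsing and declares its own encoding with "=encoding", which is independent of the output encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Pod::Man;

# parse_string_document() expects raw bytes, so encode the POD first.
my $pod = encode('UTF-8',
    "=encoding UTF-8\n\n=head1 NAME\n\ndemo - a na\x{EF}ve example\n");

# Hypothetical helper: format $pod with the given output encoding.
sub render {
    my ($encoding) = @_;
    my $parser = Pod::Man->new(
        name     => 'DEMO',
        section  => 1,
        encoding => $encoding,
    );
    my $output;
    $parser->output_string(\$output);
    $parser->parse_string_document($pod);
    return $output;
}

my $groff = render('groff');    # non-ASCII becomes \[uNNNN] escapes
my $roff  = render('roff');     # historic *roff accent escapes or "X"

# Print only the line of each result that carries the sample text.
for my $result ($groff, $roff) {
    my ($line) = $result =~ /^(.*demo.*example.*)$/m;
    print defined $line ? "$line\n" : "(sample line not found)\n";
}
```

Comparing the two printed lines shows how each setting represents the accented character without relying on the terminal's locale.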
The default is "pod".
The special value "all" enables all guesswork. This is also the default for backward compatibility reasons. The special value "none" disables all guesswork. Otherwise, the value of this option should be a comma-separated list of one or more of the following keywords:
Any unknown guesswork name is silently ignored (for potential future compatibility), so be careful about spelling.
Specifically, this adds:
    .mso <language>.tmac
    .hla <language>
to the start of the file, which configure correct line breaking for the specified language. Without these commands, groff may not know how to add proper line breaks for Chinese and Japanese text if the manual page is installed into the normal manual page directory, such as /usr/share/man.
On many systems, this will be done automatically if the manual page is installed into a language-specific manual page directory, such as /usr/share/man/zh_CN. In that case, this option is not required.
Unfortunately, the commands added with this option are specific to groff and will not work with other troff and nroff implementations.
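A brief sketch of this behavior, assuming the option described here is the constructor option named "language" (available in Pod::Man 5.00 and later) and using "ja" as an example value. The page name and POD text are placeholders.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 NAME\n\ndemo - language option example\n";

my $parser = Pod::Man->new(
    name     => 'DEMO',
    section  => 1,
    language => 'ja',    # assumed option name; see the lead-in
);
my $output;
$parser->output_string(\$output);
$parser->parse_string_document($pod);

# Show any groff-specific language setup lines that were emitted.
my @setup = grep { /^\.(?:mso|hla)\b/ } split /\n/, $output;
print "$_\n" for @setup;
```

On versions of Pod::Man that do not support the option, nothing is printed.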
Also see the "quotes" option, which can be used to set both quotes at once. If both "quotes" and one of the other options is set, "lquote" or "rquote" overrides "quotes".
If generating a manual page from standard input, the name will be set to "STDIN" if this option is not provided. In this case, providing this option is strongly recommended to set a meaningful manual page name.
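For instance, when formatting POD that does not come from a named file, the name can be supplied explicitly. This is only a sketch: "MYTOOL", the section number, and the POD text are made-up values.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 NAME\n\nmytool - do something useful\n";

# Supply the page name explicitly instead of relying on the input source.
my $parser = Pod::Man->new(name => 'MYTOOL', section => 8);
my $output;
$parser->output_string(\$output);
$parser->parse_string_document($pod);

# The .TH line carries the page title and section.
my ($title_line) = $output =~ /^(\.TH .*)$/m;
print "$title_line\n" if defined $title_line;
```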
L<foo|http://example.com/>
is formatted as:
foo <http://example.com/>
This option, if set to a true value, suppresses the URL when anchor text is given, so this example would be formatted as just "foo". This can produce less cluttered output in cases where the URLs are not particularly important.
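The difference can be seen with a short sketch, assuming the option described here is the constructor option named "nourls". The render() helper, the page name, and the surrounding POD text are illustrative.

```perl
use strict;
use warnings;
use Pod::Man;

my $pod = "=head1 DESCRIPTION\n\nSee L<foo|http://example.com/> for details.\n";

# Hypothetical helper: format $pod with extra constructor options.
sub render {
    my (%options) = @_;
    my $parser = Pod::Man->new(name => 'DEMO', section => 1, %options);
    my $output;
    $parser->output_string(\$output);
    $parser->parse_string_document($pod);
    return $output;
}

my $with_urls    = render();
my $without_urls = render(nourls => 1);

# Report whether the URL survived in each rendering.
printf "default keeps URL: %s\n", ($with_urls    =~ /example\.com/ ? 'yes' : 'no');
printf "nourls keeps URL:  %s\n", ($without_urls =~ /example\.com/ ? 'yes' : 'no');
```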
This may also be set to the special value "none", in which case no quote marks are added around C<> text (but the font is still changed for troff output).
Also see the "lquote" and "rquote" options, which can be used to set the left and right quotes independently. If both "quotes" and one of the other options is set, "lquote" or "rquote" overrides "quotes".
Note that some system "an" macro sets assume that the centered footer will be a modification date and will prepend something like "Last modified: ". If this is the case for your target system, you may want to set "release" to the last modified date and "date" to the version number.
By default, section 1 will be used unless the file ends in ".pm" in which case section 3 will be selected.
This option is for backward compatibility with Pod::Man versions that did not support "errors". Normally, the "errors" option should be used instead.
As a derived class from Pod::Simple, Pod::Man supports the same methods and interfaces. See Pod::Simple for all the details. This section summarizes the most-frequently-used methods and the ones added by Pod::Man.
    my $man = Pod::Man->new();
    my $output;
    $man->output_string(\$output);
    $man->parse_file('/some/input/file');
Be aware that the output in that variable will already be encoded in UTF-8.
parse_from_filehandle() is provided for backward compatibility with older versions of Pod::Man. parse_from_file() should be used instead.
This method expects raw bytes, not decoded characters.
This method expects raw bytes, not decoded characters.
As of Pod::Man 5.00, the default output encoding for Pod::Man is UTF-8. This should work correctly on any modern system that uses either groff (most Linux distributions) or mandoc (Alpine Linux and most BSD variants, including macOS).
The user will probably have to use a UTF-8 locale to see correct output. This may be done by default; if not, set the LANG or LC_CTYPE environment variables to an appropriate locale. The locale "C.UTF-8" is available on most systems if one wants correct output without changing the other things locales affect, such as collation.
The backward-compatible output format used in Pod::Man versions before 5.00 is available by setting the "encoding" option to "roff". This may produce marginally nicer results on older UNIX versions that do not use groff or mandoc, but none of the available options will correctly render Unicode characters on those systems.
Below are some additional details about how this choice was made and some discussion of alternatives.
The default output encoding for Pod::Man has been a long-standing problem. troff and nroff predate Unicode by a significant margin, and their implementations for many UNIX systems reflect that legacy. It's common for Unicode to not be supported in any form.
Because of this, versions of Pod::Man prior to 5.00 maintained the highly conservative output of the original pod2man, which output pure ASCII with complex macros to simulate common western European accented characters when processed with troff. The nroff output was awkward and sometimes incorrect, and characters not used in western European scripts were replaced with "X". This choice maximized backwards compatibility with man and nroff/troff implementations at the cost of incorrect rendering of many POD documents, particularly those containing people's names.
The modern implementations, groff (used in most Linux distributions) and mandoc (used by most BSD variants), do now support Unicode. Other UNIX systems often do not, but they're now a tiny minority of the systems people use on a daily basis. It's increasingly common (for very good reasons) to use Unicode characters for POD documents rather than using ASCII conversions of people's names or avoiding non-English text, making the limitations in the old output format more apparent.
Four options have been proposed to fix this:
Pod::Man 5.00 and later makes the last choice. This arguably produces worse output when manual pages are formatted with troff into PostScript or PDF, but doing this is rare and normally manual, so the encoding can be changed in those cases. The older output encoding is available by setting "encoding" to "roff".
Here are the results of testing "encoding" values of "utf-8" and "groff" on various operating systems. The testing methodology was to create man/man1 in the current directory, copy encoding.utf8 or encoding.groff from the podlators 5.00 distribution to man/man1/encoding.1, and then run:
LANG=C.UTF-8 MANPATH=$(pwd)/man man 1 encoding
If the locale was not explicitly set to one that includes UTF-8, the Unicode characters were usually converted to ASCII (by, for example, dropping an accent) or deleted or replaced with "<?>" if there was no conversion.
Tested on 2022-09-25. Many thanks to the GCC Compile Farm project for access to testing hosts.
    OS                   UTF-8      groff
    ------------------   -------    -------
    AIX 7.1              no [1]     no [2]
    Alpine 3.15.0        yes        yes
    CentOS 7.9           yes        yes
    Debian 7             yes        yes
    FreeBSD 13.0         yes        yes
    NetBSD 9.2           yes        yes
    OpenBSD 7.1          yes        yes
    openSUSE Leap 15.4   yes        yes
    Solaris 10           yes        no [2]
    Solaris 11           no [3]     no [3]

I did not have access to a macOS system for testing, but since it uses mandoc, its behavior is probably the same as the BSD hosts.
Notes:
PostScript and PDF output using groff on a Debian 12 system do not support combining accent marks or SMP characters due to a lack of support in the default output font.
Testing on additional platforms is welcome. Please let the author know if you have additional results.
(Arguably, according to the specification, this variable should be used only if the timestamp of the input file is not available and Pod::Man uses the current time. However, for reproducible builds in Debian, results were more reliable if this variable overrode the timestamp of the input file.)
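A minimal sketch of pinning the generated date for a reproducible build, assuming the variable discussed here is SOURCE_DATE_EPOCH (the name used in the compatibility notes). POD_MAN_DATE is cleared first on the assumption that it would otherwise take precedence. The timestamp, page name, and POD text are arbitrary.

```perl
use strict;
use warnings;
use Pod::Man;

# Fix the timestamp used for the page date (seconds since the epoch, UTC).
$ENV{SOURCE_DATE_EPOCH} = 1_000_000_000;    # 2001-09-09 01:46:40 UTC
delete $ENV{POD_MAN_DATE};                  # assumed to override it if set

my $pod = "=head1 NAME\n\ndemo - reproducible date example\n";

my $parser = Pod::Man->new(name => 'DEMO', section => 1);
my $output;
$parser->output_string(\$output);
$parser->parse_string_document($pod);

# The date appears on the .TH line of the generated page.
my ($title_line) = $output =~ /^(\.TH .*)$/m;
print "$title_line\n" if defined $title_line;
```

Because dates are generated in UTC, the result should not depend on the time zone of the build host.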
Pod::Man 1.02 (based on Pod::Parser) was the first version included with Perl, in Perl 5.6.0.
The current API based on Pod::Simple was added in Pod::Man 2.00. Pod::Man 2.04 was included in Perl 5.9.3, the first version of Perl to incorporate those changes. This is the first version that correctly supports all modern POD syntax. The parse_from_filehandle() method was re-added for backward compatibility in Pod::Man 2.09, included in Perl 5.9.4.
Support for anchor text in L<> links of type URL was added in Pod::Man 2.23, included in Perl 5.11.5.
parse_lines(), parse_string_document(), and parse_file() set a default output file handle of "STDOUT" if one was not already set as of Pod::Man 2.28, included in Perl 5.19.5.
Support for SOURCE_DATE_EPOCH and POD_MAN_DATE was added in Pod::Man 4.00, included in Perl 5.23.7, and generated dates were changed to use UTC instead of the local time zone. This is also the first release that aligned the module version and the version of the podlators distribution. All modules included in podlators, and the podlators distribution itself, share the same version number from this point forward.
Pod::Man 4.10, included in Perl 5.27.8, changed the formatting for manual page references and function names to bold instead of italic, following the current Linux manual page standard.
Pod::Man 5.00 changed the default output encoding to UTF-8, overridable with the new "encoding" option. It also fixed problems with bold or italic extending too far when used with C<> escapes, and began converting Unicode zero-width spaces (U+200B) to the "\:" *roff escape. It also dropped attempts to add subtle formatting corrections in the output that would only be visible when typeset with troff, which had previously been a significant source of bugs.
There are numerous bugs and language-specific assumptions in the nroff fallbacks for accented characters in the "roff" encoding. Since the point of this encoding is backward compatibility with the output from earlier versions of Pod::Man, and it is deprecated except when necessary to support old systems, those bugs are unlikely to ever be fixed.
Pod::Man doesn't handle font names longer than two characters. Neither do most troff implementations, but groff does as an extension. It would be nice to support this as an option for those who want to use it.
Pod::Man copies the input spacing verbatim to the output *roff document. This means your output will be affected by how nroff generally handles sentence spacing.
nroff dates from an era in which it was standard to use two spaces after sentences, and will always add two spaces after a line-ending period (or similar punctuation) when reflowing text. For example, the following input:
    =pod

    One sentence.
    Another sentence.
will result in two spaces after the period when the text is reflowed. If you use two spaces after sentences anyway, this will be consistent, although you will have to be careful to not end a line with an abbreviation such as "e.g." or "Ms.". Output will also be consistent if you use the *roff style guide (and XKCD 1285 <https://xkcd.com/1285/>) recommendation of putting a line break after each sentence, although that will consistently produce two spaces after each sentence, which may not be what you want.
If you prefer one space after sentences (which is the more modern style), you will unfortunately need to ensure that no line in the middle of a paragraph ends in a period or similar sentence-ending punctuation. Otherwise, nroff will add two spaces after that sentence when reflowing, and your output document will have inconsistent spacing.
The handling of hyphens versus dashes is somewhat fragile, and one may get the wrong one under some circumstances. This will normally only matter for line breaking and possibly for troff output.
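One way to find such lines is a small check over the POD source. The helper below is hypothetical and not part of Pod::Man. It is a rough sketch that treats ".", "?", and "!" (optionally followed by closing quotes or brackets) as sentence endings and ignores indented verbatim lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Pod::Man): return the 1-based numbers of
# lines that end a sentence while the paragraph continues on the next line.
sub mid_paragraph_sentence_ends {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;
    my @hits;
    for my $i (0 .. $#lines - 1) {
        next if $lines[$i] =~ /^\s/;                      # verbatim (indented) line
        next unless $lines[$i] =~ /[.?!]["')\]]*\s*$/;    # ends like a sentence
        next if $lines[$i + 1] =~ /^\s*$/;                # paragraph ends here anyway
        push @hits, $i + 1;
    }
    return @hits;
}

my $pod  = "=pod\n\nOne sentence.\nAnother sentence.\n\nA third one.\n";
my @hits = mid_paragraph_sentence_ends($pod);
print "line $_ ends a sentence mid-paragraph\n" for @hits;
```

For the sample input, only the line "One sentence." is reported, because it is the only sentence ending that is followed by more text in the same paragraph.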
Written by Russ Allbery <rra@cpan.org>, based on the original pod2man by Tom Christiansen <tchrist@mox.perl.com>.
The modifications to work with Pod::Simple instead of Pod::Parser were contributed by Sean Burke <sburke@cpan.org>, but I've since hacked them beyond recognition and all bugs are mine.
Copyright 1999-2010, 2012-2020, 2022 Russ Allbery <rra@cpan.org>
Substantial contributions by Sean Burke <sburke@cpan.org>.
This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
Encode::Supported, Pod::Simple, perlpod(1), pod2man(1), nroff(1), troff(1), man(1), man(7)
Ossanna, Joseph F., and Brian W. Kernighan. "Troff User's Manual," Computing Science Technical Report No. 54, AT&T Bell Laboratories. This is the best documentation of standard nroff and troff. At the time of this writing, it's available at <http://www.troff.org/54.pdf>.
The manual page documenting the man macro set may be man(5) instead of man(7) on your system.
See perlpodstyle(1) for documentation on writing manual pages in POD if you've not done it before and aren't familiar with the conventions.
The current version of this module is always available from its web site at <https://www.eyrie.org/~eagle/software/podlators/>. It is also part of the Perl core distribution as of 5.6.0.
2024-10-02 | perl v5.38.2 |