DelimMatch(3pm) | User Contributed Perl Documentation | DelimMatch(3pm) |
Text::DelimMatch - Perl extension to find regexp delimited strings with proper nesting
    use Text::DelimMatch;

    $mc = new Text::DelimMatch $startdelim, $enddelim;

    $mc->quote('"');
    $mc->escape("\\");
    $mc->double_escape('"');
    $mc->case_sensitive(1);

    ($prefix, $match, $remainder) = $mc->match($string);
    ($prefix, $nextmatch, $remainder) = $mc->match();

    $middle = $mc->strip_delim($match);   # returns $match w/o start and end delim
These routines allow you to match delimited substrings in a buffer. The delimiters can be specified with any regular expression and the start and end delimiters need not be the same. If the delimited text is properly nested, entire nested groups are returned.
In addition, you may specify quoting and escaping characters that contribute to the recognition of start and end delimiters.
For example, if you specify the start and end delimiters as '\(' and '\)', respectively, and the double quote character as a quoting character, and the backslash as an escaping character, then the delimited substring in this buffer is "(ma(t)c\)h)":
'prefix text "(quoted text)" \(escaped \" text) (ma(t)c\)h) postfix text'
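The effect of quoting and escaping on the buffer above can be traced with a small standalone scanner. This is only a sketch of the idea, not the module's implementation: first_group is a hypothetical helper, and it hard-codes "(" and ")" as delimiters, the double quote as the quoting character, and the backslash as the escaping character.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::DelimMatch: return (prefix, match,
# remainder) for the first balanced "(...)" group, ignoring delimiters that
# are inside double quotes or preceded by a backslash.
sub first_group {
    my ($buf) = @_;
    my ($depth, $start, $in_quote) = (0, undef, 0);
    my $len = length $buf;
    for (my $i = 0; $i < $len; $i++) {
        my $c = substr($buf, $i, 1);
        if ($c eq '\\') { $i++; next; }                  # escaped: skip the next character
        if ($c eq '"')  { $in_quote = !$in_quote; next; } # toggle quoted state
        next if $in_quote;                                # quoted delimiters do not count
        if ($c eq '(') {
            $start = $i if $depth == 0;                   # outermost opening delimiter
            $depth++;
        }
        elsif ($c eq ')' && $depth > 0) {
            $depth--;
            if ($depth == 0) {                            # outermost group is closed
                return (substr($buf, 0, $start),
                        substr($buf, $start, $i - $start + 1),
                        substr($buf, $i + 1));
            }
        }
    }
    return;   # no complete group found
}

my $buffer = 'prefix text "(quoted text)" \(escaped \" text) (ma(t)c\)h) postfix text';
my ($pre, $match, $post) = first_group($buffer);
print "$match\n";   # prints (ma(t)c\)h)
```

The quoted group and the escaped parenthesis are both skipped, so the first group that counts is the nested one near the end of the buffer.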
In order to support this rather complex interface, the matching context is encapsulated in an object. The object, Text::DelimMatch, has the following public methods:
If $string is not provided on subsequent calls, the $post from the previous match is used, unless keep is false. If keep is false, the match always fails.
Returns the delimiters in use before this call.
If only $start is passed, $end is assumed to be the same.
In matching, quotes occur in pairs. In other words, if (",") and (',') are both specified as quote pairs and a string beginning with " is found, it is ended only by another ", not by '.
Returns the quote hash in use before this call.
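The pairing rule can be illustrated without the module. In this sketch, %quote and leading_quoted are hypothetical names, and the table layout (opening character mapped to closing character) is an assumption about how such pairs might be stored, not a description of the module's internals.

```perl
use strict;
use warnings;

# Hypothetical quote table: opening character => closing character.
my %quote = ('"' => '"', "'" => "'");

# Return the quoted string that starts at the beginning of $s, or undef if
# $s does not open with a known quote or the quote is never closed.
sub leading_quoted {
    my ($s) = @_;
    my $open  = substr($s, 0, 1);
    my $close = $quote{$open};
    return unless defined $close;
    my $end = index($s, $close, 1);      # only the paired closer ends the string
    return if $end < 0;
    return substr($s, 0, $end + 1);
}

print leading_quoted(q{"it's here" and more}), "\n";   # prints "it's here"
```

The apostrophe inside the string does not terminate it, because the string was opened with a double quote.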
Returns the escape characters in use before this call.
'Don''t you see?'
defines a string containing a single apostrophe.
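A minimal sketch of how such a doubled quote might be decoded once the string has been matched. unquote_doubled is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the outer quote characters and collapse each
# doubled quote character inside the string into a single one.
sub unquote_doubled {
    my ($s, $q) = @_;
    my $qq = quotemeta $q;
    return $s unless $s =~ /\A$qq(.*)$qq\z/s;   # not a quoted string: leave as is
    (my $inner = $1) =~ s/$qq$qq/$q/g;
    return $inner;
}

print unquote_doubled(q{'Don''t you see?'}, q{'}), "\n";   # prints Don't you see?
```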
$esc can only be a string of characters. $esc can be a regexp set or a simple string. If it is a simple string, it will be translated into the regexp set "[ quotemeta($esc) ]".
Returns the double-escaping characters in use before this call.
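The translation of a simple string into a regexp set can be reproduced with Perl's built-in quotemeta. escape_set is a hypothetical name, and the sketch omits the spaces shown inside the brackets above, on the assumption that they are there for readability only.

```perl
use strict;
use warnings;

# Hypothetical helper: build a character class from a plain string of
# characters, quoting anything that is special inside a regexp.
sub escape_set {
    my ($esc) = @_;
    return '[' . quotemeta($esc) . ']';
}

my $set = escape_set('\\^');                # backslash and caret
print "$set\n";                             # prints [\\\^]
print "matches\n" if 'a\\b' =~ /$set/;      # the string a\b contains a backslash
```

Without quotemeta, a string such as "^" would produce the invalid set "[^]" instead of a set that matches a literal caret.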
Returns the case sensitivity in use before this call.
Keep, which is true by default, specifies whether or not the matching context object keeps a local copy of the buffer used in matching. Keeping a local copy allows repeated matching on the same buffer, but might be a bad idea if the buffer is a terabyte long. ;-)
Returns the keep setting in use before this call.
Returndelim, which is true by default, specifies whether or not the start and end delimiters are returned with the matching string.
Returns the returndelim setting in use before this call.
The most common error is a bad regular expression, for example specifying the start delimiter as "(" instead of "\\(". Remember, these are regexps!
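One way to catch this mistake early is to compile the delimiter yourself before handing it to the matcher. valid_delim is a hypothetical helper that uses only core Perl; it is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check: try to compile the pattern and report
# whether it is a valid regexp.
sub valid_delim {
    my ($pat) = @_;
    my $re = eval { qr/$pat/ };   # a malformed pattern dies; eval traps it
    return defined $re;
}

print valid_delim('(')   ? "ok" : "bad", "\n";   # prints bad (unmatched parenthesis)
print valid_delim('\\(') ? "ok" : "bad", "\n";   # prints ok (matches a literal "(")
```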
If debug is true, informative and progress messages are printed to STDOUT by some methods.
Returns the debugging setting in use before this call.
For simplicity, and backward compatibility with the previous (limited release) incarnation of this module, the following functions are also available directly:
    $mc = new Text::DelimMatch '"';
    $mc->match('pre "match" post') == '"match"';

    $mc->delim("\\(", "\\)");
    $mc->match('pre (match) post') == ('pre ', '(match)', ' post');
    $mc->match('pre (ma(t)ch) post') == ('pre ', '(ma(t)ch)', ' post');

    $mc->quote('"');
    $mc->escape("\\");
    $mc->match('pre (ma")"tch) post') == ('pre ', '(ma")"tch)', ' post');
    $mc->match('pre (ma(t)c\)h\") post') == ('pre ', '(ma(t)c\)h\")', ' post');
See also test.pl in the distribution.
Norman Walsh, ndw@nwalsh.com
Copyright (C) 1997-2002 Norman Walsh. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
perl(1).
2022-10-13 | perl v5.36.0 |