Like most PGE subrules, PGE::Text::bracketed can be called
as a subrule in a rule expression or directly via a subroutine call.
Thus, to extract quote-delimited text from a string:

    .local pmc bracketed
    bracketed = find_global "PGE::Text", "bracketed"

    $P0 = bracketed("'this quoted string' and other stuff", "'")

returns a match object into $P0 that contains C< 'this quoted string' >.
Delimiters can be escaped, thus

    $P0 = bracketed("'Don\'t do that,' he said", "'")

returns a match object corresponding to C< 'Don\'t do that,' >.

PGE::Text::bracketed also understands balanced delimiters of (), [],
<>, and {}, as long as they are properly nested:

    $P0 = bracketed("{ match (this) please } but not this", "(){}")
    print $P0     # outputs { match (this) please }

    $P0 = bracketed("{ match (this please }", "(){}")
    # match fails, "(" has no closing paren

    $P0 = bracketed("{ match \(this please }", "(){}")
    # succeeds, paren is escaped

Quote characters can be included in the delimiter string so that
unbalanced delimiters inside quotes are ignored:

    $P0 = bracketed("{ The '(' paren is quoted }", "{}()")    # match fails
    $P0 = bracketed("{ The '(' paren is quoted }", "{}()'")   # match succeeds
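
For anyone who wants to experiment with these rules outside Parrot, here is a rough Python model of the behaviour described so far. It is only a sketch, not PGE's implementation: the names `extract_bracketed` and `_skip_quoted` are made up for illustration, and it assumes that matching starts at the first character and that any non-bracket character in the delimiter string acts as a quote.

```python
# Hypothetical Python model of the matching rules described above (not PGE code).
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def _skip_quoted(text, i):
    """Return the index just past the quoted run starting at text[i], or None."""
    quote = text[i]
    j = i + 1
    while j < len(text):
        if text[j] == "\\":      # a backslash escapes the next character
            j += 2
            continue
        if text[j] == quote:
            return j + 1
        j += 1
    return None                  # no closing quote found


def extract_bracketed(text, delims, start=0):
    """Return the delimited prefix of text[start:], or None if the match fails."""
    openers = {c: PAIRS[c] for c in delims if c in PAIRS}
    closers = set(openers.values())
    # Assumption: every other delimiter character is treated as a quote.
    quotes = {c for c in delims if c not in PAIRS and c not in PAIRS.values()}
    if start >= len(text):
        return None
    first = text[start]
    if first in quotes:
        end = _skip_quoted(text, start)
        return None if end is None else text[start:end]
    if first not in openers:
        return None
    stack = []                   # closing characters we still expect
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":           # escaped character: never a delimiter
            i += 2
            continue
        if ch in quotes:         # brackets inside quotes are ignored
            i = _skip_quoted(text, i)
            if i is None:
                return None
            continue
        if ch in openers:
            stack.append(openers[ch])
        elif ch in closers:
            if stack.pop() != ch:    # improperly nested
                return None
            if not stack:            # outermost delimiter closed
                return text[start:i + 1]
        i += 1
    return None                  # ran out of text with delimiters still open
```

Under those assumptions, the examples above behave the same way in the model: `extract_bracketed("{ match (this please }", "(){}")` fails, while adding `'` to the delimiter string lets the quoted `'('` case succeed.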

Lastly, PGE::Text::bracketed can be used as a subrule in rule expressions:

    .local pmc p6rule, rulesub
    p6rule = find_global "PGE", "p6rule"
    rulesub = p6rule(":w <ident> \:= <PGE::Text::bracketed> ")

    $P0 = rulesub(" a := ( a + b(3) - 4 ) ; ")
    # $P0['ident'] == 'a'
    # $P0['PGE::Text::bracketed'] == '( a + b(3) - 4 )'

At the moment there's not a mechanism to specify an alternate set
of delimiters from within a subrule, but this should be available in
the near future as a subrule parameter:

    <PGE::Text::balanced: {}()[]>     # delimited by {}, (), []
    <PGE::Text::balanced("<'\"")>     # delimited by <>, ", '

Comments, questions, suggestions, and tests (in t/p6rules/text_brk.t)
welcomed!
Pm
> I've just added a <PGE::Text::bracketed> subrule to PGE,
> which is roughly analogous to the "bracketed" function in
> Perl 5's Text::Balanced.
>
> Like most PGE subrules, PGE::Text::bracketed can be called
> as a subrule in a rule expression or directly via a subroutine call.
> Thus, to extract quote-delimited text from a string:
>
> .local pmc bracketed
> bracketed = find_global "PGE::Text", "bracketed"
>
> $P0 = bracketed("'this quoted string' and other stuff", "'")
>
> returns a match object into $P0 that contains C< 'this quoted
> string' >.
> Delimiters can be escaped, thus
Shouldn't it contain C<this quoted string>? That is, shouldn't it
remove the bracketing characters? Or at least hold the string without
the brackets somewhere within the PGE::Text object? I'm getting:
<PunieGrammar::term> => PMC 'PunieGrammar' => "(ok 1)" @ 6 {
<PGE::Text::bracketed> => PMC 'PGE::Text' => "(ok 1)" @ 6
}
But want something more like:
<PunieGrammar::term> => PMC 'PunieGrammar' => "(ok 1)" @ 6 {
<PGE::Text::bracketed> => PMC 'PGE::Text' => "(ok 1)" @ 6 {
[0] => PMC 'PunieGrammar' => "ok 1" @ 7
}
}
So I can extract the parsed string. (If you squint and read "()" as
double quotes, it makes more sense.)
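
To make the shape being asked for concrete, here is a small Python sketch. It is purely illustrative: the `Match` class and `with_inner` helper are hypothetical stand-ins, not PGE's match objects. It takes delimited text and its offset and attaches a sub-match [0] holding the text without the outer delimiters, one position further in.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical stand-in for a match object: matched text plus offset."""
    text: str
    pos: int
    subs: list = field(default_factory=list)


def with_inner(text, pos):
    """Wrap delimited text and add sub-match [0] without the outer delimiters."""
    outer = Match(text, pos)
    # The inner string starts one character after the opening delimiter.
    outer.subs.append(Match(text[1:-1], pos + 1))
    return outer


m = with_inner("(ok 1)", 6)
print(m.text, "@", m.pos)                   # outer match, delimiters included
print(m.subs[0].text, "@", m.subs[0].pos)   # inner text, one position later
```

For the dump above, the outer match stays "(ok 1)" @ 6 and the sub-match is "ok 1" @ 7.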
> At the moment there's not a mechanism to specify an alternate set
> of delimiters from within a subrule, but this should be available in
> the near future as a subrule parameter:
>
> <PGE::Text::balanced: {}()[]> # delimited by {}, (), []
> <PGE::Text::balanced("<'\"")> # delimited by <>, ", '
I have a use for this as soon as it's done (or is it already done?)
And thanks very much for adding this feature. It's a big help!
Allison
For this I was following the design of "extract_bracketed" in
Perl 5's Text::Balanced, which returns the delimiters as part
of the string. I agree it would be nice for PGE::Text::bracketed
to also return the string without the outer delimiters somewhere.
But this raises a larger design question -- should it be able to return
all nested and balanced substrings without their delims (and if so,
what should the resulting Match object structure look like)?
Anyway, it's no problem for me to update PGE::Text::bracketed to
return a sub-match of the string without its outer delimiters, if
we want to go that way.
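
As one possible answer to the "what should the structure look like" question, here is a hedged Python sketch. The function name `nested_groups` and the tuple layout are assumptions made for illustration, it handles only a single delimiter pair, and it ignores escapes and quotes. It shows a tree in which each balanced group records its inner text, its offset, and its nested groups.

```python
def nested_groups(text, open_ch="(", close_ch=")"):
    """Return [(inner_text, offset, children), ...] for each top-level group.

    Hypothetical layout: every balanced group becomes a tuple of the text
    without its delimiters, the offset of that text, and a list of the
    groups nested inside it.
    """
    root = []
    stack = [(None, root)]           # (index of opener, children collected so far)
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append((i, []))
        elif ch == close_ch:
            if len(stack) == 1:
                raise ValueError("unbalanced closing delimiter at %d" % i)
            start, children = stack.pop()
            # Attach the finished group to its enclosing group.
            stack[-1][1].append((text[start + 1:i], start + 1, children))
    if len(stack) != 1:
        raise ValueError("unclosed opening delimiter")
    return root


tree = nested_groups("( a + b(3) - 4 )")
print(tree)
```

Every nested group costs an extra entry in the tree, which is the trade-off to weigh against returning only the outermost string.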
> >At the moment there's not a mechanism to specify an alternate set
> >of delimiters from within a subrule, but this should be available in
> >the near future as a subrule parameter:
> >
> > <PGE::Text::balanced: {}()[]> # delimited by {}, (), []
> > <PGE::Text::balanced("<'\"")> # delimited by <>, ", '
>
> I have a use for this as soon as it's done (or is it already done?)
It's already "done", but only using the colon+string argument version
above, and I have this nagging suspicion that the syntax is going to be
removed from the spec (if it hasn't been removed already).
But at the moment, a string can be matched based on balanced and
nested parentheses using
<PGE::Text::balanced: ()>
> And thanks very much for adding this feature. It's a big help!
You're welcome. I like adding helpful features. :-)
Pm
Most uses past "give me a chunk of text between these delimiters" are
complex enough to require custom parsing. That is, you rarely care
about just the bracketing delimiters; you generally also want to
parse other information between the delimiters, and so would end up
writing your own rules for it anyway. It also seems like it might be
somewhat expensive to generate the whole tree. Those are two reasons
to say "we'll give you the small step of the contained string without
the bracketing delimiters, but for anything more complex, write your
own rule".
> Anyway, it's no problem for me to update PGE::Text::bracketed to
> return a sub-match of the string without its outer delimiters, if
> we want to go that way.
I'd certainly use it right away.
> It's already "done", but only using the colon+string argument version
> above, and I have this nagging suspicion that the syntax is going to be
> removed from the spec (if it hasn't been removed already).
>
> But at the moment, a string can be matched based on balanced and
> nested parentheses using
>
> <PGE::Text::balanced: ()>
Ah-ha! This works:
p6rule('\d+ | <PGE::Text::bracketed: ">', 'PunieGrammar', 'term')
(I experimented with several variations of syntax today, but hadn't
hit on one that worked yet.)
Allison
Just keep in mind that PGE::Text::bracketed is using Text::Balanced
as its model. :-)
I'll make the delimiter-less results available shortly.
Pm
Now added to PGE::Text, r10402.
Pm