
PGE::Text::bracketed added to PGE


Patrick R. Michaud

Oct 17, 2005, 5:15:09 PM
to perl6-i...@perl.org, perl6-c...@perl.org
I've just added a <PGE::Text::bracketed> subrule to PGE,
which is roughly analogous to the "bracketed" function in
Perl 5's Text::Balanced.

Like most PGE subrules, PGE::Text::bracketed can be called
as a subrule in a rule expression or directly via a subroutine call.
Thus, to extract quote-delimited text from a string:

.local pmc bracketed
bracketed = find_global "PGE::Text", "bracketed"

$P0 = bracketed("'this quoted string' and other stuff", "'")

returns a match object into $P0 that contains C< 'this quoted string' >.
Delimiters can be escaped, thus

$P0 = bracketed("'Don\'t do that,' he said", "'")

returns a match object corresponding to C< 'Don\'t do that,' >.

PGE::Text::bracketed also understands balanced delimiters of (), [],
<>, and {}, as long as they are properly nested:

$P0 = bracketed("{ match (this) please } but not this", "(){}")
print $P0 # outputs { match (this) please }

$P0 = bracketed("{ match (this please }", "(){}")
# match fails, "(" has no closing paren

$P0 = bracketed("{ match \(this please }", "(){}")
# succeeds, paren is escaped

Quotes can be included in the delimiter to surround unbalanced
delimiters:

$P0 = bracketed("{ The '(' paren is quoted }", "{}()") # match fails
$P0 = bracketed("{ The '(' paren is quoted }", "{}()'") # match succeeds
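
The matching rules described above (nesting, backslash escapes, and quote characters that hide unbalanced brackets) can be modeled outside Parrot. What follows is a minimal Python sketch, not PGE's implementation: the function name and signature only mirror the PIR calls above, and the exact handling of each case is an assumption based on the examples in this message.

```python
from typing import Optional

# Bracket pairs the sketch understands; any other delimiter character is
# treated as a quote character (an assumption drawn from the examples).
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def bracketed(text: str, delims: str) -> Optional[str]:
    """Return the leading delimited chunk of text, delimiters included,
    or None when the text does not start with a balanced chunk."""
    openers = {c for c in delims if c in PAIRS}
    closers = {PAIRS[c] for c in openers}
    quotes = {c for c in delims if c not in PAIRS and c not in PAIRS.values()}
    if not text or text[0] not in (openers | quotes):
        return None

    stack = []      # closing brackets still expected, innermost last
    quote = None    # the active quote character while inside a quoted span
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2              # a backslash escapes the next character
            continue
        if quote is not None:
            if ch == quote:
                quote = None
                if not stack:   # the quote itself was the outer delimiter
                    return text[: i + 1]
        elif ch in quotes:
            quote = ch          # brackets are ignored until the quote closes
        elif ch in openers:
            stack.append(PAIRS[ch])
        elif ch in closers:
            if not stack or stack[-1] != ch:
                return None     # closer does not match the innermost opener
            stack.pop()
            if not stack:
                return text[: i + 1]
        i += 1
    return None                 # ran out of text before everything closed
```

With this model, bracketed("{ The '(' paren is quoted }", "{}()") returns None, while adding the quote character to the delimiter string makes the same input match in full.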

Lastly, PGE::Text::bracketed can be used as a subrule in rule expressions:

.local pmc p6rule, rulesub
p6rule = find_global "PGE", "p6rule"
rulesub = p6rule(":w <ident> \:= <PGE::Text::bracketed> ")

$P0 = rulesub(" a := ( a + b(3) - 4 ) ; ")
# $P0['ident'] == 'a'
# $P0['PGE::Text::bracketed'] == '( a + b(3) - 4 )'
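
To make the composition concrete, here is a rough Python analogue of that rule: an identifier, then ":=", then a balanced parenthesized chunk, with whitespace allowed between tokens as under :w. It is only a sketch. The names IDENT_ASSIGN, balanced_parens, and parse_assignment are hypothetical, it handles parentheses only, and it anchors at the start of the string, which may differ from how the PGE rule searches.

```python
import re
from typing import Dict, Optional

# Leading whitespace, an identifier, ":=", and trailing whitespace.
IDENT_ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*:=\s*")


def balanced_parens(text: str, start: int) -> Optional[int]:
    """Return the index just past the ")" that closes the "(" at start,
    or None if text[start] is not "(" or the parentheses never balance."""
    if start >= len(text) or text[start] != "(":
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def parse_assignment(text: str) -> Optional[Dict[str, str]]:
    """Match '<ident> := <balanced parens>' at the start of text and
    return the two named pieces, loosely mimicking named sub-matches."""
    head = IDENT_ASSIGN.match(text)
    if head is None:
        return None
    end = balanced_parens(text, head.end())
    if end is None:
        return None
    return {
        "ident": head.group(1),
        "PGE::Text::bracketed": text[head.end():end],
    }


result = parse_assignment(" a := ( a + b(3) - 4 ) ; ")
print(result)
```

The regular expression alone cannot recognize arbitrarily nested parentheses, which is why the balanced part is delegated to a separate routine, much as the rule delegates to a subrule.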

At the moment there's not a mechanism to specify an alternate set
of delimiters from within a subrule, but this should be available in
the near future as a subrule parameter:

<PGE::Text::balanced: {}()[]> # delimited by {}, (), []
<PGE::Text::balanced("<'\"")> # delimited by <>, ", '

Comments, questions, suggestions, and tests (in t/p6rules/text_brk.t)
welcomed!

Pm

Allison Randal

Dec 7, 2005, 6:17:10 PM
to Patrick R. Michaud, perl6-c...@perl.org
On Oct 17, 2005, at 14:15, Patrick R. Michaud wrote:

> I've just added a <PGE::Text::bracketed> subrule to PGE,
> which is roughly analogous to the "bracketed" function in
> Perl 5's Text::Balanced.
>
> Like most PGE subrules, PGE::Text::bracketed can be called
> as a subrule in a rule expression or directly via a subroutine call.
> Thus, to extract quote-delimited text from a string:
>
> .local pmc bracketed
> bracketed = find_global "PGE::Text", "bracketed"
>
> $P0 = bracketed("'this quoted string' and other stuff", "'")
>
> returns a match object into $P0 that contains C< 'this quoted
> string' >.
> Delimiters can be escaped, thus

Shouldn't it contain C<this quoted string>? That is, shouldn't it
remove the bracketing characters? Or at least hold the string without
the brackets somewhere within the PGE::Text object? I'm getting:

<PunieGrammar::term> => PMC 'PunieGrammar' => "(ok 1)" @ 6 {
    <PGE::Text::bracketed> => PMC 'PGE::Text' => "(ok 1)" @ 6
}

But want something more like:

<PunieGrammar::term> => PMC 'PunieGrammar' => "(ok 1)" @ 6 {
    <PGE::Text::bracketed> => PMC 'PGE::Text' => "(ok 1)" @ 6 {
        [0] => PMC 'PunieGrammar' => "ok 1" @ 7
    }
}

So I can extract the parsed string. (If you squint and read "()" as
double quotes, it makes more sense.)
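
One way to picture the structure being requested is a match that keeps its full text and offset, plus a positional capture [0] holding the text between the outer delimiters at offset + 1. The Python sketch below uses a hypothetical Match class, not PGE's actual match object, and assumes single-character delimiters.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Match:
    """Hypothetical stand-in for a match: matched text, start offset,
    and positional sub-matches."""
    text: str
    offset: int
    captures: List["Match"] = field(default_factory=list)


def with_inner(text: str, offset: int) -> Match:
    """Wrap an already-matched delimited chunk and attach capture [0]:
    the chunk minus its first and last character (the outer delimiters)."""
    inner = Match(text[1:-1], offset + 1)
    return Match(text, offset, [inner])


m = with_inner("(ok 1)", 6)
print(m.text, m.offset)                          # (ok 1) 6
print(m.captures[0].text, m.captures[0].offset)  # ok 1 7
```

Only the outermost pair is stripped here; nested chunks inside the text are left for the caller to parse.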

> At the moment there's not a mechanism to specify an alternate set
> of delimiters from within a subrule, but this should be available in
> the near future as a subrule parameter:
>
> <PGE::Text::balanced: {}()[]> # delimited by {}, (), []
> <PGE::Text::balanced("<'\"")> # delimited by <>, ", '

I have a use for this as soon as it's done (or is it already done?)


And thanks very much for adding this feature. It's a big help!

Allison

Patrick R. Michaud

Dec 7, 2005, 6:34:50 PM
to Allison Randal, perl6-c...@perl.org
On Wed, Dec 07, 2005 at 03:17:10PM -0800, Allison Randal wrote:
> On Oct 17, 2005, at 14:15, Patrick R. Michaud wrote:
>
> >I've just added a <PGE::Text::bracketed> subrule to PGE,
> >which is roughly analogous to the "bracketed" function in
> >Perl 5's Text::Balanced.
> >
>
> Shouldn't it contain C<this quoted string>? That is, shouldn't it
> remove the bracketing characters? Or at least hold the string without
> the brackets somewhere within the PGE::Text object?

For this I was following the design of "extract_bracketed" in
Perl 5's Text::Balanced, which returns the delimiters as part
of the string. I agree it would be nice for PGE::Text::bracketed
to also return the string without the outer delimiters somewhere.
But this begs a larger design issue -- should it be able to return
all nested and balanced substrings without their delims (and if so,
what should the resulting Match object structure look like)?

Anyway, it's no problem for me to update PGE::Text::bracketed to
return a sub-match of the string without its outer delimiters, if
we want to go that way.

> >At the moment there's not a mechanism to specify an alternate set
> >of delimiters from within a subrule, but this should be available in
> >the near future as a subrule parameter:
> >
> > <PGE::Text::balanced: {}()[]> # delimited by {}, (), []
> > <PGE::Text::balanced("<'\"")> # delimited by <>, ", '
>
> I have a use for this as soon as it's done (or is it already done?)

It's already "done", but only using the colon+string argument version
above, and I have this nagging suspicion that syntax is going to be
removed from the spec (if it hasn't been removed already).

But at the moment, a string can be matched based on balanced and
nested parentheses using

<PGE::Text::balanced: ()>

> And thanks very much for adding this feature. It's a big help!

You're welcome. I like adding helpful features. :-)

Pm

Allison Randal

Dec 7, 2005, 8:17:23 PM
to Patrick R. Michaud, perl6-c...@perl.org
On Dec 7, 2005, at 15:34, Patrick R. Michaud wrote:
>
> For this I was following the design of "extract_bracketed" in
> Perl 5's Text::Balanced, which returns the delimiters as part
> of the string. I agree it would be nice for PGE::Text::bracketed
> to also return the string without the outer delimiters somewhere.
> But this begs a larger design issue -- should it be able to return
> all nested and balanced substrings without their delims (and if so,
> what should the resulting Match object structure look like)?

Most uses past "give me a chunk of text between these delimiters" are
complex enough to require custom parsing. That is, you rarely care
about just the bracketing delimiters; you generally also want to
parse other information between the delimiters, and so would end up
writing your own rules for it anyway. It also seems like it might be
somewhat expensive to generate the whole tree. Those are two reasons
to say "we'll give you the small step of the contained string without
the bracketing delimiters, but for anything more complex, write your
own rule".

> Anyway, it's no problem for me to update PGE::Text::bracketed to
> return a sub-match of the string without its outer delimiters, if
> we want to go that way.

I'd certainly use it right away.

> It's already "done", but only using the colon+string argument version
> above, and I have this nagging suspicion that syntax is going to be
> removed from the spec (if it hasn't been removed already).
>
> But at the moment, a string can be matched based on balanced and
> nested parentheses using
>
> <PGE::Text::balanced: ()>

Ah-ha! This works:

p6rule('\d+ | <PGE::Text::bracketed: ">', 'PunieGrammar', 'term')

(I experimented with several variations of syntax today, but hadn't
hit on one that worked yet.)

Allison

Patrick R. Michaud

Dec 7, 2005, 8:37:36 PM
to Allison Randal, perl6-c...@perl.org
On Wed, Dec 07, 2005 at 05:17:23PM -0800, Allison Randal wrote:
> >But at the moment, a string can be matched based on balanced and
> >nested parentheses using
> >
> > <PGE::Text::balanced: ()>
>
> Ah-ha! This works:
>
> p6rule('\d+ | <PGE::Text::bracketed: ">', 'PunieGrammar', 'term')
>
> (I experimented with several variations of syntax today, but hadn't
> hit on one that worked yet.)

Just keep in mind that PGE::Text::bracketed is using Text::Balanced
as its model. :-)

I'll make the delimiter-less results available shortly.

Pm

Patrick R. Michaud

Dec 8, 2005, 10:39:03 AM
to Allison Randal, perl6-c...@perl.org
On Wed, Dec 07, 2005 at 05:34:50PM -0600, Patrick R. Michaud wrote:
> On Wed, Dec 07, 2005 at 03:17:10PM -0800, Allison Randal wrote:
> > Shouldn't it contain C<this quoted string>? That is, shouldn't it
> > remove the bracketing characters? Or at least hold the string without
> > the brackets somewhere within the PGE::Text object? I'm getting:
> >
> > <PunieGrammar::term> => PMC 'PunieGrammar' => "(ok 1)" @ 6 {
> >     <PGE::Text::bracketed> => PMC 'PGE::Text' => "(ok 1)" @ 6
> > }
> >
> >But want something more like:
> >
> > <PunieGrammar::term> => PMC 'PunieGrammar' => "(ok 1)" @ 6 {
> >     <PGE::Text::bracketed> => PMC 'PGE::Text' => "(ok 1)" @ 6 {
> >         [0] => PMC 'PunieGrammar' => "ok 1" @ 7
> >     }
> > }
>
> Anyway, it's no problem for me to update PGE::Text::bracketed to
> return a sub-match of the string without its outer delimiters, if
> we want to go that way.

Now added to PGE::Text, r10402.

Pm
