[perl #41623] [TODO] modify p6regex op naming convention to match perl 6

Jerry Gay

Feb 26, 2007, 10:09:58 AM

to bugs-bi...@rt.perl.org

# New Ticket Created by Jerry Gay
# Please include the string: [perl #41623]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=41623 >

pge's syntax for specifying ops to the op precedence parser should
follow the perl 6 spec in its op rule naming convention. that is,
'infix:+'
'circumfix:( )'

should be
infix:<+>
circumfix:<( )>

~jerry

Patrick R. Michaud

Feb 26, 2007, 10:24:29 AM

to perl6-i...@perl.org

We should also note that with Larry's recent work
on an "official" Perl 6 grammar [1], the syntax for
defining tokens may in fact be radically changing
from what pgc is currently using. For example,
infix:<+> appears to be written as

token infix ( --> Additive) #+ +
{ <sym: +> {*} } #= +

and circumfix:<( )> is

token circumfix ( --> Term) #+ ( )
{ $ <EXPR> $ { @<sym>:=<( )> } {*} } #= ( )

So there could be more substantial pgc formatting changes
going on here than meets the eye. (I'm still figuring out
what all of the syntax means in the Perl-6.0.0-STD.pm
file.)

1. http://svn.pugscode.org/pugs/src/perl6/Perl-6.0.0-STD.pm

Pm

Larry Wall

Feb 26, 2007, 2:45:27 PM

to perl6-i...@perl.org

On Mon, Feb 26, 2007 at 09:24:29AM -0600, Patrick R. Michaud wrote:

What's basically going on here is the attempt to detangle the user's
namespace from the grammar's namespace. The user will name operators
using &infix:<+>, but if we use the same name for the parsing rule
as for the eventual operator, a name collision results. &infix:<+>
is just a funny name for a function that adds two things, not the name
for the parse rule that parses plus. If we were to refer to &infix:<+>
within the grammar, it would be the grammar's *own* definition of
the plus operator, not the one we're trying to define for the user.

In order to distinguish the names I originally went with a form
more like this:

token PARSE_circumfix:<( )>
{ $ <EXPR> $ }

But I noticed several problems with that. First is simply the ugliness
of the implied name mangling. Next is the redundancy of having to
repeat the symbol. Nearly all of the rules end up parsing exactly
the same prefix as the symbol name, though in the case of circumfix
you see that the actual symbol comes in two pieces in the regex.

Another problem is that the symbol is hardwired into the name, so you
can't write a rule that parses to more than one symbol. Both of these
problems go away if we simply construct the symbol from the name of
the rule plus the list of $<sym> bindings. That also lets us leave
out the PARSE kludge. And $<sym> bindings already automatically
handle multiple bindings by generating a list, which is exactly what
circumfix:<( )> wants for its pseudo-subscript.
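
The name construction described above can be sketched roughly as follows. This is an illustration in Python, not Perl 6 and not code from Perl-6.0.0-STD.pm; the function name `operator_name` and the assumption that the symbol pieces are joined with a single space are hypothetical, chosen only to reproduce the two names quoted in this thread.

```python
def operator_name(category, syms):
    """Build a user-visible operator name from a rule name and its $<sym> pieces.

    `category` stands in for the rule name (e.g. "infix"); `syms` stands in
    for the list of symbol bindings collected while the rule matched.
    Both parameters are hypothetical names for this sketch.
    """
    return f"{category}:<{' '.join(syms)}>"


# A rule named "infix" that bound a single symbol.
print(operator_name("infix", ["+"]))            # infix:<+>

# A rule named "circumfix" that bound two symbols: the opener and the closer.
print(operator_name("circumfix", ["(", ")"]))   # circumfix:<( )>
```

Because the name is derived from whatever was bound, one rule could in principle yield several different operator names, which is the flexibility the hardwired form lacked.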

Another problem with that form is that the precedence is unspecified.
I first tried to mix in the precedence of these various operators
via property or role, but then realized I needed to generate the
precedence on the fly anyway for certain meta-operators, and the
natural place to handle that is in the return processing from the rule.
One would like to simultaneously have a declarative solution that
eventually turns into a call to something we can tweak. That's what
the ( --> Additive) notation gives us--that's just a return type
coercion in Perl 6. And why introduce new traits when the signature
is already supposed to handle return type coercion?
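
As a loose analogy for the return-type idea, here is a small Python sketch. It is not Perl 6 and makes no claim about how ( --> Additive) is actually implemented; the `returns` decorator, the fields of the `Additive` class, and the numeric level are all invented for the example. The point is only that the declaration stays declarative while the coercion is an ordinary call that can be changed later.

```python
from dataclasses import dataclass


@dataclass
class Additive:
    """Hypothetical precedence class; the level value is made up."""
    sym: str
    level: int = 10


def returns(cls):
    """Declare a return type: the rule's raw result is passed through cls()."""
    def wrap(rule):
        def wrapped(text):
            raw = rule(text)
            # The coercion is a plain call, so cls can compute anything it
            # likes on the fly (for example, a precedence for a meta-operator).
            return None if raw is None else cls(raw)
        return wrapped
    return wrap


@returns(Additive)
def infix(text):
    """Match a leading '+' and return the matched symbol, or None."""
    return "+" if text.startswith("+") else None


print(infix("+ 1"))   # Additive(sym='+', level=10)
print(infix("* 1"))   # None
```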

The other stuff in the file can mostly be ignored unless you're writing
a bootstrapping parser, in which case you might want to examine
the #+ and #= comments and the {*} action points to preprocess the
file into something your bootstrap compiler can handle, since most
such processors will not be expected to handle full P6 syntax. Indeed,
not even pugs can parse the file currently without the help of a
preprocessor.

The other big thing that's going on in Perl-6.0.0-STD.pm is that
I've assumed multi dispatch semantics for rules that have the same
name but can be differentiated by their longest-token prefix (and
by their normal function arguments, if there are any). So when you
call a rule like <circumfix> you are calling into all the rules named
"circumfix", whether defined by your grammar or by some derived
grammar. Deriving grammars is how the user can add things like
their own circumfix macro without influencing anything outside their
lexical scope. Allowing people to do multi dispatch based on what
the pattern actually matches rather than an artificially generated
name prevents a large class of errors, I expect. Otherwise you
end up generating arrays or hashes of operators to dispatch to, and
such data structures cannot be overridden piecemeal within a derived
grammar without reinventing the dispatcher (badly).
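
The dispatch behaviour described above can be approximated with a short Python sketch. It is a simplification under stated assumptions: each candidate's "longest-token prefix" is reduced to a fixed literal string, grammar derivation is modelled with Python class inheritance, and the names `Grammar`, `Base`, `Derived`, `candidates`, and `dispatch` are all hypothetical. It is not how any Perl 6 implementation does it.

```python
class Grammar:
    # Maps a rule name to a list of (literal_prefix, label) candidates.
    candidates = {}

    @classmethod
    def all_candidates(cls, name):
        """Collect the candidates for `name` from this grammar and its ancestors."""
        found = []
        for klass in cls.__mro__:
            found.extend(vars(klass).get("candidates", {}).get(name, []))
        return found

    @classmethod
    def dispatch(cls, name, text):
        """Return the label of the candidate with the longest matching prefix."""
        matching = [c for c in cls.all_candidates(name) if text.startswith(c[0])]
        if not matching:
            return None
        return max(matching, key=lambda c: len(c[0]))[1]


class Base(Grammar):
    candidates = {"circumfix": [("(", "paren"), ("[", "bracket")]}


class Derived(Base):
    # Adds one candidate; Base and its candidate list are left untouched.
    candidates = {"circumfix": [("[[", "double-bracket")]}


print(Base.dispatch("circumfix", "[[x]]"))     # bracket
print(Derived.dispatch("circumfix", "[[x]]"))  # double-bracket
print(Derived.dispatch("circumfix", "(x)"))    # paren
```

The derived grammar changes what "[[" means only for code parsed with `Derived`; anything parsed with `Base` is unaffected, and no shared table of operators had to be copied or rewritten.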

Anyway, that's where it stands at the moment. What you see in the
latest Perl 6 grammar is the result of trying to follow your basic,
everyday design principles like:

Don't Repeat Yourself
Don't Force Artificial Name Generation
Don't Decide Things Prematurely
Limit Damage to the Smallest Scope
Avoid Magical Action at a Distance
Reduce, Reuse, Recycle... :)

Larry