
[regex] \&


Ruud H.G. van Tol

Sep 26, 2005, 3:27:12 PM
to perl6-l...@perl.org
Think about adding \& to the replacement part of a s///.

As in sed, the & means the whole match.


Then one can do

s/$search/*\&*/go

instead of

s/($search)/*\1*/go

and there needs to be no $1 variable set up.

(I assume that using () always makes a $1 available, even if it is not
being used.)

--
Grtz, Ruud

Ruud H.G. van Tol

Sep 26, 2005, 4:41:21 PM
to perl6-l...@perl.org
Juerd:
> Ruud H.G. van Tol:
>> Think about adding \& to the replacement part of a s///.
>> As in sed, the & means the whole match.
>
> Do you know Perl 5's $& variable? What you want isn't exactly new for
> Perl.

Yes, I certainly know it, and that it slows things down too.

A point was to not set up a variable. The \& is only a message for the
regex. An extra meaning for \1 could be to block the set up of a $1,
which might seem drastic but can help optimization.


>> s/($search)/*\1*/go
>
> \1 in Perl 5 is bad style and emits a warning, if you were clever
> enough to enable warnings. \1 in Perl 6 strings will no longer have
> anything to do with regex matches.

It could be reinstated as "living in regex-engine-space only".


>> (I assume that using () always makes a $1 available, even if it is
>> not being used.)
>
> Perl 5's $& is inefficient because of this. If the variable is used
> anywhere, Perl will for every regex used capture everything. An
> implicit match string is far less efficient than an explicit one, in
> terms of Perl 5. Perl 6, however, will handle things smarter and not
> copy the substring until it needs to be. That's why the equivalent of
> $& will be usable without any frowning.

OK.

--
Grtz, Ruud

Juerd

Sep 26, 2005, 4:19:29 PM
to perl6-l...@perl.org
Ruud H.G. van Tol skribis 2005-09-26 21:27 (+0200):

> Think about adding \& to the replacement part of a s///.
> As in sed, the & means the whole match.

Do you know Perl 5's $& variable? What you want isn't exactly new for
Perl.
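
For comparison, a minimal Perl 5 sketch of the two spellings under discussion: an explicit capture group referenced as $1, and the whole-match variable $& with no group at all. The pattern and input string are made up for illustration, and the proposed \& itself is not valid Perl 5, so $& stands in for it here.

```perl
use strict;
use warnings;

# Hypothetical pattern and input, chosen only for illustration.
my $search = qr/o+/;

# Variant 1: explicit capture group, referenced as $1 in the replacement.
(my $with_group = "foo boo") =~ s/($search)/*$1*/g;

# Variant 2: no capture group; $& is Perl 5's name for the whole match.
(my $with_amp = "foo boo") =~ s/$search/*$&*/g;

print "$with_group\n";   # f*oo* b*oo*
print "$with_amp\n";     # f*oo* b*oo*
```

Both forms produce the same string; they differ only in what the regex engine has to keep around, which is what the rest of the thread is about.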

In Perl 6, the match object $/ will instead be used. It's a bit harder
to use with s///, because it will look ugly, but remember that you can
always choose to use s^^^ or s[][] or any other of the many
possibilities instead.

> s/($search)/*\1*/go

\1 in Perl 5 is bad style and emits a warning, if you were clever enough
to enable warnings. \1 in Perl 6 strings will no longer have anything to
do with regex matches.

> and there needs to be no $1 variable set up.

Perl 6 will count from 0, so it'll be $0.

> (I assume that using () always makes a $1 available, even if it is not
> being used.)

Perl 5's $& is inefficient because of this. If the variable is used
anywhere, Perl will for every regex used capture everything. An implicit
match string is far less efficient than an explicit one, in terms of
Perl 5. Perl 6, however, will handle things smarter and not copy the
substring until it needs to be. That's why the equivalent of $& will be
usable without any frowning.

> Grtz, Ruud

I find "grtz" a bit hard to read, and the z should of course be an s.


Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

David Storrs

Sep 26, 2005, 5:00:00 PM
to Perl6 Language List, David Storrs

On Sep 26, 2005, at 4:19 PM, Juerd wrote:
> Perl 5's $& is inefficient because of this. If the variable is used
> anywhere, Perl will for every regex used capture everything.

My understanding is that this died with 5.10. Is that right?

--Dks

Nicholas Clark

Sep 26, 2005, 5:33:31 PM
to David Storrs, Perl6 Language List

$& is dynamically scoped (rather than lexically scoped).
I don't believe that it's possible to avoid capturing it anywhere, without
affecting correctness somehow.

There is already an optimisation to avoid capturing it if $& is never seen,
but even that is actually buggy:

$ perl -lwe '$_ = "trouble"; /o/; print eval q.$&.'
Use of uninitialized value in print at -e line 1.

compare with

$ perl -lwe '$&; $_ = "trouble"; /o/; print eval q.$&.'
Useless use of a variable in void context at -e line 1.
o

where, by mentioning $&, I set the "seen $& somewhere" flag, so $& is
captured, and it is there when the string eval gets compiled at run time.

IIRC the perl 6 equivalents of $` $& and $' are all lexical rather than
dynamic, so the pain will be far less. (at least in its scope)
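
On the 5.10 question: what Perl 5.10 eventually shipped is an opt-in alternative, the /p modifier together with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}. A minimal sketch, assuming Perl 5.10 or newer and reusing the "trouble" string from the one-liners above:

```perl
use strict;
use warnings;
use 5.010;   # /p and ${^MATCH} need Perl 5.10 or newer

my $text = "trouble";

my ($pre, $match, $post);
if ($text =~ /o/p) {
    # With /p, the three variables below are guaranteed to be set for
    # this match, without mentioning $`, $& or $' anywhere in the program.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "$pre|$match|$post\n";   # tr|o|uble
```

Because the request is attached to the individual match, no program-wide "seen $& somewhere" flag is involved.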

Nicholas Clark

Ruud H.G. van Tol

Sep 30, 2005, 5:54:08 AM
to perl6-l...@perl.org
Juerd:
> Ruud H.G. van Tol:
>> s/($search)/*\1*/go
>
> \1 in Perl 5 is bad style and emits a warning

The point was to give \1 and \&, in the replace part, a very limited
scope.

Maybe even better to limit \1 to the first '(?: ... )' in the search
part.

s/(?:$search)(?:.\1)+/\1/go

xy.xy.xy.xy --> xy


But if Perl6 can do the same with

s/($search)(.\1)+/$1/go

by detecting that the possible $1 and $2 and $& (or new equivalents) are
(almost certainly) not going to be used, that's of course best.
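
As a reference point, the xy.xy.xy.xy --> xy collapse can be written in plain Perl 5 today with one capturing group and one non-capturing group. This is only a sketch: the value of $search is hypothetical, the dot is escaped here to match a literal "." (the examples above use an unescaped one), and /o is dropped because the pattern is precompiled with qr//.

```perl
use strict;
use warnings;

# Hypothetical value for $search; any pattern without its own groups works.
my $search = qr/xy/;

my $text = "xy.xy.xy.xy";

# ($search) captures the first occurrence; (?:\.\1)+ then consumes one or
# more ".<same text>" repetitions without creating a second capture group.
(my $collapsed = $text) =~ s/($search)(?:\.\1)+/$1/g;

print "$collapsed\n";   # xy
```

The backreference \1 inside the pattern is ordinary Perl 5; only its use in the replacement part is what draws the warning mentioned earlier.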


A '+' can often be optimized to a {2,}. In this case:

s/($search)+/$1/

only if the resulting count is never used.

--
Grtz, Ruud

Larry Wall

Sep 30, 2005, 11:58:22 AM
to perl6-l...@perl.org
On Mon, Sep 26, 2005 at 10:19:29PM +0200, Juerd wrote:
: In Perl 6, the match object $/ will instead be used. It's a bit harder
: to use with s///, because it will look ugly, but remember that you can
: always choose to use s^^^ or s[][] or any other of the many
: possibilities instead.

It's always bothered me a little to use $/ "the object" when you
want to refer explicitly to the string matched, especially if the
object knows it matched more than the string is officially matching.
I think we could go as far as to say that $<> is the name of the text
that would be returned by ~$/ and the number that would be returned
by +$/. If we did that, I think we could get away with making

/frontstuff < \w* > backstuff/

a shorthand for

/<pre frontstuff> $<>:=( \w* ) <post backstuff>/

The space after the < would be required, of course. It works because
in the <foo \d*> form, the default is to take the argument as a rule,
and here we merely have a null "foo".

That gives us cool things like

s/back \s+ < \d+ > \s+ times/{ $<> + 1 }/

to increment the number of times the quick brown fox jumped over the
lazy dog's back.
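
A rough Perl 5 analogue of the same idea, not the proposed Perl 6 syntax: \K (Perl 5.10 or newer) plays the part of the "pre" context and a lookahead plays the part of the "post" context, so the whole match is just the digits. The input string is made up for illustration.

```perl
use strict;
use warnings;
use 5.010;   # \K needs Perl 5.10 or newer

my $text = "jumped over the lazy dog's back 41 times";

# \K drops "back " from the reported match, and the lookahead keeps
# " times" out of it, so only the digits are replaced.
# /e evaluates the replacement as Perl code; $& is the (narrowed) match.
(my $bumped = $text) =~ s/back\s+\K\d+(?=\s+times)/$& + 1/e;

print "$bumped\n";   # jumped over the lazy dog's back 42 times
```

Here $& stands in for the text that $<> would name in the proposal.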

Larry
