
zip: stop when and where?


Juerd

Oct 4, 2005, 3:00:15 PM
to perl6-l...@perl.org
What should zip do given 1..3 and 1..6?

(a) 1 1 2 2 3 3 4 5 6
(b) 1 1 2 2 3 3 undef 4 undef 5 undef 6
(c) 1 1 2 2 3 3
(d) fail

I'd want c, mostly because of code like

for @foo Y 0... -> $foo, $i { ... }

Pugs currently does b.
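
A minimal sketch of the four candidates, written in Python rather than Perl 6 because the Perl 6 syntax was still in flux at the time. The names `flat`, `MISSING` and `zip_or_fail` are made up for illustration; `zip` and `itertools.zip_longest` are standard library. (Python 3.10 and later also accept `zip(..., strict=True)` for the "fail" case.)

```python
from itertools import chain, zip_longest

short, longer = [1, 2, 3], [1, 2, 3, 4, 5, 6]   # 1..3 and 1..6
MISSING = object()   # sentinel that cannot collide with real data

def flat(rows):
    """Flatten zipped tuples into one interleaved list, as Y does."""
    return list(chain.from_iterable(rows))

# (a) interleave, then silently skip the gaps left by the shorter list
option_a = [x for x in flat(zip_longest(short, longer, fillvalue=MISSING))
            if x is not MISSING]

# (b) pad the shorter list; None stands in for undef
option_b = flat(zip_longest(short, longer))

# (c) stop as soon as any list runs out
option_c = flat(zip(short, longer))

# (d) refuse to zip lists of different lengths
def zip_or_fail(*lists):
    rows = list(zip_longest(*lists, fillvalue=MISSING))
    if any(x is MISSING for row in rows for x in row):
        raise ValueError("lists differ in length")
    return flat(rows)
```

Only (c) works with an infinite second argument such as `itertools.count()`, which is the case the `for @foo Y 0...` idiom depends on.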


Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Joshua Gatcomb

Oct 4, 2005, 3:15:23 PM
to Juerd, perl6-l...@perl.org
On 10/4/05, Juerd <ju...@convolution.nl> wrote:
>
> What should zip do given 1..3 and 1..6?
>
> (a) 1 1 2 2 3 3 4 5 6
> (b) 1 1 2 2 3 3 undef 4 undef 5 undef 6
> (c) 1 1 2 2 3 3
> (d) fail
>
> I'd want c, mostly because of code like
>
> for @foo Y 0... -> $foo, $i { ... }
>
> Pugs currently does b.


Yeah. This is one of those things where it is hard to have a single function
always DWYM. Algorithm::Loops solves this by just providing multiple
functions. I can't see how to solve this using MMD alone. You would need to
add an optional parameter that would specify the behavior:

-min (zip to the smallest list)
-undef (insert undefs as needed)
-error (blow up if the lists are not equal in size)

etc


Just my 2 cents from the peanut gallery.

Cheers,
Joshua Gatcomb
a.k.a. L~R

Greg Woodhouse

Oct 4, 2005, 3:15:53 PM
to Juerd, perl6-l...@perl.org
Option (b) certainly seems like the sensible one to me. My second
choice would be d.

A nice thing about c is that it leaves open the possibility of lazy
evaluation (zip as much of the lists as you can, leaving open the
possibility of picking up the process later). But I still prefer b.
Maybe there could be a separate "lazy zip" (lzip?).

===
Gregory Woodhouse <gregory....@sbcglobal.net>

"Without the requirement of mathematical aesthetics a great many discoveries would not have been made."

-- Albert Einstein

Eric

Oct 4, 2005, 3:29:42 PM
to Juerd, perl6-l...@perl.org
Hey,
I'd just like to say that I find (b) a bit misleading, because you couldn't
tell that the first list had ended; it could just have undefs at the end. I
like (a) because it doesn't add any data that wasn't there, though of course
that could be a reason to dislike it too. On the other hand, (c) makes a good
option when you want to work with infinite lists. Is this something that could
be modified on a per-use basis, so that we just choose one now as the default
("they didn't request a specific one, so use this one")?

After all that, I think I agree on (c), specifically because you can provide a
good code use for it and it doesn't add any data that wasn't there before. I
don't think it should ever lean towards (b), but then I bet someone else will
have an equally good use for that. ;) So in the end I think some way of
choosing would be good, with one option picked as the standard.

--
Eric Hodges

Jonathan Scott Duff

Oct 4, 2005, 3:30:06 PM
to Juerd, perl6-l...@perl.org
On Tue, Oct 04, 2005 at 09:00:15PM +0200, Juerd wrote:
> What should zip do given 1..3 and 1..6?
>
> (a) 1 1 2 2 3 3 4 5 6
> (b) 1 1 2 2 3 3 undef 4 undef 5 undef 6
> (c) 1 1 2 2 3 3
> (d) fail
>
> I'd want c, mostly because of code like
>
> for @foo Y 0... -> $foo, $i { ... }
>
> Pugs currently does b.

(a) and (d) are certainly wrong IMHO.

Surely zip could get a modifier to vary the behavior as desired?

for @foo ¥ 0... :greedy -> $foo, $i { ... } # (b)
for @foo ¥ 0... :conservative -> $foo, $i { ... } # (c)

Didn't we go over this a while back?

Anyway, I agree that (c) is probably the sanest default behavior.

-Scott
--
Jonathan Scott Duff
du...@pobox.com

Greg Woodhouse

Oct 4, 2005, 3:39:08 PM
to Eric, Juerd, perl6-l...@perl.org
I see your point. Option b does suggest that you can read ahead in a
"blocked" list and get undef's. If I had to choose just one, I think
I'd opt for d, but having two zip's one acting like c and one like d
might be useful. Then, of course, my first thought was wrong. This one
may well be, too.


Damian Conway

Oct 4, 2005, 8:05:12 PM
to perl6-l...@perl.org
Juerd wrote:

> What should zip do given 1..3 and 1..6?
>
> (a) 1 1 2 2 3 3 4 5 6
> (b) 1 1 2 2 3 3 undef 4 undef 5 undef 6
> (c) 1 1 2 2 3 3
> (d) fail
>
> I'd want c, mostly because of code like
>
> for @foo Y 0... -> $foo, $i { ... }
>
> Pugs currently does b.

I agree that C<zip> should have named options (perhaps :min and :max) that
allow precise behaviour to be specified.

I suspect that the dwimmiest default would be for C<zip> to stop zipping at
the length of the shortest finite argument. And to fail unless all finite
arguments are of the same length. Hence:

@i3 = 1..3 ;
@a3 = 'a'..'c' ;
@i6 = 1..6 ;

zip(@a3, @i3) # 'a', 1, 'b', 2, 'c', 3
zip(@i3, @i6) # fail
zip(100..., @a3, @i3) # 100, 'a', 1, 101, 'b', 2, 102, 'c', 3
zip(100..., @a3, @i6) # fail
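
The rule above can be sketched in Python under one loud assumption: "finite" is taken to mean "has a known length" (`__len__`), and anything else, such as `itertools.count`, is treated as infinite. The function name `zip_sized_strict` is hypothetical.

```python
from itertools import count

def zip_sized_strict(*args):
    """Hypothetical: sized arguments must agree in length; unsized ones
    (generators, itertools.count) are assumed to be infinite."""
    lengths = {len(a) for a in args if hasattr(a, "__len__")}
    if not lengths:
        raise ValueError("need at least one finite argument")
    if len(lengths) > 1:
        raise ValueError("finite arguments differ in length")
    out = []
    for row in zip(*args):   # zip itself stops at the shortest argument
        out.extend(row)
    return out

i3, a3, i6 = [1, 2, 3], ["a", "b", "c"], [1, 2, 3, 4, 5, 6]
```

With these definitions, `zip_sized_strict(count(100), a3, i3)` interleaves three triples, while `zip_sized_strict(i3, i6)` raises. The weak spot is the assumption itself: a lazy argument that turns out to be finite and short would truncate the result without any error.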

Damian

Luke Palmer

Oct 4, 2005, 8:10:53 PM
to Juerd, perl6-l...@perl.org
On 10/4/05, Juerd <ju...@convolution.nl> wrote:
> What should zip do given 1..3 and 1..6?
>
> (a) 1 1 2 2 3 3 4 5 6
> (b) 1 1 2 2 3 3 undef 4 undef 5 undef 6
> (c) 1 1 2 2 3 3
> (d) fail
>
> I'd want c, mostly because of code like
>
> for @foo Y 0... -> $foo, $i { ... }
>
> Pugs currently does b.

I think (c) is correct, precisely for this reason. The idiom:

for 0... Y @array -> $index, $elem {...}

Is one we're trying to create. If it involves a pain like:

for 0... Y @array -> $index, $elem {
    $elem // last;
}

Then it's not going to be a popular idiom.

If you want behavior (b), SWIM:

for 0... Y @array, undef xx Inf -> $index, $elem {
    ...
}
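
The same trick, sketched in Python as an analogy only: `itertools.count()` plays the role of `0...`, and `chain(array, repeat(None))` plays the role of `@array, undef xx Inf`. The variable names are illustrative.

```python
from itertools import chain, count, islice, repeat

array = ["a", "b", "c"]

# Shortest-wins zip: the infinite counter is harmless, the array ends the loop.
indexed = list(zip(count(), array))

# Padding the array with an endless run of None gives behaviour (b) on demand.
# Both sides are now infinite, so the caller has to bound the loop explicitly.
padded = zip(count(), chain(array, repeat(None)))
first_five = list(islice(padded, 5))
```

Here `first_five` ends with the pairs `(3, None)` and `(4, None)`: longest-style behaviour built from a shortest-wins zip. The reverse construction is not as cheap.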

If that ends up being common, we could create a syntax for it, like
postfix:<...>:

@array... # same as (@array, undef xx Inf)

Luke

Luke Palmer

Oct 4, 2005, 8:27:56 PM
to Juerd, perl6-l...@perl.org
On 10/4/05, Luke Palmer <lrpa...@gmail.com> wrote:
> If that ends up being common, we could create a syntax for it, like
> postfix:<...>:
>
> @array... # same as (@array, undef xx Inf)

No, no, that's a bad idea, because:

@array... # same as @array.elems..Inf

So I think I'm pretty much with Damian on this one. I don't like the
idea of it discriminating between finite and infinite lists, though.
What about things like =<>, for which it is never possible to know if
it is infinite?

I don't think people make assumptions about the zip operator. "Does
it quit on the shortest one or the longest one?" seems like a pretty
common question for a learning Perler to ask. That means they'll
either write a little test or look it up in the docs, and we don't
need to be so strict about its failure. I'd like to go with the
minimum.

I was thinking a good name for the adverbs would be :long and :short.

Luke

Michele Dondi

Oct 5, 2005, 5:47:59 AM
to Eric, Juerd, perl6-l...@perl.org
On Tue, 4 Oct 2005, Eric wrote:

> I'd just like to say that I find B a bit misleading because you couldn't
> tell that the first list ended, it could just have undef's at the end. I

Well, OTOH undef is now a more complex object than it used to be, so there
may be cheap workarounds. Of course one would still like reasonable
defaults and dwimmeries on commonly used idioms...


Michele
--
> Darl MacBride, is that you? They said over at Groklaw that the folks
> at SCO were trying to discredit Open Source.
SCO, a company traditionally run by Mormons, but they had to downsize
the second m? Well, they still have M for capital.
- David Kastrup in comp.text.tex, "Re: Is Kastrup..."

Juerd

Oct 5, 2005, 12:19:34 PM
to Damian Conway, perl6-l...@perl.org
Damian Conway skribis 2005-10-05 10:05 (+1000):

> I suspect that the dwimmiest default would be for C<zip> to stop zipping at
> the length of the shortest finite argument. And to fail unless all finite
> arguments are of the same length.

This is a nice compromise.

But what if you cannot know whether a list is finite?

my @foo = slurp ...; # lazy, but can be either finite or infinite
my @bar = 1..10;

say @foo Y @bar; # ?

Bryan Burgers

Oct 5, 2005, 1:06:14 PM
to perl6-l...@perl.org
I guess nobody mentioned this, and I don't know how people on perl6-language
feel about 'do it the same way as <language>', but I took a small jump into
Haskell a while back (barely enough to consider myself a beginner), and even
after just a little bit of time with it, I think I'd almost expect the
default zip behavior to stop zipping when the shortest list runs out.

Damian Conway

Oct 5, 2005, 7:49:24 PM
to perl6-l...@perl.org
I've been thinking about this issue some more and it occurs to me that we
might be thinking about this the wrong way.

Providing a :fillin() adverb on C<zip> is a suboptimal solution, because it
implies that you would always want to fill in *any* gap with the same value.
While that's likely in a two-way zip, it seems much less likely in a multiway zip.

So I now propose that C<zip> works like this:

C<zip> interleaves elements from each of its arguments until
any argument is (a) exhausted of elements I<and> (b) doesn't have
a C<fill> property.

Once C<zip> stops zipping, if any other argument has a known finite
number of unexhausted elements remaining, the C<zip> fails.

In other words, you get:

@i3 = 1..3 ;
@i4 = 1..4 ;


@a3 = 'a'..'c' ;

zip(@a3, @i3) # 'a', 1, 'b', 2, 'c', 3
zip(@i3, @i4) # fail

zip(100..., @a3, @i3) # 100, 'a', 1, 101, 'b', 2, 102, 'c', 3

zip(100..., @a3, @i4) # fail

zip(@a3 but fill(undef), @i4) # 'a', 1, 'b', 2, 'c', 3, undef, 4

zip(1..6, @i3 but fill(3), @i4 but fill('?'))
# 1,1,1,2,2,2,3,3,3,4,3,4,5,3,'?',6,3,'?'
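
One possible reading of these rules, sketched in Python. Everything here is an assumption made for illustration: the `Fill` wrapper stands in for `but fill(...)`, `zip_fill` is a made-up name, and "known finite" is approximated by "has `__len__`".

```python
from itertools import count

_DONE = object()  # marks an exhausted argument

class Fill:
    """Hypothetical stand-in for `@list but fill(value)`."""
    def __init__(self, items, value):
        self.items, self.value = items, value

def zip_fill(*args):
    sources = [a.items if isinstance(a, Fill) else a for a in args]
    iters = [iter(s) for s in sources]
    out = []
    while True:
        row = [next(it, _DONE) for it in iters]
        # Stop when an argument without a fill value runs dry,
        # or when every argument has run dry.
        dry_without_fill = any(x is _DONE and not isinstance(a, Fill)
                               for x, a in zip(row, args))
        if dry_without_fill or all(x is _DONE for x in row):
            break
        out.extend(a.value if x is _DONE else x for x, a in zip(row, args))
    # Fail if a sized (known-finite) argument still had elements left.
    for x, s in zip(row, sources):
        if x is not _DONE and hasattr(s, "__len__"):
            raise ValueError("unbalanced zip: a finite argument has elements left")
    return out

i3, i4, a3 = [1, 2, 3], [1, 2, 3, 4], ["a", "b", "c"]
```

Under this reading, `zip_fill(Fill(a3, None), i4)` pads the shorter list and succeeds, while `zip_fill(i3, i4)` raises because `i4` still holds a 4 when `i3` runs out.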


Damian

David Storrs

Oct 5, 2005, 8:08:00 PM
to Perl6 Language List

On Oct 5, 2005, at 7:49 PM, Damian Conway wrote:

> Providing a :fillin() adverb on C<zip> is a suboptimal solution,
> because it implies that you would always want to fill in *any* gap
> with the same value. While that's likely in a two-way zip, it seems
> much less likely in a multiway zip.

I actually have no problem with the solution you suggest (although I
rather like my idea about being able to 'fill in' with a control
exception), but I do have a question. If you want a multiway zip
with differing fillins, can't you do this?

@foo = 1..10 ¥:fill(0) 'a'..'c' ¥:fill('x') ¥ 1..50;

Assuming, of course, that it is possible to stick an adverb on the op
as I was requesting.

--Dks

Damian Conway

Oct 5, 2005, 8:22:55 PM
to Perl6 Language List
David Storrs asked:

> If you want a multiway zip with
> differing fillins, can't you do this?
>
> @foo = 1..10 ¥:fill(0) 'a'..'c' ¥:fill('x') ¥ 1..50;

I don't think that works. For example, why does the :fill(0) of the first ¥
apply to the 1..10 argument instead of to the 'a'..'c' argument? Especially
when it's the 'a'..'c' argument that's the shorter of the two!

Besides which, adverbs, being optional, come at the end of an operator's
argument list. Moreover, it's unclear to me how they are applied at all
to an n-ary operator like ¥.

On top of which, even if it did work, that formulation doesn't help at all if
you don't have Unicode available and are therefore forced to use C<zip>.


> Assuming, of course, that it is possible to stick an adverb on the op
> as I was requesting.

My recollection is that $Larry has previously said that this is not the
case...that adverbs are suffixed.

Damian

Luke Palmer

Oct 5, 2005, 8:41:50 PM
to Damian Conway, perl6-l...@perl.org
On 10/5/05, Damian Conway <dam...@conway.org> wrote:
> So I now propose that C<zip> works like this:
>
> C<zip> interleaves elements from each of its arguments until
> any argument is (a) exhausted of elements I<and> (b) doesn't have
> a C<fill> property.
>
> Once C<zip> stops zipping, if any other argument has a known finite
> number of unexhausted elements remaining, the C<zip> fails.

Wow, that's certainly not giving the user any credit.

I'm just wondering why you feel that we need to be so careful.

Luke

Damian Conway

Oct 5, 2005, 10:45:05 PM
to perl6-l...@perl.org
Luke wrote:

>> Once C<zip> stops zipping, if any other argument has a known finite
>> number of unexhausted elements remaining, the C<zip> fails.
>
> Wow, that's certainly not giving the user any credit.

Actually, I want to be careful because I give the users too much credit. For
imagination.


> I'm just wondering why you feel that we need to be so careful.

Because I can think of at least three reasonable and useful default behaviours
for zipping lists of differing lengths:

# Minimal (stop at first exhausted list)...
for @names ¥ @addresses -> $name, $addr {
    ...
}


# Maximal (insert undefs for exhausted lists)...
for @finishers ¥ (10..1 :by(-1)) -> $name, $score {
    $score err next;
    ...
}


# Congealed (ignore exhausted lists)...
for @queue1 ¥ @queue2 -> $server {
    ...
}

Which means that there will be people who expect each of those to *be* the
default behaviour for unbalanced lists. Which means there shouldn't be any
default for unbalanced lists, since whatever that default is won't DWIM for
2/3 of the potential users. Which means that unbalanced lists ought to produce
an error, unless the user specifies how to deal with the imbalance.
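
The "error unless told otherwise" policy can be sketched in Python as a single function with an explicit option. The name `interleave` and the mode names `'min'`, `'max'` and `'congeal'` are invented here to mirror the three cases above; the sketch assumes finite lists.

```python
from itertools import zip_longest

_MISSING = object()  # marks a slot where a list had already run out

def interleave(*lists, unbalanced="error"):
    """Hypothetical zip whose unbalanced behaviour must be chosen explicitly."""
    if unbalanced not in ("error", "min", "max", "congeal"):
        raise ValueError(f"unknown mode {unbalanced!r}")
    out = []
    for row in zip_longest(*lists, fillvalue=_MISSING):
        if any(x is _MISSING for x in row):
            if unbalanced == "error":
                raise ValueError("unbalanced lists; choose 'min', 'max' or 'congeal'")
            if unbalanced == "min":
                break                                        # stop at first exhausted list
            if unbalanced == "max":
                row = [None if x is _MISSING else x for x in row]   # insert undefs
            else:
                row = [x for x in row if x is not _MISSING]         # ignore exhausted lists
        out.extend(row)
    return out

names, scores = ["ann", "bob", "cy"], [10, 9]
```

Balanced input never consults the option, so the common case stays terse; only the ambiguous case forces the caller to say which of the three meanings was intended.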

Damian

Luke Palmer

Oct 6, 2005, 12:31:50 PM
to Damian Conway, perl6-l...@perl.org
On 10/5/05, Damian Conway <dam...@conway.org> wrote:
> Luke wrote:
> > I'm just wondering why you feel that we need to be so careful.
>
> Because I can think of at least three reasonable and useful default behaviours
> for zipping lists of differing lengths:
>
> # Minimal (stop at first exhausted list)...
> for @names ¥ @addresses -> $name, $addr {
>     ...
> }
>
>
> # Maximal (insert undefs for exhausted lists)...
> for @finishers ¥ (10..1 :by(-1)) -> $name, $score {
>     $score err next;
>     ...
> }
>
>
> # Congealed (ignore exhausted lists)...
> for @queue1 ¥ @queue2 -> $server {
>     ...
> }
>
> Which means that there will be people who expect each of those to *be* the
> default behaviour for unbalanced lists.

Perhaps that makes sense. That certainly makes sense for other kinds
of constructs. Something makes me think that this is a little
different. Whenever somebody asks what "Y" is on #perl6, and I tell
them that it interleaves two lists, a follow-up question is *always*
"what does it do when the lists are unbalanced." Now, that may just
be a behavior of #perl6ers, but I'm extrapolating. It means that
there isn't an assumption, and if they weren't #perl6ers, they'd RTFM
about it.

When I learned Haskell and saw zip, I asked the very same question[1].
I was about as comfortable writing Haskell at that point as beginning
programmers are with writing Perl, but it still took me about ten
seconds to write a test program to find out. The rest of Perl doesn't
trade a reasonable default behavior for an error, even if it *might*
be surprising the first time you use it. It doesn't take people long
to discover that kind of error and never make that mistake again.

If we make zip return a list of tuples rather than an interleaved
list, we could eliminate the final 1/3 of those errors above using the
typechecker. That would make the for look like this:

for @a Y @b -> ($a, $b) {...}

An important property of that is the well-typedness of the construct.
With the current zip semantics:

my A @a;
my B @b;
for @a Y @b -> $a, $b {
    # $a has type A (+) B
    # $b has type A (+) B
}

With tuple:

my A @a;
my B @b;
for @a Y @b -> ($a, $b) {
    # $a has type A
    # $b has type B
}

Which is more correct. No... it's just correct, no superlative
needed. It also keeps things like this from happening:

for @a Y @b -> $a, $b {
    say "$a ; $b"
}
# a1 ; b1
# a2 ; b2
# a3 ; b3
# ...

"Oh, I need a count," says the user:

for @a Y @b Y 0... -> $a, $b { # oops, forgot to add $index
    say "$a ; $b"
}
# a1 ; b1
# 0 ; a2
# b2 ; 1
# ...
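
The same slip can be reproduced in Python, as an analogy rather than a statement about Perl 6. The helper `pairs` is made up and imitates a two-parameter block eating a flat list two items at a time. Python's `zip` already returns tuples, so the tuple case fails loudly, although at run time rather than in a type checker.

```python
from itertools import chain, count

a = ["a1", "a2", "a3"]
b = ["b1", "b2", "b3"]

def pairs(flat):
    """Consume a flat list two items at a time, like a two-parameter block."""
    it = iter(flat)
    return list(zip(it, it))

# Flattened three-way zip read two at a time: values drift out of alignment.
flat = list(chain.from_iterable(zip(a, b, count())))
misaligned = pairs(flat)

# Tuple-returning zip: unpacking three values into two names is an error.
try:
    for x, y in zip(a, b, count()):
        pass
    caught = None
except ValueError as err:
    caught = str(err)
```

`misaligned` starts with `("a1", "b1")` and then pairs `0` with `"a2"`, which is the silent drift described above. The tuple version stops at the first iteration instead.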

Luke

[1] But I didn't need to. The signature told me everything:

zip :: [a] -> [b] -> [(a,b)]

It *has* to stop at the shortest one, because it has no idea how to
create a "b" unless I tell it one. If it took the longest, the
signature would have looked like:

zip :: [a] -> [b] -> [(Maybe a, Maybe b)]

Anyway, that's just more of the usual Haskell praise.

Dave Whipp

Oct 6, 2005, 12:57:42 PM
to perl6-l...@perl.org
Luke Palmer wrote:

> zip :: [a] -> [b] -> [(a,b)]
>
> It *has* to stop at the shortest one, because it has no idea how to
> create a "b" unless I tell it one. If it took the longest, the
> signature would have looked like:
>
> zip :: [a] -> [b] -> [(Maybe a, Maybe b)]
>
> Anyway, that's just more of the usual Haskell praise.

Given that my idea about using optional binding for look-ahead didn't
fly, maybe it would work here, instead:

@a Y @b -> $a, $b { ... } # stop at end of shortest
@a Y @b -> $a, ?$b { ... } # keep going until @a is exhausted
@a Y @b -> ?$a, ?$b { ... } # keep going until both are exhausted

I think we still need a way to determine if an optional arg is bound.
Can the C<exists> function be used for that ("if exists $b {...}")?


Dave.

Jonathan Scott Duff

Oct 6, 2005, 1:17:13 PM
to Luke Palmer, Damian Conway, perl6-l...@perl.org
On Thu, Oct 06, 2005 at 10:31:50AM -0600, Luke Palmer wrote:
> If we make zip return a list of tuples rather than an interleaved
> list, we could eliminate the final 1/3 of those errors above using the
> typechecker. That would make the for look like this:
>
> for @a Y @b -> ($a, $b) {...}

I like it (I think). I'm not sure about the syntax though. Is this one
of those places where round brackets are equivalent to square brackets?
I.e., would this be the same:

for @a ¥ @b -> [$a,$b] { ... }

?

Also, it seems like this syntax would almost always require the brackets
to be correct. Most of the time people will see and expect for loops
that look like this:

for MUMBLE -> $a, $b { ... }

Except now they've probably got a semantic error when MUMBLE contains ¥
or is prefixed by zip. This type of error mayn't be so easy to detect
depending on what they're mumbling about.

Juerd

Oct 6, 2005, 2:40:34 PM
to perl6-l...@perl.org
Dave Whipp skribis 2005-10-06 9:57 (-0700):

> Given that my idea about using optional binding for look-ahead didn't
> fly, maybe it would work here, instead:
> @a Y @b -> $a, $b { ... } # stop at end of shortest
> @a Y @b -> $a, ?$b { ... } # keep going until @a is exhausted
> @a Y @b -> ?$a, ?$b { ... } # keep going until both are exhausted
> I think we still need a way to determine if an optional arg is bound.
> Can the C<exists> function be used for that ("if exists $b {...}")?

Y isn't something that is specific to for loops, or to sub invocation,
so this cannot be a solution.

Also remember that Y creates a single flattened list by definition, and
that the given sub's arity determines how many items of that list are
used.

It's perfectly legal and possibly even useful to say

for @foo, @bar, @baz -> $quux, $xyzzy { ... }

And even though

for @foo Y @bar Y @baz -> $quux, $xyzzy { ... }

is something you will probably not see very often, it's still legal
Perl, even though it looks asymmetric. This too makes finding the
solution in arguments a non-solution.

Luke Palmer

Oct 6, 2005, 2:49:13 PM
to Juerd, perl6-l...@perl.org
On 10/6/05, Juerd <ju...@convolution.nl> wrote:
> for @foo Y @bar Y @baz -> $quux, $xyzzy { ... }
>
> is something you will probably not see very often, it's still legal
> Perl, even though it looks asymmetric. This too makes finding the
> solution in arguments a non-solution.

Don't be silly. There's no reason we can't break that; it's not an
idiom anybody is counting on. If you still want the behavior:

for flatten(@foo Y @bar Y @baz) -> $quux, $xyzzy {...}

But your point about Y returning a list and therefore not being
for-specific is quite valid.

Luke
