Smart Matching clarification

Jonathan Lang

unread,

Nov 16, 2006, 7:25:30 PM11/16/06

to p6l

Looking through the table provided, I ran across the following:

$_ $x Type of Match Implied Matching Code
====== ===== ===================== =============
Hash Hash hash keys identical match if $_.keys.sort
»eq« $x.keys.sort

My understanding is that at the time this was written, the working
theory was that Hash keys would always be strings. I'm wondering:
should this entry replace 'eq' with '===' or 'eqv', so that non-string
keys can also be compared for equivalent values? If so, which
operator should replace 'eq'? (I'm leaning toward '===', since S03
defines '$a eq $b' as '~$a === ~$b'.)

--
Jonathan "Dataweaver" Lang

Larry Wall

unread,

Nov 17, 2006, 7:16:48 PM11/17/06

to p6l

On Thu, Nov 16, 2006 at 04:25:30PM -0800, Jonathan Lang wrote:
: Looking through the table provided, I ran across the following:

Yes, it should be ===. But in revising the smartmatching tables for this
and other ===nesses, and thinking about how hashes may or may not be
implemented as ordered underneath, it seems to me that .keys.sort is
suboptimal if sort has to second-guess the ordering provided by the
underlying hash. So maybe we have some or all of:

.keys .sortkeys
.values .sortvalues
.kv .sortkv
.pairs .sortpairs

Possible variations: .skeys, .ordkeys, etc. Also could flip the
default and make .keys sort by default and then you use .rawkeys to get
unordered--shades of PHP. (Note that arrays are already considered
to be ordered when you use these methods.) Or we stick with the
methods we have now and give options for sorting and selecting as
parameters to .keys etc. In any case, making these methods able
to sort allows sorted .kv lists, which would not be possible with
.keys.sort. It also allows the hash or array itself to specify the
ordering behavior declaratively, and perhaps optimize ordering in
various ways that .keys.sort can't do without cheating.

Slicing modifiers like :kv and :pairs could also take an optional
sort parameter, presumably.

Larry

Jonathan Lang

unread,

Nov 17, 2006, 8:01:25 PM11/17/06

to p6l

Larry Wall wrote:
> Jonathan Lang wrote:
> : Looking through the table provided, I ran across the following:
> :
> : $_ $x Type of Match Implied Matching Code
> : ====== ===== ===================== =============
> : Hash Hash hash keys identical match if $_.keys.sort
> : »eq« $x.keys.sort
> :
> : My understanding is that at the time this was written, the working
> : theory was that Hash keys would always be strings. I'm wondering:
> : should this entry replace 'eq' with '===' or 'eqv', so that non-string
> : keys can also be compared for equivalent values? If so, which
> : operator should replace 'eq'? (I'm leaning toward '===', since S03
> : defines '$a eq $b' as '~$a === ~$b'.)
>
> Yes, it should be ===. But in revising the smartmatching tables for this
> and other ===nesses, and thinking about how hashes may or may not be
> implemented as ordered underneath, it seems to me that .keys.sort is
> suboptimal if sort has to second-guess the ordering provided by the
> underlying hash.

Not only is it suboptimal, it might not be possible. Sorting depends
on cmp returning Order::Increase, Order::Same, or Order::Decrease in
every case; but what if you're comparing Color::Blue to Bool::True?
And what about classes that involve partial ordering, such as sets?
Heck, how do you '[cmp] Color::Blue, Color::Gray', and what does
"sqrt(-1) cmp 0" (or even "sqrt(-1) <=> 0") return?

--
Jonathan "Dataweaver" Lang

Paul Seamons

unread,

Nov 17, 2006, 10:34:03 PM11/17/06

to p6l

> So maybe we have some or all of:
>
> .keys .sortkeys
> .values .sortvalues
> .kv .sortkv
> .pairs .sortpairs
>
> Possible variations: .skeys, .ordkeys, etc. Also could flip the
> default and make .keys sort by default and then you use .rawkeys to get
> unordered--shades of PHP.

Taking a page from Template Toolkit.

.keys # same as perl5
.sort # the sorted keys

I know that it isn't quite parallel with Array.sort and it doesn't provide
for .sortkv or .sort pairs, but it might be an option.

Paul

Darren Duncan

unread,

Nov 18, 2006, 4:16:05 PM11/18/06

to p6l

At 3:24 AM -0800 11/18/06, Jonathan Lang wrote:
>Jonathan Lang wrote:

>>Larry Wall wrote:
>> > it seems to me that .keys.sort is
>> > suboptimal if sort has to second-guess the ordering provided by the
>>> underlying hash.
>>
>>Not only is it suboptimal, it might not be possible. Sorting depends
>>on cmp returning Order::Increase, Order::Same, or Order::Decrease in
>>every case; but what if you're comparing Color::Blue to Bool::True?
>>And what about classes that involve partial ordering, such as sets?
>>Heck, how do you '[cmp] Color::Blue, Color::Gray', and what does
>>"sqrt(-1) cmp 0" (or even "sqrt(-1) <=> 0") return?
>

>OK: IIRC, this original definition also preceded Sets. So instead of
>"$_.keys.sort »===« $x.keys.sort", perhaps this should simply be
>"Set($_.keys) === Set($x.keys)", corrected for proper syntax. Heck,
>perhaps "$_.keys" and "$x.keys" should _be_ Sets.

I seem to remember having this discussion months
ago when trying to implement Set::Relation.

Absolutely .keys should return a Set.

We *know* already that all keys in a keyed
collection are unique, so why not explicitly
stamp them as such from the start by returning
them in a Set container; then users of those keys
don't have to recheck them for uniqueness, such
as in a Set's constructor, if they want to use
them as a set.

Then we can reliably and tersely say "$_.keys ===
$x.keys" and it will do the right thing.

Similarly, we can do set operations with the keys
more tersely, such as "$_.keys subset $x.keys" or
"$_.keys superset $x.keys" or "$_.keys union
$x.keys" or "$_.keys intersection $x.keys" or
"$_.keys minus $x.keys" etc.

And you can compare the keysets of 2 collections
reliably even when keys are objects of a data
type that is *not* ordinal, which makes it more
general.

Finally, common operations like .keys.sort (which
returns a Seq or List) can be written just as
tersely or the same as before, so that syntax can
be kept.

On a tangential matter, if you follow my
suggestion of another email and actually add a
(immutable) Bag type to the language, which your
documentation already references as a common
thing people may use, then .values can return
that, since it is an unordered collection that
may have duplicates. Once again, you can then
compare the value lists of 2 Hashes set without
sorting them. Still, regardless of what you do
here, making .keys return a Set should be done.

-- Darren Duncan

Jonathan Lang

unread,

Nov 18, 2006, 9:56:50 PM11/18/06

to p6l

Darren Duncan wrote:
> Absolutely .keys should return a Set.

Let me voice my one concern with this notion: to what extent does a
Set behave like a Seq? In particular, can you iterate over a Set?
E.g., if $S is a Set, would 'for $S { .doit() }' be valid code? If
not, having .keys return a Set complicates one of the most common uses
for it, as you'd then have to ask the resulting Set to produce a Seq
for you to iterate over: you'd need something like 'for %H.keys.items
{ .doit() }' instead of 'for %H.keys { .doit() }'.

If this is the case, I'd rather have .keys return a Seq which can then
be coerced to behave like a Set when appropriate. I wouldn't _like_
it; but the Huffman coding would be preferable to having .keys return
a Set. Ideally, though, I hope that Sets have enough Seq-like
behavior to render this concern moot.

> On a tangential matter, if you follow my
> suggestion of another email and actually add a

> (immutable) Bag type to the language, then
> .values can return that.

Same issue: how Seq-like is a Bag?

--
Jonathan "Dataweaver" Lang

Darren Duncan

unread,

Nov 20, 2006, 8:13:34 PM11/20/06

to p6l

At 6:56 PM -0800 11/18/06, Jonathan Lang wrote:
>Darren Duncan wrote:
>>Absolutely .keys should return a Set.
>
>Let me voice my one concern with this notion: to what extent does a
>Set behave like a Seq? In particular, can you iterate over a Set?
>E.g., if $S is a Set, would 'for $S { .doit() }' be valid code?

><snip>

>Same issue: how Seq-like is a Bag?

Sets and Bags are explicitly unordered, so referencing their elements
by ordinal notation doesn't make sense ... but you should be able to
iterate over them same as a Seq whose element order isn't significant.

I would expect that "for" would iterate over both a Set and a Bag in
the same way it iterates over the pseudo-Bag construct "for
all(<list>) -> ...", and so could be used in place of said.

Moreover, like with all(), I would expect that iterating over a Set
or Bag should be implicitly parallelizable and auto-thread, since we
already know that the list items are unordered.

Or even if a Set or Bag won't autothread without using all(), you
should still be able to iterate it as you do a Seq.

So we should just be able to say "for %h.keys -> ..." and it would
work. Only if the user wants the keys sorted would they use a longer
notation, same as when .keys returns a Seq.

-- Darren Duncan

Darren Duncan

unread,

Nov 21, 2006, 12:25:34 AM11/21/06

to Jonathan Lang, p6l

At 6:29 PM -0800 11/20/06, Jonathan Lang wrote:
>Darren Duncan wrote:
>>I would expect that "for" would iterate over both a Set and a Bag in
>>the same way it iterates over the pseudo-Bag construct "for
>>all(<list>) -> ...", and so could be used in place of said.
>

>If this turns out to be the case, then fine. Still, you should look
>at the other uses that Seq can be put to in terms of whether or not
>Set could also be used in that way; for instance, 'Set »op« Set' won't
>always make sense, although 'Set »op» Scalar' will, and so will 'op«
>Set', 'Set»op', and 'Set XopX Set'.

If you are talking about the merits of %hash.keys
returning a Set vs a Seq, your 'Set »op« Set'
example is not applicable to the argument. Hash
keys are unordered, so '%hash.keys >>op<<
<some-collection-type>' doesn't make sense even
if .keys returned a Seq.

Practically speaking, I think that perhaps what
.keys, and .values, exactly returns should depend
on what it is called on. If called on an ordinal
type like an Array, it would return a Seq for
each. If called on an unordered type like a Hash
or Mapping, it should return a Set for keys and
Bag for values.

And if Seq and Set etc are interchangeable for
all situations where it doesn't matter whether
the elements are ordered or not, then a lot of
times users won't have to care which they have.

>This would
>put a Set in the category of being something of a hybrid of a Seq and
>a Hash - in fact, I'd be hard-pressed to point out significant
>differences between 'Hash' and 'Set of Pair'. (Or am I now talking
>about something that relates to Set as Array relates to List?)

Assuming that a Mapping and a Hash are identical
except for whether they are mutable or not, I
believe that a Set is indeed just like a Mapping
or Hash that have just keys, and not values. And
that a Mapping or Hash is a Set of Pair, but with
syntactic sugar and optimized for certain kinds
of uses.

>Other cases: What would 'Set.push(items)' and 'Set.pop()' do? What
>_is_ the appropriate way to go about adding items to (or removing
>items from) a Set, or of searching the Set for an element? In some
>ways, it might be more natural to treat a Set as a Hash that has no
>values, e.g. 'Set.exists(item)' and 'Set.delete(item)'.

Following the above, and if the Set type was
mutable, then 'Set.exists(item)' and
'Set.delete(item)' are indeed the way to go.
Similarly, we could add Set.insert(item) to round
things out, as Hash-assignment syntax doesn't
really make sense if we aren't associating a
value with the item. (Hashes could gain similar
syntax, Hash.insert(item,value) as an alias for
Hash.{item} = value.)

Set.pop() doesn't make sense, unless it is
defined to remove a random item from the set, and
this isn't repeatable so probably a bad idea.
Set.push(items) would work, inserting items, but
it semantics would be different from an array due
to the non-orderedness of the results, and the
size of the Set wouldn't necessarily increase by
the number of elements if there is duplication.
Better to not use push() and pop() with unordered
types at all, keep them to ordered types. Use
different names for different behaviour.

Of course, you would have to add a mutable
Set-like type to Perl 6 to even consider using
.insert or .delete with them (.exists would still
work).

Since Sets are immutable, you could do this to
emulate .insert and .delete (my syntax may be
off):

$set .= union Set(item);
$set .= minus Set(item);

>If Set is _not_ enough like Seq for this to work (e.g., Set is deemed
>to be more like Junction than like Seq),

While a Junction is implemented using a Set, I
think that syntactically or conceptually
speaking, an actual Junction is more like a
scalar value that just happens to match multiple
things, and is used in place of a scalar, while a
Set is used as a collection value. Or I could be
mistaken.

-- Darren Duncan

TSa

unread,

Nov 23, 2006, 8:22:07 AM11/23/06

to p6l

HaloO,

Darren Duncan wrote:
> And if Seq and Set etc are interchangeable for all situations where it
> doesn't matter whether the elements are ordered or not, then a lot of
> times users won't have to care which they have.

One can argue that we have the subtyping chain Seq <: Bag <: Set for
these immutable types. The idea is that a Bag is a multiset, that is a
set that maintains a multiplicity per element. Apart from that it
behaves like Set. The Seq would add an order to the elements and some
functionality that builds on that order. The good thing would be that
Seq literals are applicable where Sets are expected.

Regards, TSa.
--

Adriano Rodrigues

unread,

Nov 23, 2006, 8:41:21 AM11/23/06

to TSa, p6l

On 11/23/06, TSa <Thomas....@barco.com> wrote:
> HaloO,
>
> Darren Duncan wrote:
> > And if Seq and Set etc are interchangeable for all situations where it
> > doesn't matter whether the elements are ordered or not, then a lot of
> > times users won't have to care which they have.
>
> One can argue that we have the subtyping chain Seq <: Bag <: Set for
> these immutable types. The idea is that a Bag is a multiset, that is a
> set that maintains a multiplicity per element.

And we may argue as well that being Bag a multiset, the set is a
special case where all the elements have the same multiplicity.

> Apart from that it
> behaves like Set. The Seq would add an order to the elements and some
> functionality that builds on that order. The good thing would be that
> Seq literals are applicable where Sets are expected.

Going from a Set to a Seq imposes some dilemma which may be though
similar to how going from a list to a scalar (it is the length, a
reference, what else?). The mapping of a Set to a Seq could be
parameterized via an ordering sub. But there are more possibilities.

TSa

unread,

Nov 23, 2006, 10:16:18 AM11/23/06

to p6l

HaloO,

Adriano Rodrigues wrote:
> And we may argue as well that being Bag a multiset, the set is a
> special case where all the elements have the same multiplicity.

Yes, that would be a subset type. The thing I had in mind was
'role Seq does Bag' and 'role Bag does Set'. And classes with
the same names for creating instances.

> Going from a Set to a Seq imposes some dilemma which may be though
> similar to how going from a list to a scalar (it is the length, a
> reference, what else?). The mapping of a Set to a Seq could be
> parameterized via an ordering sub. But there are more possibilities.

Indeed it is easier to drop information than it is to create it
in the first place. Note that the order of a Seq can be random,
that is not compressable to a sub that produces it. So there's no
other way than storing the order. And this is exactly what a Seq
does anyway.

The operational advantage of Set being a supertype of Seq is that
all set operations are available for Seq out of the box. Mixed
operations of Seq and Set would dispatch to the Set variant. The
Seq operations like hypering are naturally precluded for Sets.

Many places where the return value of Hash::keys might be put would
be typed as expecting a Set. And actually returning a Set would make
%a.keys === %b.keys a natural idiom. Jonathan's concerns of iterating
being unavailable for a Set should be unfounded because this
functionality should be in Set and hence in Seq from where we know
it best. There could e.g. be a role Iterateable that Set, Range,
Array and Hash do.

Regards, TSa.
--

TSa

unread,

Nov 24, 2006, 5:35:55 AM11/24/06

to Jonathan Lang, p6l

HaloO,

Jonathan Lang wrote:
> Other cases: What would 'Set.push(items)' and 'Set.pop()' do? What
> _is_ the appropriate way to go about adding items to (or removing
> items from) a Set, or of searching the Set for an element?

Since Sets are immutable values there should be no push and pop
methods. These are part of the rw Array interface. Instead we
have:

my Set $s = set(1,2,3);

$s (|)= (4,5,6); # $s === set(1,2,3,4,5,6)
$s (-)= (5,6); # $s === set(1,2,3,4)

Searching a set is just the element containment test:

if 3 (in) $s { say "set contains 3" }

Regards, TSa.
--

Darren Duncan

unread,

Nov 24, 2006, 4:06:21 PM11/24/06

to p6l

P.S. Sending this again, for timeliness, (first attempt was 20 hours
ago) due to p6l mail server being down before. Sorry if you end up
getting a duplicate later.
----------------

To start off, I should clarify that I see little value for the
existence of a Bag type except for certain matters of syntactic or
semantic brevity, but that those alone can still warrant its
existence.

A Bag is for marking when your duplicate-allowing collection is
conceptually not ordered, and that is all that it is for. This
marker is useful for optimizing certain places a Seq would otherwise
use, such as implicitly permitting hyperthreading (a Set can also
hyperthread). And it is also useful as a language-enforced stricture
where you are prevented from doing order-dependent operations on that
collection because they don't make sense.

Aside from these optimizations and strictures afforded by a Bag type,
I see no reason to provide too many operators for them ... in fact, I
would argue that what one can do with a Bag be defined as an
intersection of what one can do with a Seq and a Set.

That said ...

At 4:16 PM +0100 11/23/06, TSa wrote:
>Adriano Rodrigues wrote:
>>And we may argue as well that being Bag a multiset, the set is a
>>special case where all the elements have the same multiplicity.

Or specifically, a multiplicity of 1.

>Yes, that would be a subset type. The thing I had in mind was
>'role Seq does Bag' and 'role Bag does Set'. And classes with
>the same names for creating instances.

I think you have something backwards here. While the 3 collection
types Seq,Bag,Set could be sequenced like that for some purposes of
explanation, where adjacent types have commonalities that the other
doesn't, I don't see that it falls to also chain .does() in the same
direction all the way across.

Seq and Set are *both* more specific or restricted than Bag. So it
would make more sense to say 'role Set does Bag' (and 'role Seq does
Bag'), not 'role Bag does Set'. For illustrative purposes, replace
"Set" with "Int" and "Bag" with "Num". Everything that is a valid
Set|Seq is a valid Bag, but the reverse isn't true.

(That's not to say that we can't cast a Bag as a Set, but that would
change the value, like doing round|floor|ceil|etc on a Num to get an
Int, and this is external to a .does relationship.)

This also allows us to reserve operators for Set that Bag can't or
won't have (because they depend on all collection elements being
distinct), as we can reserve operators for Seq that Bag can't have
(because they depend on the order of elements being significant).

Now, there is a small handful of operations that could easily be
ascribed to all 3 of those types, such as testing if an element
exists, or how many occurrances there are, or iterating through all
elements in an order-agnostic fashion. These can all have easily
predictable and consistent behaviour.

Moreover, some operations are clearly useable with only the Seq type,
such as iterating through elements in order or reading an element at
a specific index.

The operators [union, intersection, difference, disjoint-union, etc]
have clearly defined and predictable behaviour with a Set, since all
inputs and outputs have no duplicates.

>The operational advantage of Set being a supertype of Seq is that
>all set operations are available for Seq out of the box. Mixed
>operations of Seq and Set would dispatch to the Set variant. The
>Seq operations like hypering are naturally precluded for Sets.

But I would ask whether it is desirable for those Set operators to be
present in Bag|Seq, and if so, then what the desired semantics are.
For example, what would these return:

Bag(1,2,2,2,3,3) union Bag(1,2,2,4,4);
# Bag(1,1,2,2,2,2,2,3,3,4,4) or Bag(1,2,2,2,3,3,4,4) ?

Bag(1,2,2,2,3,3) intersection Bag(1,2,2,4,4);
# Bag(1,1,2,2,2,2,2) or Bag(1,2,2) ?

Bag(1,2,2,2,3,3) difference Bag(1,2,2,4,4);
# Bag(2,3,3) or Bag(3,3) ?

Bag(1,2,2,2,3,3) d_union Bag(1,2,2,4,4);
# Bag(2,3,3,4,4) or Bag(3,3,4,4) ?

Repeat again with Bag->Seq.

In my mind, it would be far simpler to reserve such operators to the
Set only, and cast a Bag|Seq as a Set to use them on it, if that is
desired whereupon the results are all distinct.

But still, it is something that should be decided on, one way or the other.

-- Darren Duncan

Darren Duncan

unread,

Nov 23, 2006, 8:54:37 PM11/23/06

to p6l

To start off, I should clarify that I see little value for the
existence of a Bag type except for certain matters of syntactic or
semantic brevity, but that those alone can still warrant its
existence.

A Bag is for marking when your duplicate-allowing collection is
conceptually not ordered, and that is all that it is for. This
marker is useful for optimizing certain places a Seq would otherwise
use, such as implicitly permitting hyperthreading (a Set can also
hyperthread). And it is also useful as a language-enforced stricture
where you are prevented from doing order-dependent operations on that
collection because they don't make sense.

Aside from these optimizations and strictures afforded by a Bag type,
I see no reason to provide too many operators for them ... in fact, I
would argue that what one can do with a Bag be defined as an
intersection of what one can do with a Seq and a Set.

That said ...

At 4:16 PM +0100 11/23/06, TSa wrote:

>Adriano Rodrigues wrote:
>>And we may argue as well that being Bag a multiset, the set is a
>>special case where all the elements have the same multiplicity.

Or specifically, a multiplicity of 1.

>Yes, that would be a subset type. The thing I had in mind was

>'role Seq does Bag' and 'role Bag does Set'. And classes with
>the same names for creating instances.

I think you have something backwards here. While the 3 collection

>The operational advantage of Set being a supertype of Seq is that

>all set operations are available for Seq out of the box. Mixed
>operations of Seq and Set would dispatch to the Set variant. The
>Seq operations like hypering are naturally precluded for Sets.

But I would ask whether it is desirable for those Set operators to be

TSa

unread,

Nov 27, 2006, 4:55:19 AM11/27/06

to p6l

HaloO,

Darren Duncan wrote:
> To start off, I should clarify that I see little value for the existence
> of a Bag type except for certain matters of syntactic or semantic
> brevity, but that those alone can still warrant its existence.

I partly agree. The Bag or MultiSet type naturally falls out as an
intermediate between Set and Seq in my typing. Since I see Bag as
an interface extension subtype of Set, one could e.g. derive a FuzzySet
in the same way.

> I think you have something backwards here. While the 3 collection types
> Seq,Bag,Set could be sequenced like that for some purposes of
> explanation, where adjacent types have commonalities that the other
> doesn't, I don't see that it falls to also chain .does() in the same
> direction all the way across.
>
> Seq and Set are *both* more specific or restricted than Bag. So it
> would make more sense to say 'role Set does Bag' (and 'role Seq does
> Bag'), not 'role Bag does Set'. For illustrative purposes, replace
> "Set" with "Int" and "Bag" with "Num". Everything that is a valid
> Set|Seq is a valid Bag, but the reverse isn't true.

All this depends on what kind of subtype you are creating. My proposal
is a strict extension of the interface or internal representation. Your
proposal is going the other way of restricting multiplicity to 1. It is
clear that all operations of a Bag can be carried out with a Set if you
take the multiplicity as 1. In the end it's a matter of choice. The
coolest thing would actually be to *supertype* Bag atop of Set (see the
thread 'set operations for roles').

> The operators [union, intersection, difference, disjoint-union, etc]
> have clearly defined and predictable behaviour with a Set, since all
> inputs and outputs have no duplicates.

The operations of MultiSets are equally well defined. See e.g.
http://en.wikipedia.org/wiki/Multiset

> But I would ask whether it is desirable for those Set operators to be
> present in Bag|Seq, and if so, then what the desired semantics are. For
> example, what would these return:

The multiplicity is min for intersection and max for union.

> Bag(1,2,2,2,3,3) union Bag(1,2,2,4,4);
> # Bag(1,1,2,2,2,2,2,3,3,4,4) or Bag(1,2,2,2,3,3,4,4) ?

So this would be Bag(1,2,2,2,3,3,4,4).

> Bag(1,2,2,2,3,3) intersection Bag(1,2,2,4,4);
> # Bag(1,1,2,2,2,2,2) or Bag(1,2,2) ?

Bag(1,2,2) of course.

> Bag(1,2,2,2,3,3) difference Bag(1,2,2,4,4);
> # Bag(2,3,3) or Bag(3,3) ?

Difference is taking away the intersection, so we
get Bag(2,3,3).

The most interesting new operation in the Bag type is the join.
That yields a Bag even for Sets:

Set(1,2,3) (+) Set(2,3,4) === Bag(1,2,2,3,3,4)

> Bag(1,2,2,2,3,3) d_union Bag(1,2,2,4,4);
> # Bag(2,3,3,4,4) or Bag(3,3,4,4) ?

Disjoint union has a Bag of Pair as output.
See http://en.wikipedia.org/wiki/Disjoint_union
So we get Bag(1=>1, 1=>2, 1=>2, 1=>2, 1=>3, 1=>3, 2=>1, 2=>2,
2=>2, 2=>4, 2=>4). Well, or as Bag of Seq (1,1; 1,2; ... ; 2,4).

Regards, TSa.
--

Darren Duncan

unread,

Nov 27, 2006, 8:01:36 AM11/27/06

to p6l

At 10:55 AM +0100 11/27/06, TSa wrote:
>>Seq and Set are *both* more specific or restricted than Bag. So it
>>would make more sense to say 'role Set does Bag' (and 'role Seq does
>>Bag'), not 'role Bag does Set'. For illustrative purposes, replace
>>"Set" with "Int" and "Bag" with "Num". Everything that is a valid
>>Set|Seq is a valid Bag, but the reverse isn't true.
>
>All this depends on what kind of subtype you are creating. My proposal
>is a strict extension of the interface or internal representation. Your
>proposal is going the other way of restricting multiplicity to 1. It is
>clear that all operations of a Bag can be carried out with a Set if you
>take the multiplicity as 1. In the end it's a matter of choice. The
>coolest thing would actually be to *supertype* Bag atop of Set (see the
>thread 'set operations for roles').

I was not meaning to get into implementation issues so much as just
to say that all bag values are a superset of all set values.

The Set and Bag types could very well, and probably should, have
disjoint implementations, just as Array and Hash should probably have
disjoint implementations even though you could represent all Array
values as Hash values if you wanted to, for matters of efficiency.

Note that a "value" in the aforementioned refers to the whole
collection object, not an element therein.

>> Bag(1,2,2,2,3,3) d_union Bag(1,2,2,4,4);
>> # Bag(2,3,3,4,4) or Bag(3,3,4,4) ?
>Disjoint union has a Bag of Pair as output.
>See http://en.wikipedia.org/wiki/Disjoint_union
>So we get Bag(1=>1, 1=>2, 1=>2, 1=>2, 1=>3, 1=>3, 2=>1, 2=>2,
>2=>2, 2=>4, 2=>4). Well, or as Bag of Seq (1,1; 1,2; ... ; 2,4).

I always considered disjoint_union to mean exactly the same thing as
symmetric_difference, meaning an analogy to XOR. Still, Wikipedia
says they are different, and its
http://en.wikipedia.org/wiki/Symmetric_difference article describes
the meaning I had been attributing to both. So what I was really
asking was:

Bag(1,2,2,2,3,3) symmetric_difference Bag(1,2,2,4,4);

# Bag(2,3,3,4,4) or Bag(3,3,4,4) ?

-- Darren Duncan

TSa

unread,

Nov 27, 2006, 12:01:44 PM11/27/06

to p6l

HaloO,

Darren Duncan wrote:
> I was not meaning to get into implementation issues so much as just
> to say that all bag values are a superset of all set values.

I agree with that. And I end up wanting a nice syntax to create
a supertype. This is particularly needed if the Bag type shall
be loadable as module whereas the Set is a base type. The problem
is the same as one might want to supertype Num with Complex. OK,
Complex is perhaps a base type, but the latest when it comes to
supertyping quaternions onto Complex, a module is asked for. A
FuzzySet would be another supertype of Set that will hardly be
a core type. So, how do we do supertyping?

Funny result is that with 'Set does Bag' and 'Seq does Bag' and
Bag coming from a module, without loading it Set and Seq have not
even an indirect relation. But then again 'Seq does Bag' is doubtful
anyway because there are several Seq values for a single Bag. This
collapsing of several Sequences into a single Bag does not establish
a subset relation.

Non the less picking a particular order for a Bag constitutes an
interface subtype. The latter is the only reason why the issue came up
in this thread. Note that the iteration interface would ideally be
available from the Bag type. So we might want to make it a core type.
What's @Larry's opinion on that?

> Bag(1,2,2,2,3,3) symmetric_difference Bag(1,2,2,4,4);
> # Bag(2,3,3,4,4) or Bag(3,3,4,4) ?

This is just the union (1,2,2,2,3,3,4,4) minus the intersection
(1,2,2): (2,3,3,4,4). And I still think that it is a good idea
to name the set operations after their equivalent boolean connectives:

(|) union
(&) intersection
(^) symmetric difference

Well, and to make them Bag operations to start with.

Regards, TSa.
--

Darren Duncan

unread,

Nov 27, 2006, 8:05:45 PM11/27/06

to p6l

To start off with, I agree with your comment about making Set the
main type and making Bag an extension built upon that, as complex is
built upon num, etc.

At 6:01 PM +0100 11/27/06, TSa wrote:
>And I still think that it is a good idea
>to name the set operations after their equivalent boolean connectives:
>
> (|) union
> (&) intersection
> (^) symmetric difference
>
>Well, and to make them Bag operations to start with.

This may be a non-issue from a user's viewpoint, but as a user, I
want set operations that have sets as input to return sets as output
by default. Eg, unioning 2 Set that have common values should return
a Set.

Moreover, since set operations would be a lot more common in practice
than bag operations, the set operations should be the most terse, if
they both aren't equally terse.

I see the matter as being similar to Int vs Num. Any operation whose
operands are Ints should return Ints wherever it is conceivable to do
so. In particular, this means that dividing an Int by an Int should
return an Int. (Also, the modulus should be undefined for 2 Num.)
Only promote an Int to a Num if one or more other operands are
already Num, even if the value of that Num is a whole number.

So taking the semantics of Int vs Num that users see as examples for
how Set vs Bag semantics should work, as far as argument and result
types go, makes a lot of sense, and is easily to implement with Perl
6 multis.

-- Darren Duncan

Ruud H.G. van Tol

unread,

Nov 28, 2006, 4:54:29 AM11/28/06

to perl6-l...@perl.org

Darren Duncan schreef:
> TSa:

>> And I still think that it is a good idea
>> to name the set operations after their equivalent boolean
>> connectives:
>>
>> (|) union
>> (&) intersection
>> (^) symmetric difference
>>
>> Well, and to make them Bag operations to start with.

> To start off with, I agree with your comment about making Set the

> main type and making Bag an extension built upon that, as complex is
> built upon num, etc.

I don't think that will work out. Modification of a Set is more complex
than modification of a Bag, so in that sense the Bag is the main type.

> This may be a non-issue from a user's viewpoint, but as a user, I
> want set operations that have sets as input to return sets as output
> by default. Eg, unioning 2 Set that have common values should return
> a Set.

Or base it on what the receiving end wants.

> I see the matter as being similar to Int vs Num. Any operation whose
> operands are Ints should return Ints wherever it is conceivable to do
> so. In particular, this means that dividing an Int by an Int should
> return an Int.

Int three = 3 ;
Int four = 4 ;
Num n1 = three / four ;
Num n2 = 3 / 4 ;

(Is "Int three" the same as "my Int three"? I hope so.)

--
Groet, Ruud

Smylers

unread,

Nov 28, 2006, 5:18:54 AM11/28/06

to perl6-l...@perl.org

Ruud H.G. van Tol writes:

> Darren Duncan schreef:
>
> > TSa:
>
> > > set operations ... make them Bag operations to start with.
>
> > I agree with ... making Set the main type and making Bag an

> > extension built upon that, as complex is built upon num, etc.
>
> I don't think that will work out. Modification of a Set is more
> complex than modification of a Bag, so in that sense the Bag is the
> main type.

Is this still the Perl 6 _Language_ group? The one where we consider
what Perl 6 will do, and leave the implementation details to others?

Smylers

Dr.Ruud

unread,

Nov 28, 2006, 5:52:22 AM11/28/06

to perl6-l...@perl.org

Smylers schreef:
> Ruud H.G. van Tol:
>> Darren Duncan:
>>> TSa:

>>>> set operations ... make them Bag operations to start with.
>>>
>>> I agree with ... making Set the main type and making Bag an
>>> extension built upon that, as complex is built upon num, etc.
>>
>> I don't think that will work out. Modification of a Set is more
>> complex than modification of a Bag, so in that sense the Bag is the
>> main type.
>
> Is this still the Perl 6 _Language_ group? The one where we consider
> what Perl 6 will do, and leave the implementation details to others?

I don't consider much in this thread implementation oriented. In the
part that you quote, I mentioned that putting a hierarchical order on
Set vs. Bag has problems, because that order only applies to one side of
them. The order was used for fall-back, to which I mentioned a context
driven approach (like wantarray).

--
Affijn, Ruud

"Gewoon is een tijger."

Darren Duncan

unread,

Nov 28, 2006, 6:39:35 AM11/28/06

to perl6-l...@perl.org

At 10:54 AM +0100 11/28/06, Ruud H.G. van Tol wrote:
> > To start off with, I agree with your comment about making Set the
>> main type and making Bag an extension built upon that, as complex is
>> built upon num, etc.
>
>I don't think that will work out. Modification of a Set is more complex
>than modification of a Bag, so in that sense the Bag is the main type.

On the contrary, I would say that a Bag is more complicated than a
Set, both conceptually, and implementation-wise for many types of
implementations.

For conceptual, with a Set, you just know that a particular element
either does or does not exist in it. With a Bag, you additionally
have a count of how many times that element exists (with a more
practical Hash-like implementation), or you have an actual
multiplicity of instances (with a less practical Array-like
implementation), which is more information and more complicated.

Look at these implemented in terms of a Hash-like collection, for
example. A Set only has to maintain the list of keys, while the Bag
also has to maintain the associated count values for each. When Hash
values can be safely never defined, or ignored or dropped at any
given time, as with the Hash-like Set, there is less work to do than
if any of them are accounted for.

What I have said above is the same message regardless of whether Set
and/or Bag are immutable or mutable, though I assume that as Perl 6
already has Set being immutable, then if Bag were added, it would be
too.

-- Darren Duncan

TSa

unread,

Nov 28, 2006, 10:33:14 AM11/28/06

to p6l

HaloO,

Darren Duncan wrote:
> This may be a non-issue from a user's viewpoint, but as a user, I want
> set operations that have sets as input to return sets as output by
> default. Eg, unioning 2 Set that have common values should return a Set.

First of all the operations could be overloaded. Secondly the bag
operations are defined in terms of min and max and therefore result
in Sets remaining Sets at least in the sense of a Bag with all element
multiplicities being 1.

> Moreover, since set operations would be a lot more common in practice
> than bag operations, the set operations should be the most terse, if
> they both aren't equally terse.

I think the operators are the same. And with Seq a subtype of Bag they
are applicable to Seqs as well. Some operations like (+) however make
no sense for Sets and so they return a Bag. That is similar to using
Bool values as arguments to numeric operations: True + True == 2.

> I see the matter as being similar to Int vs Num. Any operation whose
> operands are Ints should return Ints wherever it is conceivable to do
> so. In particular, this means that dividing an Int by an Int should
> return an Int.

As a kind of compromise an Int division could return a "doubled" value
with but: 5 / 4 == 1 but 1.25.

> (Also, the modulus should be undefined for 2 Num.)

This could be achieved by introducing the modulus on the Int level.
But I think that modulus can be easily generalized to Num. E.g.
3.2 % 2.4 == 0.8. Handling negative divisors is more of an issue.
With the Euclidean definition of modulus a negative sign should not
propagate into the result. E.g. 8 % -3 == 8 % 3 == 2.

> Only
> promote an Int to a Num if one or more other operands are already Num,
> even if the value of that Num is a whole number.

I'm not sure that this is the intent of the current spec. I personally
value adherence to the subtyping of Int and Num higher than keeping a
pure Int subsystem.

Regards, TSa.
--