
Unified prelude, FFI, multiple runtimes


Yuval Kogman

Sep 12, 2005, 6:15:33 AM
to perl6-l...@perl.org, perl6-c...@perl.org
Hi,

please point your browser to

"http://nothingmuch.woobling.org/compilation_of_circular_prelude."
~ any(<graffle vdx pdf png jpg>).pick;

My proposition:

1. *the* perl 6 compiler should ship a reference implementation of
the prelude, one that is circular. For example

multi &*infix:<*> (Int $x, Int $y) { [+] $x xx $y }

is the reference implementation of multiplication on integers.

2. each block of code has a cryptographic digest, which is the hash
of its body with the digests of all the functions it calls.
This digest is stable regardless of optimization semantics, and
is applied to the PIL structure.

3. a foreign function interface must exist, and function
somewhat like the Inline:: modules do today (presumably
with a lower-level but zero-copy interface as an option).

4. FFI definitions are defined using a uniform interface, whose
most basic form is a constructor for a code object that
takes a runtime and a string body, e.g.

Code::Foreign.new(:language<C>, :body("int foo () { return 10; }"));

(Some FFIs might only allow construction of foreign functions at
compile time.)

5. Functions have a notion of equivalence. This is managed based on
the digest. For example

my &c_int_mul = BEGIN { Code::Foreign.new(:language<C>, :body("
    int foo (int x, int y) { return x * y; }
")) };

multi &infix:<*> (int $x, int $y --> int) {
    [+] $x xx $y;
}

my $str = &infix:<*><int, int --> int>.digest; # must specify the variant

&c_int_mul.set_equiv($str); # could be in another file

# or, if they are maintained together

&c_int_mul.set_equiv(&infix:<*><int, int --> int>);

This equivalence is with respect to the semantics of the input
and output. The test suite supposedly can assure that these are
really equivalent by running the same tests against either
version.

6. the runtime is just an FFI provider, which happens to be the
default one too

7. When applying code in the runtime, the runtime is free to use
any equivalent function

8. In order to run perl code at all, some native equivalent
functions must be provided by the runtime, for the basic
operations, and for things that can't have reference implementations
like IO calls (which are implemented in the prelude as stubs).

9. static analysis may be leveraged to compile direct calls to
native functions when compile time resolution is possible. In
the example graph, for example, no eval or symbol table
assignments are made, so there is no way the code will ever
change. Hence the entire program can be pre-resolved. This
should be controlled via the 'optimize' pragma.
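The digest scheme of point 2 can be sketched roughly as follows. This is a hypothetical illustration in Python, not part of the proposal; the `digest` helper and the body strings are mine. It shows the essential property: a function's digest covers its own body plus the digests of everything it calls.

```python
# Hypothetical sketch of point 2: a function's digest hashes its
# (PIL-level) body together with the digests of the functions it calls,
# so a change anywhere down the call graph changes the caller's digest.
import hashlib

def digest(body, callee_digests):
    h = hashlib.sha256()
    h.update(body.encode())            # the function's own body
    for d in sorted(callee_digests):   # combine callee digests stably
        h.update(d.encode())
    return h.hexdigest()

# a leaf operation's digest depends only on its body
d_incr = digest("native increment stub", [])
# multiplication calls the increment; its digest covers both
d_mul = digest("[+] $x xx $y", [d_incr])
# a bug fix to the callee invalidates the caller's digest too
d_mul_fixed = digest("[+] $x xx $y", [digest("fixed increment stub", [])])
assert d_mul != d_mul_fixed
```

Under a scheme like this, a pairing made against a digest (point 5) survives only as long as every digest down the chain is unchanged.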

The reasons for this will now be discussed.

A unified prelude with reference implementations for even the
simplest operations gives us several things:

* a circular structure that doesn't make sense when we try to
  run it, since the prelude depends on itself (bad)
  * we must break circularity by going to native operations
* a reference spec that alternative runtimes can compare to
* a way to kickstart implementations
  * just provide a root set of native operations, enough to
    break circularity in one single point
* demagicalization of the language
* demagicalization of the language

Since FFIs are going to be a core feature of perl 6, they can be
used to bootstrap the whole compilation process. In effect, the
"primitive" operations are now just FFI calls to the runtime we
happen to be executing on.

To promote code reuse and to simplify the model the notion of
equivalence is introduced, letting the runtime pick which version of
a function (FFI or native) it uses.

To make things safe, when the prelude is bug-fixed but the runtime
is not yet updated, the cryptographic hash of the function changes,
so it is no longer equal to the native one based on the way they are
paired.

To make things modular, the pairing of FFI and pure perl functions
is orthogonal to their location of definition, thanks to the hashing
scheme.
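A minimal sketch of that digest-keyed pairing, assuming a table the runtime consults at call time. All names here are hypothetical; `set_equiv` only mimics the interface from point 5, and the lambdas stand in for the FFI code object and the prelude multi.

```python
# Hypothetical sketch: equivalences are recorded against digests, so the
# pairing lives apart from either definition, and a changed (bug-fixed)
# prelude digest simply no longer matches anything in the table.
class EquivTable:
    def __init__(self):
        self._by_digest = {}

    def set_equiv(self, native_code, digest):
        # declare native_code equivalent to the function with this digest
        self._by_digest[digest] = native_code

    def resolve(self, digest, pure_version):
        # the runtime may pick any equivalent; fall back to pure perl
        return self._by_digest.get(digest, pure_version)

table = EquivTable()
pure_mul = lambda x, y: sum([x] * y)   # stands in for the prelude multi
c_int_mul = lambda x, y: x * y         # stands in for the FFI code object
table.set_equiv(c_int_mul, "digest-of-mul")

assert table.resolve("digest-of-mul", pure_mul)(6, 7) == 42
# after a prelude bug fix the digest differs, so the pure version is used
assert table.resolve("new-digest-after-fix", pure_mul) is pure_mul
```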

This has some nice properties:

Modules like Template::Stash::XS are deprecated. Instead, an FFI
based Template::Stash can be automatically loaded from the
runtime library prefix, and it will be set as equivalent to the
pure perl Template::Stash.

Modules like DBD::your_db which rely on a certain library can be
stubbed in Perl 6 to tell the user that they must use a certain
runtime. The stubs are set as equal to the FFI calls into the
library.

WRT MMD, you can set the entire multimethod equivalent to
a certain foreign function, and you can also set any variant
individually. You can even set a single variant to be equivalent to
a multimethod to make the FFI implementation simpler. The compiler
simply presents the runtime with all the possible MMD choices, and
lets the runtime choose between conflicting ones.

For example, in this version the runtime is told that it has two
choices:

multi factorial (int $n) { $n * factorial($n - 1) }
multi factorial (0) { 1 }

my &c_fact = BEGIN {
    Code::Foreign.new(:language<C>, :body("
        int factorial (int n) {
            switch (n) {
                case 0:
                    return 1;
                default:
                    return n * factorial(n - 1);
            }
        }
    "))
};

# the C version is compatible with both behaviors
&factorial.set_equiv(&c_fact); # all variants
&c_fact.set_equiv(&factorial<int $n> | &factorial<0>); # or some variants

The runtime needs to choose &factorial<int $n> or &c_fact for one
variant, and &factorial<0> or &c_fact for the other variant. If it
chooses the same implementation for both, then MMD is taken out of
the picture entirely.

Note that this choice might be delayed to runtime if the MMD
choices cannot be statically determined, and might be re-made if
unsafe static analysis was done (all variants could be determined,
but new ones may be introduced at runtime). This is highly dependent
on the capabilities of the runtime.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me dodges cabbages like macalypse log N: neeyah!

Yuval Kogman

Sep 12, 2005, 6:30:39 AM
to perl6-l...@perl.org, perl6-c...@perl.org
On Mon, Sep 12, 2005 at 13:15:33 +0300, Yuval Kogman wrote:
> To make things safe, when the prelude is bug-fixed but the runtime
> is not yet updated, the cryptographic hash of the function changes,
> so it is no longer equal to the native one based on the way they are
> paired.

It should be noted that this equivalence should be transitive. For
example, if the prelude's definition of &foo is changed, but this is
for readability or optimization, and did not change behavior at all,
it can simply be marked as equal to the hash of the previous
version.

This way the runtimes' equality tables don't have to be updated
every time the prelude changes - only when its behavior changes.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me sneaks up from another MIME part: neeyah!!!!!

Yuval Kogman

Sep 12, 2005, 7:06:28 AM
to perl6-l...@perl.org, perl6-c...@perl.org
On Mon, Sep 12, 2005 at 13:15:33 +0300, Yuval Kogman wrote:

The circularity issue was not made clear in the email or the
diagram. Here is what I meant:

The prelude operators are mutually recursive at some point, and
completely pure. A pathological example:

multi &infix:<-> (Int $x, Int $y) { $x + 1 - $y - 1 }
multi &infix:<-> (Int $x, 0) { $x }

multi &infix:<+> (Int $x, Int $y) { ($x + 1) + ($y - 1) }
multi &infix:<+> (Int $x, 0) { $x }

multi &infix:<+> (Int $x, 1) { ... } # really a stub, requires native increment
multi &infix:<-> (Int $x, 1) { ... } # really a stub, requires native decrement

(I should note that once the language grows it becomes rather
circular especially when control flow and combinators get
involved...)

This allows runtime authors to provide a minimal root set of
operations to break the circularity of the prelude (and even this
can be done incrementally, I guess), and get a slow but functional
system in very little time.

This root set is dynamic, and can be chosen by the runtime
implementor based on how well it fits into the target arch.
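To make the bootstrapping concrete, here is a hypothetical Python rendering of the example above (not from the thread): addition and subtraction are defined in terms of each other and bottom out in two native root operations, increment and decrement, which is all a runtime would need to supply.

```python
# Hypothetical sketch of breaking prelude circularity with a root set.
# Only native_incr/native_decr are "native"; plus/minus play the role
# of the pure, mutually dependent prelude definitions.
def native_incr(x): return x + 1   # runtime-provided root operation
def native_decr(x): return x - 1   # runtime-provided root operation

def plus(x, y):
    # prelude-style definition: ($x + 1) + ($y - 1), anchored at $y == 0
    return x if y == 0 else plus(native_incr(x), native_decr(y))

def minus(x, y):
    # prelude-style definition: ($x - 1) - ($y - 1), anchored at $y == 0
    return x if y == 0 else minus(native_decr(x), native_decr(y))

assert plus(2, 3) == 5
assert minus(7, 3) == 4
```

Slow, but functional, which is exactly the point: the runtime gets correctness first and can then natively override whatever profiling shows to be hot.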

This opens a door to a variety of levels of target
implementations... The metamodel, for example, could be implemented
in the prelude. When it is not implemented natively it will be slow,
but will work. Serious runtime implementations can override the
parts of the metamodel that profiling has shown to be slow.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me does not drink tibetian laxative tea: neeyah!

Luke Palmer

Sep 12, 2005, 3:27:21 PM
to Yuval Kogman, perl6-l...@perl.org, perl6-c...@perl.org
On 9/12/05, Yuval Kogman <nothi...@woobling.org> wrote:
> Hi,

Hi. These are superficial thoughts, before I've had time to really
think about the Big Picture.

> 2. each block of code has a cryptographic digest, which is the hash
> of its body with the digests of all the functions it calls.
> This digest is stable regardless of optimization semantics, and
> is applied to the PIL structure.

Okay, a little clarification needed here. Is the digest of the code
itself a collection of digests, one for the lexical body, and one for
each function it calls? Or is it a hash that combines all of
those somehow?

How do you deal with recursive/mutually recursive functions? (I'm
pretty sure there's a way, I just can't think of it)

What about "temp &foo = sub {...}" ?

> 5. Functions have a notion of equivalence. This is managed based on
> the digest. For example
>
> my &c_int_mul = BEGIN { Code::Foreign.new(:language<C>, :body("
> int foo (int x, int y) { return x * y }
> ") };
>
> multi &infix:<*> (int $x, int $y --> int) {
> [+] $x xx $y;
> }
>
> my $str = &infix:<*><int, int --> int>.digest; # must specify the variant

You mean:

my $str = &infix:<*>:(int, int --> int).digest;

Also, you said that the digest contains the digests of all called
functions. How do you deal with multis there, which may depend on the
runtime types?

> &c_int_mul.set_equiv($str); # could be in another file
>
> # or, if they are maintained together
>
> &c_int_mul.set_equiv(&infix:<*><int, int --> int>);
>
> This equivalence is with respect to the semantics of the input
> and output. The test suite supposedly can assure that these are
> really equivalent by running the same tests against either
> version.

Okay, good. So you have a way of marking that two functions do the
same thing, without having the program try to figure that out (which
is impossible). That's something that Haskell didn't figure out :-)

I suppose it would be nice to have subtype semantics, in that
"function A does the same thing as function B for all arguments that
are valid for function B (but function A may have additional
functionality for arguments that are not valid for B)". Then you
specify equivalence by specifying subtype equivalence in both
directions (with a shortcut, of course).

> 9. static analysis may be leveraged to compile direct calls to
> native functions when compile time resolution is possible. In
> the example graph, for example, no eval or symbol table
> assignments are made, so there is no way the code will ever
> change. Hence the entire program can be pre-resolved. This
> should be controlled via the 'optimize' pragma.

Rather than having the compiler try to infer for itself, which would
come up negative 99% of the time and just waste compile time.

> Since FFIs are going to be a core feature of perl 6, they can be
> used to bootstrap the whole compilation process. In effect, the
> "primitive" operations are now just FFI calls to the runtime we
> happen to be executing on.

And if you have a circular reference implementation, the implementors
can decide to implement whatever generating subset is easiest and get
a working Perl. I like this.

Hmm, could we pull the idea of a generating subset out to
roles/theories? That is, have a role specify that "a" and "b" are
implemented in terms of each other, so if you provide one, you fulfill
the role contract. In Haskell, if you don't fulfill a class contract
that's mutually recursive, you get infinite loops. It'd be nice to
catch that.

Then I guess we would have "theory PerlCore". Neat.

> To make things modular, the pairing of FFI and pure perl functions
> is orthogonal to their location of definition, thanks to the hashing
> scheme.

So as we're hashing PIL, we'd leave out line number information and whatnot.

> WRT MMD, you can set the entire multimethod equivalent to
> a certain foreign function, and you can also set any variant
> individually. You can even set a single variant to be equivalent to
> a multimethod to make the FFI implementation simpler. The compiler
> simply presents the runtime with all the possible MMD choices, and
> lets the runtime choose between conflicting ones.

Like a nice "wrapping" MMD scheme ought to.

All in all, I like the idea. I hope there are no major technical
hurdles. The hashing scheme scares me a little, because it's easy to
create equivalent code that does not look the same from PIL, but it
seems like you covered that base.

Luke

Yuval Kogman

Sep 12, 2005, 7:08:47 PM
to perl6-l...@perl.org, perl6-c...@perl.org
A proof of concept is available here:

http://svn.openfoundry.org/pugs/docs/notes/circular_prelude_stuff.pl

And logs where I explain the guts to Luke are available here:

http://colabti.de/irclogger/irclogger_log/perl6?date=2005-09-12,Mon&sel=785#l1413

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me supports the ASCII Ribbon Campaign: neeyah!!!

Yuval Kogman

Sep 12, 2005, 6:47:28 PM
to Luke Palmer, perl6-l...@perl.org, perl6-c...@perl.org
On Mon, Sep 12, 2005 at 13:27:21 -0600, Luke Palmer wrote:
> On 9/12/05, Yuval Kogman <nothi...@woobling.org> wrote:
> > Hi,
>
> Hi. These are superficial thoughts, before I've had time to really
> think about the Big Picture.
>
> > 2. each block of code has a cryptographic digest, which is the hash
> > of its body with the digests of all the functions it calls.
> > This digest is stable regardless of optimization semantics, and
> > is applied to the PIL structure.
>
> Okay, a little clarification needed here. Is the digest of the code
> itself a collection of digests, one for the lexical body, and one for
> each function it calls? Or is it a hash that combines all of
> those somehow?

Suppose there is a function x that calls another function y, and
there is an FFI function z that is "the same" as x (it refers to
x's hash). If y is changed, then the semantics of x may change,
hence the hash must change (a call to z is no longer equal to a call
to x), unless the new y says it is equal to the old y by mentioning
its hash.

> How do you deal with recursive/mutually recursive functions? (I'm
> pretty sure there's a way, I just can't think of it)

Right now I don't; I'm cheating.

In the future I'd like to hash the body for all functions, and then
hash the hash of the body and the hashes of all the called bodies to
make a function hash.
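That two-phase idea might look like the following Python sketch (entirely hypothetical; the names and bodies are mine). It handles mutual recursion because all bodies are hashed before any call edges are followed, so the hasher never loops.

```python
# Hypothetical sketch: hash every body first, then build each function's
# hash from its own body hash plus its callees' body hashes. Cycles in
# the call graph are harmless because we never recurse through them.
import hashlib

def h(s):
    return hashlib.sha256(s.encode()).hexdigest()

bodies = {                                   # name -> canonicalized body
    "even?": "$n == 0 or odd?($n - 1)",
    "odd?":  "$n != 0 and even?($n - 1)",
}
calls = {"even?": ["odd?"], "odd?": ["even?"]}   # mutually recursive

body_hash = {name: h(body) for name, body in bodies.items()}
fn_hash = {
    name: h(body_hash[name]
            + "".join(sorted(body_hash[c] for c in calls[name])))
    for name in bodies
}
assert fn_hash["even?"] != fn_hash["odd?"]   # distinct despite the cycle
```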

> What about "temp &foo = sub {...}" ?

This has an entirely different hash, so it doesn't go to the same
thing.

> > my $str = &infix:<*><int, int --> int>.digest; # must specify the variant
>
> You mean:
>
> my $str = &infix:<*>:(int, int --> int).digest;

uh, yeah... I keep forgetting the syntax.


>
> Also, you said that the digest contains the digests of all called
> functions. How do you deal with multis there, which may depend on the
> runtime types?

Replacements are per variant, I guess, and the catch-all (completely
unqualified multi) has a hash built out of all its variants.

> I suppose it would be nice to have subtype semantics, in that
> "function A does the same thing as function B for all arguments that
> are valid for function B (but function A may have additional
> functionality for arguments that are not valid for B)". Then you
> specify equivalence by specifying subtype equivalence in both
> directions (with a shortcut, of course).

I'll leave this part to you ;-)

> > assignments are made, so there is no way the code will ever
> > change. Hence the entire program can be pre-resolved. This
> > should be controlled via the 'optimize' pragma.
>
> Rather than having the compiler try to infer for itself, which would
> come up negative 99% of the time and just waste compile time.

I disagree... Most one liners and small scripts could really benefit
from this, and it's an easy test - just set a flag the moment you
see either the eval opcode or the ::= opcode.

> And if you have a circular reference implementation, the implementors
> can decide to implement whatever generating subset is easiest and get
> a working Perl. I like this.

Yes, this is why I brought it up =)


> Hmm, could we pull the idea of a generating subset out to
> roles/theories? That is, have a role specify that "a" and "b" are
> implemented in terms of each other, so if you provide one, you fulfill
> the role contract. In Haskell, if you don't fulfill a class contract
> that's mutually recursive, you get infinite loops. It'd be nice to
> catch that.

it's too late for me to understand that... sorry...

> So as we're hashing PIL, we'd leave out line number information and whatnot.

Yeah, this is also possibly the canonical representation. For
example, these two should hash to the same thing:

sub foo ($x) {
    $x ? 1 : 2;
}

sub foo ($x) {
    if ($x) {
        return 1;
    } else {
        return 2;
    }
}

> All in all, I like the idea. I hope there are no major technical
> hurdles. The hashing scheme scares me a little, because it's easy to
> create equivalent code that does not look the same from PIL, but it
> seems like you covered that base.

I don't expect a reference implementation to be refactored too much,
but even if it is, you just say that this function is the same as
the function with the hash value x, where x is the hash it had
before the refactoring.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: /me tips over a cow: neeyah!!!!!!!!!!!!!!!!!!!!!!

Yuval Kogman

Sep 14, 2005, 9:18:57 PM
to perl6-l...@perl.org, perl6-c...@perl.org
On Tue, Sep 13, 2005 at 02:08:47 +0300, Yuval Kogman wrote:
> A proof of concept is available here:

and has been updated here:

http://svn.openfoundry.org/pugs/perl5/Blondie/

There's a bit of documentation, and the code is split up into files
and ever so slightly refactored.

--
() Yuval Kogman <nothi...@woobling.org> 0xEBD27418 perl hacker &

/\ kung foo master: uhm, no, I think I'll sit this one out..: neeyah!
