Bytecode PDD

Jonathan Worthington

Sep 28, 2006, 7:39:12 PM

to parrot-...@perl.org

Hi,

I've checked in the proposed bytecode PDD and also most of the changes
that I discussed with Allison earlier today. Feedback on it would be
greatly appreciated.

One of the areas that it would be good to have some input on, even if it's
just a "yes, that's sane", is versioning. The current implementation
versions the packfile based upon the current version of Parrot and the
opcode fingerprint. However, between versions of Parrot it is feasible
that there will be no packfile format changes, and we'd like an easy way
for a particular Parrot version to assess whether it can read and run a
certain packfile (as well as being able to write packfiles that are
readable by previous Parrots). These things matter once Parrot is
deployed in production use - new versions must be able to read older
bytecode files.

Therefore, a bytecode file version number has been introduced, with a
major and a minor part. This is independent of the Parrot version
number, and replaces the opcode fingerprint. See the proposed PDD for
details.
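
A minimal sketch of how a reader might use a major/minor bytecode version to decide whether it can load a packfile. The rule used here (accept anything not newer than the reader, down to some oldest supported major version) is an assumption for illustration; the actual rules are whatever the PDD specifies. `BytecodeVersion`, `can_read` and `oldest_major` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BytecodeVersion:
    """Bytecode format version, independent of the Parrot release number."""
    major: int
    minor: int


def can_read(reader, packfile, oldest_major):
    """Assumed rule: a reader accepts packfiles that are not newer than
    itself and not older than the oldest major version it still supports."""
    if packfile.major < oldest_major:
        return False
    # order=True compares (major, minor) field by field, like a tuple.
    return packfile <= reader


reader = BytecodeVersion(major=2, minor=3)

print(can_read(reader, BytecodeVersion(2, 1), oldest_major=1))  # older minor
print(can_read(reader, BytecodeVersion(1, 9), oldest_major=1))  # older major
print(can_read(reader, BytecodeVersion(2, 4), oldest_major=1))  # newer minor
print(can_read(reader, BytecodeVersion(0, 5), oldest_major=1))  # too old
```

Under these assumptions the first two checks succeed and the last two fail, which matches the stated goal that newer Parrots keep reading older bytecode files.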

A couple of open questions on this are:

1) Is keeping the Parrot version number around sensible and if so, is
having it as the version of Parrot that wrote the packfile useful? I
guess it's helpful if we need workarounds for bugs in previous versions
of Parrot in later versions to know this. Other thoughts?

2) How should we handle changes to the core Parrot library (mostly PMCs,
but also consider anything we promise is available)? Should this bump
the packfile version number too? Or do we want some other mechanism to
handle this?

Again, comments and/or suggestions on anything else in the proposal are
very welcome! :-)

Thanks,

Jonathan

Leopold Toetsch

Sep 29, 2006, 5:52:23 AM

to perl6-i...@perl.org

On Friday, 29 September 2006 01:39, Jonathan Worthington wrote:
> Hi,
>
> I've checked in the proposed bytecode PDD and also most of the changes
> that I discussed with Allison earlier today. Feedback on it would be
> greatly appreciated.

Great work, thanks.

> A couple of open questions on this are:
>
> 1) Is keeping the Parrot version number around sensible and if so, is
> having it as the version of Parrot that wrote the packfile useful? I
> guess it's helpful if we need workarounds for bugs in previous versions
> of Parrot in later versions to know this. Other thoughts?

I think it's useful.

> 2) How should we handle changes to the core Parrot library (mostly PMCs,
> but also consider anything we promise is available)? Should this bump
> the packfile version number too? Or do we want some other mechanism to
> handle this?

This is still a can of worms. Not so much changes to PMC type numberings per
se (which should invalidate PBCs) but the dynamic nature of these resources.

I'll try to dump my thoughts.

A PBC refers - via its contents - to several possibly dynamically extendable
resources. A probably incomplete list is:

1) PMCs [*1]
2) charsets
3) encodings
4) HLLs
5) opcodes

(see also src/pmc/parrotinterpreter.pmc:547 ff) [*2]

Whenever such items are referred to by a numeric index and that index is part
of the PBC, we have a possible problem.

Let's look at opcodes. These are present in the PBC as an index (the opcode
number). Say we have a packfile with some dynamic opcodes inside:

opcodes
[ 10, 20, 30, 1300, 1301, 0 ]

Let's say opcodes #1300 and #1301 are from some dynamic opcode lib. Now this
PF gets loaded into an interpreter which already has dynamic opcode
librar{y,ies} loaded. In the best case, it was the same opcode library and
the opcode numbers just happen to match. But that's pure luck.

The same argument holds for all the other resources above.

BTW encodings seem to be missing in the pdd - and we can't do:
"Character set, copied from the string structure."
because this is a pointer. We need an index into the available
charsets/encodings.

So what I think we have to do is:

- store a metatable of such resources, this is basically for:
  2-4) a list of names / library PMCs, which describes how to load
       the resource (or NULL, if this resource is a core resource)
  1,5) same + range of indices

- when a PBC is then loaded, we'd have to merge this information with the
already in-memory structures of the interpreter. We can at least detect if
there's a collision. Still better, of course, would be to relocate the index
and use this mapping during unpacking. Unfortunately we can't do the
relocation of opcodes for mmap-ed bytecode in memory.
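
The two steps above (detect collisions, then relocate indices while unpacking) can be sketched as follows. This is an illustration under stated assumptions, not Parrot's implementation: `ResourceEntry`, `find_collisions`, `build_relocation`, the library names and the `first_dynamic=1300` boundary are all hypothetical, and the opcode list is treated as a flat list of opcode numbers, whereas a real bytecode stream also contains operands.

```python
from dataclasses import dataclass


@dataclass
class ResourceEntry:
    """One metatable row: a dynamic library and the index range it uses
    inside the packfile (hypothetical layout)."""
    name: str
    first: int
    count: int


def find_collisions(entries, loaded):
    """Return (packfile_lib, interpreter_lib) pairs whose ranges overlap.
    `loaded` maps library name -> (first_index, count) in the interpreter."""
    hits = []
    for e in entries:
        for name, (first, count) in loaded.items():
            overlap = e.first < first + count and first < e.first + e.count
            same_place = name == e.name and first == e.first
            if overlap and not same_place:
                hits.append((e.name, name))
    return hits


def build_relocation(entries, loaded, first_dynamic=1300):
    """Map each packfile index to the index the interpreter will use."""
    loaded = dict(loaded)  # work on a copy; do not mutate the caller's table
    next_free = max([first_dynamic] + [f + c for f, c in loaded.values()])
    mapping = {}
    for e in entries:
        if e.name not in loaded:          # library not loaded yet: append it
            loaded[e.name] = (next_free, e.count)
            next_free += e.count
        base = loaded[e.name][0]
        for offset in range(e.count):
            mapping[e.first + offset] = base + offset
    return mapping


def relocate(opcodes, mapping):
    """Rewrite opcode numbers in a writable copy (not possible in place
    for mmap-ed bytecode)."""
    return [mapping.get(op, op) for op in opcodes]


interpreter = {"other_ops": (1300, 2)}            # already loaded
packfile_table = [ResourceEntry("my_ops", 1300, 2)]
opcodes = [10, 20, 30, 1300, 1301, 0]

print(find_collisions(packfile_table, interpreter))
mapping = build_relocation(packfile_table, interpreter)
print(relocate(opcodes, mapping))
```

With these inputs the collision between `my_ops` and `other_ops` is reported, and the relocated stream uses 1302 and 1303 for the packfile's dynamic opcodes while the core opcodes are untouched.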

[*1] theoretically PMCs shouldn't be a problem, as these are usually looked up
dynamically, but it depends of course on the usage of dynamic oplibs :-(

.loadlib "mypmc"
...
new P0, .MyPMC # new_p_ic .MyPMC is referred to by index
new P0, 'MyPMC' # referenced by name

For the index case, we'd again have the described problem.
(The .Type syntax is always fine for core PMCs, which don't change for the
validity range of the packfile).

[*2] This resides currently in the interpreter PMC, but should be moved into
the future PackFile PMC.

> Again, comments and/or suggestions on anything else in the proposal are
> very welcome! :-)

I've some thoughts re PF PMCs too - later.

> Thanks,
>
> Jonathan

leo

Bernhard Schmalhofer

Oct 5, 2006, 5:04:52 PM

to Jonathan Worthington, parrot-...@perl.org

Jonathan Worthington wrote:

> Hi,
>
> I've checked in the proposed bytecode PDD and also most of the changes
> that I discussed with Allison earlier today. Feedback on it would be
> greatly appreciated.

One thing that I noticed is the naming of the new field UUID.

Shouldn't this field be renamed to something like 'checksum' ? The term
'UUID' already has a specific meaning, http://en.wikipedia.org/wiki/UUID.

CU, Bernhard

Leopold Toetsch

Oct 5, 2006, 5:30:35 PM

to perl6-i...@perl.org

On Thursday, 5 October 2006 23:04, Bernhard Schmalhofer wrote:
> Shouldn't this field be renamed to something like 'checksum' ? The term
> 'UUID' already has a specific meaning, http://en.wikipedia.org/wiki/UUID.

Indeed. But we probably want to have a UUID to identify
loaded .pasm/.pir/.pbc to avoid loading duplicates.

As a side note: distinct PBC segments for checksum and/or UUID are probably
simpler to handle.

> CU, Bernhard

leo

Jonathan Worthington

Oct 6, 2006, 4:45:11 AM

to Bernhard Schmalhofer, Jonathan Worthington, parrot-...@perl.org

Bernhard Schmalhofer wrote:
> One thing that I noticed is the naming of the new field UUID.
>
> | | | The UUID is computed by applying the hash function specified in |
> | | | the UUID type field over the entire packfile not                |
> | | | including this header and the trailing zero padding             |
>
>
> Shouldn't this field be renamed to something like 'checksum' ? The
> term 'UUID' already has a specific meaning,
> http://en.wikipedia.org/wiki/UUID.

Yup, and I meant for that field to be a UUID. In the article you linked
to it does mention the use of hash functions to create a UUID, but yes,
it ain't the only way. So a clarification that using a hash function
isn't the only way to create a UUID would be good; I'll get it in
there. (When I manage to nab some time to do stuff...working quite a
few hours at $day_job, on-site and abroad at the moment...)

Thanks,

Jonathan

Jonathan Worthington

Oct 6, 2006, 4:52:04 AM

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch wrote:
> Indeed. But we probably want to have a UUID to identify
> loaded .pasm/.pir/.pbc to avoid loading duplicates.
>

The UUID as proposed was intended for that; I just hashed up the
definition in the PDD. Er, no pun intended.

> As a side note: distinct PBC segments for checksum and/or UUID are probably simpler to handle.
>

How so? At least if it's in the header you can just read it in and know
if you've already loaded the file without having to unpack a directory
segment and so on.
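
A rough sketch of the point being made here: with the UUID at a fixed place in the header, a loader can reject a duplicate after reading only the header bytes. The header layout (`UUID_OFFSET`, `UUID_SIZE`) and the `PackfileLoader` class are hypothetical; the real offsets and sizes are whatever the PDD defines.

```python
import io

# Hypothetical header layout, for illustration only.
UUID_OFFSET = 8
UUID_SIZE = 16
HEADER_SIZE = UUID_OFFSET + UUID_SIZE


class PackfileLoader:
    """Remembers the UUIDs of packfiles it has already loaded."""

    def __init__(self):
        self.seen_uuids = set()

    def load(self, stream):
        """Return True if the packfile was loaded, False if it is a duplicate."""
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            raise ValueError("truncated packfile header")
        uuid = header[UUID_OFFSET:HEADER_SIZE]
        if uuid in self.seen_uuids:
            return False          # duplicate: nothing past the header is read
        self.seen_uuids.add(uuid)
        stream.read()             # stand-in for unpacking directory and segments
        return True


fake_packfile = bytes(UUID_OFFSET) + b"A" * UUID_SIZE + b"segments..."
loader = PackfileLoader()
first = io.BytesIO(fake_packfile)
second = io.BytesIO(fake_packfile)

print(loader.load(first))    # first load unpacks the whole file
print(loader.load(second))   # second load stops after the header
print(second.tell() == HEADER_SIZE)
```

If the UUID lived in its own segment instead, the loader would first have to unpack the directory to find that segment before it could make the same decision.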

Thanks,

Jonathan

Jonathan Worthington

Oct 23, 2006, 12:31:30 PM

to Leopold Toetsch, perl6-i...@perl.org

Hi,

Sorry for the delay in getting to this - been working on-site with $JOB for
a while. Comments and questions below, but please see r15001.

Leopold Toetsch wrote:
>> 2) How should we handle changes to the core Parrot library (mostly PMCs,
>> but also consider anything we promise is available)? Should this bump
>> the packfile version number too? Or do we want some other mechanism to
>> handle this?
>>
>
> This is still a can of worms. Not so much changes to PMC type numberings per se (which should invalidate PBCs)

Yup, after further mulling I think changes to these and
non-backward-compatible interface changes to the built-in PMCs should
cause an entry in PBC_COMPAT and invalidate said resources. Now in the spec.

> but the dynamic nature of these resources.
>
> I'll try to dump my thoughts.
>
> A PBC refers - via its contents - to several possibly dynamically extendable resources. A probably incomplete list is:
>
> 1) PMCs [*1]
> 2) charsets
> 3) encodings
> 4) HLLs
> 5) opcodes
>
> (see also src/pmc/parrotinterpreter.pmc:547 ff) [*2]
>
> Whenever such items are referred to by a numeric index and that index is part of the PBC, we have a possible problem.
>
> Let's look at opcodes. These are present in the PBC as index (the opcode number). We got a packfile with some dynamic opcode inside:
>
> opcodes
> [ 10, 20, 30, 1300, 1301, 0 ]
>
> Let's say, opcode #1300 and #1301 are from some dynamic opcode lib. Now this PF gets loaded into an interpreter, which already has dynamic opcode librar{y,ies} loaded. In the best case, it was the same opcode library and the opcode numbers just happen to match. But that's pure luck.
>
> The same argument holds for all other above resources.
>

I have added a dependencies segment that can be used to list all of the
dynamically loaded resources that a bytecode file uses. These can then
be located and loaded and any collisions detected (and once implemented,
resolved) at load-time.

> BTW encodings seem to be missing in the pdd - and we can't do:
> "Character set, copied from the string structure."
> because this is a pointer. We need an index into the available
> charsets/encodings.
>

Fixed this bit, thanks.

> So what I think, we have to do, is:
>
> - store a metatable of such resources, this is basically for:
> 2-4) a list of names / library PMCs, which describes how to load
> the resource
> (or NULL, if this resource is a core resource)
> 1,5) same + range of indices
>

Will a dynamic character set or encoding library that we load not
possibly contain more than one character set or encoding and therefore
need a range of indices too? I have gone with this for now.

Please can you also expand a little on what a HLL resource is? I thought
this was just a dynamic PMC library but where some of those PMCs get
used in place of some built-ins, such as Integer using Perl6Integer
instead or something like this?

> - when now a PBC is loaded, we'd have to merge this information with already in-memory structures of the interpreter. We can at least detect, if there's a collision.

We're not doing this at the moment?!

> Still better would of course be to relocate the index and use this
> mapping during unpacking. Unfortunately we can't do the relocation of opcodes for mmap-ed bytecode in memory.
>

Sure; we'll probably be able to teach pbc_merge to resolve such
collisions though, so people can merge stuff together and have them
resolved once rather than having to make an unmapped copy each runtime.
Maybe we can find some scheme to make collisions less likely too (we've
got 32 bits to play with, after all).

> [*1] theoretically PMCs shouldn't be a problem, as these are usually looked up dynamically, but it depends of course on the usage of dynamic oplibs :-(
>
> .loadlib "mypmc"
> ...
> new P0, .MyPMC # new_p_ic .MyPMC is referred to by index
> new P0, 'MyPMC' # referenced by name
>
> For the index case, we'd again have the described problem.
> (The .Type syntax is always fine for core PMCs, which don't change for the validity range of the packfile).
>

Yup - unless we only allow .Type for built-ins of course.

Thanks,

Jonathan

Leopold Toetsch

Oct 23, 2006, 3:39:13 PM

to perl6-i...@perl.org

On Monday, 23 October 2006 18:31, Jonathan Worthington wrote:
> > 1,5) same + range of indices
> >
>
> Will a dynamic character set or encoding library that we load not
> possibly contain more than one character set or encoding and therefore
> need a range of indices too? I have gone with this for now.

Indeed. We should just use the generalization, i.e. a range of indices for
all resources.

> Please can you also expand a little on what a HLL resource is? I thought
> this was just a dynamic PMC library but where some of those PMCs get
> used in place of some built-ins, such as Integer using Perl6Integer
> instead or something like this?

It's a HLL name, the shared lib, and an array of type mappings. See also
src/hll.c:

interpreter->HLL_info

@HLL_info = [
    [ hll_name, hll_lib, { core_type => HLL_type, ... }, namespace ],
    ...
]

The namespace is added at runtime.
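
As an illustration of the structure described above, here is a small model of one HLL entry and a type-mapping lookup. It is only a sketch: `HLLEntry`, `map_type` and the library name `perl6_group` are hypothetical, type names stand in for what would be numeric type indices, and the Integer/Perl6Integer pairing is taken from the example earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HLLEntry:
    """One row of the HLL info table: name, shared lib, type mappings."""
    name: str
    lib: Optional[str]                      # None would mean no library to load
    type_map: dict = field(default_factory=dict)
    namespace: Optional[object] = None      # filled in at runtime


def map_type(hll_info, hll_name, core_type):
    """Return the HLL's replacement for a core type, or the core type
    itself when the HLL defines no mapping for it."""
    for entry in hll_info:
        if entry.name == hll_name:
            return entry.type_map.get(core_type, core_type)
    return core_type


hll_info = [
    HLLEntry("perl6", "perl6_group", {"Integer": "Perl6Integer"}),
]

print(map_type(hll_info, "perl6", "Integer"))   # mapped by the HLL
print(map_type(hll_info, "perl6", "Float"))     # no mapping: core type
print(map_type(hll_info, "tcl", "Integer"))     # unknown HLL: core type
```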

leo