Re: [perl #17490] Magic is useless unless verifiable.


Jonathan Worthington

Sep 21, 2005, 6:44:17 AM
to bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
"Joshua Hoblitt via RT" <bugs-par...@netlabs.develooper.com> wrote:
>> [jhoblitt - Mon Sep 19 22:28:00 2005]:
>>
>> > [si...@netthink.co.uk - Sun Sep 22 07:13:56 2002]:
>> >
>> > The point of having a verifiable magic number at the start
>> > of a bytecode file is to avoid this sort of thing:
>> >
>> > % ../../parrot -j mops.pasm
>> > PackFile_unpack: Unimplemented wordsize transform.
>> > File has wordsize: 35 (native is 4)
>> > Parrot VM: Can't unpack packfile mops.pasm.
>> >
>> > If you're going to check the magic after the wordsize and bytecode, you
>> > might as well get rid of it altogether.
>> >
The only way we can *really* fix this is by not storing the magic number in
native endian form. At the moment we have to read the byteorder before the
magic number so we can transform it into native form.

Of course, there's nothing to prevent us putting in a "hack" that says "is
this magic number OK in any of the byte orderings we support".
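
A minimal sketch of such a "hack", assuming the four magic bytes are read raw before the byte order is known. The names `classify_magic`, `swap32` and the result codes are hypothetical, not Parrot functions; the constant is the current magic number, 0x013155a1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The current packfile magic number. */
#define PBC_MAGIC UINT32_C(0x013155a1)

/* Hypothetical result codes for this sketch, not Parrot's own. */
enum magic_order { MAGIC_BAD = 0, MAGIC_NATIVE, MAGIC_SWAPPED };

/* Reverse the four bytes of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Classify four raw bytes without trusting any byteorder field. */
static enum magic_order classify_magic(const unsigned char raw[4])
{
    uint32_t word;

    assert(raw != NULL);
    memcpy(&word, raw, sizeof word);   /* read in native order */
    if (word == PBC_MAGIC)
        return MAGIC_NATIVE;
    if (swap32(word) == PBC_MAGIC)
        return MAGIC_SWAPPED;
    return MAGIC_BAD;                  /* not a packfile we understand */
}
```

Anything that comes back as `MAGIC_BAD` can be rejected before the wordsize byte is interpreted at all, which is the failure shown in the original report.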

This is a design decision - Chip (or leo), which road should we go down?
Change the packfile format, or code around the current way we do it?

>> The issue seems to be related to the jit core being in use. I can't
>> recreate it on amd64 (no jit)
I can't see any way it could be something to do with the JIT core, or any
runcore. We haven't even entered one at the point the above error is given.

>> but I can cause a segfault from random input on x86.
>>
>> --
>> $ ./parrot -j docs/running.pod
>> Segmentation fault
>> --
>>
This is a Bad Thing and needs fixing. I'll see what I can find - I don't
even see a segfault or any other error message under Win32, which is at least
as bad.

> Jonathan has volunteered to look into this. Thanks.
>
I'll do what I can.

Jonathan

Roger Browne

Sep 21, 2005, 7:24:10 AM
to Jonathan Worthington, bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
simon:

> >> > If you're going to check the magic after the wordsize and bytecode, you
> >> > might as well get rid of it altogether.
...
Jonathan:
> ...Change the packfile format, or code around the current way

If you do tweak the signature for the packfile format, I suggest you
take a leaf out of the PNG specification and ensure that the signature
will robustly detect common errors such as byte order transpositions,
CRLF-to-newline mappings (e.g. when binary files are FTPd using ASCII
mode), etc.

See section 12.11 of the PNG specification:
http://www.faqs.org/rfcs/rfc2083.html
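
For illustration, a rough sketch of how a reader can use the PNG signature (0x89 "PNG" CR LF Ctrl-Z LF) to diagnose transfer damage rather than just reject the file. The function name and status codes are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The eight-byte PNG signature: 0x89 'P' 'N' 'G' CR LF Ctrl-Z LF. */
static const unsigned char PNG_SIG[8] =
    { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };

/* Hypothetical diagnosis codes for this sketch. */
enum sig_status {
    SIG_OK,
    SIG_TOO_SHORT,
    SIG_HIGH_BIT_STRIPPED,  /* went through a 7-bit channel */
    SIG_TAIL_DAMAGED,       /* typically newline translation */
    SIG_NOT_PNG
};

static enum sig_status check_png_sig(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < sizeof PNG_SIG)
        return SIG_TOO_SHORT;
    if (memcmp(buf, PNG_SIG, sizeof PNG_SIG) == 0)
        return SIG_OK;
    /* 0x89 with the top bit cleared becomes 0x09. */
    if (buf[0] == 0x09 && memcmp(buf + 1, "PNG", 3) == 0)
        return SIG_HIGH_BIT_STRIPPED;
    /* Prefix intact but CR LF / Ctrl-Z / LF altered, e.g. ASCII-mode FTP. */
    if (buf[0] == 0x89 && memcmp(buf + 1, "PNG", 3) == 0)
        return SIG_TAIL_DAMAGED;
    return SIG_NOT_PNG;
}
```

The non-ASCII first byte catches 7-bit transfers, and the CR LF plus lone LF in the tail catch newline rewriting in either direction.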

Regards,
Roger Browne

Chromatic

Sep 21, 2005, 2:22:27 PM
to Jonathan Worthington, bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
On Wed, 2005-09-21 at 11:44 +0100, Jonathan Worthington wrote:

> >> but I can cause a segfault from random input on x86.
> >>
> >> --
> >> $ ./parrot -j docs/running.pod
> >> Segmentation fault

> This is a Bad Thing and needs fixing. I'll see what I can find - I don't
> even see a segfault or any other error message under Win32, which is at least
> as bad.

It segfaults on me in Linux. The problem is that the JIT core always
expects there to be valid op_start and op_end members in
interpreter->code, so when there's no code there, it blindly
dereferences them. I don't have time now to trace what the other
runcores do in that situation, but I put a couple of guards in
src/interpreter.c in init_jit() and caused different errors.

-- c

Joshua Hoblitt

Sep 21, 2005, 5:16:03 PM
to Jonathan Worthington, bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
On Wed, Sep 21, 2005 at 11:44:17AM +0100, Jonathan Worthington wrote:
> "Joshua Hoblitt via RT" <bugs-par...@netlabs.develooper.com> wrote:
> >>[jhoblitt - Mon Sep 19 22:28:00 2005]:
> >>
> >>> [si...@netthink.co.uk - Sun Sep 22 07:13:56 2002]:
> >>>
> >>> If you're going to check the magic after the wordsize and bytecode, you
> >>> might as well get rid of it altogether.
> >>>
> The only way we can *really* fix this is by not storing the magic number in
> native endian form. At the moment we have to read the byteorder before the
> magic number so we can transform it into native form.
>
> Of course, there's nothing to prevent us putting in a "hack" that says "is
> this magic number OK in any of the byte orderings we support".

I was looking at adding pbc support to 'file' this morning and the only
way to handle that would be to test for both byte orderings of the magic
number.

> This is a design decision - Chip (or leo), which road should we go down?
> Change the packfile format, or code around the current way we do it?

I agree. Some possible options are:

a) live with it
b) change the magic number to be two identical bytes so the byte
ordering doesn't matter
c) shrink the magic number to be a single byte

> >>The issue seems to be related to the jit core being in use. I can't
> >>recreate it on amd64 (no jit)
> I can't see any way it could be something to do with the JIT core, or any
> runcore. We haven't even entered one at the point the above error is given.

Fair enough. I should have said it's related to the '-j' flag.

> >Jonathan has volunteered to look into this. Thanks.
> >
> I'll do what I can.

Your willingness to help is much appreciated.

-J

--

Mark A. Biggar

Sep 22, 2005, 10:56:33 AM
to Joshua Hoblitt, Jonathan Worthington, bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
Joshua Hoblitt wrote:

> a) live with it
> b) change the magic number to be two identical bytes so the byte
> ordering doesn't matter
> c) shrink the magic number to be a single byte

d) use a magic number that can also be used as the byte order indicator.

--
ma...@biggar.org
mark.a...@comcast.net

Jonathan Worthington

Sep 22, 2005, 12:00:11 PM
to bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
"Roger Browne" <ro...@eiffel.demon.co.uk> wrote:
> If you do tweak the signature for the packfile format, I suggest you
> take a leaf out of the PNG specification and ensure that the signature
> will robustly detect common errors such as byte order transpositions,
> CRLF-to-newline mappings (e.g. when binary files are FTPd using ASCII
> mode), etc.
>
> See section 12.11 of the PNG specification:
> http://www.faqs.org/rfcs/rfc2083.html
>
Interesting, thanks - they make some good suggestions there. Our current
magic number is "13155a1" - I'm unsure of the rationale behind it, but there
may be a reason. If we're going to change the packfile format, we may as
well make sure we're squeezing whatever use we can out of our magic number.

"Mark A. Biggar" <ma...@biggar.org> wrote:
> Joshua Hoblitt wrote:
>
>> a) live with it
>> b) change the magic number to be two identical bytes so the byte
>> ordering doesn't matter
>> c) shrink the magic number to be a single byte
>

When I talked about doing something endian-independent, I meant something
along the lines of store a sequence of, say, 4 bytes that will have certain
values. Forget reading the 4 bytes as an int at all, read it as a char[4]
and check each element is what it should be. Makes adding support to "file"
easy enough, and is my preferred solution.
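
Something like the following sketch, where the four byte values are placeholders only (nothing has been decided) and `has_pbc_sig` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder signature bytes; the real values are still to be chosen. */
static const unsigned char PBC_SIG[4] = { 0xfe, 'P', 'B', 'C' };

/* Return 1 if buf starts with the signature, 0 otherwise.
 * The bytes are compared in file order, so host endianness never matters. */
static int has_pbc_sig(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= sizeof PBC_SIG
        && memcmp(buf, PBC_SIG, sizeof PBC_SIG) == 0;
}
```

Because no integer is ever assembled from the bytes, the same check gives the same answer on every platform.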

> d) use a magic number that can also be used as the byte order indicator.
>

Clever, though not sure it helps with writing something to independently
identify a Parrot packfile, if it can be one of a number of things (though I
guess in this case, one of only two things - unless there's some insane
ordering scheme I've not heard of).

Before rushing into fixing this, it's worth pondering why the designer of
the packfile format might have chosen to have the magic number in native
endian format. All I came up with was that it was a good way of making sure
we really had transformed the input to the correct byte ordering. If we
didn't find out at the magic, we probably wouldn't until we got to byte 24 -
the directory format.

So, now we have two design decisions:-
1) How to store the magic "number"
2) What the magic "number" should be

Jonathan

Matt Fowles

Sep 22, 2005, 12:07:48 PM
to Jonathan Worthington, perl6-i...@perl.org
Jonathan~

I have seen architectures that swap byte ordering for 8 byte things
(like doubles) but not 4 byte things. So that gives 3 options and
requires an 8 byte magic number if you want to do it that way.

Matt
--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-Stan Kelly-Bootle, The Devil's DP Dictionary

Joshua Hoblitt

Sep 23, 2005, 6:26:13 AM
to Jonathan Worthington, bugs-par...@netlabs.develooper.com, perl6-i...@perl.org
On Thu, Sep 22, 2005 at 05:00:11PM +0100, Jonathan Worthington wrote:
> Interesting, thanks - they make some good suggestions there. Our current
> magic number is "13155a1" - I'm unsure of the rationale behind it, but
> there may be a reason. If we're going to change the packfile format, we
> may as well make sure we're squeezing whatever use we can out of our magic
> number.

You raise a good question; how was the magic number chosen?

> "Mark A. Biggar" <ma...@biggar.org> wrote:
> >Joshua Hoblitt wrote:
> >
> >>a) live with it
> >>b) change the magic number to be two identical bytes so the byte
> >> ordering doesn't matter
> >>c) shrink the magic number to be a single byte

I left out another good option ... 4 identical bytes. ;)

> When I talked about doing something endian-independent, I meant something
> along the lines of store a sequence of, say, 4 bytes that will have certain
> values. Forget reading the 4 bytes as an int at all, read it as a char[4]
> and check each element is what it should be. Makes adding support to
> "file" easy enough, and is my preferred solution.

That would work if the magic 'number' were written as a 'string', which
it is not. Currently on x86 the magic number as written by parrot is
0x55a1 0x0131.

I've figured out how to make C<file> understand the current scheme,
but it's rather ugly.

--
16 lelong 0x013155a1 Parrot Bytecode (PBC),
>0 byte x wordsize %d bytes,
>1 byte =0 little endian,
>1 byte =1 big endian,
>2 byte x major %d,
>3 byte x minor %d,
>4 byte x sizeof(INTVAL) == %d,
>5 byte =0 FloatType is IEEE 754
>5 byte =1 FloatType is i387 `long double'

16 belong 0x013155a1 Parrot Bytecode (PBC),
>0 byte x wordsize %d bytes,
>1 byte =0 little endian,
>1 byte =1 big endian,
>2 byte x major %d,
>3 byte x minor %d,
>4 byte x sizeof(INTVAL) == %d,
>5 byte =0 FloatType is IEEE 754
>5 byte =1 FloatType is i387 `long double'
--

> So, now we have two design decisions:-
> 1) How to store the magic "number"
> 2) What the magic "number" should be

Good questions.

Cheers,

-J

--

Chip Salzenberg

Sep 25, 2005, 3:23:53 PM
to Matt Fowles, Jonathan Worthington, perl6-i...@perl.org
On Thu, Sep 22, 2005 at 12:07:48PM -0400, Matt Fowles wrote:

>
> > Mark Biggar writes:
> > > d) use a magic number that can also be used as the byte order indicator.
>
> I have seen architectures that swap byte ordering for 8 byte things
> (like doubles) but not 4 byte things. So that gives 3 options and
> requires an 8 byte magic number if you want to do it that way.

"Ordering" is at least three potentially independent variables: byte
order in words, word order in dwords, and dword order in quads.
Writing a quad magic number in native order thus produces eight
possible eight-byte strings in 'file' databases. Seems like we're
not playing to the strengths of the system that way.
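
To make the count concrete, here is a small sketch that generates the storage layout of a quad under each combination of the three swaps. The flag names and `layout_quad` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One flag per independent ordering choice (hypothetical names). */
enum {
    SWAP_BYTES  = 1,  /* byte order within 16-bit words  */
    SWAP_WORDS  = 2,  /* word order within 32-bit dwords */
    SWAP_DWORDS = 4   /* dword order within the quad     */
};

/* Write the 8-byte layout of `value` for one of the eight combinations.
 * flags == 0 is plain big-endian; flags == 7 is plain little-endian. */
static void layout_quad(uint64_t value, int flags, unsigned char out[8])
{
    int i;

    assert(out != NULL);
    assert(flags >= 0 && flags < 8);
    for (i = 0; i < 8; i++) {
        /* Each flag flips one bit of the byte position. */
        int src = i ^ flags;
        out[i] = (unsigned char)(value >> (8 * (7 - src)));
    }
}
```

Calling it with flags 0 through 7 on a value whose bytes are all different yields eight different strings, each of which would need its own 'file' entry.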

Worse, a quad integer can't express other variations in machine
ordering that may arise, e.g. if dword order in quad integers differs
from dword order in doubles.

I think the right answer is to use a magic string rather than a
magic number.
--
Chip Salzenberg <ch...@pobox.com>

Joshua Hoblitt

Sep 25, 2005, 4:04:16 PM
to Chip Salzenberg via RT, jhob...@cpan.org
On Sun, Sep 25, 2005 at 12:24:52PM -0700, Chip Salzenberg via RT wrote:
> I think the right answer is to use a magic string rather than a
> magic number.

Leo and I have been discussing this on #parrot and we've come to the same
conclusion. Attached is a possible patch for parrotbyte.pod that
implements a number of changes to the header region. It:

* Expands the header to be 32 bytes in size.
* The magic number is no longer an opcode outside the header. It is
now an 8 byte magic string at the beginning of the header.
* Bytes 20 through 31 are now padding so the core.op fingerprint can
be expanded in the future.

Remaining issues are:

* Do we need to keep the Opcode Type? It's not clear to me what it's used
for.

+----------+----------+----------+----------+
| Opcode Type (Perl = 0x5045524c) |
+----------+----------+----------+----------+

* Does it make sense to use a fixed-size header? The offset of the first
segment could be calculated by multiplying an "offset byte" and the
wordsize. That would allow more than enough room for growth (at least
1KB) and ensure that the first segment is always 32-bit aligned. Leo
and I disagree on this but I think it makes sense. Additional metadata
could be added to the header without breaking backwards compatibility.
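
The arithmetic behind the "offset byte" idea, as a sketch only; `first_segment_offset` is a hypothetical helper and the restriction to 4- and 8-byte wordsizes is an assumption of this example:

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the first segment, in bytes from the start of the file.
 * offset_byte counts words, so the result is always word-aligned. */
static size_t first_segment_offset(unsigned char offset_byte,
                                   unsigned char wordsize)
{
    assert(wordsize == 4 || wordsize == 8);  /* assumed wordsizes */
    return (size_t)offset_byte * wordsize;
}
```

With a 4-byte wordsize the largest value, 255, gives 1020 bytes, which is just under the 1KB mentioned; with an 8-byte wordsize it gives 2040. An offset byte of 8 with 4-byte words reproduces the fixed 32-byte header.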

-J

--

parrotbyte-header_magic.patch

Chip Salzenberg

Sep 26, 2005, 12:43:15 AM
to Joshua Hoblitt, Chip Salzenberg via RT, jhob...@cpan.org
On Sun, Sep 25, 2005 at 10:04:16AM -1000, Joshua Hoblitt wrote:
> * Expands the header to be 32 bytes in size.

OK

> * The magic number is no longer an opcode outside the header. It is
> now an 8 byte magic string at the beginning of the header.

I should think four would do, but no matter.

> * Bytes 20 through 31 are now padding so the core.op fingerprint can
> be expanded in the future.

Marvy. Important note: All those bytes *must* be zeros in the current
implementation. See below.
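
A reader could enforce that along these lines. This is a sketch under assumptions: the magic bytes shown are placeholders, the return codes are invented, and only the parts of the layout stated in the proposal (32-byte header, 8-byte magic string first, bytes 20 through 31 padding) are checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { HEADER_SIZE = 32, MAGIC_SIZE = 8, PAD_START = 20 };

/* Placeholder magic string; the actual bytes are defined by the patch. */
static const unsigned char PBC_MAGIC_STR[MAGIC_SIZE] =
    { 0xfe, 'P', 'B', 'C', '\r', '\n', 0x1a, '\n' };

/* Returns 0 if the header is acceptable, -1 if it is too short or the
 * magic string is wrong, -2 if any padding byte is non-zero. */
static int check_header(const unsigned char *hdr, size_t len)
{
    size_t i;

    assert(hdr != NULL);
    if (len < HEADER_SIZE)
        return -1;
    if (memcmp(hdr, PBC_MAGIC_STR, MAGIC_SIZE) != 0)
        return -1;
    for (i = PAD_START; i < HEADER_SIZE; i++) {
        if (hdr[i] != 0)
            return -2;  /* reserved bytes in use: not a file we understand */
    }
    return 0;
}
```

Rejecting non-zero padding now is what lets a later format revision give those bytes a meaning without old readers silently misreading new files.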

> * Do we need to keep the Opcode Type? It's not clear to me what it's used
> for.
>
> +----------+----------+----------+----------+
> | Opcode Type (Perl = 0x5045524c) |
> +----------+----------+----------+----------+

I don't think it's useful. A pbc file is Parrot byte code; if Parrot
learns to translate .NET, Python, or JVM files, it'll read them in
their native formats.

> * Does it make sense to use a fixed-size header? The offset of the first
> segment could be calculated by multiplying an "offset byte" and the
> wordsize.

We don't have to decide that. A fixed size header now does not
foreclose the possibility that byte #31 will be that "how many more
words should be considered part of the header" feature you suggest.
--
Chip Salzenberg <ch...@pobox.com>

Joshua Hoblitt

Sep 26, 2005, 9:29:52 PM
to Chip Salzenberg, Chip Salzenberg via RT, jhob...@cpan.org
On Sun, Sep 25, 2005 at 09:43:15PM -0700, Chip Salzenberg wrote:
> On Sun, Sep 25, 2005 at 10:04:16AM -1000, Joshua Hoblitt wrote:
> > * The magic number is no longer an opcode outside the header. It is
> > now an 8 byte magic string at the beginning of the header.
>
> I should think four would do, but no matter.

It's so 'large' because of an idea 'borrowed' from the PNG spec. One or
more of the bytes 0 & 4-7 are likely to be damaged by common transport
encoding errors. I've changed my proposal to explicitly note this.

> > * Bytes 20 through 31 are now padding so the core.op fingerprint can
> > be expanded in the future.
>
> Marvy. Important note: All those bytes *must* be zeros in the current
> implementation. See below.

That was already in my proposal but I've changed the wording to include
I<MUST>.

> > * Do we need to keep the Opcode Type? It's not clear to me what it's used
> > for.
> >
> > +----------+----------+----------+----------+
> > | Opcode Type (Perl = 0x5045524c) |
> > +----------+----------+----------+----------+
>
> I don't think it's useful. A pbc file is Parrot byte code; if Parrot
> learns to translate .NET, Python, or JVM files, it'll read them in
> their native formats.

Sounds reasonable. It's been dumped.

>
> > * Does it make sense to use a fixed-size header? The offset of the first
> > segment could be calculated by multiplying an "offset byte" and the
> > wordsize.
>
> We don't have to decide that. A fixed size header now does not
> foreclose the possibility that byte #31 will be that "how many more
> words should be considered part of the header" feature you suggest.

Fair enough.

An updated patch is attached.

-J

--

parrotbyte-header_magic.patch

Jonathan Worthington

Sep 27, 2005, 7:13:06 AM
to Joshua Hoblitt, Chip Salzenberg via RT
"Joshua Hoblitt" <jhob...@ifa.hawaii.edu> wrote:
> An updated patch is attached.
>
Looks good. Provided there are no further issues brought up with it, I'll put
it on my "to implement" list and do it when I'm doing the changes relating
to the PASM/PIR debug segment (bytecode format changes are a pain, so it's
best to munge them together). Then I'll apply the doc patch at the same
time as the implementation changes so they're kept in sync.

Jonathan

Chip Salzenberg

Sep 27, 2005, 4:49:52 PM
to Joshua Hoblitt, Chip Salzenberg via RT, jhob...@cpan.org
On Mon, Sep 26, 2005 at 03:29:52PM -1000, Joshua Hoblitt wrote:
> An updated patch is attached.

All OK now with me, thanks.
--
Chip Salzenberg <ch...@pobox.com>

Joshua Hoblitt

Sep 27, 2005, 4:16:29 PM
to Jonathan Worthington, Chip Salzenberg via RT
Jonathan,

Chip gave an official OK via irc.

<^conner> chip, Jonathan said that he'd try to do it as part of his changes and commit the doc patch when he's done
<chip> ^conner: Oh, that's a good plan

-J

--

Joshua Hoblitt

Sep 28, 2005, 6:19:42 AM
to Chip Salzenberg, Chip Salzenberg via RT, jhob...@cpan.org
On Tue, Sep 27, 2005 at 01:49:52PM -0700, Chip Salzenberg wrote:
> On Mon, Sep 26, 2005 at 03:29:52PM -1000, Joshua Hoblitt wrote:
> > An updated patch is attached.
>
> All OK now with me, thanks.

The ASCII art of the 'padding' was wrong. A corrected patch is
attached.

-J

--

parrotbyte-header_magic.patch