Of course, there's nothing to prevent us putting in a "hack" that says "is
this magic number OK in any of the byte orderings we support".
This is a design decision - Chip (or leo), which road should we go down?
Change the packfile format, or code around the current way we do it?
>> The issue seems to be related to the jit core being in use. I can't
>> recreate it on amd64 (no jit)
I can't see any way it could be something to do with the JIT core, or any
runcore. We haven't even entered one at the point the above error is given.
>> but I can cause a segfault from random input on x86.
>>
>> --
>> $ ./parrot -j docs/running.pod
>> Segmentation fault
>> --
>>
This is a Bad Thing and needs fixing. I'll see what I can find - I don't
even see a segfault or any other error message under Win32, which is at least
as bad.
> Jonathan has volunteered to look into this. Thanks.
>
I'll do what I can.
Jonathan
If you do tweak the signature for the packfile format, I suggest you
take a leaf out of the PNG specification and ensure that the signature
will robustly detect common errors such as byte order transpositions,
CRLF-to-newline mappings (e.g. when binary files are FTPd using ASCII
mode), etc.
See section 12.11 of the PNG specification:
http://www.faqs.org/rfcs/rfc2083.html
Regards,
Roger Browne
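For reference, the PNG signature is the eight bytes 89 50 4E 47 0D 0A 1A 0A. The sketch below is only an illustration in Python; the helper names are made up here and merely simulate the kinds of transfer damage listed above. It shows that a plain byte comparison against such a signature catches each of them:

```python
# The 8-byte PNG file signature: \x89 P N G \r \n \x1a \n
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def has_png_signature(data: bytes) -> bool:
    """Return True if data starts with an undamaged PNG signature."""
    return data[:8] == PNG_SIGNATURE


# Hypothetical helpers that mimic common transfer damage.
def crlf_to_lf(data: bytes) -> bytes:
    # ASCII-mode transfer towards a Unix host
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    # ASCII-mode transfer towards a DOS/Windows host (naive mapping)
    return data.replace(b"\n", b"\r\n")


def strip_high_bit(data: bytes) -> bytes:
    # a 7-bit channel clears the top bit of every byte
    return bytes(b & 0x7F for b in data)


def swap_byte_pairs(data: bytes) -> bytes:
    # byte order transposition of each 16-bit unit
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


sample = PNG_SIGNATURE + b"rest of file"
for damage in (crlf_to_lf, lf_to_crlf, strip_high_bit, swap_byte_pairs):
    print(damage.__name__, has_png_signature(damage(sample)))
```

Each line printed by the loop should report False, because every kind of damage alters at least one of the first eight bytes.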
> >> but I can cause a segfault from random input on x86.
> >>
> >> --
> >> $ ./parrot -j docs/running.pod
> >> Segmentation fault
> This is a Bad Thing and needs fixing. I'll see what I can find - I don't
> even see a segfault or any other error message under Win32, which is at least
> as bad.
It segfaults on me in Linux. The problem is that the JIT core always
expects there to be valid op_start and op_end members in
interpreter->code, so when there's no code there, it blindly
dereferences them. I don't have time now to trace what the other
runcores do in that situation, but I put a couple of guards in
src/interpreter.c in init_jit() and caused different errors.
-- c
I was looking at adding pbc support to 'file' this morning and the only
way to handle that would be to test for both byte orderings of the magic
number.
> This is a design decision - Chip (or leo), which road should we go down?
> Change the packfile format, or code around the current way we do it?
I agree. Some possible options are:
a) live with it
b) change the magic number to be two identical bytes so the byte
ordering doesn't matter
c) shrink the magic number to be a single byte
> >>The issue seems to be related to the jit core being in use. I can't
> >>recreate it on amd64 (no jit)
> I can't see any way it could be something to do with the JIT core, or any
> runcore. We haven't even entered one at the point the above error is given.
Fair enough. I should have said it's related to the '-j' flag.
> >Jonathan has volunteered to look into this. Thanks.
> >
> I'll do what I can.
Your willingness to help is much appreciated.
-J
--
> a) live with it
> b) change the magic number to be two identical bytes so the byte
> ordering doesn't matter
> c) shrink the magic number to be a single byte
d) use a magic number that can also be used as the byte order indicator.
"Mark A. Biggar" <ma...@biggar.org> wrote:
> Joshua Hoblitt wrote:
>
>> a) live with it
>> b) change the magic number to be two identical bytes so the byte
>> ordering doesn't matter
>> c) shrink the magic number to be a single byte
>
When I talked about doing something endian-independent, I meant something
along the lines of store a sequence of, say, 4 bytes that will have certain
values. Forget reading the 4 bytes as an int at all, read it as a char[4]
and check each element is what it should be. Makes adding support to "file"
easy enough, and is my preferred solution.
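A minimal sketch of that byte-by-byte check, in Python rather than C for brevity. The four byte values are hypothetical; nothing in this thread fixes what they would actually be:

```python
# Hypothetical 4-byte magic string; the real values are undecided here.
MAGIC = bytes([0xFE, ord("P"), ord("B"), ord("C")])


def has_magic(header: bytes) -> bool:
    """Check each byte in turn, never reading the bytes as an integer,
    so the byte order of the reading machine cannot matter."""
    if len(header) < len(MAGIC):
        return False
    return all(header[i] == MAGIC[i] for i in range(len(MAGIC)))


# The same four bytes read as an integer depend on the byte order chosen,
# which is exactly the ambiguity the byte-wise check avoids.
as_little = int.from_bytes(MAGIC, "little")
as_big = int.from_bytes(MAGIC, "big")
print(hex(as_little), hex(as_big), has_magic(MAGIC + b"rest of packfile"))
```

A 'file' entry for this scheme would then need only one string test at offset 0, instead of one entry per byte ordering.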
> d) use a magic number that can also be used as the byte order indicator.
>
Clever, though not sure it helps with writing something to independently
identify a Parrot packfile, if it can be one of a number of things (though I
guess in this case, one of only two things - unless there's some insane
ordering scheme I've not heard of).
Before rushing into fixing this, it's worth pondering why the designer of
the packfile format might have chosen to have the magic number in native
endian format. All I came up with was that it was a good way of making sure
we really had transformed the input to the correct byte ordering. If we
didn't find out at the magic, we probably wouldn't until we got to byte 24 -
the directory format.
So, now we have two design decisions:-
1) How to store the magic "number"
2) What the magic "number" should be
Jonathan
I have seen architectures that swap byte ordering for 8 byte things
(like doubles) but not 4 byte things. So that gives 3 options and
requires an 8 byte magic number if you want to do it that way.
Matt
--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-Stan Kelly-Bootle, The Devil's DP Dictionary
You raise a good question; how was the magic number chosen?
> "Mark A. Biggar" <ma...@biggar.org> wrote:
> >Joshua Hoblitt wrote:
> >
> >>a) live with it
> >>b) change the magic number to be two identical bytes so the byte
> >> ordering doesn't matter
> >>c) shrink the magic number to be a single byte
I left out another good option ... 4 identical bytes. ;)
> When I talked about doing something endian-independent, I meant something
> along the lines of store a sequence of, say, 4 bytes that will have certain
> values. Forget reading the 4 bytes as an int at all, read it as a char[4]
> and check each element is what it should be. Makes adding support to
> "file" easy enough, and is my preferred solution.
That would work if the magic 'number' were written as a 'string', which
it is not. Currently on x86 the magic number as written by parrot is
0x55a1 0x0131.
I've figured out how to make C<file> understand the current scheme
but it's rather ugly.
--
16 lelong 0x013155a1 Parrot Bytecode (PBC),
>0 byte x wordsize %d bytes,
>1 byte =0 little endian,
>1 byte =1 big endian,
>2 byte x major %d,
>3 byte x minor %d,
>4 byte x sizeof(INTVAL) == %d,
>5 byte =0 FloatType is IEEE 754
>5 byte =1 FloatType is i387 `long double'
16 belong 0x013155a1 Parrot Bytecode (PBC),
>0 byte x wordsize %d bytes,
>1 byte =0 little endian,
>1 byte =1 big endian,
>2 byte x major %d,
>3 byte x minor %d,
>4 byte x sizeof(INTVAL) == %d,
>5 byte =0 FloatType is IEEE 754
>5 byte =1 FloatType is i387 `long double'
--
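The same tests can be sketched in Python, which makes the "try both byte orders" step explicit. The offsets and the magic value are taken from the entries above; the function name, the returned keys, and the reading of byte 3 as the minor version are assumptions made for this illustration:

```python
import struct

PBC_MAGIC = 0x013155A1   # value tested by the magic entries above
MAGIC_OFFSET = 16        # the magic sits after the 16 header bytes


def describe_pbc(data: bytes):
    """Return a dict of header fields, or None if data does not look like PBC."""
    if len(data) < MAGIC_OFFSET + 4:
        return None

    # The magic is written in native order, so both orders must be tried.
    stored_as = None
    for name, fmt in (("little", "<I"), ("big", ">I")):
        (value,) = struct.unpack_from(fmt, data, MAGIC_OFFSET)
        if value == PBC_MAGIC:
            stored_as = name
            break
    if stored_as is None:
        return None

    # Single-byte fields need no byte swapping at all.
    wordsize, byteorder, major, minor, intval_size, floattype = data[:6]
    return {
        "magic_stored_as": stored_as,
        "wordsize": wordsize,
        "byteorder": {0: "little endian", 1: "big endian"}.get(byteorder, "unknown"),
        "major": major,
        "minor": minor,  # assumption: byte 3 holds the minor version
        "intval_size": intval_size,
        "floattype": {0: "IEEE 754", 1: "i387 long double"}.get(floattype, "unknown"),
    }


# A made-up little-endian header: 6 fields, 10 filler bytes, then the magic.
example = bytes([4, 0, 0, 1, 4, 0]) + bytes(10) + struct.pack("<I", PBC_MAGIC)
print(describe_pbc(example))
```

The version numbers in the example header are invented; only the field positions come from the entries above.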
> So, now we have two design decisions:-
> 1) How to store the magic "number"
> 2) What the magic "number" should be
Good questions.
Cheers,
-J
--
"Ordering" is at least three potentially independent variables: byte
order in words, word order in dwords, and dword order in quads.
Writing a quad magic number in native order thus produces eight
possible eight-byte strings in 'file' databases. Seems like we're
not playing to the strengths of the system that way.
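To make the count of eight concrete, here is a small Python sketch that enumerates the layouts. It assumes exactly the three independent swaps named above and nothing more exotic; the function name and the sample magic bytes are invented for the illustration:

```python
from itertools import product


def native_layouts(magic: bytes) -> set:
    """Return every byte string an 8-byte magic number could be stored as,
    given three independent choices: byte order in 2-byte words, word order
    in 4-byte dwords, and dword order in the 8-byte quad."""
    if len(magic) != 8:
        raise ValueError("expected an 8-byte magic number")
    layouts = set()
    for swap_bytes, swap_words, swap_dwords in product((False, True), repeat=3):
        words = [magic[i:i + 2] for i in range(0, 8, 2)]
        if swap_bytes:
            words = [w[::-1] for w in words]
        dwords = [words[0:2], words[2:4]]
        if swap_words:
            dwords = [d[::-1] for d in dwords]
        if swap_dwords:
            dwords = dwords[::-1]
        layouts.add(b"".join(b"".join(d) for d in dwords))
    return layouts


# Eight distinct bytes make every layout distinguishable.
for layout in sorted(native_layouts(bytes(range(1, 9)))):
    print(layout.hex(" "))
```

With eight distinct bytes the set has eight members, one per entry a 'file' database would need. A magic made of identical bytes collapses to a single layout, but then says nothing about ordering.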
Worse, a quad integer can't express other variations in machine
ordering that may arise, e.g. if dword order in quad integers differs
from dword order in doubles.
I think the right answer is to use a magic string rather than a
magic number.
--
Chip Salzenberg <ch...@pobox.com>
Leo and I have been discussing this on #parrot and we've come to the same
conclusion. Attached is a possible patch for parrotbyte.pod that
implements a number of changes to the header region. It:
* Expands the header to be 32 bytes in size.
* The magic number is no longer an opcode outside the header. It is
now an 8 byte magic string at the beginning of the header.
* Bytes 20 through 31 are now padding so the core.op fingerprint can
be expanded in the future.
Remaining issues are:
* Do we need to keep the Opcode Type? It's not clear to me what it's used
for.
+----------+----------+----------+----------+
| Opcode Type (Perl = 0x5045524c) |
+----------+----------+----------+----------+
* Does it make sense to use a fixed-size header? The offset of the first
segment could be calculated by multiplying an "offset byte" and the
wordsize. That would allow more than enough room for growth (at least
1KB) and ensure that the first segment is always 32-bit aligned. Leo
and I disagree on this but I think it makes sense. Additional metadata
could be added to the header without breaking backwards compatibility.
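The arithmetic behind the "offset byte" idea can be sketched as follows. This is only an illustration in Python: the function name is made up, and restricting wordsize to 4 or 8 is an assumption, not something the proposal states.

```python
def first_segment_offset(offset_byte: int, wordsize: int) -> int:
    """Offset of the first segment, in bytes from the start of the file."""
    if not 0 <= offset_byte <= 0xFF:
        raise ValueError("offset_byte must fit in one byte")
    if wordsize not in (4, 8):  # assumption: only these wordsizes occur
        raise ValueError("unsupported wordsize")
    return offset_byte * wordsize


# A 32-byte header on a machine with 4-byte words needs an offset byte of 8.
print(first_segment_offset(8, 4))    # 32
# Largest header a single offset byte can describe for each wordsize.
print(first_segment_offset(255, 4))  # 1020
print(first_segment_offset(255, 8))  # 2040
```

Because the result is always a multiple of the wordsize, the first segment stays 32-bit aligned under these assumptions.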
-J
--
OK
> * The magic number is no longer an opcode outside the header. It is
> now an 8 byte magic string at the beginning of the header.
I should think four would do, but no matter.
> * Bytes 20 through 31 are now padding so the core.op fingerprint can
> be expanded in the future.
Marvy. Important note: All those bytes *must* be zeros in the current
implementation. See below.
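A reader-side check for that rule might look like the sketch below. It is written in Python for brevity and assumes only what the proposal states: a 32-byte header whose bytes 20 through 31 are padding. The names are invented for the illustration.

```python
HEADER_SIZE = 32          # proposed fixed header size
PADDING = slice(20, 32)   # bytes 20 through 31 are padding


def padding_is_clean(header: bytes) -> bool:
    """Return True only if the header is complete and all padding bytes are zero."""
    if len(header) < HEADER_SIZE:
        return False
    return not any(header[PADDING])


good = bytes(range(1, 21)) + bytes(12)       # 20 arbitrary bytes + zero padding
bad = good[:31] + b"\x01"                    # byte 31 set by some later format
print(padding_is_clean(good), padding_is_clean(bad))  # True False
```

Rejecting nonzero padding now is what keeps those bytes available for a later meaning, since no valid file written today can contain anything but zeros there.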
> * Do we need to keep the Opcode Type? It's not clear to me what it's used
> for.
>
> +----------+----------+----------+----------+
> | Opcode Type (Perl = 0x5045524c) |
> +----------+----------+----------+----------+
I don't think it's useful. A pbc file is Parrot byte code; if Parrot
learns to translate .NET, Python, or JVM files, it'll read them in
their native formats.
> * Does it make sense to use a fixed-size header? The offset of the first
> segment could be calculated by multiplying an "offset byte" and the
> wordsize.
We don't have to decide that. A fixed size header now does not
foreclose the possibility that byte #31 will be that "how many more
words should be considered part of the header" feature you suggest.
--
Chip Salzenberg <ch...@pobox.com>
It's so 'large' because of an idea 'borrowed' from the PNG spec. One or
more of the bytes 0 & 4-7 are likely to be damaged by common transport
encoding errors. I've changed my proposal to explicitly note this.
> > * Bytes 20 through 31 are now padding so the core.op fingerprint can
> > be expanded in the future.
>
> Marvy. Important note: All those bytes *must* be zeros in the current
> implementation. See below.
That was already in my proposal but I've changed the wording to include
I<MUST>.
> > * Do we need to keep the Opcode Type? It's not clear to me what it's used
> > for.
> >
> > +----------+----------+----------+----------+
> > | Opcode Type (Perl = 0x5045524c) |
> > +----------+----------+----------+----------+
>
> I don't think it's useful. A pbc file is Parrot byte code; if Parrot
> learns to translate .NET, Python, or JVM files, it'll read them in
> their native formats.
Sounds reasonable. It's been dumped.
>
> > * Does it make sense to use a fixed-size header? The offset of the first
> > segment could be calculated by multiplying an "offset byte" and the
> > wordsize.
>
> We don't have to decide that. A fixed size header now does not
> foreclose the possibility that byte #31 will be that "how many more
> words should be considered part of the header" feature you suggest.
Fair enough.
An updated patch is attached.
-J
--
Jonathan
All OK now with me, thanks.
--
Chip Salzenberg <ch...@pobox.com>
Chip gave an official OK via irc.
<^conner> chip, Jonathan said that he'd try to do it as part of his changes and commit the doc patch when he's done
<chip> ^conner: Oh, that's a good plan
-J
--
The ASCII art of the 'padding' was wrong. A corrected patch is
attached.
-J
--