
[RFC] Debug Segment, HLL Debug Segment And Source Segment


Jonathan Worthington

Sep 20, 2005, 6:52:38 PM
to perl6-i...@perl.org
Hi,

The current format of the debug segment in Parrot packfiles (.pbc files), as
documented in doc/parrotbyte.pod, only allows for a single source file to be
named. This has been insufficient ever since we gained .include
directives; it also means that there's nothing sensible that pbc_merge can
do with the debug segments it finds in input files.

WHAT WE HAVE NOW
Currently, we store two things:
1) The filename of a single source file, as an additional field in the
header
2) The line number in the source file for each bytecode instruction, as the
segment's opcode stream

WHAT SOURCE?
The debug segment as we currently have it relates to PIR and PASM source
files, not to high level language source files. Currently PIR parses a
directive that looks like this:
#line 'filename'
This is for compilers to supply the line numbers and file names of HLL
source files. Currently, nothing is done with these directives after they
are parsed, but the data they provide should go into a separate HLL debug
segment.

As the needs of the PASM/PIR debug segments and the HLL debug segments would
seem to be the same, this proposal will detail a single format that should
work for both of them. If it is determined that the HLL debug segment needs
something more sophisticated, this proposal still stands for the PASM/PIR
debug segment.

SOURCE SEGMENTS
This is currently mentioned in parrotbyte.pod; the idea would seem to be
that this segment can contain source code. I suspect the intention of it
was to store the source code of high level languages rather than PASM or
PIR. I think the doc is correct in stating that this segment is currently
unused. However, in the future it likely will be, so it makes sense to
consider its future existence now while re-designing the debug segment(s).

FORMAT PROPOSAL
The aims of the new format, intended for both the PASM/PIR debug segment and
the HLL debug segment, are:
1) Supporting multiple input files
2) Allowing for a reference into the source segment in place of a filename.
3) Still being space-efficient on disk

The opcode stream will contain one line number per bytecode instruction. No
information as to what file that line is in will be stored in this stream.
(This is pretty much the same as what we have now).

The header (after the standard stuff that every header has) will start with
a count of the source-file-to-bytecode-position mappings that the header
contains.

0 (relative)
+----------+----------+----------+----------+
|   number of source => bytecode mappings   |
+----------+----------+----------+----------+

A source to bytecode position mapping simply states that the bytecode from
the specified offset up to the offset in the next mapping (or, if there is
none, up to the end of the bytecode) has its source in location X.
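
A minimal sketch of that lookup rule, in Python purely for illustration. The
in-memory list of (offset, source) pairs, the file names and the function
name source_for_offset are assumptions made for this example; they are not
part of the proposal or of Parrot.

```python
from bisect import bisect_right

# Hypothetical in-memory form of the mappings: (bytecode_offset, source)
# pairs sorted by offset. None stands for "no source available".
mappings = [
    (0, "main.pir"),
    (40, "util.pir"),   # e.g. code pulled in from another file
    (100, None),
    (120, "main.pir"),
]

def source_for_offset(mappings, offset):
    """Return the source covering the given bytecode offset.

    A mapping covers everything from its own offset up to (but not
    including) the offset of the next mapping, or to the end of the
    bytecode if it is the last one.
    """
    offsets = [start for start, _ in mappings]
    index = bisect_right(offsets, offset) - 1
    if index < 0:
        return None  # offset lies before the first mapping
    return mappings[index][1]

print(source_for_offset(mappings, 57))
```

Because the offsets are kept in ascending order, a binary search is enough to
find the mapping that covers any instruction.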

A mapping always starts with the offset in the bytecode, followed by the
type of the mapping.

0 (relative)
+----------+----------+----------+----------+
|              bytecode offset              |
+----------+----------+----------+----------+

4
+----------+----------+----------+----------+
|               mapping type                |
+----------+----------+----------+----------+

There are 3 mapping types.

Type 0 means there is no source available for the bytecode starting at the
given offset. No further data is stored with this type of mapping; the next
mapping continues immediately after it.

Type 1 means the source is available in a file. A NUL-terminated string
containing the filename follows.

Type 2 means the source is available in a source segment. Another integer
follows, which will specify which source file in the source segment to use.

Note that the ordering of the offsets into the bytecode must be sequential;
a mapping for offset 100 cannot follow a mapping for offset 200, for
example.
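
To make the layout concrete, here is a rough sketch that writes and reads
such a header body in Python. Several details are assumptions not fixed by
the proposal: integers are taken to be 4 bytes, little-endian (a real
packfile may use another word size and byte order), the filename is written
without any padding after its terminator, and the names pack_mappings and
unpack_mappings are invented for the example.

```python
import struct

# Mapping types as described above.
NO_SOURCE, SOURCE_FILE, SOURCE_SEGMENT = 0, 1, 2

def pack_mappings(mappings):
    """Serialise a list of (offset, type, payload) tuples.

    payload is None for type 0, a filename for type 1 and an integer
    index into the source segment for type 2.
    """
    out = bytearray(struct.pack("<I", len(mappings)))
    previous = 0
    for offset, kind, payload in mappings:
        if offset < previous:
            raise ValueError("bytecode offsets must be in ascending order")
        previous = offset
        if kind not in (NO_SOURCE, SOURCE_FILE, SOURCE_SEGMENT):
            raise ValueError(f"unknown mapping type {kind}")
        out += struct.pack("<II", offset, kind)
        if kind == SOURCE_FILE:
            out += payload.encode("utf-8") + b"\0"
        elif kind == SOURCE_SEGMENT:
            out += struct.pack("<I", payload)
    return bytes(out)

def unpack_mappings(data):
    """Inverse of pack_mappings."""
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    mappings = []
    for _ in range(count):
        offset, kind = struct.unpack_from("<II", data, pos)
        pos += 8
        payload = None
        if kind == SOURCE_FILE:
            end = data.index(b"\0", pos)      # find the string terminator
            payload = data[pos:end].decode("utf-8")
            pos = end + 1
        elif kind == SOURCE_SEGMENT:
            (payload,) = struct.unpack_from("<I", data, pos)
            pos += 4
        elif kind != NO_SOURCE:
            raise ValueError(f"unknown mapping type {kind}")
        mappings.append((offset, kind, payload))
    return mappings

example = [
    (0, SOURCE_FILE, "main.pir"),
    (40, SOURCE_FILE, "util.pir"),
    (100, NO_SOURCE, None),
    (120, SOURCE_SEGMENT, 3),
]
assert unpack_mappings(pack_mappings(example)) == example
```

The ordering check in pack_mappings mirrors the rule that a mapping for a
lower offset cannot follow one for a higher offset.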

COMPATIBILITY
This change is incompatible with the current debug segment format. But
that's OK, we're still in development.

Comments on this would be very welcome, even if it's as simple as "looks OK
to me" or "looks terrible to me". :-)

Thanks,

Jonathan

Roger Browne

Sep 21, 2005, 7:09:12 AM
to perl6-i...@perl.org
Jonathan Worthington wrote:

> FORMAT PROPOSAL...

Great! Anything that brings parrot closer to being able to report the
HLL filename and line numbers is a good thing!

> SOURCE SEGMENTS
> ... the idea would seem to be

> that this segment can contain source code. I suspect the intention of it
> was to store the source code of high level languages rather than PASM or
> PIR.

I don't think Parrot should care about what languages are in the source
segments. If someone is writing directly in PASM or PIR, that can go in
a source segment. If someone is writing in a high-level language, that
can go in a source segment. If someone is writing data from which HLL
code is generated by some utility (e.g. yacc, a UML tool, or a GUI
designer), that data can go in a source segment too.

Any kind of source code for which there exists some kind of debugging
tool is a candidate to go into a source segment. This implies that there
could be more than one source segment per .pbc file, and more than one
source location for each opcode. It also implies that (eventually)
parrot will have a way of knowing how to call all the candidate
debuggers for a particular bytecode location (according to which source
language the programmer wants to debug in).

[Incidentally, source segments may also meet the needs of those who wish
to distribute source with every application, without burdening those who
just want to run the compiled code.]

...


> 2) Allowing for a reference into the source segment in place of a filename.

Some development tools are still going to want the filename, even if
there is a corresponding source segment in the .pbc file. I think it
should be possible to include both.

> COMPATIBILITY
> This change is incompatible with the current debug segment format. But
> that's OK, we're still in development.

Sure, but if we're going to change it, let's change it to something
general that won't need to be changed again after version 1.0 is
released.

This is something that Dan Sugalski mooted in his "WCB: Full bytecode
metadata" blog entry:
http://www.sidhe.org/~dan/blog/archives/000419.html

I like the idea that each HLL can store whatever kind of metadata it
wants. In particular, I'd like to have my Amber compiler put column
numbers as well as line numbers into the .pbc file, and perhaps even
information about which optimizations it has applied.

> 3) Still being space-efficient on disk

Source segments should probably be compressed. There's a lot of
repetition and whitespace in most source languages, so they tend to
compress really well. Any reference into the source would be an offset
into the uncompressed source (which would only need to be uncompressed
during debugging runs).
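
As a rough illustration of that idea, the sketch below compresses a made-up
source text and resolves a reference given as an offset into the
uncompressed text. zlib is used only because it ships with Python; the
thread does not settle on an algorithm, and the helper name text_at is
hypothetical.

```python
import zlib

# Made-up, repetitive source text standing in for a real source file.
source = b'    print "hello"\n' * 200

# What would be stored in the source segment under this assumption.
compressed = zlib.compress(source)

def text_at(compressed_source, start, length):
    """Resolve a reference expressed as an offset into the uncompressed text.

    Decompression happens only here, i.e. only when a debugger actually
    asks for the source.
    """
    return zlib.decompress(compressed_source)[start:start + length]

print(len(source), len(compressed))
print(text_at(compressed, 4, 5))
```

Normal runs never call text_at, so they pay nothing for the compression.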

> The opcode stream will contain one line number per
> bytecode instruction.

You are proposing to use a chain of mappings to record the filename; why
not use the same system for recording all kinds of metadata including
line numbers? Sure, there's a small performance penalty - only during
debugging runs - but there's a worthwhile space saving on disk (because
typical HLLs produce a lot of bytecodes per line of source).
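
A small sketch of the trade-off, using invented line tables. It collapses a
one-line-number-per-instruction table into (instruction index, line) change
points and counts integers as a rough stand-in for space on disk; the
function name to_runs and both tables are made up for the example.

```python
def to_runs(line_table):
    """Collapse a per-instruction line table into change points.

    Returns (instruction_index, line) pairs, one for each place where
    the line number differs from that of the previous instruction.
    """
    runs = []
    for index, line in enumerate(line_table):
        if not runs or runs[-1][1] != line:
            runs.append((index, line))
    return runs

# Many instructions per source line, as an HLL compiler might emit.
hll_like = [1] * 6 + [2] * 9 + [3] * 5
# Roughly one instruction per source line, as in hand-written PASM/PIR.
pir_like = list(range(1, 21))

for name, table in (("HLL-like", hll_like), ("PIR-like", pir_like)):
    runs = to_runs(table)
    print(name, "per-instruction ints:", len(table),
          "mapping ints:", 2 * len(runs))
```

With these tables the HLL-like case shrinks from 20 integers to 6, while the
PIR-like case grows from 20 to 40, so the saving depends on how many
instructions each source line produces.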

Regards,
Roger Browne


Jonathan Worthington

Sep 27, 2005, 7:00:49 AM
to Roger Browne, perl6-i...@perl.org
Rumour has it this thread got warnocked... ;-) My original task from leo
was to sort out the PASM and PIR debug segment to handle multiple files. I
thought I might try and sort out the HLL debug seg while I was on the job.
From Roger's input and further discussion on IRC, it seems that we need
something more clever for the HLL debug seg than the PASM/PIR one. So, I'll
back off trying to deal with HLL debug for now (provided my supply of time
goes on, I'll try and come back to that in the not too distant future) and
implement something much like I spec'd for PASM and PIR, which only needs a
simple debug segment with file and line number.

"Roger Browne" <ro...@eiffel.demon.co.uk> wrote:
>> FORMAT PROPOSAL...
>
> Great! Anything that brings parrot closer to being able to report the
> HLL filename and line numbers is a good thing!
>

Seems there will be a slightly longer wait on this one now, but this is very
much needed, I agree.

>> SOURCE SEGMENTS
>> ... the idea would seem to be
>> that this segment can contain source code. I suspect the intention of it
>> was to store the source code of high level languages rather than PASM or
>> PIR.
>
> I don't think Parrot should care about what languages are in the source
> segments. If someone is writing directly in PASM or PIR, that can go in
> a source segment. If someone is writing in a high-level language, that
> can go in a source segment. If someone is writing data from which HLL
> code is generated by some utility (e.g. yacc, a UML tool, or a GUI
> designer), that data can go in a source segment too.
>
> Any kind of source code for which there exists some kind of debugging
> tool is a candidate to go into a source segment. This implies that there
> could be more than one source segment per .pbc file, and more than one
> source location for each opcode. It also implies that (eventually)
> parrot will have a way of knowing how to call all the candidate
> debuggers for a particular bytecode location (according to which source
> language the programmer wants to debug in).
>
> [Incidentally, source segments may also meet the needs of those who wish
> to distribute source with every application, without burdening those who
> just want to run the compiled code.]
>

Pretty much agree with this.

> ...
>> 2) Allowing for a reference into the source segment in place of a
>> filename.
>
> Some development tools are still going to want the filename, even if
> there is a corresponding source segment in the .pbc file. I think it
> should be possible to include both.
>

I was thinking of putting the filename in the source segment, so you could
iterate over the source segments and get the filenames of the source files.
So the filenames would be there.

>> COMPATIBILITY
>> This change is incompatible with the current debug segment format. But
>> that's OK, we're still in development.
>
> Sure, but if we're going to change it, let's change it to something
> general that won't need to be changed again after version 1.0 is
> released.
>

This is the argument that makes me think we hold off the HLL debug seg for a
little while, until somebody (maybe myself) can come up with a design that
meets the needs of HLLs better.

> This is something that Dan Sugalski mooted in his "WCB: Full bytecode
> metadata" blog entry:
> http://www.sidhe.org/~dan/blog/archives/000419.html
>
> I like the idea that each HLL can store whatever kind of metadata it
> wants. In particular, I'd like to have my Amber compiler put column
> numbers as well as line numbers into the .pbc file, and perhaps even
> information about which optimizations it has applied.
>

Yeah, though we also have to consider how Parrot will know what metadata to
show when an error occurs. I guess we need something per language that gets
called along with a reference to the appropriate chunk of meta-data for the
current location and knows how to render an error message for that language.
Then just have a default way to dump the data when this is not supplied.
Also need some thought with regard to how we can efficiently store such
metadata in a packfile.
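
One possible shape for this, sketched in Python with entirely hypothetical
names (register_renderer, describe_location, the "amber" entry and the
metadata keys are all invented for illustration; nothing like this exists in
Parrot as far as this thread shows):

```python
# Registry of per-language renderers, keyed by language name.
renderers = {}

def register_renderer(language, func):
    """Let a language supply its own way of rendering location metadata."""
    renderers[language] = func

def default_renderer(metadata):
    """Fallback: dump whatever metadata is there as key=value pairs."""
    return ", ".join(f"{key}={value}" for key, value in sorted(metadata.items()))

def describe_location(language, metadata):
    """Render the metadata chunk for the current location."""
    render = renderers.get(language, default_renderer)
    return render(metadata)

# A language that stores column numbers can show them in its messages.
register_renderer(
    "amber",
    lambda m: f"{m['file']}:{m['line']}:{m['column']}",
)

print(describe_location("amber", {"file": "a.am", "line": 3, "column": 7}))
print(describe_location("other", {"line": 3, "file": "x.src"}))
```

The runtime only needs to know which language a chunk of metadata belongs
to; everything else is left to the registered renderer or to the default
dump.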

>> 3) Still being space-efficient on disk
>
> Source segments should probably be compressed. There's a lot of
> repetition and whitespace in most source languages, so they tend to
> compress really well. Any reference into the source would be an offset
> into the uncompressed source (which would only need to be uncompressed
> during debugging runs).
>

Hadn't thought of this... may be a good idea, provided we can find a
compression algorithm that is cheap to implement and free of legal issues.
I'll admit now to not knowing a great deal about this kinda stuff.

>> The opcode stream will contain one line number per
>> bytecode instruction.
>
> You are proposing to use a chain of mappings to record the filename; why
> not use the same system for recording all kinds of metadata including
> line numbers? Sure, there's a small performance penalty - only during
> debugging runs - but there's a worthwhile space saving on disk (because
> typical HLLs produce a lot of bytecodes per line of source).
>

HLLs do, but for PASM/PIR that isn't the case. Thus another reason to do
something different for each.

Thanks,

Jonathan

Roger Browne

Sep 27, 2005, 8:23:37 AM
to perl6-i...@perl.org
Jonathan Worthington wrote:
> ... My original task from leo
> was to sort out the PASM and PIR debug segment to handle multiple files.
> ... it seems that we need
> something more clever for the HLL debug seg than the PASM/PIR one. So, I'll
> back off trying to deal with HLL debug for now

Fair enough! A small task undertaken is always better than a larger task
warnocked.

By the way (just to complicate things further) PIR files can include
other PIR files using the ".include" directive, but the included
filename and line numbers are not currently being processed.

Regards,
Roger Browne

Jonathan Worthington

Sep 27, 2005, 8:27:11 AM
to Roger Browne, perl6-i...@perl.org
"Roger Browne" <ro...@eiffel.demon.co.uk> wrote:
> Jonathan Worthington wrote:
>> ... My original task from leo
>> was to sort out the PASM and PIR debug segment to handle multiple files.
>> ... it seems that we need
>> something more clever for the HLL debug seg than the PASM/PIR one. So,
>> I'll
>> back off trying to deal with HLL debug for now
>
> Fair enough! A small task undertaken is always better than a larger task
> warnocked.
>
Yup, and as I've still much to learn about the Parrot codebase, a lighter
learning curve for me.

> By the way (just to complicate things further) PIR files can include
> other PIR files using the ".include" directive, but the included
> filename and line numbers are not currently being processed.
>

This along with the recent addition of pbc_merge are the main motivating
factors for improving the PASM/PIR debug segment.

Jonathan
