Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Dynamic binding (e.g. Perl5 "local"), continuations, and threads

6 views
Skip to first unread message

Bob Rogers

unread,
Dec 23, 2005, 7:42:50 AM12/23/05
to perl6-i...@perl.org
The obvious way to implement Perl5 "local" bindings in Parrot would
be to:

1. Capture the old value in a temporary;

2. Use store_global to set the new value;

3. Execute the rest of the block; and

4. Do another store_global to reinstate the old value.

This has a serious flaw: It leaves the new binding in effect if
executing the rest of the block causes a nonlocal exit. Throwing can be
dealt with by establishing an error handler that reinstates the old
value and rethrows, but this doesn't begin to address continuations, not
to mention coroutines, which can be used to jump to an arbitrary call
frame without unwinding the stack.

Then there are threads to consider. The naive approach makes each
thread's dynamic bindings visible to all other threads, which may or may
not be desirable (not, IMHO, but this is a language design issue).
Worse, the final state of the variable depends on which thread exits
first, which is surely a bug.

Proposal:

The only reasonable approach (it seems to me) is to keep dynamic
binding information in the call frame. One possible implementation is
as follows:

1. Add a dynamic_bindings pointer to the call frame. This points to
a linked list of the frame's current bindings, each entry of which holds
a name/value pair. Each thread's initial frame gets its
dynamic_bindings list initialized to NULL. Each new frame's
dynamic_bindings list is initialized from the calling frame's
dynamic_bindings. Nothing additional need be done for continuation
calling. The dynamic_bindings list needs to be visited during GC.

2. Modify store_global and find_global to search this list for the
desired global. If found, store_global modifies the entry value, and
find_global fetches the entry value. If not found, the existing hash is
consulted in the current fashion (modulo namespace implementation).

3. Add a bind_global op with the same prototype as store_global that
pushes a new entry on the dynamic_bindings list.

4. Add an unbind_global op that takes an integer or integer constant
and pops that many entries off of the dynamic_bindings list. Bindings
are effectively popped when the sub exists, but this op is still needed
for cases where the end of a dynamic binding lifetime comes before the
end of the sub.

Advantages:

+ The scheme is robust with respect to continuations, threads,
coroutines, and nonlocal exits. Each sub creates dynamic bindings that
are visible only within its dynamic scope (i.e. subs that it calls,
directly or indirectly), and not to other threads or coroutines.

+ The overhead is low: a pointer copy on call, none on return, and
zero context switching overhead. For typical programs with little or no
dynamic binding, these are the only costs.

+ The code that a human or compiler needs to emit is even simpler
than that of the naive scheme described above.

Disadvantages:

+ The time to fetch or store a dynamic binding is proportional to the
depth of the dynamic_bindings stack, which could be considerable for
languages that do a lot of dynamic binding. Conceivably, this could be
addressed by using a language-dependant PMC class for the binding entry,
and any such dynamic-binding-intensive language could define a per-frame
class that acted as a linked list of hashes. But this could be
postponed until it was needed, possibly indefinitely.

If this is acceptable (and there isn't already a better plan), I will
have time to address this over the holiday week. TIA,

-- Bob Rogers
http://rgrjr.dyndns.org/

Leopold Toetsch

unread,
Dec 30, 2005, 7:17:39 PM12/30/05
to Bob Rogers, perl6-i...@perl.org

On Dec 30, 2005, at 17:50, Bob Rogers wrote:

> The attached patch is functionally complete, but I still have a few
> loose ends to nail down, so I thought it would be a good time to post
> it
> for review. The issues are as follows:
>
> 1. It needs more higher-level documentation. Is compiler_faq.pod
> the best place for this?
>
> 2. Binding 'foo' to a null PMC should cause find_global to err out
> with "Global "foo" not found", but this is not happening.
>
> 3. There are no tests for threads or stack-unwinding continuations.
> (Yes, I am procrastinating.)

I think I've missed to answer the last one WRT that - sorry. Anyway,
here are some comments:

- the patch looks really great and seems to cover all aspects to get
this running (just a minor nitpick: src/register.c:init_context() is a
common place to copy inherited context items)

- the patch seems to miss the 'other part' of 'local' i.e. restoring
the old contents of a variable, like e.g. described in 'perldoc
perlsub' - (whatever the reason is that it's doced there ;)

- that is, Perl5 'local' is much more related to Perl6 'let' or 'temp',
it's not just hiding a variable, it looks to me much more like undoing
temporary damage, which looks to me like STM (w/o the needed
synchronization bits)

- if the intent is just to temporarly hide a name, then it should be ok
(but not refer to 'local' at all)

I think it needs more thoughts and more (common) infrastructure .

leo

Bob Rogers

unread,
Dec 30, 2005, 11:41:25 PM12/30/05
to Leopold Toetsch, perl6-i...@perl.org
From: Leopold Toetsch <l...@toetsch.at>
Date: Sat, 31 Dec 2005 01:17:39 +0100

On Dec 30, 2005, at 17:50, Bob Rogers wrote:

> The attached patch is functionally complete, but I still have a few
> loose ends to nail down, so I thought it would be a good time to post
> it
> for review. The issues are as follows:
>
> 1. It needs more higher-level documentation. Is compiler_faq.pod
> the best place for this?
>
> 2. Binding 'foo' to a null PMC should cause find_global to err out
> with "Global "foo" not found", but this is not happening.
>
> 3. There are no tests for threads or stack-unwinding continuations.
> (Yes, I am procrastinating.)

I think I've missed to answer the last one WRT that - sorry. Anyway,
here are some comments:

No prob; thanks for your review.

- the patch looks really great and seems to cover all aspects to get
this running (just a minor nitpick: src/register.c:init_context() is a
common place to copy inherited context items)

Yes, thank you; that does seem to be the logical place. I think I had
passed this over because of the init_context call at src/register.c:325
(as patched) which doesn't pass the caller's context. But this
particular call looks like bit rot; it's only enabled if CHUNKED_CTX_MEM
is true. Should I flush it?

- the patch seems to miss the 'other part' of 'local' i.e. restoring

the old contents of a variable . . .

It never actually has to restore anything, because the old contents are
still in the appropriate stash. The new dynamic bindings shadow the old
one(s), and are undone simply by virtue of returning from the sub. This
is why the coroutine test case works the way it does. (FWIW, the
implementation uses what we old Lispers call "deep binding," mixed with
"shallow binding" (checking the stash) if a dynamic binding is not
found.)

. . . like e.g. described in 'perldoc perlsub' - (whatever the reason


is that it's doced there ;)

At a guess, I'd say that's the bucket into which all discussion of
scoping has been tossed.

- that is, Perl5 'local' is much more related to Perl6 'let' or 'temp',
it's not just hiding a variable, it looks to me much more like undoing
temporary damage, which looks to me like STM (w/o the needed
synchronization bits)

Actually, if I understand STM correctly, the dynamic_bindings stack
functions very much like a "memory log", except that it only applies to
find_global and store_global, which makes it cheaper. But you're right;
this is only part of what "local" can do.

- if the intent is just to temporarly hide a name, then it should be ok
(but not refer to 'local' at all)

I think it needs more thoughts and more (common) infrastructure .

leo

Hmmm [he mumbles as he studies "perldoc perlsub", for the first time in
ages . . . ]

It's true, this patch doesn't address dynamic binding of Perl5
typeglobs and arbitrary references (i.e. "local $SIG{INT} = ..."). But
it does implement all of the rest of the functionality of Perl5 "local",
it seems to me. So, if you prefer, one could just call it "dynamic
variable binding" instead of the more general "dynamic binding."

In any case, I agree that it needs more thought. Here's a sampling
off the top of my head:

1. Dynamic binding of typeglobs. I haven't noticed a Parrot concept
of "symbol table entry"; in particular, Matt's namespace proposal
<http://xrl.us/i5n6> doesn't mention one. Consequently, I am assuming
that each Per[56] typed variable is named with its sigil and treated
separately for purposes of *_global operations. Therefore, it should be
both necessary and sufficient for "local *foo = ..." to expand into code
that binds each sigil separately. True? How else could you do it?

2. Dynamic binding of hash and array elements. The "perlsub"
example of this (the paragraph starts with "It's also worth taking a
moment to explain what happens ...") pretty much requires that this be
done via "temporary damage" (aka shallow binding) -- I can't think of
any other way to mutate $ary[5] and then restore @ary as described.
This has the drawbacks cited in the "Dynamic binding" proposal I posted
last Friday, but that's just too bad, as there isn't a choice.
Consequently, I would argue that these require a separate mechanism.

Now, if you buy both of these arguments, then that suggests I should
finish the patch as designed, and design the rest of "local" support
separately. If not, though, I may not have time to finish a more
general design during my window of opportunity, in which case finishing
the patch as designed may still be a good strategy.

But I will go back to the drawing board, and post more later . . .

-- Bob

Bob Rogers

unread,
Jan 2, 2006, 6:55:24 PM1/2/06
to Leopold Toetsch, perl6-i...@perl.org
Table of contents

1. Deep binding is not appropriate.
2. Outline of a shallow-binding solution.
3. Unbinding must work with a general stack-unwinding mechanism.
4. Conclusion (not).

1. Deep binding is not appropriate.

It has always been clear that a "save/modify/restore" mechanism for
at least some of the functions of Perl5 "local"/Perl6 "temp/let" will be
needed. Such a mechanism is likely to be complicated (see below), which
is why I had tried to solve part of the problem independently with a
relatively lightweight solution. (Indeed, I doubt I grok "temp/let"
yet, so it's probably more complicated than I think.) Because of this
complexity, and because shallow binding can subsume the special case of
binding symbol table values, there is little advantage to having a
parallel mechanism solely for binding globals. Indeed, there are enough
things that interact with stack-unwinding as it is. So, please consider
my previous proposal withdrawn. (Sigh.)

That leaves us with shallow binding.

2. Outline of a shallow-binding solution.

"Shallow binding" essentially just means the straightforward
"save/modify/restore" approach. During execution of a particular
stretch of code in a given context, that context's dynamic bindings are
in place in the heap, and any alien being who can reach into Parrot's
memory (gdb, for one) would find the values established by local/temp.
This means that we must ensure that those dynamic bindings are undone
when exiting the context, even temporarily, and redone when re-entering.
There are a number of cases:

A. Sub/method call. This is the simplest; we are creating a new
context that needs to inherit the old one, but otherwise the environment
stays the same.

B. Sub/method return. The current context's dynamic bindings must
be undone, effectively popping them off.

C. Continuation call. This involves leaving one context (without
necessarily exiting it permanently) and entering another. The simplest
case is just B, but in general the two contexts, call them X and Y, do
not necessarily have any particular relationship. We need to (1) find
the common ancestor of X and Y, call it Q; (2) undo the dynamic bindings
between X and Q in reverse order, but without throwing them away; and
(3) redo dynamic bindings between Q and Y in forward order. Call this
operation "rezipping" the dynamic binding stack.

D. Coroutine yield. This turns out to be equivalent to C, except
that the "from" context is not abandoned.

E. Throwing an exception. This is simpler than the general
continuation case, since the throw is always to an ancestor, but with an
interesting wrinkle. A sub may make bindings before pushing a handler,
or afterwards, or both, so the Exception_Handler object must somehow
record the dynamic binding state when it is created. Care must be taken
to avoid allowing the control stack to get out of phase with respect to
the binding stack. One way to do this would be to use the control stack
for dynamic bindings, forcing bindings and handlers to take note of each
other, but this requires further thought.

[I had thought that context switching between threads would be
another such case, but the ithreads model saves us the trouble; I didn't
understand that at the time I made my previous posts. Dynamic binding
of shared variables needs to be handled correctly, especially with
respect to locking, but I expect this can be addressed independently.]

So most of the complexity boils down to continuations and exceptions.
Note that if the common ancestor Q has dynamic bindings in effect when
the calls leading to X and Y are made, then we don't want to undo/redo
all of Q's bindings; really, we want to find the common ancestor of the
dynamic binding stacks, and not that of the contexts. Rezipping can be
done in time proportional to the number of bindings that must be
undone/redone if we keep a depth counter, similar to the recursion_depth
field in struct Parrot_Context, in each dynamic binding record. Let us
call this structure a Parrot_DBR (for dynamic binding record). The
Parrot_DBR therefore needs the following:

typedef struct _parrot_dynamic_binding_record {
Parrot_DBR *prev; /* backpointer to previous binding. */
INTVAL depth; /* stack depth, used for rezipping. */
PMC *location; /* a generalized location PMC, not NULL. */
PMC *value; /* the saved value; NULL means unbound. */
} Parrot_DBR;

Note that we cannot make this a doubly-linked list; coroutines would
make it branch, so there might be multiple forward pointers. However,
it would be handy to have a temp_next slot for rezipping, to make it
easier to go forward to the destination context from the common
ancestor.

The saved value slot contains the old value when the binding is in
effect, and the newly bound value otherwise. Rezipping past a
Parrot_DBR in either direction therefore consists of swapping the heap
value with the saved value.

Most of the magic is hidden in the location PMC, which records a
generalized notion of a place to be modified [0]. There are two basic
operations: Fetch the location contents (which is always returned as a
PMC [1]), and store a new value into the location (also always a PMC).
For a global, the location records the namespace name (or namespace
object?) and the global name; for an arbitrary structure reference, the
location records the structure and the index/key. That's two new PMC
classes so far; Perl5 typeglobs will probably require at least one more
(whenever that's defined). More still will be required for temporizing
object attributes [2], but the additional complexity in these cases gets
swept neatly under the rug provided by the location PMC. OO to the
rescue, again.

Note that not all locations (e.g. arrays) will support a notion of
"unbound" -- it is the location object's job to pitch an exception if
somebody asks to store a NULL there, which will happen immediately on
binding. Othewise, the location must convert back and forth between the
NULL stored in the Parrot_DBR and the appropriate implementation.

3. Unbinding must work with a general stack-unwinding mechanism.

A key requirement for shallow-binding (and the reason I had shied
away from it at first) is that the interpreter must realize when it
needs do additional work when leaving a context in order to restore
state. This requirement is shared by "throw" and by the cleanup implied
for Perl6 'post', 'undo', and the like [3]:

my $f will post { close } will undo { unlink $file }
= open ">$file" or die;

The curlies contain closures that must be invoked under the right
circumstances on block exit. Part of the definition of "right
circumstances" is that any dynamic bindings and error handlers be
restored to the state they were at the time the closures were taken
(i.e. right after $f entered scope), so all such dynamic cleanup daemons
must interact cleanly with each other and with all stack-unwinding
operations [4]. Fortunately for us, the stack unwinders reduce to
'throw' and continuation calling.

Note that, because of rezipping, dynamic binding still requires some
extra mechanism. Since a given dynamic binding might be disestablished
and reestablished an arbitrary number of times before being finally
undone, a pure POST block doesn't cut it.

4. Conclusion (not).

If this seems like the right thing, I will pursue it, and publish a
detailed design in due course. But please don't hold your breath, as I
won't have much time for it; my "Parrot hacking window" is closing, and
work is likely to keep me very busy for the next few months. (Sigh.)

Notes

[0] Interestingly, this is equivalent to the generalized reference PMC
that was discussed in the "morph()ing" thread last April,
<http://xrl.us/fwkx>

[1] Even though the Parrot_DBR interface requires a PMC, the location
object could be set up to fetch/replace I/S/N values internally.

[2] About two-thirds of the way through A06 (search for "temporize
object attributes"), Larry says that this will be done via
closures. In order to support rezipping, such a closure would need
to accept a new value to store and return the old value. Or maybe
the .TEMP method could just return a location PMC?

[3] From A04, about the middle (search for "time passes").

[4] It appears that pushaction, pushmark, and popmark were intended to
provide some of this cleanup mechanism, but they don't seem up to
the task. More on this later.

Larry Wall

unread,
Jan 2, 2006, 11:11:23 PM1/2/06
to perl6-i...@perl.org
On Mon, Jan 02, 2006 at 06:55:24PM -0500, Bob Rogers wrote:
: [2] About two-thirds of the way through A06 (search for "temporize

: object attributes"), Larry says that this will be done via
: closures. In order to support rezipping, such a closure would need
: to accept a new value to store and return the old value. Or maybe
: the .TEMP method could just return a location PMC?

I would prefer the latter for the general case. As with any "rw"
location, you're returning a proxy/location object with an appropriate
lvalue interface. Any user-specified closures end up being methods
of that object. Treating such a closure as the proxy object not
only confuses getting with setting but also confuses both of those
with the original identification of the location. I was not terribly
consistent in carrying this idea though in the original Apocalypses.

By the way, it's getting to be a bit dangerous to quote the Apocalypses
rather than the Synopses, since they've diverged somewhat, and the
Synopses are being kept rather more up-to-date. The "post" block
has turned into a "leave" block, for instance. (Pre/post blocks are
reserved for Design-By-Contract now.)

Anyway, I think we'll need to recognize over the long haul that some
forms of control flow are simply incompatible with certain kinds of
state change, in a Haskellian monadic sense. I think if the first
whack at rezipping can simply detect such situations and refuse to
proceed, we'll at least be up to the Perl 5 level, even if we can't
continue into a temporized area. After all, the entire Perl 5 model
assumes that you only have to undo once. It'd be nice to generalize
from that, but Perl 6 is moving more toward using a thing we're calling
"environmental" lexical variables to represent dynamic context anyway,
which I think is more like what you'd call a deep binding. So from
a Perl perspective, temporization is being de-emphasized somewhat.

On the other hand, as you point out, you do have to be careful about
unwinding exceptions. We'll need to do that as lazily as possible,
since some exceptions will include an optional continuation to
resume if the exception thrower wants to allow the exception to be
"defatalized". In fact, I suspect we'll end up handling all warnings
as exceptions that are defatalized by default by an outermost warning
exception handler, and it'd be a shame if, when we resume after
a warning, the temporizations have all been undone. Even if they
get rezipped at resumption, that's gotta be a big performance hit
just to emit a warning. So maybe there's some way of running CATCH
blocks in the dynamic context of the thrower without rezipping twice.
Have to think about the semantic ramifications of that though...

Larry

Bob Rogers

unread,
Jan 3, 2006, 11:43:50 PM1/3/06
to Larry Wall, perl6-i...@perl.org
From: Larry Wall <la...@wall.org>
Date: Mon, 2 Jan 2006 20:11:23 -0800

On Mon, Jan 02, 2006 at 06:55:24PM -0500, Bob Rogers wrote:
: [2] About two-thirds of the way through A06 (search for "temporize
: object attributes"), Larry says that this will be done via
: closures. In order to support rezipping, such a closure would need
: to accept a new value to store and return the old value. Or maybe
: the .TEMP method could just return a location PMC?

I would prefer the latter for the general case. As with any "rw"
location, you're returning a proxy/location object with an appropriate
lvalue interface. Any user-specified closures end up being methods
of that object. Treating such a closure as the proxy object not
only confuses getting with setting but also confuses both of those
with the original identification of the location. I was not terribly
consistent in carrying this idea though in the original Apocalypses.

Thanks for making that clear. The advantage of supporting closures, of
course, is that they are very general, and allow pretty arbitrary things
to be rezipped (if you want to allow such extension). But I do agree
that it's better to use something that describes exactly what you mean,
and is (at least potentially) introspectable.

By the way, it's getting to be a bit dangerous to quote the Apocalypses
rather than the Synopses, since they've diverged somewhat, and the
Synopses are being kept rather more up-to-date. The "post" block
has turned into a "leave" block, for instance. (Pre/post blocks are
reserved for Design-By-Contract now.)

Sorry; I was hoping to cut down on the amount of reading I need to do.
I still haven't even covered the full set of Apocalypses. I've also
avoided subscribing to p6l for the same reason. But I suppose I need to
reconsider both.

Anyway, I think we'll need to recognize over the long haul that some
forms of control flow are simply incompatible with certain kinds of
state change, in a Haskellian monadic sense. I think if the first
whack at rezipping can simply detect such situations and refuse to
proceed, we'll at least be up to the Perl 5 level, even if we can't
continue into a temporized area. After all, the entire Perl 5 model
assumes that you only have to undo once.

Yes; "life was much simpler back then . . ." Closures can be a pain,
but on the other hand, they are also a lot of fun; personally, I
consider them the natural vehicle for implementing nonlocal transfer of
control. Which is why I care about getting things to rezip correctly.

The good news, though, is that I haven't seen mention of anything
that I wouldn't know how to implement reasonably -- with the exception
of "Hypotheticality and Flight Recorders", which itself seems rather
hypothetical. But even there, a lexical version might be a reasonable
thing to consider:

$x = 0;
temp {
$x = 1234;
print $x; # prints 1234
}
print $x; # prints 0

That pushes much of the work back on the compiler (which needs to run
wires into the flight recorder), but that strikes me as reasonable; you
don't need or want EVERYTHING captured by a flight recorder.

But all that can wait. The point is that I think the proposed model
is robust enough for what you want to do with it (caveat my limited
understanding of same).

It'd be nice to generalize
from that, but Perl 6 is moving more toward using a thing we're calling
"environmental" lexical variables to represent dynamic context anyway,
which I think is more like what you'd call a deep binding. So from
a Perl perspective, temporization is being de-emphasized somewhat.

FWIW, the terms "deep" and "shallow" binding are old Lisp terms (in case
you hadn't already guessed), and as used in the Lisp community, they
refer to dynamic binding exclusively. (For this purpose, deep binding
is clearly less efficient, which is why you don't hear those terms used
much at all any more; shallow binding is taken for granted.)

But in any case, I do not understand this bit, which is probably a
reflection of the Synopsis reading I have to do.

On the other hand, as you point out, you do have to be careful about
unwinding exceptions. We'll need to do that as lazily as possible,
since some exceptions will include an optional continuation to
resume if the exception thrower wants to allow the exception to be
"defatalized".

That allows only one way to continue, though, and it's under exclusive
control of the exception-thrower. Have you encountered the Common Lisp
'restart' concept? (The reference is [5], but there's a fair amount to
it, so I won't try to produce a capsule summary.) The advantage of
restarts is that they allow users (both throwers and catchers) to define
named restart actions that can be invoked programmatically, or
interactively via the debugger. Specifying that a particular action
should happen whenever a given error is encountered then becomes a
simple matter of binding a handler that invokes the right restart, which
could even be done by a third party (i.e. neither the thrower nor the
catcher). But only certain predefined restarts are possible, which
makes the exception handling process a multiway negotiation where no
party has full control. (IIRC, the original paper [6] makes this
distribution of control an explicit goal of the design.)

In fact, I suspect we'll end up handling all warnings as exceptions
that are defatalized by default by an outermost warning exception
handler, and it'd be a shame if, when we resume after a warning, the
temporizations have all been undone. Even if they get rezipped at
resumption, that's gotta be a big performance hit just to emit a
warning.

But that argues that warnings should be handled close to the point that
the warning is emitted, at least in the case where all that is required
is to suppress them. I can't resist quoting the entire WARN definition
from the CMUCL implementation, which also serves as a good example of
using restarts:

(defun warn (datum &rest arguments)
"Warns about a situation by signalling a condition formed by datum and
arguments. While the condition is being signaled, a muffle-warning
restart exists that causes WARN to immediately return nil."
(kernel:infinite-error-protect
(let ((condition (coerce-to-condition datum arguments
'simple-warning 'warn)))
(check-type condition warning "a warning condition")
(restart-case (signal condition)
(muffle-warning ()
:report "Skip warning."
(return-from warn nil)))
(format *error-output* "~&~@<Warning: ~3i~:_~A~:>~%" condition)))
nil)

The SIGNAL call is what gives handlers a chance to run. In the normal
case, nobody handles it, and control drops through the RESTART-CASE to
the FORMAT, where the warning is printed. (SIMPLE-WARNING is not fatal
because it is not a subclass of ERROR, so the default debugger handler
also gives it a pass.) On the other hand, if somebody binds a condition
handler to SIMPLE-WARNING that does "(invoke-restart 'muffle-warning)",
then the restart does a RETURN-FROM, and the output is skipped. Since
the handler is a closure, the code can examine the contents of the
condition object before deciding to muffle it. And all of this happens
in the dynamic context of the WARN function (inside
kernel:infinite-error-protect, which sounds like a really good thing).

So maybe there's some way of running CATCH
blocks in the dynamic context of the thrower without rezipping twice.
Have to think about the semantic ramifications of that though...

Larry

I'm not sure that the cost would be that large; we're only talking about
tens of state-capture PMCs here. (I certainly hope.)

But in any case, rezipping may not be required. At the risk of
trying your patience, I will lay one more bit of Common Lisp
error-handling semantics on you, as I hope it will serve as food for
thought.

The CL error signalling process [7] has two phases: The first phase
decides which handler takes the error, and the second actually does the
job. This is supported by two syntactic abstractions: HANDLER-CASE
works like try/catch, and is typically implemented in terms of
HANDLER-BIND, which provides the lower-level but more general mechanism.
Sweeping much under the rug here, the low-level handlers are closures
that are run in the dynamic environment where the error is signalled; if
they decide to handle an error, they do so via a nonlocal transfer of
control back to the right handler in the HANDLER-CASE environment.

The upshot is that the bound handlers can be made pretty light-weight
and therefore insensitive to the dynamic environment in which they run
(unless, of course, they need to be), without causing the author of the
HANDLER-CASE code to lose control of the environment in which the error
cleanup forms run.

So if you would consent to putting the "when" guard tests of each
"catch" in closures, with the caveat to the user that they could be
evaluated an arbitrary number of times in somebody else's backyard,
Parrot could always find exactly the place to which it's going to unzip
before it starts.

And this leads to the biggest win of the "know before you throw"
approach: You find out about unhandled errors immediately, before
you've destroyed the most interesting part of the stack. That is the
chief drawback of the current Parrot model; I've been ruminating about
it for a while now, but have no concrete proposals yet. I'll bet that
wrestling with dynamic binding will lead me to generate one (or two) --
if you or someone else doesn't beat me to it.

Thanks for speaking up, and thanks especially for listening,

-- Bob

[5] http://www.lispworks.com/documentation/HyperSpec/Body/09_adb.htm

[6] http://www.nhplace.com/kent/Papers/Exceptional-Situations-1990.html

[7] http://www.lispworks.com/documentation/HyperSpec/Body/09_ad.htm

Bob Rogers

unread,
Jan 4, 2006, 9:26:03 PM1/4/06
to perl6-i...@perl.org
This patch fixes two find_exception_handler bugs, and identifies a
third, all related to executing pushaction subs:

1. It is necessary to wait until after popping the action before
calling it, lest the signalling of another error during action execution
causes you to lose, and lose, and lose, . . .

2. Even without an error, it seems that actions could sometimes be
run twice because stack_pop compared entry->cleanup to 0 instead of
STACK_CLEANUP_NULL. (I can't reproduce this now, so it probably
depended on something else as well.)

3. Actions are not run at all if the context is skipped by calling a
continuation. This is a deeper problem, so I've just added a TODO test
in order to document it as a known problem; I hope to fix this in the
course of implementing dynamic binding.

TIA,

action-1.patch

Leopold Toetsch

unread,
Jan 5, 2006, 5:38:04 AM1/5/06
to Bob Rogers, perl6-i...@perl.org

On Jan 5, 2006, at 3:26, Bob Rogers wrote:

> This patch fixes two find_exception_handler bugs, and identifies a
> third, all related to executing pushaction subs:

Applied - r10902

> 1. It is necessary to wait until after popping the action before
> calling it, lest the signalling of another error during action
> execution
> causes you to lose, and lose, and lose, . . .

Good catch.

>
> 2. Even without an error, it seems that actions could sometimes be
> run twice because stack_pop compared entry->cleanup to 0 instead of
> STACK_CLEANUP_NULL. (I can't reproduce this now, so it probably
> depended on something else as well.)

Well, STACK_CLEANUP_NULL is ((Stack_cleanup_method)NULLfunc), which is
the same as 0 on all current platforms (AFAIK), but your patch is of
course correct.

> 3. Actions are not run at all if the context is skipped by calling
> a
> continuation. This is a deeper problem, so I've just added a TODO test
> in order to document it as a known problem; I hope to fix this in the
> course of implementing dynamic binding.

It should work, when implementing the stack unwinding in
Continuation.invoke (which is also called by Exception_handler.invoke).
Now the question arises: what to do on the 2nd time the sub is running.
There are probably 2 options:
a) user code has to reenable exception handlers and cleanup actions,
i.e. above stack unwinding is permanent
b) the control stack remains unchanged, all exception handlers and
cleanup actions are still in place.

> TIA,

I thank you,
leo

Bob Rogers

unread,
Jan 5, 2006, 8:35:29 PM1/5/06
to Leopold Toetsch, perl6-i...@perl.org
From: Leopold Toetsch <l...@toetsch.at>
Date: Thu, 5 Jan 2006 11:38:04 +0100

On Jan 5, 2006, at 3:26, Bob Rogers wrote:

> 2. Even without an error, it seems that actions could sometimes be
> run twice because stack_pop compared entry->cleanup to 0 instead of
> STACK_CLEANUP_NULL. (I can't reproduce this now, so it probably
> depended on something else as well.)

Well, STACK_CLEANUP_NULL is ((Stack_cleanup_method)NULLfunc), which is
the same as 0 on all current platforms (AFAIK), but your patch is of
course correct.

Then this doesn't explain how I fixed the double invocation problem. It
could be that the problem itself was just a figment of my imagination,
an artifact of hastily rewriting the test case (my vote), or something I
had introduced temporarily while reordering find_exception_handler code.
Oh, well.

> 3. Actions are not run at all if the context is skipped by calling
> a
> continuation. This is a deeper problem, so I've just added a TODO test
> in order to document it as a known problem; I hope to fix this in the
> course of implementing dynamic binding.

It should work, when implementing the stack unwinding in
Continuation.invoke (which is also called by Exception_handler.invoke).
Now the question arises: what to do on the 2nd time the sub is running.
There are probably 2 options:
a) user code has to reenable exception handlers and cleanup actions,
i.e. above stack unwinding is permanent
b) the control stack remains unchanged, all exception handlers and
cleanup actions are still in place.

If I had to choose between these two, I'd say that 'b' is easier,
because popping is easier than repushing.

But it would be much nicer if the dynamic state were automagically
restored to the same exact state as when the closure was taken; then, no
adjustments are necessary in PIR code. This can be done if the
continuation stores some notion of the binding_stack state at that time.
I am expecting to argue (when I have time to complete the research) that
dynamic bindings should also be kept on the control stack; if that
happens, then the rezipping process can also do the right thing wrt
restoring exception and action state, almost for free.

That's why I didn't want to try to fix the continuation-unwinding
issue now; I am hoping to get multiple birds with a single stone.

How convenient for all this that Exception_handler is a subclass of
Continuation. It's always good to Do The Right Thing, but much easier
if you can simply inherit it.

-- Bob

Bob Rogers

unread,
Jan 7, 2006, 8:36:30 PM1/7/06
to Leopold Toetsch, perl6-i...@perl.org
This is an attempt to summarize my thinking about the instruction
interface to dynamic binding and its interaction with the other
dynamically-scoped bits of Parrot. I am hoping to get feedback before
diving further into the implementation details.

Please let me know what you think. TIA,

------------------------------------------------------------------------

0. Table of contents.

1. The control stack is really the dynamic stack.
2. There ought to be a popaction instruction.
3. Dynamic binding state also belongs on the control stack.
4. Dynamic binding needs bind_global, bind_location, and unbind_n ops.
5. Implementation should be done in phases.
6. Notes.

1. The control stack is really the dynamic stack.

Currently, the control stack is used to store the following things:

1. Exception handlers. These are manipulated by the push_eh and
clear_eh instructions.

2. Cleanup actions. These are pushed by pushaction, but there is
currently no way to pop these, save indirectly via popmark (or returning
from the sub).

3. Stack marks. These are manipulated by pushmark and popmark, and
allow multiple control stack entries to be removed at once.

4. Local return addresses. These are pushed by the bsr and jsr
instructions, and popped by ret.

Of these operations, the language features that require the first two
are lexically determined. That is, they are determined by HLL
constructs that are lexical features of the program [1], despite having
dynamic scope, and hence the start and end of these features' lifetimes
can be precisely located in the program text. We always know when and
what we have to pop.

Furthermore, and particularly in the case of error handlers and
return addresses, all four are meaningful only while that calling
context is active [2], and so must necessarily be popped when the sub
returns.

Note that bsr/jsr don't quite fit this model in principal. However,
they are meaningful only within a single context, and it isn't possible
to bsr to a place that does a push_eh and returns, so in practice
bsr/jsr must be used as if they were lexically determined.

2. There ought to be a popaction instruction.

As a related issue, it is something of an annoyance that there is no
way to pop an action other than to push/pop a mark. True, having a
popaction instruction is not strictly necessary, but by the same token,
neither is clear_eh.

The pushaction instruction is useful for implementing 'leave' blocks.
The following Perl6 code:

{
...
leave { do_some_cleanup($lexical_reference); }
}

would compile into something like

cleanup = newclosure cleanup_sub
pushmark 42
pushaction cleanup
...
popmark 42

The pushmark/popmark could only be omitted if the block was in tail
position, i.e. it returned immediately. If there were a popaction
instruction, on the other hand, it could look like this:

cleanup = newclosure cleanup_sub
pushaction cleanup
...
popaction 1

This is one instruction smaller, takes up slightly less control stack
space, but most usefully allows us to pass a boolean flag to popaction
that tells whether to suppress automatic execution of the action. With
this, since we still have the closure lying around, we can also decide
to call an action differently for normal exit [3]:

...
popaction 0
cleanup(other, args)

Consequently, I would like to suggest the following changes:

=item B<pushaction>(in PMC)

Push the given Sub PMC $1 onto the control stack. If the control stack
is unwound due to a C<popmark> or normal subroutine return, the
subroutine will be invoked with a single integer argument of 0.
If the control stack is unwound due to an exception, the
subroutine will be invoked with a single integer argument of 1.
An action on the top of the control stack can also be removed
explicitly via the C<popaction> instruction, which takes an argument
that specifies whether or not to invoke the action.

=item B<popaction>(in INT)

Pops the action at the top of the control stack. The boolean
argument $1 tells whether the action sub should be invoked; if true,
the action is invoked with a single integer argument of 0 (to denote
normal return), and if $1 is false, the action is discarded without
being invoked. An exception is raised if the top item on the
control stack is something other than an action.

[The change to the pushaction description is mostly just clarification
of existing semantics.]

Interestingly, if pushaction is implemented, then stack marking
becomes strictly unnecessary, as the control stack could always be
popped explicitly; there is no other Parrot feature that strictly
requires popmark [4]. However, it ought to be more efficient to use a
single popmark at the end of a block to get rid of three or more dynamic
state entries. Since there may be lots and lots of these little things
at the end of any given Perl6 block, it seems worth keeping these ops.

3. Dynamic binding state also belongs on the control stack.

To the list of uses for the control stack, I would like to add
dynamic binding. I think this is a natural place to store dynamic
binding state because (a) dynamic binding is also lexically determined
by HLL syntax (AFAIK), and (b) having a single stack for all of the
dynamically-scoped features that are affected by rezipping greatly
simplifies the implementation.

Note that this does increase the level of constraint on the dynamic
binding stack, i.e. you can't "ret" or "clear_eh" if there are dynamic
bindings in the way. I hope (and expect) that nobody will care.

4. Dynamic binding needs bind_global, bind_location, and unbind_n ops.

I would therefore like to propose the following instruction interface
to dynamic binding. In a nutshell, this adds (a) a bind_location
instruction that takes an explicit location object and a new value and
establishes the binding on the control stack; (b) a bind_global
instruction with two variants that handles the important special case of
global variables by creating a VariableLocation object and binding its
PMC arg to that; and (c) an unbind_n instruction (corresponding to
unbind_globals in the patch posted 30-Dec-05) that explicitly pops a
specified number of either kind of dynamic binding.

=item B<bind_global>(in STR, in PMC)

Bind the PMC $2 as the value of the global symbol $1 in the current
dynamic context. If $2 is a Null PMC, then the global is effectively
made locally unbound. The newly-created dynamic binding will be used
by C<find_global> and C<store_global> in the current dynamic
environment only, i.e. this call and all calls made from it.

The lifetime of a dynamic binding lasts until either (a) it is
popped explicitly by C<unbind_n> or C<popmark>; or (b) control exits
from the context where the binding was made. Note that the second
case includes tail calling; all dynamic bindings in the current
context are undone before the tail-called sub starts execution.

Note that there is no C<bind_global_p_s_p> op (i.e. corresponding to
C<store_global_p_s_p>, where the first "p" is a namespace), as dynamic
binding only makes sense with respect to an execution context.

=item B<bind_global>(in STR, in STR, in PMC)

Bind the PMC $3 as the value of the symbol $2 of namespace $1 in the
current dynamic context. The binding is created whether or not
namespace $1 exists already; if it does not, binding to a symbol in
it does not actually create the namespace.

=item B<bind_location>(in PMC, in PMC)

Given a location PMC in $1 (i.e. something derived from the
C<Location> class), bind it dynamically to the value in $2, with
identical scope and lifetime as for C<bind_global>. If $2 is a Null
PMC, then the global is effectively made locally unbound, if that is
supported by the location. During the dynamic lifetime of the
binding and within the dynamic scope of the binding sub, this
location will appear to have a different value than outside the
dynamic scope (e.g. in coroutines created before the binding),
though that value may change during the lifetime if the location is
modified by other means. See C<Location> and its derived classes
for specifics.

=item B<unbind_n>(in INT)

Pop zero or more dynamic bindings for symbols or locations from the
control stack, with the count specified as $1, restoring their
original values. There must be at least $1 bindings on the top of
the control stack, or an exception is raised before any of the
bindings are popped.

Note that an explicit C<unbind_n> is not always needed, as all of a
sub's dynamic bindings are automatically undone when the sub returns
(see C<bind_global> for details). However, C<unbind_n> is useful
when the dynamic binding lifetime ends before the exit from the sub
(but see also C<popmark>).

This also leaves room for a bind_global op with an "(in PMC, in PMC)"
signature where $1 is a symbol table object, in case Parrot ever defines
such a thing.

5. Implementation should be done in phases.

Here's an outline of subsequent work, which also serves as a summary
of where I need feedback:

1. If the popaction proposal is acceptable, implement that. This is
orthogonal to dynamic binding, so it could be skipped, but logically it
seems to belong as part of the control stack semantic cleanup.

2. If the instruction interface to dynamic binding is acceptable,
finish the detailed design. This a matter of defining what a Location
object is [5], and how rezipping works in detail, but that's not
trivial.

3. Once the final design is accepted, implement rezipping as it
applies to current uses of the control stack, i.e. without dynamic
bindings. This is worthwhile on its own, as it should also fix some
current bugs relating to incorrect rezipping.

4. If all goes well, implement dynamic binding. This could even be
broken further into two stages, one for implementing Location and
GlobalLocation (to handle variable binding), and a separate stage for
StructureLocation (to handle binding of arbitrary arrays or hashes).

6. Notes.

[1] I would be very interested to hear of exceptions [6]. Tcl,
perhaps?

[2] Stack marking shares this constraint only because it operates on
the control stack. One could design a marking mechanism for the
user stack that didn't have this limitation. In fact, there is a
bug in popmark that is equivalent to the one I found yesterday in
clear_eh, because popmark doesn't properly limit itself to its own
context. I was expecting to argue that pushmark/popmark should be
eliminated instead of fixed, but I've changed my mind, though the
fix should still be part of the rezipping implementation.

[3] Admittedly, this is not strictly necessary. In order to be useful,
the action almost has to be a closure, so one can always change the
behavior of the action by tweaking a lexical variable. This also
allows one to disable execution just by returning immediately, but
it seems cleaner to say what you mean directly.

[4] This is not strictly true; you could implement a sort of a "PASM
longjmp" by doing a popmark (to get rid of intervening bsr return
addresses) followed by a goto -- but would you really want to?

[5] I imagine Location objects will also be useful for representing
lvalues internally in Perl6, for cases where they need to be
created in one place and stored into in another, so I should at
least survey the field in order to ensure that the model is
adequate.

[6] Pun intended.

Steve Gunnell

unread,
Jan 8, 2006, 2:02:37 AM1/8/06
to perl6-i...@perl.org
Hi,

I'm sitting here thinking about cross language calls and what I don't
see anywhere is a prohibition that stops a context from popping a
handler or action or whatever that it didn't place there.

Is there an intent to prohibit or restrict this kind of behaviour?

It also seems to me that with coroutines and threads you might have
diverging contexts with different active exceptions and actions. So is a
stack the correct model for this or are we talking a hierarchical chain
of contexts each with a stack. Sort of like the user stack I guess.

Cheers,

Steve Gunnell

Bob Rogers

unread,
Jan 8, 2006, 11:05:26 AM1/8/06
to Steve Gunnell, perl6-i...@perl.org
From: Steve Gunnell <stev...@westnet.com.au>
Date: Sun, 08 Jan 2006 15:02:37 +0800

Hi,

I'm sitting here thinking about cross language calls and what I don't
see anywhere is a prohibition that stops a context from popping a
handler or action or whatever that it didn't place there.

Is there an intent to prohibit or restrict this kind of behaviour?

Yes, indeed. In fact, this is more or less a requirement of the
existing implementation. It is not always strictly enforced (see my
"[PATCH] Restrict clear_eh to handlers in the current context" message
of a few days ago), but the intent is to make it so.

It also seems to me that with coroutines and threads you might have
diverging contexts with different active exceptions and actions.

I am assuming the ithreads model, where there is no sharing of contexts
between threads. I believe that is how threads are currently
implemented in Parrot . . .

So is a stack the correct model for this or are we talking a
hierarchical chain of contexts each with a stack.

Yes, it is not strictly a stack, as contexts can have multiple active
calls. (The term "spaghetti stack" comes to mind.) Part II of the
design will cover this in more detail. But again, this is already a
feature of the current implementation.

Sort of like the user stack I guess.

Cheers,

Steve Gunnell

Actually, I would have assumed that the user stack operated more or less
independently of the call chain, but I see it is kept in the context
structure and not the interpreter. Hmmm . . .

-- Bob

Leopold Toetsch

unread,
Jan 8, 2006, 11:21:25 AM1/8/06
to Bob Rogers, perl6-i...@perl.org
Bob Rogers wrote:

> Actually, I would have assumed that the user stack operated more or less
> independently of the call chain, but I see it is kept in the context
> structure and not the interpreter. Hmmm . . .

I think so too. It's just in the context as it was there, w/o further
recent discussion.

Comments welcome

> -- Bob

leo


Bob Rogers

unread,
Jan 8, 2006, 2:45:12 PM1/8/06
to Leopold Toetsch, perl6-i...@perl.org
From: Leopold Toetsch <l...@toetsch.at>
Date: Sun, 08 Jan 2006 17:21:25 +0100

Bob Rogers wrote:

Comments welcome

leo

Sure enough, it doesn't behave as I would expect across call and return
(see patch). But I'm not sure how I would expect it to behave for
coroutines, so my earlier comment ("Hmmm . . .") still stands. My gut
feeling, though, is that calling a continuation should not affect the
user stack, so for consistency the same should probably be true of
coroutines. In that case, the model becomes "one user stack per
thread," and the user stack can be moved to the interpreter struct.

On the other hand, a "one user stack per coroutine" model doesn't
seem that much harder, and might be more tractable for users/compilers
to deal with. I think I could be persuaded either way.

Since nobody has complained, though, it seems safe to deal with this
after implementing rezipping for the dynamic context. It could be that
rezipping would also have to do something with the user stack, but it
seems unlikely. It seems to me that both the "per coroutine" and "per
thread" models would need to have the user stack in the interpreter in
order to preserve its state across call/return.

-- Bob

user-stack-tests.patch

Steve Gunnell

unread,
Jan 9, 2006, 5:46:42 AM1/9/06
to perl6-i...@perl.org
On Sun, 2006-01-08 at 11:05 -0500, Bob Rogers wrote:
> From: Steve Gunnell <stev...@westnet.com.au>
> Date: Sun, 08 Jan 2006 15:02:37 +0800
>
> Hi,
>
> I'm sitting here thinking about cross language calls and what I
don't
> see anywhere is a prohibition that stops a context from popping a
> handler or action or whatever that it didn't place there.
>
> Is there an intent to prohibit or restrict this kind of behaviour?
>
> Yes, indeed. In fact, this is more or less a requirement of the
> existing implementation. It is not always strictly enforced (see my
> "[PATCH] Restrict clear_eh to handlers in the current context" message
> of a few days ago), but the intent is to make it so.
>
> It also seems to me that with coroutines and threads you might have
> diverging contexts with different active exceptions and actions.
>
> I am assuming the ithreads model, where there is no sharing of
contexts
> between threads. I believe that is how threads are currently
> implemented in Parrot . . .

Well I haven't looked at the code but I was thinking along the lines
that Threads and Coroutines inherit their current context from
somewhere ... So which bits can they safely inherit and which bits must
be masked?

Also how does all this interact with the lightweight context that Perl
Blocks need?

> So is a stack the correct model for this or are we talking a
> hierarchical chain of contexts each with a stack.
>
> Yes, it is not strictly a stack, as contexts can have multiple active
> calls. (The term "spaghetti stack" comes to mind.) Part II of the
> design will cover this in more detail. But again, this is already a
> feature of the current implementation.
>
> Sort of like the user stack I guess.
>
> Cheers,
>
> Steve Gunnell
>
> Actually, I would have assumed that the user stack operated more or
less
> independently of the call chain, but I see it is kept in the context
> structure and not the interpreter. Hmmm . . .
>
> -- Bob

Judging by comments about Forth on Parrot this behaviour may have been a
feature of an earlier calling convention where the stack was used to
pass parameters.

BTW I am happy to be told RTFM if there is an M. 8-)

Thanks,

Steve Gunnell


Bob Rogers

unread,
Jan 9, 2006, 10:32:12 PM1/9/06
to Steve Gunnell, perl6-i...@perl.org
From: Steve Gunnell <stev...@westnet.com.au>
Date: Mon, 09 Jan 2006 18:45:24 +0800

On Sun, 2006-01-08 at 11:05 -0500, Bob Rogers wrote:

> From: Steve Gunnell <stev...@westnet.com.au>
> Date: Sun, 08 Jan 2006 15:02:37 +0800
>

> . . .


>
> It also seems to me that with coroutines and threads you might have
> diverging contexts with different active exceptions and actions.
>
> I am assuming the ithreads model, where there is no sharing of contexts
> between threads. I believe that is how threads are currently
> implemented in Parrot . . .

Well I haven't looked at the code but I was thinking along the lines


that Threads and Coroutines inherit their current context from
somewhere ... So which bits can they safely inherit and which bits must
be masked?

The dynamic context is defined solely by the control stack, which is
intimately tied to the call chain. Ithreads means that there is no
sharing of context between threads, because each thread starts with an
empty control stack. Coroutines would share whatever context they
inherited in common by virtue of being started from the same sub call in
the running program. But each coroutine (or non-coroutine) would not
need to know about the others, if there are any, so there's no need for
"masking."

Also how does all this interact with the lightweight context that Perl
Blocks need?

You've lost me. We were talking about dynamic context here, as opposed
to anything that might pertain solely to a block (which would therefore
make it lexical).

-- Bob

0 new messages