
Re: [perl #37940] substr and memory issues


Joshua Isom

Dec 14, 2005, 6:59:01 AM
to parrotbug...@parrotcode.org
Forgot the file.

revcomp.pir

Joshua Isom

Dec 14, 2005, 6:52:50 AM
to bugs-bi...@rt.perl.org
# New Ticket Created by Joshua Isom
# Please include the string: [perl #37940]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37940 >


For the reverse-complement benchmark, I've gotten it working (albeit not
well), but with one major caveat. Since, to my knowledge, parrot has
no equivalent of perl's tr///, I implemented it using in-place
substitutions with substr. Sometimes I get parrot to panic and
quit (and might I add that the error message is pretty fixed from my
look at the code), but other times it's able to keep going. Parrot
slowly builds up memory, going to over 130 megs real,
1.3 gigs virtual, and, on my last run, 700 memory regions. Then
parrot either panics and dies because it can't get any more memory,
or either parrot or the system readjusts the memory and reduces it to
about 8 megs real, 650 megs virtual, and only 50 regions, and continues
on without trouble: no more increase in regions of memory, never
exceeding ten megs of real memory and sometimes less than four, and a
very slow increase of virtual.

I'm using the substr_s_i_i_s variation, and parrot r10448. I noticed
in src/memory.c there's a DETAIL_MEMORY_DEBUG that's not defined
anywhere, and only mentioned in src/memory.c, but when I tried defining
it, I couldn't get parrot to build. Turning on trace "fixes" the
memory issue, from the command line or from within the pir. I haven't
noticed any problems with any of the other substr variants.

I've attached the file, with the workaround commented out and with the
substr. The input for it is via stdin, and is the output from fasta,
with n = 2,500,000, which comes out to a nice 2.4 meg file, which is
coincidental because after 24 lines of input to process I get the
choke.

Oh, and at the moment I'm at ten minutes in, about 200 lines, out of
41671. Perl takes half a second. I'm not going to see if it'll finish
in under 30 hours.

Leopold Toetsch

Dec 14, 2005, 12:08:03 PM
to perl6-i...@perl.org, bugs-bi...@netlabs.develooper.com
Joshua Isom (via RT) wrote:

>
> For the reverse-complement benchmark, I've gotten it working (albeit not
> well), but with one major caveat. Since, to my knowledge, parrot has
> no equivalent of perl's tr///,

We will need tr///, if we want that benchmark complete in reasonable time.

> .. I implemented it using in place
> substitutions with substr. Sometimes I get parrot to panic and
> quit (and might I add that the error message is pretty fixed from my
> look at the code), but other times it's able to keep going.

Ugly. COW ping pong is the problem.

We have:

$S1 = substr $S0, j, 1

that creates a COW copy of length 1 and sets the COW flag on both
strings, because these strings are sharing the same body.

Then inside Switch():

substr $S0, j, i, .to

This has to modify the string $S0 in place, therefore it creates an
un-COWed new body by reallocating it.

I've no fast solution for that problem directly, just a better workaround:

Instead of the first substr, use

ch = ord $S0, j

and use the INTVAL value inside .from and as the left argument to
Switch. You should also avoid the extra $S0 copy and just work inside
'line' and substr in width chunks just for printing.

leo

Leopold Toetsch

Dec 14, 2005, 12:41:40 PM
to p6i List, bugs-bi...@netlabs.develooper.com

On Dec 14, 2005, at 18:08, Leopold Toetsch wrote:

> ... You should also avoid the extra $S0 copy and just work inside
> 'line' and substr in width chunks just for printing.

Oops. That would create the same problem but worse - 'line' aka 'seq'
would be reallocated.

leo

Roger Browne

Dec 14, 2005, 4:40:37 PM
to perl6-i...@perl.org
Leopold Toetsch wrote:

> We will need tr///, if we want that benchmark complete in reasonable time.

Better still, we could add some new opcodes, each of which performs one
entire shootout benchmark :-)

Regards,
Roger Browne

Leopold Toetsch

Jan 9, 2006, 3:27:01 PM
to perl6-i...@perl.org, bugs-bi...@netlabs.develooper.com

On Dec 14, 2005, at 12:52, Joshua Isom (via RT) wrote:

> [ substr related PANIC ]

I've now a rather simple test case: a string reverse_inplace that shows
some parts of the problem.
(You might ulimit -v yourself to a few 100 Megs before running the
program)

.sub main :main
    .local string s
    .local int N
    N = 500000
    s = repeat '0', N
    $S1 = repeat '1', N
    s .= $S1
    $I0 = length s
    print_item 'len'
    print_item $I0
    print_newline
    rev(s)
    print "ok\n"
.end

.sub rev
    .param string str
    .local int i, len
    len = length str
    dec len
    i = 0
beginwhile:
    if i >= len goto endwhile
    $S0 = substr str, i, 1 # (1)
    $S1 = substr str, len, 1
    substr str, i, 1, $S1 # (2)
    substr str, len, 1, $S0
    dec len
    inc i
    goto beginwhile
endwhile:
.end

The C<substr (1)> creates a COW reference pointing inside the B<str>.
This means that the C<PObj_COW_FLAG> is set on both strings, because
they now share the same string body.
The C<string_replace (2)> wants to modify C<str>, finds it COWed and
thus allocates a new string body in C<Parrot_unmake_COW()>.
In the above code we'd get C<len/2> reallocations of the string memory.
But as C<$S0> and C<$S1> still hold references to the old buffer, it
can't be reclaimed and we end up with a C<PANIC: Out of mem!>.
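
The reallocation count described above can be illustrated with a small toy model. This is only a sketch in Python, not Parrot's actual implementation: the class CowString, its cow flag (standing in for PObj_COW_FLAG), and the helpers rev_substr and rev_ord are all hypothetical names invented for illustration. It assumes that every read through substr shares the body and sets the flag, and that the first write to a flagged string copies the whole body. Under those assumptions the substr-based reverse copies the body once per loop iteration, while reading the characters as integers (the ord idea suggested earlier in the thread) never triggers a copy.

```python
class CowString:
    """Toy copy-on-write string; all names here are hypothetical."""

    def __init__(self, text):
        self.body = bytearray(text)  # string body, may be shared with substrings
        self.cow = False             # stands in for PObj_COW_FLAG
        self.reallocations = 0       # how often the body was un-COWed
        self.bytes_copied = 0        # total bytes copied by those reallocations

    def substr(self, offset):
        """Model of (1): a 1-char COW reference that shares the body."""
        self.cow = True
        return (self.body, offset)   # the reference keeps the old body alive

    def ord(self, offset):
        """Read one character as an integer without sharing the body."""
        return self.body[offset]

    def replace(self, offset, char_code):
        """Model of (2): an in-place write, un-COWing the body first if needed."""
        if self.cow:
            self.body = bytearray(self.body)  # full copy of the body
            self.cow = False
            self.reallocations += 1
            self.bytes_copied += len(self.body)
        self.body[offset] = char_code


def rev_substr(s):
    """In-place reverse that reads through COW substrings, as in rev.pir."""
    i, last = 0, len(s.body) - 1
    while i < last:
        body_a, off_a = s.substr(i)
        body_b, off_b = s.substr(last)
        s.replace(i, body_b[off_b])     # string is flagged: copies the body
        s.replace(last, body_a[off_a])  # flag already cleared: plain write
        i += 1
        last -= 1


def rev_ord(s):
    """In-place reverse that reads integers, so the flag is never set."""
    i, last = 0, len(s.body) - 1
    while i < last:
        a = s.ord(i)
        b = s.ord(last)
        s.replace(i, b)
        s.replace(last, a)
        i += 1
        last -= 1


if __name__ == "__main__":
    for reverse in (rev_substr, rev_ord):
        s = CowString(b"0" * 500 + b"1" * 500)
        reverse(s)
        print(reverse.__name__, s.reallocations, s.bytes_copied)
```

For the 1000-byte string in the demo, the substr variant performs 500 reallocations and copies 500,000 bytes, while the ord variant performs none. The model ignores the garbage collector entirely, so it shows only why the copying grows with the square of the string length, not why the PANIC occurs.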

Why it's actually PANICing isn't totally clear, but I think the reason
is that the GC tries to create some extra storage for increased memory
demand, which eventually is too much.

More investigations are very welcome.

leo

Leopold Toetsch

Jan 10, 2006, 8:36:00 AM
to perl6-i...@perl.org, bugs-bi...@netlabs.develooper.com
Leopold Toetsch wrote:
>
> On Dec 14, 2005, at 12:52, Joshua Isom (via RT) wrote:
>
>> [ substr related PANIC ]

After a lengthy session with gdb and some added debug prints, I've now
tracked down and fixed the reason for the memory panic. The sweep code
tried to avoid freeing buffers, if there were plenty of PMCs left
usable, which of course is totally bogus for string-only code. I've
removed that part.

Anyway, the COW-ping-pong issue is *not* solved, just the PANIC.
Reversing a string of 100000 bytes (1/10th of the example code) with the
posted rev.pir takes:

time 30 s [1]
GC runs 7147
memory collected 1.8 GByte

[1] unoptimized build, default runcore, but that really doesn't matter ;)

leo
