Stuffing Your Hand Down the Disposal
Since we have our heads tilted in the direction of ruby+gc: Eustáquio “TaQ” Rangel posted concerns about Ruby’s garbage collector yesterday on Ruby-Talk, which led to an interesting bit of code from Yohanes Santoso for watching the garbage collector slurp up variables gone from scope.
Specifically, this code, which creates a bunch of objects and watches them fade from existence:
    class CustObj
      attr_accessor :val, :next

      def initialize(v,n=nil)
        @val = v
        @next = n
      end

      def to_s
        "Object #{@val} (#{self.object_id}) points to " \
        "#{@next.nil? ? 'nothing' : @next.val} " \
        "(#{@next.nil? ? '':@next.object_id})"
      end
    end

    def list
      print "Listing all CustObj's with ObjectSpace\n"
      print "#{ObjectSpace.each_object(CustObj) {|v| puts v}}" \
            " objects found\n\n"
    end

    begin # start a new scope so we can exit it later
      c1 = CustObj.new(1,CustObj.new(2,CustObj.new(3)))
      c4 = CustObj.new(4,CustObj.new(5))
      c6 = CustObj.new(6)
      c1.next.next.next = c1 # comment this and check again

      puts "### Initial"
      list

      c1 = nil
      c4.next = nil
      GC.start
      puts "### After gc, but still within declaring scope"
      list
    end

    puts "### Exited the scope"
    list

    GC.start # here I want c1 disappears
    puts "### After gc, outside of declaring scope"
    list
Here’s a great script for understanding how the collector works and how important scope is to the collector. The important variable to watch is Object #1. Notice how, even after it’s set to nil, its object is still around because it is referenced by Object #3. And it’s still around after the scope is closed. But once the scope is closed and the GC is manually run, Object #1 disappears.
The point of this illustration isn’t to encourage you to run GC manually. It’s to encourage you to use scope to control the variables you’re hanging on to. Even if it means enclosing some stuff in begin..end.
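If you want a scope that definitely has its own local variables, a method body is one. Here is a minimal sketch of the idea, assuming MRI with ObjectSpace available. Tracked, live_count and build_and_discard are names invented for this illustration. The count after GC.start can differ from run to run and machine to machine, because a stale reference may linger on the stack, so read the output as a hint and not a guarantee.

```ruby
# Hypothetical marker class, only here so the instances are easy to count.
class Tracked; end

# Number of Tracked instances the interpreter still knows about.
def live_count
  ObjectSpace.each_object(Tracked).count
end

# The local variable `objects` exists only while this method runs.
def build_and_discard
  objects = Array.new(1_000) { Tracked.new }
  objects.size
end

built = build_and_discard
GC.start
puts "built #{built}, still live after GC: #{live_count}"
```

Once build_and_discard returns, nothing in your own code refers to the array any more, so the collector is free to take it whenever it gets around to it.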
Here’s a little watcher class, based on the above, that you can use to monitor the presence of objects:
    class GCWatcher
      def initialize; @objs = []; end

      def watch( obj )
        @objs << [obj.object_id, obj.inspect]
        obj
      end

      def list
        puts "** objects watched by GcWatcher **"
        @objs.each do |obj_id, obj_inspect|
          if ( ObjectSpace._id2ref( obj_id ) rescue nil )
            puts "#{ obj_inspect } is still around."
          else
            puts "#{ obj_inspect } is collected."
          end
        end
        puts "** #{ @objs.length } objects watched **"
        puts
      end
    end
Create a GCWatcher object and fill it with references using the watch method. The object will only keep track of object IDs, so it keeps no reference to the actual objects. See a complete sample here. (Inspired by ruby-talk:147351.)
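A usage sketch of the watch-then-list pattern. To keep the block self-contained it does not repeat GCWatcher; it uses a small variant built on the standard library's WeakRef, the alternative MenTaLguY brings up in the comments. WeakWatcher, report and watch_temporary are names made up for this example and are not part of the original code. Whether the temporary object shows up as collected depends on your Ruby build and your stack, so no output is promised for it.

```ruby
require 'weakref'

# Hypothetical variant of the watcher: holds WeakRefs instead of object IDs.
class WeakWatcher
  def initialize
    @entries = []
  end

  # Remember a weak reference plus a description; return the object untouched.
  def watch(obj)
    @entries << [WeakRef.new(obj), obj.inspect]
    obj
  end

  # One [description, :alive or :collected] pair per watched object.
  def report
    @entries.map do |ref, description|
      [description, ref.weakref_alive? ? :alive : :collected]
    end
  end

  def list
    report.each { |description, state| puts "#{description} is #{state}." }
    puts "** #{@entries.length} objects watched **"
  end
end

# Watch an object that no local variable holds once this method returns.
def watch_temporary(watcher)
  watcher.watch(%w[temporary object])
  nil
end

watcher = WeakWatcher.new
kept = watcher.watch(%w[still referenced])  # `kept` keeps this one reachable
watch_temporary(watcher)
GC.start
watcher.list
```

The object held in kept should always be reported as alive. The temporary one may or may not be gone after a single GC.start.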
TaQ
Just for comparison, this is what the original code looks like: http://redhanded.hobix.com/ruby-talk/147345
Then Yohanes fixed it with the begin … end stuff. And what a difference! No more lost objects there. :-)
MenTaLguY
Of course, you could also write GCWatcher using WeakRef. IIRC internally WeakRef just does the ObjectSpace._id2ref dance too, but I think it’s slightly more readable.
twifkak
Am confused… I thought begin...end and the like had the exact same scope as their containers (i.e. the c1 variable is available outside). What’s this craziness with the GC and it not being so?
twifkak
That is to say, I understand the circular reference and conservative GC bit. I just don’t understand why escaping what seems to me to be a non-existent scope should nudge the GC into cleaning it up.
MenTaLguY
Actually, the OP was abusing the definition of “conservative GC”. Ruby’s GC would be properly termed precise, rather than conservative.
Conservative collectors look at raw memory and assume that any data that might be a pointer is one. A precise collector is one, like Ruby’s, that knows the structure of its objects and where the references to other objects lie.
Matt
Yeah, I second the confusion. begin..end doesn’t seem to start a new scope, so what’s going on? Also, I’ve always heard Ruby’s collector referred to as conservative… I think even the pickaxe book says that. So what’s going on here?
why
Variables declared inside a begin..end or inside a block are block local; they perish with the close of the block.
George
I think the GC is precise when it comes to object to object references, but conservative when it comes to stack to object refs: some of these refs live in the C stack, and Ruby doesn’t know the layout of that so it has to assume that anything that looks like a pointer is one.
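To make that distinction concrete, here is a toy model. It is not how MRI is implemented: HeapObject, mark, collect, demo_heap and the addresses are all invented for the illustration. Roots are found conservatively (any "stack word" that equals a heap address counts as a reference), while references between objects are followed precisely through a known list of children.

```ruby
# Toy model only. Addresses are plain integers, the "stack" is an array of words.
HeapObject = Struct.new(:name, :children, :marked)

# Precise part: each object knows exactly which addresses it refers to.
def mark(heap, address)
  obj = heap[address]
  return if obj.nil? || obj.marked
  obj.marked = true
  obj.children.each { |child_address| mark(heap, child_address) }
end

# Conservative part: any stack word that looks like a heap address is a root.
# Returns the names of the objects that would be swept.
def collect(heap, stack_words)
  heap.each_value { |obj| obj.marked = false }
  stack_words.each { |word| mark(heap, word) if heap.key?(word) }
  heap.values.reject(&:marked).map(&:name)
end

def demo_heap
  {
    0x1000 => HeapObject.new("a", [0x1010], false),
    0x1010 => HeapObject.new("b", [], false),
    0x1020 => HeapObject.new("c", [], false)
  }
end

# No stack word matches a heap address: everything is swept.
p collect(demo_heap, [42, 7])        # prints ["a", "b", "c"]

# An unrelated integer that happens to equal a's address keeps a and b alive.
p collect(demo_heap, [42, 0x1000])   # prints ["c"]
```

The second call is the interesting one: nothing "owns" object a, but a word that merely looks like its address is enough to keep it, and everything it points to, around.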
Matt
why – I thought that was the case, but it only seems to work with blocks, not begin..end
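A small script that puts the three constructs side by side, using defined? to ask whether a variable is still visible afterwards. scope_report and method_scope are names invented for this sketch.

```ruby
# A method body has its own local variables.
def method_scope
  inside_method = 3
  inside_method
end

# Reports, per construct, whether a variable assigned inside it is
# still visible once the construct has ended.
def scope_report
  begin
    inside_begin = 1
  end

  [1].each do
    inside_block = 2
  end

  method_scope

  {
    "begin..end" => !defined?(inside_begin).nil?,
    "block"      => !defined?(inside_block).nil?,
    "method"     => !defined?(inside_method).nil?
  }
end

scope_report.each do |construct, visible|
  puts "#{construct}: #{visible ? 'still visible' : 'gone'}"
end
# prints:
#   begin..end: still visible
#   block: gone
#   method: gone
```

So a variable first assigned inside begin..end stays visible in the enclosing scope, while one first assigned inside a block or a method does not.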
Timmy
irb(main):001:0> x NameError: undefined local variable or method `x' for main:Object from (irb):1 irb(main):002:0> begin irb(main):003:1* x = 1 irb(main):004:1> end => 1 irb(main):005:0> x => 1
Timmy
Sorry about that. I thought I was getting linebreaks with my ‘code’ tag. Here it is again (doesn’t this show that variables created inside begin..end survive after the end?):
    irb(main):001:0> x
    NameError: undefined local variable or method `x' for main:Object
            from (irb):1
    irb(main):002:0> begin
    irb(main):003:1* x = 1
    irb(main):004:1> end
    => 1
    irb(main):005:0> x
    => 1
Matt
Yeah, that was the exact same result I was getting. Is this something that changed with 1.9 perhaps?
Timmy
No, I get the same results when I do it on 1.9.0.
In a message on ruby-talk, Yohanes Santoso makes a distinction between “declaring scope” and “variable scope” which I suspect has something to do with this. Anybody able to elaborate on this distinction?
Timmy
Ok, I’m getting very sad now. When I run the CustObj code listed above I never get rid of Object #1. Even after exiting the scope and executing GC.start, it still persists. The only object that ever disappears is Object #5. What is going on here – why are we getting different results? (And while we’re at it, why wasn’t Yohanes Santoso able to reproduce TaQ’s results in the ruby-talk thread in question?)
mfp
First of all: don’t expect any of this to be 100% reproducible. Your C stack is probably different from mine. The above results illustrate this too.
why
Okay, so the begin..end doesn’t introduce its own local variables. It doesn’t leverage local_push. Is it effective in clearing GC, though? And why? Looking through Ruby’s source, it looks like the begin..end is just a collection of nodes, nothing more.
MenTaLguY
Can someone verify that Ruby does in fact walk the C stack? I’d be surprised if it did, as that would seriously reduce its portability…
mfp
MenTaLguY
Ahh, so it does. And indeed I suspect that is the reason for all the various interesting GC behavior being described here.
mfp
We can easily go beyond the mere suspicion:
Let’s run that under gdb:
We set a breakpoint at rb_big_abs in order to get the object_id of the first object and set a conditional breakpoint based on that:
Now we can see when the first object is about to be marked:
Breakpoint reached. Let’s see what caused that object to be marked (i.e. where the reference came from):
Alright, there was a ref in the C stack. Let’s cont until the current stack frame returns…
Isn’t this beautiful? obj1@1075648652 references obj2@1075648672, which is marked in gc_mark_children, which iterates over the entries in the iv_tbl. Likewise, obj2@1075648672 points to obj3@1075648692, which points back to obj1@1075648652. This is why we ran into the gc_mark breakpoint for the second time (!).
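The shape of that traversal can be sketched in a few lines of Ruby. This is a toy, not MRI's C code: the method names gc_mark and gc_mark_children are borrowed from the description above, while Node, its ivars hash (standing in for iv_tbl) and mark_cycle are invented for the illustration. Every call to gc_mark is logged, much like every hit on the breakpoint.

```ruby
# Toy object: `ivars` stands in for the instance variable table.
class Node
  attr_reader :name, :ivars
  attr_accessor :marked

  def initialize(name)
    @name = name
    @ivars = {}
    @marked = false
  end
end

def gc_mark(node, log)
  log << node.name          # record every call, marked or not
  return if node.marked     # already marked: stop here, which ends the cycle
  node.marked = true
  gc_mark_children(node, log)
end

# Follow every reference stored in the object's instance variables.
def gc_mark_children(node, log)
  node.ivars.each_value { |child| gc_mark(child, log) if child.is_a?(Node) }
end

# Build obj1 -> obj2 -> obj3 -> obj1 and mark starting from obj1,
# as if a reference to obj1 had been found on the stack.
def mark_cycle
  obj1, obj2, obj3 = %w[obj1 obj2 obj3].map { |name| Node.new(name) }
  obj1.ivars[:@next] = obj2
  obj2.ivars[:@next] = obj3
  obj3.ivars[:@next] = obj1
  log = []
  gc_mark(obj1, log)
  log
end

p mark_cycle   # prints ["obj1", "obj2", "obj3", "obj1"]
```

obj1 shows up twice in the log: once as the root, and once when obj3's reference leads back to it. The second visit returns immediately because the object is already marked.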