PickPocket, a Marshal Ransack Hack
Today’s hack is a Marshal hack, which is a highly common (but quite untapped) language that also has no formal layout beyond what Ruby’s source code has to say. Most times you only hear about slight changes between major Ruby versions (1.6 -> 1.8) when something goes kaput. No Ruby books I know of go near dissecting it. And, strangely enough, Minero’s classic Ruby Hacking Guide doesn’t even touch it.
We are voodoo doctors. Take us to the center of the marsh.
Tasting Just a Few Sample Bytes
Marshal is the ultra-slim encoded bytespeak that Ruby can pull out when siphoning objects through a skinny straw. For when your Ruby shares a malt with someone else’s Ruby. Yeah, well, it’s actually very easy to pick up.
>> Marshal.dump("Koichi")
=> "\004\010\"\vKoichi"
Say, that’s not too bad. We dumped out a little Marshal and you can at least see plain old Koichi in there. Other than that there are just four other characters, starting with "\004\010", which is the Marshal 4.8 (current) marshal header.
The other characters are a quote ("), which means “look, a string is coming up”. And \v, which is a string length.
>> "\v"[0] - 5
=> 6
Yeah, it takes a little math, but there you can see it: \v means a string length of 6. So, in summary: this is a Ruby 1.8.4 Marshal string containing a string with six characters and they are Koichi.
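The subtract-5 trick only covers small lengths. Here is a rough sketch of the full length decoding, based on my reading of the packed-integer rules in Ruby's marshal.c, so treat it as an assumption rather than a spec. The name read_marshal_int is made up for illustration. It is written for current Rubies, where "\v"[0] hands back a one-character string instead of 11, hence getbyte.

```ruby
# Hypothetical helper: decode a Marshal-packed integer starting at byte
# offset `pos` in `data`. Returns [value, position_after_the_integer].
def read_marshal_int(data, pos)
  c = data.getbyte(pos)
  c -= 256 if c > 127              # treat the byte as signed
  pos += 1
  case c
  when 0         then [0, pos]
  when 5..127    then [c - 5, pos] # small positive: stored as value + 5
  when -128..-5  then [c + 5, pos] # small negative: stored as value - 5
  when 1..4                        # c little-endian bytes follow
    value = 0
    c.times { |i| value |= data.getbyte(pos + i) << (8 * i) }
    [value, pos + c]
  else                             # -4..-1: that many bytes, negative value
    n = -c
    value = -1
    n.times do |i|
      value &= ~(0xff << (8 * i))
      value |= data.getbyte(pos + i) << (8 * i)
    end
    [value, pos + n]
  end
end

# .b gives a binary string, which current Rubies dump without encoding data.
dump = Marshal.dump("Koichi".b)
len, start = read_marshal_int(dump, 3)   # byte 3 is the \v
puts dump.byteslice(start, len)          # prints Koichi
```

Byte 3 is where the length lives because the header takes bytes 0 and 1 and the quote takes byte 2.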
Skipping Bytes
Okay, so, it turns out that everything that gets Marshalled comes with these offset bytes (like \v above) which measure strings and hashes and arrays and floats. (So does Python’s pickle and many other binary serialization formats.)
The PickPocket hack is based on these two ideas:
- You can skip through marshalled objects pretty easily.
- By slapping a header in the middle of a marsh, you can load only certain fragments.
Take this array:
>> Marshal.dump ["Goto80", "Treewave", "YMCK"]
=> "\004\010[\010\"\vGoto80\"\rTreewave\"\tYMCK"
This Marshal reads out loud like this:
Header, Array(3)[ String(6), String(8), String(4) ]
So, if we want to load the second element, we can do some math to find where that element lives in the marsh. Skip the header (2 bytes), skip the Array counter (2 bytes), skip the first string (2 bytes + 6 bytes)... which leaves us at position 12.
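That skipping math can be sketched in a few lines. This is only an illustration: element_offset and small_int are names I made up, and the sketch assumes an Array holding nothing but short binary strings, with every count and length small enough to fit in one byte.

```ruby
# Hypothetical helper: decode a one-byte packed count or length.
def small_int(byte)
  return 0 if byte.zero?
  raise ArgumentError, "multi-byte or negative value" unless (5..127).cover?(byte)
  byte - 5
end

# Hypothetical helper: byte offset of the index-th element in a dump of an
# Array of short binary Strings.
def element_offset(dump, index)
  pos = 2                                        # skip the two-byte header
  raise ArgumentError, "not an Array dump" unless dump.getbyte(pos) == '['.ord
  count = small_int(dump.getbyte(pos + 1))
  raise IndexError, "index out of range" unless (0...count).cover?(index)
  pos += 2                                       # skip '[' and the element count
  index.times do
    raise ArgumentError, "expected a String" unless dump.getbyte(pos) == '"'.ord
    pos += 2 + small_int(dump.getbyte(pos + 1))  # quote + length byte + contents
  end
  pos
end

# .b gives binary strings, which current Rubies dump without encoding data.
dump = Marshal.dump(["Goto80", "Treewave", "YMCK"].map(&:b))
puts element_offset(dump, 1)                     # prints 12
```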
Now, will it let us load from the middle of the Array? Or what?
>> str = "\004\010[\010\"\vGoto80\"\rTreewave\"\tYMCK"[12..-1]
=> "\"\rTreewave\"\tYMCK"
>> Marshal.load(str)
TypeError: incompatible marshal file format (can't be read) format version 4.8 required; 34.13 given
Oh, wait! The header!
>> Marshal.load("\004\010" + str)
=> "Treewave"
Hey, klawboom!! That worked. It loaded the object and ignored anything after it.
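Here is the same header trick wrapped up as a small sketch for current Rubies. The helper name load_fragment is made up, and the offset 12 is hard-coded from the math earlier. One assumption to flag loudly: the strings are made binary with .b, because otherwise current Rubies append encoding data that, as far as I can tell, refers back to earlier parts of the dump and so would not survive being cut out as a fragment.

```ruby
# Hypothetical helper: load the single object that starts at byte `offset`
# of `dump`, by gluing the dump's own two-byte header onto the fragment.
def load_fragment(dump, offset)
  header = dump.byteslice(0, 2)
  Marshal.load(header + dump.byteslice(offset..-1))
end

# .b makes binary strings, which dump without trailing encoding data.
bands = ["Goto80", "Treewave", "YMCK"].map(&:b)
dump  = Marshal.dump(bands)

begin
  Marshal.load(dump.byteslice(12..-1))   # no header, so this is rejected
rescue TypeError => e
  puts "rejected: #{e.message.lines.first}"
end

puts load_fragment(dump, 12)             # prints Treewave
```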
Picking Pockets
The final part of this hack is to come up with the code for walking down into the marsh and coming up with the object we want. Here’s what I’ve got in mind.
Let’s use, as our sample corpus, a marshalled dump of the RubyGems repository. It’s of a nice, wieldsome size (2M) and it would be nice to reach in and grab one gem.
>> PickPocket(File.read('rubygems.m')).gems['hpricot-0.4-mswin32'].get
=> #<Gem::Specification:0x811b124 @name="hpricot" ...>
Instead of actually loading all the objects in the dump, this query is executed when the get method is run. It’ll search the rubygems.m file for a gems instance variable. And then it’ll search that variable for an 'hpricot-0.4-mswin32' key.
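To make the "nothing happens until get" idea concrete, here is a toy sketch. It is not pickpocket.rb: the LazyPick class and everything in it are hypothetical, it only descends through plain Hashes with binary String keys (no instance variable lookup like .gems), and it can only skip over nil, true, false, non-negative Fixnums, binary Strings, Arrays and plain Hashes.

```ruby
# Hypothetical sketch of a lazy lookup; NOT the author's pickpocket.rb.
class LazyPick
  HEADER = "\x04\x08".b

  def initialize(dump, path = [])
    @dump = dump
    @path = path
  end

  # Record a hash key to descend into. Nothing is parsed yet.
  def [](key)
    LazyPick.new(@dump, @path + [key])
  end

  # Run the query: walk the recorded keys, then load one fragment.
  def get
    pos = @path.inject(2) { |at, key| find_value(at, key) }
    Marshal.load(HEADER + @dump.byteslice(pos..-1))
  end

  private

  # Decode a non-negative packed integer; returns [value, next_pos].
  def int_at(pos)
    c = @dump.getbyte(pos)
    return [0, pos + 1] if c.zero?
    return [c - 5, pos + 1] if (5..127).cover?(c)
    raise ArgumentError, "unsupported integer byte #{c}" unless (1..4).cover?(c)
    value = 0
    c.times { |i| value |= @dump.getbyte(pos + 1 + i) << (8 * i) }
    [value, pos + 1 + c]
  end

  # Return the position just past the object that starts at pos.
  def skip(pos)
    type = @dump.getbyte(pos).chr
    case type
    when '0', 'T', 'F' then pos + 1            # nil, true, false
    when 'i' then int_at(pos + 1).last
    when '"'
      len, at = int_at(pos + 1)
      at + len
    when '[', '{'
      count, at = int_at(pos + 1)
      count *= 2 if type == '{'                # a hash stores key/value pairs
      count.times { at = skip(at) }
      at
    else
      raise ArgumentError, "unsupported type #{type.inspect}"
    end
  end

  # Find the position of the value stored under key in the Hash at pos.
  def find_value(pos, key)
    raise ArgumentError, "not a Hash" unless @dump.getbyte(pos) == '{'.ord
    count, at = int_at(pos + 1)
    count.times do
      raise ArgumentError, "non-String key" unless @dump.getbyte(at) == '"'.ord
      len, at = int_at(at + 1)
      value_pos = at + len
      return value_pos if @dump.byteslice(at, len) == key.b
      at = skip(value_pos)
    end
    raise KeyError, "key not found: #{key}"
  end
end

data = {
  "junk".b => ["x".b, nil, 7],
  "gems".b => { "other".b => "o".b, "hpricot-0.4-mswin32".b => "a spec".b }
}
query = LazyPick.new(Marshal.dump(data))["gems"]["hpricot-0.4-mswin32"]
puts query.get                                 # prints a spec
```

Building the query only stacks up keys. The byte walking and the single Marshal.load both wait for get, and everything skipped along the way is never turned into a Ruby object.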
So far, it all fits in about a hundred lines of code: pickpocket.rb. More marshal hacking tomorrow.
Update: The RubySpec wiki has started a page on the Marshal format which looks to be a good start.
Riduidel
Doesn’t it look like a plain old pointer ladder thrown right in the middle of our Ruby jewelry store? Well, as long as it is done at night-time (which is needed for pick-pocketing), that’s first class robbery.
hgs
If we are talking robbery and devious behaviour, I can’t help wondering about buffer overrun attacks, though unless this stuff gets executed I can’t see how to implement, or more importantly, detect such an attack.
(Oh, this interface has a spelling checker now. It doesn’t like my British spelling of /behavio(?:u?)r/.)
Daniel Berger
I’m curious what you think of the notion of Marshal.dump creating a subclass of String. See ruby-talk:76055 for more of what I mean.
With a MarshalledString class, we could add methods that would make it easier (or at least more obvious) to do what you’re doing here.
MenTaLguY
Hmm, you think maybe rewriting marsh’d strings is the way forward with proxying in sandbox?
J`ey
Clever! What I have come to expect!
J`ey
Can you read only a certain part of a file? If you could, then you could delay reading until something is asked for.
hgs
Re: update. So that’s the weirdness with packed integers! And the ; construct is half-way to Lempel-Ziv encoding. Impressive. Thank you for this.