RedHanded

Using Hashes to Memoize

by why in bits

Whoa, here’s a meme I totally overlooked a month ago. But Mauricio’s brought it back for us with his log parsing script. Originally, the idea was just to use Hashes to cache method results. But Mauricio’s combined it with Marshal so you can cache memoization on disk!

His example is perfect:

 iptocountry = Hash.new do |h,ip|
   h[ip] = `geoiplookup #{ip}`.chomp.gsub(/^GeoIP Country Edition: /,"")
 end
 iptocountry.update Marshal.load(File.read("geo.cache")) rescue nil

The iptocountry hash pairs IP addresses with their geographic locations. Presumably, many of Mauricio’s visitors return often, so this saves his computer a lot of exhaustion. He saves old pairs in geo.cache, and if he ever asks the Hash for an IP that hasn’t been looked up, it goes to the shell command to look up the location. Brilliant!
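The snippet above only shows the loading half. Here is a minimal sketch of the whole round trip, assuming the pairs get written back with Marshal.dump (the saving code from the original script isn't shown here, so that part is a guess). The names slow_lookup, load_cache, save_cache and the temp-file path are made up for illustration, and the stub stands in for geoiplookup so the sketch runs anywhere.

```ruby
require "tmpdir"

# Hypothetical stand-in for the slow `geoiplookup` shell call.
# It counts how many times each IP is actually looked up.
LOOKUPS = Hash.new(0)
def slow_lookup(ip)
  LOOKUPS[ip] += 1
  "Country for #{ip}"
end

# Hypothetical cache location; the post uses "geo.cache".
CACHE_FILE = File.join(Dir.mktmpdir, "geo.cache")

def load_cache(path)
  # Missing keys trigger the lookup and store the answer.
  cache = Hash.new { |h, ip| h[ip] = slow_lookup(ip) }
  # Pull in pairs from an earlier run, if there was one.
  cache.update(Marshal.load(File.binread(path))) if File.exist?(path)
  cache
end

def save_cache(cache, path)
  # Marshal refuses to dump a Hash that has a default block,
  # so copy the pairs into a plain Hash first.
  File.binwrite(path, Marshal.dump({}.merge(cache)))
end

cache = load_cache(CACHE_FILE)
cache["10.0.0.1"]               # miss: runs slow_lookup
cache["10.0.0.1"]               # hit: answered by the Hash
save_cache(cache, CACHE_FILE)

reloaded = load_cache(CACHE_FILE)
reloaded["10.0.0.1"]            # hit: answered by the pairs read from disk
puts LOOKUPS["10.0.0.1"]        # prints 1
```

The copy into a plain Hash is the one step that is easy to miss: dumping the memoizing Hash directly raises a TypeError because of its default block.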

As I was saying, this idea has been building. MenTaL talked about using Hash.new (definitely read it, it’s very simple!) and Doug Landauer expanded into how to use Hash.new to memoize the Fibonacci series. This is going to replace Ajax in 2006.
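For the curious, the Fibonacci trick looks roughly like this. It is a small sketch of the general idea rather than a copy of Doug Landauer's code, and the name fib is just for illustration.

```ruby
# Each missing key computes its value from the two before it
# and stores the result, so every number is worked out only once.
fib = Hash.new do |h, n|
  h[n] = n < 2 ? n : h[n - 1] + h[n - 2]
end

puts fib[10]   # prints 55
puts fib[40]   # prints 102334155
```

The block recurses through the Hash itself, so asking for a very large n cold can still exhaust the stack. Filling the Hash from small n upward avoids that.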

said on 30 Dec 2005 at 15:10

Nice. Of course, why not serialize with YAML?

I remember twiddling with serialization back in the pre-YAML days. I came to regret using Marshal one day when I upgraded Ruby and then all of my goldfish were dead.

Sure, Marshal has its place, but when you’re just storing a hash of strings? YAML is the thing.

said on 30 Dec 2005 at 17:57
MenTaLguY: there’s no gain for me in using YAML, since I don’t really want to read the cache, and I’m not concerned about it being invalidated by a Ruby upgrade (I’ll be happy to let my box work on that for 5 minutes when Ruby 2.0 is released ;-). But there are two reasons why I prefer not to use it in this case:
  • require ‘yaml’ generates lots of warnings with RUBYOPT=-w
  • YAML.dump is about 50 times slower than Marshal, which means that it takes over 4 seconds to serialize the cache, instead of “nothing” in human time. This makes a difference since I normally run that script “interactively” (i.e. in blocking mode).
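A rough way to check the speed gap on your own machine is sketched below. The 20,000 fake pairs are invented for illustration, and the ratio you see will depend on your Ruby version and your data, so don't expect the 50x figure to carry over.

```ruby
require "benchmark"
require "yaml"

# Made-up cache contents: 20,000 fake IP => country pairs.
cache = {}
20_000.times do |i|
  cache["10.#{i / 256}.#{i % 256}.1"] = "Country #{i % 200}"
end

# Time one full serialization with each library.
marshal_time = Benchmark.realtime { Marshal.dump(cache) }
yaml_time    = Benchmark.realtime { YAML.dump(cache) }

printf("Marshal.dump: %.4fs\n", marshal_time)
printf("YAML.dump:    %.4fs\n", yaml_time)
```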
