A massive handshake to Yukihiro Matsumoto, who has completed version 1.8.0 of the agile Ruby language. This work has been in progress for several years now, since Matz forked the 1.7 branch in mid-2001.
Many of 1.8.0's new features have been long awaited by the Ruby crowd. I'm just going to run through a few of my favorite new features that I've longed to have in a stable Ruby version. I'm not going to include all of 1.8.0's features, only the ones that have stuck out to me. If I make a terrible omission, please let me know. I'd love to build on this.
(Incidentally, a really nicely formatted write-up of changes has been done by Michael Granger.)
First, I'll detail a number of changes that affect the core set of classes that Ruby operates upon. Matz has spent a lot of time reviewing and cleaning these classes in detail. For example, Matz' work on cleaning up the Block/Proc classes took him several months to decide upon and implement. I can tell you that he is really thinking through each small change made in Ruby.
allocate method
Scanning through RAA, you will find many libraries which use a variety of strange techniques for creating objects without calling class constructors. The ability to build a blank class instance is incredibly useful. Perhaps you're loading data from a database and you want to populate classes based on their property set. Perhaps you're writing a serialization library. Perhaps you want to copy only certain parts of an object. Ultimately, you want control over the allocation and initialization of an object.
Historically, SOAP4R has used Marshal.load to create a blank object. This technique works well, but involves some sketchy logic to assemble a marshalled string.
Providing that our class name is a String in the variable class_name and a Hash of properties and their values is stored in class_props:
    msh = "\004\006o:%c%s\000" % [ class_name.length + 5, class_name ]
    o = ::Marshal.load( msh )
    class_props.each_pair { |k,v| o.instance_eval "@#{k} = v" }
Another common way of dealing with this is found in Chris Morris' clxmlserial library. This approach involves shorting the constructor by temporarily aliasing it.
Here's an allocate technique which leverages alias:
    class Class
      def allocate
        self.class_eval %{
          alias :old_initialize_with_args :initialize
          def initialize; end
        }
        begin
          result = self.new
        ensure
          self.class_eval %{
            undef :initialize
            alias :initialize :old_initialize_with_args
          }
        end
        result
      end
    end
While both are neat bits of code, they are both obviously circumvention hacks. The allocate method has been added to allow proper bypassing of the constructor.
    o = Object::const_get( class_name ).allocate
    class_props.each_pair { |k,v| o.instance_eval "@#{k} = v" }
This method also comes with an accompanying API call for extensions:
VALUE rb_obj_alloc( VALUE klass )
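As a rough sketch of the difference, the snippet below builds an object with allocate and fills in its instance variables by hand. The Account class and the props Hash are made up for illustration, and instance_variable_set stands in for the instance_eval string used above.

```ruby
# Hypothetical class whose constructor insists on an argument.
class Account
  attr_reader :owner, :balance

  def initialize(owner)
    raise ArgumentError, "owner required" if owner.nil?
    @owner = owner
    @balance = 0
  end
end

# Pretend these values came from a database row or a serialized document.
props = { "owner" => "Carol", "balance" => 250 }

# allocate hands back a blank instance; initialize is never called.
blank = Account.allocate
props.each_pair { |k, v| blank.instance_variable_set("@#{k}", v) }

puts blank.owner
puts blank.balance
```

Because initialize never runs, any validation or defaults living in the constructor are skipped too, which is exactly what a loader or serializer wants.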
to_str
If there is a primary thrust to Ruby 1.8.0 (and the releases building up to it), it is duck typing. We differentiate our objects by the methods that they possess. You should see more and more respond_to? in Ruby code rather than kind_of?. This concept is a core mantra of Ruby, right alongside the principle of least surprise (POLS).
Whereas the phrase "duck typing" doesn't appear in 1.6.x era versions of the PickAxe, you can see it popping up all over ruby-talk: [53644], [56614], [76059]. (You might also check out Types in Ruby.)
Let's take a bit of code we might have used previously in Ruby:
    # Read entire contents of a file
    def read_file( file_name )
      # Ok.  It's a file name.
      if file_name.is_a? String
        File.open( file_name ).read
      # But what if it's an IO object?
      # Let's read from it!
      elsif file_name.is_a? IO
        file_name.read
      end
    end
The above code is trying to abstract away the reading of data by handling read operations inside the method. Since we're so used to Java and Python techniques, we tend to identify an object based on what classes it descends from. We look at the file_name var and figure that if we check for descent from the String class, then we are covered if someday we decide to extend the String class on our own and use that as our file_name.
With to_str, Matz is giving us a simpler way of demonstrating that our classes can be used as strings directly. Nearly all builtin methods use to_str to determine if an object is (or can be used as) a String. Think of it: if you extend the String class, then you have to write alternate methods (sub!, length, append, etc.) tailored to your needs.
Instead, simply write a to_str method and we can treat such an object like a String in our example:
    def read_file( file_name )
      if file_name.respond_to? :to_str
        File.open( file_name ).read
      elsif file_name.respond_to? :read
        file_name.read
      end
    end
So why not use to_s? Because to_s coerces objects into strings. So, to_str is an implicit cast, whereas to_s is an explicit cast.
Think of timestamps. We want to be able to easily convert a timestamp into a String for printing:
    puts "Time.now: " + Time.now.to_s
    #=> Time.now: Mon Aug 04 13:37:43 MDT 2003
But we don't want to load a File based on a timestamp:
    File.open( Time.now )
    #=> TypeError: cannot convert Time into String
            from (irb):3:in `initialize'
            from (irb):3:in `open'
            from (irb):3
Generally, we just don't need a timestamp to act as a String. So we use to_s to explicitly convert it.
If you're having a hard time remembering which is which, I would remember that there is a reason that to_s is shorter. First, it implies that the object isn't really much of a string, so we're only using the first letter 's'. Also, to_s is shorter because more objects will have to_s methods, so you'll end up typing it more frequently.
With to_str, we're tagging an object as much closer to being a string, so we give it the first three letters. It's almost half of a string!
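To make the implicit/explicit split concrete, here is a small sketch. The LogPath class and its path are invented for the example; the only behaviour relied on is that String#+ asks its argument for to_str and raises a TypeError when the argument has none.

```ruby
# Hypothetical class that is "close enough" to a String to convert implicitly.
class LogPath
  def initialize(name)
    @name = name
  end

  def to_str
    "/var/log/#{@name}.log"
  end
end

# String#+ asks its argument for to_str, so no explicit conversion is needed.
joined = "path: " + LogPath.new("mail")

# Time only offers to_s, so the implicit route is refused.
begin
  "now: " + Time.now
  refused = false
rescue TypeError
  refused = true
end

puts joined
puts refused
```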
If a block is passed to Class.new or Module.new, the block is executed within the context of the class or module. This is great for creating anonymous classes and modules without needing to call eval. Entire anonymous classes and modules can now be created with a syntax that isn't far off from the normal class and module declarations.
    m = Module.new do
      def test_me
        "called <module>::test_me"
      end
    end
    class NewTest; end
    NewTest.extend m
    NewTest.test_me
    #=> "called <module>::test_me"
As for creating anonymous classes:
    c = Class.new do
      def test_me
        "called <klass>::test_me"
      end
    end
    c1 = c.new
    c1.test_me
    #=> "called <klass>::test_me"
Here's a small but significant change. Ruby 1.8 now allows you to declare classes and other constants using the full path to the constant. Previously, you had to surround such declarations with the module declaration. They also had to be nested for each module declaration.
Here's a declaration for the Foo::Bar class:
    module Foo
      class Bar; end
    end
Modules must still be declared in 1.8 as shown above. But we can now add methods to the Foo::Bar class without nesting the method declaration in a module declaration:
    class Foo::Bar
      def baz; end
    end
The old syntax is still valid and acceptable:
    module Foo
      class Bar
        def baz; end
      end
    end
I will mention a few changes to Proc, simply because it's such a useful construct and slight changes in behavior help indicate its future.
With respect to return and break, you used to treat a proc (or lambda) the same as a block. Let's examine the following code:
    def proc_test( num )
      3.times do |i|
        return i if num == i
      end
      return 0
    end
    proc_test( 2 )
    #=> 2
In the above, we have a return inside the iterating block. It makes perfect sense for a block to return from the caller. Blocks have fairly transparent scoping. (By the way, if you haven't noticed, everyone has their own ideas about how block scoping should work.)
In the case of a proc (or lambda), Ruby is beginning to protect their scope more than with a block. In Ruby 1.8.0, both break and return exit the scope of the proc, but do not exit the scope of the caller.
    def proc_test( num )
      p = proc do |i|
        return i if num == i
      end
      3.times do |x|
        p.call( x )
      end
      return 0
    end
    proc_test( 2 )
    #=> 0
In 1.6.8, proc_test( 2 ) would return 2. You can see how the proc is becoming less like a block and more like an anonymous method.
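A small sketch of that "anonymous method" behaviour follows. It is written with lambda, for which a return leaves only the lambda itself; the method and variable names are made up for the example.

```ruby
def lambda_test(num)
  checker = lambda do |i|
    return i if num == i   # leaves the lambda only, not lambda_test
    nil
  end

  seen = []
  3.times { |x| seen << checker.call(x) }

  # We still get here, because the return above never escaped the lambda.
  [0, seen]
end

result, seen = lambda_test(2)
puts result
puts seen.inspect
```

The value handed to return simply becomes the result of checker.call, just as it would for a method call.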
I'd also like to mention a fix that was made in 1.6.8, but still bites me from time to time. I'm sure many of you will encounter this and not be sure what to make of it.
Suppose we have an event handling system in our GUI system. A system for handling mouse clicks. We have a method (simulate_click) which we can use to test calling all the click handlers and a method (add_click_handler) for introducing new handlers. We'll also add a click handler which catches the click if it leaves the upper corner of the screen and prevents the event from bubbling.
    def simulate_click( x, y )
      @click_handlers ||= []
      @click_handlers.each do |h|
        return false unless h.call( x, y )
      end
      return true
    end
    def add_click_handler( &block )
      @click_handlers ||= []
      @click_handlers << block
    end
    add_click_handler do |x, y|
      return ( x > 25 && y > 25 )
    end
Looks harmless? Well, this code is just dying to break. Any call to simulate_click will throw a LocalJumpError.
    simulate_click( 60, 70 )
    #=> LocalJumpError: return from proc-closure
            from (irb):78:in `call'
            from (irb):78:in `simulate_click'
            from (irb):77:in `each'
            from (irb):77:in `simulate_click'
            from (irb):81
Our problem is that we're dealing with an orphaned block. Several situations can create an orphaned block, but the most common is to receive a block through a method call, assign it to a variable and use it outside of the original scope. A block that crosses to another thread is orphaned as well. In either case, it becomes difficult to tell how that return was intended. (Also, avoid using break or retry directly inside of an orphaned block.)
So how do we fix our script? Get rid of return!
    add_click_handler do |x, y|
      x > 25 && y > 25
    end
Alternatively, pass in a Proc to give the return some context. (Note that just performing a to_proc conversion on an orphaned block won't do the trick.)
The moral of the story is: think about what you're doing when you use return, break or retry in your code. For iterators, don't use the above technique. Rather, use block_given? and yield, which prevent the block from becoming orphaned.
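For comparison, here is a minimal iterator written with block_given? and yield. The method name and the click data are hypothetical; the point is only that the block is called while the method that received it is still running, so it is never stored and never orphaned.

```ruby
# Hypothetical iterator: yields each click to the caller's block and
# reports whether any call to the block claimed the click.
def any_click_handled?(clicks)
  return false unless block_given?
  clicks.each do |x, y|
    return true if yield(x, y)
  end
  false
end

clicks = [[10, 10], [60, 70]]
handled = any_click_handled?(clicks) { |x, y| x > 25 && y > 25 }
ignored = any_click_handled?(clicks)

puts handled
puts ignored
```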
Now, let's cover some new methods found in Ruby's core classes. Pay special attention to the additions in the Array class. Many of those methods will become a crucial part of your development, should you learn them.
Don't get the wrong idea about Ruby's new allocate method (described above). No one's picking on constructors. Constructors have a solid spot in Ruby's future. In fact, a new constructor has been added to every Object.
In Ruby, the initialize method is called when an object is created with new:
    class Person
      attr_accessor :name, :company, :phone, :created_at
      def initialize( name, company, phone )
        @name, @company, @phone, @created_at = name, company, phone, Time.now
      end
    end
    bill = Person.new( 'Bill Bobson', 'The Mews at Windsor Heights', '801-404-1200' )
    bill.created_at
    #=> Tue Aug 05 14:09:52 MDT 2003
However, when an Object is copied with clone or dup, the constructor is skipped. This thwarts our timestamp mechanism above, though. What if we want to copy the basic data for the office, but reset the creation date and leave the name blank?
The initialize_copy constructor gets called on a clone or dup. Ruby passes in the object to copy from. We can pick and choose what data we want to keep and initialize some data on our own.
So, rather than starting with a duplicate and stripping out data, we assemble a blank object with the pieces we're copying.
    class Person
      def initialize_copy( from )
        @company, @phone, @created_at = from.company, from.phone, Time.now
      end
    end
    carol = bill.dup
    carol.name = 'Carol Sonbob'
    carol.created_at
    #=> Tue Aug 05 14:10:24 MDT 2003
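One caveat worth checking on your own interpreter: in the Ruby versions I can vouch for, dup and clone copy the original's instance variables before initialize_copy runs, so a field that should start out blank has to be reset explicitly. A minimal sketch, using a made-up Contact class rather than the Person class above:

```ruby
# Hypothetical class, kept separate from Person so the sketch stands alone.
class Contact
  attr_accessor :name, :company, :created_at

  def initialize(name, company)
    @name, @company, @created_at = name, company, Time.now
  end

  # Called on the copy by dup and clone, with the original passed in.
  def initialize_copy(from)
    super
    @name = nil             # instance variables were already copied, so blank it
    @created_at = Time.now  # the copy gets its own timestamp
  end
end

bill = Contact.new("Bill Bobson", "The Mews at Windsor Heights")
carol = bill.dup

puts carol.name.inspect
puts carol.company
```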
The inject method is easily the most anticipated addition to the core classes. It allows you to introduce a single value into the scope of an iterating block. Each value returned by the block is introduced on the successive call.
We'll start with a simple example:
[1, 2, 3].inject( "counting: " ) { |str, item| str + item.to_s } #=> "counting: 123"
See if you can figure out how the above works without my explanation. Ask yourself: how does the string get passed into the block? How do the numbers get added to the string? And how does the block return the full string?
With inject, we supply the method with a value which will accompany us as we iterate through the block. This injected value is passed into the block as the first parameter. (In the above: str) The second parameter is a value from the object we're iterating through.
The injected value is really only used on the first pass through the iterator. At the end of the first pass, inject keeps the return value of the block and injects it into the block on the second pass. The return of the second pass is injected into the third pass, and so on.
To be clear, let's inspect the block parameters on our inject call:
    [1, 2, 3].inject( "counting: " ) do |str, item|
      puts [str, item].inspect
      str + item.to_s
    end
    # ["counting: ", 1]
    # ["counting: 1", 2]
    # ["counting: 12", 3]
    #=> "counting: 123"
You can see the string building. The evolution of the injected value. The inject method is great if you are building a single value from the contents of any Enumerable type. Its uses are numerous and it has been said that inject could replace using most of Enumerable's other methods.
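A few more uses, sketched with made-up data, show the same pattern of carrying one value from pass to pass:

```ruby
# Sum: the running total is the injected value.
total = [1, 2, 3, 4].inject(0) { |sum, n| sum + n }

# Longest word: the injected value is the best candidate so far.
longest = %w(ox zebra cat).inject("") do |best, word|
  word.length > best.length ? word : best
end

# Tally: the block must return the Hash so it is injected into the next pass.
counts = %w(a b a).inject({}) do |hsh, item|
  hsh[item] = (hsh[item] || 0) + 1
  hsh
end

puts total
puts longest
puts counts.inspect
```

The tally is the one to watch: forget the trailing hsh and the next pass receives whatever the assignment returned instead of the Hash.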
The existing Enumerable#sort method leverages a block to sort values. Two values are handed to the block and the block must compare the two items. This process can be expensive, as in the following example:
    ["here", "are", "test", "strings"].sort { |a,b| a.hash <=> b.hash }
    #=> ["test", "strings", "here", "are"]
In the above block, a number of unnecessary hashes are generated. This is evidenced if we print out the hashes as they are generated:
    ["here", "are", "test", "strings"].sort do |a,b|
      puts ah = a.hash
      puts bh = b.hash
      ah <=> bh
    end
    # 431661103
    # -914358341
    # -914358341
    # -890696794
    # 431661103
    # -890696794
    # 834120758
    # -890696794
    # 834120758
    # 431661103
    #=> ["test", "strings", "here", "are"]
Ten hashes in all are generated by sort as it works to compare these values against each other. In addition, Enumerable#sort can be difficult to master as the return value must be -1, 0, or 1, signifying less than, equal to and greater than, respectively. I'm sure that a certain two of those are frequently confused by newcomers.
The Enumerable#sort_by method performs the now-infamous Schwartzian transform by simply asking you to generate values which can be used for sorting. In the above case, we really just want to use a string's hash for sorting, so let's return the hash to sort_by, which can do the rest of the work for us.
    ["here", "are", "test", "strings"].sort_by { |a| a.hash }
    #=> ["test", "strings", "here", "are"]
Much more compact. Much more efficient. Nothing fancy to remember.
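For the curious, the transform can be written out by hand as decorate, sort, undecorate. The sketch below uses word length as the sort key (chosen only because its values are predictable) and counts how often sort_by runs its block.

```ruby
words = ["banana", "fig", "kiwi", "apple"]

# Decorate each word with its sort key, sort the pairs, then undecorate.
manual = words.map { |w| [w.length, w] }.sort.map { |pair| pair[1] }

# sort_by does the same bookkeeping; count how often the key block runs.
calls = 0
sorted = words.sort_by { |w| calls += 1; w.length }

puts manual.inspect
puts sorted.inspect
puts calls
```

The key block runs once per element, where the comparison block handed to sort ran for both sides of every comparison.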
Harry Ohlsen has contributed a neat bit of code, demonstrating how sort_by can be used to sort objects elegantly. This code is so simple and readable that I had to include it here for your affections.
Assuming the Person class introduced above in the initialize_copy section:
    persons = [
      Person.new( 'Roger Andies', 'IBM', '456-101-2345' ),
      Person.new( 'Bill Bobson', "Carl's Jr.", '608-121-0001' ),
      Person.new( 'Bill Bobson', 'The Mews at Windsor Heights', '466-404-1200' ),
      Person.new( 'Harvey Winston', 'ARUP Labs', '707-255-1212' )
    ]
We can then sort this Array of Objects by providing sort_by with a list of the properties to sort by, in order of precedence:
    persons.sort_by { |p| [p.name, p.company, p.created_at] }
Amazingly simple! The above code will return a list of Person objects, sorted first by name, then by company, then by creation date. So in the case of dueling Bill Bobsons, the Bill Bobson with the alphabetically earlier company name will prevail.
The any? method checks an Enumerable to see if any of its values can meet a comparison. This comparison is contained within a block.
To see if any members of an Array meet a regular expression:
    ["Mr. Janus", "Mr. Telly", "Ms. Walters"].any? { |x| x =~ /^Ms\./ }
    #=> true
The all? method checks an Enumerable to see if all of its values can meet a comparison. Like any?, the comparison is expressed by a block.
To see if all members of an Array meet a regular expression:
    ["Mr. Janus", "Mr. Telly", "Ms. Walters"].all? { |x| x =~ /^Ms\./ }
    #=> false
Enumerable#partition is a great method for sorting data. You can almost think of this as an expansion of Array#reject which returns separate Arrays for both the accepted and rejected data.
For example, in my YAML testing suite, I execute about 150 tests and results are returned to me in the form of an Array of Hashes. Each hash has a success key, which indicates the tests that pass and fail.
    # Load my test results
    tests = YAML::load( `ruby yts.rb` )
    # Separate tests into successes and fails
    success, fail = tests.partition { |t| t['success'] }
I now have a list of all successful and failing tests. This is exactly the code that I'll be using to generate HTML results for my tests.
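Since yts.rb isn't something you can run at home, here is the same split on a hand-written Array of Hashes. The test names are invented; only the success key mirrors the structure described above.

```ruby
# Stand-in for the results that would come back from the test suite.
tests = [
  { 'name' => 'spec_basic',  'success' => true  },
  { 'name' => 'spec_anchor', 'success' => false },
  { 'name' => 'spec_folded', 'success' => true  }
]

# Elements for which the block is true land in the first Array,
# everything else lands in the second.
passed, failed = tests.partition { |t| t['success'] }

passed_names = passed.map { |t| t['name'] }
failed_names = failed.map { |t| t['name'] }

puts passed_names.inspect
puts failed_names.inspect
```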
The new transpose method basically reverses the dimensions of a two-dimensional Array. Given an Array a1 and its transposed counterpart Array a2: a1[0][1] becomes a2[1][0], a1[1][0] becomes a2[0][1] and a1[0][0] is a2[0][0].
    # A simple two-dimensional array
    [[1,2,3],[3,4,5]].transpose
    #=> [[1, 3], [2, 4], [3, 5]]
    # A more complex three-dimensional array
    [ [[1, 2, 3], [:a, :b, :c]],
      [[4, 5, 6], [:d, :e, :f]],
      [[7, 8, 9], [:g, :h, :i]] ].transpose
    #=> [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    #    [[:a, :b, :c], [:d, :e, :f], [:g, :h, :i]]]
Often Arrays are merged horizontally with means such as Array#concat. The concat method appends one Array onto another:
    [1, 2].concat( [3, 4] )
    #=> [1, 2, 3, 4]
With zip, the Arrays are merged side-by-side. This can be thought of as merging Arrays vertically, to give a new Array with an added dimension.
    [1, 2].zip( [3, 4] )
    #=> [[1, 3], [2, 4]]
Think of two packages of cookies. Dark and light cookies, each in cylindrical plastic wrappers. The cookies are taken out and stacked next to each other on a counter top. This way, if someone is getting ready for a party, they could easily remove the top two cookies (one light, one dark) and place them on a plate.
The zip method is handy for placing Arrays side-by-side in a stack, so sections of these Arrays can be handled together.
Let's say we've developed a machine, a Ruby-powered robotic maid, which can sort through our milk and cookies and create snack plates for us at night. Here's the program we'll execute to give her the complete list:
    milk = [:milk1, :milk2, :milk3]
    light = [:light1, :light2, :light3]
    dark = [:dark1, :dark2, :dark3]
    milk.zip( light, dark )
    #=> [[:milk1, :light1, :dark1],
    #    [:milk2, :light2, :dark2],
    #    [:milk3, :light3, :dark3]]
At last snack time is free of the bureaucracy and organizational turmoil that has plagued it for years!
The Hash#merge method allows you to update a Hash, but merge returns a new Hash rather than modifying the original. This is great for inheriting pairs from several Hashes.
Say we want to set up a few Hashes with some defaults and create a new Hash with the overriding values from an incoming Hash:
    def make_point_hash( point )
      center = { :x => 1, :y => 2 }
      big = { :r => 10 }
      center.merge( big ).merge( point )
    end
    make_point_hash( { :x => 20 } )
    #=> {:y=>2, :r=>10, :x=>20}
Previously, this was done with hsh.dup.update. The Hash#merge! method is a preferable alias for Hash#update, indicating the destructive nature of an update.
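The difference between the two is easiest to see side by side. A short sketch, with made-up defaults:

```ruby
defaults = { :x => 1, :y => 2 }
incoming = { :x => 20 }

# merge builds a new Hash; defaults is left alone.
merged = defaults.merge(incoming)
x_after_merge = defaults[:x]

# merge! (like update) writes into defaults itself.
defaults.merge!(incoming)
x_after_bang = defaults[:x]

puts merged[:x]
puts x_after_merge
puts x_after_bang
```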
Ranges are an odd object really. An object that represents many objects. A Range stands in place of a broad set of numbers so they don't all have to be present for roll call.
The Range#step method adds a lot of extra functionality to the Range class. I venture to say that it will become one of the most highly used Range methods in the world!
In Ruby 1.6, we had stepping with Integers:
    0.step(360, 45) {|angle| puts angle }
Can you tell which of the above parameters is the limit and which is the step? Sure, it's not too hard. You might say let's start at zero and step to three-sixty with a stride of forty-five. The readability is slightly hindered by the method call coming between the 0 and the 360.
Try Range#step now:
    (0..360).step(45) {|angle| puts angle }
Which reads from zero to three-sixty let's take steps of forty-five. This is a small change, but it certainly helps to give increased purpose to our core classes.
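As a quick check of what actually gets yielded, the sketch below collects the values instead of printing them. The second loop is an extra illustration, not from the example above: an exclusive range (three dots) leaves out the end point.

```ruby
# Inclusive range: both 0 and 360 are visited.
angles = []
(0..360).step(45) { |angle| angles << angle }

# Exclusive range: 360 itself is never yielded.
quarters = []
(0...360).step(90) { |angle| quarters << angle }

puts angles.inspect
puts quarters.inspect
```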
Ruby has excellent support for regular expressions, but we're still working on giving Ruby its own angle on them. Matches from a string are returned as MatchData objects, which can be read as an Array.
    text = "name: Jen"
    matches = /^(\w+): (\w+)$/.match( text )
    # matches[0] = "name: Jen", matches[1] = "name", matches[2] = "Jen"
With enough regular expressions in your code, you might tire of keeping track of the index for each regular expression group. This RCR mandated MatchData#captures, which returns an array of the captured groups from a match.
    text = "name: Jen"
    matches = /^(\w+): (\w+)$/.match( text )
    if matches
      key, value = matches.captures
      # key = "name", value = "Jen"
    end
Frankly, Ruby's Regexp support rules. For example, you can pass a Regexp into a String as if it were an Array index. Ruby will check for a match.
    "cat"[/c/]
    #=> "c"
    "cat"[/z/]
    #=> nil
With Ruby 1.8.0, you can pass in an optional second argument which will return the content of the nth matching group. How nice!
    re_phone = /(\d{3})-(\d{3})-(\d{4})/
    "986-235-1001"[re_phone, 2]
    #=> "235"
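Putting captures and the indexed form together on one made-up input string:

```ruby
re_phone = /(\d{3})-(\d{3})-(\d{4})/
text = "call 986-235-1001 today"

# captures skips the whole match and returns only the groups.
area, exchange, line = re_phone.match(text).captures

# Without a second argument, String#[] returns the whole match.
whole = text[re_phone]

# With one, it returns that numbered group, or nil when nothing matches.
last4 = text[re_phone, 3]
missing = "no digits here"[re_phone, 1]

puts area, exchange, line
puts whole
puts last4
puts missing.inspect
```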
For those who use the sprintf or String#% syntax, you can now print an object inspection with the %p parameter.
    hsh = {'x'=>1, 'y'=>1}
    puts "Hash is: %p." % hsh
    #=> Hash is: {"x"=>1, "y"=>1}.
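The contrast with %s is worth a quick sketch: %s formats with to_s, %p with inspect. The arguments are wrapped in an Array here so that each one lines up with exactly one format field.

```ruby
name = "cat"

plain     = "%s" % [name]   # to_s:    no quotes
inspected = "%p" % [name]   # inspect: quoted, like Kernel#p would show it

# %p mixes freely with the other format fields.
mixed = "%p has %d items" % [[1, nil, :a], 3]

puts plain
puts inspected
puts mixed
```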
Matz has started to open up the Ruby standard library to include support for XML, XML-RPC, SOAP, YAML, OpenSSL, unit testing, distributed computing, and much more. These libraries allow Ruby to provide a great deal of functionality out-of-the-box. These libraries are also guaranteed a long life and greater support.
I would like to emphasize that the decision to include these libraries in the core distribution is my favorite part of Ruby 1.8.0. We are getting closer to providing a complete toolkit for application development. We still offer fewer libraries than other scripting languages, but these libraries are of incredible quality and utility.
I'm going to go through a few of these libraries, giving some sample code and pointers to where documentation can be had.
REXML is an XML library of the highest order. It is simple to use, full of features, faithful to Ruby's ideals and quite swift. Many who have been frustrated by the design of other XML libraries find complete satisfaction in REXML. Allow me a short demonstration.
One of REXML's greatest features is its XPath support. Let's suppose we have an XML document stored in a string, such as:
    xmlstr = <<EOF
    <mydoc>
      <someelement attribute="nanoo">Text, text, text</someelement>
    </mydoc>
    EOF
Now, let's load the above document into a REXML::Document object:
    require 'rexml/document'
    xmldoc = REXML::Document.new xmlstr
If we want to access the text in the /mydoc/someelement node, we can simply access the elements property with an XPath string between square brackets:
    xmldoc.elements['/mydoc/someelement'].text
    #=> "Text, text, text"
Attributes can be accessed via the REXML::Attributes object:
    xmldoc.elements['/mydoc/someelement'].attributes['attribute']
    #=> "nanoo"
You can do a surprising amount with knowledge of just the above techniques. I will leave you with one other before I hand you off to REXML's documentation.
The REXML::Elements#each method is useful for cycling through a set of matching XML nodes. Supposing we wanted to cycle through all someelement nodes:
    xmldoc.elements.each('/mydoc/someelement') do |ele|
      puts ele.text
    end
REXML also has APIs for creating XML, stream parsing, event-based (SAX) parsing, entity processing, and a wealth of encodings.
For more information, I would suggest studying in the following order:
YAML is a simple, readable language for storing data. Ruby 1.8.0 introduces native support for loading and generating YAML.
A simple Array of Strings can be represented in YAML:
    - bicycle
    - car
    - scooter
Hashes as well:
    title: Ruby in a Nutshell
    author: Yukihiro Matsumoto
    publisher: O'Reilly and Associates
These are simple examples, though. YAML can handle a wide variety of Ruby objects and maintain its readability. YAML is a great solution for configuration files, ad hoc protocols and serializing data between other scripting languages.
Incidentally, I am personally responsible for development of this particular library. The C code that powers Ruby's YAML support is called Syck and extensions which use the same parser are available for Python and PHP. This shared parser and emitter helps guarantee that data objects are interpreted the same by each extension.
Loading YAML documents is extremely simple:
    require 'yaml'
    obj = YAML::load( File.open( 'books.yml' ) )
If books.yml contains a hash, then a Ruby Hash will be returned. If the document contains a list, then a Ruby Array will be returned.
To turn the object back into YAML:
    require 'yaml'
    File.open( 'books.yml', 'w' ) do |f|
      f << obj.to_yaml
    end
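To try the round trip without a books.yml on disk, the sketch below loads the two documents shown earlier from inline strings instead of a file. Nothing here goes beyond YAML.load and to_yaml.

```ruby
require 'yaml'

# A YAML list comes back as an Array...
list = YAML.load("- bicycle\n- car\n- scooter\n")

# ...and a YAML mapping comes back as a Hash.
book = YAML.load("title: Ruby in a Nutshell\nauthor: Yukihiro Matsumoto\n")

# Emitting and reloading should give back an equal Hash.
round_trip = YAML.load(book.to_yaml)

puts list.inspect
puts book['author']
puts round_trip == book
```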
Try it sometime in IRb. Instead of using Kernel::p to inspect your objects, try Kernel::y:
    require 'yaml'
    a = { 'time' => Time.now, 'symbol' => :Test, 'number' => 12.0 }
    y a
    # ---
    # number: 12.0
    # symbol: !ruby/sym Test
    # time: 2003-08-04 21:08:37.430417 -06:00
To learn YAML, I would suggest studying the following documents in order:
WEBrick is a socket server toolkit now included with Ruby 1.8.0. The library has been in development for several years and has long been a boon to Ruby developers.
Here's a basic example web server:
    require 'webrick'
    s = WEBrick::HTTPServer.new(
      :Port => 2000,
      :DocumentRoot => Dir::pwd + "/htdocs"
    )
    trap( "INT" ) { s.shutdown }
    s.start
As you can see, WEBrick is a snap. In some simple benchmarks, I've found its file-serving to be comparable to Apache 1.3. WEBrick is a threading server, so it can handle a number of concurrent connections.
My favorite part of WEBrick is its pluggable architecture. You basically map (or mount) services to specific namespaces within a given server. Any request issued under that URI namespace is passed to the plugin.
Take this SOAP server as an example:
    require 'soaplet'
    srv = SOAP::WEBrickSOAPlet.new
    s.mount( "/soap", srv )
Now all requests sent to http://localhost:2000/soap/ will be processed by the SOAPlet.
There isn't much English documentation on either WEBrick or its compatriots, so you might have to dig through source code to accomplish more complicated endeavors.
Honestly, my favorite part of developing in C is dynamic linking. It's so neat to load a shared object and shake hands with it and say, "Hey, there little guy. Welcome to the program." And the great thing about Ruby/DL is that you can do it all from Ruby.
I'll give you just a few examples and then refer you to ext/dl/doc/dl.txt in the Ruby 1.8.0 distribution, which documents much of what this module can do.
In this example, we're going to interface with the curl shared library. You'll be amazed how simple it is. One of the ways Ruby/DL could really benefit the Ruby community is by allowing developers to write Ruby extensions without writing them in C. Here we've got libcurl.so and we're going to interface with it directly.
    require 'dl/import'
    module Curl
      extend DL::Importable
      dlload "/usr/local/lib/libcurl.so"
      extern "char *curl_version()"
    end
    puts Curl.curl_version
We'll start with an easy one. Retrieving the version number. Libcurl has a curl_version() call, which returns a string. All we have to do is provide Ruby/DL with the function prototype and we're set!
I'm encapsulating the Curl API in a module called Curl. The extend DL::Importable introduces a number of methods for interfacing with the DLL. I load the shared object with dlload. Then, provide the prototype (minus the semicolon) to extern. Now the Curl module has a method called curl_version which can be used to retrieve the version!
Things start to get more complicated when dealing with C structures. But Ruby/DL has ways of handling structs, callbacks and even pointers!
Let's try adding a method for the curl_version_info(), which returns a struct.
    require 'dl/struct'
    module Curl
      VersionInfoData = struct [
        "int age",
        "char *version",
        "uint version_num",
        "char *host",
        "int features",
        "char *ssl_version",
        "long ssl_version_num",
        "char *libz_version",
        "char **protocols"
      ]
      extern "void *curl_version_info(int)"
    end
    ver = Curl::VersionInfoData.new( Curl.curl_version_info( 0 ) )
    puts "Curl version: " + ver.version
    puts "Built on " + ver.host.to_s
    puts "Libz version: " + ver.libz_version.to_s
In many cases, you may not even be modifying the struct that is returned from a DL call. Above we're casting a void pointer to a VersionInfoData struct with the new method. If you don't need to get into the nitty-gritty, then don't bother. Simply have the function prototype returning a void pointer and pass the return of a call into other calls.
I will also show you a simple demonstration in pointer math with Ruby/DL. The above VersionInfoData contains a list of supported protocols in a char pointer-pointer. This is an array of strings, ended with a NULL pointer. We'll use DL::sizeof to retrieve the size of a character pointer and loop until we hit NULL.
    puts "Supported protocols:"
    (0..100).step( DL::sizeof('s') ) do |offset|
      protocol = ( ver.protocols + offset ).ptr
      break unless protocol
      puts ".. #{protocol}"
    end
Now that's nifty. Pointer math can be done against the DL::PtrData class!
Now, if you want some great examples, head over to the Ruby/DL site. They've got a concise libxslt sample, a GTK+ sample, and a bunch of Win32API samples. I dare someone to write a whole extension in this. Seriously, I will send that person a free bathrobe.
The StringIO class hardly needs an explanation. Long have Ruby developers bundled this class with their packages. By allowing you to read and write to a String like an IO object, the StringIO class keeps you from having to treat Strings and Files like separate creatures. Instead, require the StringIO class and handle everything with each, readlines, seek and all of your other favorite IO methods.
    require 'stringio'
    s = StringIO.new( <<EOY )
    .. string to read from here ..
    EOY
    s2 = StringIO.new
    s.readlines.each do |line|
      # Very basic stripping of HTML tags
      line.gsub!( /<[^>]+>/, '' )
      s2.write( line )
    end
StringIO can be especially helpful to those who are writing parsers (which includes the abundance of you who are writing templating engines!). Remember that, like other IO classes, StringIO keeps line number (StringIO#lineno) and character position (StringIO#pos) data, which is essential for error-reporting.
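A tiny sketch of that bookkeeping, on a made-up three-line input. The error message format is hypothetical; lineno and pos are the only StringIO features being shown.

```ruby
require 'stringio'

io = StringIO.new("first\nsecond\nthird\n")

io.gets          # consumes "first\n"
line = io.gets   # consumes "second\n"

# After two reads, lineno counts the lines read and pos the bytes consumed.
lineno = io.lineno
pos = io.pos

message = "parse error near line #{lineno}, offset #{pos}"
puts message
```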
Also, many developers were using a pure Ruby version of StringIO for Ruby 1.6.8. In Ruby 1.8.0, StringIO is a C extension.
I have been dying for this library to show up in the standard library. Most of you don't know it, but you can quit using Net::HTTP and Net::FTP. The open-uri library is a Ruby equivalent to wget or curl (the library mentioned earlier).
Basically, this module allows you to use the basic open method with URLs.
    require 'open-uri'
    require 'yaml'
    feed = open( "http://www.whytheluckystiff.net/why.yml" ) do |f|
      YAML::load( f )
    end
The above script loads my YAML news feed into the feed variable. The block gets passed a StringIO object (as previously discussed), which can be read by the YAML module. Ain't it lovely to see everything working together so nicely?
Seems too simple? No way. You have plenty of control over the sending of headers:
open("http://www.ruby-lang.org/en/", "User-Agent" => "Ruby/#{RUBY_VERSION}", "From" => "foo@bar.invalid", "Referer" => "http://www.ruby-lang.org/") {|f| ... }
And there is plenty of metadata mixed in to the response:
open("http://www.ruby-lang.org/en") {|f|
  f.each_line {|line| p line}
  p f.base_uri         # URI::HTTP http://www.ruby-lang.org/en/
  p f.content_type     # "text/html"
  p f.charset          # "iso-8859-1"
  p f.content_encoding # []
  p f.last_modified    # Thu Dec 05 02:45:02 UTC 2002
}
I used a variant of this module in RAAInstall. It was essential. We could pull down all sorts of URLs from RAA and just pass them on to open-uri without worry. Saved quite a bit of time.
Every object in Ruby has an inspect method, which allows the contents of an object to be readably displayed at any time. A common way to inspect objects is to use the Kernel#p method, which prints an inspection of a Ruby object:
>> p Hash['mouse', 0.4, 'horse', 12.3]
{"horse"=>12.3, "mouse"=>0.4}
Above, the contents of a Hash are displayed. Strings are surrounded by quotes. Numbers and dates are formatted simply.
Unfortunately, complicated class structures can still be difficult to read at times. Here's a YAML news feed which, when printed with Kernel#p, becomes a mess of Arrays and Hashes, wrapped to fit my terminal:
=> {"modified"=>Wed Feb 05 12:29:29 UTC 2003, "language"=>"en-us", "title"=>"My First Weblog", "issued"=>Wed Feb 05 12:29:29 UTC 2003 , "author"=>{"name"=>"John Doe", "url"=>"/johndoe/", "email"=>"joh n.doe@example.com"}, "contributors"=>[{"name"=>"Bob Smith", "url"= >"/bobsmith/", "email"=>"bob.smith@example.com"}], "subtitle"=>"Ai n't the Interweb great?", "created"=>Wed Feb 05 12:29:29 UTC 2003, "link"=>"/johndoe/weblog/", "entries"=>[{"modified"=>Wed Feb 05 12 :29:29 UTC 2003, "title"=>"My First Entry", "issued"=>Wed Feb 05 1 2:29:29 UTC 2003, "id"=>"e34", "contributors"=>[{"name"=>"John Doe ", "role"=>"author", "url"=>"/johndoe/", "email"=>"john.doe@exampl e.com"}, {"name"=>"Bob Smith", "role"=>"graphical-artist", "url"=> "/bobsmith/", "email"=>"bob.smith@example.com"}], "summary"=>"A ve ry boring entry; just learning how to blog here...", "subtitle"=>" In which a newbie learns to blog...", "content"=>[{"lang"=>"en-us" , "data"=>"Hello, __weblog__ world! 2 < 4!\n"}, {"type"=>"text/html" , "lang"=>"en-us", "data"=>"<p>Hello, <em>weblog</em> world! 2 < ; 4!</p>\n"}, {"type"=>"image/gif", "lang"=>"en-us", "data"=>"GIF8 9a\f\000\f\000\204\000\000\377\377\367\365\365"}], "created"=>Wed Feb 05 12:29:29 UTC 2003, "link"=>"/weblog/archive/45.html"}], "ba se"=>"http://example.com"}
The PrettyPrint module (pp) is a severe enhancement to the traditional inspect technique. The goal of the module is to enhance readability by throwing in some conservative whitespace to indicate hierarchy and perform wrapping of longer content.
The same document printed with pp:
>> require 'pp' => true >> pp YAML::load( File.open( 'pie.yml' ) ) {"modified"=>Wed Feb 05 12:29:29 UTC 2003, "language"=>"en-us", "title"=>"My First Weblog", "issued"=>Wed Feb 05 12:29:29 UTC 2003, "author"=> {"name"=>"John Doe", "url"=>"/johndoe/", "email"=>"john.doe@example.com"}, "contributors"=> [{"name"=>"Bob Smith", "url"=>"/bobsmith/", "email"=>"bob.smith@example.com"}], "subtitle"=>"Ain't the Interweb great?", "created"=>Wed Feb 05 12:29:29 UTC 2003, "link"=>"/johndoe/weblog/", "entries"=> [{"modified"=>Wed Feb 05 12:29:29 UTC 2003, "title"=>"My First Entry", "issued"=>Wed Feb 05 12:29:29 UTC 2003, "id"=>"e34", "contributors"=> [{"name"=>"John Doe", "role"=>"author", "url"=>"/johndoe/", "email"=>"john.doe@example.com"}, {"name"=>"Bob Smith", "role"=>"graphical-artist", "url"=>"/bobsmith/", "email"=>"bob.smith@example.com"}], "summary"=>"A very boring entry; just learning how to blog here...", "subtitle"=>"In which a newbie learns to blog...", "content"=> [{"lang"=>"en-us", "data"=>"Hello, __weblog__ world! 2 < 4!\n"}, {"type"=>"text/html", "lang"=>"en-us", "data"=>"<p>Hello, <em>weblog</em> world! 2 < 4!</p>\n"}, {"type"=>"image/gif", "lang"=>"en-us", "data"=>"GIF89a\f\000\f\000\204\000\000\377\377\367\365\365"}], "created"=>Wed Feb 05 12:29:29 UTC 2003, "link"=>"/weblog/archive/45.html"}], "base"=>"http://example.com"}
In Hashes, pp attempts to keep keys and values on the same line. But if longer content is found, the value is placed on a new line. This sort of layout is possible through a flexible PP class used for string construction.
Whereas Ruby's inspect receives no arguments, your custom pretty_print method will receive an instance of the PP class, which allows you to give cues as to where to group or allow breaks in content. Here's an example from the Array#pretty_print method included with PP:
class Array
  def pretty_print(pp)
    pp.group(1, '[', ']') {
      self.each {|v|
        pp.comma_breakable unless pp.first?
        pp.pp v
      }
    }
  end
end
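The same cues work for your own classes. Below is a rough sketch, assuming a made-up Point class; the group, breakable, comma_breakable, text and pp calls are the ones PP provides, while the class and the parameter name q are only for illustration.

```ruby
require 'pp'

# Hypothetical class, used only to demonstrate a custom pretty_print.
class Point
  def initialize( x, y )
    @x, @y = x, y
  end

  def pretty_print( q )
    # Everything inside the group stays on one line if it fits;
    # otherwise every breakable in the group becomes a newline.
    q.group( 1, '#<Point', '>' ) {
      q.breakable
      q.text 'x='
      q.pp @x
      q.comma_breakable
      q.text 'y='
      q.pp @y
    }
  end
end

# PP.pp writes to any object that supports <<, so a String works as a target.
wide   = PP.pp( Point.new( 1, 2 ), ''.dup )
narrow = PP.pp( Point.new( 1, 2 ), ''.dup, 5 )   # force wrapping with a tiny width

puts wide
puts narrow
```

With the default width the whole Point fits on one line. With the tiny width the group is broken up, so the same method yields a multi-line layout without any extra code.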
I think this is a great model for building strings. This same approach could be used effectively to build HTML, XML, or Textile from data structures. A very similar technique is used by Ruby's YAML emitter.
In fact, I'll also mention that if you've loaded the YAML module, you can use Kernel#y to print structures in YAML:
>> require 'yaml'
=> true
>> y YAML::load( File.open( 'pie.yml' ) )
---
modified: 2003-02-05 12:29:29.000000 Z
language: en-us
title: My First Weblog
issued: 2003-02-05 12:29:29.000000 Z
author:
  name: John Doe
  url: "/johndoe/"
  email: john.doe@example.com
# .. etc ..
Ruby works quite well across platforms. I'm always surprised at how flawlessly my Ruby code runs from one platform to the next. Ten years ago cross-platform apps were an absolute joke! But now it's another pleasant reality for scripters.
The un library takes advantage of Ruby's cross-platform support to provide the common UNIX commands for all Ruby users. If you're on Windows, rather than installing Cygwin or MinGW, you can use UNIX commands through your Ruby 1.8.0 installation.
To execute from the commandline, type ruby -run -e, then the command, then --, and finish with the options to the command.
Here is a list of un's included commands:
ruby -run -e cp -- [OPTION] SOURCE DEST
ruby -run -e ln -- [OPTION] TARGET LINK_NAME
ruby -run -e mv -- [OPTION] SOURCE DEST
ruby -run -e rm -- [OPTION] FILE
ruby -run -e mkdir -- [OPTION] DIRS
ruby -run -e rmdir -- [OPTION] DIRS
ruby -run -e install -- [OPTION] SOURCE DEST
ruby -run -e chmod -- [OPTION] OCTAL-MODE FILE
ruby -run -e touch -- [OPTION] FILE
You could provide aliases in your environment for these to emulate the UNIX commands. Or you could use these to build cross-platform Makefiles or simple .bat/.sh scripts.
The neatest thing about un is how it works, though. Let's dissect the commandline.
The first section (ruby -run) simply starts the Ruby interpreter and requires the un library: -run is just the -r switch with un as its argument. Then, the -e cp option indicates that we want to execute the cp method (which is mixed in from the un library). The double-dash (--) indicates that all further options will be sent to ARGV (and hence to un). So, un simply reads the rest of the line from ARGV. Pretty clever!
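You can play the same trick with a library of your own. Here's a rough sketch, assuming a made-up file named shout.rb sitting in the current directory; the file name and the shout method are hypothetical and have nothing to do with un itself.

```ruby
# shout.rb (hypothetical): a top-level method that takes its arguments
# from ARGV, in the same spirit as the commands described above.
def shout( argv = ARGV )
  line = argv.map { |word| word.upcase }.join( ' ' )
  puts line
  line
end

# From the commandline, this would be invoked as:
#   ruby -I. -rshout -e shout -- hello world
# Here -rshout requires the file, -e shout calls the method, and
# everything after the double-dash lands in ARGV.
shout( %w[hello world] )
```

The default argument is only there so the method can also be called directly with a list of words, as on the last line.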
by why the lucky stiff
august 18, 2004