hoodwink.d enhanced


UTF-8 Plugin for Rails, Fine for Ruby 1.8

by why in inspect

Here’s the story thus far. Ruby 1.8 has no Unicode support (except in Regexps), but it is forthcoming and Matz has stated his intentions. In the meantime, there’s been quiet work to scrape up Unicode-aware String classes.

Today, Manfred Stienstra has lobbed a bunch of details on using the new UTF-8 encoding plugin for Rails. You can certainly use the String class extensions on their own in plain, non-Rails Ruby code, too.

By creating this plugin we haven’t resolved all our problems. One of the biggest problems is that we can only process UTF-8 encoded strings. [...] Sure, there are solutions like iconv to re-encode this data, but life would be a lot simpler if we didn’t have to think about this.
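For readers who haven’t done the re-encoding dance the quote mentions, here is a minimal sketch. Note the swap: the quote refers to iconv (on Ruby 1.8, roughly `Iconv.conv("UTF-8", "ISO-8859-1", str)`), while this sketch uses `String#encode` from Ruby 1.9 and later, so it won’t run on 1.8. The Latin-1 input is an assumed example.

```ruby
# Assumed input: "café" as ISO-8859-1 (Latin-1) bytes from some legacy source.
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

# Re-encode to UTF-8 (Ruby 1.9+; on 1.8 this job fell to the Iconv library).
utf8 = latin1.encode(Encoding::UTF_8)

puts latin1.bytesize        # 4: one byte per character in Latin-1
puts utf8.bytesize          # 5: "é" takes two bytes in UTF-8
puts utf8 == "caf\u00e9"    # true
```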

This plugin by Julian Tarkhanov does require the Unicode library.

said on 16 Jan 2006 at 16:53

I can’t wait to see the day.

said on 17 Jan 2006 at 02:26

Isn’t the String class extension’s capitalize method wrong? Its first line is “byte_capitalize unless utf8_pragma?”, but shouldn’t it be “return byte_capitalize unless utf8_pragma?”?

said on 17 Jan 2006 at 06:37

And the difference is?

said on 17 Jan 2006 at 07:44

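The difference is that with the return, the method hands back byte_capitalize’s result whenever utf8_pragma? is false, and only reaches Unicode::capitalize(Unicode::normalize_KC(self)) when it is true. Without the return, the byte_capitalize result is thrown away and the Unicode version is always returned.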
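The control-flow difference is easy to check in plain Ruby. This is a minimal sketch, not the plugin’s string_overrides.rb: `utf8_pragma?`, `byte_capitalize` and `unicode_capitalize` below are hypothetical stand-ins (the real method calls `Unicode::capitalize(Unicode::normalize_KC(self))`, which needs the Unicode library).

```ruby
# Hypothetical stand-ins, NOT the plugin's real helpers: just enough
# to show how the two versions of capitalize flow.
def utf8_pragma?
  false # pretend the UTF-8 pragma is off
end

def byte_capitalize(s)
  "byte: #{s.capitalize}"
end

def unicode_capitalize(s)
  "unicode: #{s.capitalize}"
end

# Buggy: the modifier-unless line is evaluated, but its value is
# discarded because it is not the last expression in the method.
def buggy_capitalize(s)
  byte_capitalize(s) unless utf8_pragma?
  unicode_capitalize(s)
end

# Fixed: return early with the byte-oriented result when the pragma is off.
def fixed_capitalize(s)
  return byte_capitalize(s) unless utf8_pragma?
  unicode_capitalize(s)
end

puts buggy_capitalize("hello") # prints: unicode: Hello
puts fixed_capitalize("hello") # prints: byte: Hello
```

A modifier `unless` is just an expression; without `return`, its value is dropped and execution falls through to the next line.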

said on 17 Jan 2006 at 08:16

No, Scritch is right. That is a bug in string_overrides.rb.

I will send a patch to Julian.

said on 19 Jan 2006 at 04:47

Like I said elsewhere, there are problems with this plugin. In short: some code always expects byte-oriented String#slice, #count, etc. (file handling, network packet processing, database adapters, and so on). If a string happens to contain valid UTF-8 data, such code will fail.
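To make that failure mode concrete, here is a minimal sketch in modern Ruby. It uses the 1.9+ encoding API (`String#b`, `#byteslice`, `#force_encoding`), so it is not the plugin’s code and won’t run on 1.8. The length-prefixed wire format and the `frame`, `parse_bytes` and `parse_chars` helpers are hypothetical. `parse_chars` is the same loop as `parse_bytes`, except that `String#[]` and `#length` count characters, which is what the comment above describes happening once the plugin sees valid UTF-8.

```ruby
# Hypothetical wire format: each message is a 4-byte big-endian
# length (counted in BYTES) followed by the payload.
def frame(msg)
  [msg.bytesize].pack("N") + msg.b
end

# Byte-oriented parser: offsets and lengths are byte counts throughout.
def parse_bytes(stream)
  msgs = []
  pos = 0
  while pos < stream.bytesize
    len = stream.byteslice(pos, 4).unpack("N").first
    msgs << stream.byteslice(pos + 4, len)
    pos += 4 + len
  end
  msgs
end

# Same loop, but String#[] and #length count characters, mimicking
# content-dependent overrides applied to valid UTF-8 data.
def parse_chars(stream)
  msgs = []
  pos = 0
  while pos < stream.length
    len = stream[pos, 4].unpack("N").first
    msgs << stream[pos + 4, len]
    pos += 4 + len
  end
  msgs
end

# These bytes happen to be valid UTF-8, so they get treated as text.
stream = (frame("caf\u00e9") + frame("ok")).force_encoding(Encoding::UTF_8)

p parse_bytes(stream) # both payloads come back intact
p parse_chars(stream) # first payload gains a trailing NUL, second shrinks to "k"
```

Because "é" is two bytes but one character, every offset after it drifts, and the second length header is read from the wrong position.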

It seems that at least WEBrick contains some such code, because I had problems running Rails under WEBrick with this plugin.

It is very dangerous to make String methods behave differently depending on a string’s content.
