hoodwink.d enhanced

RedHanded

HTML Filtering For RedCloth

by why in bits

This isn’t a patch for RedCloth; it’s a method you can use to filter out HTML in general. But it works nicely with RedCloth output. I use it for the comments on this site. It’s the best solution I can think of for world-writable files.

 class String
     ## Dictionary describing allowable HTML
     ## tags and attributes.
     BASIC_TAGS = {
         'a' => ['href', 'title'],
         'img' => ['src', 'alt', 'title'],
         'br' => [],
         'i' => nil,
         'u' => nil,
         'b' => nil,
         'pre' => nil,
         'kbd' => nil,
         'code' => ['lang'],
         'cite' => nil,
         'strong' => nil,
         'em' => nil,
         'ins' => nil,
         'sup' => nil,
         'sub' => nil,
         'del' => nil,
         'table' => nil,
         'tr' => nil,
         'td' => nil,
         'th' => nil,
         'ol' => nil,
         'ul' => nil,
         'li' => nil,
         'p' => nil,
         'h1' => nil,
         'h2' => nil,
         'h3' => nil,
         'h4' => nil,
         'h5' => nil,
         'h6' => nil,
         'blockquote' => ['cite']
     }

     ## Method which cleans the String of HTML tags
     ## and attributes outside of the allowed list.
     def clean_html!( tags = BASIC_TAGS )
         gsub!( /<(\/*)(\w+)([^>]*)>/ ) do
             raw = $~
             tag = raw[2].downcase
             if tags.has_key? tag
                 pcs = [tag]
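                 tags[tag].each do |prop|
                     ## Try a double-quoted, then single-quoted, then bare value.
                     ['"', "'", ''].each do |q|
                         q2 = ( q != '' ? q : '\s' )
                         if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
                             ## HTML has no backslash escapes, so a quote inside
                             ## the value must become &quot; or it ends the attribute.
                             pcs << "#{prop}=\"#{$1.gsub('"', '&quot;')}\""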
                             break
                         end
                     end
                 end if tags[tag]
                 "<#{raw[1]}#{pcs.join " "}>" 
             else
                 " " 
             end
         end
     end
 end

Be sure to use it after you convert your Textile to HTML, so that it also filters the tags RedCloth generates from Textile markup, such as links and images.

 comment = RedCloth.new( entry.comment ).to_html
 comment.clean_html!

I’d like to make RedCloth’s built-in filter allow this kind of customization. It may even be worthwhile to have it scan for allowed CSS within a style declaration. On a Wiki, it’s nice to allow people to come up with widths and floating directions and detailed colors, you know?
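For the style idea, here is a minimal sketch of a CSS whitelist, assuming a hypothetical `clean_style` helper and an `ALLOWED_CSS` list picked purely for illustration (neither is part of RedCloth). It splits a `style` value on semicolons, keeps only whitelisted property names, and rejects any value containing parentheses, quotes, colons, slashes or backslashes, which rules out `url(...)` and `expression(...)`.

```ruby
# Hypothetical whitelist of CSS properties a wiki might allow inside a
# style attribute. The list is an assumption chosen for this example.
ALLOWED_CSS = %w[width height float color background-color text-align].freeze

# Conservative value check: words, numbers with units, percentages and
# hex colors only. No parentheses, quotes, colons, slashes or backslashes.
SAFE_CSS_VALUE = /\A[#\w\s.%,-]+\z/

# Returns a cleaned copy of a style declaration, dropping every
# property/value pair that is not explicitly allowed.
def clean_style(style, allowed = ALLOWED_CSS)
  style.split(';').map do |decl|
    prop, value = decl.split(':', 2)
    next if value.nil?                 # no colon, not a declaration
    prop = prop.strip.downcase
    value = value.strip
    next unless allowed.include?(prop) && value.match?(SAFE_CSS_VALUE)
    "#{prop}: #{value}"
  end.compact.join('; ')
end

puts clean_style("width: 50%; float: left; background: url(javascript:alert(1)); color: #c00")
```

Wiring this into `clean_html!` would mean allowing `style` on some tags and passing its extracted value through `clean_style`; that part is left out here.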

What do you think?

said on 12 Jan 2005 at 07:24

I think it is vulnerable to this attack:

>
said on 12 Jan 2005 at 07:24

< plaintext>

said on 12 Jan 2005 at 07:31
said on 12 Jan 2005 at 07:32

said on 12 Jan 2005 at 07:33

Hm, holds up quite well. :)

said on 12 Jan 2005 at 07:34

I wonder what happens when I use newlines.
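A quick way to check the newline case: the tag pattern in `clean_html!` captures attributes with the negated class `[^>]*`, and in Ruby a negated character class also matches `"\n"`, so a tag split across lines is still captured whole. This sketch only rebuilds the bare tag name, as `clean_html!` does for tags mapped to `nil` such as `'b'`; `TAG_RE` is just a local name for illustration.

```ruby
# The tag pattern used by clean_html! (TAG_RE is only a local name here).
TAG_RE = /<(\/*)(\w+)([^>]*)>/

# A tag whose attributes start on a new line.
input = "<b\nonclick=\"alert(1)\">hi</b>"

m = input.match(TAG_RE)
p m[2] # prints "b"
p m[3] # the attribute text, including the leading newline

# Rebuild each tag from its name alone, as clean_html! does for tags
# mapped to nil, so the onclick attribute is dropped.
cleaned = input.gsub(TAG_RE) { "<#{$1}#{$2.downcase}>" }
puts cleaned # prints <b>hi</b>
```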

said on 12 Jan 2005 at 07:39
said on 12 Jan 2005 at 07:47

said on 12 Jan 2005 at 10:00

My wish for RedCloth would be that the output could be created using a visitor-style approach. This would make it a lot easier to create different output scripts like to_latex and to_docbook.
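A rough sketch of what a visitor-style approach could look like, assuming a hypothetical document tree (`Text`, `Emphasis`, `Paragraph`) and hypothetical `HtmlVisitor`/`LatexVisitor` classes; none of these exist in RedCloth. The parser would build the tree once, and each output format becomes one visitor class.

```ruby
# Hypothetical tree nodes; each one dispatches to the matching visitor method.
Text = Struct.new(:value) do
  def accept(visitor)
    visitor.visit_text(self)
  end
end

Emphasis = Struct.new(:children) do
  def accept(visitor)
    visitor.visit_emphasis(self)
  end
end

Paragraph = Struct.new(:children) do
  def accept(visitor)
    visitor.visit_paragraph(self)
  end
end

# HTML output: text is escaped, structure becomes tags.
class HtmlVisitor
  ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

  def render(nodes)
    nodes.map { |n| n.accept(self) }.join
  end

  def visit_text(node)
    node.value.gsub(/[&<>"]/, ESCAPES)
  end

  def visit_emphasis(node)
    "<em>#{render(node.children)}</em>"
  end

  def visit_paragraph(node)
    "<p>#{render(node.children)}</p>"
  end
end

# LaTeX output from the same tree (escaping of LaTeX specials omitted).
class LatexVisitor
  def render(nodes)
    nodes.map { |n| n.accept(self) }.join
  end

  def visit_text(node)
    node.value
  end

  def visit_emphasis(node)
    "\\emph{#{render(node.children)}}"
  end

  def visit_paragraph(node)
    "#{render(node.children)}\n\n"
  end
end

doc = [Paragraph.new([Text.new("Hello, "), Emphasis.new([Text.new("world")])])]
puts HtmlVisitor.new.render(doc)  # prints <p>Hello, <em>world</em></p>
puts LatexVisitor.new.render(doc)
```

Adding a DocBook target would then mean writing one more visitor rather than touching the parser.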

said on 12 Jan 2005 at 10:13

said on 12 Jan 2005 at 11:15

So what did you do to dodge the CDATA one? :)

Regarding mailto: and so on: I think it might be nice to have an additional URL filter. Maybe the attribute Array ought to be a Hash with limitations like this:

'a' => {'href' => 'secure-uri', 'title' => nil},
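One way the Hash-of-limitations idea could work, sketched with hypothetical names (`SAFE_URI`, `URI_FILTERS`, `TAG_RULES`, `attribute_allowed?`); only the `'secure-uri'` label comes from the comment. It whitelists schemes rather than blacklisting `javascript:`, so tricks like leading spaces or `javascript&#58;` fall through to "rejected".

```ruby
# Allowed URI forms: http(s) and mailto schemes, root-relative paths,
# and fragment links. Everything else is rejected.
SAFE_URI = %r{\A(?:(?:https?|mailto):|/|#)}i

# Named value filters that the rules below can refer to.
URI_FILTERS = {
  'secure-uri' => ->(value) { value.match?(SAFE_URI) }
}.freeze

# Per-tag attribute rules: nil means "any value", a String names a filter.
TAG_RULES = {
  'a'   => { 'href' => 'secure-uri', 'title' => nil },
  'img' => { 'src' => 'secure-uri', 'alt' => nil, 'title' => nil }
}.freeze

# True if +value+ may be kept for attribute +attr+ on +tag+.
def attribute_allowed?(tag, attr, value, rules = TAG_RULES)
  attrs = rules[tag]
  return false if attrs.nil? || !attrs.key?(attr)
  filter = attrs[attr]
  filter.nil? || URI_FILTERS.fetch(filter).call(value)
end

p attribute_allowed?('a', 'href', 'mailto:why@example.com') # prints true
p attribute_allowed?('a', 'href', 'javascript:alert(1)')    # prints false
```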
said on 12 Jan 2005 at 11:18

+1 on xal’s comment. Having a generic visitor interface would be cool.

said on 12 Jan 2005 at 11:32

mmm, good ideas people.

David Alan Black has a fledgling to_docbook method, which we are just starting to improve.

said on 13 Jan 2005 at 15:33
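Hi, I made a small modification to this code so that it keeps the end slashes of self-closing tags, in order for it to output valid XHTML.

 ## Uses String::BASIC_TAGS, the dictionary defined in class String above.
 def clean_html( text, tags = String::BASIC_TAGS )
     ## [^>]*? lets attribute values contain slashes (as URLs do), so such
     ## tags are still matched and filtered; (\s*\/?) captures an XHTML end
     ## slash, as in <br />, so it can be put back on the rebuilt tag.
     text.gsub!( /<(\/*)(\w+)([^>]*?)(\s*\/?)>/ ) do
         raw = $~
         tag = raw[2].downcase
         if tags.has_key? tag
             pcs = [tag]
             tags[tag].each do |prop|
                 ['"', "'", ''].each do |q|
                     q2 = ( q != '' ? q : '\s' )
                     if raw[3] =~ /#{prop}\s*=\s*#{q}([^#{q2}]+)#{q}/i
                         ## Quotes inside the value become &quot; (HTML has no backslash escapes).
                         pcs << "#{prop}=\"#{$1.gsub('"', '&quot;')}\""
                         break
                     end
                 end
             end if tags[tag]
             "<#{raw[1]}#{pcs.join " "}#{raw[4]}>"
         else
             " "
         end
     end
     text
 end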
said on 12 May 2005 at 13:24
Not Work Safe!]]>
said on 06 Jul 2005 at 23:42

'*' => nil (or just the hash internal default)

I don’t like how it uses so much explicitness of “blocking known tags”.
