Christoffer's Hpricot Goodies
So, in what ways have you guys extended Hpricot? I really enjoy this collection of accessories to Hpricot by Christoffer Sawicki, who also wrote the Hpricot-based HTML-to-feed library called Feedalizer.
He has one script that does gsub! on all text nodes in the document. Another script is for generating tables of contents from the headers on an HTML page. I imagine that would go great with Markdown and Textile. (See also: del.icio.us/tag/hpricot.)
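For a rough idea of what table-of-contents generation involves, here is a minimal sketch in plain Ruby. It is not Christoffer's code and does not use Hpricot: the `headers_to_ul` name and the `[level, text]` input format are assumptions made for illustration, standing in for whatever you would pull out of the h1 to h6 elements of a page. The text is assumed to be HTML-escaped already.

```ruby
# Hypothetical helper, not part of Hpricot or Hpricot Goodies.
# Input: [level, text] pairs in document order, e.g. [1, "Intro"] for an <h1>.
# Output: a nested <ul> string. Text is assumed to be HTML-escaped already.
def headers_to_ul(headers)
  html = +""
  depth = 0
  headers.each do |level, text|
    # Never descend more than one level at a time (an h1 followed by an h3
    # is treated as if it were an h2), so the nesting stays well-formed.
    level = [level, depth + 1].min
    if level > depth
      # Going one level deeper: open a new list inside the current item.
      html << "<ul>"
    else
      # Same level or shallower: close the previous item and any deeper lists.
      html << "</li>"
      html << "</ul></li>" * (depth - level)
    end
    html << "<li>#{text}"
    depth = level
  end
  # Close whatever is still open.
  html << "</li></ul>" * depth
  html
end
```

Calling `headers_to_ul([[1, "Intro"], [2, "Setup"], [2, "Usage"], [1, "End"]])` returns `<ul><li>Intro<ul><li>Setup</li><li>Usage</li></ul></li><li>End</li></ul>`.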
dela med noll
Yeah thanks goes to us for letting him share it :))
But they are nice little trinkets of code, they are. Props to Qerub, aka Christoffer!
UnderpantsGnome
I have an HTML Scrubber based on Hpricot, but I’m currently working on redoing it so that scrub is part of Hpricot instead of being a separate class.
Thanks for making it so easy.
Andrew
Does Hpricot support OpenSearch types of XML? I am having a difficult time parsing the following -
title
Also, how do I deal with the
Any help..
Andrew
The tags didn’t show up.. let’s try again.. Nope, seems like I can’t provide an example here.. Anyway, OpenSearch tags have a ”:” in their tag name, like “opensearch:title”.
Also, how do I deal with single XML tags like “br /” or “link /”?
Thanks for your help..
G.Lindqvist
Nice one Qerub, Hpricot Goodies is really useful. Thanks! ;)
Qerub
Heh. Thanks for the publicity, but more important: thanks again for Hpricot!
Yes, HTML Outliner is being used to generate tables of contents for articles. I should probably bundle some code that takes the HTMLOutliner#outline tree and returns a multidimensional <ul> that is ready to be used.
why
UnderpantsGnome: That would be a great addition to the main lib. I actually really like the strip methods you’ve made. What other plans do you have?
Andrew: Send me some XML. Hpricot doesn’t have problems parsing namespaces; however, its XPath syntax doesn’t support namespaces, since it’s a hybrid of CSS and XPath.
UnderpantsGnome
why: I was/am basically just moving the block from HtmlScrubber, less the config, into my Hpricot additions. Mostly because then I could call it Hpricot::Scrub and that made me laugh.
Then you could do:
I haven’t had any new needs for this, though I was considering making the config more like Perl’s HTML::Scrubber, where you can specify global attributes to allow/deny but also specify attributes to allow/deny at the tag level.
Other than that I’m open to suggestions.
Thanks again for making this so easy to accomplish, I so didn’t want to rewrite HTML::Scrubber from scratch.
UnderpantsGnome
why: I was playing around with Hpricot Scrub and it seems to have gotten unhappy since 0.4.86 (the last working version). I also have an image sneaking through that I don’t think should be. I have the current changes with a “test” that shows the failure on recent gems and the stray image on <= 0.4.86.
You can grab it here if you’d like to take a look: hpricot_scrub.zip
As usual feedback welcome.
why
Well, it looks like scrub’s use of traverse_all_element is the source of the problem. If you remove stuff while it’s traversing, things end up getting skipped. You know what I’m saying?
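The skipping is easy to reproduce without Hpricot at all. The sketch below uses a plain Ruby array of tag names as a stand-in for a list of child nodes; the method names and the `banned` list are made up for illustration and are not part of Hpricot or the scrubber. The same index-shifting effect is presumably what bites inside a traversal that removes elements as it goes.

```ruby
# Plain-array stand-in for a node list; strings play the role of elements.
# Both method names are hypothetical, for illustration only.

# Buggy: deleting from the array while each is walking it shifts the
# remaining items left, so the item right after a deleted one is skipped.
def remove_while_iterating(nodes, banned)
  nodes.each do |node|
    nodes.delete(node) if banned.include?(node)
  end
  nodes
end

# Safer: first collect everything that should go, then remove it in a
# second pass, so the traversal never sees a list that is changing.
def collect_then_remove(nodes, banned)
  doomed = nodes.select { |node| banned.include?(node) }
  doomed.each { |node| nodes.delete(node) }
  nodes
end
```

With `%w[p script img b]` as the nodes and `%w[script img]` as the banned list, `remove_while_iterating` returns `["p", "img", "b"]` (the img right after the removed script is never visited), while `collect_then_remove` returns `["p", "b"]`.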
UnderpantsGnome
Oh, duh… this works better, unless you see a potential issue with this that I’m missing.