I’m currently writing a monster application that indexes feeds from blogs, del.icio.us, Flickr, Digg, etc. It’s been a bit of a learning curve, but using the excellent and comprehensive Rome library has made it a great deal easier than it might otherwise have been. Rome handles Atom feeds as well as the various confusing RSS variants.

Making sense of feeds is often tricky, as different publishers can use various tags in different ways, or even add their own by introducing custom namespaces. The use of namespaces (which Rome supports through a plugin system) is a promising way of adding custom information, but not without its problems.
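To show what a custom namespace looks like in a feed, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than Rome's plugin system, and the feed snippet, the "ex" prefix, the URI http://example.org/ns/rating/ and the rating element are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the "ex" prefix, its URI and <ex:rating> are made up.
FEED = """<rss version="2.0" xmlns:ex="http://example.org/ns/rating/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <ex:rating>4</ex:rating>
    </item>
  </channel>
</rss>"""

# The namespace is identified by its URI string, not by the "ex" prefix.
EX_NS = "http://example.org/ns/rating/"


def read_ratings(xml_text):
    """Return (title, rating) pairs; rating is None when the custom tag is absent."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title")
        # ElementTree addresses namespaced tags as "{uri}localname".
        rating = item.findtext("{%s}rating" % EX_NS)
        results.append((title, rating))
    return results
```

The point of the sketch is that the parser matches on the URI string, so a publisher who changes or misspells that string effectively publishes a different set of tags.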

Having completed indexing of Flickr, del.icio.us and plain vanilla feeds, I turned to Digg feeds. Imagine my surprise to find out that not only is their advertised namespace (http://digg.com/docs/diggrss/) in fact offline, but they don’t even use it properly in their main RSS feeds.

http://digg.com/rss/containerscience.xml correctly specifies xmlns:digg="http://digg.com/docs/diggrss/", but http://digg.com/rss/index.xml specifies xmlns:digg="docs/diggrss/". And that just happens to be their main RSS feed. I know it seems niggling to complain, but it messes with the Module I’ve written to handle their tags and seems a tad careless. Oh well, guess I’ll just skip that feed.
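The mismatch matters because a namespace-aware parser treats the two URIs as unrelated names. The sketch below shows this with Python's standard xml.etree.ElementTree, not with Rome or my actual Module. The two URIs are the ones quoted above; the tiny feed, the diggCount element name and the helper functions are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DIGG_NS = "http://digg.com/docs/diggrss/"  # URI declared by the science feed
DIGG_NS_BROKEN = "docs/diggrss/"           # URI declared by the main feed


def make_feed(ns_uri):
    """Build a tiny stand-in feed; diggCount is an illustrative element name."""
    return (
        '<rss version="2.0" xmlns:digg="%s"><channel><item>'
        "<title>Story</title><digg:diggCount>42</digg:diggCount>"
        "</item></channel></rss>" % ns_uri
    )


def digg_count(xml_text, accepted=(DIGG_NS,)):
    """Return the first item's count, trying each accepted namespace URI in turn."""
    item = ET.fromstring(xml_text).find("./channel/item")
    for uri in accepted:
        value = item.findtext("{%s}diggCount" % uri)
        if value is not None:
            return int(value)
    return None  # the tag exists only under a URI we did not ask for
```

With the default arguments the count is found for the first URI and missed for the second. Passing accepted=(DIGG_NS, DIGG_NS_BROKEN) is one possible workaround if skipping the feed is not an option, at the cost of hard-coding a publisher's mistake.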



There are 2 comments to "Rome, feeds and un-Digg-able RSS".
1. toxi, November 6th, 2006 at 17:06

Hey Marius, XML namespaces don’t need to map to a working URI; they’re only required to be syntactically correct. Often, however (or maybe as a best practice), the URI points to some form of human-readable documentation of the namespace. For example:

http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul
http://xmlns.com/foaf/0.1/

Good luck & looking forward to seeing your project. Are you sure you really have *fully* indexed flickr & co? ;)

2. marius watz, November 6th, 2006 at 20:27

Hey toxi, I realize that, but it just seems stupid. Why make them look like URLs when they are actually not? Feedburner’s namespace URI (http://rssnamespace.org/feedburner/ext/1.0) goes to an empty page on a domain that’s even for sale. If the marketing department knew that, they’d surely be annoyed.

As of this moment, I have 6500 feeds and 95000 entries indexed, with 30000 unindexed feeds in the pipeline. The sheer network usage is staggering. The vast majority are blogs. My issue now is to make any sensible presentation of them by Friday…
