I’m currently writing a monster application doing indexing of feeds from blogs, del.icio.us, Flickr, Digg etc. It’s been a bit of a learning curve, but using the excellent and comprehensive Rome library has made it a great deal easier than it might otherwise have been. Rome handles Atom feeds as well as the various confusing RSS variants.
Making sense of feeds is often tricky, as different publishers can use various tags in different ways, or even add their own by introducing custom namespaces. The use of namespaces (which Rome supports through a plugin system) is a promising way of adding custom information, but not without its problems.
Having completed indexing of Flickr, del.icio.us and plain vanilla feeds, I turned to Digg feeds. Imagine my surprise to find out that not only is their advertised namespace (http://digg.com/docs/diggrss/) in fact offline, but they don’t even use it properly in their main RSS feeds.
http://digg.com/rss/containerscience.xml correctly specifies xmlns:digg=”http://digg.com/docs/diggrss/”, but http://digg.com/rss/index.xml specifies xmlns:digg=”docs/diggrss/”. And that just happens to be their main RSS feed. I know it seems niggling to complain, but it messes with the Module I’ve written to handle their tags and seems a tad careless. Oh well, guess I’ll just skip that feed.
More information about Rome, Digg etc:
- Why does Digg not follow standard RSS? (blog post)
- Rome Wiki
- ROME in a Day: Parse and Publish Feeds in Java, excellent article about Rome and introductory techniques over on XML.com
- del.icio.us/garuda/flowofthings is a collection of links I’ve collected, with plenty of feed and blog programming stuff for Java.




