The Rambler :: blog

Monday, April 19, 2004

Read Short Stories, Rip Schubert Songs, Realise Someone's Suspicions 

Scott, in the comments a couple of posts down, asks the pretty reasonable question: 'what the Dickens is RSS?' Well, here, in approximately non-techie language, I'll try to explain what it is, and why it's really cool.

First, a [sort of] definition. No one quite agrees what RSS stands for (it has been expanded as 'Rich Site Summary', 'RDF Site Summary' and 'Really Simple Syndication'), but the easiest definition is 'Really Simple Syndication'. XML, on the other hand, which in this context at least is pretty interchangeable with RSS as a term, definitely stands for 'eXtensible Markup Language'. XML is very, very handy for web designers and the like, as it's a much more powerful way of marking up text and other data than HTML. Crucially, it is much more systematic than HTML: if you code HTML wrong (forget to close an italics tag, say), your browser can still read the webpage, even if it looks wrong. If you code XML wrong, it simply doesn't work, because an XML parser is required to reject a document that isn't properly formed rather than guess what you meant. It is absolutely strict, and as a result can be reliably read by machines.
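
To make the strictness point concrete, here's a minimal sketch in Python (my choice of language; nothing in this post depends on it) using only the standard library. It feeds the same unclosed italics tag to Python's strict XML parser and to its forgiving HTML parser. The names `xml_accepts`, `TagLogger` and `html_events` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The classic mistake: an italics tag that is opened but never closed.
BROKEN = "<p>This is <i>important</p>"
FIXED = "<p>This is <i>important</i></p>"


def xml_accepts(text: str) -> bool:
    """Return True if the strict XML parser accepts the text, False if it refuses."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Mismatched or unclosed tags are a fatal error for an XML parser.
        return False


class TagLogger(HTMLParser):
    """A lenient HTML parser that just records the tags it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append("start " + tag)

    def handle_endtag(self, tag):
        self.events.append("end " + tag)


def html_events(text: str) -> list[str]:
    """Parse the text as HTML; unclosed tags don't raise, parsing just carries on."""
    logger = TagLogger()
    logger.feed(text)
    logger.close()
    return logger.events


if __name__ == "__main__":
    print(xml_accepts(BROKEN))  # False: the XML parser gives up
    print(xml_accepts(FIXED))   # True
    print(html_events(BROKEN))  # ['start p', 'start i', 'end p']: HTML shrugs and carries on
```

The HTML parser never complains about the missing `</i>`; it's the browser's job to make a best guess at how that should look, which is exactly the leniency XML refuses to offer.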

If you want to know what XML looks like, click here for the XML version of the Google weblog. It looks horrid, but thankfully producing all this stuff is fully automated if you're on Movable Type/TypePad, or if you switch on the Site Feed option in Blogger. The point is that for every post you publish, a little bit of XML code is published in parallel, building up to a big page like the one for the Google weblog, and each post in an RSS feed is described using the same basic standard format (the details vary a little between RSS versions):

<item>
  <title>This is the title of the post (if you don't use titles, the first few words of your post end up here)</title>
  <link>This is the permalink for that post or article</link>
  <description>This is optional, but is usually either a summary, or the complete post text</description>
</item>
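
Roughly what your blogging tool does behind the scenes can be sketched like this. It's a minimal Python illustration, not how Movable Type or Blogger actually do it: `make_item` and `make_feed` are hypothetical names, the example URLs are made up, and cutting an untitled post down to its first six words is my own assumption about the 'first few words' rule. The outer `rss`/`channel` wrapper follows RSS 2.0, where a channel needs its own title, link and description.

```python
import xml.etree.ElementTree as ET


def make_item(link: str, body: str, title: str = "", words: int = 6) -> ET.Element:
    """Build one <item> for one post (hypothetical helper)."""
    if not title:
        # Assumption: untitled posts borrow their first few words as a title.
        title = " ".join(body.split()[:words]) + "..."
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Here the description carries the full post text; a summary would also do.
    ET.SubElement(item, "description").text = body
    return item


def make_feed(title: str, link: str, description: str, items: list) -> ET.Element:
    """Wrap a list of <item> elements in an RSS 2.0 <rss><channel> shell."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    channel.extend(items)
    return rss


if __name__ == "__main__":
    posts = [
        make_item("http://example.com/2004/04/rss.html",
                  "Scott asks what the Dickens is RSS, so here goes.",
                  title="What the Dickens is RSS?"),
        make_item("http://example.com/2004/04/schubert.html",
                  "Ripping some Schubert songs this weekend & loving it."),
    ]
    feed = make_feed("Example blog", "http://example.com/", "A made-up blog", posts)
    # ElementTree escapes awkward characters such as & for us.
    print(ET.tostring(feed, encoding="unicode"))
```

The second post has no title, so it gets "Ripping some Schubert songs this weekend..." instead, and the `&` in its text comes out as `&amp;` so the XML stays valid.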

The tags aren't formatting instructions, like <i> or <b> in HTML; they are simply a way of defining what's between them as a certain kind of thing (a link, a title, a piece of descriptive text, or the whole item itself). So, in addition to the nicely formatted blog that people can read, you have a very pure, abstract version that machines can read and format in any way you can imagine. For example, the first two boxes in my left-hand column are automatically generated from XML published by other people: the first is from the Daypop Top 40 most popular links, the second is from the music category of the del.icio.us communal bookmarks page, but they could be generated from any site publishing an XML version of itself. This is why the system is also called RSS: it is a really simple way of syndicating out your site. Once you know this is what people are doing, you can see it everywhere; most sites with updated news headlines, or 'most popular sites at the moment', or share prices, or whatever, are probably drawing on an RSS feed somewhere. You get free, dynamic (changing) content for your site, and someone else is doing the work for you.
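
Here's a sketch of how a sidebar box like mine could be generated from someone else's feed: parse the XML, pull out each item's title and link, and write them out as an HTML list. It's Python, and it makes a few assumptions: the feed is a made-up RSS 2.0 sample held in a string (in real life you'd download it first, e.g. with `urllib.request`), `headlines` and `render_sidebar` are hypothetical names, and RSS 1.0 feeds, which put their items in an XML namespace, would need extra handling.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up RSS 2.0 feed standing in for someone else's site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Fish &amp; chips</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml: str, limit: int = 10) -> list[tuple[str, str]]:
    """Return up to `limit` (title, link) pairs from an un-namespaced RSS feed."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        pairs.append((title, link))
    return pairs[:limit]


def render_sidebar(feed_xml: str, limit: int = 10) -> str:
    """Turn the feed's headlines into a simple HTML list for a sidebar box."""
    lines = ["<ul>"]
    for title, link in headlines(feed_xml, limit):
        # Escape text so characters like & and < can't break the page.
        lines.append(f'  <li><a href="{escape(link)}">{escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_sidebar(SAMPLE_FEED))
```

The nice thing is that the sidebar code never needs to know anything about how the other site looks; it only relies on the `item`, `title` and `link` tags being where the format says they'll be.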

Which is what makes it so useful for webpage owners. What makes it useful for webpage readers is a whole bunch of services such as the excellent Bloglines. With Bloglines, and other newsreaders, you simply 'subscribe' to any XML feed you find and like, and when you log on to Bloglines, you get a page listing all the blogs or news sites you subscribe to, with an indication of how many new posts have appeared on each since you last checked. From there you can read the summaries of each post, and if you like the look of one, click through to the blog post itself. (The jpg Matt provides is a really useful way to see what an RSS reader looks like. If you go here you can see a Bloglines screenshot: the left-hand pane lists the blogs you subscribe to, and the right-hand pane shows what a blog post looks like when read in Bloglines.) So instead of spending all morning clicking around every blog you read, only to find that half of them haven't updated, simply keep Bloglines running on your desktop, and it will tell you when they've been updated. If, as I do, you have a feed from a newspaper (the Telegraph are by far the most organised of the British papers on this score, it has to be said), you can scan through an entire day's headlines in minutes, and it gets updated regularly throughout the day; so if you're on the ball and want to, you get to read stuff pretty much as it's published. It's a bit like blog radio.
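
The 'how many new posts since you last checked' trick can be sketched as a bit of bookkeeping: remember which links you've already read in each feed, and count anything you haven't seen. This is a guess at the general idea in Python, not how Bloglines actually works internally. The `Reader` class is hypothetical, identifying posts by their permalink is an assumption (RSS 2.0 also offers a `guid` element for this), and the feeds are assumed to have already been fetched and parsed into (title, link) pairs.

```python
class Reader:
    """A toy newsreader that tracks read/unread posts per subscription (hypothetical)."""

    def __init__(self) -> None:
        # feed name -> set of permalinks already read
        self.seen: dict[str, set[str]] = {}

    def subscribe(self, name: str) -> None:
        self.seen.setdefault(name, set())

    def unread(self, name: str, items: list[tuple[str, str]]) -> list[tuple[str, str]]:
        """Return the (title, link) pairs in `items` not yet marked as read."""
        # Assumption: a post is identified by its permalink.
        return [(title, link) for title, link in items if link not in self.seen[name]]

    def mark_read(self, name: str, items: list[tuple[str, str]]) -> None:
        self.seen[name].update(link for _, link in items)

    def summary(self, latest: dict[str, list[tuple[str, str]]]) -> dict[str, int]:
        """Count new posts per subscribed feed, given each feed's current items."""
        return {name: len(self.unread(name, latest.get(name, []))) for name in self.seen}


if __name__ == "__main__":
    reader = Reader()
    reader.subscribe("Some blog")
    reader.subscribe("A newspaper")

    monday = {
        "Some blog": [("Hello", "http://blog.example.com/1")],
        "A newspaper": [],
    }
    print(reader.summary(monday))  # {'Some blog': 1, 'A newspaper': 0}
    reader.mark_read("Some blog", monday["Some blog"])

    tuesday = {
        "Some blog": [("Hello", "http://blog.example.com/1"),
                      ("Another post", "http://blog.example.com/2")],
        "A newspaper": [("Headline", "http://news.example.com/a")],
    }
    print(reader.summary(tuesday))  # {'Some blog': 1, 'A newspaper': 1}
```

Because the feed always lists recent posts, the reader doesn't need the blog to say what's new; it just compares today's list against what it remembers you've already read.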

You don't need to understand XML to publish an RSS feed - as I said, the two big blog services provide this automatically, although it's not necessarily switched on for Blogger - but it does help to appreciate the difference between the HTML version of your site (with pretty formatting and all), and the XML version (with its absolutely strict, logical markup code). For those of us who are closet code junkies, it is really satisfying seeing all those tags and attributes working hard.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. All non-proprietary code is valid XHTML.