Reading and writing
RSS and Atom
with Perl
Ian Malpass, October 2005
What is RSS?
What does it stand for?
It depends on which version you're talking about.
- RDF Site Summary, or
- Rich Site Summary, or
- Really Simple Syndication
It doesn't really matter, since they all try to do the same thing.
RSS is an XML-based file format designed to provide a machine-readable representation or summary of a site's content.
It is often used to distribute, or syndicate, content to other sites or readers.
What's it like?
An RSS feed typically contains:
- A section of metadata about the source of the content, e.g.
  - The URL of the originating site
  - A description of the content
  - The name and/or email address of the publisher or author
- A stack of content items.
  - Usually fixed length, and first-in, first-out.
  - Items have some or all of a title, description, and link.
  - Authors often put all their content in the description, rather than just a summary.
  - Can also have per-item metadata.
An RSS bestiary
RSS has a bit of a tangled history, which has led to a bunch of different RSS versions and "standards".
There are two broad categories - those based on RDF/XML, and those that aren't.
| RDF/XML | Non-RDF/XML |
|---|---|
| RSS 0.9: The original RSS, "RDF Site Summary", developed by Netscape. It was based on an early draft of the RDF specification, and is not actually valid according to the final published standard. | RSS 0.91: A revised and simplified version of RSS 0.9 released by Netscape. The first RSS to actually get serious use. "RSS" stood for "Rich Site Summary". |
| RSS 1.0: Developed by the RSS-DEV mailing list after Netscape lost interest in RSS, RSS 1.0 is valid RDF/XML. It is designed to be modular, with a small core of tags that can then be extended using RDF vocabularies (most commonly the Dublin Core metadata vocabulary). | RSS 0.92: A revised RSS 0.91, released by Dave Winer of UserLand Software at about the same time as RSS 1.0. 0.92, like 0.91, has a fixed collection of tags for describing your content and metadata. |
| | RSS 2.0: A major revision and expansion of RSS 0.92. Winer called it "Really Simple Syndication". |
Writing RSS and Atom feeds
RSS 1.0
#!/usr/bin/perl
use warnings;
use strict;
use XML::RSS;
my $rss = XML::RSS->new( version => '1.0' );
$rss->channel(
    title => 'Example RSS 1.0 Feed',
    link => 'http://www.example.org/',
    dc => {
        publisher => 'Ian Malpass',
        date => '2005-10-11T21:46:00Z'
    }
);
$rss->add_item(
    title => 'Hello world',
    link => 'http://www.example.org/hello.html',
    description => 'Greetings to you, world!',
    dc => {
        date => '2005-10-11T21:46:00Z'
    }
);
$rss->save( 'example-1.xml' );
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
>
    <channel rdf:about="http://www.example.org/">
        <title>Example RSS 1.0 Feed</title>
        <link>http://www.example.org/</link>
        <dc:date>2005-10-11T21:46:00Z</dc:date>
        <dc:publisher>Ian Malpass</dc:publisher>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://www.example.org/hello.html" />
            </rdf:Seq>
        </items>
    </channel>
    <item rdf:about="http://www.example.org/hello.html">
        <title>Hello world</title>
        <link>http://www.example.org/hello.html</link>
        <description>Greetings to you, world!</description>
        <dc:date>2005-10-11T21:46:00Z</dc:date>
    </item>
</rdf:RDF>
RSS 2.0
#!/usr/bin/perl
use warnings;
use strict;
use XML::RSS;
my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title => 'Example RSS 2.0 Feed',
    link => 'http://www.example.org/',
    lastBuildDate => 'Tue, 11 Oct 2005 21:46:00 GMT',
    managingEditor => 'Ian Malpass'
);
$rss->add_item(
    title => 'Hello world',
    link => 'http://www.example.org/hello.html',
    description => 'Greetings to you, world!',
    pubDate => 'Tue, 11 Oct 2005 21:46:00 GMT'
);
$rss->save( 'example-2.xml' );
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:blogChannel="http://backend.userland.com/blogChannelModule">
    <channel>
        <title>Example RSS 2.0 Feed</title>
        <link>http://www.example.org/</link>
        <description></description>
        <lastBuildDate>Tue, 11 Oct 2005 21:46:00 GMT</lastBuildDate>
        <managingEditor>Ian Malpass</managingEditor>
        <item>
            <title>Hello world</title>
            <link>http://www.example.org/hello.html</link>
            <description>Greetings to you, world!</description>
            <pubDate>Tue, 11 Oct 2005 21:46:00 GMT</pubDate>
        </item>
    </channel>
</rss>
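The RSS 1.0 and RSS 2.0 examples carry the same timestamp in two different styles: dc:date uses the W3CDTF (ISO 8601) form, while pubDate and lastBuildDate use RFC 822 dates. The following is a minimal sketch of producing both from one epoch time using only core Perl. The function names w3cdtf_date and rfc822_date are made up for this illustration and are not part of XML::RSS.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# English names are hard-coded so the output does not depend on the locale.
my @DAYS   = qw( Sun Mon Tue Wed Thu Fri Sat );
my @MONTHS = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );

# W3CDTF timestamp in UTC, the style used for dc:date in the RSS 1.0 example.
# (Hypothetical helper name.)
sub w3cdtf_date {
    my ($epoch) = @_;
    my ( $sec, $min, $hour, $mday, $mon, $year ) = gmtime($epoch);
    return sprintf '%04d-%02d-%02dT%02d:%02d:%02dZ',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
}

# RFC 822 style timestamp in GMT, the style used for pubDate and
# lastBuildDate in the RSS 2.0 example. (Hypothetical helper name.)
sub rfc822_date {
    my ($epoch) = @_;
    my ( $sec, $min, $hour, $mday, $mon, $year, $wday ) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

my $now = time;
print "dc:date: ", w3cdtf_date($now), "\n";
print "pubDate: ", rfc822_date($now), "\n";
```

Either string can then be passed to the channel and add_item calls shown in the examples, depending on which version of RSS is being written.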
Atom
#!/usr/bin/perl
use warnings;
use strict;
use XML::Atom::SimpleFeed;
my $atom = XML::Atom::SimpleFeed->new(
    title => 'Example Atom Feed',
    link => 'http://www.example.com/',
    modified => '2005-10-11T21:46:00Z',
    author => {
        name => 'Ian Malpass'
    }
);
$atom->add_entry(
    title => 'Hello world',
    link => 'http://www.example.com/hello.html',
    content => 'Greetings to you, world!',
    modified => '2005-10-11T21:46:00Z',
);
open ATOM, '>example-atom.xml'
    or die "Can't write to example-atom.xml: $!";
$atom->save_file( \*ATOM );
<feed version="0.3"
      xmlns="http://purl.org/atom/ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
    <author>
        <name>Ian Malpass</name>
    </author>
    <entry>
        <id>tag:www.example.com,2005-10-12:%2Fhello.html</id>
        <author></author>
        <content mode="escaped"
                 type="text/html">Greetings to you, world!</content>
        <issued>2005-10-12T05:43:32Z</issued>
        <link href="http://www.example.com/hello.html"
              rel="alternate"
              type="text/html" />
        <modified>2005-10-11T21:46:00Z</modified>
        <title mode="escaped"
               type="text/html">Hello world</title>
    </entry>
    <generator url="http://search.cpan.org/dist/XML-Atom-SimpleFeed"
               version="0.7">XML::Atom::SimpleFeed</generator>
    <link href="http://www.example.com/"
          rel="alternate"
          type="text/html" />
    <modified>2005-10-11T21:46:00Z</modified>
    <title mode="escaped"
           type="text/html">Example Atom Feed</title>
</feed>
Getting RSS feeds
#!/usr/bin/perl
use warnings;
use strict;
use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(
    GET => 'http://www.indecorous.com/perl/rss/example-1.xml'
);
# First request: unconditional, so the server sends the whole feed.
print "Requesting...\n";
my $response = $ua->request( $request );
print " Status: ", $response->status_line, "\n";
print " Last modified: ", $response->header( 'last-modified' ), "\n";
print " Etag: ", $response->header( 'etag' ), "\n\n";
# Second request: send the validators back, so the server can answer
# 304 Not Modified instead of resending an unchanged feed.
$request->header( 'If-Modified-Since', $response->header( 'last-modified' ) );
$request->header( 'If-None-Match', $response->header( 'etag' ) );
print "Requesting again...\n";
$response = $ua->request( $request );
print " Status: ", $response->status_line, "\n";
Requesting...
 Status: 200 OK
 Last modified: Wed, 12 Oct 2005 22:15:58 GMT
 Etag: "22ee24-387-434d8b1e"

Requesting again...
 Status: 304 Not Modified
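In practice the two requests usually happen in separate runs, so the Last-Modified and ETag values have to be stored between fetches, and a server may send only one of them or neither. The following is a rough sketch of that pattern, not a definitive implementation. The function names conditional_headers and fetch_if_changed, and the shape of the saved record (a hash with last_modified and etag keys), are assumptions made for this illustration. LWP::UserAgent is loaded only when a fetch is actually made.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Build conditional request headers from the validators saved after an
# earlier fetch. $saved is a hypothetical cache record of the form
# { last_modified => ..., etag => ... }; missing values are skipped.
sub conditional_headers {
    my ($saved) = @_;
    my %headers;
    $headers{'If-Modified-Since'} = $saved->{last_modified}
        if defined $saved->{last_modified};
    $headers{'If-None-Match'} = $saved->{etag}
        if defined $saved->{etag};
    return %headers;
}

# Fetch $url and return its content, or nothing if the server answered
# 304 Not Modified. Updates $saved with the new validators on success.
sub fetch_if_changed {
    my ( $url, $saved ) = @_;
    require LWP::UserAgent;
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get( $url, conditional_headers($saved) );
    return if $response->code == 304;
    die "Fetch failed: ", $response->status_line, "\n"
        unless $response->is_success;
    $saved->{last_modified} = $response->header('Last-Modified');
    $saved->{etag}          = $response->header('ETag');
    return $response->decoded_content;
}

# Show the headers that would be sent for the validators seen above.
my %saved = (
    last_modified => 'Wed, 12 Oct 2005 22:15:58 GMT',
    etag          => '"22ee24-387-434d8b1e"',
);
my %headers = conditional_headers( \%saved );
print "$_: $headers{$_}\n" for sort keys %headers;
```

A caller would keep the saved record somewhere persistent, such as a file or database, and call fetch_if_changed( $url, \%saved ) on each polling run, reparsing the feed only when content comes back.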
Reading RSS and Atom feeds
XML::RSS
- Reads RSS 0.9, 0.91, 1.0, and 2.0
- You have to know what the data you want is called in each version, e.g. "pubDate" in RSS 2.0 or "dc:date" in RSS 1.0.
- Dies unhelpfully when passed badly formed RSS.
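Both of the last two points can be worked around in calling code. The following is a minimal sketch, assuming XML::RSS is installed and that parsed items are plain hashes with RSS 2.0 elements at the top level and Dublin Core values under a dc key. The function names parse_feed and item_date are invented for this illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::RSS;

# Parse a feed held in a string. Returns the XML::RSS object, or undef
# (after a warning) if the parser died on badly formed input.
sub parse_feed {
    my ($xml) = @_;
    my $rss = XML::RSS->new;
    my $ok  = eval { $rss->parse($xml); 1 };
    if ( !$ok ) {
        warn "Could not parse feed: $@";
        return undef;
    }
    return $rss;
}

# Return an item's date from wherever this version of RSS keeps it:
# pubDate in RSS 2.0, dc:date in RSS 1.0. Returns undef if neither is set.
sub item_date {
    my ($item) = @_;
    return $item->{pubDate} if defined $item->{pubDate};
    return $item->{dc}{date} if ref( $item->{dc} ) eq 'HASH';
    return undef;
}

my $xml = <<'END_XML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Example RSS 2.0 Feed</title>
<link>http://www.example.org/</link>
<description></description>
<item>
<title>Hello world</title>
<link>http://www.example.org/hello.html</link>
<pubDate>Tue, 11 Oct 2005 21:46:00 GMT</pubDate>
</item>
</channel>
</rss>
END_XML

if ( my $rss = parse_feed($xml) ) {
    for my $item ( @{ $rss->{items} } ) {
        my $date = item_date($item);
        print $item->{title}, ": ",
            ( defined $date ? $date : 'no date' ), "\n";
    }
}
```

Wrapping the parse in eval turns a fatal error into a warning, so one bad feed does not stop a script that processes many.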
XML::RAI
- Uses XML::RSS::Parser
- Reads all flavours of RSS, and also Atom.
- Tries to cope gracefully with badly formed RSS.
- Provides an abstraction interface, which attempts to allow you to access data from all the different formats using a single nomenclature.
Sample parser
#!/usr/bin/perl
use warnings;
use strict;
use XML::RAI;
use Data::Dumper;
my $rai = XML::RAI->parsefile( $ARGV[0] );
my $channel = $rai->channel;
print "Channel:\n";
print " Title: " . $channel->title . "\n";
print " Link: " . $channel->link . "\n";
print " Modified: " . $channel->modified . "\n";
print " Publisher: " . $channel->publisher . "\n";
for ( @{$rai->items} ) {
    print "Item:\n";
    print " Title: " . $_->title . "\n";
    print " Link: " . $_->link . "\n";
    print " Description: " . $_->description . "\n";
    print " Created: " . $_->created . "\n";
}
RSS 1.0
Channel:
 Title: Example RSS 1.0 Feed
 Link: http://www.example.org/
 Modified: 2005-10-11T21:46:00+0000
 Publisher: Ian Malpass
Item:
 Title: Hello world
 Link: http://www.example.org/hello.html
 Description: Greetings to you, world!
 Created: 2005-10-11T21:46:00+0000
RSS 2.0
Channel:
 Title: Example RSS 2.0 Feed
 Link: http://www.example.org/
 Modified: 2005-10-11T21:46:00+0000
 Publisher: Ian Malpass
Item:
 Title: Hello world
 Link: http://www.example.org/hello.html
 Description: Greetings to you, world!
 Created: 2005-10-11T21:46:00+0000
Atom
Channel:
Can't call method "query" on an undefined value at /guest2/ian/perl/lib//XML/RAI/Object.pm line 38
