Downes.ca ~ Serialized Feeds

Serialized Feeds

Feb 05, 2009
By Stephen Downes

Originally posted on Half an Hour, February 5, 2009.

A serialized feed is one in which posts are arranged in a linear order and where subscribers always begin with the first post, no matter when they subscribe to the feed. This contrasts with an ordinary RSS feed, in which a subscriber will begin with today's post, no matter when the feed started.

The idea of serialized feeds has been around for a while. This page from 2005, for example, allows you to read Cory Doctorow's novel Someone comes to Town, Someone leaves Town by RSS. And Russell Beattie offers serialized books via his Mobdex serialized feeds system. In 2006, a company called FeedCycle offered what it called cyclic feeds. " For example, if you were to take Moby Dick and divide it into 100 parts, and publish them all in one huge RSS feed, that would be a cyclic RSS feed." Feed cycles, as they have come to be called, have also been used for podcasts. Tony Hirst has written about serialized feeds, demonstrating the concept with services like OpenLearn Daily,

There is no academic literature discussing the use of serialized feeds to support online learning, though the subject of paced online learning has been discussed. Anderson, Annand and Wark (2005) examine the question of pacing from the perspective of student interactions. "Increased peer interaction can boost participation and completion rates, and result in learning outcome gains in distance education courses." But the use of serialized feeds does not automatically increase interactions. It is also arguable that pacing itself improves learning outcomes.

Serialized Feeds: Basic Approach

A serialized feed is basically a personalized feed, because each person begins at a different time. Personalized we data is typically managed by CGI or some other server process which gathers relevant information about the user (such as the time he or she subscribed to the feed) and generating the resulting feed. This feed is then typically identified with a serial number, which is processed when the RSS feed is requested by an aggregator.

This approach, however, raises some concerns:

First, it creates a scalability issue. RSS feed readers typically access a web site once an hour. If a CGI process is run for each feed, then each user results in 24 CGI requests a day. Even if the frequency is scaled back, having large numbers of users can place a considerable load on server processing.

Second, it creates a coordination issue. If each feed is personalized then in order for interaction to occur there needs to be some mechanism created to identify users of relevantly similar feeds.

These problems were addressed by adopting a cohort system for serialized feeds. But first, some discussion on the structure of a serialized feed.

In order so simplify coding, the gRSShopper framework was used. This allowed courses to be constructed out of two basic elements: the page and the post.

The page corresponds to a given course. It consists of typical page elements, such as page title, content and, where appropriate, a file location, along with default templates and project information. Page content defined RSS header content. Pages are identified with a page ID number. The page also has a creation date, which establishes its start date, set by default to the exact time and date the page was created.

The post corresponds to an individual RSS feed item. While a person subscribed to an RSS feed as a whole (corresponding to a page), he or she receives individual posts as RSS posts over time. A course thus consists basically of a page and a series of posts. Posts are identified by post ID numbers. Posts are associated with pages with a thread value corresponding to the ID number of the page.

Serialized Feeds: Pacing

Pacing is managed through two basic elements.

First, each page defined an cohort number. This number establishes the size of the cohort, in days. Thus, is a page offset number is '7', then a new edition of the course will start every 7 days. In the gRSShopper serialized feeds system, a new, serialized, page is created for each cohort. This page is identified by (a) the ID number of the original master page, and (b) the offset from that page, in total number of days, from the start date of the master page. These serialized pages are stored as records in the database.

Second, each post is assigned an offset number. This number defines the number of days after the start of the course that the post is to appear in the RSS feed. For example, suppose the course starts March 10. Suppose the post has an offset number of 6. Then the post should appear in the RSS feed on March 16.

This creates everything we need to create a serialized feed. To begin, we have a master page and series of associated posts:

Page Master (time t days, cohort size c)

Post 1 (t+o days)
Post 2 (t+o days)
etc.

The page also has a set of serialized pages, created as needed, each corresponding to an individual cohort:

Page Master (time t days)

Serialized Page 1 (t+(cr*1)days)
Serialized Page 2 (t+(c*2)days)
etc.

Each serialized page has a start date d, which is t+(c*2)days, and by comparing the interval i between the current date and the start date, we can determine which post should be posted in its RSS feed - it will be the post or posts with an offset value of i.

Page Master (time t days)

Serialized Page 1 (t+(c*1)days)
- Post i1
Serialized Page 2 (t+(c*2)days)
- Post i2
etc.

Serialized Feeds: Processes

Processing to produce the serialized feed occurs in three stages:

First, the author creates a master or edited. This creates database records for the master page and for each of the posts associated with the master page.

Second, the script creates a series of pages for a given cohort. This occurs when a potential subscriber invokes the subscribe script. Essentially, the script creates the RSS feed content for each day the course runs. These are stored in the database and identified with a cohort number and a publish date.

Third, a nightly cron job prints the daily page for each cohort for each course. The idea here is that the script creates a static page that may be accessed any number of times without creating a CGI process. Static pages are stored in a standardized location: base directory/course ID number/cohort offset number

Subscribing to a serialized feed this becomes nothing more than a matter of pointing a browser to the appropriate page. For example, pointing the browser to http://course.downes.ca/course/127/17.xml allows the learner to subscribe to course number 127 (the Fallacies course), cohort number 17 (the cohort that started 17 days after the course was first created). This link is created and displayed by the subscribe script.

This process has several advantages. First, fixes the content of the course to what is currently defined when the student signs up to the course; the course may be edited for subsequent users without changing what was originally defined for previous users. Second, processing time is minimized and front-loaded, allowing the system to scale massively. Third, and most significantly, multiple users are served by the same RSS file. Not only does this save significantly on processing, it also sets up an environment where interaction may be facilitated.