Hello everybody. It has been over a year since the last release of the Newsfeeds module, so it is about time for an update. This new version, 4.1, includes a couple of major enhancements over 4.0.
Caching
Since the days of Newsfeeds 3, the single most important challenge has been to reduce the number of calls the module makes to the source of the feed it displays; in other words, caching. Newsfeeds 3 left this to DNN, as any regular module can: DNN decides that a module doesn’t need to “run” and simply outputs the cached HTML. This had two notable drawbacks: (1) it bypassed the “intelligence” of the news source regarding caching, and (2) it caused a loading spike when the site had been asleep for a while. Regarding (1): any news source can state in its feed how long the feed should be cached. A feed from CNN may well have a shorter refresh time than one from, say, the DotNetNuke site. Regarding (2): if the site has gone to sleep and DNN decides all content should be refreshed, the module downloads its feed when the next user visits. The biggest issue was that the page would not load until the feed had arrived. This caused long delays for DNN sites using this module and made their page load times depend on the responsiveness of the feed sources.
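To illustrate point (1): RSS 2.0 lets a channel declare an optional <ttl> element, the number of minutes the feed may be cached before it should be refreshed. The sketch below is a hypothetical helper (FeedCachePolicy and GetExpiry are invented names, not the module’s code) that computes a cache expiry from that element, falling back to an assumed 60-minute default when the feed does not state one.

```vb
Imports System
Imports System.Xml.Linq

' Hypothetical helper, not part of the Newsfeeds module: works out when a
' cached RSS 2.0 feed expires, honouring the optional <channel><ttl> element.
Public Module FeedCachePolicy

    ' Assumed fallback for feeds that do not specify a ttl.
    Public ReadOnly DefaultTtl As TimeSpan = TimeSpan.FromMinutes(60)

    Public Function GetExpiry(ByVal feedXml As String, ByVal fetchedAt As DateTime) As DateTime
        Dim doc As XDocument = XDocument.Parse(feedXml)
        Dim channel As XElement = doc.Root.Element("channel")
        Dim ttl As TimeSpan = DefaultTtl
        If channel IsNot Nothing Then
            Dim ttlElement As XElement = channel.Element("ttl")
            Dim minutes As Integer
            ' <ttl> is a number of minutes; ignore missing or invalid values.
            If ttlElement IsNot Nothing AndAlso
               Integer.TryParse(ttlElement.Value.Trim(), minutes) AndAlso minutes > 0 Then
                ttl = TimeSpan.FromMinutes(minutes)
            End If
        End If
        Return fetchedAt.Add(ttl)
    End Function

End Module
```

A feed from a fast-moving news outlet can then declare, for example, <ttl>15</ttl> and be refreshed four times as often as a feed that relies on the default.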
Newsfeeds 4.0 approach
The introduction of feed merging in Newsfeeds 4 compounded this issue, as a number of feeds may now need to be retrieved when the page loads. The first intervention was to decouple page loading from feed fetching, which can be done using Ajax: the page renders the Newsfeeds module as an empty container, and after the page has loaded, this container calls the server to get its contents. This was later tweaked to decide whether to use Ajax based on the state of the cache: if the module needed a refresh, Ajax was used; if the cache was valid, the module rendered the cached content immediately. This has worked well, but I felt it was only a halfway solution.
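The 4.0 decision described above comes down to a single check on the cache. A minimal sketch with hypothetical names (RenderMode, RenderDecision, ChooseRenderMode), assuming the cache expiry is available as a nullable date where Nothing means nothing has been cached yet:

```vb
Imports System

' Hypothetical names; illustrates the 4.0 render decision only.
Public Enum RenderMode
    CachedHtml       ' render the cached feed immediately
    AjaxPlaceholder  ' render an empty container filled by Ajax after page load
End Enum

Public Module RenderDecision

    Public Function ChooseRenderMode(ByVal cacheExpiry As DateTime?, ByVal currentTime As DateTime) As RenderMode
        ' A cache that exists and has not expired can be served straight away.
        If cacheExpiry.HasValue AndAlso currentTime < cacheExpiry.Value Then
            Return RenderMode.CachedHtml
        End If
        ' Otherwise defer the (slow) feed download to an Ajax call.
        Return RenderMode.AjaxPlaceholder
    End Function

End Module
```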
Newsfeeds 4.1 enhancement
In this version we introduce a background task: a DNN scheduled task now looks through the cache and refreshes feeds where necessary. Because this loading happens in the background, feed fetching is now completely decoupled from page loading. When a refresh succeeds, the refreshed feed signals the module that it needs to run its aggregation logic again. This still happens when the page is requested, but it no longer depends on downloading feeds and is virtually instant. The actual contents of each feed (after the pre-processing described below) are cached in the database, in the feed record.
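A rough sketch of what such a background refresh pass could look like. FeedRecord, FeedRefresher and the injected download function are hypothetical stand-ins for the module’s own feed records and HTTP code, and a single fixed TTL is used for simplicity where a real task would honour each feed’s own refresh time. In DNN, a loop like this would typically be called from the DoWork method of a class deriving from DotNetNuke.Services.Scheduling.SchedulerClient.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical types for illustration; not the module's own classes.
Public Class FeedRecord
    Public Url As String
    Public ModuleId As Integer
    Public CachedContent As String
    Public Expires As DateTime
End Class

Public Class FeedRefresher
    ' Downloads a feed URL and returns its XML; injected so the sketch
    ' does not depend on any particular HTTP client.
    Private ReadOnly _download As Func(Of String, String)

    Public Sub New(ByVal download As Func(Of String, String))
        _download = download
    End Sub

    ' Refreshes every expired feed and returns the ids of the modules that
    ' should re-run their aggregation on their next page request.
    Public Function RefreshExpired(ByVal feeds As IEnumerable(Of FeedRecord),
                                   ByVal currentTime As DateTime,
                                   ByVal ttl As TimeSpan) As HashSet(Of Integer)
        Dim modulesToReaggregate As New HashSet(Of Integer)()
        For Each feed As FeedRecord In feeds
            If currentTime < feed.Expires Then Continue For ' cache still valid
            Try
                feed.CachedContent = _download(feed.Url)
                feed.Expires = currentTime.Add(ttl)
                modulesToReaggregate.Add(feed.ModuleId)
            Catch
                ' Keep serving the old cached copy; retry on the next run.
            End Try
        Next
        Return modulesToReaggregate
    End Function
End Class
```

Returning the affected module ids mirrors the split described above: the scheduled task does the slow fetching, while the cheap re-aggregation is left to the next page request.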
The aggregated feed is cached on disk in the module cache as Feed_[ModuleId].resources. (We intend to expose this file as an aggregated feed in a future version.)
As a result, the Ajax feature is now OFF by default. You can switch it on in the module settings, where you can also indicate whether the background task should refresh this module’s feeds.
Pre-processing
In Newsfeeds 4.0 we introduced aggregation, which has been the largest technical challenge to date. We copied in logic from an old project called RssToolkit, but this has also led to a number of issues, most notably sensitivity to feed validity. In part this is inevitable; allow me to explain. For feeds to be merged they must be similar in structure. Over the years many syndication standards have evolved, the most popular of which is RSS, but even within RSS there are different flavors. If you’re interested, check out Wikipedia on syndication, RSS, RDF, Atom, etc. To make matters worse, various news outlets publish their own variants of these standards. Over the past year I’ve been shocked to learn that large, respectable news outlets publish “illegal” feeds, i.e. feeds that do not conform to the relevant standard. RssToolkit is quite strict about what it accepts and will reject feeds that do not conform to one of the supported standards. In part this is necessary to avoid a mess when merging different feeds: internally, all feeds are transformed to RSS 2.0 before being merged.
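Once every feed is in RSS 2.0, merging becomes a matter of combining their items. The following sketch shows one plausible way to do that with LINQ to XML: collect all <item> elements, order them newest first by <pubDate>, and wrap them in a single channel. It is an illustration under those assumptions (FeedMerger and Merge are invented names), not RssToolkit’s actual merge logic; items whose date cannot be parsed simply sink to the bottom.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq
Imports System.Xml.Linq

' Hypothetical sketch of merging feeds that are already RSS 2.0.
Public Module FeedMerger

    ' Parses <pubDate>; undated or unparseable items sort last.
    Private Function PublishedAt(ByVal item As XElement) As DateTimeOffset
        Dim pubDate As XElement = item.Element("pubDate")
        Dim value As DateTimeOffset
        If pubDate IsNot Nothing AndAlso
           DateTimeOffset.TryParse(pubDate.Value, CultureInfo.InvariantCulture,
                                   DateTimeStyles.AssumeUniversal, value) Then
            Return value
        End If
        Return DateTimeOffset.MinValue
    End Function

    Public Function Merge(ByVal title As String, ByVal feeds As IEnumerable(Of String)) As XDocument
        Dim allItems As IEnumerable(Of XElement) =
            feeds.SelectMany(Function(xml) XDocument.Parse(xml).Descendants("item"))
        Dim newestFirst As IEnumerable(Of XElement) =
            allItems.OrderByDescending(Function(item) PublishedAt(item))
        ' The items still belong to their source documents, so copy them.
        Return New XDocument(
            New XElement("rss", New XAttribute("version", "2.0"),
                New XElement("channel",
                    New XElement("title", title),
                    newestFirst.Select(Function(item) New XElement(item)))))
    End Function

End Module
```

The sort only works because every input already shares the same structure, which is exactly why non-conforming feeds have to be rejected or repaired first.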
A new approach
For this version I added the option for the admin to define a “pre-processor” for a feed. This means you can intervene before the feed reaches RssToolkit and make it compliant even if it isn’t. Pre-processing is done either through an XSL transformation or through a custom pre-processor class.
XSL pre-processing
You can have a feed pass through an XSL transformation before it reaches the aggregator. My hope is that, for the more popular news outlets whose feeds do not conform to RSS 2.0, people will share their XSL.
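As an example of the kind of XSL people might share, the sketch below applies a stylesheet with .NET’s XslCompiledTransform. The stylesheet is an identity transform that also renames a non-standard <date> element inside each item to the RSS 2.0 <pubDate>; that particular quirk is invented for illustration, and the module may invoke XSL differently.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

' Hypothetical XSL pre-processing sketch using XslCompiledTransform.
Public Module XslPreProcessing

    ' Identity transform plus one fix: rename a non-standard <date> child of
    ' <item> to the RSS 2.0 <pubDate>. The <date> quirk is an invented example.
    Public Const FixDateXsl As String = "<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">" &
        "<xsl:template match=""@*|node()""><xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy></xsl:template>" &
        "<xsl:template match=""item/date""><pubDate><xsl:value-of select="".""/></pubDate></xsl:template>" &
        "</xsl:stylesheet>"

    ' Runs the feed text through the stylesheet and returns the result as text.
    Public Function ApplyXsl(ByVal feedXml As String, ByVal xsl As String) As String
        Dim xslt As New XslCompiledTransform()
        Using xslReader As XmlReader = XmlReader.Create(New StringReader(xsl))
            xslt.Load(xslReader)
        End Using
        Dim settings As New XmlWriterSettings With {.OmitXmlDeclaration = True}
        Using input As XmlReader = XmlReader.Create(New StringReader(feedXml))
            Using output As New StringWriter()
                Using writer As XmlWriter = XmlWriter.Create(output, settings)
                    xslt.Transform(input, writer)
                End Using
                Return output.ToString()
            End Using
        End Using
    End Function

End Module
```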
Note that since the release we have discovered a bug when linking to a file or selecting your own transformation from the dropdown. If you see an error saying the XSL failed to compile, use the URL option to point to the XSL sheet instead. We’ll fix this in 4.1.1.
A custom pre-processor
Sometimes XSL is too cumbersome and you would rather process the string representation of the feed directly. This is now also possible, through a new interface:
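```vb
Namespace PreProcessing
    ' Implemented by custom pre-processors: receives the raw feed as a string
    ' and returns the (possibly modified) text that is passed to the aggregator.
    Public Interface IPreProcessor
        Function Process(ByVal xml As String) As String
    End Interface
End Namespace
```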
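A sketch of a custom implementation. The problem it addresses is an assumed but common one: a feed containing bare “&” characters is not well-formed XML and will be rejected. The hypothetical BareAmpersandFixer escapes every “&” that does not already begin an entity or character reference. The interface is repeated so the snippet compiles on its own.

```vb
Imports System.Text.RegularExpressions

Namespace PreProcessing

    ' Repeated from above so this sketch compiles on its own.
    Public Interface IPreProcessor
        Function Process(ByVal xml As String) As String
    End Interface

    ' Hypothetical example pre-processor: escapes bare "&" characters that
    ' would otherwise make the feed invalid XML.
    Public Class BareAmpersandFixer
        Implements IPreProcessor

        ' Matches "&" unless it starts a named (&amp;), decimal (&#38;)
        ' or hexadecimal (&#x26;) reference.
        Private Shared ReadOnly BareAmpersand As New Regex("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

        Public Function Process(ByVal xml As String) As String Implements IPreProcessor.Process
            Return BareAmpersand.Replace(xml, "&amp;")
        End Function
    End Class

End Namespace
```

Because it works on the raw string, this also rewrites ampersands inside CDATA sections, where they were legal. That is the typical trade-off of string pre-processing: quick to write, but worth testing against the actual feed.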
The module includes a Twitter pre-processor that transforms tweets so that links to hashtags and people are properly formatted.
The Twitter pre-processor is listed in the pre-processing dropdown.
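The shipped Twitter pre-processor’s code is not shown here; the following hypothetical TweetLinker only sketches the idea. It parses the feed, turns #hashtags and @mentions in each item description into links, and relies on LINQ to XML to re-escape the resulting markup, as RSS descriptions expect. The twitter.com URL patterns are assumptions.

```vb
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Xml.Linq

Namespace PreProcessing

    ' Repeated from above so this sketch compiles on its own.
    Public Interface IPreProcessor
        Function Process(ByVal xml As String) As String
    End Interface

    ' Hypothetical Twitter-style pre-processor (not the one shipped with
    ' the module): links #hashtags and @mentions in item descriptions.
    Public Class TweetLinker
        Implements IPreProcessor

        ' Only match at the start of the text or after whitespace, so URLs
        ' with fragments and e-mail addresses are left alone.
        Private Shared ReadOnly HashTag As New Regex("(?<=^|\s)#(\w+)")
        Private Shared ReadOnly Mention As New Regex("(?<=^|\s)@(\w+)")

        Public Shared Function Linkify(ByVal text As String) As String
            Dim withTags As String = HashTag.Replace(text, "<a href=""https://twitter.com/search?q=%23$1"">#$1</a>")
            Return Mention.Replace(withTags, "<a href=""https://twitter.com/$1"">@$1</a>")
        End Function

        Public Function Process(ByVal xml As String) As String Implements IPreProcessor.Process
            Dim doc As XDocument = XDocument.Parse(xml)
            ' ToList: the elements are modified while looping over them.
            For Each description As XElement In doc.Descendants("item").Elements("description").ToList()
                ' Assigning Value stores the markup as escaped text.
                description.Value = Linkify(description.Value)
            Next
            Return doc.ToString(SaveOptions.DisableFormatting)
        End Function
    End Class

End Namespace
```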
You can also create your own pre-processor and enter its full type specification in the module’s UI when editing a feed.
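How the module turns that type specification into an object is not described here. A common .NET approach, sketched below with hypothetical names (PreProcessorFactory, TrimPreProcessor), is Type.GetType followed by Activator.CreateInstance. For a type outside the calling assembly the specification must include the assembly name, e.g. “MyCompany.MyPreProcessor, MyCompany.Newsfeeds”, which is also an invented example.

```vb
Imports System

Namespace PreProcessing

    ' Repeated from above so this sketch compiles on its own.
    Public Interface IPreProcessor
        Function Process(ByVal xml As String) As String
    End Interface

    ' Hypothetical factory: resolves a type specification entered in the UI.
    Public Module PreProcessorFactory
        Public Function Create(ByVal typeSpec As String) As IPreProcessor
            ' "Namespace.Class" works for types in the calling assembly or the
            ' core library; other types need "Namespace.Class, AssemblyName".
            Dim t As Type = Type.GetType(typeSpec, throwOnError:=True)
            If Not GetType(IPreProcessor).IsAssignableFrom(t) Then
                Throw New InvalidOperationException(typeSpec & " does not implement IPreProcessor.")
            End If
            ' Requires a public parameterless constructor.
            Return DirectCast(Activator.CreateInstance(t), IPreProcessor)
        End Function
    End Module

    ' Trivial pre-processor used to demonstrate the factory.
    Public Class TrimPreProcessor
        Implements IPreProcessor

        Public Function Process(ByVal xml As String) As String Implements IPreProcessor.Process
            Return xml.Trim()
        End Function
    End Class

End Namespace
```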
Where to get
The Newsfeeds project and its downloads can be found on CodePlex:
http://dnnnewsfeeds.codeplex.com/