We've talked a few times in the past about how RSS feeds are a great way to keep track of a bunch of different online content in one consistent interface. While a number of sites provide excellent RSS feeds of their content, there are a few common problems that can get annoying after a while. In particular, some sites publish too much content, and a way to filter a feed down to only the content you care about could be really useful. On the other side of the coin, a number of sites only provide the first paragraph of their posts, requiring you to click through to the site to read the whole thing.
Happily, both of these problems can be solved with just a little tweaking, using some online tools. Today I'm going to talk about one option for solving each problem, but there are definitely alternatives out there. I'll start out showing you how to use Feed Rinse to filter RSS feeds, and then get into the slightly more involved method of generating full-content feeds from partial feeds using Unsum.
Filtering RSS feeds with Feed Rinse
Sometimes, a particular site will just have too much content to deal with regularly. Often such sites will offer some kind of highlights feed in addition to the full feed, but if they don't, or if you want to tailor a feed just for yourself, a filtering tool is a great option. Feed Rinse is one of many RSS filtering options available, but I think it strikes the right balance of ease of use and filtering power.
To get started with Feed Rinse, head over to www.FeedRinse.com and sign up for an account. Once you've signed up, the main page should have a big button saying "Click Here to Get Started." As you might imagine, your next step is to click that button. From there, you'll be prompted with a page to enter your feeds. If you want to filter a bunch of feeds at the same time, you can enter one per line, or import a list that you've exported from your RSS reader. For this example, I wanted to pare down the feed for a scientific journal that I keep tabs on called the Proceedings of the National Academy of Sciences (PNAS). The address for the RSS feed of new PNAS articles is www.pnas.org/rss/ahead.xml, so I just added that feed, then clicked import.
Once your feeds are imported, you can set up filters. In this case, PNAS has a ton of articles, but I'm particularly interested in those tagged [Biophysics and Computational Biology] or those about mutual information (sorry for the esoteric example, but the principles should be pretty clear). As such, I chose to allow any post with the appropriate tag in the title, or "mutual information" anywhere in the post.
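If it helps to see the rule written out, here is a minimal Python sketch of the "allow" logic described above. This is not Feed Rinse's code; the function name keep_post and the sample posts are made up for illustration, and treating the match as case-insensitive is my assumption, since I don't know exactly how Feed Rinse compares text.

```python
def keep_post(title, body,
              title_tag="[Biophysics and Computational Biology]",
              phrase="mutual information"):
    """Return True if a post should stay in the filtered feed.

    A post is allowed when the tag appears in the title OR the phrase
    appears anywhere in the post (title or body).
    Case-insensitive matching is an assumption, not a documented
    Feed Rinse behavior.
    """
    tag_in_title = title_tag.lower() in title.lower()
    phrase_anywhere = phrase.lower() in (title + " " + body).lower()
    return tag_in_title or phrase_anywhere


# Hypothetical posts, just to exercise the rule.
posts = [
    {"title": "[Biophysics and Computational Biology] Protein folding",
     "body": "An abstract about folding."},
    {"title": "[Neuroscience] Spike trains",
     "body": "We estimate mutual information between neurons."},
    {"title": "[Ecology] Forests",
     "body": "Tree growth over time."},
]

filtered = [p for p in posts if keep_post(p["title"], p["body"])]
```

With these sample posts, the first is kept because of its title tag, the second because of the phrase in its body, and the third is dropped.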
After saving changes, Feed Rinse shows all of the feeds you've created. You can now subscribe to the filtered feeds just like you would any other RSS feeds. Clicking on the orange RSS logo to the left of the feed title will give you the option of adding the filtered feed to your reader of choice (I'm partial to Google Reader, but there are plenty of options out there).
Expanding partial feeds with Unsum.com
If you've subscribed to many RSS feeds, the chances are pretty good that you've run across some that don't show you all of the post content. Sites prefer that readers click through to the actual articles (so they can show ads and get better analytics), so many provide only partial content in their RSS feeds. Unfortunately, if you spend a lot of time in your RSS reader, it can get rather annoying to click through to read full posts. For example, I spend most of my RSS time in the mobile version of Google Reader, and many of the sites aren't optimized for mobile browsers; furthermore, if I am offline, I'm completely out of luck.
If you want to get around such partial feeds, Unsum.com lets you convert partial feeds to full RSS feeds. While the process is a bit more involved than the filtering example above, once you've done it, the feed will update itself with full content, letting you stay in the warm embrace of your RSS reader.
To get started, click the "Converter" tab and enter the URL of the RSS feed you want to start with. In this example, I'm expanding the feed from Rams Herd, a St. Louis Rams blog which recently moved from a full feed to a partial one (again, sorry for the esoteric example).
Unsum provides a few different ways to identify the full content of a page, but in my experience, only the regular expression option worked. The basic idea of Unsum is that it needs a way to identify the beginning and end of the relevant content on any given post. Based on these criteria, it will then extract the content and place it into the new feed.
Telling Unsum how to identify the beginning and end is the tricky part. The key is to look at the HTML of an example page and find elements that will always come immediately before and after the content. For this example, I went to one of the Rams Herd posts and looked at the source code in Chrome by right-clicking and choosing "View Page Source".
To get to the right part of the HTML, I searched for the title of the post, which in this case is surrounded by an h1 tag. The key piece, then, is to identify a unique element that comes immediately before the start of the article. In this case, I chose <div class="article">. This marker will be different for every site, but it needs to be the same for every post on a given site. Next, I identified a pattern to mark the end of the content. In this case, I used <!-- JOM COMMENT START -->. Again, this will be different for each site you use.
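Since the markers have to hold up across every post on the site, it can be worth checking them against the source of a few different posts before going further. Here's a rough Python sketch of that check. The helper name markers_work and the sample page snippets are hypothetical; in practice you'd paste in the saved source of real posts.

```python
def markers_work(start, end, html):
    """Return True if `start` appears in the page and `end` appears after it."""
    start_at = html.find(start)
    if start_at == -1:
        return False
    return html.find(end, start_at + len(start)) != -1


START = '<div class="article">'
END = "<!-- JOM COMMENT START -->"

# Made-up page sources standing in for a few saved posts.
page_a = '<h1>Post A</h1><div class="article"><p>Text A</p><!-- JOM COMMENT START --><div>comments</div>'
page_b = '<h1>Post B</h1><div class="article"><p>Text B</p><!-- JOM COMMENT START -->'
page_c = '<h1>Post C</h1><div class="article"><p>Text C</p>'  # end marker missing

all_good = all(markers_work(START, END, page) for page in (page_a, page_b))
```

If a page fails the check (like page_c here, which has no end marker), the chosen patterns probably aren't consistent across the site and it's worth picking different ones.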
With the start and end patterns in hand, you can set up the regular expression by typing the first pattern, then '(.+?)' (without the quotes), then the second pattern. In my case, this turned out to be:
<div class="article">(.+?)<!-- JOM COMMENT START -->
If you've never used regular expressions before, that probably looks like nonsense, but the basic idea is that you want to keep everything between the start and the end patterns (keep everything inside the parentheses). If you take a look back at the first screenshot of this section, you'll see where I entered this expression. In my experience, this field was very finicky. Even adding an extra space at the end messed things up, so be careful. I also filled in a title, and then hit "Generate".
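To make the regular expression a little less mysterious, here's a small Python sketch that applies the same pattern to a made-up page. I don't know which regex engine or options Unsum uses internally, so the re.DOTALL flag (which lets the dot match line breaks) is an assumption on my part, and the sample HTML is invented for illustration.

```python
import re

# Same pattern as entered in Unsum. re.DOTALL is an assumption: it lets
# "." match newlines so the capture can span multiple lines of HTML.
PATTERN = re.compile(
    r'<div class="article">(.+?)<!-- JOM COMMENT START -->',
    re.DOTALL,
)


def extract_article(html):
    """Return the text between the start and end markers, or None."""
    match = PATTERN.search(html)
    return match.group(1).strip() if match else None


# Invented page source for illustration.
sample_page = """<html><body>
<h1>Post title</h1>
<div class="article">
<p>First paragraph.</p>
<p>Second paragraph.</p>
<!-- JOM COMMENT START -->
<div id="comments">Reader comments</div>
</body></html>"""

article = extract_article(sample_page)

# The same pattern with one extra trailing space no longer matches this
# page, because the end marker here is followed by a line break.
with_space = re.compile(PATTERN.pattern + " ", re.DOTALL)
finicky = with_space.search(sample_page)
```

Here article ends up holding just the two paragraphs, with the comments section left out. The '(.+?)' part is "non-greedy", meaning it stops at the first end marker it finds rather than the last. The second pattern shows one plausible reason the field is so finicky: a stray space becomes part of the pattern, so the page has to contain that space too.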
The resulting feed is shown on the next page, and in the screenshot below. The "converted RSS" URL can be added to your RSS reader just like any other feed. Once you've done so, you should have a new feed that pulls the full content of articles, based on the links from the partial feed.
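For the curious, here's a rough Python sketch of what a converter like this is doing conceptually: follow each item's link from the partial feed, extract the article with the regular expression, and swap it into the item's description. This is my guess at the general approach, not Unsum's actual implementation. The function expand_feed, the fetch_page parameter, and the example.com feed are all hypothetical, and the page fetch is faked with a dictionary so the sketch runs without a network connection.

```python
import re
import xml.etree.ElementTree as ET

ARTICLE_RE = re.compile(
    r'<div class="article">(.+?)<!-- JOM COMMENT START -->',
    re.DOTALL,  # assumption: let "." match across line breaks
)


def expand_feed(feed_xml, fetch_page):
    """Return RSS XML whose item descriptions hold the full article.

    fetch_page is any callable that takes a URL and returns that page's
    HTML (or None if the page can't be retrieved).
    """
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        html = fetch_page(link.strip())
        match = ARTICLE_RE.search(html) if html else None
        if match is None:
            continue  # leave the original summary alone
        description = item.find("description")
        if description is None:
            description = ET.SubElement(item, "description")
        description.text = match.group(1).strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical partial feed and a fake "web" to fetch from.
partial_feed = """<rss version="2.0"><channel><title>Example</title>
<item><title>Post one</title><link>http://example.com/post-one</link>
<description>First paragraph only</description></item>
</channel></rss>"""

fake_pages = {
    "http://example.com/post-one":
        '<div class="article"><p>Whole post.</p><!-- JOM COMMENT START -->',
}

full_feed = expand_feed(partial_feed, fake_pages.get)
```

To try this against a real site, you'd replace fake_pages.get with a function that actually downloads the page. If the pattern doesn't match a given post, the sketch simply keeps the original summary for that item.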
Anyway, hopefully that gives you a good idea of the types of tweaks you can use to make RSS even better!