About the Bayesian Feed Filtering Project
The Bayesian Feed Filtering (BayesFF) project aims to identify the articles that are of interest to specific researchers from a set of RSS feeds of journal tables of contents, by applying the same approach that is used to filter out junk email.
We will investigate the performance of a tool (sux0r) that aggregates and filters a range of RSS and Atom feeds selected by a user. The filtering algorithm is similar to the one used to identify spam in many email filters, except that in this case it is "trained" to identify items that are interesting and should be highlighted, rather than items that should be junked.
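To make the filtering approach described above more concrete, here is a minimal sketch of a naive Bayesian text classifier in Python. It is an illustration only and is not sux0r's own code: the class name NaiveBayesFilter, the tokenizer, the add-one smoothing, and the sample training titles are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter with add-one smoothing."""

    def __init__(self, categories):
        self.word_counts = {c: Counter() for c in categories}
        self.doc_counts = {c: 0 for c in categories}

    def train(self, text, category):
        """Record the words of one example item under the given category."""
        self.word_counts[category].update(tokenize(text))
        self.doc_counts[category] += 1

    def scores(self, text):
        """Return a probability for each category; the values sum to 1."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for cat, counts in self.word_counts.items():
            # Smoothed log prior: how often this category was trained.
            log_p = math.log(
                (self.doc_counts[cat] + 1) / (total_docs + len(self.doc_counts))
            )
            total_words = sum(counts.values())
            for tok in tokens:
                # Smoothed log likelihood of each word given the category.
                log_p += math.log((counts[tok] + 1) / (total_words + vocab_size))
            log_scores[cat] = log_p
        # Convert log scores to normalized probabilities in a stable way.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}

    def classify(self, text):
        """Return the category with the highest probability."""
        result = self.scores(text)
        return max(result, key=result.get)


# Made-up training titles, for illustration only.
feed_filter = NaiveBayesFilter(["interesting", "not interesting"])
feed_filter.train("Bayesian text classification of RSS feed items", "interesting")
feed_filter.train("Machine learning for filtering journal alerts", "interesting")
feed_filter.train("Annual conference catering and parking arrangements", "not interesting")
feed_filter.train("Editorial board changes and subscription prices", "not interesting")

print(feed_filter.classify("Filtering RSS feeds with Bayesian classification"))
```

In this sketch, each item a user marks adds to the word counts for that category, so the scores for new items shift as more examples are supplied. A real filter would normally need many more training examples than the four shown here before its output could be relied upon.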
An important element of the project is investigating whether the filtering is effective enough to be helpful to users (specifically, in this case, researchers looking at journal tables of contents for interesting newly published papers) and disseminating information about the potential of this approach within the JISC community. We appreciate that the potential applicability of the technique is much wider: it applies to any area where a user might want to monitor alerts from a wide range of sources in the knowledge that many of the items in the feeds will be irrelevant. Anyone who has subscribed to dozens of seemingly relevant feeds, only to find that they are presented with more items than they can scan, is familiar with this problem.
Initially, 20 volunteers will take part in a trial. If you are not part of the trial but are interested in using this tool to filter your RSS feeds, please feel free to register. You can then log in and suggest feeds to add to the directory (these will need to be approved by an administrator). Before you can train the filter on documents, you must set up vectors and categories. Watch this short YouTube tutorial on how to start classifying documents using Naive Bayesian categorization and sux0r. For the purposes of our trial we have called our vector "Interestingness", with the categories "Interesting" and "Not Interesting".
sux0r 2.0.6 is a blogging package, an RSS aggregator, a bookmark repository, and a photo publishing platform with a focus on Naive Bayesian categorization and probabilistic content. It is OpenID enabled (version 1.1), as both a consumer and a provider.
Naive Bayesian categorization is the ouija board of mathematics. Known for being good at filtering junk mail, the Naive Bayesian algorithm can categorize anything so long as there are coherent reference texts to work from. For example, categorizing documents in relation to a vector of political manifestos, or religious holy books, makes for a neat trick. More subjective magic 8-ball categories could be "good vs. bad", risk assessment, insurance claim fraud, whatever you want.
sux0r allows users to maintain lists of Naive Bayesian categories. These lists can be shared with other users. This allows groups to share, train, and use sux0r together.
Bayesian Feed Filtering News