About the Bayesian Feed Filtering Project

The Bayesian Feed Filtering (BayesFF) project aims to identify the articles that are of interest to specific researchers from a set of RSS feeds of journal tables of contents, by applying the same approach that is used to filter out junk email.

We will investigate the performance of a tool (sux0r) that aggregates and filters a range of RSS and Atom feeds selected by a user. The filtering algorithm is similar to the one used to identify spam in many email filters, except that in this case it is "trained" to identify items that are interesting and should be highlighted, rather than items that should be junked.
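
To make the idea concrete, here is a minimal sketch of a Naive Bayesian text classifier in Python. It is an illustration only and is not sux0r's implementation (sux0r is a PHP package); the class name, the tokenizer and the smoothing scheme are all assumptions chosen for brevity. The category names match those used in our trial.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSketch:
    """Hypothetical, minimal Naive Bayes classifier (not sux0r's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> documents trained

    def train(self, text, category):
        """Record one training document under the given category."""
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokenize(text))

    def scores(self, text):
        """Return {category: probability} for the text, or {} if untrained."""
        if not self.doc_counts:
            return {}
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)

        log_scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Work in log space so many small probabilities do not underflow.
            log_p = math.log(n_docs / total_docs)
            for word in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_p += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[category] = log_p

        # Normalise the log scores into probabilities that sum to 1.
        highest = max(log_scores.values())
        weights = {c: math.exp(s - highest) for c, s in log_scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}
```

After calling `train` with a few article titles or abstracts for each of "Interesting" and "Not Interesting", `scores` returns a probability per category for a new item, and a feed reader could highlight items whose "Interesting" score is high.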

An important element of the project is investigating whether the filtering is effective enough to be helpful to users (specifically, in this case, researchers looking at journal tables of contents for interesting newly published papers) and disseminating information about the potential of this approach within the JISC community. We appreciate that the potential applicability of the technique is much wider: it applies to any area where a user might want to monitor alerts from a wide range of sources in the knowledge that many of the items in the feeds will be irrelevant. Anyone who has subscribed to dozens of seemingly relevant feeds, only to find that they are presented with more items than they can scan, is familiar with this problem.

Initially, 20 volunteers will take part in a trial. If you are not part of the trial but are interested in using this tool to filter your RSS feeds, please feel free to register. You can then log in and suggest feeds to add to the directory (these will need to be approved by an administrator). To train documents you must set up vectors and categories. Watch this short YouTube tutorial on how to start classifying documents using Naive Bayesian categorization and sux0r. For the purposes of our trial we have called our vector "Interestingness", with categories "Interesting" and "Not Interesting".

About Sux0r

sux0r 2.0.6 is a blogging package, an RSS aggregator, a bookmark repository, and a photo publishing platform with a focus on Naive Bayesian categorization and probabilistic content. It is OpenID enabled (version 1.1), as both a consumer and a provider.

Naive Bayesian categorization is the ouija board of mathematics. Known for being good at filtering junk mail, the Naive Bayesian algorithm can categorize anything so long as there are coherent reference texts to work from. For example, categorizing documents in relation to a vector of political manifestos, or religious holy books, makes for a neat trick. More subjective magic 8-ball categories could be "good vs. bad", risk assessment, insurance claim fraud, whatever you want.
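
For readers who want the arithmetic behind the trick, the standard textbook form of the Naive Bayesian rule can be written as below. The symbols are our own notation rather than anything taken from sux0r: c is a category in a vector, d is a document made up of the words w_1 to w_n, and the probabilities are estimated from word frequencies in the reference texts used for training.

```latex
% Posterior for category c given document d = (w_1, ..., w_n),
% under the "naive" assumption that words are independent given c.
P(c \mid d) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c)

% The document is assigned to the most probable category.
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The "naive" part is the independence assumption in the product: each word is treated as separate evidence for or against a category, which is rarely true of real text but tends to work well enough in practice.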

sux0r allows users to maintain lists of Naive Bayesian categories. These lists can be shared with other users. This allows groups to share, train, and use sux0r together.

sux0r 2.0.6 is open source and is distributed under the GNU General Public License.

sux0r (PHP5, GPL) is available at SourceForge.net.

Volunteer Trial at Heriot-Watt University

Login: Please log in using the user name and password provided.

View Feeds: Once logged in, please click on Feeds to display your journals. The default display shows all articles from your journals in date order. You may find it easier to click on a journal title in the left column to display articles only from that journal.

Rate Articles: Below each article is a drop-down menu which allows you to rate the article as "Interesting" or "Not Interesting". Please only rate articles that really are of interest or really are not of interest. Leave articles that are semi-interesting untouched.

Train Other Articles: It may be the case that there are very few articles of interest to you in the current issues of the journals you have selected. To train the system about your interests, you can submit the abstracts of articles that you have written or cited, or that are of particular importance to your research. To do this, click on your Username (top right), then click on edit bayesian. Under the Documents heading, paste the title and abstract of these articles into the text area. Ensure the drop-down menu is set to "Interesting" and click "Train".

Important Dates:
By 28th August: Initial meeting should have taken place and you will have received training in how to rate articles and train documents.
Throughout September: Rating of articles and training of documents. This will allow the system to learn your interests.
25th September: Training will cease and you will take a break from using the system to allow new articles to build up.
By 30th October: A second meeting will have taken place. This meeting will determine how successful the system has been at filtering articles based on your interests.
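
One common way to quantify how successful a filter has been is to compare its predictions with a person's own ratings using precision and recall. The sketch below assumes that approach; it is not a statement of the measures this project will actually use, and the function name and arguments are hypothetical.

```python
def precision_recall(predicted, actual, positive="Interesting"):
    """Compare the filter's labels with a researcher's own ratings.

    predicted and actual are equal-length lists of category labels.
    Precision: of the items the filter flagged, the fraction the user agreed with.
    Recall: of the items the user found interesting, the fraction the filter flagged.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == positive and a == positive)
    false_pos = sum(1 for p, a in pairs if p == positive and a != positive)
    false_neg = sum(1 for p, a in pairs if p != positive and a == positive)

    # Guard against division by zero when nothing was flagged or nothing was interesting.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

High precision means few irrelevant articles are highlighted; high recall means few interesting articles are missed. Which of the two matters more depends on how costly a missed paper is compared with the time spent scanning an irrelevant one.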

Important Note: This trial is part of a research project which will determine the success of the methodology; it is not a usability test of the user interface.



Unless otherwise specified, contents of this site are copyright by the contributors and available under the Creative Commons Attribution 3.0 license. Contributors should be attributed by full name or nickname.