Thursday, March 29, 2012

PLoS comp biol goes Wikipedia

Wikipedia has become a primary source of basic information for many students. However, there is an interesting gap between the scientific community and the people who regularly contribute to Wikipedia articles. Only a few prominent scientists are regulars, such as the Pfam authors, who recently integrated Wikipedia into the Xfam series of databases. Another major science-related project on Wikipedia is GeneWiki, led by Andrew Su, with about ten thousand articles describing various genes. A possible reason for this gap might be that Wikipedia articles are not accepted as academic publications. As of today, PLoS comp biol tries to close this gap by publishing a new type of manuscript, Topic Pages.

Topic Pages are designed to provide review-style articles. The published article serves as a reference copy that can be cited and will show up in PubMed. It is also released on Wikipedia, where a living copy of the document can be edited and updated by the wider public. This is done in collaboration with WikiProject Computational Biology.

How does this work? In short, an article is first submitted to PLoS, where it is peer reviewed. Upon acceptance it is published by PLoS comp biol and also uploaded to Wikipedia. While this sounds rather straightforward, one of the issues with this approach is licensing.

PLoS publishes all articles under a very liberal license, the Creative Commons Attribution License. This means you can do with the article what you want, even change the license, as long as you credit the original source. This license is in fact more liberal than the Wikipedia license, which is Creative Commons Attribution-ShareAlike. This means we can take a PLoS comp biol article and publish it on Wikipedia, as long as we cite the original source of the text. We cannot do this in the opposite direction, because ShareAlike requires derived text to keep the same license.

In order to avoid any licensing conflicts, Spencer Bliven set up a custom MediaWiki instance with the liberal PLoS-style license. It can be found at http://topicpages.ploscompbiol.org/. By drafting the manuscript there, we are able to transfer the content easily to both PLoS and Wikipedia once it has passed the PLoS review process. In addition, the review process is transparent: you can see what the referees commented on our article at the talk page of the article (both at Wikipedia and at the Topic Pages site).

Our latest paper is the first such Topic Page. It provides a review of circular permutations in proteins, a type of relationship in which proteins have a changed order of amino acids in their sequence while their 3D shape remains very similar.
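
As a toy illustration of the idea, the following Java sketch treats a circular permutation as an exact rotation of a sequence string. This is a simplification: real circularly permuted proteins are usually recognised by their similar structure, not by identical sequences. The class name, method names, and example sequence are made up for this post and are not taken from BioJava.

```java
final class CircularPermutationSketch {

    /** Returns true if b is an exact rotation of a, e.g. "ABCDE" vs. "CDEAB". */
    static boolean isCircularPermutation(String a, String b) {
        // Every rotation of a appears as a substring of a + a.
        return a.length() == b.length() && (a + a).contains(b);
    }

    /** Moves the first `cut` residues to the end of the sequence. */
    static String permute(String sequence, int cut) {
        return sequence.substring(cut) + sequence.substring(0, cut);
    }

    public static void main(String[] args) {
        String original = "MKTAYIAKQR";            // made-up example sequence
        String permuted = permute(original, 4);     // new start after residue 4
        System.out.println(permuted);
        System.out.println(isCircularPermutation(original, permuted));
    }
}
```

The doubled-string check is only a convenient trick for exact rotations; detecting circular permutations between real proteins requires alignment methods that tolerate mutations.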



For more information, read the full article at PLoS (doi:10.1371/journal.pcbi.1002445) or take a look at the latest version on Wikipedia. Also read the PLoS comp biol editorial announcing the Topic Pages.


Sunday, March 18, 2012

GSoC 2012 - how to get started with a proposal

To get started with a proposal, I recommend looking at the BioJava
project proposals from the last two years (and here) to
see what kind of projects got funded and how those proposals were
written. Think about what you would like to work on. Get a copy of
BioJava and see how related features work. Come up with a plan
for how to extend them.

We are fairly flexible regarding what kind of projects we will run
this summer; this really depends on the submitted project
proposals. All proposals will be compared and ranked together with
proposals from the other Bio* projects. A good proposal is therefore
key to getting funded.

A good proposal shows

- the motivation of the student
- that the candidate is qualified to do what they are proposing
- that the project adds useful new functionality to BioJava
- what the possible risks are and what to do about them

It is difficult to answer questions like "how should I perform this or
that project?" There is more than one possible path, and the best
answer depends on your skills and interests.
Overall, I recommend picking a project on a topic that is close to your
(future?) thesis or is of particular interest to you.

Here are a couple more thoughts, some of which are project specific:

-  The best projects are those that you come up with yourself. If you
want to distinguish yours from every other proposal, suggest something
which is not on our list.

- File parsers:

If you want to work on file parsers, take a look at existing ones. What
features do they provide? How can they be extended? For example, if you
want to work on the CATH parser, take a look at how the SCOP parser
works. What features are available around it (such as access to
domains), and how can something like this be set up for CATH? Look at
how the CATH website provides files.

- Porting of algorithms:

There are several possible approaches to this. I recommend that
you have some background in both C and Java. Get a
copy of the algorithm you want to port, compile it, and take a look at
the source. There are several ways to proceed with the actual port,
and having a good strategy is key for this proposal. Perhaps
try your strategy on a simple test case to see how it
might work.

- BioJava in the cloud:

The goal here is parallelization of existing code. What parts of
BioJava are suitable for this? How can they be parallelized and moved
to current cloud infrastructure? There is a lot of material available
online that will be helpful here.
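
As a starting point for thinking about this, here is a minimal sketch of running an independent per-sequence computation on a thread pool from the standard java.util.concurrent package. It does not use any BioJava classes: gcContent is a hypothetical stand-in for a more expensive calculation, and the class and method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSketch {

    /** Stand-in for an expensive per-sequence job: fraction of G and C bases. */
    static double gcContent(String dna) {
        if (dna.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (char c : dna.toCharArray()) {
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / dna.length();
    }

    /** Runs gcContent for every sequence on a fixed-size pool; results keep input order. */
    static List<Double> gcContentParallel(List<String> sequences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (String s : sequences) {
                tasks.add(() -> gcContent(s));   // one independent task per sequence
            }
            List<Double> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns futures in task order.
            for (Future<Double> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("GGCC", "ATAT", "ATGC");
        System.out.println(gcContentParallel(seqs, 2));
    }
}
```

This pattern only pays off when the tasks are independent and each one is expensive enough to outweigh the scheduling overhead, so identifying such tasks in the existing code is likely a good first step for a proposal.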

Friday, March 16, 2012

BioJava at Google Summer of Code 2012

The Open Bioinformatics Foundation, as an umbrella organisation for
BioJava, has been accepted to participate in this year's Google Summer
of Code.



This means we will again be able to offer mentoring through BioJava
this year. Accepted students will get a stipend of $5,000 from Google.
Participation is possible from most countries in the world, as long as
you are eligible to work in the country in which you'll reside
for the duration of the program.

If you are interested in working on a BioJava related project, now is
the time to start preparing and discussing your proposals. For the
last two years we had many applications for the projects proposed by
mentors. If you want to distinguish your application, I recommend
proposing your own project. Don't forget to discuss any proposal with
us before you submit it. We will try to provide feedback and match
you with a suitable mentor.

Also see http://biojava.org/wiki/Google_Summer_of_Code and Google's
FAQs: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2012/faqs

The student application deadline is April 6th. Google will announce
which proposals got accepted on April 23rd.

BioJava 3.0.3 released

BioJava 3.0.3 has been released and is available from
http://www.biojava.org/wiki/BioJava:Download as well as from the
BioJava maven repository at http://www.biojava.org/download/maven/ .



New Features

BioJava 3.0.3 adds several new features

- Significant improvements for the web service module (NCBI BLAST and
HMMER web services)

- FASTQ parser (ported from the BioJava 1 series to version 3)

- Support for SIFTS-PDB to UniProt mapping

- Improved support for working with external protein domain definitions

- Protmod module renamed to modfinder

- Numerous improvements all over the place (several hundred commits
since last release)

- We are also working on an update for the legacy BioJava 1.8 series.

This release would not have been possible without contributions from
numerous people. Thanks to all for their support!

Happy BioJava-ing!