Protein Sequence and Structure Bioinformatics: Hackathon 2010


Friday, January 22, 2010

BioJava Hackathon - Last Day

Today was the last day of the BioJava Hackathon. It has been an exciting week and we made progress along several lines, which I will talk about in a moment. Special thanks go to Jonathan Warren for organizing the meeting room at the Sanger Institute. Thanks also to our hackers, without whom this hackathon would not have been possible: in particular to Scooter Willis, Jules Jacobsen, Andy Yates, Jonathan Warren, Christoph Gille and Matias Piipari for participating during the week, and to our special guests who joined us for a day, Richard Holland and Jim Procter.

All the code that has been written is available through the new modules labeled with the biojava3 name. Most work was related to the new sequence and protein structure modules:

Sequence modules

Over the last few years there has been a lot of discussion about the way sequences are currently represented. The "sequence guys" among the developers therefore worked on a new design that provides a biologically meaningful (think central dogma) representation of sequences. What is still missing are file parsers that use the new modules; the first FASTA parser is about to be committed by Scooter as I am writing this. More work is required before the code is ready for the next release. Still, this is the beginning of a new data representation that should keep the code base ready for the next couple of years.

Structure modules

The protein structure modules are the part of BioJava 3 that is closest to being released. During this week we added the CE algorithm for protein structure alignment and implemented core interfaces for a generic Model View Control wrapping of various 3D visualization tools. We also added better support for chemically modified residues (like MSE) and for naturally occurring ones like Selenocysteine; both are now treated as amino acids. Finally, we refactored the code base so that the structure data model is clearly separated from the new graphical user interfaces. The GUI module now provides a convenient way to calculate and visualize protein structure alignments.

Next BioJava release (3.0)

More work is still required to bring the new sequence module to a state where it can be released. We also did not write any documentation this week, so that will have to be added later. We will try to get the modules release-ready over the next few weeks. Once a module is release-ready, a detailed summary of its new features will be posted to the mailing list. In any case, there will be a BioJava 3.0 release in time for the ISMB/BOSC conference, as in previous years.

Wednesday, January 20, 2010

BioJava Hackathon - Day 3 - Structure Modules

Today's main new feature in the structure modules is a Java port of the Combinatorial Extension (CE) algorithm. It includes both a version of the algorithm that can be run from the command line and a GUI to view the results and trigger custom alignments. Essentially this is what is available on the RCSB website at: http://www.rcsb.org/pdb/workbench/workbench.do

Regarding the generic Model View Control design for 3D viewers, one currently unsolved problem is how to deal with selections. In PyMol and Jmol, selecting ranges, chains or atoms in proteins is done through a scripting interface. Should we have a scripting interface (based on the syntax of one of these), or multiple select methods that accept various arguments? Jules Jacobsen wrapped the Jmol-Biojava interface using the new interface definitions for the MVC.
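
To make the two options concrete, here is a minimal sketch of how typed select methods could sit on top of a scripting back end, so that both styles remain available. Every name in it (ScriptableViewer, TypedSelector, RecordingViewer) is hypothetical and not part of the BioJava or Jmol APIs. The generated strings are only meant to resemble Jmol-style select expressions and should be checked against the viewer's own scripting documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical viewer abstraction: every selection ends up as one script string. */
interface ScriptableViewer {
    void evaluate(String script);
}

/** Typed select methods layered over a scripting back end (assumed design). */
final class TypedSelector {
    private final ScriptableViewer viewer;

    TypedSelector(ScriptableViewer viewer) {
        this.viewer = viewer;
    }

    /** Select a whole chain, e.g. chain "A". */
    void selectChain(String chainId) {
        viewer.evaluate("select *:" + chainId);
    }

    /** Select an inclusive residue range within one chain. */
    void selectRange(String chainId, int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        viewer.evaluate("select " + start + "-" + end + ":" + chainId);
    }
}

/** Stand-in viewer that records scripts instead of rendering anything. */
final class RecordingViewer implements ScriptableViewer {
    final List<String> scripts = new ArrayList<>();

    @Override
    public void evaluate(String script) {
        scripts.add(script);
    }
}
```

With this layering, a power user can still pass raw scripts to the viewer, while the typed methods give compile-time checking for the common cases. The cost is that each typed method is tied to one script dialect.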

Tuesday, January 19, 2010

BioJava Hackathon - Day 2

Yesterday's contributor with the most lines of code added is Michael Heuer, who is joining the hackathon remotely (i.e. from somewhere in the US). He added the new FASTQ parser to BioJava. Well done Michael!
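
For readers unfamiliar with FASTQ, the sketch below shows roughly what parsing it involves. It is not Michael's parser and does not use the BioJava API. The class and method names are made up for illustration, and it assumes the simple four-line record layout (header starting with "@", bases, separator starting with "+", quality string) without line wrapping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** One FASTQ record: identifier, bases, and per-base quality characters. */
final class FastqRecordSketch {
    final String id;
    final String sequence;
    final String quality;

    FastqRecordSketch(String id, String sequence, String quality) {
        this.id = id;
        this.sequence = sequence;
        this.quality = quality;
    }
}

/** Hypothetical minimal reader for four-line FASTQ records. */
final class FastqSketch {
    private FastqSketch() {}

    /** Parses four-line records; wrapped (multi-line) records are not handled. */
    static List<FastqRecordSketch> parse(BufferedReader reader) throws IOException {
        List<FastqRecordSketch> records = new ArrayList<>();
        String header;
        while ((header = reader.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String sequence = reader.readLine();
            String separator = reader.readLine();
            String quality = reader.readLine();
            if (!header.startsWith("@") || sequence == null || separator == null
                    || !separator.startsWith("+") || quality == null) {
                throw new IllegalArgumentException("Malformed FASTQ record near: " + header);
            }
            if (sequence.length() != quality.length()) {
                throw new IllegalArgumentException("Sequence/quality length mismatch in: " + header);
            }
            records.add(new FastqRecordSketch(header.substring(1), sequence, quality));
        }
        return records;
    }

    /** Convenience wrapper for in-memory text. */
    static List<FastqRecordSketch> parseString(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real parser also has to deal with the different quality encodings used by different sequencing platforms, which this sketch ignores by keeping the quality line as raw characters.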

During the morning session we did a "Post Up", a silent and structured form of brainstorming, in order to come up with new requirements for bringing the sequence modules up to the state of the art. Scooter moderated a discussion in which we focused on biologically meaningful representations of biological sequences. A Chemical Compound will be at the core of any sequence representation, and we want to have different types of sequences such as Chromosome sequence, Scaffold, DNA, RNA, Protein, and Sugars.
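
As a rough illustration of the "compound at the core" idea, the sketch below types a sequence by the kind of compound it contains. All names (CompoundSketch, NucleotideSketch, SequenceSketch, DnaSketch) are hypothetical and chosen for this post; the interfaces that end up in the BioJava 3 modules may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical core abstraction: anything that can occupy one position in a sequence. */
interface CompoundSketch {
    String getShortName();
}

/** DNA nucleotides as one concrete family of compounds. */
enum NucleotideSketch implements CompoundSketch {
    A, C, G, T;

    @Override
    public String getShortName() {
        return name();
    }

    /** Throws IllegalArgumentException for characters other than a, c, g, t. */
    static NucleotideSketch fromChar(char c) {
        return valueOf(String.valueOf(Character.toUpperCase(c)));
    }
}

/** A sequence is an ordered list of compounds of one type. */
final class SequenceSketch<C extends CompoundSketch> {
    private final List<C> compounds;

    SequenceSketch(List<C> compounds) {
        this.compounds = new ArrayList<>(compounds); // defensive copy
    }

    int getLength() {
        return compounds.size();
    }

    /** Positions are 1-based here (an assumption of this sketch). */
    C getCompoundAt(int position) {
        return compounds.get(position - 1);
    }

    String asString() {
        StringBuilder sb = new StringBuilder();
        for (C compound : compounds) {
            sb.append(compound.getShortName());
        }
        return sb.toString();
    }
}

/** Builds a DNA sequence from plain text. */
final class DnaSketch {
    private DnaSketch() {}

    static SequenceSketch<NucleotideSketch> parse(String text) {
        List<NucleotideSketch> compounds = new ArrayList<>();
        for (char c : text.toCharArray()) {
            compounds.add(NucleotideSketch.fromChar(c));
        }
        return new SequenceSketch<>(compounds);
    }
}
```

The point of the type parameter is that a protein or sugar sequence would reuse the same SequenceSketch with a different compound family, and the compiler would stop us from mixing them by accident.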

We started with test-driven development for the new sequence interfaces; after that we will wrap the existing sequence code with the new interfaces.

[Photo: the group during the brainstorming session]

On the 3D structure side of things, we added a new 3D structure-gui module that is going to provide the Model View Control interface for the various open source viewers.

Monday, January 18, 2010

BioJava Hackathon - Day 1 part 2

Continuation of Day 1...

We had more discussion about how to deal with the sequence modules, the bytecode dependencies of the core module and related topics. There seems to be general agreement about moving the current sequence code out of the core module into its own space. We will continue tomorrow morning, when Richard Holland is back.

On a different side of things, Christoph Gille, Jules Jacobsen and I were discussing how to provide a Model View Control interface for using various open source 3D visualization libraries (Jmol, RCSB Libraries, Astex Viewer) together with Biojava.

We spent a lot of time in discussion today; I hope we will get more code written tomorrow.

BioJava Hackathon - Day 1

Hi,

I am going to blog every day about the BioJava Hackathon, so you can stay updated with what is happening here in Cambridge.

In the morning I gave a presentation, around which we had several discussions about the most critical issues we want to solve. The issues are:

Installation problems: Getting the latest checkout of the new Maven based build system causes problems for some of us. Sorting out the installation procedure is a major topic of the afternoon. It works successfully with the latest Eclipse, the m2eclipse plugin and the subclipse plugin. Some of the NetBeans based developers also reported no problems during installation.
Features: The Biojava features should become first class citizens. This means it should be possible to instantiate them independently of sequence objects.
Simplify Sequences: Sequences should be Strings as far as possible. Only convert them to Sequence objects if required.
Documentation: Some of the BioJava 3 documentation is not up to date and can lead to misunderstandings. The latest BioJava 3 code is available in the trunk.
Memory efficiency: Make sure that iterating over RichSequences is memory efficient. (Fix a memory leak there.)
Bytecode: The Biojava core module should not require the Bytecode module.
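
To illustrate the Features and Simplify Sequences points from the list above, here is a minimal sketch of a feature that exists without any sequence object and is only applied to a plain String when the residues are actually needed. FeatureSketch and its methods are hypothetical names, and the 1-based inclusive coordinates are an assumption of this example rather than a decision we have taken.

```java
/** Hypothetical standalone feature: a typed, 1-based inclusive range with no sequence attached. */
final class FeatureSketch {
    final String type;
    final int start;
    final int end;

    FeatureSketch(String type, int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("invalid range " + start + "-" + end);
        }
        this.type = type;
        this.start = start;
        this.end = end;
    }

    /** Number of positions covered by the feature. */
    int length() {
        return end - start + 1;
    }

    /** Applies the feature to a plain String sequence only when the residues are needed. */
    String extractFrom(String sequence) {
        if (end > sequence.length()) {
            throw new IllegalArgumentException("feature extends past the end of the sequence");
        }
        return sequence.substring(start - 1, end);
    }
}
```

For example, new FeatureSketch("helix", 3, 6).extractFrom("MKTAYIAKQR") yields "TAYI", and the same feature object could be reused against any other sequence that is long enough.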

Andy Yates is tweeting about it at http://twitter.com/search?q=%23biojava

Saturday, January 16, 2010

BioJava Hackathon 2010

I am off to Cambridge, U.K. where we will have the BioJava Hackathon next week. I am planning to blog on a regular basis about what is going on there.
