Tuesday, December 28, 2010

BioJava 3.0 released

Today we released BioJava 3.0. It is available from http://biojava.org/wiki/BioJava:Download.
Over the last year BioJava has undergone a major rewrite. It has been modularized into small, reusable components, and a number of new features have been added. The new approach, modeled after Apache Commons, minimizes dependencies and makes it easier to contribute new components.

At present, the main modules are:
biojava3-core: The core module offers the basic tools required for working with biological sequences of various types (DNA, RNA, protein). Besides file parsers for popular file formats it provides efficient data structures for sequence manipulation and serialization.
biojava3-genome: The genome module provides support for reading and writing the GTF, GFF2, and GFF3 file formats.
biojava3-alignment: This module provides implementations for pairwise and multiple sequence alignments (MSA). The implementation for MSA provides a flexible and multi-threaded framework that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment.
biojava3-structure: The 3D protein structure module provides parsers and a data model for working with PDB and mmCIF files. New features in this release are implementations of the CE and FATCAT structural alignment algorithms and support for chemical component definition files, which allow a chemically and biologically correct representation of modified residues and ligands.
biojava3-protmod: The protein modification module can detect more than 200 protein modifications and crosslinks in 3D protein structures. It comes with an XML file and Java data structures to store information about different types of protein modifications collected from PDB, RESID, and PSI-MOD.
Not every feature of the BioJava 1.X code base was migrated over to BioJava 3.0. A modularized version of the 1.X sources is available as a new "biojava-legacy" project.
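To give a feel for the kind of data the genome module deals with, here is a minimal sketch of splitting a single GFF3 feature line into its nine tab-separated columns. It deliberately does not use the BioJava API: the class name Gff3LineSketch, the Feature holder and the sample line are all made up for illustration, and details such as percent-decoding of attribute values are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hand-rolled GFF3 line splitter, NOT part of BioJava.
public class Gff3LineSketch {

    /** Hypothetical holder for the fields of one GFF3 feature line. */
    static final class Feature {
        final String seqId;
        final String source;
        final String type;
        final int start;   // 1-based, inclusive
        final int end;     // 1-based, inclusive
        final String strand;
        final Map<String, String> attributes;

        Feature(String seqId, String source, String type, int start, int end,
                String strand, Map<String, String> attributes) {
            this.seqId = seqId;
            this.source = source;
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.attributes = attributes;
        }
    }

    /** Parses one feature line; comment and directive lines are not handled here. */
    static Feature parseLine(String line) {
        // GFF3 feature lines have exactly nine tab-separated columns.
        String[] cols = line.split("\t", -1);
        if (cols.length != 9) {
            throw new IllegalArgumentException("Expected 9 columns, got " + cols.length);
        }

        // Column 9 holds semicolon-separated tag=value pairs; "." means no attributes.
        Map<String, String> attrs = new LinkedHashMap<>();
        if (!cols[8].equals(".")) {
            for (String pair : cols[8].split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }

        // Columns 6 (score) and 8 (phase) are ignored in this sketch.
        return new Feature(cols[0], cols[1], cols[2],
                Integer.parseInt(cols[3]), Integer.parseInt(cols[4]),
                cols[6], attrs);
    }

    public static void main(String[] args) {
        // Made-up example line, not taken from a real annotation file.
        Feature f = parseLine("chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo");
        System.out.println(f.type + " " + f.start + "-" + f.end + " " + f.attributes.get("ID"));
    }
}
```

A real parser also has to cope with comment lines, directives such as ##gff-version, escaped characters and parent/child relationships between features, which is exactly the kind of work a dedicated module takes off your hands.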

Monday, December 13, 2010

New Data Drilldown options

One of my favorite features recently added to the RCSB PDB site is the drilldown of search results. This release extends it: it is now possible to drill down through EC numbers as well as through the SCOP classification.

A useful trick that some people might not have noticed yet is that the drilldown is available for the whole set of PDB entries. By clicking on the total number of entries at the top of the page, one can access this faceted browsing interface over the whole database. The screenshot below shows where you have to click to access this feature.



After clicking on the total number of entries here is the drilldown for the whole of PDB:

Saturday, December 11, 2010

Personal Structure Annotations at RCSB PDB

The latest RCSB PDB release makes it possible to attach personal annotations to PDB entries. If you have been using the iPhone application, you might have noticed that this feature was introduced there a few weeks ago. Now you can also annotate your favorite proteins directly on the RCSB PDB website.

How does this work? If you are not logged into MyPDB and you view the details of an entry on the Structure Summary page, you will see something like this:


Before you can create an annotation, you need to get a MyPDB account and log in. You can do this in the MyPDB box in the left-hand menu:


After logging in you can tag and annotate every entry:

I am using this tool to keep comments and notes on various PDB entries. In the future it would be nice to be able to share those notes with some of my friends or students. Another nice feature would be the ability to attach "positional" features, e.g. to annotate active sites or domain boundaries.