Tuesday, June 1, 2010

biojava-structure now supports Chemical Component Dictionary

I have updated the BioJava structure data model to support the PDB
Chemical Component Dictionary. This brings two benefits:

* Chemically modified amino acids can be detected (and treated as
amino acids, rather than Hetatom groups)
* It is possible to get a component type for each Group, which makes
it possible to identify ligands.

As a consequence, the number of amino acids in a chain can change compared
to the previous data representation. For that reason, the loading of chemical
components is set to "false" by default. It can be configured with the
"loadChemCompInfo" flag in the PDB/mmCIF file parsers.
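
To make the effect on the residue count concrete, here is a minimal, self-contained sketch. It is not BioJava code: the class ChemCompSketch, its methods, and the example chain are hypothetical and only mimic the behaviour described above. The component types are a small excerpt of the values used in the Chemical Component Dictionary.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the behaviour described above; not BioJava's implementation.
class ChemCompSketch {

    // The 20 standard residues that can be recognised without any dictionary.
    static final List<String> STANDARD = List.of(
        "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
        "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL");

    // Small excerpt of component types from the Chemical Component Dictionary.
    static final Map<String, String> COMPONENT_TYPE = Map.of(
        "ALA", "L-PEPTIDE LINKING",
        "GLY", "PEPTIDE LINKING",
        "MSE", "L-PEPTIDE LINKING",   // selenomethionine
        "TYS", "L-PEPTIDE LINKING",   // O-sulfo-L-tyrosine
        "HEM", "NON-POLYMER",
        "HOH", "NON-POLYMER");

    // With the flag off, only standard residue names count as amino acids.
    // With the flag on, the component type decides (when the excerpt knows it).
    static boolean isAminoAcid(String code, boolean loadChemCompInfo) {
        String type = COMPONENT_TYPE.get(code);
        if (loadChemCompInfo && type != null) {
            return type.endsWith("PEPTIDE LINKING");
        }
        return STANDARD.contains(code);
    }

    // A simplistic ligand test: a non-polymer component that is not water.
    static boolean isLigand(String code) {
        return "NON-POLYMER".equals(COMPONENT_TYPE.get(code)) && !code.equals("HOH");
    }

    static long countAminoAcids(List<String> chain, boolean loadChemCompInfo) {
        return chain.stream().filter(c -> isAminoAcid(c, loadChemCompInfo)).count();
    }

    public static void main(String[] args) {
        // Made-up chain with two modified residues, one ligand and one water.
        List<String> chain = List.of("ALA", "MSE", "GLY", "TYS", "HEM", "HOH");
        System.out.println("flag off: " + countAminoAcids(chain, false)); // 2
        System.out.println("flag on:  " + countAminoAcids(chain, true));  // 4
        System.out.println("HEM is a ligand: " + isLigand("HEM"));        // true
    }
}
```

With the flag off, MSE and TYS fall into the same bucket as HEM, so the chain appears to have two amino acids; with the flag on, it has four. That difference is why the default stays at "false" for now.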

PDB ID 1A4W - Thrombin with Thiazole-containing Inhibitors. Image source: RCSB PDB

An example where this representation makes a difference is PDB ID 1A4W. This structure contains several ligands and a chemically modified residue. Without the help of the Chemical Component Dictionary, it would have been difficult to represent this protein correctly.

You can get the code either from BioJava SVN, or from the (still slightly experimental) Maven repository at http://www.biojava.org/download/maven/ .