Tuesday, December 29, 2009

BioJava Hackaton

The BioJava Hackaton will be held from from Jan. 19th-22nd at the Genome Campus, Hinxton, Cambridge, UK.

Our goals are:
- Further development of the BioJava modules. For module specific goals see the Modules pages on the BioJava wiki.
- Anything the module leaders deem necessary for their modules.
- Add new modules
- Finalizing BioJava 3.0

Tuesday, September 1, 2009

An easy way to set up a web site (using Java, Maven and Eclipse)

Download the following open source software:

* Tomcat 6 from http://tomcat.apache.org/download-60.cgi
unpack it in a local directory. (I got the Core tar.gz one)

* Eclipse (the JEE edition) from http://www.eclipse.org/downloads/
launch Eclipse

In Eclipse:

* Install the Maven (m2eclipse) plugin for eclipse by adding


as a new site under Help->Install New Software-> Add

Select the the checkboxes of the modules you want to install.

Create a new Eclipse project

* Select File-> New -> Other -> Maven Project

leave project name and location to the default values

as a project Archetype select maven-archetype-webapp

as a group ID select the name of your organisation e.g. org.biojava

Artifact ID is the name of your module. E.g.: myweb (this will also be in your servlet URL)

This will create the following servlet in your workspace.

Deploy the project on Tomcat

* Add a new Tomcat 6 server : Select File-> New -> Other -> Server

* select Apache-> Tomcat 6

* point
it to the location at which you upacked Tomcat

* add your servlet to it and press Finish

* Start your server buy either clicking the green button, or right click on your servlet project and say Run As -> Run on Server

* Tomcat will start and the servlet is running

Add new jsp pages and start the development of your servlet!

Finally: If you want to give the servlet to your sysadmin to release on a production site, just do a Right click on the project and do Run As -> Maven package . This will create a myweb.war in the target subdirectory of your project that can be deployed in stand alone tomcat instances outside of eclipse.

Tuesday, August 25, 2009

Hydrogen bonds in proteins

Hydrogen bonds play an important part in determining the 3D structure of proteins. The formation of H-bonds contributes to the stability of a protein. They provide a characteristic pattern in secondary structure elements that can be used for secondary structure assignment.

A hydrogen bond is formed when two electonegative atoms compete for the same hydrogen atom. While the hydrogen is covalently bound to one of the atoms, the donor, it interacts with the other atom, the acceptor. The possibility to interact with one atom, while covalently bound to another, is special for hydrogens. The electrostatic and covalent aspects of the bond cause the three atoms to stay mostly linear.

H-bonds in proteins most frequently involve the C=O and N-H groups of amino acids in the polypeptide chain. A single oxygen can simultaneously participate in two bonds.

Here again some example code that can calculate the energy of an H-bond between the N-H and C=O atoms of two amino acids using BioJava. It assumes that the coordinates for H-bonds were calculated as described in one of my earlier postings.

/** calculate HBond energy of two groups in cal/mol ...
* see Creighton page 147 f
* Jeffrey, George A., An introduction to hydrogen bonding, Oxford University Press, 1997.
* categorizes hbonds with donor-acceptor distances of
* 2.2-2.5 Å as "strong, mostly covalent",
* 2.5-3.2 Å as "moderate, mostly electrostatic",
* 3.2-4.0 Å as "weak, electrostatic".
* Energies are given as 40-14, 15-4, and <4 kcal/mol respectively.

public double calculateHBondEnergy(AminoAcid one, AminoAcid two )
throws StructureException{

//System.out.println("calcHBondEnergy" + one + "|" + two);

Atom N = one.getN();
Atom H = one.getH();

Atom O = two.getO();
Atom C = two.getC();

double dno = Calc.getDistance(O,N);
double dhc = Calc.getDistance(C,H);
double dho = Calc.getDistance(O,H);
double dnc = Calc.getDistance(C,N);

double contact = MINDIST ;

// there seems to be a contact!
if ( (dno < contact) || (dhc < contact) || (dnc < contact) || (dno < contact)) {
//System.out.println("!!! contact " + one + " " + two);

double e1 = Q / dho - Q / dhc ;
double e2 = Q / dnc - Q / dno ;

double energy = e1 + e2;

// bond too weak
if ( energy > HBONDHIGHENERGY)
return 0;

// test to avoid bond too strong
if ( energy > HBONDLOWENERGY)
return energy;



Tuesday, August 18, 2009

Monday, August 17, 2009

BioJava plans for the next months

Posted this to the BioJava mailing lists:

Here a quick summary of what I propose to be our action plan for the next months for BioJava:

* I would like to call for a code-freeze in 2 weeks (or so) in order to finalize the new modularized and mavenized version of biojava for the developers. The current developmental trunk will remain permanently frozen and all future work should continue at a new location in SVN. As such it will be important that all developers commit any changes they are working on before that.

* We will update the documentation for how to obtain a new mavenized checkout on the wiki.

* After the change the new modules need to be tested and if no major problems are found, the ok will be given to continue working on the new modules (at the new location)

* All developers should obtain a new checkout.

* We need to identify sub-module leaders who will take over leadership of the sub-modules.

In order to come up with a new release of biojava we should continue development on the new modules for a few months. Talking off list with Richard Holland it looks like we will have a hackaton in January in Cambridge, U.K. (details to be finalized and announced). I suggest that we use that opportunity to focus on further developing the modules and make a new public BioJava release shortly after that.

At the present I see the following topics that would be great to work on until and during the hackaton in order to prepare a shiny new version of BioJava for public release:

+ Work on standardizing the organization of the modules (tests, examples, source, docu etc.)
+ Add new modules
+ Improve existing modules
+ Anything the module leaders deem necessary for their modules.
+ Use OSGI for visualization related modules

I can post a more detailed and specific list of things to work on if people are interested.

Sunday, August 16, 2009

How to calculate H atoms for Nitrogens in protein structures

Only few protein structures contain coordinates for hydrogen (H) atoms. This is because X-ray crystallography usually does not resolve the positions of H atoms. They have only one electron and they barely scatter X-rays. Only a few structures have been determined with a high enough resolution to place the hydrogen atoms (< 1.2 Å). Some PDB files contain H atoms that have been modeled into the protein structure. Neutron Diffraction is a method that does allow to observe H atoms. An protein that has been determined with a combination of X-ray and Neutron Diffraction and that contains hydrogen atoms is PDB ID:3HGN .

If hydrogen atoms are required for calculations (e.g. to calculate hydrogen bond energies in protein structures) they need to be inferred from the available information. Here two possible ways to approximate positions for H-atoms that are bound to the Nitrogen (N) atom of an amino acid.

Let's start with having a look at the peptide plane. We can use vectors from the plane in order to calculate H atoms.

A simple way to infer the coordinates is by putting the H on the opposite side of the O atom that is bound to C. C coordinates are from amino acid i. N, Cα atoms from amino acid i+1, for which the H atom is being calculated. For the examples below we are using BioJava. It treats every atom as a vector. This means, if the code fragments below talk about atoms it is meant as a synonym for a vector in a 3D space.

Group a  = groups[i];
Group b  = groups[i+1];

// atom will be for group b.
Atom H = calcSimpleH(a.getC(),b.getN(),b.getCA());

/** Calculates the H atom for group B (i+1) */
private static Atom calcSimpleH(Atom c,Atom o, Atom n) throws StructureException{

Atom h = Calc.substract(c,o);
     double dist = Calc.getDistance(c,o);
     double x = n.getX() + h.getX() / dist;
     double y = n.getY() + h.getY() / dist;
     double z = n.getZ() + h.getZ() / dist;


     return h;

One issue with this first approach is that H is not necessarily directly opposite of the C-O.

An alternative method is shown below:

Use the unit vectors NC and NCα to get the direction of the vector and substract it from N.

Group a  = groups[i];
Group b  = groups[i+1];
Atom H = calc_H(a.getC(),b.getN(),b.getCA());

* Calculates the H atom for group B (i+1)
private static Atom calc_H(Atom C, Atom N, Atom CA) throws StructureException

     Atom nc  = Calc.substract(N,C);
     Atom nca = Calc.substract(N,CA);

     Atom u_nc  = Calc.unitVector(nc)   ;
     Atom u_nca = Calc.unitVector(nca);

     Atom added = Calc.substract(u_nc,u_nca);

     //(According to Creighton the distance N-H is 1.03 +/- 0.02 Å.)
     Atom U = Calc.unitVector(added);

     Atom H = Calc.add(N,U);

     return H;

If you are doing your own implementation of this I recommend to add a check that the distance N-H is really 1 Angstrom. Additionally write out a modified PDB file and use a 3D viewer to verify that the atom is placed on the correct side of the N.