Protein Sequence and Structure Bioinformatics: How to calculate H atoms for Nitrogens in protein structures

Only few protein structures contain coordinates for hydrogen (H) atoms. This is because X-ray crystallography usually does not resolve the positions of H atoms. They have only one electron and they barely scatter X-rays. Only a few structures have been determined with a high enough resolution to place the hydrogen atoms (< 1.2 Å). Some PDB files contain H atoms that have been modeled into the protein structure. Neutron Diffraction is a method that does allow to observe H atoms. An protein that has been determined with a combination of X-ray and Neutron Diffraction and that contains hydrogen atoms is PDB ID:3HGN .

If hydrogen atoms are required for calculations (e.g. to calculate hydrogen bond energies in protein structures) they need to be inferred from the available information. Here two possible ways to approximate positions for H-atoms that are bound to the Nitrogen (N) atom of an amino acid.

Let's start with having a look at the peptide plane. We can use vectors from the plane in order to calculate H atoms.

A simple way to infer the coordinates is by putting the H on the opposite side of the O atom that is bound to C. C coordinates are from amino acid i. N, Cα atoms from amino acid i+1, for which the H atom is being calculated. For the examples below we are using BioJava. It treats every atom as a vector. This means, if the code fragments below talk about atoms it is meant as a synonym for a vector in a 3D space.


Group a  = groups[i];
Group b  = groups[i+1];

// atom will be for group b.
Atom H = calcSimpleH(a.getC(),b.getN(),b.getCA());

/** Calculates the H atom for group B (i+1) */
private static Atom calcSimpleH(Atom c,Atom o, Atom n) throws StructureException{

Atom h = Calc.substract(c,o);
     double dist = Calc.getDistance(c,o);
     //System.out.println(dist);
     double x = n.getX() + h.getX() / dist;
     double y = n.getY() + h.getY() / dist;
     double z = n.getZ() + h.getZ() / dist;

     h.setX(x);
     h.setY(y);
     h.setZ(z);

     return h;
}

One issue with this first approach is that H is not necessarily directly opposite of the C-O.

An alternative method is shown below:

Use the unit vectors NC and NCα to get the direction of the vector and substract it from N.



Group a  = groups[i];
Group b  = groups[i+1];
Atom H = calc_H(a.getC(),b.getN(),b.getCA());


/**
* Calculates the H atom for group B (i+1)
*/
private static Atom calc_H(Atom C, Atom N, Atom CA) throws StructureException
{

     Atom nc  = Calc.substract(N,C);
     Atom nca = Calc.substract(N,CA);

     Atom u_nc  = Calc.unitVector(nc)   ;
     Atom u_nca = Calc.unitVector(nca);

     Atom added = Calc.substract(u_nc,u_nca);

     //(According to Creighton the distance N-H is 1.03 +/- 0.02 Å.)
     Atom U = Calc.unitVector(added);

     Atom H = Calc.add(N,U);

     return H;
}

If you are doing your own implementation of this I recommend to add a check that the distance N-H is really 1 Angstrom. Additionally write out a modified PDB file and use a 3D viewer to verify that the atom is placed on the correct side of the N.

2 comments:

Bob HansonOctober 7, 2010 at 5:52 AM
Jmol uses the first of these (based on RasMol) for calculation of "hydrogen bond" dipole-dipole energies and, in Jmol 12, for DSSP analysis.

For the CALCULATE HYDROGEN command itself and quaternion frame "N" analysis, however, Jmol uses the second of these.

I realize the code says this, but it wasn't clear on first reading that this is the H position for group i+1, not group i.
AOctober 7, 2010 at 3:30 PM
Thanks for the comment. I modified the text above to make it more clear that the H is for group i+1.