Friday, January 15, 2010

All vs All Structure Alignments in PDB

Proteins can have various degrees of similarity. If two proteins show high similarity in their amino acid sequences, it is generally assumed that they are closely related in evolutionary terms. With increasing evolutionary distance the degree of sequence similarity usually drops, but proteins can still have similar functions and a similar overall 3D structure, even if the sequence similarity is low. Detecting such remote similarities is important for inferring functional and evolutionary relationships between protein families, and it is a core technique in structural bioinformatics.
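To make "sequence similarity" a bit more concrete, here is a minimal sketch of one common measure, percent identity. It assumes the two sequences have already been aligned (gaps written as `-`) and counts identical residues over the columns where neither sequence has a gap. The function name and the choice of denominator are my assumptions for illustration; other definitions of percent identity exist, and this is not the code used for the PDB comparison.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over ungapped alignment columns.

    Both inputs must be rows of the same alignment, so they need equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns where both sequences have a residue (no gap).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0

    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy sequences, made up for this example.
    print(percent_identity("ACDE", "ACDF"))
    print(percent_identity("AC-E", "ACDE"))
```

Two proteins can score low on a measure like this and still superimpose well in 3D, which is why structure alignments are needed in addition to sequence comparisons.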

For the RCSB-PDB web site I have recently been working on a new all-against-all comparison of all protein chains. While protein sequence comparisons can be computed quickly, calculating protein structure alignments is much more time-consuming. So far we have computed about 140 million pairwise alignments in ~100,000 CPU hours on the Open Science Grid (OSG), which works out to roughly 2.6 CPU seconds per alignment on average. With the help of Chris Bizon we could easily deploy our code there, and I highly recommend that other scientists give the OSG a try as well. A technical report describing how we ran these calculations is available from here: