Tuesday, August 2, 2011

Tabular Report Web Services at the RCSB PDB

I have fallen behind on describing new features of the RCSB PDB web site... Let me catch up! One feature that I find useful for people who want to script the site is that every query on the site (and there are many) can be represented as simple XML.

How can you find that XML? Once you have run a query, you can access your query history from the left-hand menu. (We might actually move this to the top in the next release to make this more visible, but that's a different story.) Under the Query Details menu you can find the XML.

Here is an example XML for a query. This one below would simply list all current protein structures.

So what can you do with this XML? You can POST it to the Search Web Services and get back a list of the PDB IDs that match the query. This functionality has been available for a while. As a new feature of the current RCSB PDB website release, Chunxiao extended this service to support Custom Reports in a few different file formats.
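As a rough illustration of the POST step, here is a minimal Python sketch that uses only the standard library. The `SEARCH_URL` value is a hypothetical placeholder (take the real endpoint from the Search Web Services documentation), and both the `Content-Type` header and the one-ID-per-line response format are assumptions rather than confirmed behavior. The function names are made up for this example.

```python
import urllib.request

# Hypothetical placeholder: replace with the endpoint given in the
# Search Web Services documentation.
SEARCH_URL = "https://example.org/pdb/rest/search"


def build_search_request(query_xml, url=SEARCH_URL):
    """Wrap the query XML in an HTTP POST request (nothing is sent yet)."""
    return urllib.request.Request(
        url,
        data=query_xml.encode("utf-8"),
        # Assumption: the service accepts the raw XML as the request body.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_id_list(body):
    """Split a response body into IDs, assuming one ID per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def run_search(query_xml, url=SEARCH_URL, timeout=30):
    """Send the query and return the matching IDs as a list of strings."""
    request = build_search_request(query_xml, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_id_list(response.read().decode("utf-8"))
```

To use it, copy the XML shown under Query Details into a string and pass it to `run_search`. Building the request is kept separate from sending it, so the pieces can be checked without touching the network.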

Custom Reports

I have mentioned these reports already in a previous blog entry. On the Web site, these reports let you obtain Image Collages, pre-defined reports containing various fields, and exports to Excel, to mention just a few of the available options.

As part of the new Web service, these reports let you fetch various fields as XML, as a comma-separated file, or as an Excel file. There are step-by-step instructions available for how to use this new feature. Happy scripting!
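Once a report comes back as comma-separated text, reading it in a script is straightforward. The sketch below assumes the first line of the report is a header row. The column names and values in `sample` are invented placeholders, not real report fields or PDB data.

```python
import csv
import io


def parse_report(csv_text):
    """Return one dict per row, keyed by the names in the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Invented sample data, used only to show the shape of the result.
sample = "structureId,resolution\n1ABC,2.0\n2XYZ,1.5\n"
rows = parse_report(sample)

for row in rows:
    print(row["structureId"], row["resolution"])
```

Each row is a plain dictionary, so the values can be filtered, sorted, or loaded into a database from there.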


  1. This is great stuff, Andreas. Keep up the good work. What are you using to "power" these services? Is it all PostgreSQL/MySQL?

  2. Hi J,

    Thanks for the comment. Depending on which query is being run, we currently use a technology stack consisting of Memcached and Hibernate on top of MySQL, as well as some custom in-memory caching for frequently used data. For more details, check out http://www.ncbi.nlm.nih.gov/pubmed/21382834

  3. Hi, I want to extract the protein data and store it in my MySQL database. Please guide me: how can this be done?

  4. I recommend downloading a local copy of the PDB files and using BioPython or BioJava to parse the files. Then you can load the data you want to work with into MySQL...

    1. Thanks, Andreas, for the quick response. But I want to extract protein sequence data from databases like UniProt, NCBI, etc., and store it in my MySQL database. So what do I have to do to extract protein sequence information from these various databases?
      Thank you :)

    2. Did you take a look at the documentation pages for BioPython or BioJava? Check out Biopython-SeqIO or the BioJava cookbook pages...