Saturday, December 10, 2011

RCSB PDB Fall 2011 web site update

Another major update has been released for the RCSB PDB web site. One of the main new features is Peter's new visualisation of surfaces in the Protein Workshop viewer. The calculation of surfaces is extremely fast and the resulting images look amazing. Here is the capsid of a rather large virus (PBCV-1), PDB ID 1M4X.

To visualize the surfaces, simply launch Protein Workshop from the viewer menu on the Structure Summary page.

Once the viewer has downloaded and installed itself, turn the surface on by dragging the Surfaces handle from Off to Opaque.

There are quite a number of display options. For a more detailed description of all of them, view here

A list of all the new features of the Fall 2011 release is available on the "What's New" page.

Saturday, November 19, 2011

Google Scholar Citations Open To All

This week Google Scholar Citations opened up to the public. After the initial release, which provided limited access only, the latest version is now open to all scientists. What is interesting is that they seem to have created profiles for many scientists in an automated way. While the transparency of this is amazing, I can also imagine that there will be a push-back about privacy issues.

You can check out, e.g., who the most cited scientists in Computational Biology or at UCSD are, or look at author profiles (e.g. yours truly). At the moment those lists are sorted by number of citations. It would be nice to have a few more sort fields, like the already calculated h-index. Another missing feature is an API. It would be nice to be able to script their database and create custom reports.

I believe Google Scholar is setting a new standard for tracking the success of scientists and is a serious threat to the ISI Web of Knowledge.

Saturday, October 22, 2011

Google Summer of Code Mentor Summit 2011

I am spending this weekend at the Google campus for the Google Summer of Code (GSoC) Mentor Summit 2011. Just as in previous years, this is organized as an unconference, a self-organizing conference without a prepared program. Instead, the conference starts with the participants developing the program together. This year is the largest Mentor Summit ever and we are about 100 people over capacity (in total we are more than 300).

Hands up: "Who has edited a Wikipedia page?" Almost every hand goes up.

Some GSoC 2011 stats: 

This year Google accepted

48 new organisations, 175 organisations in total
1115 students
over 2,000 mentors
students from 68 countries

Session Planning
If people want to organize sessions, they have 30 seconds to say what topic they want to propose and write the title on a post-it note. The note is put on a white board. Then people vote on which sessions they want to attend. Finally, the notes on the board are arranged according to the votes of the crowd.

After two minutes of planning a line is starting to form and many people are excited to propose sessions. Since so many people are proposing sessions the recommendation is to propose only one session per person.

Small selection of topics:
(too many to mention them all)

Wikipedia integration sucks.
How to build a million dollar computing infrastructure for free.
Open source (OS) gaming
How to pass the 5 developer barrier
OS for international development
LibreOffice: How to revive a project
Open Source for open science
How to build your own internet
GSoC: what worked, what did not work, and how to improve next year.
Creating a cover archive for music
How to integrate semantic web into scientific applications
Visualisation of biological networks
Organizing the effort for documentation
How to get students to become long term contributors
Programmer oriented web semantics
Forming a non profit, how to fundraise for it
How to do telekinetic control of user interfaces
Refactoring the music industry
What to do if you have 100+ sub-modules
Community building
How to make the experience more local
How geeks give away their power with body language
How to get and keep female developers
Open source in Asia
Aging project infrastructure
OS in higher education
How governments can sponsor open source projects
Contribute to Openstreetmap

Detailed Program

Some Pictures

I like the built-in power plugs in the meeting room tables

The Google Tyrannosaurus

Session - Integrating Creative Commons content

Demo: software to create music videos from Flickr.
Problem: nobody can get attributions right.

Flickr has 200 million CC images. However they are not all using the same variation of CC. Different types of CC are not compatible with each other.

now commenting on Etherpad:

Open Science Session

Your Wikipedia Integration Sucks


Cycling at the Google campus

Non-Profit Infrastructure for Software Freedom

Fundraising 101 ("Free as in Freedom So Who Pays for the Beer?")
- plan
- don't depend on only one sponsor
- types of sponsors: individuals, small businesses in the community, businesses that use your software, local, foundations
- per-feature fundraising (sometimes; be careful)
- What does a sponsor want to buy? - Don't sell your email list. Not everything is for sale. Virtue, in some cases forgiveness. Hiring
- And what are you prepared to sell? - Ads on the website, sponsored emails. Non-profits might have tax issues with certain types of ads (depending on how they look).
- How to find sponsors: ask everybody, look at your logs, use your mailing lists, be creative, be brave, be prepared to hear no, be tenacious.
- Prepare your elevator pitch. Can you explain the benefit of your project in one sentence to an eight-year-old? Include your contact information. Make it easy to help. For events: plan and be realistic.
- Ask for help and offer help.
- Build a relationship, they might be able to connect you with somebody else who can sponsor.
- Make it easy to contribute, have a "donate now" button. Accept credit card, send thank you email.
- Enable Microdonations
- How to get graphic help. You need to go to where the artists are.

At lunch: a drone, remote controlled from a phone:

Walking around the campus

How to build your own Internet using open wireless

Thursday, September 15, 2011

RCSB PDB web site September update

This week the latest update to the RCSB PDB web site went live. One of the major new features in this release is a new search interface. Alex has redesigned the top bar search box:

If you enter a search term, the new auto-suggest box suggests what you might mean and lets you trigger precise searches, which are powered by an efficient new lookup mechanism in the background (written by Dimitris).

If you are not happy with the suggestions that you see (I can hardly imagine that ;-), you can still press enter and run the full text search across the PDB, which was already available before.

Another major new feature of this release is the set of improvements to PDB-101, the educational section of the site. Proteins that are described in more detail as part of the Molecule of the Month articles now have a Discussed Structure page. A particularly nice detail that I want to point out is Greg's new 3D molecular viewer that works on iPhones and iPads. These gadgets usually can't display Jmol, our standard 3D viewer, due to the lack of Java support. To work around this limitation there is now a new HTML5 sprite-based animation. If you access a Discussed Structure page with a mobile device, this animation is displayed and can be rotated left-right by "moving" it with your finger. Other improvements to the educational section are that the Molecule of the Month articles now show up in search results and that you can download an article onto your phone.

I will describe more new features some other time.

Tuesday, August 2, 2011

Tabular Report Web Services at the RCSB PDB

I have fallen behind on describing new features of the RCSB PDB web site... Let me catch up! One of the features that I find useful for people who want to script the site is that each of the queries on the site (and there are many) can be represented by simple XML.

How can you find that XML? Once you have run a query, you can access your query history from the left-hand menu. (We might actually move this to the top in the next release to make this more visible, but that's a different story.) Under the Query Details menu you can find the XML.

Here is an example XML for a query. This one below would simply list all current protein structures.

So what can you do with this XML? You can POST it to the Search Web Services and get back a list of matching PDB IDs for this query. This functionality has been available for a while. As a new feature of the current RCSB PDB website release, Chunxiao extended this service to allow Custom Reports in a few different file formats.
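Here is a rough sketch of what such a script could look like in Python. The endpoint URL, the content type, and the idea that the service answers with one PDB ID per line are my assumptions for illustration and are not taken from this post, so please check the Search Web Services documentation before relying on them. The query XML is whatever you copied from the Query Details menu.

```python
import urllib.request

# Assumed endpoint of the Search Web Services (not stated in this post).
SEARCH_URL = "http://www.rcsb.org/pdb/rest/search"


def build_search_request(query_xml, url=SEARCH_URL):
    """Wrap the query XML in an HTTP POST request (nothing is sent yet)."""
    return urllib.request.Request(
        url,
        data=query_xml.encode("utf-8"),
        # Assumed content type for the request body.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_pdb_ids(body):
    """Split a plain-text response into PDB IDs, assuming one ID per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def run_search(query_xml, url=SEARCH_URL):
    """POST the query and return the matching PDB IDs."""
    with urllib.request.urlopen(build_search_request(query_xml, url)) as response:
        return parse_pdb_ids(response.read().decode("utf-8"))
```

Calling `run_search` with the XML from your query history should then give you the matching IDs as a Python list, provided the assumptions above hold for the service.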

Custom Reports

I have mentioned these reports already in a previous blog entry. On the Web site these reports let you obtain image collages, pre-defined reports containing various fields, and exports to Excel, to mention just a few of the available options.

As part of the new Web service, these reports let you fetch various fields as XML, as a comma-separated file, or as an Excel file. There are step by step instructions available for how to use this new feature. Happy scripting!
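To give an idea of how a script might request a report and read the comma-separated output, here is a small Python sketch. The endpoint, the parameter names (`pdbids`, `customReportColumns`, `format`) and the column names are assumptions on my side for illustration only. The step by step instructions are the authoritative source.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint and parameter names (hypothetical, see the instructions).
REPORT_URL = "http://www.rcsb.org/pdb/rest/customReport"


def build_report_url(pdb_ids, columns, fmt="csv", base_url=REPORT_URL):
    """Build the URL for a report on the given PDB IDs and report fields."""
    params = {
        "pdbids": ",".join(pdb_ids),
        "customReportColumns": ",".join(columns),
        "format": fmt,
    }
    # Keep commas readable instead of percent-encoding them.
    return base_url + "?" + urlencode(params, safe=",")


def parse_csv_report(text):
    """Turn a comma-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# The column names here are placeholders, not a list of the real fields.
url = build_report_url(["1M4X"], ["structureId", "structureTitle"])
rows = parse_csv_report("structureId,structureTitle\n1M4X,Example title\n")
```

Fetching `url` (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_csv_report` would give one dictionary per report row.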

Tuesday, July 26, 2011

Google Scholar Citations - Google Scholar Blog

Google is preparing the release of a new feature at Google Scholar - personalized user profiles. They can be used to track your citations and as a reporting tool for calculating the h-index. The access can be kept private or made public. I believe this can be a serious threat to the ISI Web of Science. Can't wait until I can get access to "My Citations" and play around with this new feature!
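For readers who have not come across the h-index: it is the largest number h such that h of an author's papers have at least h citations each. A minimal Python sketch of that definition follows. The citation counts are made up, and this says nothing about how Google computes it internally.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Invented citation counts for five papers.
example = h_index([10, 8, 5, 4, 3])
```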

Read more at: Google Scholar Citations - Google Scholar Blog

UPDATE: In the meantime I got access to the new profile feature. The secret is to click on the "MyCitations" tab in Scholar every now and then, until it works. At the moment it found most, but not all, of my papers, with some highly cited ones missing. I will make my profile publicly available as soon as I have figured out how to add the missing articles.

UPDATE 2: In the meantime I figured out how to find the missing articles. Scholar is using groups of citations. In my case my family name with the special character ć led to two groups of citations, one for the author name with the accent and one without. Using the import functionality those groups could get merged and my profile is now available from

Tuesday, May 3, 2011

PDB101 a new view for educational users

PDB-101 is a new view of the RCSB PDB Molecule of the Month articles that is dedicated to educational users. It groups the Molecule of the Month entries into categories that are easy to access. The categories have been chosen to represent important aspects of a structural view of biology.

Read more about PDB101 at

Wednesday, March 23, 2011

BioJava at Google Summer of Code 2011

The Open Bioinformatics Foundation has again been accepted as a mentoring organization for this year's Google Summer of Code. This means we will be able to offer mentoring through BioJava again this year. Accepted students will get a stipend of $5,000 from Google. Participation is possible from most countries in the world, as long as you are eligible to work in the country in which you'll reside throughout the duration of the program.

If you are interested in working on a BioJava related project, now is the time to start preparing and discussing your proposals. Last year we had many applications for the projects proposed by mentors. If you want to distinguish your application, I recommend proposing your own project. Don't forget to discuss any proposal with us before you submit it. We will try to provide feedback and match you with a suitable mentor.

Also see and Google's

The student application deadline is April 8th. Google will announce which proposals got accepted on April 25th.

Wednesday, March 9, 2011

Quality assurance for the query and distribution systems of the RCSB Protein Data Bank

The RCSB PDB web site serves around 165 000 unique visitors per month. Did you ever wonder how a site like this can provide reliable service? Our latest publication, "Quality assurance for the query and distribution systems of the RCSB Protein Data Bank", describes some of the principles that we apply to ensure quality.