Stable Identifiers for WikiPathways

Today we did another milestone release of WikiPathways, already the 8th in our release cycle. The milestones have been coming steadily, roughly every four weeks. This one was slightly behind schedule, which was to be expected given some pretty heavy modifications.

So what’s new? The main new feature is “Stable Pathway Identifiers”. From now on, a pathway may be identified by e.g. “WP254” instead of “Apoptosis”. While this may sound as exciting as a new cover sheet for TPS reports, it is in fact important groundwork for some interesting features in the future.

Stable identifiers?

As more and more people start linking to WikiPathways (e.g. MSigDB, as we discovered recently), it’s important to keep those links stable and reliable. The disadvantage of identifying pathways just by their names, as we did before, is that there is too high a risk that someone will want to rename them.

You saw the same change happening with WormBase a few years ago. WormBase used biological names like “clk-2” or “rad-5”. The C. elegans people have a neat convention of 3-letter gene names plus a number: clk stands for clock, a class of genes dealing with developmental timing, and rad stands for radiation sensitive. But there are always problems with these kinds of names, like genes being named based on assumptions that turn out to be wrong, or a gene being named independently by two research groups (clk-2/rad-5 is actually an example of that). This creates all kinds of problems when doing bioinformatics. So they introduced identifiers like WBGene00000537.

The problem with descriptive names is that they can be “wrong”. A non-descriptive name is arbitrary, so it’s never wrong, and there is never pressure to change it. We want to use that idea for pathways too.

By the way, old links to WikiPathways are still valid through MediaWiki’s redirect mechanism. But the recommended way of referring to pathways is to use the new identifiers.
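
To make the idea concrete, here is a minimal Python sketch of a registry in which the identifier is the key and the name is just a label that can change. This is purely illustrative and not how WikiPathways or MediaWiki actually implement it: the class PathwayRegistry, its methods, the counter-based “WP1” identifier, and the new name used in the example are all hypothetical.

```python
class PathwayRegistry:
    """Toy registry: stable IDs are the key, names are mutable labels."""

    def __init__(self):
        self._names = {}      # stable ID -> current name
        self._redirects = {}  # every name ever used -> stable ID
        self._next = 1        # hypothetical counter for new IDs

    def add(self, name):
        """Register a pathway and return its newly assigned stable ID."""
        pathway_id = f"WP{self._next}"
        self._next += 1
        self._names[pathway_id] = name
        self._redirects[name] = pathway_id
        return pathway_id

    def rename(self, pathway_id, new_name):
        """Change the label; the old name stays behind as a redirect."""
        self._names[pathway_id] = new_name
        self._redirects[new_name] = pathway_id

    def resolve(self, key):
        """Accept a stable ID or any past/current name; return (ID, current name)."""
        pathway_id = key if key in self._names else self._redirects[key]
        return pathway_id, self._names[pathway_id]


registry = PathwayRegistry()
wp = registry.add("Apoptosis")
registry.rename(wp, "Programmed cell death")  # hypothetical rename

# Both the stable ID and the old name lead to the same pathway.
print(registry.resolve(wp))
print(registry.resolve("Apoptosis"))
```

The point of the sketch is that a rename touches only the label. Anything that stored the identifier keeps working without relying on a redirect at all.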
