Internet, I has it

January 29th, 2008

I know there must have been a time before the internet happened but I just can’t understand how people managed to survive…

Ernie and Bert Rape your soul

pure genius!

openstreetmap rules

January 19th, 2008

Lately I have been telling everyone who wants to listen about openstreetmap, the initiative to make a wiki-style map of, well, basically everything. For about three years people have been using GPS recording devices to collect streets, one street at a time, as though it’s a big stamp collection or something. What I find totally amazing is that this actually works very well, and by now they’ve got a very decent map of all urban areas in Europe. This is absolutely a great example of the wiki way of doing things, and I see this as an encouragement for wikipathways.

I came across openstreetmap because I was looking at videos of the Chaos Communication Congress 2007. The talk on openstreetmap is well worth the download.

Nobel Prize

October 12th, 2007

I think this is exciting news: Al Gore and the IPCC won the Nobel Peace Prize. In my opinion there is no problem more urgent to humanity than climate change and it’s great that this is recognized by such a powerful institution.

Good Old Web 1.0

August 13th, 2007

All biological pathways in a cell are linked together and can interact with each other in unpredictable ways. When putting a pathway down on paper it’s sometimes hard to decide where exactly one pathway ends and the other begins. As such, borders between Pathways are nearly always completely arbitrary. My supervisor always says (jokingly) that a good rule of thumb is that a pathway should include only that which can fit on a single screen (and as monitor resolutions have increased in the past, so has the average pathway size).

I ran into the problem of pathway size when testing my Pathway Diff viewer. When you do a Diff you want to see the old and new version side by side, but that takes too much screen space. The solution I came up with is to view them in two panels with scrollbars, as in the screenshot below:

two-panel-scroller

The two panels scroll together and you can zoom them in and out together as well.

As this is all done in Java, it’s not much work to turn this into a little applet so you can view the pathway online, using good old-fashioned Web 1.0 technology.

All this has an additional advantage: the applet renders the image itself, so I don’t have to rely on the browser to render SVG. SVG rendering in Firefox has turned out to be quite buggy.

Visual Pathway Diff: First Screenshots

July 11th, 2007

As the mid-term evaluation is underway, I reached a major milestone: My program can now output the difference set in SVG format, meaning that it is possible to compare Biological Pathways visually.

Here are two examples (click on the thumbnails to see a larger image)

Pathway Comparison A

In this screen shot you see two versions of the Acetylcholine Pathway (very important for transmitting signals between neurons). As you can see I’ve straightened out a few arrows in the new version (on the right). The yellow color indicates things that have changed; the text balloons in the middle explain in more detail which attributes of those things have changed.

Pathway Comparison B

This pathway represents the Alcohol Dehydrogenase reaction (activated when you have a hangover 😀 ). In the image you can see I’ve added a few missing reaction compounds (in green) as well as set the label for the other compounds (in yellow).

SVG output is only one of the possible output formats of GpmlDiff. Another option is to write to an XML-type format that I dubbed DGPML. DGPML is designed in such a way that it should be possible to write a patch utility that takes a GPML Pathway and then applies a DGPML difference set to it.

In the context of wikipathways, I think there are a couple of interesting use cases for a patch utility. I didn’t think about this when I wrote my GSOC proposal, but I think this would be very useful. So useful in fact that I’m going to ask my supervisor Alex Pico if he thinks I should put it in my plan for the second half of the summer.

Swinging towards Swing

July 1st, 2007

Pendulum

It is said that every problem in Computer Science can be solved by adding another layer of indirection. In this case we added two just to be sure 🙂

As I wrote before, Thomas’ project is to make PathVisio toolkit-independent so we can draw Pathways to an SWT Widget, a Swing JPanel or even an SVG file directly through the Graphics2D abstraction layer.
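To make the idea concrete, here is a minimal sketch of what "toolkit-independent" means in practice. It is not PathVisio's actual code: the class and method names are made up for illustration. The drawing routine only talks to Graphics2D, so the caller decides where the output ends up. In this sketch the target is an off-screen BufferedImage, but the same drawNode call could be handed the Graphics2D of a Swing component or of an SVG generator.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical example class, not part of PathVisio.
class ToolkitIndependentDrawing {

    /** Draws a box for a pathway element. It only talks to Graphics2D. */
    static void drawNode(Graphics2D g, int x, int y, int width, int height) {
        g.setColor(Color.WHITE);
        g.fillRect(x, y, width, height);
        g.setColor(Color.BLACK);
        g.drawRect(x, y, width, height);
    }

    /** One possible target: an off-screen image with a light gray background. */
    static BufferedImage renderToImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.LIGHT_GRAY);
            g.fillRect(0, 0, width, height);
            drawNode(g, 20, 20, 80, 30);
        } finally {
            g.dispose(); // release the graphics context when done
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = renderToImage(200, 100);
        System.out.println("Rendered " + image.getWidth() + "x" + image.getHeight() + " image");
    }
}
```

The point of the design is that drawNode never needs to change when a new output target is added.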

This work is now complete: the Pathway drawing component is completely toolkit independent. In the screen shot below, you can see three different versions of PathVisio running side-by-side. In the top-left you see the old SWT version. At the bottom you see a Swing-only version, and on the right you see the new Graphics2D Pathway widget inside the old SWT GUI.

screenshot.png

In the latter case, the Pathway drawing is rendered with Graphics2D onto a Swing JPanel, which is then wrapped inside the SWT app using the SWT-AWT Bridge. Surprisingly, there is hardly any performance overhead.

But what’s the advantage? Using Thomas’ code I now have an easy way to render my Pathway difference sets as SVG. My original idea was to generate SVG directly using one of the XML writing libs in Java, but now I can use Batik and the Graphics2D functions, which is much easier. I just managed to get the first graphical output of my gpmldiff tool. It would have taken me at least a week longer if I had to write the SVG directly.

Progress update: XmlDiff

June 4th, 2007

My first target for Summer of Code is simple: to find a tool for extracting differences between two Pathways. Pathways are stored in gpml, an XML-based format. So my first idea was to look for an XmlDiff utility suitable for use in wikipathways. I spent some time reviewing all the various XmlDiff utilities out there. It turns out there are quite a few. And you know, if there are a dozen equivalent solutions for a certain problem, then you can bet that probably none of them are going to work for you.

Sequence Alignment Problem

So while reading more into what makes a good XmlDiff tool, I found some interesting whitepapers that caused me to rethink the problem. I now believe it would not be hard to write my own version of XmlDiff that is completely optimized for my specific problem, namely finding differences between Pathways.

With plain text, the differencing problem was solved nicely a long time ago with the famous diff utility. Basically it takes a text file line by line and considers if a line was either inserted, deleted or just changed. For this it implements the Longest Common Subsequence (LCS) algorithm.
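For the curious, a line-based diff on top of LCS can be sketched in a few dozen lines. This is only an illustration of the technique, not the code of the real diff utility. The class name and the output markers are assumptions made up for this example. Note that plain LCS only knows insertions and deletions, so a changed line shows up as a deletion followed by an insertion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class illustrating an LCS-based line diff.
class LcsDiff {

    /** Returns an edit script: "  " = unchanged, "- " = deleted, "+ " = inserted. */
    static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size();
        int m = newLines.size();

        // lcs[i][j] = length of the LCS of oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (oldLines.get(i).equals(newLines.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }

        // Walk the table from the top-left corner to recover the edit script.
        List<String> script = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                script.add("  " + oldLines.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + oldLines.get(i++));
            } else {
                script.add("+ " + newLines.get(j++));
            }
        }
        // Whatever is left over on either side is a pure deletion or insertion.
        while (i < n) script.add("- " + oldLines.get(i++));
        while (j < m) script.add("+ " + newLines.get(j++));
        return script;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("A", "B", "C");
        List<String> after = Arrays.asList("A", "X", "C");
        diff(before, after).forEach(System.out::println);
    }
}
```

The table takes time and memory proportional to the product of the two file lengths, which is fine for a sketch. Production diff tools use more refined variants.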

(Incidentally, I learned that the LCS algorithm is essentially the same dynamic programming approach as the Needleman-Wunsch algorithm, well known in Bioinformatics as an old solution to the Sequence Alignment problem; see image. Clearly, Nature edits DNA Sequences the same way as we edit text files: Insertions, Deletions, Mutations.)

Now although XML is also text, it has a neat hierarchical structure that we would like to make use of. The line operations for diff don’t make much sense for XML. LCS also doesn’t cut it, so I had to read up on stuff like Nodeset correspondence and Tree correction algorithms.

So here is my main problem with all these XmlDiff tools: no two XML formats are the same. Gpml is not SVG, which is not XHTML. The idea behind XML is that you can use a standard set of tools for any type of document because they are structured in the same way, but in this case I think we’re better off with separate tools for each type of document. Gpml is much simpler than the superset of all of XML. You can think of a whole bunch of operations that make sense on certain XML documents but not on Gpml:

  • Move a subtree up or down in the hierarchy (In gpml, all tags always live at the same level. <Pathway> is always at the root, <DataNode> is always just below <Pathway>)
  • Change the order of elements (In gpml, the order of elements does not matter)
  • Move a subelement of <DataNode> to a different <DataNode>. (Technically this could happen, but it just doesn’t make any sense)

These editing operations just don’t make any sense for Pathways. A generic XmlDiff is unnecessarily complex, and has output that is hard to interpret.
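Once moves and reordering are ruled out, the remaining problem is small: match elements across the two versions, then compare their attributes. The sketch below illustrates that idea and is not the actual GpmlDiff implementation. It assumes every element carries a unique id that survives edits (the real tool may have to match elements some other way), and it models a pathway as a plain map instead of parsing XML. All names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical example class: diff for a flat, unordered set of elements.
class FlatElementDiff {

    /**
     * Compares two documents, each modelled as: element id -> (attribute name -> value).
     * Element order is ignored; elements are matched purely by id (an assumption).
     */
    static List<String> diff(Map<String, Map<String, String>> oldDoc,
                             Map<String, Map<String, String>> newDoc) {
        List<String> changes = new ArrayList<>();
        // Sorted union of ids, so the report is stable regardless of input order.
        TreeSet<String> ids = new TreeSet<>(oldDoc.keySet());
        ids.addAll(newDoc.keySet());

        for (String id : ids) {
            Map<String, String> before = oldDoc.get(id);
            Map<String, String> after = newDoc.get(id);
            if (before == null) {
                changes.add("INSERT " + id);
            } else if (after == null) {
                changes.add("DELETE " + id);
            } else {
                TreeSet<String> names = new TreeSet<>(before.keySet());
                names.addAll(after.keySet());
                for (String name : names) {
                    String oldValue = before.get(name);
                    String newValue = after.get(name);
                    if (!Objects.equals(oldValue, newValue)) {
                        changes.add("MODIFY " + id + " " + name + ": " + oldValue + " -> " + newValue);
                    }
                }
            }
        }
        return changes;
    }

    /** Small helper to build an attribute map from name/value pairs. */
    static Map<String, String> attributes(String... pairs) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            map.put(pairs[i], pairs[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> oldDoc = new HashMap<>();
        oldDoc.put("n1", attributes("label", "ADH1", "x", "10"));
        oldDoc.put("n2", attributes("label", "Ethanol"));

        Map<String, Map<String, String>> newDoc = new HashMap<>();
        newDoc.put("n2", attributes("label", "Ethanol"));
        newDoc.put("n1", attributes("label", "ADH1", "x", "25"));
        newDoc.put("n3", attributes("label", "NAD+"));

        diff(oldDoc, newDoc).forEach(System.out::println);
    }
}
```

Only three kinds of change can come out of this (insert, delete, modify an attribute), which keeps the output far easier to read than a generic tree edit script.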

So, GpmlDiff instead of XmlDiff. A good decision, or just a case of not-invented-here syndrome? Let me know what you think 🙂

I’m already making good progress with the implementation of GpmlDiff. I only need to decide how to format the output. The only available standard is XUpdate, but that seems to assume ordered Xml. Suggestions are welcome too!

SWT vs Swing

May 28th, 2007

When we started on the PathVisio project more than a year ago, I had to choose which GUI toolkit we were going to use. I chose the SWT toolkit instead of the more commonly used Swing toolkit.

At the time I had a bit of experience with Swing interfaces, and I knew how ugly they can be. One of the first things you learn to do in Swing is to enable the “system look and feel”, to at least have the illusion of integration with the rest of the operating system.

Swing was touted by Sun as enabling a “cross-platform look and feel”, meaning that the program would have the same look, whether you use it on Solaris, Windows or the Mac. I think this is a silly feature. Most people never run a program on different platforms. The only people who do that are either the application developers themselves or people who work at Sun, where people are blind to subtleties of modern desktops. The consequence, of course, is that you can recognize Swing apps a mile away because of the look of the GUI.

SWT was supposed to fix that. SWT lets you create a GUI that is much more integrated with the rest of the operating system. Since the PathVisio project is heavily focused on usability, this sounded like a good match. But, as I found out over time, SWT is not without its problems either.

  • You always have to package a huge extra library, including native libs for each target platform.
  • SWT, Webstart, Mac OS X: Pick only two. You simply cannot have all three.
  • SWT is relatively new and hence relatively buggy. We’ve hit a number of SWT bugs during PathVisio development. Luckily, some get fixed with each new version.
  • And not least of all, the choice for SWT has prevented us from integrating PathVisio with other applications, most notably Cytoscape.

But we can get the best of both worlds by factoring out the SWT dependent code and creating a toolkit-independent layer. Luckily such a toolkit-independent layer already exists: it is called Graphics2D and is officially part of Java. There is no Graphics2D implementation as part of SWT, but if you search around you can find other open source projects that ran into the same problem and already provide a partial solution.

Integrating this idea into PathVisio will in fact be the first step of the project of Thomas Kelder, another GSOC student working for GenMAPP. I really hope it works out as planned because this is going to open a whole new set of possibilities.

Pathways and Networks

May 11th, 2007

After the last post, a few people have asked me for examples of Biological Pathways. Well, if you just head over to WikiPathways you can find hundreds of them. (There are other websites with pathways too, such as KEGG)

A Biological Pathway is usually drawn fairly simply, with only a few dozen interacting components. They’re kinda like flow-charts, not with all the visual complexity that you see in gene networks. I think this is the way it should be. “Pathways” and “networks” each provide complementary views of something that happens inside a cell: one view is structured but complex, the other is more free-form but easier to understand.

Hello world!

May 8th, 2007

Yes, I admit it. Caught red-handed. I too am contributing to the ever-expanding Blogosphere. Recent events have triggered me to actually do something that was on the someday/maybe list for the longest time. This is the result.

What recent events am I talking about? Why, I thought you’d never ask. I’m going to participate in this year’s Google Summer of Code!

My project is entitled “Visual history of biological pathways”. I’d like to explain that a bit.

I’m a PhD student, and my research involves something called Biological Pathways. Biological Pathways are simply drawings of processes that take place within a Cell. For example, you could have an Insulin Signalling Pathway that schematically shows how Insulin activates the Insulin Receptor in the cell membrane. The receptor in turn activates Ras / MAP-K proteins, which activate many other enzymes and glucose transporters, which eventually leads to uptake of glucose from the blood.

Representing pathways as simple drawings can be a useful research tool. My contribution to all this has been the development (together with fellow PhD student Thomas Kelder) of PathVisio, a pathway creation and editing program written in Java. It’s currently in Beta.

At the same time, the research group where I work, BiGCaT Bioinformatics, was setting up a collaboration with the GenMAPP group at UCSF, San Francisco. GenMAPP was officially accepted as a Mentoring organization for Summer of Code, and they asked me and Thomas to apply. I’m really excited about this. It is a great opportunity for me for two reasons: through the Summer of Code I can work more closely with the people at UCSF, plus I get to be part of a really cool open source event.

Recently at GenMAPP they started something called WikiPathways (no prizes for guessing what that is). WikiPathways is what my Summer Project is all about. Basically it boils down to this: I think WikiPathways needs a history feature, so that you can easily see what other people have changed. Although this may seem at first like a minor detail, I think it is of immense psychological value for any wiki-like system.

One thing is for sure: it’s going to be a very interesting summer!