Posts Tagged ‘diffviewer’

Good Old Web 1.0

Monday, August 13th, 2007

All biological pathways in a cell are linked together and can interact with each other in unpredictable ways. When putting a pathway down on paper it’s sometimes hard to decide where exactly one pathway ends and the other begins. As a result, the borders between pathways are almost always arbitrary. My supervisor always says (jokingly) that a good rule of thumb is that a pathway should include only what fits on a single screen (and as monitor resolutions have increased over the years, so has the average pathway size).

I ran into the problem of pathway size when testing my Pathway Diff viewer. When you do a Diff you want to see the old and new version side by side, but that takes too much screen space. The solution I came up with is to view them in two panels with scrollbars, as in the screenshot below:

[Screenshot: two-panel scroller]

The two panels scroll together and you can zoom them in and out together as well.

As this is all done in Java, it’s not much work to turn this into a little applet so you can view the pathway online, using good old-fashioned Web 1.0 technology.

This has the additional advantage that the applet renders the image itself, so I don’t have to rely on the browser to render SVG. SVG rendering in Firefox has turned out to be quite buggy.

Visual Pathway Diff: First Screenshots

Wednesday, July 11th, 2007

As the mid-term evaluation is underway, I reached a major milestone: My program can now output the difference set in SVG format, meaning that it is possible to compare Biological Pathways visually.

Here are two examples:

Pathway Comparison A

In this screenshot you see two versions of the Acetylcholine Pathway (very important for transmitting signals between neurons). As you can see, I’ve straightened out a few arrows in the new version (on the right). The yellow color indicates things that have changed; the text balloons in the middle explain in more detail which attributes of those things have changed.

Pathway Comparison B

This pathway represents the Alcohol Dehydrogenase reaction (activated when you have a hangover 😀 ). In the image you can see I’ve added a few missing reaction compounds (in green) as well as set the label for the other compounds (in yellow).

SVG output is only one of the possible output formats of GpmlDiff. Another option is to write to an XML-type format that I dubbed DGPML. DGPML is designed in such a way that it should be possible to write a patch utility that takes a GPML Pathway and then applies a DGPML difference set to it.
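To make the patch idea concrete, here is a minimal sketch of applying a difference set to a pathway. It does not use the real DGPML format, which is not shown in this post. The class PathwayPatch, the Change type, its three kinds, and the representation of a pathway as a map from an element id to its attributes are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: a pathway is a map from element id to attribute map.
// None of these names come from GPML or DGPML.
class PathwayPatch {
    enum Kind { INSERT, DELETE, MODIFY }

    static final class Change {
        final Kind kind;
        final String id;
        final Map<String, String> attrs; // new or changed attributes; unused for DELETE

        Change(Kind kind, String id, Map<String, String> attrs) {
            this.kind = kind;
            this.id = id;
            this.attrs = attrs;
        }
    }

    /** Returns a patched copy of the pathway; the input is left untouched. */
    static Map<String, Map<String, String>> apply(
            Map<String, Map<String, String>> pathway, List<Change> changes) {
        Map<String, Map<String, String>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> e : pathway.entrySet()) {
            result.put(e.getKey(), new TreeMap<>(e.getValue()));
        }
        for (Change c : changes) {
            switch (c.kind) {
                case INSERT: {
                    if (result.containsKey(c.id)) {
                        throw new IllegalStateException("already present: " + c.id);
                    }
                    result.put(c.id, new TreeMap<>(c.attrs));
                    break;
                }
                case DELETE: {
                    if (result.remove(c.id) == null) {
                        throw new IllegalStateException("missing: " + c.id);
                    }
                    break;
                }
                case MODIFY: {
                    Map<String, String> target = result.get(c.id);
                    if (target == null) {
                        throw new IllegalStateException("missing: " + c.id);
                    }
                    target.putAll(c.attrs); // overwrite only the changed attributes
                    break;
                }
            }
        }
        return result;
    }
}
```

The sketch refuses to apply a change that does not fit the pathway (for example, deleting an element that is not there). A real patch utility would probably need a more forgiving policy for pathways that have drifted since the difference set was made.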

In the context of WikiPathways, I think there are a couple of interesting use cases for a patch utility. I didn’t think about this when I wrote my GSoC proposal, but it would be very useful. So useful, in fact, that I’m going to ask my supervisor Alex Pico whether I should put it in my plan for the second half of the summer.

Progress update: XmlDiff

Monday, June 4th, 2007

My first target for Summer of Code is simple: to find a tool for extracting differences between two Pathways. Pathways are stored in GPML, an XML-based format. So my first idea was to look for an XmlDiff utility suitable for use in WikiPathways. I spent some time reviewing the various XmlDiff utilities out there. It turns out there are quite a few. And you know, if there are a dozen equivalent solutions for a certain problem, then you can bet that probably none of them are going to work for you.

Sequence Alignment Problem

So while reading more into what makes a good XmlDiff tool, I found some interesting whitepapers that caused me to rethink the problem. I now believe it would not be hard to write my own version of XmlDiff that is completely optimized for my specific problem, namely finding differences between Pathways.

With plain text, the differencing problem was solved nicely a long time ago with the famous diff utility. Basically it compares two text files line by line and decides for each line whether it was inserted, deleted or changed. For this it relies on the Longest Common Subsequence (LCS) algorithm: the lines in the LCS are the ones that stayed the same, and everything else is an edit.
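As an illustration of the idea, here is a minimal sketch of an LCS-based line diff. It is the textbook dynamic-programming version, not the optimized algorithm that the real diff utility uses, and the class name LineDiff and the output markers are my own choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineDiff {
    /** Returns an edit script: "  " = unchanged, "- " = deleted, "+ " = inserted. */
    static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size(), m = newLines.size();
        // lcs[i][j] = length of the LCS of oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (oldLines.get(i).equals(newLines.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }
        // Walk the table from the start, emitting one edit per step.
        List<String> script = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                script.add("  " + oldLines.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + oldLines.get(i));
                i++;
            } else {
                script.add("+ " + newLines.get(j));
                j++;
            }
        }
        while (i < n) script.add("- " + oldLines.get(i++));
        while (j < m) script.add("+ " + newLines.get(j++));
        return script;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("a", "b", "c");
        List<String> after = Arrays.asList("a", "x", "c", "d");
        for (String line : diff(before, after)) {
            System.out.println(line);
        }
    }
}
```

For the example in main, the script keeps "a", deletes "b", inserts "x", keeps "c" and inserts "d". Note that a changed line shows up as a deletion followed by an insertion.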

(Incidentally, I learned that the LCS algorithm is essentially the same dynamic-programming method as the Needleman-Wunsch algorithm, well known in Bioinformatics as an old solution to the Sequence Alignment problem (see image). LCS is what you get from Needleman-Wunsch with a particularly simple scoring scheme. Clearly, Nature edits DNA Sequences the same way as we edit text files: Insertions, Deletions, Mutations).
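The link between the two can be made concrete with a small sketch. Below is a plain Needleman-Wunsch scoring routine with a linear gap penalty. Scoring a match as 1 and both mismatches and gaps as 0 makes the alignment score equal to the LCS length. The class name AlignmentScore and the parameter names are my own; this illustrates the relationship and is not code from any bioinformatics library.

```java
class AlignmentScore {
    /** Global alignment score (Needleman-Wunsch) with a linear gap penalty. */
    static int needlemanWunsch(String a, String b, int match, int mismatch, int gap) {
        int n = a.length(), m = b.length();
        int[][] score = new int[n + 1][m + 1];
        // Aligning a prefix against the empty string costs one gap per character.
        for (int i = 1; i <= n; i++) score[i][0] = i * gap;
        for (int j = 1; j <= m; j++) score[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                int diagonal = score[i - 1][j - 1] + sub; // align a[i-1] with b[j-1]
                int up = score[i - 1][j] + gap;            // a[i-1] against a gap
                int left = score[i][j - 1] + gap;          // b[j-1] against a gap
                score[i][j] = Math.max(diagonal, Math.max(up, left));
            }
        }
        return score[n][m];
    }

    /** LCS length as a special case: only matches count, everything else is free. */
    static int lcsLength(String a, String b) {
        return needlemanWunsch(a, b, 1, 0, 0);
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("AGGTAB", "GXTXAYB"));
    }
}
```

With other scoring values (say, a negative mismatch and gap penalty) the same routine gives alignments that prefer mutations over insertions and deletions, which is where sequence alignment and text diffing part ways.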

Now although XML is also text, it has a neat hierarchical structure that we would like to make use of. The line operations of diff don’t make much sense for XML. LCS also doesn’t cut it, so I had to read up on stuff like Nodeset correspondence and Tree correction algorithms.

So here is my main problem with all these XmlDiff tools: no two XML formats are the same. GPML is not SVG, which is not XHTML. The idea behind XML is that you can use a standard set of tools for any type of document because they are structured in the same way, but in this case I think we’re better off with separate tools for each type of document. GPML is much simpler than the superset of all of XML. You can think of a whole bunch of operations that make sense on certain XML documents but not on GPML:

  • Move a subtree up or down in the hierarchy (In GPML, each tag always lives at the same fixed level. <Pathway> is always at the root, <DataNode> is always just below <Pathway>)
  • Change the order of elements (In GPML, the order of elements does not matter)
  • Move a subelement of <DataNode> to a different <DataNode>. (Technically this could happen, but it just doesn’t make any sense)

These editing operations just don’t make any sense for Pathways. A generic XmlDiff is unnecessarily complex, and has output that is hard to interpret.
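Once those operations are ruled out, the remaining problem gets a lot simpler. Here is a minimal sketch of what an order-insensitive diff could look like: elements are looked up by a key, and only insertions, deletions and attribute changes are reported. This is not how GpmlDiff actually works. The class PathwayDiff, the idea of matching elements by an id, and the flat attribute maps are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: each element is an id plus a flat map of attributes.
class PathwayDiff {
    static List<String> diff(Map<String, Map<String, String>> oldElems,
                             Map<String, Map<String, String>> newElems) {
        List<String> changes = new ArrayList<>();
        // Sorted ids give a stable report; element order itself carries no meaning.
        for (String id : new TreeSet<>(oldElems.keySet())) {
            if (!newElems.containsKey(id)) {
                changes.add("deleted " + id);
                continue;
            }
            Map<String, String> before = oldElems.get(id);
            Map<String, String> after = newElems.get(id);
            Set<String> names = new TreeSet<>(before.keySet());
            names.addAll(after.keySet());
            for (String name : names) {
                String oldValue = before.get(name);
                String newValue = after.get(name);
                if (!Objects.equals(oldValue, newValue)) {
                    changes.add("modified " + id + " " + name + ": "
                            + oldValue + " -> " + newValue);
                }
            }
        }
        for (String id : new TreeSet<>(newElems.keySet())) {
            if (!oldElems.containsKey(id)) {
                changes.add("inserted " + id);
            }
        }
        return changes;
    }

    /** Small helper to build an attribute map from name/value pairs. */
    static Map<String, String> attrs(String... keyValues) {
        Map<String, String> map = new TreeMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }
}
```

The hard part that this sketch skips is deciding which old element corresponds to which new element when there is no reliable key, which is exactly the correspondence problem mentioned earlier.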

So, GpmlDiff instead of XmlDiff. A good decision, or just a case of not-invented-here syndrome? Let me know what you think 🙂

I’m already making good progress with the implementation of GpmlDiff. I only need to decide how to format the output. The only available standard is XUpdate, but that seems to assume ordered XML. Suggestions are welcome!