We can now easily solve the problem of bioinformatics data integration. But how do we put that data in the hands of scientists?
At General Bioinformatics we put data in triple stores and use SPARQL to query it. Triple stores are great for data integration, but integrating data is only half of the problem; we also have to present it. The problem isn’t that SPARQL is hard to use per se (it’s really rather plain and sensible). The problem is that SPARQL is supposed to be only a piece of plumbing at the bottom of a software stack. We shouldn’t expect scientists to write SPARQL queries any more than we expect them to carry adjustable pliers to a restroom visit.
The General SPARQL app is one of the new ways to present triple data.
How do you use it?
The app lets you build a network step by step. Nodes and edges can be added to a network in a piecemeal fashion. Nodes can represent various biological entities, such as: a pathway, a protein, a reaction, or a compound. Edges can represent any type of relation between those entities.
For example, you can start by searching for a protein of interest. The app places a single node in your network. You can then right-click on this node to pull in related entities. For example, all the pathways that are related to your protein. Or all the Gene Ontology annotations. Or all the reactions that your protein is part of. Or the gene that encodes your protein. And you can continue this process, jumping from one entity to the next.
Watch this screencast and it will start to make sense:
How does it work?
In the background, the General SPARQL app maintains a list of SPARQL queries. Each item in the search menu, and each item in the context (right-click) menu, is backed by one SPARQL query. When you click on them, a query is sent off in the background, and the result is mapped to your network according to certain rules.
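To make the idea concrete, here is a minimal Python sketch of a menu item backed by a query, with the result mapped onto a network. It is not the app's actual code: the menu label, the `example.org` URIs, the `MENU_QUERIES` table, and the helper names are all made up for illustration, and the mapping rule (one node and one edge per result row) is an assumption. The only real format used is the standard SPARQL JSON results layout, and the sample result is written by hand rather than fetched from an endpoint.

```python
from string import Template

# Hypothetical menu: each label is backed by one SPARQL template and an
# edge type. The predicate URIs are placeholders, not the app's real queries.
MENU_QUERIES = {
    "Pathways for this protein": {
        "sparql": Template(
            "SELECT ?target ?label WHERE {\n"
            "  ?target <http://example.org/hasParticipant> <$node_id> .\n"
            "  ?target <http://www.w3.org/2000/01/rdf-schema#label> ?label .\n"
            "}"
        ),
        "edge_type": "participates_in",
    },
}


def build_query(menu_item, node_id):
    """Fill the clicked node's URI into the query behind a menu item."""
    return MENU_QUERIES[menu_item]["sparql"].substitute(node_id=node_id)


def apply_results(network, source_id, edge_type, results):
    """Assumed mapping rule: one node and one edge per result row.

    `results` follows the SPARQL JSON results format. Nodes that are
    already in the network are reused rather than duplicated.
    """
    for row in results["results"]["bindings"]:
        target_id = row["target"]["value"]
        label = row.get("label", {}).get("value", target_id)
        network["nodes"].setdefault(target_id, {"label": label})
        network["edges"].add((source_id, edge_type, target_id))
    return network


# Start with a single protein node, as after a search.
protein = "http://example.org/protein/P1"
network = {"nodes": {protein: {"label": "P1"}}, "edges": set()}

# Hand-written sample of what an endpoint might return for the query.
sample_results = {
    "head": {"vars": ["target", "label"]},
    "results": {
        "bindings": [
            {
                "target": {"type": "uri", "value": "http://example.org/pathway/W1"},
                "label": {"type": "literal", "value": "Example pathway"},
            }
        ]
    },
}

item = "Pathways for this protein"
query = build_query(item, protein)
apply_results(network, protein, MENU_QUERIES[item]["edge_type"], sample_results)
print(query)
print(network)
```

Keeping the queries in a plain table is what would make it possible to swap in a different set without touching the mapping logic.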
When you first install the app, it comes pre-configured with a basic set of SPARQL queries, although it’s possible to provide your own set. The initial set is designed to work with public bioinformatics SPARQL endpoints provided by the EBI and Bio2RDF. But as great as these resources are, public triple stores can sometimes be overloaded. The app works with privately managed triple stores just as well.
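Switching between a public and a private triple store mostly comes down to changing the endpoint URL, since the same query is sent either way. The sketch below shows that idea in Python, following the standard SPARQL protocol (a GET request with a `query` parameter and a JSON `Accept` header). It says nothing about how the app itself is configured; the `localhost` URL and the function name are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def make_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical private endpoint; a public one would only differ in the URL.
PRIVATE_ENDPOINT = "http://localhost:8890/sparql"

request = make_sparql_request(PRIVATE_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually run the query, assuming something is listening at that address.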
Where can I find it?
The easiest way to get the app is from the Cytoscape App Manager. Just install Cytoscape 3.0, start it, go to menu->Apps->App Manager, and search for “General SPARQL”. Or download it from the app store website. What’s even better is that the source code is available on GitHub.
Also, if you have a chance, come see my poster at VIZBI 2015 in Boston.