PhyloWS and BioSQL are my hackathon targets
After yesterday's cold rain and stinging wind, Tokyo is showing itself at its most magnificent today. The sky is blue, the wind is still cold but the air is clear and crisp, and, best of all, I am writing this post with snow-covered Mt. Fuji in clear sight from the 8th floor of the CBRC building in the Tokyo Bay Area.
The Open Space session on Monday and the ensuing discussions resulted in two main targets for me to work on this week. Rutger Vos and Chris Zmasek are joining forces with me to define a basic Phyloinformatics Web-Services API, or PhyloWS for short (pronounced “phylowiz”). (You can also watch some of our ramblings on the PhyloWS workgroup page at the BioHackathon wiki.)
Obviously this could be a rather monumental task, so we are trying to break it down into manageable units. We are starting with defining possible scopes (such as along the axis of broad phyloinformatics data types), use-cases, and distilling an initial set of API requirements from those. It is quickly becoming clear that the number of possible queries and desirable operations is nearly endless. Rutger already attracted the awe of our fellow programmers in the room by creating a gigantic matrix of all possible inputs versus all possible outputs, with the cells being the operation(s) that the combination would correspond to (see the workgroup page). Chris’ insights from a comparative genomics perspective (see, for example, his recent paper in Genome Biology) are also invaluable; it is interesting to hear how different the obstacles are that he encounters in his data analyses from, for example, the issues with disseminating research trees of life.
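To give a flavor of the input-versus-output matrix idea, here is a minimal sketch in Python. The data types and operation names are placeholders I made up for illustration; they are not taken from Rutger's actual matrix on the workgroup page, which is far larger.

```python
from itertools import product

# Hypothetical data types along one axis of phyloinformatics data.
DATA_TYPES = ["taxon", "tree", "character_matrix"]

# Hypothetical operations, keyed by (input type, output type).
OPERATIONS = {
    ("taxon", "tree"): ["find trees containing taxon"],
    ("tree", "taxon"): ["list taxa in tree"],
    ("tree", "tree"): ["prune to subtree"],
    ("character_matrix", "tree"): ["infer tree from characters"],
}


def build_matrix(types, operations):
    """Return every (input, output) cell, empty where no operation is known."""
    return {
        (src, dst): operations.get((src, dst), [])
        for src, dst in product(types, repeat=2)
    }


matrix = build_matrix(DATA_TYPES, OPERATIONS)
filled = sum(1 for ops in matrix.values() if ops)
print(f"{filled} of {len(matrix)} cells have at least one operation")
```

Even with only three data types the matrix has nine cells, which hints at how quickly the real one grows.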
Our next step will be to narrow things down by API scope, query priority, and implementation feasibility. As a start, I am looking at API scopes from the viewpoint of the types of possible service providers. For example, one type of service (or, in this case, data) provider is a database of phylogenetic trees, such as TreeBASE, or a taxonomy database, such as ITIS. Another type of service provider would be one offering to execute phylogenetic analysis methods, such as CIPRES. In reality, service providers may fall into more than one category; for example, TreeBASE also stores character data (such as alignments).
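One way to picture providers that fall into more than one category is as a set of combinable flags. The following is only a sketch: the `Scope` names and the `providers_for` helper are my own invention for illustration and are not part of any PhyloWS draft.

```python
from enum import Flag, auto


class Scope(Flag):
    """Hypothetical API scopes, one per type of service provider."""
    TREES = auto()
    TAXONOMY = auto()
    CHARACTER_DATA = auto()
    ANALYSIS = auto()


# Providers named in the post, tagged with the scopes described there.
PROVIDERS = {
    "TreeBASE": Scope.TREES | Scope.CHARACTER_DATA,
    "ITIS": Scope.TAXONOMY,
    "CIPRES": Scope.ANALYSIS,
}


def providers_for(scope):
    """Return the names of all providers that cover the given scope."""
    return sorted(name for name, scopes in PROVIDERS.items() if scope in scopes)


print(providers_for(Scope.TREES))
print(providers_for(Scope.CHARACTER_DATA))
```

Because scopes combine with `|`, a provider such as TreeBASE shows up under both trees and character data without needing a category of its own.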
We have already gathered a substantial amount of documentation along the lines of the above, and if you read this you are very much invited to comment on any of the aspects, ranging from how we are organizing this, to what we are missing, to what in your opinion we should focus on.
My other main target is working towards the long-overdue v1.0 release of BioSQL, and serving as a readily available consultant to the BioSQL interoperability and web-services group. When I’m not watching Mt. Fuji, I’m sitting next to Mark Schreiber – one of the great things about a hackathon is that interaction just can’t get any easier. In addition, our PhyloWS documentation work has already pointed out several pieces of information a phylogenetic tree database should have available, but that the current version of BioSQL’s PhyloDB module can’t store, so in a related goal I intend to make several additions to the schema module to close those gaps.
As is always the case for a hackathon, there are many times more tasks I would want to work on than I could possibly accomplish. The main goal for the remaining time is to get enough done such that the main activation barriers have been surmounted, so that the rest of the work can be completed when we return home and are engulfed again in our normal daily obligations.