iPhylo: August 2006

Roderic D. M. Page

Tuesday, August 29, 2006

Collaborative data matrices using EditGrid

EditGrid is an online collaborative spreadsheet tool that I stumbled across via Ogle Earth. It strikes me that this could be a way to create phylogenetic data sets collaboratively.

As a quick test I grabbed the Vertebrates example file that comes with MacClade, exported the NEXUS file as a table, opened it in Excel, then uploaded the Excel file to EditGrid. You can see the results here.
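The export step can also be scripted rather than done by hand. The following is only a minimal sketch, assuming a simple non-interleaved NEXUS matrix with one-character states and unquoted taxon names. The sample matrix is invented for illustration (it is not MacClade's Vertebrates file), and the function names are hypothetical.

```python
import csv
import io

# Invented toy matrix for illustration only (not the MacClade example file).
SAMPLE_NEXUS = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=4;
  FORMAT DATATYPE=STANDARD MISSING=? GAP=-;
  MATRIX
    Lamprey  0000
    Shark    1001
    Frog     11?1
  ;
END;
"""


def nexus_matrix_to_rows(nexus_text):
    """Return one row per taxon: [name, state1, state2, ...].

    Assumes a non-interleaved matrix, single-character states,
    and taxon names without spaces or quotes.
    """
    rows = []
    in_matrix = False
    for line in nexus_text.splitlines():
        stripped = line.strip()
        if not in_matrix:
            # Skip everything until the MATRIX command.
            if stripped.upper() == "MATRIX":
                in_matrix = True
            continue
        if stripped.startswith(";"):
            break  # a semicolon ends the matrix
        if not stripped:
            continue
        name, states = stripped.split(None, 1)
        states = states.rstrip(";").replace(" ", "")
        rows.append([name] + list(states))
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text with a header line, one column per character."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    n_chars = len(rows[0]) - 1 if rows else 0
    writer.writerow(["taxon"] + [f"char{i}" for i in range(1, n_chars + 1)])
    writer.writerows(rows)
    return buffer.getvalue()


rows = nexus_matrix_to_rows(SAMPLE_NEXUS)
csv_text = rows_to_csv(rows)
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program. Polymorphic states, interleaved matrices, and quoted taxon names would need a proper NEXUS parser.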

The spreadsheet is a natural metaphor for phylogenetic data, although in this application it is likely to be better suited to morphological data, where a team of people is assembling a matrix from various sources.

The developers of EditGrid have a blog which conveys their own sense of excitement about this project.

Tuesday, August 08, 2006

Connotea and TreeBASE

One of my (forever) ongoing projects is to map taxon names in TreeBASE to names in external databases (such as uBio) as a way of checking that the names are correct and of adding support for synonyms and hierarchical queries (see my earlier post for more details).

Now, many names in TreeBASE aren't in any of the major name databases (fossils seem particularly poorly supported), which means hunting on Google for the name. In some cases I come across the name and the original reference for the name, which means I can document that the name is correct. For example, TreeBASE taxon T8737 is Eocaiman cavernensis, which doesn't occur in any of the name sources I use (uBio, ITIS, NCBI, IPNI, etc.). It's a fossil crocodilian, described by George Gaylord Simpson in 1933.

The original description in American Museum Novitates is online (hdl:2246/2050), courtesy of the AMNH's DSpace server. So, how do I link the name and the publication -- without me creating a new database to do this? Well, Connotea to the rescue. I add Simpson's paper to Connotea, tagged with the TreeBASE TaxonID T8737, and voilà, the information is stored.

Now, to make use of this we need to do a little bit more, such as have a triple store that contains both the TreeBASE names and the Connotea record, but given that Connotea serves RSS 1.0 (i.e., RDF), this is easy.
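As a rough illustration of that last step, here is a minimal sketch that reads an RSS 1.0 document and combines it with TreeBASE names in a toy in-memory triple list. Several things here are assumptions rather than facts about Connotea or TreeBASE: that tags appear as `dc:subject` elements, that the item is identified by the handle's resolver URL, the sample feed itself (including its placeholder title), and the `ex:hasName` and `ex:describedIn` predicates, which are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample in RSS 1.0 form; NOT a real Connotea response.
SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://hdl.handle.net/2246/2050">
    <title>Placeholder title</title>
    <link>http://hdl.handle.net/2246/2050</link>
    <dc:subject>T8737</dc:subject>
    <dc:subject>fossil</dc:subject>
  </item>
</rdf:RDF>
"""

# TreeBASE TaxonID -> name, taken from the example in the post.
TREEBASE_NAMES = {"T8737": "Eocaiman cavernensis"}


def feed_to_triples(feed_xml):
    """Emit (taxon_id, 'ex:describedIn', uri) for each tag that looks like a TaxonID."""
    root = ET.fromstring(feed_xml)
    about = "{%s}about" % NS["rdf"]
    triples = []
    for item in root.findall("rss:item", NS):
        uri = item.get(about)
        for subject in item.findall("dc:subject", NS):
            tag = (subject.text or "").strip()
            # Assumption: TaxonID tags are a 'T' followed by digits.
            if re.fullmatch(r"T\d+", tag):
                triples.append((tag, "ex:describedIn", uri))
    return triples


def name_triples(names):
    """Emit (taxon_id, 'ex:hasName', name) for each TreeBASE name."""
    return [(taxon_id, "ex:hasName", name) for taxon_id, name in names.items()]


def publications_for(name, store):
    """Follow name -> TaxonID -> publication URI through the triple list."""
    ids = {s for s, p, o in store if p == "ex:hasName" and o == name}
    return [o for s, p, o in store if p == "ex:describedIn" and s in ids]


store = name_triples(TREEBASE_NAMES) + feed_to_triples(SAMPLE_FEED)
links = publications_for("Eocaiman cavernensis", store)
```

A real triple store would replace the Python list, but the join is the same: the TaxonID tag is the shared key between the name and the publication.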

What I like about this is:

I don't have to do much work.

The publication information is stored where others can see it and make use of it (i.e., if my experiments with these ideas fall by the wayside, the data still remain).

Now, back to the tedious task of mapping...