Wednesday, August 17, 2016

Containers, microservices, and data

Some notes on containers, microservices, and data. The idea of packaging software into portable containers and running them either locally or in the cloud is very attractive (see Docker). Below are some use cases I'm interested in exploring.

Microservices

In Towards a biodiversity knowledge graph (doi:10.3897/rio.2.e8767) I listed a number of services that are essentially self-contained, such as name parsers, reconciliation tools, and resolvers. Each of these could be packaged up as its own container.
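To make "self-contained service" concrete, here is a minimal sketch of a name parser exposed over HTTP using only the Python standard library, the kind of thing that could go into a container. Everything here is hypothetical: the `/parse` endpoint, the JSON field names, port 8000, and especially the regular expression, which is a deliberately naive stand-in for a real scientific name parser.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical, naive pattern: capitalised genus, lowercase epithet,
# then anything left over is treated as authorship.
BINOMIAL = re.compile(r"^([A-Z][a-z]+)\s+([a-z-]+)(?:\s+(.+))?$")


def parse_name(name):
    """Split a binomial name into its parts (illustrative only)."""
    match = BINOMIAL.match(name.strip())
    if match is None:
        return {"parsed": False, "verbatim": name}
    genus, epithet, authorship = match.groups()
    return {
        "parsed": True,
        "verbatim": name,
        "genus": genus,
        "specificEpithet": epithet,
        "authorship": authorship,
    }


class ParserHandler(BaseHTTPRequestHandler):
    """Answers GET /parse?name=... with the parse result as JSON."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/parse":
            self.send_error(404)
            return
        name = parse_qs(url.query).get("name", [""])[0]
        body = json.dumps(parse_name(name)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Bind to all interfaces so the port can be published from a container.
    HTTPServer(("0.0.0.0", port), ParserHandler).serve_forever()
```

If this were saved as a (hypothetical) `parser_service.py`, a container's start command could be something like `python -c "import parser_service; parser_service.serve()"`. The point is that the service has no dependencies beyond the interpreter, which is what makes it easy to package.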

Databases

We can use containers to package database servers, such as CouchDB, Elasticsearch, and triple stores. Using containers means we don't need to go through the hassle of installing the software locally. Interested in RDF? Spin up a triple store, play with it, then switch it off if you decide it's not for you. If it proves useful, you can move it to the cloud and scale up (e.g., using a container hosting service such as sloppy.io).
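The "spin up, play, switch off" cycle can be sketched as a pair of Docker commands. This minimal Python sketch only builds the command lists and runs them on request. It assumes the official `couchdb` image from Docker Hub (which listens on port 5984 and reads the `COUCHDB_USER` / `COUCHDB_PASSWORD` environment variables); the container name and credentials are placeholders, not recommendations.

```python
import json
import subprocess
import urllib.request


def couchdb_commands(name="couchdb-sandbox", port=5984,
                     user="admin", password="change-me"):
    """Return (start, stop) docker commands for a throwaway CouchDB.

    The name, user, and password defaults are hypothetical placeholders.
    """
    start = [
        "docker", "run", "-d", "--name", name,
        "-p", f"{port}:5984",           # publish CouchDB's HTTP port
        "-e", f"COUCHDB_USER={user}",
        "-e", f"COUCHDB_PASSWORD={password}",
        "couchdb",                      # official image on Docker Hub
    ]
    stop = ["docker", "rm", "-f", name]  # stop and delete the container
    return start, stop


def run(cmd):
    # Requires Docker to be installed; raises if the command fails.
    subprocess.run(cmd, check=True)


def is_up(url="http://localhost:5984/"):
    # CouchDB's root URL returns a small JSON welcome document.
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return "couchdb" in json.load(resp)
    except OSError:
        return False
```

Calling `run(start)`, experimenting against `http://localhost:5984/`, and then `run(stop)` leaves nothing installed on the host. A triple store would follow the same pattern with a different image and port.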

Data

A final use case is to put individual datasets in a container. For example, imagine that we have a large Darwin Core Archive. We can distribute this as a simple zip file, but you can't do much with it unless you have code to parse Darwin Core. But imagine we combine that dataset with a simple visualisation tool, such as VESpeR (see doi:10.1016/j.ecoinf.2014.08.004). Users interested in the data could then explore it without the overhead of installing specialist software. In a sense, the data becomes an app.
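For a sense of what "code to parse Darwin Core" involves, here is a minimal sketch of reading the core file of a Darwin Core Archive: the zip's `meta.xml` names the core data file, its delimiters, and which column holds which Darwin Core term. This is not a complete reader. It ignores extensions, `default` values, and multiple core files, and it falls back to a comma delimiter and a `"` quote character when `meta.xml` omits them, which is an assumption on my part.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

NS = "{http://rs.tdwg.org/dwc/text/}"  # namespace used by meta.xml


def _unescape(value):
    # meta.xml writes delimiters as escape sequences such as "\t".
    return value.replace("\\t", "\t").replace("\\n", "\n")


def read_core(archive):
    """Yield each core row as a dict keyed by Darwin Core term URI.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(NS + "core")
        location = core.find(f"{NS}files/{NS}location").text
        delimiter = _unescape(core.get("fieldsTerminatedBy", ","))  # assumed fallback
        quote = core.get("fieldsEnclosedBy", '"')                    # assumed fallback
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> term URI (fields without an index are skipped).
        terms = {int(f.get("index")): f.get("term")
                 for f in core.findall(NS + "field") if f.get("index") is not None}
        if quote:
            options = {"quotechar": quote}
        else:
            options = {"quoting": csv.QUOTE_NONE}
        encoding = core.get("encoding", "UTF-8")
        with io.TextIOWrapper(zf.open(location), encoding=encoding, newline="") as text:
            for i, row in enumerate(csv.reader(text, delimiter=delimiter, **options)):
                if i < skip:
                    continue  # header lines declared in meta.xml
                yield {term: row[idx] for idx, term in terms.items() if idx < len(row)}
```

Bundling a reader like this (or a full tool such as VESpeR) in the same container as the archive is what turns the zip file into something people can use directly.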