Some of the projects I’ve worked on, and my areas of interest and expertise, in no particular order:
I was one of the first committers on Apache Hadoop, and worked on it and many other distributed systems projects in the Hadoop ecosystem for the best part of a decade while working at Cloudera. I wrote four editions of the bestselling book Hadoop: The Definitive Guide, published by O’Reilly.
I was the primary author of Spark support in GATK, working with members of the Broad Institute GATK team from 2015 to 2019. I also created Disq for reading and writing bioinformatics sequencing formats from Spark.
In early March 2020 I began collating the disparate sources of UK COVID-19 data, writing web crawlers to integrate the data for COVID-19 tests, confirmed cases, and deaths into a set of CSV files. My work has been used by many different individuals and organisations, including by John Burn-Murdoch for his visualizations in the Financial Times.
In 2019 I produced the analyses and visualizations of Welsh school funding data for Level the Playing Field, a campaign for fair funding for schools in Wales.
In 2020 I wrote a blog about data visualization, creating one new visualization per week with no constraints on dataset, visualization type, or technology.
Over the years I’ve created many geometric visualizations in my spare time.