Pastures new-ish

After nine and a half years, today is my last day at Cloudera. It’s difficult to write those words as so much of my life has been bound up with this company. On the day I started, I didn’t meet my co-workers as I was living several thousand miles away in a barn in Wales. (The others were in a borrowed meeting room in San Mateo.) As I leave I am still in a barn in Wales (different barn though), but a lot has happened in the intervening period.

On the personal side, my family and I lived in San Francisco during the early formative years of Cloudera, a time we will always treasure for the lifelong friendships we made.

On the professional side, it is no exaggeration to say that working at Cloudera has been the highlight of my career. I already knew that Hadoop was pretty special when I joined (I may have been biased as I was writing a book on it), but I had no idea how it would transform the industry and how it would be used in every sector you could imagine.

To all of you I have worked with over the last decade—at Apache, Cloudera and elsewhere, on many projects—I consider myself to be incredibly fortunate to have had the opportunity to work with you. Thank you.

So what’s next for me?

Jim Waldo, who worked on distributed systems at Sun, once said that he alternated six-month periods between the lab and the outside world: in the lab he and his team built systems software, and in the outside world he saw how people used the system he was building. Doing so gave him valuable feedback on the system design, even though it meant time away from building the system.

In some respects this is another framing of the explore/exploit tradeoff, where you decide between exploring new technological ground (building a new system) and exploiting that system to solve the particular problems you are interested in, which is why you built the system in the first place. (Of course, this framing is oversimplified, since many people work on both parts simultaneously. It’s a useful way of thinking about things as an individual actor though.)

For the past few years I have been working on a few open source biology and healthcare projects (like GATK, Hail, and OHDSI). I think that the problems in biology are big enough and messy enough that new systems will need to be built. We can’t stop exploring the technological ground, since the sheer amount of data will overwhelm even the best of today’s cutting-edge technology. (I like to cite the paper “Big Data: Astronomical or Genomical?” here for some concrete numbers.)

Having said that, there is still a lot of mileage left in our current crop of tools—which include Spark, TensorFlow, Jupyter, and the cloud. And this is what I am going to do: continue the work to apply tools like these to more bio projects, only now working as a freelancer. I plan to write more about what I’m up to on this blog, so please follow along.

Cloudera Inbox Zero for the first time ever!