Writing
Here is a list of my published articles in reverse chronological order. There are more posts on this blog, and I have also written a book on Hadoop.
- Spark Improvements in 4.1: Delivering results faster!, Broad GATK blog, 5 February 2019
- Large-Scale Health Data Analytics with OHDSI, Cloudera blog, 21 December 2017
- Understanding how Deep Learning learns to play SET®, Cloudera blog, 12 October 2017
- Hail: Scalable Genomics Analysis with Apache Spark, Cloudera blog, 2 May 2017
- How I Got Into Hadoop, Cloudera blog, 19 April 2016
- Genome Analysis Toolkit: Now Using Apache Spark for Data Processing, Cloudera blog, 6 April 2016
- Introduction to Hadoop: Real-World Hadoop Clusters and Applications, Dr. Dobb’s Journal, 30 April 2013
- Hadoop: Writing and Running Your First Project, Dr. Dobb’s Journal, 23 April 2013
- Hadoop: The Lay of the Land, Dr. Dobb’s Journal, 16 April 2013
- Asking Any Question Of All Your Data, Forbes, 8 November 2010
- Running Hadoop MapReduce on Amazon EC2 and Amazon S3, Amazon Web Services Developer Connection, 18 July 2007
- Introduction to Nutch, Part 2: Searching, java.net, 16 February 2006
- Introduction to Nutch, Part 1: Crawling, java.net, 10 January 2006
- Did You Mean: Lucene?, java.net, 9 August 2005
- How To Build a Compute Farm, java.net, 21 April 2005
- Can’t beat Jazzy, IBM developerWorks, 22 September 2004
- Using XML Catalogs with JAXP, XML.com, 3 March 2004
- Scheduling recurring tasks in Java applications, IBM developerWorks, 4 November 2003
- Memoization in Java Using Dynamic Proxy Classes, O’Reilly Network, 20 August 2003
- Using Thread-Local Variables in Java, Dr. Dobb’s Journal, 1 July 2003, #350