Hadoop at OSCON 2007

I wasn't there, but Hadoop had two airings at OSCON this week. Doug Cutting was a part of the Open Source Radar Executive Briefing with Tim O'Reilly to talk about scaling.

He also gave a talk, with Eric Baldeschwieler, entitled "Meet Hadoop" where he gave a great exposition of the problem that Hadoop is designed to solve. In short: disk seeks are expensive, so databases built using sort-merge, which is limited by transfer speed not seek speed, scale better than traditional B-tree databases, which are limited by seek speed. More details and examples on the slides.

Eric's half of the talk gave some interesting tidbits about how Hadoop is being used at Yahoo! For example, they are running Hadoop on about 10,000 machines, and the biggest cluster is 1600 machines! With these kind of numbers I can see how Nigel Daley came to coin Nigel's Law:

In a large enough cluster, there are NO corner cases