Hadoop Query Languages
If you want a high-level query language for drilling into your huge Hadoop dataset, then you've got some choice:
Meanwhile, to learn more I recommend Pig Latin: A Not-So-Foreign Language for Data Processing (by Chris Olston et al), and the slides and videos from the Hadoop Summit.
(And I haven't even included Cascading, from Chris K. Wensel, which, while not a query language per se, is an abstraction built on MapReduce for building data processing flows in Java or Groovy using a plumbing metaphor with constructs such as taps, pipes, and flows. Well worth a look too.)
- Pig, from Yahoo! and now incubating at Apache, has an imperative language called Pig Latin for performing operations on large data files.
- Jaql, from IBM and soon to be open sourced, is a declarative query language for JSON data.
- Hive, from Facebook and soon to become a Hadoop contrib module, is a data warehouse system with a declarative query language that is a hybrid of SQL and Hadoop streaming.
Meanwhile, to learn more I recommend Pig Latin: A Not-So-Foreign Language for Data Processing (by Chris Olston et al), and the slides and videos from the Hadoop Summit.
(And I haven't even included Cascading, from Chris K. Wensel, which, while not a query language per se, is an abstraction built on MapReduce for building data processing flows in Java or Groovy using a plumbing metaphor with constructs such as taps, pipes, and flows. Well worth a look too.)