Hadoop Query Languages

If you want a high-level query language for drilling into your huge Hadoop dataset, then you've got some choice:

Pig, from Yahoo! and now incubating at Apache, has an imperative language called Pig Latin for performing operations on large data files.
Jaql, from IBM and soon to be open sourced, is a declarative query language for JSON data.
Hive, from Facebook and soon to become a Hadoop contrib module, is a data warehouse system with a declarative query language that is a hybrid of SQL and Hadoop streaming.

All three projects have different strengths, but there is plenty of scope for collaboration and cross-pollination, particularly in the query language. For example, at the Hadoop Summit in March, Joydeep Sen Sarma of Facebook said that they would be receptive to users who wanted to use Pig Latin or Jaql in Hive. And Kevin Beyer of IBM Research said that Pig and Jaql are converging, and they've had discussions with the Pig team about how to bring them even closer together.

Meanwhile, to learn more I recommend Pig Latin: A Not-So-Foreign Language for Data Processing (by Chris Olston et al), and the slides and videos from the Hadoop Summit.

(And I haven't even included Cascading, from Chris K. Wensel, which, while not a query language per se, is an abstraction built on MapReduce for building data processing flows in Java or Groovy using a plumbing metaphor with constructs such as taps, pipes, and flows. Well worth a look too.)