Hadoop beats terabyte sort record

Hadoop has beaten the record for the terabyte sort benchmark, bringing it from 297 seconds to 209. Owen O'Malley wrote the MapReduce program (which by the way has a clever partitioner to ensure the reducer outputs are globally sorted and not just sorted per output partition, which is what the default sort does), and then ran it on 910 nodes on Yahoo!'s cluster. There are more details in Owen's blog post (and there's a link to the benchmark page which has a PDF explaining his program). You can also look at the code in trunk.

Well done Owen and well done Hadoop!