How To Run Hadoop Benchmarking TestDFSIO on Cloudera Clusters
Posted by sranka on October 9, 2013
Out of the box, Hadoop provides a benchmarking mechanism for your cluster. Getting it running on a Cloudera cluster turned out to be a fun ride, so I thought I would share the steps to reduce the pain and increase the fun.
Before you begin anything, set HADOOP_HOME. The command below works for RHEL.
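A minimal sketch of setting HADOOP_HOME, assuming a default CDH parcel install (the parcel path below is my assumption; adjust it to wherever your Cloudera home actually lives):

```shell
# Assumed default parcel location on a CDH install; change to match
# your own Cloudera home directory.
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
echo "HADOOP_HOME is set to: $HADOOP_HOME"
```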
In CDH, “TestDFSIO” lives inside hadoop-mapreduce-client-jobclient-<version>-cdh<version>-tests.jar, found under “lib/hadoop-mapreduce/” in the “Cloudera Home Directory” — in my case, /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.
You will need to run the write and read benchmarks as below:
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
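TestDFSIO leaves its test files sitting in HDFS (under /benchmarks/TestDFSIO by default), so once you have your numbers, the same jar can clean up after itself with the -clean switch. This is shown against a live cluster, using the same jar path as the commands above:

```shell
# Remove the benchmark files TestDFSIO wrote to HDFS (requires a
# running cluster; cannot be executed standalone).
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -clean
```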
Once you run the tests, you will see a “TestDFSIO_results.log” file in the same directory. Its contents will look like this:
----- TestDFSIO ----- : write
           Date & time: Wed Oct 09 14:56:14 PDT 2013
       Number of files: 10
Total MBytes processed: 10000.0
     Throughput mb/sec: 5.382930941302368
Average IO rate mb/sec: 5.390388488769531
 IO rate std deviation: 0.20763769922620628
    Test exec time sec: 211.457

----- TestDFSIO ----- : read
           Date & time: Wed Oct 09 14:57:47 PDT 2013
       Number of files: 10
Total MBytes processed: 10000.0
     Throughput mb/sec: 48.88230607167124
Average IO rate mb/sec: 49.50707244873047
 IO rate std deviation: 5.8465670196729596
    Test exec time sec: 39.954
Based on the numbers above, the aggregate read and write throughput across the cluster works out as follows.
Total Read Throughput Across the Cluster (Number of files * Throughput mb/sec) = 488.8 MB/sec
Total Write Throughput Across the Cluster (Number of files * Throughput mb/sec) = 53.8 MB/sec
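The arithmetic above can be reproduced with a quick awk one-liner; the per-file throughput figures are copied straight from the TestDFSIO_results.log output shown earlier:

```shell
# Aggregate throughput = number of files * per-file throughput
# (values taken from the log above: 10 files each for read and write).
read_total=$(awk 'BEGIN { printf "%.1f", 10 * 48.88230607167124 }')
write_total=$(awk 'BEGIN { printf "%.1f", 10 * 5.382930941302368 }')
echo "Total read throughput:  ${read_total} MB/sec"
echo "Total write throughput: ${write_total} MB/sec"
```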
Hope this helps.
Happy Benchmarking !!!
Sunil S Ranka
“Superior BI is the antidote to Business Failure”