During one of our sizing exercises, the ask for server capacity was more than the actual usage of the cluster. Knowing the data and the usage, I was not convinced that we should be asking for more memory and space. That triggered the thought of looking into the fsimage metadata itself.
Conceptually, the fsimage file is the balance sheet of all the files, their existence, and their locations. If we could somehow read the metadata within the file and make sense of it, it could help us as follows:
- Keep the cluster clean.
- Manage space on the server by knowing file duplication and last access times.
- Know which are the longest-running jobs.
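As a sketch of the idea (not the author's exact tooling), the Offline Image Viewer's Delimited processor turns the fsimage into one tab-separated row per path, which a short script can then mine for the use cases above, for example grouping paths by size as a first hint of duplication. The column names below follow the Delimited processor's layout; the sample rows themselves are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Sample rows in the Delimited processor's tab-separated layout.
# Real input would come from: hdfs oiv -p Delimited -i fsimage_... -o fsimage.tsv
SAMPLE = """\
Path\tReplication\tModificationTime\tAccessTime\tPreferredBlockSize\tBlocksCount\tFileSize\tNSQUOTA\tDSQUOTA\tPermission\tUserName\tGroupName
/data/a.csv\t3\t2015-08-01 10:00\t2015-08-02 09:00\t134217728\t1\t1048576\t0\t0\t-rw-r--r--\tetl\thadoop
/backup/a.csv\t3\t2015-08-01 10:05\t2015-08-01 10:05\t134217728\t1\t1048576\t0\t0\t-rw-r--r--\tetl\thadoop
/data/b.csv\t3\t2015-07-01 08:00\t2015-07-01 08:00\t134217728\t1\t2048\t0\t0\t-rw-r--r--\tetl\thadoop
"""

def possible_duplicates(tsv_text):
    """Group file paths by size; an equal size is only a hint of duplication."""
    by_size = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        by_size[row["FileSize"]].append(row["Path"])
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}

print(possible_duplicates(SAMPLE))
# {'1048576': ['/data/a.csv', '/backup/a.csv']}
```

A real cleanup pass would still compare checksums before deleting anything, and the AccessTime column in the same output covers the stale-file question.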
To know more about the files and their attributes:
STEP 1: Download the latest fsimage copy.
$ hdfs dfsadmin -fetchImage /tmp
$ ls -ltr /tmp | grep -i fsimage
-rw-r--r-- 1 root root 22164 Aug 15 17:27 fsimage_0000000000000004389
STEP 2: Run the Offline Image Viewer (oiv) against the downloaded copy.
$ hdfs oiv -i /tmp/fsimage_0000000000000004389 -o /tmp/fsimage.txt
This launches an HTTP server that exposes a read-only WebHDFS API, by default on port 5978.
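Once that server is up, any WebHDFS client can browse the image, e.g. `hdfs dfs -ls webhdfs://127.0.0.1:5978/` or `curl "http://localhost:5978/webhdfs/v1/?op=LISTSTATUS"`. As a hedged sketch, the helper below parses the JSON a LISTSTATUS call returns; the embedded response is a made-up example, and port 5978 assumes the default mentioned above.

```python
import json

# Made-up example of a WebHDFS LISTSTATUS response; a live call would be:
#   curl "http://localhost:5978/webhdfs/v1/?op=LISTSTATUS"
RESPONSE = """{
  "FileStatuses": {
    "FileStatus": [
      {"pathSuffix": "data", "type": "DIRECTORY", "length": 0,
       "accessTime": 0, "modificationTime": 1439647620000},
      {"pathSuffix": "a.csv", "type": "FILE", "length": 1048576,
       "accessTime": 1439647620000, "modificationTime": 1439647620000}
    ]
  }
}"""

def list_entries(response_text):
    """Return (name, type, length) tuples from a LISTSTATUS response."""
    statuses = json.loads(response_text)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]

print(list_entries(RESPONSE))
# [('data', 'DIRECTORY', 0), ('a.csv', 'FILE', 1048576)]
```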
For more details on oiv, you can visit:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
Hope This Helps
Sunil S Ranka
“Superior BI is the antidote to Business Failure”