Sunil S. Ranka's Weblog

Superior Data Analytics is the antidote to Business Failure

Posts Tagged ‘HDFS’

Accessing HDFS files on local File system using mountableHDFS – FUSE

Posted by sranka on April 9, 2015

Hi All

Recently we had a requirement to merge files after a Map and Reduce job. Since the files needed to be handed to an outbound team outside of the Hadoop development team, having them on the local file system would have been ideal. The customer's IT team worked with Cloudera and gave us a mount point using a utility called “mountableHDFS”, aka FUSE (Filesystem in Userspace).

mountableHDFS allows HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of HDFS using standard Unix utilities such as ‘ls’, ‘cd’, ‘cp’, ‘mkdir’, ‘find’, and ‘grep’.
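As an illustration of what the hand-off looked like, here is a rough sketch of the mount and copy on a node where the Cloudera hadoop-fuse-dfs package is installed. The NameNode host, port, mount point, and file paths below are placeholders, not our actual cluster details:

mkdir -p /mnt/hdfs
hadoop-fuse-dfs dfs://namenode-host:8020 /mnt/hdfs
ls /mnt/hdfs/user
cp /mnt/hdfs/user/sranka/output/part-r-00000 /tmp/merged_output.txt

Once the files have been handed over, the mount can be released with a normal umount /mnt/hdfs (run as root or via sudo).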

For more details on mountableHDFS:

https://wiki.apache.org/hadoop/MountableHDFS

For how to configure it on Cloudera:

http://www.cloudera.com/content/cloudera/en/documentation/core/v5-2-x/topics/cdh_ig_hdfs_mountable.html

 

Special thanks to Aditi Hedge for bringing this to my attention.

Hope This Helps,

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How to read HDFS fsImage file

Posted by sranka on April 2, 2015

During one of the sizing exercises, the ask for server capacity was more than the actual usage of the cluster. Knowing the data and its usage, I was not convinced that we should be asking for more space. That triggered the thought of reading the fsImage file itself to see what the cluster actually holds.

Conceptually, the fsImage file is the balance sheet of every file in HDFS: its existence and its location. If we could read the metadata within the file and make sense out of it, it could help us as follows:

  • Keep the cluster clean.
  • Manage space on the cluster by knowing file duplication and last access times.
  • Know which are the longest-running jobs.

To learn more about the files and their attributes, follow the steps below:

STEP 1: Download the latest fsimage copy.

$ hdfs dfsadmin -fetchImage /tmp

$ ls -ltr /tmp | grep -i fsimage
-rw-r--r-- 1 root root 22164 Aug 15 17:27 fsimage_0000000000000004389

STEP 2: Run the Offline Image Viewer (oiv) on the downloaded image.

$ hdfs oiv -i /tmp/fsimage_0000000000000004389 -o /tmp/fsimage.txt

This launches an HTTP server that exposes a read-only WebHDFS API, by default on port 5978.
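From there you can either browse the image through WebHDFS-style calls, or re-run oiv with the Delimited processor to get a flat, grep-able dump with per-file size, replication, and access times. A quick sketch, assuming the viewer is running locally and a reasonably recent Hadoop release:

curl -i "http://127.0.0.1:5978/webhdfs/v1/?op=LISTSTATUS"
hdfs dfs -ls webhdfs://127.0.0.1:5978/

hdfs oiv -p Delimited -i /tmp/fsimage_0000000000000004389 -o /tmp/fsimage_delimited.txt
head /tmp/fsimage_delimited.txt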

For more details on oiv, you can visit:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html

 

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


HDFS Free Space Command

Posted by sranka on March 17, 2014

Hi All

With increasing data volumes, space in HDFS can be a continuing challenge. While running into a space-related issue, the following commands came in very handy, hence I thought of sharing them with the extended virtual community.

At times it gets challenging to know how much space a directory or a file is actually using. Having a command that reports sizes in a human-readable format is always useful. The command below shows how to get human-readable file sizes on HDFS:

hdfs dfs -du -h /

241.3 G  /app
9.8 G    /benchmarks
309.6 G  /hbase
0        /system
59.6 G   /tmp
20.0 G   /user
[sranka@devHadoopSrvr06 ~]$
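If you only want a single total for a directory rather than a per-child breakdown, adding -s summarizes it into one line (the path here is just an example):

hdfs dfs -du -s -h /user/sranka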

 

hadoop dfsadmin -report

Below is the result of running the command; it covers all the nodes in the cluster and gives a detailed break-up of space available and space used, both for the cluster as a whole and per datanode.


Configured Capacity: 13965170479105 (12.70 TB)
Present Capacity: 4208469598208 (3.83 TB)
DFS Remaining: 2120881930240 (1.93 TB)
DFS Used: 2087587667968 (1.90 TB)
DFS Used%: 49.60%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 5 (5 total, 0 dead)

Live datanodes:
Name: 160.33.148.202:50010 (devHadoopSrvr08.ps.am.mycompany.com)
Hostname: devHadoopSrvr08.ps.am.mycompany.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 381953257472 (355.72 GB)
Non DFS Used: 1986904386765 (1.81 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 13.68%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:05 PDT 2014

Name: 160.33.148.204:50010 (devHadoopSrvr10.ps.am.mycompany.com)
Hostname: devHadoopSrvr10.ps.am.mycompany.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 402465816576 (374.83 GB)
Non DFS Used: 1966391827661 (1.79 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 14.41%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:05 PDT 2014

Name: 160.33.148.203:50010 (devHadoopSrvr09.ps.am.mycompany.com)
Hostname: devHadoopSrvr09.ps.am.mycompany.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 391020421120 (364.17 GB)
Non DFS Used: 1977837223117 (1.80 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 14.00%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:06 PDT 2014

Name: 160.33.148.201:50010 (devHadoopSrvr07.ps.am.mycompany.com)
Hostname: devHadoopSrvr07.ps.am.mycompany.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 389182472192 (362.45 GB)
Non DFS Used: 1979675172045 (1.80 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 13.93%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:04 PDT 2014

Name: 160.33.148.59:50010 (devHadoopSrvr06.ps.am.mycompany.com)
Hostname: devHadoopSrvr06.ps.am.mycompany.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 522965700608 (487.05 GB)
Non DFS Used: 1845892140237 (1.68 TB)
DFS Remaining: 424176254976 (395.04 GB)
DFS Used%: 18.72%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:05 PDT 2014
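When it is just per-node utilization you are after, piping the same report through grep trims it down to the interesting lines (plain grep, nothing cluster-specific):

hdfs dfsadmin -report | grep -E 'Hostname|DFS Used%|DFS Remaining%'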

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Big Data: Right Approach, Right Solution

Posted by sranka on February 1, 2014

Hi All,

For the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion gets to the bottom of: do they really need Big Data? The link below to my ITNext article talks about how, as big data grows bigger, IT managers are challenged with identifying the data that actually qualifies as big and finding appropriate solutions to process it.

Click here to read the full article: Right Approach, Right Solution

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To Find Size Of Table In Hive / HDFS

Posted by sranka on November 19, 2013

Hi All

Data volume being the constant challenge on Big Data, as an administrator you have to keep a tab on data growth, and at the same time make sure there is no spurious growth of unwanted objects or folders. Typically you care about data growth at the GB level, hence below is a script you can use to translate your current folder sizes to GB. Anything below a GB is shown as 0. This is a simple script; you can modify it to track MB-level details as well by changing the power on the 1024 factor, as shown after the GB version below.

sudo -u hdfs hadoop fs -du /app/hadoop/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
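As mentioned above, the same script at MB granularity just swaps the exponent on 1024 from 3 to 2 (everything else, including the warehouse path, stays the same):

sudo -u hdfs hadoop fs -du /app/hadoop/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**2)) " [MB]\t" $2 }'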

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Permission Denied Exception When Running a count(*) Query on Hive/Impala

Posted by sranka on October 8, 2013

Hi All

While running a simple query:

beeline > select count(*) from samvi_test_table;

I got the following error:

Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4716)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4698)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3035)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2999)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2980)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:648)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:419)

After debugging into the details, I found that to run the count(*) the job needs a writable home directory under /user for its scratch files, and the root user did not have the needed privileges there (the exception shows access=WRITE on /user). The issue was resolved using the commands below:


sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root:root /user/root
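As a quick sanity check before re-running the count(*), you can confirm that the new home directory exists and is owned by root (a plain listing, nothing more):

sudo -u hdfs hadoop fs -ls /user | grep root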

Hope this helps.

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


 