Sunil S. Ranka's Weblog

Superior Data Analytics is the antidote to Business Failure

Posts Tagged ‘Hive’

Need for Defining Reference Architecture For Big Data

Posted by sranka on May 7, 2014

Hi Fellow Big Data Admirers ,

With big data and analytics playing an influential role in helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos, but to take a holistic approach and define a base reference architecture even before contemplating which tools to position.

My latest print-media article (5th in the series) for CIO magazine (ITNEXT) talks extensively about the need for a reference architecture for Big Data.

Click Here For : Need For Defining Big Data Reference Architecture


Hope you enjoy reading this.

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

For a copy of the May 2014 IT Next magazine, please visit http://www.itnext.in/digital_assets/330/IT-NEXT-Vol-05-Issue-04-May-2014.pdf (my article is on page 37).


Posted in Big Data

How To Create External Hive Table on HBase

Posted by sranka on March 28, 2014

Hi All,

While building a data flow to replace one of our EDW workflows using the Big Data technology stack, I came across some interesting findings and issues. Due to the UPSERT nature of the data (INSERT new records or UPDATE existing ones, depending), we had to use HBase; but to expose the outbound feed we needed to do some calculations on HBase and publish the results to Hive as an external table. Conceptually, creating an external Hive table on HBase is simple, but I had to jump through some hoops.

Table Creation in HBase
hbase(main):002:0> create 'mytable', 'cf'
hbase(main):004:0> put 'mytable', 'first', 'cf:message', 'hello HBase'
hbase(main):005:0> put 'mytable', 'second', 'cf:foo', 0x0
0 row(s) in 0.0130 seconds
hbase(main):006:0> put 'mytable', 'third', 'cf:bar', 3.14159
0 row(s) in 0.0080 seconds

hbase(main):002:0> describe 'mytable'
DESCRIPTION ENABLED
'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICA true
TION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL =>
'2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false'
, ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.9610 seconds

hbase(main):003:0> scan 'mytable'
ROW COLUMN+CELL
first column=cf:foo, timestamp=1395167684857, value=0
first column=cf:message, timestamp=1395167407496, value=hello HBase
second column=cf:foo, timestamp=1395167483988, value=0
third column=cf:bar, timestamp=1395167493639, value=3.14159
3 row(s) in 0.0760 seconds
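
The UPSERT behaviour mentioned earlier is exactly what HBase's put gives you: writing to an existing row/column simply creates a new cell version, and reads return the latest one. A quick illustration (a sketch; timestamps will differ on your cluster):

hbase(main):007:0> put 'mytable', 'first', 'cf:message', 'hello again'
hbase(main):008:0> get 'mytable', 'first', 'cf:message'
COLUMN                CELL
 cf:message           timestamp=..., value=hello again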

Table Creation in Hive


hive> CREATE EXTERNAL TABLE hbase_table_3(key string, value string, value1 string)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:foo,cf:message")
      TBLPROPERTIES ("hbase.table.name" = "mytable");

Table Access in Hive

hive> set hbase.client.scanner.caching=50000;
hive> desc hbase_table_3;
OK
key     string  from deserializer
value   string  from deserializer
value1  string  from deserializer
Time taken: 0.428 seconds
hive>
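
With the mapping in place, a quick sanity check against the data loaded earlier (a hedged sketch of the expected output, based on the scan above; note that HBase rows containing none of the mapped columns, such as 'third' with only cf:bar, are typically not returned):

hive> select key, value, value1 from hbase_table_3;
OK
first    0    hello HBase
second   0    NULL
Time taken: ...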

ZooKeeper, Aux Path and hbase.client.scanner.caching

ZooKeeper is an important part of the Hadoop ecosystem; it provides distributed coordination services (HBase uses it to track region servers and the active master). You need to make sure it has a quorum with an odd number of instances (1, 3, 5). For accessing the external table, the ZooKeeper services must be up and running. Along with ZooKeeper, you will need to make a few configuration changes.
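
A quick way to check that each ZooKeeper instance is up and serving requests (a sketch using ZooKeeper's four-letter-word commands on the default client port 2181; the hostname is one of the quorum members configured below):

echo ruok | nc devapphdp07.samvi.com 2181   # replies "imok" if the server is healthy
echo stat | nc devapphdp07.samvi.com 2181   # shows whether this node is leader or follower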

If you have a big HBase table, you will need to set the hbase.client.scanner.caching property to a higher value before running the query. In our case we used 50000.
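
To persist this instead of setting it per session, the same property can also be placed in hbase-site.xml (a sketch; the right value depends on row size and available client memory):

<property>
  <name>hbase.client.scanner.caching</name>
  <value>50000</value>
</property>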

See below changes needed to hive-site.xml and hbase-site.xml.

Changes in hive-site.xml

<property>
  <name>hive.zookeeper.quorum</name>
   <value>devapphdp08.samvi.com,devapphdp09.samvi.com,devapphdp07.samvi.com</value>
</property>
<property>
 <name>hive.aux.jars.path</name>
 <value>file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.6.0.jar,file:///usr/lib/hive/lib/hbase.jar,file:///usr/lib/zookeeper/zookeeper.jar</value>
</property>
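
As an alternative to editing hive-site.xml, the same jars can be passed when starting the Hive CLI (a sketch using the jar paths above):

hive --auxpath /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.6.0.jar,/usr/lib/hive/lib/hbase.jar,/usr/lib/zookeeper/zookeeper.jar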

 

Changes in hbase-site.xml

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>devapphdp08.samvi.com,devapphdp09.samvi.com,devapphdp07.samvi.com</value>
</property>

 Linux Performance Tuning

The following commands have helped enhance performance by freeing the Linux page cache and kernel slab caches (run as root):

echo 1 > /proc/sys/vm/drop_caches   # free the page cache
echo 2 > /proc/sys/vm/drop_caches   # free dentries and inodes (slab caches)
echo 3 > /proc/sys/vm/drop_caches   # free page cache, dentries and inodes
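
The kernel documentation recommends flushing dirty pages to disk before dropping caches, so in practice we run it along these lines (a sketch):

sync; echo 3 > /proc/sys/vm/drop_caches   # flush dirty pages first, then drop all caches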


Special Thanks to Aditi Hedge, Rathinavel Sivaswamy and Anurag Gupta for their inputs.

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

Posted in Big Data, HBase

Big Data : Right Approach Right Solution

Posted by sranka on February 1, 2014

Hi All,

Over the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion always gets to the bottom of one question: do they really need Big Data? The link below to my ITNext article discusses how, as big data grows bigger, IT managers are challenged with identifying the data that qualifies as "big" and finding appropriate solutions to process it.

Click Here To Read Full Article  : Right Approach, Right Solution 

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

Posted in Uncategorized

How To Find Size Of Table In Hive / HDFS

Posted by sranka on November 19, 2013

Hi All

Volume being the constant challenge in Big Data, as an administrator you have to keep a tab on data growth, and at the same time make sure there is no surge in the growth of unwanted objects or folders. Typically you worry about data growth at the GB level, hence the script below, which translates your current folder sizes into GB. Anything below a GB is shown as 0. It is a simple script; you can modify it to track MB-level details as well by changing the power of the 1024 multiplier (a variant is shown after the script).

sudo -u hdfs hadoop fs -du /app/hadoop/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
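
For MB-level granularity, the only change is dropping the power of 1024 from 3 to 2 (a minimal variant of the script above):

sudo -u hdfs hadoop fs -du /app/hadoop/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**2)) " [MB]\t" $2 }'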

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

Posted in Big Data

Permission Denied Exception When Running a count(*) Query on Hive/Impala

Posted by sranka on October 8, 2013

Hi All

While running a simple count query:

beeline > select count(*) from samvi_test_table;

I got the following error:

Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:149)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4716)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4698)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3035)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:2999)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2980)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:648)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:419)

After debugging into the details, I found that to run the count(*), the job submission needs to create a staging directory under the submitting user's HDFS home directory (/user/root in this case). Since user root had no home directory and no write permission on /user (hence the access=WRITE in the trace above), the query failed. The issue was resolved using the commands below:


sudo -u hdfs hadoop fs -mkdir /user/root
sudo -u hdfs hadoop fs -chown root:root /user/root
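
To confirm the fix took effect before re-running the count (a quick check):

sudo -u hdfs hadoop fs -ls /user   # /user/root should now exist, owned by root:root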

Hope this helps.

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

Posted in Uncategorized
