Sunil S. Ranka's Weblog

Superior Data Analytics is the antidote to Business Failure

Archive for the ‘HBase’ Category

How To Create External Hive Table on HBase

Posted by sranka on March 28, 2014

Hi All,

While building a data flow for replacing one of the EDW’ workflow using Big Data technology stack , came across some interesting findings and issues.  Due to  UPSERT ( INSERT new records or UPDATE existing records depending) nature of data we had to use Hbase, but to expose the outbound feed we need to do some calculation on HBase and publish that to Hive as external. Even though conceptually , its easy to create an external hive table on HBase is simple, but I had to go through some hoop.


Table Creation in hbase
hbase(main):002:0> create 'mytable', 'cf'
hbase(main):004:0> put 'mytable', 'first', 'cf:message', 'hello HBase'
hbase(main):005:0> put 'mytable', 'second', 'cf:foo', 0x0
0 row(s) in 0.0130 seconds
hbase(main):006:0> put 'mytable', 'third', 'cf:bar', 3.14159
0 row(s) in 0.0080 second

hbase(main):002:0> describe 'mytable'
'2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false'
, ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.9610 seconds

hbase(main):003:0> scan 'mytable'
first column=cf:foo, timestamp=1395167684857, value=0
first column=cf:message, timestamp=1395167407496, value=hello HBase
second column=cf:foo, timestamp=1395167483988, value=0
third column=cf:bar, timestamp=1395167493639, value=3.14159
3 row(s) in 0.0760 seconds

Table Creation in Hive

Hive >  CREATE EXTERNAL TABLE hbase_table_3(key string, value string,value1 string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:foo,cf:message") TBLPROPERTIES ("" = "mytable");

Table Access in Hive

hive> set hbase.client.scanner.caching=50000;
hive> desc hbase_table_3;
key     string  from deserializer
value   string  from deserializer
value1  string  from deserializer
Time taken: 0.428 seconds

Zookeeper ,  Aux Path and hbase.client.scanner.caching 

Zookeeper is an important part of  Hadoop ecosystem, it works as a Resource Management service. You would need to make sure that it has a quorum with odd numbers (1,3,5) of instances. For accessing external table you need to have zookeeper services up and running.  Along with zookeeper you will need to make few changes.

If you have big HBase table, you will need to set higher hbase.client.scanner.caching property before running the Hbase query. In our case we use 50000.

See below changes needed to hive-site.xml and hbase-site.xml.

Changes in hive-site.xml



Changes in hbase-site.xml


 Linux Performance Tuning

Some of the following commands have helped enhancing performance.

echo 1 > /proc/sys/vm/drop_caches
echo 2 > /proc/sys/vm/drop_caches
echo 3 > /proc/sys/vm/drop_caches


Special Thanks to Aditi Hedge, Rathinavel Sivaswamy and Anurag Gupta for their inputs.

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

Posted in Big Data, HBase | Tagged: , , , , , , , , | Leave a Comment »

%d bloggers like this: