Sunil S. Ranka's Weblog

Superior Data Analytics is the antidote to Business Failure


What is Oracle Business Intelligence Cloud Service ( BICS )

Posted by sranka on August 11, 2016

Recently we have been getting a lot of traction on BICS, and existing OBIEE customers have been asking about it. In a nutshell:

BI Cloud Service enables organisations of all sizes to quickly and cost-effectively deploy business intelligence with the simplicity of the cloud.

Salient features of BICS :

  • No software installation needed
  • No software maintenance needed
  • No upfront costs, low monthly subscription
  • Customers can get started in hours
  • 100% cloud based
  • Robust reporting with interactive visuals, auto-suggestions, detailed formatting, export, and more
  • Powerful analytics platform with advanced calculations and analytic functions
  • Easy self-serve data loading
  • Rich data integration options
  • Mobile access with no extra programming required
  • Comprehensive sharing framework
  • Role-based fine grain security
  • Simple self-service administration

Key Benefits :

  • Fast access and low cost shorten time to value
  • Quick start means users are productive quickly
  • A single BI platform for all users helps consolidate analytic investments
  • Timely access to data means greater impact
  • Streamlined operations and reduced burden on IT

Summary :

Based on my past experience working with OBIEE on premise, BICS is a good alternative for any IT organisation: it gives all the needed features of an on-premise installation along with flexibility of operation and management, and, most importantly, it is a low-cost solution. In the next few posts, I will describe more of the BICS tools and features.

Hope this helps

Sunil S Ranka

“Superior Data Analytics is the antidote to Business Failure”




Cloud Allergy – Clouds Security and Changing Notion

Posted by sranka on June 30, 2016

In my recent role as CTO/advisor, most of my conversations with analytics leaders within the company turn to concerns over security. In a recent conversation with an entrepreneur friend, one of his solutions had stalled due to a SQL injection issue on the cloud (a valid concern, but is it really?).

During my recent startup stint, the term “cloud allergy” was coined, and it made sense: allergies do exist, you need to get past them, and you only need to worry about the life-threatening ones.

My Early Internet Days

I remember the year 1996, when I created my first email address — –. Back then we were apprehensive about using our real names as part of an email address; now, 20 years later, only hackers and late-night chat rooms create fake IDs. In 2001, when I got my first credit card ($500 credit limit), using it for online shopping was taboo; in fact, until mid-2005 I paid my PG&E bill in person at the authorized facility. The mindset was a fear of using personal or financial information over the public internet.

Changing Notion

Come the year 2013 (within 15 years), using a credit card online is the norm, and giving a credit card number to a Comcast agent sitting overseas is trivial and a nonissue. With Facebook, WhatsApp, Snapchat, and many more social apps, we take pride in, and make an effort at, sharing personal and important moments with our “extended social families” (yes, I just coined a new term). With Google's search data retention capability, I tell my customers that Google knows you better than your wife or partner. Most of us back up our most important documents by emailing them to ourselves.

Most importantly, Kaiser Permanente (a leading national HMO) has all the personal information about your recent visits and vaccinations, and provides secure messaging through its enhanced portal.

With mobile banking capabilities, taking a photo of a cheque to deposit it is just another norm.

With this changing notion, we will get past the “cloud allergy” behaviour, and some of today's security questions and concerns will become trivial or non-issues.

Giant Cloud Providers and Security Capabilities

Look at the big public clouds: AWS, Google, and MS Azure are able to attract more talented individuals than most small to mid-size companies. With the cloud as their core focus, they have hundreds of brilliant minds dedicated to security. A company with a modest budget cannot match the level of expertise prominent cloud providers can bring to security. Just as Fast Deployment, Lower Costs, and Rapid Time to Value have become assumed advantages of the cloud, security is reaching the same level of confidence.

Public clouds are at times much safer than an internal network (the Sony and Target hacks are the best examples we can all point to).

Trust in and adoption of cloud computing continues to grow despite persistent cloud-related security and compliance concerns. Such is the overarching takeaway of Intel Security’s recent report, “Blue Skies Ahead? The State of Cloud Adoption.”

Different Cloud Service Models :  

With the evolving nature of the cloud, understanding the relationships and dependencies between the different cloud service models is critical to understanding cloud computing security risks. IaaS is the foundation of all cloud services, with PaaS building upon IaaS and SaaS, in turn, building upon PaaS.

** Infrastructure as a Service (IaaS) delivers computer infrastructure (typically a platform virtualization environment) as a service, along with raw storage and networking. Rather than purchasing servers, software, data-center space, or network equipment, clients instead buy those resources as a fully outsourced service.

** Software as a Service (SaaS), sometimes referred to as “on-demand software,” is a software delivery model in which software and its associated data are hosted centrally (typically in the (Internet) cloud) and are typically accessed by users with a thin client, normally a web browser over the Internet.

** Platform as a Service (PaaS) is the delivery of a computing platform and solution stack as a service. PaaS offerings facilitate deployment of applications without the cost and complexity of buying and managing the underlying hardware and software and provisioning hosting capabilities. This provides all of the facilities required to support the complete life cycle of building and delivering web applications and services entirely from the Internet.

** Definitions are taken from the internet.

** The figure below shows an example of how a cloud service mapping can be compared against a catalogue of compensating controls to determine which controls exist and which do not — as provided by the consumer, the cloud service provider, or a third party. This can, in turn, be compared to a compliance framework or set of requirements such as PCI DSS, as shown.


** Figure: Mapping the Cloud Model to the Security Control & Compliance Model (text and figure taken from CSA, the Cloud Security Alliance).



Customers need to be made aware of what they are considering moving to the cloud. Not every dataset moved to the cloud needs the same level of security. For a low-criticality dataset, lower security can be used. A high-value dataset with audit and compliance requirements might entail audit and data-retention controls, while a high-value dataset with no regulatory compliance restrictions might need more technical security than data retention. In short, there will always be a place for every type of dataset in the cloud.



Need for Defining Reference Architecture For Big Data

Posted by sranka on May 7, 2014

Hi Fellow Big Data Admirers ,

With big data and analytics playing an influential role helping organizations achieve a competitive advantage, IT managers are advised not to deploy big data in silos but instead to take a holistic approach toward it and define a base reference architecture even before contemplating positioning the necessary tools. 

My latest print-media article (fifth in the series) for CIO magazine (ITNEXT) talks extensively about the need for a reference architecture in Big Data.

Click Here For : Need For Defining Big Data Reference Architecture


Hope you Enjoy Reading this.

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

For a copy of the May 2014 IT Next magazine please visit (my article is on page 37).




How to find out a table type in Hive Metastore.

Posted by sranka on April 10, 2014

Hi All

As the Hive metastore is becoming the central nervous system for different SQL engines such as Shark and Impala, it is getting equally difficult to distinguish the type of a table created in the Hive metastore. For example, if we create an Impala table using impala-shell, you will see the same table at the Hive prompt, and vice versa. See the example below.


Step 1 : “Create Table” in Impala Shell and “Show Table” On HIVE Shell

[] > create table impala_table ( id bigint);

[] > show tables 'impala_table';

Query: show tables 'impala_table'
Query finished, fetching results ...
| name             |
| impala_table |
Returned 1 row(s) in 0.01s

hive> show tables 'impala_table';
Time taken: 0.073 seconds

Step 2 : “Create Table” in Hive Shell

hive> create table hive_table ( id bigint);
Time taken: 0.058 seconds

Step 3 : Invalidate Metadata on Impala Shell ( This may not be needed always )

[] > invalidate metadata;
Query: invalidate metadata
Query finished, fetching results ...

Returned 0 row(s) in 5.11s

Step 4 : “Show Table” On Impala Shell


[] > show tables 'hive_table';
Query: show tables 'hive_table'
Query finished, fetching results ...
| name       |
| hive_table |
Returned 1 row(s) in 0.01s

In short, this proves that tables are visible in both shells. Use the describe formatted <table name> command to find out the details: the Storage Desc Params section will show a serialization.format value for a Hive table, whereas for an Impala table there will be no such value.


hive> describe formatted hive_table;
# col_name              data_type               comment

id                      bigint                  None

# Detailed Table Information
Database:               default
Owner:                  rsunil
CreateTime:             Thu Apr 10 13:13:09 PDT 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://
Table Type:             MANAGED_TABLE
Table Parameters:
transient_lastDdlTime   1397160789

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:

serialization.format    1

Time taken: 0.115 seconds


hive> describe formatted impala_table;
# col_name data_type comment

id bigint None

# Detailed Table Information
Database: default
Owner: rsunil
CreateTime: Thu Apr 10 13:10:30 PDT 2014
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://
Table Parameters:
transient_lastDdlTime 1397160630

# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
Compressed: No
Num Buckets: 0
Bucket Columns: []
Sort Columns: []
Time taken: 0.185 seconds



Tables created in Impala with the Parquet format will give the class exception below.

hive> describe formatted parquet_ob_mdm_et28;
FAILED: RuntimeException java.lang.ClassNotFoundException: com.cloudera.impala.hive.serde.ParquetInputFormat
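The check described above can also be scripted rather than eyeballed. The sketch below is only a heuristic based on the observation in this post, and it assumes the plain-text layout of `describe formatted` shown here; the function name is my own, not any Hive API:

```python
def looks_like_hive_table(describe_output: str) -> bool:
    """Heuristic: a Hive-created table's 'Storage Desc Params' section
    carries a serialization.format entry; an Impala-created table's
    describe formatted output lacks it."""
    in_params = False
    for line in describe_output.splitlines():
        if line.strip().startswith("Storage Desc Params"):
            in_params = True
        elif in_params and "serialization.format" in line:
            return True
    return False

# Tiny samples mimicking the outputs above.
hive_sample = "Storage Desc Params:\n\nserialization.format\t1\n"
impala_sample = "Sort Columns: []\nTime taken: 0.185 seconds\n"
print(looks_like_hive_table(hive_sample), looks_like_hive_table(impala_sample))  # True False
```

In practice you would feed it the captured output of `hive -e "describe formatted <table>"`.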

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


HDFS Free Space Command

Posted by sranka on March 17, 2014

Hi All

With increasing data volume, space in HDFS can be a continued challenge. While running into a space-related issue, the following commands came in very handy, hence thought of sharing them with the extended virtual community.

At times it gets challenging to know how much actual space a directory or a file is using. Having a command which gives you sizes in human-readable format is always useful. The command below shows how to get actual human-readable file sizes on HDFS:

hdfs dfs -du -h /

241.3 G  /app
9.8 G    /benchmarks
309.6 G  /hbase
0        /system
59.6 G   /tmp
20.0 G   /user
[sranka@devHadoopSrvr06 ~]$
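For reference, the -h flag's human-readable formatting follows the usual divide-by-1024 convention. Here is a minimal sketch of the same conversion (the function name is my own, not from any Hadoop API):

```python
def human_readable(num_bytes: float) -> str:
    """Format a byte count roughly the way `hdfs dfs -du -h` prints it."""
    for unit in ("B", "K", "M", "G", "T", "P"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} E"

print(human_readable(241.3 * 1024**3))  # the /app figure above -> 241.3 G
```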


hadoop dfsadmin -report

After running the command, below is the result: it takes all the nodes in the cluster and gives a detailed break-up based on space available and space used.

Configured Capacity: 13965170479105 (12.70 TB)
Present Capacity: 4208469598208 (3.83 TB)
DFS Remaining: 2120881930240 (1.93 TB)
DFS Used: 2087587667968 (1.90 TB)
DFS Used%: 49.60%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 5 (5 total, 0 dead)

Live datanodes:
Name: (
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 381953257472 (355.72 GB)
Non DFS Used: 1986904386765 (1.81 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 13.68%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:05 PDT 2014

Name: (
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 402465816576 (374.83 GB)
Non DFS Used: 1966391827661 (1.79 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 14.41%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:05 PDT 2014

Name: (
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 391020421120 (364.17 GB)
Non DFS Used: 1977837223117 (1.80 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 14.00%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:06 PDT 2014

Name: (
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 389182472192 (362.45 GB)
Non DFS Used: 1979675172045 (1.80 TB)
DFS Remaining: 424176451584 (395.05 GB)
DFS Used%: 13.93%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:04 PDT 2014

Name: (
Rack: /default
Decommission Status : Normal
Configured Capacity: 2793034095821 (2.54 TB)
DFS Used: 522965700608 (487.05 GB)
Non DFS Used: 1845892140237 (1.68 TB)
DFS Remaining: 424176254976 (395.04 GB)
DFS Used%: 18.72%
DFS Remaining%: 15.19%
Last contact: Mon Mar 17 12:43:05 PDT 2014
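As a sanity check on the report, the summary figures are internally consistent: Present Capacity equals DFS Used plus DFS Remaining, and DFS Used% is computed against Present Capacity rather than Configured Capacity. A quick verification using the numbers above:

```python
# Summary figures (in bytes) from the report above.
configured    = 13965170479105   # 12.70 TB
present       = 4208469598208    # 3.83 TB
dfs_used      = 2087587667968    # 1.90 TB
dfs_remaining = 2120881930240    # 1.93 TB

# Present Capacity = DFS Used + DFS Remaining
assert dfs_used + dfs_remaining == present

# DFS Used% is relative to Present Capacity, not Configured Capacity.
print(f"DFS Used%: {100 * dfs_used / present:.2f}%")   # matches the 49.60% above
print(f"Configured: {configured / 1024**4:.2f} TB")    # matches the 12.70 TB above
```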

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To Run Graph In Endeca Outside of Endeca Server

Posted by sranka on January 15, 2014

Hi All

Recently, at one of my clients, we had a situation where Hadoop was taking a lot longer than the anticipated time to generate a file. A graph needed the file as an input, but since the file was not being generated on time, the Endeca graph was picking up a partially created file, causing data issues. After looking into the issue, the best bet was to have a task dependency. We looked into running the CloverETL graph from the command line, but due to a product limitation we were not able to.

After discussing with Chris Lynskey from Oracle (original Endeca team), I found that the following simpleHttpApi parameters could work:


Parameter: graphID
Description: Text ID, unique within the specified sandbox; a file path relative to the sandbox root. For example, graph%2FLoadViewDefinitions.grf (the “/” needs to be encoded as “%2F”).
Mandatory: Yes

Parameter: sandbox
Description: Sandbox code. In cluster mode it is the ID of the node that should execute the job; however, this is not final: if the graph is distributed, or the node is disconnected, the graph may be executed on some other node.

Description: MESSAGE | FULL
Mandatory: No
Default: MESSAGE

For more on the HTTP API, please refer to the Latitude Data Integrator Server Guide.
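Since the graph ID must be URL-encoded before it is passed to the API, a quick way to build the request URL is to let the standard library do the escaping. This is only a sketch: the host, port, and sandbox name below are hypothetical, and the exact simpleHttpApi endpoint path should be checked against your server guide:

```python
from urllib.parse import urlencode

# Hypothetical host/port/sandbox; the graph_run endpoint path is an
# assumption based on the simpleHttpApi naming discussed above.
base = "http://integration-server:8080/clover/simpleHttpApi/graph_run"
query = urlencode({
    "graphID": "graph/LoadViewDefinitions.grf",  # "/" becomes %2F
    "sandbox": "mySandbox",
})
url = f"{base}?{query}"
print(url)
```

The resulting URL can then be called from a scheduler step that only fires once the Hadoop file is fully written, giving the task dependency described above.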

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To retrieve/backup Views In Endeca

Posted by sranka on January 3, 2014

Hi All,

For the last few weeks I have been engaged with a customer, helping them with the remediation of an Endeca project. During remediation, we faced a typical challenge where all the graphs and EQLs were erroring out. After doing some research, I found out that it is a known issue. I spent a good amount of time on this issue, hence thought of sharing this trivial but useful information.

Issue :

Endeca views get lost during development, causing all the dependent graphs and EQLs to error out.

Root Cause :

Unknown (could be a potential product issue)

Solution :

After adding all the view definitions, run the Export View Definitions graph provided by the Endeca examples; see the link below for details.

After running the export graph, the view definitions get stored in an XML file in the view-manager directory under config-in; see the picture below:


Take a backup of the file and store it at a secured location, or version-control the file. In case of a view definition loss, go to the integration server URL and click on the following:

SandBox –> Project Name –>Config-in –> View-manager –> viewDefinition.xml –> fileEditor


Copy and paste the content and hit the UpdateFile button. Once you click the UpdateFile button, run the Import View Definitions graph, shown below.
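To make the backup step repeatable, a small script can copy the exported file to a timestamped backup each time the export graph runs. The paths below are hypothetical; point them at your own sandbox's config-in directory:

```shell
#!/bin/sh
# Hypothetical paths; adjust to your sandbox layout.
VIEW_XML="config-in/view-manager/viewDefinition.xml"
BACKUP_DIR="view-backups"

mkdir -p "$BACKUP_DIR"
if [ -f "$VIEW_XML" ]; then
    # Keep one copy per export run, named by timestamp.
    cp "$VIEW_XML" "$BACKUP_DIR/viewDefinition-$(date +%Y%m%d-%H%M%S).xml"
    echo "backed up $VIEW_XML"
else
    echo "no view definition found at $VIEW_XML; run the export graph first"
fi
```

Committing the backup directory to version control gives you the restore point needed for the copy-paste recovery step above.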

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Bye Bye 2013 !!! Year of Big Data

Posted by sranka on January 3, 2014

Hi All,

Wishing all you readers a very happy new year! 2013 is over and the dawn of 2014 has arrived. It feels like just yesterday, and now here we are, sitting and waiting for the year number to change. As I write this blog, Australia, Mumbai, and Dubai have already seen the dawn of the new year. Hoping to finish this writing before the dawn reaches New York.

2013: A Year of Awareness and Adoption

What a year! A year of Big Data adoption, cloud BI, and crunching more and more data. The year started with clients talking more about a cohesive architecture spanning traditional BI and Big Data. I talked with many clients, helping them understand the need for Big Data and qualifying or disqualifying their Big Data use cases.

From Strategy To Real World Use

Now that business analytics is here and enterprises are grappling with their own Big Data, it's time to set some technical strategies in motion to harness these assets. Fortunately, data-center solutions that can deliver both high-performance computing (HPC) and Big Data analytics are becoming increasingly scalable and affordable, even for medium-size businesses. With the availability of these solutions, Big Data is getting easier and easier, with more and more affordable options.

OOW 2013: Year of the America's Cup, Cloud, and Big Data!

Open World was like a saga: Larry arranged a big red carpet with a wide-screen TV and a live telecast of the America's Cup. I had never enjoyed a sport in this setting, a sport about which most of the viewers had no clue, yet we were glued to the wide screen, cheering for Larry's team. Larry did it again: by the time the results were announced, everyone knew about the sport. In my memory, OOW 2013 will be remembered for the America's Cup venue and a “NO SHOW” from Larry. For me, listening to Larry live is the highlight of OOW, but luckily I caught him on Sunday. Hats off to Thomas Kurian for jumping in live and showcasing the demo, which had no meaning for most of the audience. But I must say, only LARRY ELLISON can do a no-show in front of thousands of attendees with the strongest media presence. He is still the most loved/hated technology person.

This year Open World was all about cloud and Big Data (though one could say that Oracle is not a cloud company), and the in-memory database was another talk of the town. Apart from a few features, there was not much in the BI space. The best feature, I felt, was the self-service option in Endeca. As usual, the big party at Treasure Island was fun.


2014 will be a year of Big Data; it will be a natural progression for BI customers to implement Big Data.

2013 was a great year from a personal as well as professional point of view. We had our share of joy and happiness; with multiple flights and many segments, I still managed to spend time with the kids. With a few regrets, I am looking forward to a positive 2014.

In all, 2013 was a satisfactory year, and I am waiting for 2014. Thank you for all your continued support, and I look forward to the same in 2014.

Wishing Every one a Happy and Prosperous New Year, Be Safe!!!!!!


Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Hadoop Data Replication Strategy

Posted by sranka on October 17, 2013

Hi All

Replication and fault tolerance are inbuilt features of Hadoop, and I was always curious to know how blocks are replicated. I got this information while reading “Hadoop: The Definitive Guide” (3rd edition), in chapter 3, “The Hadoop Distributed Filesystem”. Thought it would be interesting to share.

  • How does the namenode choose which datanodes to store replicas on?

Hadoop’s default strategy is to place the first replica on the same node as the client (for clients running outside the cluster, a node is chosen at random, although the system tries not to pick nodes that are too full or too busy). The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random. Further replicas are placed on random nodes on the cluster, although the system tries to avoid placing too many replicas on the same rack.

The above text has been taken from chapter 3 of “Hadoop: The Definitive Guide” (3rd edition).
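The quoted placement rules can be sketched in a few lines. This is purely illustrative, not HDFS's actual BlockPlacementPolicy code, and it ignores the free-space and load checks the real namenode applies; all names are my own:

```python
import random

def place_replicas(client_node, racks, n=3):
    """Sketch of HDFS's default placement for the first three replicas.

    racks: dict mapping rack name -> list of datanode names.
    Assumes the client runs on a datanode, there are at least two racks,
    and the chosen off-rack has at least two nodes."""
    node_rack = {node: r for r, nodes in racks.items() for node in nodes}
    # Replica 1: the client's own node.
    first = (node_rack[client_node], client_node)
    # Replica 2: a random node on a different rack (off-rack).
    second_rack = random.choice([r for r in racks if r != first[0]])
    second = (second_rack, random.choice(racks[second_rack]))
    # Replica 3: same rack as the second, but a different node.
    third = (second_rack,
             random.choice([x for x in racks[second_rack] if x != second[1]]))
    return [first, second, third][:n]
```

For example, with two racks of two nodes each and a client on rack1, the first replica stays local, while the second and third always land on two different rack2 nodes.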

Hope This helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To Run Hadoop Benchmarking TestDFSIO on Cloudera Clusters

Posted by sranka on October 9, 2013

Hi All

Out of the box, Hadoop provides a benchmarking mechanism for your cluster. Doing the same on a Cloudera cluster was a fun ride, hence thought I would share, to reduce the pain and increase the fun.

Before you begin anything, set HADOOP_HOME. The command below works for RHEL.


For CDH, “TestDFSIO” resides in hadoop-mapreduce-client-jobclient-<version>-cdh<version>-tests.jar under “lib/hadoop-mapreduce/” in the Cloudera home directory; in my case:


You will need to run the write and read benchmarks as below:

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.0-cdh4.3.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

Once you run the tests you will see a “TestDFSIO_results.log” file in the same directory. Its content will look like the below:

----- TestDFSIO ----- : write
 Date & time: Wed Oct 09 14:56:14 PDT 2013
 Number of files: 10
Total MBytes processed: 10000.0
 Throughput mb/sec: 5.382930941302368
Average IO rate mb/sec: 5.390388488769531
 IO rate std deviation: 0.20763769922620628
 Test exec time sec: 211.457

----- TestDFSIO ----- : read
 Date & time: Wed Oct 09 14:57:47 PDT 2013
 Number of files: 10
Total MBytes processed: 10000.0
 Throughput mb/sec: 48.88230607167124
Average IO rate mb/sec: 49.50707244873047
 IO rate std deviation: 5.8465670196729596
 Test exec time sec: 39.954

Based on the numbers above, below are the read and write throughputs across the cluster.

Total Read Throughput Across the Cluster (Number of files * Throughput mb/sec) = 488.8 MB/sec
Total Write Throughput Across the Cluster (Number of files * Throughput mb/sec) = 53.8 MB/sec
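The arithmetic behind those cluster-level numbers is simply the per-file throughput reported by TestDFSIO multiplied by the number of files written or read concurrently:

```python
# Per-file throughput (mb/sec) from the TestDFSIO log above.
n_files  = 10
read_tp  = 48.88230607167124
write_tp = 5.382930941302368

total_read  = n_files * read_tp    # ~488.8 MB/sec across the cluster
total_write = n_files * write_tp   # ~53.8 MB/sec across the cluster
print(f"read: {total_read:.1f} MB/sec, write: {total_write:.1f} MB/sec")
```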

Hope This helps

Happy Benchmarking !!!

Sunil S Ranka
“Superior BI is the antidote to Business Failure”

