Sunil S. Ranka's Weblog

Superior Data Analytics is the antidote to Business Failure

Posts Tagged ‘Sunil Ranka’

What is Oracle Business Intelligence Cloud Service (BICS)?

Posted by sranka on August 11, 2016

Recently we have been getting a lot of traction on BICS, with existing OBIEE customers asking about it. In a nutshell:

BI Cloud Service enables organisations of all sizes to quickly and cost-effectively deploy business intelligence with the simplicity of the cloud.

Salient features of BICS:

  • No software installation needed
  • No software maintenance needed
  • No upfront costs, low monthly subscription
  • Customers can get started in hours
  • 100% cloud based
  • Robust reporting with interactive visuals, auto-suggestions, detailed formatting, export, and more
  • Powerful analytics platform with advanced calculations and analytic functions
  • Easy self-serve data loading
  • Rich data integration options
  • Mobile access with no extra programming required
  • Comprehensive sharing framework
  • Role-based, fine-grained security
  • Simple self-service administration

Key Benefits :

  • Fast access and low cost speed up time to value
  • Quick start means users are productive quickly
  • A single BI platform for all users helps consolidate analytic investments
  • Timely access to data means greater impact
  • Streamlined operations and reduced burden on IT

Summary :

Based on my past experience working with OBIEE on premise, BICS is a good alternative for any IT organisation. It gives all the needed features of an on-premise installation, with flexibility of operation and management and, most importantly, a low-cost solution. In the next few posts I will describe the BICS tool and its features in more detail.

Hope this helps

Sunil S Ranka

“Superior Data Analytics is the antidote to Business Failure”

 


Cloud Allergy – Cloud Security and a Changing Notion

Posted by sranka on June 30, 2016

In my recent role as CTO/Advisor with www.analytos.com, most of my conversations with analytics leaders within companies keep circling back to concerns over security. In a recent conversation with an entrepreneur friend, one of his solutions was stalled due to a SQL injection concern on the cloud (a valid concern, but is it really?).

During my recent startup stint the term "cloud allergy" was coined, and it made sense: allergies do exist, you need to get past them, and you only need to worry about the life-threatening ones.

My Early Internet Days

I remember 1996, when I created my first email address, coolguy123@yahoo.com. Twenty years ago we were apprehensive about using our real names as part of an email address; now, twenty years later, only hackers and late-night chat rooms create fake IDs. In 2001, when I got my first credit card (with a $500 limit), using it for online shopping was taboo; in fact, until mid-2005 I paid my PG&E bill in person at an authorized facility. The fear was of using personal or financial information over the public internet.

Changing Notion

Come 2013 (within 15 years), using a credit card online is the norm, and giving a credit card number to a Comcast agent sitting overseas is trivial and a non-issue. With Facebook, WhatsApp, Snapchat and many more social apps, we take pride in, and make an effort toward, sharing personal and important moments with our "extended social families" (yes, I just coined a new term). Given Google's search data retention capability, I tell my customers that Google knows you better than your wife or partner. Most of us back up our most important documents by emailing them to ourselves.

Most importantly, kp.org (Kaiser Permanente, a leading national HMO) holds all the personal information about your recent visits and vaccinations, and provides secure messaging through its enhanced portal.

With mobile banking, taking a photo of a cheque to deposit it is just another norm.

With this changing notion, we will get past the "cloud allergy" behaviour, and some of the security questions and concerns will become trivial or non-issues.

Giant Cloud Providers and Security Capabilities

Look at the public clouds: AWS, Google, and MS Azure. These giants are able to attract more talented individuals than most small to mid-size companies. With cloud being their core focus, they have hundreds of brilliant minds dedicated to security. A company with a modest budget cannot match the level of expertise prominent cloud providers can spend on security. Just as Fast Deployment, Lower Costs, and Rapid Time to Value have become assumed advantages of the cloud, security is achieving the same level of confidence.

Public clouds are at times much safer than internal networks (the Sony and Target hacks are the best examples we can all point to).

Trust in and adoption of cloud computing continues to grow despite persistent cloud-related security and compliance concerns. Such is the overarching takeaway of Intel Security’s recent report, “Blue Skies Ahead? The State of Cloud Adoption.” – See more at http://www.baselinemag.com/cloud-computing/slideshows/cloud-deployments-grow-despite-security-concerns.html

Different Cloud Service Models:

With the evolving nature of the cloud, understanding the relationships and dependencies between the different cloud service models is critical to understanding cloud computing security risks. IaaS is the foundation of all cloud services, with PaaS building upon IaaS, and SaaS, in turn, building upon PaaS.

** Infrastructure as a Service (IaaS) delivers computer infrastructure (typically a platform virtualization environment) as a service, along with raw storage and networking. Rather than purchasing servers, software, data-center space, or network equipment, clients buy those resources as a fully outsourced service.

** Software as a Service (SaaS), sometimes referred to as “on-demand software,” is a software delivery model in which software and its associated data are hosted centrally (typically in the cloud) and are accessed by users through a thin client, normally a web browser over the Internet.

** Platform as a service (PaaS), is the delivery of a computing platform and solution stack as a service. PaaS offerings facilitate deployment of applications without the cost and complexity of buying and managing the underlying hardware and software and provisioning hosting capabilities. This provides all of the facilities required to support the complete life cycle of building and delivering web applications and services entirely available from the Internet.

** Definitions are taken from the internet.

** The figure below shows an example of how a cloud service mapping can be compared against a catalogue of compensating controls to determine which controls exist and which do not — as provided by the consumer, the cloud service provider, or a third party. This can, in turn, be compared to a compliance framework or set of requirements such as PCI DSS, as shown.


** Mapping the Cloud Model to the Security Control & Compliance

 

** Text and Figure Taken from CSA (Cloud Security Alliance).

 

Conclusion:

Customers need to be made aware of what they are considering moving to the cloud. Not every dataset moved to the cloud needs the same level of security. For a low-criticality dataset, lighter security can be used. A high-value dataset subject to audit or compliance requirements might entail audit and data retention controls, while a high-value dataset with no regulatory restrictions may need more technical security than data retention. In short, there will always be a place for every type of dataset in the cloud.

 


Big Data – Tez, MR, Spark Execution Engines: Performance Comparison

Posted by sranka on February 25, 2016

There is no question that massive data is being generated in greater volumes than ever before. Along with traditional data sets, new sources such as sensors, application logs, IoT devices, and social networks are adding to data growth. Unlike traditional ETL platforms such as Informatica, ODI, and DataStage, which are largely proprietary commercial products, the majority of Big Data ETL platforms are powered by open source.

With so many execution engines available, customers are always curious about their usage and performance.

To put it into perspective, in this post I run the same set of queries against three key execution engines (MapReduce, Tez, and Spark) and compare the query execution timings.

-- external table over the raw, pipe-delimited sensor readings in HDFS
create external table sensordata_csv
(
ts string,
deviceid int,
sensorid int,
val double
)
row format delimited
fields terminated by '|'
stored as textfile
location '/user/sranka/MachineData/sensordata'
;

drop table sensordata_part;

-- partitioned, bucketed ORC version of the same data, used by the partition queries
create table sensordata_part
(
deviceid int,
sensorid int,
val double
)
partitioned by (ts string)
clustered by (deviceid) sorted by (deviceid) into 10 buckets
stored as orc
;
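
The original post does not show the load step between the two tables. A typical way to populate the partitioned, bucketed ORC table from the csv table is a dynamic-partition insert; the sketch below is my assumption (including the session settings), not part of the captured benchmark script:

-- minimal sketch (assumed, not from the original run): load the ORC table
-- from the csv table using dynamic partitioning on ts
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;

insert overwrite table sensordata_part partition (ts)
select deviceid, sensorid, val, ts
from sensordata_csv;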

"**********************************************"
"** 1) Baseline: Read a csv without Tez"
" set hive.execution.engine=mr"
" select count(*) from sensordata_csv where ts = '2014-01-01'"
"**********************************************"
2016-02-25 02:57:27,444 Stage-1 map = 0%,  reduce = 0%
2016-02-25 02:57:35,880 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec
2016-02-25 02:57:44,420 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.99 sec
MapReduce Total cumulative CPU time: 4 seconds 990 msec
Ended Job = job_1456183816302_0046
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 4.99 sec   HDFS Read: 3499156 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 990 msec
OK
16733
Time taken: 32.524 seconds, Fetched: 1 row(s)

"**********************************************"
"** 2) Read a csv with Tez"
" set hive.execution.engine=tez"
" select count(*) from sensordata_csv where ts = '2014-01-01'"
"**********************************************"
Total jobs = 1
Launching Job 1 out of 1

Status: Running (application id: application_1456183816302_0047)

Map 1: -/-    Reducer 2: 0/1
Map 1: 0/1    Reducer 2: 0/1
Map 1: 0/1    Reducer 2: 0/1
Map 1: 0/1    Reducer 2: 0/1
Map 1: 1/1    Reducer 2: 0/1
Map 1: 1/1    Reducer 2: 1/1
Status: Finished successfully
OK
16733
Time taken: 16.905 seconds, Fetched: 1 row(s)

"**********************************************"
"** 3) Read a partition with Tez"
" select count(*) from sensordata_part where ts = '2014-01-01'"
"**********************************************"
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1456183816302_0047)

Map 1: -/-    Reducer 2: 0/1
Map 1: 0/2    Reducer 2: 0/1
Map 1: 1/2    Reducer 2: 0/1
Map 1: 2/2    Reducer 2: 0/1
Map 1: 2/2    Reducer 2: 1/1
Status: Finished successfully
OK
16733
Time taken: 6.503 seconds, Fetched: 1 row(s)

"**********************************************"
"** 4) Read a partition with Spark"
" select count(*) from sensordata_part where ts = '2014-01-01'"
"**********************************************"

Time taken: took 5.8 seconds

"**********************************************"
"** 5) Read a csv with Spark"
" select count(*) from sensordata_csv where ts = '2014-01-01'"
"**********************************************"
Time taken: took 4.5 seconds
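
The Spark runs above are summarized without console output. One hedged way to issue the same HiveQL through Spark (assuming a Spark build with Hive support and the spark-sql CLI available on the path) is:

# run the same count query through Spark's SQL CLI against the Hive metastore
spark-sql -e "select count(*) from sensordata_part where ts = '2014-01-01'"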

Query 1: select count(*) from sensordata_csv where ts = '2014-01-01'

Query 2: select count(*) from sensordata_part where ts = '2014-01-01'

Below table shows the execution timings (compiled from the runs above):

Query                         MR          Tez         Spark
1) sensordata_csv (count)     32.5 sec    16.9 sec    4.5 sec
2) sensordata_part (count)    n/a         6.5 sec     5.8 sec

Conclusion: Which Engine Is Right?

Spark, being an in-memory execution engine, comes out as the clear winner, but in certain scenarios, especially the current one of running the query against a partitioned table, the Tez execution engine comes close to Spark.

That said, you cannot conclude that Spark will solve the "world hunger problem" of Big Data ETL. Being a continuously evolving product, Spark has its own challenges when it comes to productionizing workloads, and the same holds true for Tez. MapReduce has been around the longest and sits at the core of the Hadoop framework; for mission-critical workloads that are not time-bound, MR could be the best choice.

Hope This Helps

Sunil S Ranka

About Spark : http://spark.apache.org/

About MapReduce : https://en.wikipedia.org/wiki/MapReduce

About Tez : https://tez.apache.org/


Big Data: Right Approach, Right Solution

Posted by sranka on February 1, 2014

Hi All,

For the past few months I have been meeting with clients and discussing their potential need for Big Data. The discussion always gets down to: do they really need Big Data? The link below is to my ITNext article, which talks about how, as big data gets bigger, IT managers are challenged with identifying the data that qualifies as "big" and finding appropriate solutions to process it.

Click here to read the full article: Right Approach, Right Solution

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Open Source Big Data Technologies

Posted by sranka on January 29, 2014

Hi All

While doing a comparative analysis for building a Big Data reference architecture, I stumbled on a very impressive open source Big Data technology mashup. Thanks to http://www.bigdata-startups.com/. The most impressive part of this mashup is that it breaks the whole Big Data operational paradigm into multiple stages and lists the available open source technologies for each.

[Image: Open Source Big Data Technologies mashup]

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To Run Graph In Endeca Outside of Endeca Server

Posted by sranka on January 15, 2014

Hi All

Recently, at one of my clients, we had a situation where Hadoop was taking a lot longer than anticipated to generate a file. An Endeca graph needed that file as an input, but since the file was not being generated on time, the graph was picking up a partially created file, causing data issues. After looking into it, the best bet was to introduce a task dependency. We looked into running the CloverETL graph from the command line, but due to a product limitation we were not able to do so.

After discussing with Chris Lynskey from Oracle (of the original Endeca team), I found that the following simpleHttpApi call could work:

http://<server>:<Port>/clover/simpleHttpApi/graph_run?sandbox=<sandbox>&graphID=<graphName>&nodeID=node01&verbose=MESSAGE

Parameter: graphID
Description: Text ID, unique within the specified sandbox; the file path relative to the sandbox root.

e.g. graph%2FLoadViewDefinitions.grf (the "/" needs to be encoded as "%2F")
Mandatory: Yes

Parameter: sandbox
Description: Sandbox code
Mandatory: Yes

Parameter: nodeID
Description:
In cluster mode, the ID of the node that should execute the job. However, it is not final: if the graph is distributed, or the node is disconnected, the graph may be executed on another node.
Mandatory: No

Parameter: verbose
Description: MESSAGE | FULL
Mandatory: No
Default: MESSAGE
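
To call this API from a shell script or a scheduler (so that the graph run can wait until the Hadoop file is complete), a minimal curl sketch could look like the following; the host, port, credentials, and sandbox code are placeholders, and the graph path is URL-encoded as described above:

# trigger the graph run through the Integration Server simple HTTP API
# (placeholder host, port, credentials and sandbox; adjust for your environment)
curl -u clover_user:clover_password \
  "http://myserver:8080/clover/simpleHttpApi/graph_run?sandbox=mySandbox&graphID=graph%2FLoadViewDefinitions.grf&nodeID=node01&verbose=MESSAGE"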

For more on the HTTP API, please refer to the Latitude Data Integrator Server Guide.

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To retrieve/backup Views In Endeca

Posted by sranka on January 3, 2014

Hi All,

For the last few weeks I have been engaged with a customer, helping them with the remediation of an Endeca project. During remediation we faced a typical challenge where all the graphs and EQLs were erroring out. After doing some research I found out that it is a known issue. I spent a good amount of time on it, hence thought of sharing this trivial but useful information.

Issue :

Endeca views get lost during development, causing all the dependent graphs and EQLs to error out.

Root Cause :

Unknown (could be a potential product issue).

Solution :

After adding all the view definitions, run the Export View Definition graph provided with the Endeca examples; see the link below for details.

https://wikis.oracle.com/display/endecainformationdiscovery/EID+3.0+Export+View+Configuration

After running the export graph, the view definitions get stored in an XML file in the view-manager directory under config-in; see the screenshot below:

[Screenshot: viewDefinition.xml in the view-manager directory under config-in]

Take a backup of the file and store it in a secure location, or put it under version control (a minimal sketch follows after the import link below). If the view definitions are lost, go to the Integration Server URL and click through the following:

SandBox –> Project Name –>Config-in –> View-manager –> viewDefinition.xml –> fileEditor

[Screenshot: Integration Server file editor for viewDefinition.xml]

Copy and paste the content and hit the UpdateFile button. Once you have clicked UpdateFile, run the Import View Definition graph; see the link below.

https://wikis.oracle.com/display/endecainformationdiscovery/EID+3.0+Import+View+Configuration
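
For the version-control option mentioned above, a minimal shell sketch (paths and repository location are assumptions, not from the original post) could be:

# keep the exported view definitions under version control (placeholder paths)
cp /path/to/config-in/view-manager/viewDefinition.xml ./endeca-config-backup/
cd ./endeca-config-backup
git add viewDefinition.xml
git commit -m "Back up Endeca view definitions"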

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


How To Find Size Of Table In Hive / HDFS

Posted by sranka on November 19, 2013

Hi All

With data volume being the constant challenge in Big Data, as an administrator you will have to keep a tab on data growth and, at the same time, make sure there is no spurious growth of unwanted objects or folders. Typically you will want to watch data growth at the GB level, hence the script below, which you can use to translate your current folder sizes into GB. Anything below a GB will be shown as 0. This is a simple script; you can modify it to track MB-level detail as well by changing the multiplier factor of 1024 (see the variant after the script).

sudo -u hdfs hadoop fs -du /app/hadoop/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
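
For example, the MB-level variant mentioned above only changes the divisor (1024 squared instead of 1024 cubed) and the label:

# report per-table/folder sizes in MB instead of GB
sudo -u hdfs hadoop fs -du /app/hadoop/hive/warehouse/ | awk '/^[0-9]+/ { print int($1/(1024**2)) " [MB]\t" $2 }'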

Hope This Helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Big Data: A Perfect Data Storm

Posted by sranka on July 25, 2013

Hi All,

Lately I have been spending a lot of time on Big Data: its applications, architecture, and processes. As part of that process, here are my thoughts on Big Data in print media. I hope you enjoy reading.

http://issuu.com/itnext/docs/it_next-vol-04-issue-06-july-2013/43?e=1503387/3929020

Hope this helps.

Sunil S Ranka

“Superior BI is the antidote to Business Failure”


Oracle Apache Hadoop Hive ODBC Driver For OBIEE

Posted by sranka on July 3, 2013

Hi All,

The last few months have been a crazy ride; wrapping up new opportunities and getting up to speed on Hadoop took all the free time away. During my reading on Hadoop, I always wondered why we didn't have an out-of-the-box ODBC driver. I tried the out-of-the-box ODBC driver provided by Hortonworks and it worked to a certain extent, but running against Cloudera was a challenge. Finally, to a sigh of relief, with 11.1.1.7 Oracle has introduced the Oracle Apache Hadoop Hive ODBC driver for OBIEE.

So far, extracting data using Hadoop has never been the challenge; using that data to make sense of it has been the constant one. With this integration, there is some relief for the existing investment people have made in OBIEE. Note that Hive's SQL dialect, HiveQL, does not support the full SQL-92 specification; please refer to the 11.1.1.7 Metadata Repository Builder's Guide for the unsupported features.

For details on where to start, please refer to the 11.1.1.7 Metadata Repository Builder's Guide. To import metadata from Hive, you will have to download the ODBC driver from Oracle Support; for details please refer to Doc ID 1520733.1.

In the coming weeks I will be writing more about Big Data approaches and solutions.

Hope this helps

Sunil S Ranka

“Superior BI is the antidote to Business Failure”

 

 


 