Querying the Data Lake

The Apache open source contributions to Hadoop are numerous and cover a broad portion of a reference architecture. It has been some time since we looked at its foundation: low-cost storage and in-place query capabilities. As we saw in the "Data Lakes" blog post, many organizations adopted this foundational offering.

Low-cost commodity hardware could be scaled out to match a growing appetite for data and the correspondingly heavier workloads. In-place queries dramatically reduced the manual programming traditionally required to extract, transform and load data into relational data models. Queries could be built with tools and techniques similar to the traditional ones, though not identical and often requiring some retraining, in a simpler (if not exactly simple) manner. There were even attempts to remap existing business queries to Hadoop equivalents. Some actually worked. But the query experience fell short of the business community's expectations: queries occasionally took hours or even days to complete when their relational equivalents finished in seconds or minutes.
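To make "in-place query" concrete, here is a minimal sketch in plain Python. It is not Hadoop or Hive. It only illustrates aggregating directly over raw files where they already sit and applying a schema at read time, rather than first loading them into a relational model. The file names, columns and values are invented for the illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict


def write_sample_files(lake_dir: Path) -> None:
    """Create two raw CSV files standing in for data already landed in the lake (hypothetical data)."""
    (lake_dir / "trades_2016_01.csv").write_text(
        "client,region,amount\nacme,EMEA,120.50\nglobex,APAC,75.00\n"
    )
    (lake_dir / "trades_2016_02.csv").write_text(
        "client,region,amount\nacme,EMEA,30.25\ninitech,AMER,210.00\n"
    )


def total_by_client(lake_dir: Path) -> Dict[str, float]:
    """Aggregate straight over the raw files; column types are applied as rows are read."""
    totals: Dict[str, float] = {}
    for path in sorted(lake_dir.glob("trades_*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                client = row["client"]
                totals[client] = totals.get(client, 0.0) + float(row["amount"])
    return totals


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    write_sample_files(lake)
    totals = total_by_client(lake)
```

Note that every call to `total_by_client` re-reads every file from storage. At toy scale that is harmless, but it hints at why repeated queries over very large files can be slow.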

Although Hadoop's reduced cost of ownership was quite favorable, queries that took several hours to complete were a tremendous setback. The hope of using a centralized data store for operational reporting, and potentially even operational analytics, began to look doubtful. Many attempts to address the performance concerns by combining hybrid components in different arrangements failed, some quite miserably. An in-memory approach was clearly needed. And one was created.

Apache Spark is a massively scalable, distributed, in-memory parallel processing engine that extended the Hadoop footprint with in-memory query capability and exceptionally fast response times. In some benchmarks, Apache Spark comfortably outperformed relational counterparts on tests of similar size or volume. So we add Spark to our architectural illustration below. The Data Lake concept now has commodity storage, in-place query that still avoids expensive manual processes, and an in-memory query processing component called Spark.

Key Aspect	Response	NexJ DAI
Hadoop Ecosystem	HDFS: low-cost commodity storage; Hive: in-place query	NexJ DAI integrates with Hadoop Data Lakes as a potential source system
Hadoop Ecosystem	Spark: in-memory query	NexJ DAI provisions Semantic View Data consumable through a Spark Adapter
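The benefit of the in-memory component can be illustrated with a toy sketch, again in plain Python rather than Spark itself. The raw source is parsed once, the rows are kept in memory, and later queries reuse them instead of going back to storage. The class name, data and counter below are hypothetical; in Spark the comparable idea is caching a dataset so that later queries reuse it.

```python
import csv
import io

# Hypothetical raw file contents standing in for data stored in the lake.
RAW_FILE = (
    "client,region,amount\n"
    "acme,EMEA,120.50\n"
    "globex,APAC,75.00\n"
    "initech,AMER,210.00\n"
)


class CachedDataset:
    """Parses the raw source once and keeps the rows in memory for later queries."""

    def __init__(self, raw_text):
        self._raw_text = raw_text
        self._rows = None
        self.parse_count = 0  # how many times the raw source was actually parsed

    def rows(self):
        # Parse only on first use; afterwards serve the in-memory copy.
        if self._rows is None:
            self.parse_count += 1
            reader = csv.DictReader(io.StringIO(self._raw_text))
            self._rows = [
                {"client": r["client"], "region": r["region"], "amount": float(r["amount"])}
                for r in reader
            ]
        return self._rows

    def total_amount(self):
        return sum(r["amount"] for r in self.rows())

    def clients_in(self, region):
        return [r["client"] for r in self.rows() if r["region"] == region]


dataset = CachedDataset(RAW_FILE)
grand_total = dataset.total_amount()       # first query triggers the single parse
emea_clients = dataset.clients_in("EMEA")  # second query reuses the in-memory rows
```

After both queries `dataset.parse_count` is still 1, which is the point of the sketch: the expensive read happens once, and subsequent queries run against memory.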

How does your organization use a Data Lake? What 360-degree data views power your analytics? We welcome your thoughts, value your insights and act on your feedback: share below!

