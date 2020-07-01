With the average enterprise using hundreds of SaaS solutions, each with its own database, it’s no wonder business leaders complain that their data is siloed. Now imagine a CEO who wants to understand the relationship between data in these disparate systems.

All of a sudden, she’s looking at the world’s most confusing dashboard, wondering all the while: Can I trust these records? She consoles herself with the thought that at least she has data to look at. But in the end, the dashboard raises more questions than it answers.

If you’re in a competitive industry (and we all are), it’s time for CEOs to take their data analysis to the next level. How do you get there? Three data complexities are at the core of every leader’s struggle to gain a business advantage from their data:

Siloed data

Do you have trouble seeing your data at all? Are you mentally scanning your systems and realizing just how many databases you have? An enterprise might collect reams of data from its industrial operations yet be unable to derive any value from it, because that data sits siloed in a database in its own datacenter. The data never reaches a dashboard in any meaningful way, and this is a universal problem. With enterprise data doubling every couple of years, keeping up takes modern tools and strategies.

The company can begin solving the problem by defining the business purpose of its industrial data: for example, predicting demand over the coming months to avoid a shortfall. That business purpose, with buy-in at multiple levels of the company, drives the entire engagement and lets the company keep the technology simple and focus on the end result. The result is clean, trustworthy, valuable data, unlocked from the database and published to a dashboard.

Siloed data takes some elbow grease to access, but the work becomes much easier when you have a goal in mind for the data. Knowing where you’re going cuts through the noise and makes decisions easier.

Untrustworthy data

Do you have trouble trusting your data? You have a dashboard, but you’re sure the data is wrong or that much of it is missing. You can’t act on it, because you hesitate to trust it. Trustworthiness is a prerequisite for actionable data. Yet most data has problems: missing values, invalid dates, duplicate values, and meaningless entries. If you can’t trust the numbers, you’re better off without the data.
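To make those four problems concrete, here is a minimal Python sketch that counts them in a batch of records. It assumes records arrive as dictionaries, that dates use ISO-8601 (YYYY-MM-DD), and that a few placeholder strings count as meaningless; the function name, field names, and placeholder list are hypothetical choices for illustration, not part of any particular tool.

```python
from datetime import date

# Hypothetical placeholder values treated as "meaningless" entries.
PLACEHOLDERS = {"n/a", "null", "test", "tbd"}


def profile_records(records, required_fields, date_field, key_field):
    """Count records with missing, meaningless, invalid-date, or duplicate-key problems."""
    issues = {"missing": 0, "invalid_date": 0, "duplicate": 0, "meaningless": 0}
    seen_keys = set()
    for rec in records:
        values = [rec.get(f) for f in required_fields]
        # Missing values: a required field is absent, None, or an empty string.
        if any(v is None or v == "" for v in values):
            issues["missing"] += 1
        # Meaningless entries: a required field holds a known placeholder string.
        if any(str(v).strip().lower() in PLACEHOLDERS for v in values if v is not None):
            issues["meaningless"] += 1
        # Invalid dates: the value does not parse as an ISO-8601 calendar date.
        try:
            date.fromisoformat(str(rec.get(date_field, "")))
        except ValueError:
            issues["invalid_date"] += 1
        # Duplicates: this record's key has already been seen.
        key = rec.get(key_field)
        if key in seen_keys:
            issues["duplicate"] += 1
        seen_keys.add(key)
    return issues


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com", "signup": "2020-06-30"},
        {"id": 2, "email": "", "signup": "2020-02-30"},      # blank email, impossible date
        {"id": 2, "email": "n/a", "signup": "2020-07-01"},   # placeholder email, repeated id
    ]
    print(profile_records(sample, ["id", "email"], "signup", "id"))
```

Counting problems rather than silently dropping rows keeps the numbers visible, so you can decide whether a dataset is trustworthy enough to put on a dashboard.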

Data is there for you to act on, so you need to be able to trust it. One key strategy is not to bog down your team with maintaining systems, but instead to use simple, maintainable cloud-based systems built on modern tools to power your dashboard.

No data

Often you don’t even have the data you need to make a decision. “No data” comes in many forms:

You don’t track it. For example, you’re an e-commerce company that wants to understand how email promotions affect your sales, but you don’t have a customer email list.

You track it, but you can’t access it. For example, you start collecting emails from customers, but your email SaaS system doesn’t let you export them. Your data is so siloed that it effectively doesn’t exist for analysis.

You track it, but you need to do some calculations before you can use it. For example, you have a full customer email list and a list of product purchases, and you simply need to join the two. This is a great place to be, and it’s where we see the vast majority of businesses.
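Here is a minimal sketch of that join in Python, assuming both exports share a customer ID. The field names customer_id, email, and product, and the function name, are hypothetical; real exports will use whatever columns your systems produce.

```python
from collections import defaultdict


def join_emails_to_purchases(customers, purchases):
    """Inner-join purchases to customers on customer_id; return email -> list of products."""
    # Index the email list by customer ID for constant-time lookups.
    email_by_id = {c["customer_id"]: c["email"] for c in customers}
    products_by_email = defaultdict(list)
    for p in purchases:
        email = email_by_id.get(p["customer_id"])
        # Inner join: skip purchases whose customer is not on the email list.
        if email is not None:
            products_by_email[email].append(p["product"])
    return dict(products_by_email)


if __name__ == "__main__":
    customers = [
        {"customer_id": 1, "email": "ana@example.com"},
        {"customer_id": 2, "email": "bo@example.com"},
    ]
    purchases = [
        {"customer_id": 1, "product": "mug"},
        {"customer_id": 1, "product": "tee"},
        {"customer_id": 3, "product": "hat"},  # no matching customer, so it is dropped
    ]
    print(join_emails_to_purchases(customers, purchases))
    # {'ana@example.com': ['mug', 'tee']}
```

An inner join answers “what did people on our email list buy?”; customers with no purchases and purchases with no known customer both drop out, which is worth checking before drawing conclusions about email promotions.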

Getting value from that data means finding patterns and insights not just within datasets, but across them. This is only possible with a modern, cloud-native data lake.

Data Lakes

The first step in any data project (today, tomorrow, and forever) is to define your business need.

Do you need to understand your customer better? Whether it’s click behavior, email campaign engagement, order history, or customer service, your customer generates more data today than ever before, and that data can give you clues about what she cares about.

Do you need to understand your costs better? Most enterprises have hundreds of SaaS applications generating data from internal operations. Whether it’s manufacturing, purchasing, supply chain, finance, engineering, or customer service, your organization is generating data at a rapid pace.

Don’t be overwhelmed. You can cut through the noise by defining your business case.

The second step in your data project is to take that business case and make it real in a cloud-native data lake. Yes, a data lake. I know the term has been abused over time, but a data lake is very simple: it’s a way to store all (all!) of your organization’s data centrally and cheaply, in open formats that make it easy to access from any tool.
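As an illustration of storing data centrally, cheaply, and in open formats, here is a sketch that lands a SaaS CSV export in a lake folder as JSON Lines. The raw/&lt;source&gt;/dt=&lt;date&gt; folder layout is one common partitioning convention, assumed here rather than prescribed, and the function and file names are hypothetical. In practice the lake root would usually be cloud object storage, which this local-filesystem sketch does not handle.

```python
import csv
import io
import json
from pathlib import Path


def land_csv_export(csv_text, lake_root, source, load_date):
    """Write a CSV export into the lake as JSON Lines under raw/<source>/dt=<load_date>/."""
    target_dir = Path(lake_root) / "raw" / source / f"dt={load_date}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "part-0000.jsonl"
    with target.open("w", encoding="utf-8") as out:
        # Each CSV row becomes one JSON object per line; values stay as strings.
        for row in csv.DictReader(io.StringIO(csv_text)):
            out.write(json.dumps(row) + "\n")
    return target
```

Keeping raw exports untouched in an open, line-oriented format means any SQL engine or script can read them later, and the date partition makes it easy to reload or audit a single day’s data.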

Data lakes used to be expensive, difficult to manage, and unwieldy. Now all the major cloud providers (AWS, Azure, GCP) have established guidelines for keeping storage dirt-cheap and data accessible and flexible to work with. But data lakes are still hard to implement and require specialized knowledge of data architecture.

How does a data lake solve the above problems?

Data lakes de-silo your data. Because all the data in your data lake is stored in the same place, in open formats like JSON and CSV, there are no technological walls to overcome. You can query everything in your data lake from a single SQL client. If you can’t, then that data isn’t in your data lake, and you should bring it in.

Data lakes give you visibility into data quality. Modern data lakes, and the expert consultants who build them, include a variety of checks for data validation, completeness, lineage, and schema drift. Together, these concepts tell you whether your data is valuable or garbage, and they fit together nicely in a modern, cloud-native data lake.

Data lakes welcome data from anywhere and allow flexible analysis across your entire data catalog. If you can format your data as CSV, JSON, or XML, you can put it in your data lake. This solves the problem of “no data”: it becomes easy to create the relevant data, either by finding it elsewhere in your organization or by engineering it through analysis across your datasets. For example, you could join data from Sales (your CRM) and Customer Service (Zendesk) to find out which product category has the best or worst customer satisfaction scores.
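The CRM and Zendesk example boils down to a join plus an aggregate. The sketch below uses Python’s built-in sqlite3 in-memory database as a stand-in for whatever SQL engine queries your lake; the table and column names (crm_orders, support_tickets, csat) are hypothetical and won’t match a real CRM or Zendesk export as-is.

```python
import sqlite3


def satisfaction_by_category(orders, tickets):
    """Average satisfaction score per product category, best first.

    orders:  (order_id, category) tuples from the CRM (hypothetical schema)
    tickets: (ticket_id, order_id, csat) tuples from the support tool (hypothetical schema)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm_orders (order_id INTEGER PRIMARY KEY, category TEXT)")
    conn.execute(
        "CREATE TABLE support_tickets (ticket_id INTEGER PRIMARY KEY, order_id INTEGER, csat INTEGER)"
    )
    conn.executemany("INSERT INTO crm_orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO support_tickets VALUES (?, ?, ?)", tickets)
    # Join the two sources on order_id, then average the scores per category.
    rows = conn.execute(
        """
        SELECT o.category, AVG(t.csat) AS avg_csat
        FROM crm_orders AS o
        JOIN support_tickets AS t ON t.order_id = o.order_id
        GROUP BY o.category
        ORDER BY avg_csat DESC
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    orders = [(1, "shoes"), (2, "shoes"), (3, "bags")]
    tickets = [(10, 1, 5), (11, 2, 3), (12, 3, 2)]
    print(satisfaction_by_category(orders, tickets))
    # [('shoes', 4.0), ('bags', 2.0)]
```

The first row is the best-rated category and the last is the worst. The point is less the SQL itself than the fact that both sources sit in one place where a single query can reach them.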

If you’re struggling with any of these three core data issues, the solution is to start with a crisp definition of your business need and then build a data lake to execute on it. A data lake is simply a central repository for flexible, cheap data storage. If you focus on keeping your data lake simple and designed for the analysis your business needs, these three core data problems will be a thing of the past.

This article was contributed by Robert Whelan, data engineering & analytics practice manager at 2nd Watch.