By Darshan Joshi, Chief
Technology Officer, CYTRIO
Numbers
show that frictionless data access offers significant business benefits. Over half of businesses
worldwide believe seamless data access has become even more critical to
decision-making in the last couple of years.
Yet,
practical data management problems are keeping many companies from reaching
data sharing nirvana. The diverse issues show that moving data from
databases and applications to cloud data warehouses and cloud-based data lakes
has never been smooth. Many companies face data swamps and realize they need a
proper data platform, architecture blueprint, and governance to get to that
nirvana state.
Enter
data fabric and data mesh. The former is a design concept that serves as an
integrated layer (fabric) of data and connecting processes. In this way, it
leaves architectural choices open for data solution providers. Many of its
existing data integration patterns can be tweaked to leverage data catalog,
knowledge graphs, and dynamic data integration to construct data fabric.
Specifically, data fabric can be implemented with data lakes and data
warehouses. On the other hand, data mesh adopts a more humanized approach,
where teams take responsibility for specific data sources, mixing and matching
domain experience and data engineering know-how to create analytics-ready data
sets for business.
But
this is not an article about which data architecture companies should adopt.
Instead, as a data privacy practitioner, I would like to point to the
large, gaping hole that everyone is missing: data privacy. If not
addressed, companies are simply walking into a data-driven pothole.
The data privacy blind spot
Let's
begin with that gaping hole.
What
strikes me as highly odd is that even as regulators, governments, and consumers
are demanding privacy and visibility into data usage at the enterprise level,
neither the advocates of data mesh nor those of data fabric talk enough about
privacy. Yes, security is an inherent part of these architectures, but the
deafening silence on data privacy overlooks a major data issue - one
that faces every data user.
Despite
being well-designed to solve the data access headache and address various
enterprise-wide use cases, data mesh and data fabric architectures only focus
on what we call "central data." This data resides in applications, databases,
data lakes, and data warehouses.
But
this is not the only type of data with which we interact. Critical enterprise
data also sits in office documents, which we call "non-central" or "edge data."
It's one reason why chief privacy officers fret over the PDF files on your
OneDrive and office laptops, the same files that make data engineers and
scientists grumble.
The
problem is that most data architectures prescribe how data flows from central
to edge and vice versa, but they do not factor in security and privacy of data
on the edge. Instead, they all see it as "central" data and leave it to the
companies to make the distinction via security and privacy frameworks and
enforcement. This leaves edge data vulnerable to threats. And since
many of these architectures deal with disparate data sources, the security and
privacy threat increases exponentially.
One
can argue that data mesh holds on to the concept of data ownership for longer
than data fabric. Theoretically, it also addresses the data governance
questions. Still, neither approach stops an employee who exports data into a
PDF report from a physical or virtual data warehouse, even though that report
may contain sensitive data. Once exported, that data can be shared without any
regard for the data governance rules that applied to that same data when it was
part of the data lake.
These
approaches also don't address someone exporting data using valid APIs. For
example, a sales team may export customer lists containing confidential
Personally Identifiable Information (PII) about the customer into a
spreadsheet. Then, they share this spreadsheet with others without any regard
for role-based access control (RBAC) or governance.
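To make the spreadsheet scenario concrete, here is a minimal sketch of an export step that applies role-based masking before the data ever leaves the governed store. It is an illustration under stated assumptions, not a description of any particular product: the column names, the role names, and the `export_customers` helper are all hypothetical.

```python
import csv
import io

# Hypothetical policy: which columns count as PII, and which roles may export them.
PII_COLUMNS = {"email", "phone"}
ROLES_ALLOWED_PII = {"privacy_officer"}


def export_customers(rows, role):
    """Return CSV text for `rows`, masking PII columns unless `role` may see them."""
    if not rows:
        return ""
    may_see_pii = role in ROLES_ALLOWED_PII
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Keep a value only if the column is not PII or the role is cleared for PII.
        safe_row = {
            column: value if may_see_pii or column not in PII_COLUMNS else "REDACTED"
            for column, value in row.items()
        }
        writer.writerow(safe_row)
    return buffer.getvalue()


# Made-up sample record for illustration.
customers = [{"name": "Ada", "email": "ada@example.com", "phone": "555-0100"}]
sales_export = export_customers(customers, role="sales")
```

The point of the sketch is where the check sits: the policy is enforced at the moment of export, so the spreadsheet that ends up on the edge never contains what the exporting role was not entitled to see.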
Get your privacy in order first
Security
and privacy need to be coded into any data architecture. They must also apply
to both "central" and "non-central" data; otherwise, you end up with only a
notion of good data governance.
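As a rough illustration of what "coded in" could mean, the sketch below runs one and the same classification policy over a central source and an edge document. Everything here is an assumption made for the example: the two regular expressions are deliberately simplistic stand-ins for real PII detection, and the source names and the `classify` helper are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs far more than two regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return the sorted names of the PII patterns that match `text`."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )


def classify(source_name, text):
    """Apply the same policy to any source, whether central or edge."""
    hits = find_pii(text)
    return {"source": source_name, "pii": hits, "restricted": bool(hits)}


# Made-up sample content for one central and one edge source.
warehouse_row = "id=42, contact=ada@example.com"
edge_document = "Q3 report. Contact ada@example.com, SSN 123-45-6789."

central_result = classify("warehouse.customers", warehouse_row)
edge_result = classify("laptop/q3_report.txt", edge_document)
```

The design choice worth noting is that `classify` does not care where the text came from. A warehouse row and a file on a laptop go through the same rules, which is the opposite of treating everything as "central" data and hoping a separate framework covers the rest.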
Businesses
also need to look for a consolidated view of their data that goes beyond the
use of "central" data. At the same time, they need an organization-wide
alignment on their security and privacy postures.
Keep
in mind that CXOs - think product, data, and information security, among others
- will have data access overlaps. So, it's critical that your data security and
privacy posture recognize these overlaps and establish transparent governance,
ownership, and communications.
No
matter which architecture you finally choose, you need to:
- Find a sound privacy and security partner who
can offer a single solution-led approach that takes into account your
organization's central and edge data
- Align your employees to your privacy and
security posture - from the CXO to the person downloading customer data into
Excel - to maximize your investment in a privacy and security solution
Have a privacy-first mindset
The
bottom line is that without a privacy-first mindset and organization-wide
alignment, you will end up with fragmented solutions. Addressing security and
privacy piecemeal keeps you reactive, because each new use case or solution
may expose new privacy holes.
Without
a single, enterprise-wide policy and approach, all data architectures - new and
old - are only designed to fail. And in today's world, such failures
increasingly come with a price tag that can derail your well-thought-out
business ambitions.
##
ABOUT THE AUTHOR
Darshan
Joshi is co-founder and Chief Technology Officer at CYTRIO. He has more than 20 years
of data and data management experience, having held SVP/VP of technology and engineering
roles at industry-leading data and data management companies such as
Informatica, Symantec, and Veritas.