By Shubham Thakur, Analytics Consultant at Brillio
Modern data architecture can solve many business problems, streamline your value chain, and provide a central data repository for your internal team, partners, and other stakeholders. Yet some business owners have not given it much thought. Before they consider modern data architecture, they should understand its evolution.
Evolution of Data Architecture
Stage 1: The Transaction Processing Database
This is the stage where databases were designed for transaction processing. They were good at it and served many business needs, but they were not designed for analytics. These databases became available in the early 1970s. They were slow, and storage was too costly to serve even basic business intelligence (BI) and analytical needs. Complex BI tools were necessary, and skilled personnel were required to carry out those tasks.
These tools were not built for end users, though. Data experts were needed to satisfy user reporting requirements. One benefit of this was that data governance and reporting accuracy were stellar, because only those who knew the data best were skilled enough to produce the reports. But soon the requirements for reports grew in size and number, so the ability to deliver reports in a timely manner was hampered, creating data bottlenecks.
Stage 2: Databases with Self-Service BI Tools
The data bottlenecks were answered in this stage with departmental data silos and tools designed to work with them. Reporting requirements were met quickly with self-service tools, but data governance took a backseat, which led to data chaos. This happened when people from different departments would produce reports for the same business metric but get significantly different results, because there was no standard procedure for report generation. There was no standardization in defining metrics either, which led to an altogether new outcome: more time was spent arguing about the authenticity of the data than acting on it.
Organizations then needed something based on highly governed data that provided both the agility of the self-service reporting silos and the accuracy of the reports produced by data experts. This called for a new set of techniques, no longer limited by slow database performance or expensive storage. A new computing paradigm was born out of necessity at companies like Yahoo, Google, Facebook, and LinkedIn, whose main asset was data.
These companies also needed to quickly process and derive value from incredible volumes of data. New technologies like Hadoop and Spark, along with massively parallel processing databases built on concepts like commodity hardware and elastic resource allocation, were designed with high-speed analytics in mind. This changed the landscape and led to the third wave of modern data architecture.
Stage 3: Data Platform Services
The two previous stages were characterized almost entirely by the need to work around existing technology and cost limitations. This new stage required a new way of thinking.
Without the technical and economic limitations that had been imposed on data teams, organizations shifted from being report creators to insight generators, educators, and enablers. Data silos could be eliminated to give users a comprehensive view of what is happening and how it all interrelates, rather than forcing them to figure out what data could be ignored to reduce storage and processing time. Businesses can now focus on identifying all of the ignored and forgotten data sources that add real value.
Organizations can also maximize that value by looking outside their own walls for data that helps users make better decisions about whatever impacts the business. In addition, they can look for new ways to enable not just internal business users, but also customers, partners, and suppliers, to access data that makes them more efficient and effective.
For this to happen, a common language and set of metrics, plus a data dictionary that enables users to ask and answer their own questions, allows for data governance en masse. Users gain a greater understanding of the data and how to leverage it. They can also easily access and understand it and generate real business value. Now, instead of building one report for one person, experts can create a reusable model that can be shared with everyone.
Modern data architecture is defined not so much by a specific technology stack as by the organizational impact it enables. Organizations like Looker, which have developed data platform services, have an interesting take on the situation.
Figure: The Looker view of modern data architecture (Source: Looker)
For example, here is the Looker view of modern data architecture. At the bottom of the diagram, data is stored in lots of different places: SAP applications, Salesforce or Zendesk, transactional databases, ERP systems, and web analytics tools. The traditional approach was to extract, transform, and then load that data into a warehouse. That transformation step was usually complicated and difficult, so a lot of logic would be baked into the transformation, making it very inflexible.
But because of new databases, there is no need to pre-transform data anymore, making this new service plug and play. Tools like Looker sit on top of the database, and the platform contains data models that provide the ability to govern transformation in a flexible and agile way. Once analysts have created the model, anyone in the organization can use it to answer their own questions.
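To make the idea concrete, here is a minimal sketch of the load-first, transform-in-database pattern using Python's built-in sqlite3 module. The table, view, and column names are made up for illustration; this is not Looker's actual modeling language, just the underlying "define the transformation once, reuse it everywhere" idea.

```python
import sqlite3

# Load raw, untransformed data straight into the database (the "load" before "transform").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount_usd REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "AMER", 80.5), (3, "EMEA", 42.0)],
)

# The reusable "model": a governed view that encodes the transformation once,
# so analysts do not re-derive the logic for every report.
conn.execute("""
    CREATE VIEW revenue_by_region AS
    SELECT region, SUM(amount_usd) AS total_revenue, COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY region
""")

# Anyone in the organization can now answer their own question against the model.
for row in conn.execute("SELECT * FROM revenue_by_region ORDER BY region"):
    print(row)
```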
Now let us focus on the technological advancements that helped make a true modern data architecture possible.
Technological Advancements
Cloud Migration and Multi-cloud Strategy
According to McKinsey Global Institute, "cloud is potentially the most revolutionary catalyst of a fundamentally new approach to data-architecture since it provides businesses a way to quickly scale up AI resources and capabilities to a competitive advantage." Cloud migration is the process of moving existing data processes from an on-premises facility to a cloud-based environment. With serverless data platforms like Amazon S3 and Google BigQuery, organizations can build and operate data-centric applications at virtually infinite scale without worrying about installation, configuration, or workload management. Containerized data solutions using Kubernetes enable companies to decouple and automate the deployment of extra compute and storage whenever needed.
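As a rough illustration of the serverless model, here is a minimal sketch using the google-cloud-bigquery Python client. The project ID, dataset, and table names are placeholders, and the snippet assumes credentials are already configured in the environment; the point is simply that the client submits SQL and the platform provisions the compute.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# No servers to provision: submit SQL and let BigQuery allocate compute behind the scenes.
client = bigquery.Client(project="my-analytics-project")  # placeholder project ID

query = """
    SELECT region, SUM(amount_usd) AS total_revenue
    FROM `my-analytics-project.sales.orders`  -- placeholder dataset and table
    GROUP BY region
    ORDER BY total_revenue DESC
"""

# Blocks until the query finishes, then streams the result rows.
for row in client.query(query).result():
    print(row.region, row.total_revenue)
```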
Every cloud provider offers services with unique value propositions. Some are better at transaction handling, some are better at managing subscription-based services, and some are better at managing analytical services, so choosing the right cloud partner with the right set of services is critical for organizational success and can save a lot of time and money.
Many companies struggle to manage these services. That's where platforms like Google Anthos come into the picture. Google Anthos is a multi-cloud infrastructure management platform that can handle the deployment and management of containerized services on whichever cloud platforms an organization is using.
Artificial Intelligence and Machine Learning in Data Engineering and Operations
A well-set-up data pipeline is a work of art because it seamlessly connects multiple datasets to a business intelligence tool, allowing clients, internal users, and stakeholders to perform complex analysis. But according to Sisense, a business analytics software company, the data preparation phase of the pipeline has its own complex challenges. It is a creative and necessary process, but saving and automating that logic for repeated use every time something new is deployed into the system is a challenge. Today, with the use of artificial intelligence (AI) and machine learning, it is possible to make the data preparation process more efficient so that BI platforms can consume data at a much faster rate.
AI can help in data engineering in a few ways. First, it can apply simple rulesets to help standardize the data. Second, AI can recommend a data model structure, including suggesting column joins, and it can create dimensions as well. Finally, AI can help with data ingestion and save a lot of time.
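To illustrate the first point, here is a small, assumed example of rule-based standardization with pandas. The column names and the ruleset are hypothetical; an AI-assisted tool would learn or suggest rules like these rather than have them hard-coded, but the cleanup step it automates looks much the same.

```python
import pandas as pd

# Hypothetical raw customer data with inconsistent country labels and phone formats.
raw = pd.DataFrame({
    "country": ["USA", "U.S.", "United States", "india", "IN"],
    "phone": ["(555) 123-4567", "555.123.4567", "5551234567", "98765 43210", "987-654-3210"],
})

# A simple ruleset of the kind an AI assistant might propose for standardization.
country_rules = {"usa": "US", "u.s.": "US", "united states": "US", "india": "IN", "in": "IN"}

clean = raw.copy()
clean["country"] = clean["country"].str.lower().map(country_rules).fillna(clean["country"])
clean["phone"] = clean["phone"].str.replace(r"\D", "", regex=True)  # keep digits only

print(clean)
```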
Data operations (DataOps) is a new agile operational methodology that has emerged from the shared knowledge of IT and big data practitioners. It focuses on implementing data management practices and processes that increase the speed and accuracy of analytics, including data access, quality control, automation, integration and, eventually, model deployment and management.
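As a small illustration of the quality-control and automation side of DataOps, here is an assumed sketch of a validation step that a pipeline might run before publishing a dataset. The checks, column names, and failure behavior are made up for the example; real pipelines would wire such checks into their orchestration and alerting.

```python
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found in the orders dataset."""
    problems = []
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    if df["amount_usd"].lt(0).any():
        problems.append("negative order amounts")
    if df["region"].isna().any():
        problems.append("missing region values")
    return problems

# In a DataOps pipeline this check would run automatically on every load,
# blocking publication (or raising an alert) when problems are detected.
orders = pd.DataFrame({
    "order_id": [1, 2, 2],
    "amount_usd": [120.0, -5.0, 42.0],
    "region": ["EMEA", None, "AMER"],
})

issues = validate_orders(orders)
if issues:
    raise ValueError("Data quality check failed: " + "; ".join(issues))
```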
## About the Author
Shubham Thakur, Analytics Consultant at Brillio
Currently serving as an Analytics Consultant at Brillio, Shubham is a big data and data management enthusiast with a deep interest in helping customers drive digital transformation across their organizations. His experience in the business analytics and visualization domain helps customers solve business problems effectively, improve customer satisfaction, and drive operational efficiency.