Laying the Foundation for Tomorrow with a Modern Data Architecture

By Shubham Thakur, Analytics Consultant at Brillio

Modern data architecture can solve many business problems, streamline your value chain, and provide a central data repository for your internal team, partners, and other stakeholders. Yet some business owners have not given it serious thought. Before they consider modern data architecture, they should understand how it evolved.

Evolution of Data Architecture

Stage 1:  The Transaction Processing Database

This is the stage where databases were designed for transaction processing. They were good at it and served many business needs, but they were not designed for analytics. These databases appeared in the early 1970s, when databases were slow and storage was too costly to serve even basic business intelligence (BI) and analytical needs. Complex BI tools were necessary, and skilled personnel were required to operate them.

These tools were not built for end users, though. Data experts were needed to satisfy user reporting requirements. One benefit was that data governance and reporting accuracy were stellar, because only those who knew the data best were skilled enough to produce the reports. But as reporting requirements grew in size and number, the ability to deliver reports in a timely manner was hampered, creating data bottlenecks.

Stage 2:  Database with Self-service BI tools

The data bottlenecks were answered in this stage with departmental data silos and tools designed to work with them. Reporting requirements were met quickly with self-service tools, but data governance took a backseat, which led to data chaos. People from different departments would produce reports on the same business metric yet arrive at significantly different results, because there was no standard procedure for report generation. There was also no standardization in how metrics were defined, which produced an altogether new outcome: more time was spent arguing about the authenticity of the data than acting on it.

Organizations then needed something based on highly governed data that provided both the agility of the self-service reporting silos and the accuracy of reports produced by data experts. This called for a new set of techniques no longer constrained by slow database performance or expensive storage. A new computing paradigm was born out of necessity at companies like Yahoo, Google, Facebook, and LinkedIn, whose main asset was data.

They also needed to quickly process and derive value from those incredible volumes of data. New technologies like Hadoop, Spark, and massively parallel processing databases, built around concepts like commodity hardware and elastic resource allocation, were designed with high-speed analytics in mind. They changed the landscape and led to the third wave of modern data architecture.
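To make the idea concrete, here is a minimal PySpark sketch of the kind of aggregation these engines parallelize across a cluster of commodity machines rather than a single expensive database server. The storage paths and column names are hypothetical.

# Minimal PySpark sketch: a distributed aggregation over event data.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("clickstream-rollup")
         .getOrCreate())

# Read event data that is partitioned across many nodes.
events = spark.read.parquet("s3://example-bucket/clickstream/")

# The groupBy/agg is planned and executed in parallel across the cluster.
daily_visits = (events
                .groupBy("event_date", "page")
                .agg(F.countDistinct("visitor_id").alias("unique_visitors")))

daily_visits.write.mode("overwrite").parquet("s3://example-bucket/rollups/daily_visits/")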

Stage 3: Data Platform Services

The two previous stages were characterized almost entirely by the need to work around existing technology and cost limitations. This new stage required a new way of thinking.

Without the technical and economic limitations that had been imposed on data teams, organizations shifted from report creators to insight generators, educators, and enablers. Data silos could be eliminated to give users a comprehensive view of what is happening and how it all interrelates, rather than forcing them to decide which data to ignore in order to reduce storage and processing time. Businesses can now focus on identifying all of the ignored and forgotten data sources that add real value.

Users can also maximize that value by looking outside their own walls for data that helps them make better decisions about whatever impacts the business. In addition, organizations can look for new ways to enable not just internal business users but also customers, partners, and suppliers to access data that makes them more efficient and effective.

For this to happen, a common language and set of metrics, plus a data dictionary that enables users to ask and answer their own questions, allows for data governance en masse. Users gain a greater understanding of the data and how to leverage it, and they can easily access it, understand it, and generate real business value. Instead of building one report for one person, experts can create a reusable model that can be shared with everyone.

Modern data architecture is defined not so much by a specific technology stack as by the organizational impact it enables. Organizations like Looker, which has developed a data platform service, have an interesting take on the situation.

[Figure: Looker's view of modern data architecture. Source: Looker]

For example, here is the Looker view of modern data architecture. At the bottom of the diagram, data is stored in lots of different places: SAP applications, Salesforce or Zendesk, transactional databases, ERP systems, and web analytics tools. Previously, that data had to be extracted, transformed, and then loaded into a warehouse. The transformation step was usually complicated and difficult, so a lot of logic was baked into it, making it very inflexible.

But because of new databases, there is no need to pre-transform data anymore, which makes this new kind of service plug and play. Tools like Looker sit on top of the database; the platform contains data models, which provide the ability to govern transformations in a flexible and agile way. Once analysts have created the model, anyone in the organization can use it to answer their own questions.
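As an illustration only (this is not Looker's actual modeling language), the sketch below expresses the underlying idea in plain Python: the raw data stays in the warehouse, and a governed, reusable definition of a metric is written once so that any analyst or report can reuse it, rather than baking the logic into a rigid pre-load transformation. Table and column names are hypothetical.

# Illustrative sketch only -- not Looker's modeling language (LookML).
# A governed metric is defined once and reused for any question.
def revenue_by_segment(start_date: str, end_date: str, segment_column: str = "region") -> str:
    """Return a warehouse query for a governed 'revenue' metric.

    The orders table and its columns (order_total, order_date) are hypothetical.
    """
    return f"""
        SELECT {segment_column},
               SUM(order_total) AS revenue
        FROM orders
        WHERE order_date BETWEEN '{start_date}' AND '{end_date}'
        GROUP BY {segment_column}
    """

# Any analyst can reuse the same definition to answer their own question:
print(revenue_by_segment("2021-01-01", "2021-01-31", segment_column="sales_channel"))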

Now let us focus on the technological advancements that helped make a true modern data architecture possible.

Technological Advancements

Cloud Migration and Multi-cloud Strategy

According to the McKinsey Global Institute, "cloud is potentially the most revolutionary catalyst of a fundamentally new approach to data-architecture since it provides businesses a way to quickly scale up AI resources and capabilities to a competitive advantage." Cloud migration is the process of moving existing data processes from on-premises facilities to a cloud-based environment. With serverless data platforms like Amazon S3 and Google BigQuery, organizations can build and operate data-centric applications at virtually infinite scale without worrying about installation, configuration, or workload management. Containerized data solutions using Kubernetes enable companies to decouple and automate the deployment of extra compute and storage whenever needed.
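As a small, hedged example, the sketch below queries BigQuery from Python using the google-cloud-bigquery client; the project, dataset, and table names are hypothetical. The point is that the warehouse supplies the compute behind the query, so there is no cluster to install, configure, or size in advance.

# Minimal sketch of querying a serverless warehouse (BigQuery) from Python.
# Assumes the google-cloud-bigquery library and default project credentials;
# the project/dataset/table below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT customer_id, COUNT(*) AS orders
    FROM `example-project.sales.orders`   -- hypothetical table
    GROUP BY customer_id
    ORDER BY orders DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.customer_id, row.orders)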

Every cloud provider offers services with unique propositions. Some are better at transaction handling, some at managing subscription-based services, and some at managing analytical services, so choosing the right cloud partner with the right set of services is critical for organizational success and can save a lot of time and money.

Many companies struggle to manage these services. That is where platforms like Google Anthos come into the picture. Google Anthos is a multi-cloud infrastructure management platform that can handle the deployment and management of containerized services across whichever cloud platforms an organization is using.

Artificial Intelligence and Machine Learning in Data Engineering and Operations

A well-built data pipeline is a work of art because it seamlessly connects multiple datasets to a business intelligence tool, allowing clients, internal users, and stakeholders to perform complex analysis. But according to Sisense, a business analytics software company, the data preparation phase of that process has its own complex challenges. It is a creative and necessary process, but saving and automating the repetitive application of the same logic every time something new is deployed into the system is a challenge. Today, with artificial intelligence (AI) and machine learning, it is possible to make data preparation more efficient so that BI platforms can consume the data at a much faster rate.

AI can help in data engineering in a few ways. First, it can apply simple rulesets to help standardize the data. Second, it can recommend a data model structure, including suggesting joins between columns and creating dimensions. Finally, AI can assist with data ingestion, saving a lot of time.
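For a sense of what the first of these looks like in practice, here is a simple rule-based sketch in pandas (the column names and rules are hypothetical) of the kind of standardization an AI-assisted preparation step might suggest and then apply automatically: trimming text, normalizing categories, and parsing dates so downstream joins and dimensions line up.

# Rule-based standardization sketch in pandas; columns and rules are hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "country": [" usa", "U.S.A.", "United States", "india "],
    "signup_date": ["2021-02-01", "2021-02-03", "2021-02-05", "not a date"],
})

# A simple ruleset mapping messy category values to standard codes.
country_rules = {"usa": "US", "u.s.a.": "US", "united states": "US", "india": "IN"}

clean = raw.assign(
    country=raw["country"].str.strip().str.lower().map(country_rules),
    # Unparseable dates become NaT instead of breaking the pipeline.
    signup_date=pd.to_datetime(raw["signup_date"], errors="coerce"),
)
print(clean)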

Data operations (DataOps) is a new agile operational technique that has emerged from the shared knowledge of IT and big data practitioners. It focuses on implementing data management practices and processes that increase the speed and accuracy of analytics, including data access, quality control, automation, integration and, eventually, model deployment and management.
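A minimal sketch of what such automated quality control can look like, assuming a hypothetical orders table and thresholds: the pipeline runs a quality gate before publishing data for analytics and fails fast instead of shipping bad data downstream.

# DataOps-style quality gate sketch; table, columns, and thresholds are hypothetical.
import pandas as pd

def quality_gate(df: pd.DataFrame) -> list[str]:
    """Return a list of failed checks; an empty list means the load can publish."""
    failures = []
    if len(df) == 0:
        failures.append("table is empty")
    if df["order_id"].isna().mean() > 0.01:
        failures.append("more than 1% of order_id values are null")
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values found")
    return failures

orders = pd.DataFrame({"order_id": [1, 2, 2, None]})
problems = quality_gate(orders)
if problems:
    raise SystemExit("Blocking publish: " + "; ".join(problems))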

##

About the Author

Shubham Thakur, Analytics Consultant at Brillio


Currently serving as an Analytics Consultant at Brillio, Shubham is a big data and data management enthusiast with a deep interest in helping customers drive digital transformation across their organizations. His experience in the business analytics and visualization domain helps customers solve business problems effectively, improve customer satisfaction, and drive operational efficiency.


Published Friday, February 12, 2021 7:38 AM by David Marshall