Virtualization Technology News and Information
Three Phases of Big Data: Advancing Analytics with Big Compute in the Cloud

Article Written by Leo Reiter, Chief Technology Officer at Nimbix

Many of today's complex science, engineering, development, and business problems require high-performance computing (HPC) capabilities. Everything from particle physics to smart cities to bioinformatics depends on processing power and speed we only dreamed of a decade ago. In all these cases, Big Data is key to both identifying the right questions and coming up with the best answers, and in many cases, the compelling value lies in coming up with those answers in real time. The challenge for researchers and innovators is securing access to a mix of infrastructure and applications that will generate the Big Compute power they need. On-premises infrastructure continues to be too expensive and complex for many organizations that need next-level data and analytics capabilities. IT budgets remain tight, and analytics projects have to compete with cybersecurity, compliance, and other essential operations for a piece of the pie.

As cloud infrastructure technology and integrated service delivery mature, the HPC capabilities required for certain projects are becoming available through "as-a-service" models, enabling enterprises and researchers to access the right level of computing and low-latency service for their specific needs. The initial (and ongoing) goal of most cloud service providers (CSPs) was to build easily accessible, multi-purpose, flexible infrastructure that customers could use as needed for a wide variety of reasons. As the technology matures and gains mainstream adoption, confidence in the security and reliability of public and hybrid cloud infrastructure deepens, opening new frontiers. One of those opportunities is performing research and business analytics in purpose-built HPC cloud environments.

Big Data Projects in Three Phases

Breaking data and analytics projects into three general phases (capture, analyze, and visualize) helps guide the strategic development of the particular compute and storage capabilities required. The performance, scalability, and cost-effectiveness of Big Data implementations are enhanced by carefully integrating and aligning existing infrastructure investments, cloud environments, and project objectives.

Capture Phase - The capture phase includes curating and storing the data. Simply put, this is the act of listening for, "mining," or filtering information from one or more sources into a set for future processing. This phase typically requires large amounts of storage but relatively little computational capability.

Analysis Phase - When we hear the term "analytics" by itself, it typically refers to this phase. Here is where we make sense of the data we captured in the first phase. There are several types of analytics, driven by different types of problems and uses. Predictive Analytics, for example, can be used to predict whether and how a particular person or machine will infiltrate a secure facility (e.g., a computer network) based on actions they have taken so far and how those actions match known patterns of behavior. Descriptive and Prescriptive Analytics are typically used in business to answer "what" and "why" for past and future events, respectively. Analytics can be performed after the "capture" stage or, in some cases, during it, helping to reduce the "noise" and focus on relevant information as it comes in. In either case, these processes require large amounts of compute capacity with fast access to the data.
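To make the predictive idea above concrete, here is a toy sketch that scores how closely an observed sequence of actions matches known patterns of behavior. The pattern names, action labels, and scoring approach are purely hypothetical illustrations, not a real intrusion-detection method:

```python
from difflib import SequenceMatcher

# Hypothetical "known patterns of behavior" (sequences of observed actions).
KNOWN_PATTERNS = {
    "brute_force": ["login_fail", "login_fail", "login_fail", "login_ok"],
    "recon":       ["port_scan", "dir_listing", "login_fail"],
}

def match_score(observed, pattern):
    """Similarity (0..1) between observed actions and a known pattern."""
    return SequenceMatcher(None, observed, pattern).ratio()

def predict(observed):
    """Return the best-matching known pattern and its similarity score."""
    scored = {name: match_score(observed, p)
              for name, p in KNOWN_PATTERNS.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Actions observed so far for one user or machine.
actions = ["login_fail", "login_fail", "login_fail", "login_ok"]
name, score = predict(actions)
print(name, round(score, 2))
```

A production system would of course use statistical or machine-learning models over far larger datasets, which is exactly where the compute capacity described above comes in.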

Visualization Phase - The visualization phase is where we present and communicate the results of the "analysis" in a human-readable form. This can range from a graphical representation of network threats to a geographical map of infectious disease cases. In many cases the user interacts with the visualization by rotating, zooming, and so on. In short, "visualization" is where human interaction with Big Data turns into actionable intelligence. This phase requires less computational power than "analysis," but must offer real-time performance and response for a good user experience.
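The three phases above can be sketched end to end in a few lines. This is a deliberately tiny illustration using only Python's standard library; the event data, function names, and text-based "chart" are hypothetical stand-ins for real storage, analytics, and visualization layers:

```python
from collections import Counter

# --- Capture: filter raw events from a source into a working set ---
def capture(events, keyword):
    """Keep only events whose message mentions the keyword."""
    return [e for e in events if keyword in e["message"]]

# --- Analyze: a toy descriptive analysis (count captured events per host) ---
def analyze(events):
    return Counter(e["host"] for e in events)

# --- Visualize: render the counts as a simple text bar chart ---
def visualize(counts):
    return "\n".join(f"{host:10} {'#' * n}"
                     for host, n in counts.most_common())

raw = [
    {"host": "web-01", "message": "login failed"},
    {"host": "web-01", "message": "login failed"},
    {"host": "db-02",  "message": "login ok"},
    {"host": "db-02",  "message": "login failed"},
]

captured = capture(raw, "failed")   # capture phase
counts = analyze(captured)          # analysis phase
print(visualize(counts))            # visualization phase
```

Note how the resource profile shifts across the pipeline just as the article describes: `capture` is storage-bound, `analyze` is compute-bound, and `visualize` is lightweight but latency-sensitive.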

Big Challenges

Technology is advancing so fast it's hard for mere mortals to keep up. Advanced difficulties emerge alongside advanced capabilities when it comes to the brass tacks of implementation and practical use. Big Data, HPC, and machine learning promise to help us with all this, but we have to lay the groundwork first. There are always the related concerns to address: the security of systems and data, especially sensitive consumer, health, and financial information; compliance requirements; and global market pressures including disruptive competition, skills shortages, and economic uncertainty. As a result, the majority of enterprises have only just begun to leverage their data for valuable insights.

The pressure to ramp up to Big Data competency is intensifying. The statistics highlighted in a recent Forbes forecast are mind-bending: The global market for Big Data and business analytics software will grow 50% to $187B by 2019. The market for prescriptive analytics software alone will grow at 22% CAGR to $1.1B in 2019. In less than 5 years, 40% of net new business intelligence investments will be funneled into predictive and prescriptive analytics. Now consider that within the same time period, 1.7 megabytes of new information will be created every second for every human, adding up to 44 zettabytes of data (IDC). Yet only a minute percentage of this data is ever analyzed (0.5% according to a 2013 analysis by MIT Technology Review's Antonio Regalado). 

Big Opportunities

All these numbers add up to one realization: you'd better figure out how to harness that potential business intelligence before your competitors do. The good news is, all that untapped data promises limitless opportunities once we apply the power of Big Compute. The sky is the limit: scientists at CERN found that analyzing vast numbers of subatomic particle collisions in the Large Hadron Collider requires compute and storage capabilities that exceed those of CERN's own data centers. They have begun using HPC in the cloud in their quest to figure out how the universe works - literally.

Beyond precise and timely customer insights, Big Data analytics help companies optimize products, service delivery, and operations. Predictive and prescriptive analytics are also becoming essential to risk reduction through fraud detection, APT pattern detection, scenario modeling, and behavioral profiling for HR use. As IoT, automation, bioinformatics and other widespread technological phenomena go mainstream, the case for HPC in the cloud will be increasingly clear and compelling. It represents one of the biggest transformations in computing since the PC, with the potential to advance digital business and research exponentially, blowing the doors off Moore's Law.

Get Ready to Make the Big Leap

We can't afford to think of high-performance analytics as the wave of the future. It's time to get on board now and prepare a next-generation Big Data strategy. The alternative is to get left behind in a surge of disruption and fierce, fast-moving competition. Those with Big Compute capabilities in place will be poised to pounce. Those without will be stuck in the infrastructure weeds.


About the Author

Leo Reiter, Chief Technology Officer at Nimbix

Leo Reiter is the Chief Technology Officer at Nimbix, a leader in cloud-based high-performance computing (HPC) infrastructure and applications. Leo is a virtualization and cloud computing pioneer with over 20 years of experience in software development and technology strategy.

Prior to Nimbix, Leo was co-founder and CTO of Virtual Bridges. Leo is an entrepreneur with a strong background in Lean Startup and Agile methodologies.

Published Tuesday, October 04, 2016 7:02 AM by David Marshall