Virtualization Technology News and Information
The Apache Ignite Journey: High Performance Computing

By Stanislav Lukyanov, Technical Director, GridGain Systems

This is the third article in a series about the Apache Ignite journey. The previous article described the use case most organizations start with: Application Acceleration and Data Caching to accelerate and provide scalability for different types of workloads. Once Ignite is deployed, many companies recognize they can solve an even more difficult challenge, achieving much more flexible high performance computing at a much lower cost. 

The challenge: real-time analytics on terabytes of data 

With rising demand for rapid analytics on ever-growing amounts of hybrid data, enterprises eventually run into critical performance limitations. Examples of this challenge can be found in many industries.

  • Financial services - Fraud detection requires real-time analysis of huge amounts of both historical and streaming data. Portfolio managers must also be able to adjust in real time the portfolios of hundreds or thousands of clients in response to rapidly evolving market forces, such as company or industry news, market volatility, or world events.
  • Pharmaceuticals - Companies seeking new drug treatments must continually shrink the time required for analysis.
  • Manufacturing - The growing demand for predictive maintenance requires real-time analysis of data collected from sensors on everything from washing machines to jet engines - combined with historical information on this equipment.
  • Supply chain - Shippers must have insight into all the factors - personnel, vehicle status, product demand, weather conditions, road construction, etc. - in order to keep schedules on track.

In each case, the processing requirements are enormous: hundreds of thousands of calculations must be run per second, often with sub-second response times. These applications also depend on very large and constantly shifting datasets as inputs, and they must allow for custom queries. Accomplishing all this with conventional enterprise computing infrastructure is impossible.

The industry's current go-to solutions for this, proprietary high performance computing platforms, have been around for decades. However, this is an extremely expensive approach and out of reach for all but the largest organizations.

High performance computing with Apache Ignite

A newer approach to high performance computing combines in-memory computing with massively parallel processing (MPP) across a distributed compute cluster. Apache Ignite, a distributed database designed for high-performance computing at in-memory speed, pools the available RAM, CPU, and storage resources of the cluster it runs on. Ignite can be deployed on-premises, in a public or private cloud, or in a hybrid environment. It can also be deployed on specialty mainframe processor cores. In that case, instead of pooling the CPUs and RAM of a cluster of server nodes, the in-memory computing layer pools the CPU cores and RAM that the mainframe allocates for its use.

Apache Ignite provides compute APIs that let application developers write custom applications in their language of choice (Java, .NET, or C++, for example) and execute custom logic, such as functions and lambdas, on the data in the in-memory computing cluster.

Apache Ignite essentially turns a group of commodity machines or a cloud environment into a distributed supercomputer of interconnected Ignite nodes. It achieves the necessary speed and scale by processing queries in memory and by reducing network utilization through APIs for data-intensive and compute-intensive calculations. Ignite implements a broad range of MPP algorithms that enable co-located processing, which improves performance and scalability by reducing data transmission over the network.

Ignite's distributed compute and machine learning APIs leverage MPP. Ignite also exposes APIs so developers can add and distribute their own user-defined MPP functionality using C++, Java, or .NET. Calculations run on the local data sets held on each node, which avoids shuffling data over the network and can improve performance by orders of magnitude.
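To make the idea of co-located processing concrete, here is a minimal sketch in plain Java. It does not use the Ignite API. The Node class, the hash-based ownerOf rule, and the method names are all assumptions made for illustration. Each simulated node sums only the entries it owns, and a single number per node is combined at the end, so the raw data never has to move.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-JDK simulation of co-located processing. This is NOT the Ignite API. */
final class ColocatedComputeSketch {

    /** Hypothetical stand-in for a cluster node that owns one partition of the data. */
    static final class Node {
        private final Map<String, Long> localData = new HashMap<>();

        void put(String key, long value) {
            localData.put(key, value);
        }

        /** The "job" that runs where the data lives: it reads local entries only. */
        long localSum() {
            long sum = 0;
            for (long value : localData.values()) {
                sum += value;
            }
            return sum;
        }
    }

    /** Assumed partitioning rule: a key's hash decides which single node owns it. */
    static int ownerOf(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Spreads the entries across nodeCount simulated nodes. */
    static List<Node> distribute(Map<String, Long> data, int nodeCount) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Node());
        }
        for (Map.Entry<String, Long> entry : data.entrySet()) {
            nodes.get(ownerOf(entry.getKey(), nodeCount)).put(entry.getKey(), entry.getValue());
        }
        return nodes;
    }

    /** Every node computes its partial sum in parallel; only one long per node is combined. */
    static long clusterSum(List<Node> nodes) {
        return nodes.parallelStream().mapToLong(Node::localSum).sum();
    }
}
```

Calling clusterSum(distribute(data, 3)) returns the same total as summing the map directly. The difference in a real cluster is where the work happens: the per-node loop runs next to the data, and only the small partial results cross the network.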

Ignite for portfolio exposure management 

Let's look at how Apache Ignite solved the portfolio exposure management challenge for one of the world's largest financial services firms. In the firm's asset management division, portfolio managers invest money on behalf of clients with the goal of achieving a certain level of return for a certain level of risk. The investments can include financial vehicles like bonds or physical assets like real estate.

A key task of the portfolio manager is managing exposure to risk. The manager must be able to view in real time the current risk with respect to any given factor, including current events, market movement, company news and announcements, etc. This requires aggregating and transforming data from multiple source systems (market data, static data, holding data, etc.) and creating what is essentially a huge, fast pivot table with business logic that includes customer analytics. The performance challenges include real-time calculations (sub-second response) based on terabytes of data, the ability to run custom transformations at runtime, auditability, and the ability to continually adapt to new types of queries. Prior to deploying Apache Ignite, the firm saw calculation times ranging from 1.5 to 3 minutes, which was far too long.
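The "pivot table" described above can be pictured with a small plain-Java sketch. It is not the firm's implementation and does not use the Ignite API. The Holding class, the exposureBy method, and the idea of measuring exposure as quantity times price are assumptions chosen to keep the example short. The grouping function is passed in by the caller, which loosely mirrors the requirement to run custom transformations at runtime.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative pivot-style exposure roll-up; all names and data shapes are hypothetical. */
final class ExposurePivotSketch {

    /** One position, as it might arrive from a holdings source system. */
    static final class Holding {
        final String portfolio;
        final String symbol;
        final long quantity;

        Holding(String portfolio, String symbol, long quantity) {
            this.portfolio = portfolio;
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    /**
     * Sums market value (quantity * price) per group. The grouping function is
     * supplied by the caller at runtime, so the same data can be pivoted by
     * portfolio, by symbol, or by an attribute looked up in static data.
     * Assumes every held symbol has an entry in prices.
     */
    static Map<String, Double> exposureBy(List<Holding> holdings,
                                          Map<String, Double> prices,
                                          Function<Holding, String> groupKey) {
        return holdings.stream().collect(Collectors.groupingBy(
                groupKey,
                TreeMap::new,
                Collectors.summingDouble(h -> h.quantity * prices.get(h.symbol))));
    }
}
```

For example, exposureBy(holdings, prices, h -> h.portfolio) rolls exposure up by portfolio, while exposureBy(holdings, prices, h -> sectors.get(h.symbol)) pivots the same holdings by sector, given a hypothetical sectors map built from static data. At the scale the article describes, the same grouping would run on each node against its local partition, with the per-node maps merged afterwards.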

Until recently, the approach to solving this challenge was to leverage a relational database and abstract the calculations into a separate layer. However, with the demand for real-time responses on ever-growing amounts of data, this approach struggles. More data means larger caches, which increases cost and memory management complexity. Moving data over the network can also significantly degrade performance.

With Apache Ignite, the financial services firm was able to create an extremely fast in-memory data grid that distributes relevant data to the nodes and processes the data locally, dramatically reducing network traffic, while also providing scalable compute and MPP. 

With Ignite in place, the financial services firm was able to reduce calculation time from minutes down to 500 to 600 milliseconds.

Conclusion

Apache Ignite makes high performance computing affordable for an entirely new tier of enterprises that could never have afforded the traditional scale-up approach. From financial services to supply chain management to pharmaceuticals to manufacturing, Ignite can deliver the performance and scale companies need to transform their industries by doing more with their data faster than ever before. The next article in this series will discuss the final use case on the typical Apache Ignite journey: Ignite as a distributed database for hybrid transactional/analytical processing (HTAP) workloads.

To learn more about Apache Ignite, check out the sessions from last year's Apache Ignite Summit.


ABOUT THE AUTHOR

Stanislav Lukyanov is a Technical Director at GridGain Systems, a provider of enterprise-grade in-memory computing solutions based on Apache Ignite. A software engineer and architect, he helps to build the next generation of in-memory computing solutions to answer modern industry challenges.

Published Thursday, March 03, 2022 7:31 AM by David Marshall