By Stanislav Lukyanov,
Technical Director, GridGain Systems
This is the third article in a series about the Apache Ignite journey. The previous article described the use case most organizations start with: Application Acceleration and Data Caching, which bring speed and scalability to many types of workloads. Once Ignite is deployed, many companies recognize they can tackle an even harder challenge: achieving far more flexible high performance computing at a much lower cost.
The challenge: real-time analytics on terabytes of data
With rising demand for rapid analytics on ever-growing amounts
of hybrid data, enterprises eventually run into critical performance
limitations. Examples of this challenge can be found in many industries.
- Financial services - Fraud detection requires real-time analysis of huge amounts of both historical and streaming data. Portfolio managers must also be able to adjust, in real time, the portfolios of hundreds or thousands of clients in response to rapidly evolving market forces, such as company or industry news, market volatility, or world events.
- Pharmaceuticals - Companies seeking new drug treatments must continually shrink the time required for analysis.
- Manufacturing - The growing demand for predictive maintenance requires real-time analysis of data collected from sensors on everything from washing machines to jet engines, combined with historical information on this equipment.
- Supply chain - Shippers must have insight into all the relevant factors - personnel, vehicle status, product demand, weather conditions, road construction, etc. - to keep schedules on track.
In each case, the processing requirements are enormous: hundreds
of thousands of calculations must be run per second, often with sub-second
response times. These applications also depend on very large and constantly
shifting datasets as inputs, and they must allow for custom queries.
Accomplishing all this with conventional enterprise computing infrastructure is
impossible.
The industry's current go-to solution, proprietary high performance computing platforms, has been around for decades. However, this approach is extremely expensive and out of reach for all but the largest organizations.
High performance computing with Apache Ignite
A newer approach to high performance computing combines in-memory computing with massively parallel processing (MPP) across a distributed compute cluster. Apache Ignite, a distributed database designed for high-performance computing at in-memory speed, is deployed across a cluster of machines and pools their available RAM, CPUs, and storage. Ignite can run on-premises, in a public or private cloud, in a hybrid environment, or even on specialty mainframe processor cores. On a mainframe, instead of pooling the CPUs and RAM of a cluster of server nodes, the in-memory computing layer pools the CPU cores and RAM allocated for its use.
Apache Ignite provides compute APIs that enable application developers to create custom applications in their language of choice - Java, .NET, or C++, for example - and execute custom logic, such as functions and lambdas, on the data in the in-memory computing cluster.
Apache Ignite essentially turns a group of commodity machines or a cloud environment into a distributed supercomputer of interconnected Ignite nodes. It achieves the necessary speed and scale by processing queries in memory and by reducing network utilization through its data and compute APIs. In particular, Ignite implements MPP algorithms to enable co-located processing, which improves performance and scalability by minimizing data transmission over the network.
Ignite's distributed compute and machine learning APIs leverage MPP.
Ignite also exposes APIs so developers can add and distribute their own
user-defined MPP functionality using C++, Java, or .NET. Calculations are
performed on local data sets available on the nodes, which avoids data
shuffling over the network, increasing performance by orders of magnitude.
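The core idea behind co-located processing can be illustrated with a small plain-Java sketch. This is illustrative only and uses no Ignite APIs: the "nodes" here are just in-process partitions, and the class and field names (`Node`, `localData`, `clusterSum`) are invented for the example. Each node computes a partial result over the data it holds, and only those small partial results are combined, rather than shipping rows across the network:

```java
import java.util.List;
import java.util.Map;

// Illustrative-only sketch of co-located processing (no Ignite dependency):
// each "node" owns a partition of the data and computes locally; only the
// small per-node results cross the "network" to be reduced.
public class ColocatedSketch {
    // A node's local partition: account id -> exposure value.
    record Node(Map<String, Double> localData) {
        // The compute job runs where the data lives; only the job is shipped.
        double localSum() {
            double sum = 0.0;
            for (double v : localData.values()) sum += v;
            return sum;
        }
    }

    // Reduce step: combine one number per node instead of moving rows.
    public static double clusterSum(List<Node> nodes) {
        double total = 0.0;
        for (Node n : nodes) total += n.localSum();
        return total;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
            new Node(Map.of("acct1", 100.0, "acct2", 250.0)),
            new Node(Map.of("acct3", 50.0)),
            new Node(Map.of("acct4", 600.0)));
        System.out.println(clusterSum(cluster)); // prints 1000.0
    }
}
```

In a real Ignite deployment the partitions live on separate machines and the jobs are dispatched through Ignite's compute APIs, but the performance argument is the same: moving a closure to the data is far cheaper than moving terabytes of data to the closure.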
Ignite for portfolio exposure management
Let's look at how Apache Ignite solved the portfolio exposure management challenge for one of the world's largest financial services firms. In the firm's asset management division, portfolio managers invest money on behalf of clients with the goal of achieving a certain level of return for a certain level of risk. The investments can include financial vehicles such as bonds or physical assets such as real estate.
A key task of the portfolio manager is managing exposure to
risk. The manager must be able to view in real time the current risk with
respect to any given factor, including current events, market movement, company
news and announcements, etc. This requires aggregating and transforming data from
multiple source systems - market data, static data, holding data, etc. - and
creating what is essentially a huge, fast pivot table with business logic that
includes customer analytics. The performance challenges include real-time calculations (sub-second response times) on terabytes of data, the ability to run custom transformations at runtime, auditability, and the ability to continually adapt to new types of queries. Before deploying Apache Ignite, the firm saw calculation times of 1.5 to 3 minutes, which was far too long.
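At its core, the "huge, fast pivot table" described above is a group-by-and-aggregate over holdings. A minimal sketch of that aggregation step, using a hypothetical data model invented for this example (`Holding`, `riskFactor`, `exposure` are not from the firm's actual system), might look like this in plain Java; the firm runs the equivalent operation across terabytes of data distributed over an Ignite cluster:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative-only sketch: the "pivot table" reduced to its essence,
// a group-by on risk factor with a sum of exposures.
public class ExposurePivot {
    // Hypothetical holding record: which portfolio, which risk factor, how much exposure.
    record Holding(String portfolio, String riskFactor, double exposure) {}

    // Aggregate total exposure per risk factor across all holdings.
    public static Map<String, Double> exposureByFactor(List<Holding> holdings) {
        return holdings.stream().collect(Collectors.groupingBy(
            Holding::riskFactor,
            Collectors.summingDouble(Holding::exposure)));
    }

    public static void main(String[] args) {
        List<Holding> book = List.of(
            new Holding("P1", "rates", 120.0),
            new Holding("P1", "energy", 80.0),
            new Holding("P2", "rates", 40.0));
        // Prints the aggregated exposure per factor (map iteration order may vary).
        System.out.println(exposureByFactor(book));
    }
}
```

Because this kind of aggregation decomposes naturally into per-partition partial results, it maps directly onto the co-located, MPP-style execution that Ignite provides, which is why the firm could push response times from minutes to sub-second.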
Until recently, the standard approach to this challenge was to use a relational database and abstract the calculations into a separate layer. With the demand for real-time responses on ever-growing amounts of data, however, this approach struggles: more data means larger caches, which increases cost and memory-management complexity, and moving data over the network can significantly degrade performance.
With Apache Ignite, the financial services firm was able to
create an extremely fast in-memory data grid that distributes relevant data to
the nodes and processes the data locally, dramatically reducing network
traffic, while also providing scalable compute and MPP.
With Ignite in place, the financial services firm was able to
reduce calculation time from minutes down to 500 to 600 milliseconds.
Conclusion
Apache Ignite makes high performance computing affordable for an
entirely new tier of enterprises that could never have afforded the traditional
scale-up approach. From financial services to supply chain management to
pharmaceuticals to manufacturing, Ignite can deliver the performance and scale
companies need to transform their industries by doing more with their data
faster than ever before. The next article in this series will discuss the final
use case on the typical Apache Ignite journey: Ignite as a distributed database
for hybrid transactional/analytical workloads (HTAP).
To learn more about Apache Ignite, check out the sessions from
last year's Apache Ignite Summit.
ABOUT THE AUTHOR
Stanislav Lukyanov is a Technical Director at GridGain Systems, a provider of enterprise-grade in-memory computing solutions based on Apache Ignite. A software engineer and architect, he helps build the next generation of in-memory computing solutions to address modern industry challenges.