Zenoss Inc., a leading provider of management software for physical, virtual, and cloud-based IT environments, today announced that Los Alamos National Laboratory (LANL) is using Zenoss to provide event/fault management and monitoring for 13 of its 16 supercomputing clusters, including Roadrunner, which contains more than 24,000 nodes and is one of the most powerful computer systems in the world.
LANL has pursued projects in high-performance computing for many years. As part of this continuing effort, LANL developed Roadrunner, the world’s largest massively parallel high-performance computer. When it launched, Roadrunner was the first hybrid supercomputer and the first supercomputer to attain sustained petaflop performance.
After evaluating several monitoring solutions, LANL chose Zenoss to help implement and extend its event management system to work with Roadrunner. LANL was thrilled with the results. Through this collaboration with Zenoss, the laboratory created a hierarchical event management system able to monitor events and faults for each of Roadrunner’s individual nodes.
“Providing service monitoring for Los Alamos’ Roadrunner, one of the world’s fastest supercomputers, is an honor and a clear illustration of our ability to handle the scale and performance requirements of the most demanding computing environments,” says Bill Karpovich, CEO and co-founder of Zenoss Inc.
LANL is currently developing its next-generation supercomputing platform, Cielo, which will support all three NNSA national laboratories: Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and Sandia National Laboratories. The LANL team will collaborate with Zenoss on Zenoss’s next-generation solution (code-named Avalon), which will enable LANL to continue providing high-performance event/fault management for the Cielo project.