GigaIO unveiled compelling AI training, fine-tuning,
and inference benchmarks that demonstrate the performance, cost, and power
efficiency of GigaIO's AI fabric compared with RDMA over Converged Ethernet
(RoCE). Key results include 2x faster training and fine-tuning and an 83.5x
reduction in time to first token for inference, demonstrating how smarter
interconnects can transform AI infrastructure.
As AI models grow more complex, interconnect inefficiency becomes an
unexpected and critical bottleneck. Testing showed that GigaIO's AI fabric
outperformed traditional RoCE Ethernet in every AI workload, enabling
organizations to:
- Train models twice as fast
- Reduce time to first token by 83.5x for instant user response
- Cut power consumption by 35-40% without sacrificing performance
- Deploy multi-GPU clusters faster and more easily
- Reduce infrastructure costs through simpler hardware configurations
Throughout the testing, the same GPUs, servers, operating systems, and
application software were used, with only the interconnect varied to isolate
its contribution to the results.
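To make that controlled-comparison approach concrete, the sketch below shows
one common way such an A/B measurement is structured: a fixed workload timed
repeatedly, with everything held constant except the interconnect. The
`benchmark`, `run_workload`, and `make_workload` names and the fabric labels
are hypothetical placeholders, not GigaIO's actual test harness.

```python
# Illustrative A/B benchmark harness: the workload, hardware, and software
# stay fixed; only the interconnect configuration differs between runs.
# run_workload and the fabric labels are hypothetical, not GigaIO's harness.
import statistics
import time
from typing import Callable

def benchmark(run_workload: Callable[[], None], trials: int = 5) -> float:
    """Return the median wall-clock seconds for a fixed workload."""
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        run_workload()  # same GPUs, servers, OS, and application software
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Only the interconnect varies between the two configurations under test:
# results = {fabric: benchmark(make_workload(fabric))
#            for fabric in ("gigaio_fabric", "roce")}
```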
The PCIe-native
design of GigaIO's AI fabric enables organizations to achieve target
performance with fewer GPUs and lower power consumption, and eliminates the
need for additional networking hardware such as NICs and Ethernet switches,
further reducing energy use. Tests show RoCE systems require 35-40% more
hardware (and energy) to provide equivalent performance.
Unlike RoCE,
GigaIO's AI fabric eliminates protocol overhead and complex RDMA tuning,
simplifying system setup with seamless GPU discovery and minimal
configuration. In contrast, RoCE demands extensive configuration and
troubleshooting, and even then delivers suboptimal performance. "With GigaIO, we spend less
time on infrastructure and more time optimizing LLMs," said Greg Diamos, CTO of
Lamini, an enterprise custom AI platform.
Benchmark Results
GigaIO's AI
fabric outperformed RoCE across the entire AI workflow. Training and
fine-tuning workloads achieved higher GPU utilization in multi-GPU setups,
with 104% higher throughput than RoCE in distributed training scenarios.
In inferencing, for models such as Llama 3.2-90B Vision Instruct, GigaIO's
AI fabric reduced Time-to-First-Token (TTFT) by 83.5x, significantly
improving responsiveness for interactive AI applications such as chatbots,
vision systems, and RAG pipelines, which responded in milliseconds rather
than seconds.
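For readers unfamiliar with the metric, TTFT is the delay between submitting
a request and receiving the first generated token. A minimal sketch of how it
is commonly measured against an OpenAI-compatible streaming endpoint follows;
the endpoint URL, model id, and prompt are placeholders, not details of
GigaIO's published test setup.

```python
# Minimal sketch of measuring Time-to-First-Token (TTFT) against an
# OpenAI-compatible streaming endpoint. The URL, model id, and prompt are
# placeholders; they are not details of GigaIO's published test setup.
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical server
PAYLOAD = {
    "model": "llama-3.2-90b-vision-instruct",      # placeholder model id
    "prompt": "Summarize the benefits of PCIe-native fabrics.",
    "max_tokens": 128,
    "stream": True,
}

start = time.perf_counter()
first_token_at = None
with requests.post(ENDPOINT, json=PAYLOAD, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events arrive as lines of the form "data: {...}".
        if not line or not line.startswith(b"data:"):
            continue
        if line[5:].strip() == b"[DONE]":
            break
        first_token_at = time.perf_counter()  # first generated token arrived
        break  # TTFT only needs the first chunk

if first_token_at is not None:
    print(f"TTFT: {(first_token_at - start) * 1000:.1f} ms")
```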
For the large
Llama 3.2-90B Vision Instruct model, GigaIO's AI fabric achieved 47.3% higher
throughput and handled the same user load with 30-40% less hardware
than RoCE. In a 16-GPU AMD MI300X cluster, GigaIO's AI fabric delivered 38%
higher training throughput and superior GPU utilization, enabling faster
convergence on large-scale models.
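As context for the throughput and utilization figures above, the sketch below
shows how multi-GPU training throughput (samples per second) is commonly
measured with PyTorch DDP, where the gradient all-reduce during the backward
pass is the traffic that crosses the interconnect. The toy model and random
batch are stand-ins, not the LLM workload behind GigaIO's published numbers.

```python
# Sketch of measuring distributed-training throughput with PyTorch DDP.
# The gradient all-reduce inside backward() is what stresses the interconnect.
# The toy Linear model and random batch are stand-ins, not GigaIO's workload.
# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
import os
import time

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

steps, batch_size = 100, 64
model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
batch = torch.randn(batch_size, 4096, device="cuda")

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(steps):
    opt.zero_grad()
    model(batch).sum().backward()  # gradient all-reduce happens here
    opt.step()
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

if dist.get_rank() == 0:
    throughput = steps * batch_size * dist.get_world_size() / elapsed
    print(f"aggregate throughput: {throughput:.0f} samples/sec")
dist.destroy_process_group()
```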
"Our AI fabric
isn't just faster, it's cheaper to deploy and operate," said Alan Benjamin, CEO
of GigaIO. "Teams report 30-40% lower power consumption, making it a compelling
alternative to traditional Ethernet-based interconnects for organizations facing
power constraints or seeking to optimize AI infrastructure costs. Our AI fabric
enables faster time-to-value and more scalable AI deployments by delivering
superior performance while consuming less power."
Review all test results in the "Smarter Interconnects for
Power-Constrained AI" white paper.