Accelerating data for NVIDIA GPUs

21 November, 2019
Douglas O'Flaherty
IBM

These days, most AI and big data workloads need more compute power and memory than one node can provide. As both the number of computing nodes and the horsepower of processors and GPUs increase, so does the demand for I/O bandwidth. What was once a computing challenge can now become an I/O challenge.

For those scaling up their AI workloads and teams, high-performance filesystems are being deployed because they address that I/O challenge. They deliver the bandwidth needed to feed the systems and keep them busy.

This week, NVIDIA announced a new solution–Magnum IO–that complements the capabilities of leading-edge data management systems such as IBM Spectrum Scale and helps address AI and big data analytics I/O challenges.

NVIDIA Magnum IO is a collection of software APIs and libraries that optimize storage and network I/O performance in multi-GPU, multi-node processing environments. NVIDIA developed Magnum IO in close collaboration with storage industry leaders, including IBM. The Magnum IO software stack includes several NVIDIA GPUDirect technologies (Peer-to-Peer, RDMA, Storage, and Video) and communications APIs (NCCL, Open MPI, and UCX). NVIDIA GPUDirect Storage is a key feature of Magnum IO: it enables a direct path between GPU memory and storage, which improves system throughput and latency and, in turn, GPU and CPU utilization.
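To make the "direct path" idea concrete, here is a rough back-of-the-envelope sketch in Python. It is not the GPUDirect Storage API and not a benchmark: it only compares a simplified staged path (storage to a CPU bounce buffer, then to GPU memory, with no overlap between the two copies) against a single direct hop. The function names and every bandwidth figure are hypothetical values chosen for illustration, not measurements from NVIDIA or IBM.

```python
def staged_throughput(storage_to_host_gbs: float, host_to_gpu_gbs: float) -> float:
    """Effective GB/s when each byte is copied into a host bounce buffer and
    then copied again into GPU memory, assuming the two copies do not overlap."""
    if storage_to_host_gbs <= 0 or host_to_gpu_gbs <= 0:
        raise ValueError("bandwidths must be positive")
    # Time to move 1 GB is the sum of the two hop times; throughput is its reciprocal.
    return 1.0 / (1.0 / storage_to_host_gbs + 1.0 / host_to_gpu_gbs)


def direct_throughput(storage_to_gpu_gbs: float) -> float:
    """Effective GB/s when data moves from storage to GPU memory in a single hop."""
    if storage_to_gpu_gbs <= 0:
        raise ValueError("bandwidth must be positive")
    return storage_to_gpu_gbs


def transfer_seconds(size_gb: float, throughput_gbs: float) -> float:
    """Seconds needed to move size_gb gigabytes at the given throughput."""
    return size_gb / throughput_gbs


if __name__ == "__main__":
    # Hypothetical figures: both hops and the direct path all run at 10 GB/s.
    dataset_gb = 100.0
    staged = staged_throughput(10.0, 10.0)
    direct = direct_throughput(10.0)
    print(f"staged: {staged:.1f} GB/s, {transfer_seconds(dataset_gb, staged):.1f} s")
    print(f"direct: {direct:.1f} GB/s, {transfer_seconds(dataset_gb, direct):.1f} s")
```

Real systems pipeline and overlap copies, so the gap in practice depends on the hardware and software involved; the sketch only shows why removing an intermediate copy can raise the ceiling on throughput.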

NVIDIA Magnum IO is designed to be a powerful complement to the IBM Spectrum Storage family. IBM Spectrum Scale, for example, was developed from the beginning for very high-performance environments. It incorporates support for Direct Memory Access technologies. Now, NVIDIA Magnum IO is extending I/O technologies to speed up NVIDIA GPU I/O.

For technology solution providers such as IBM and NVIDIA, the key is to integrate processors, GPUs, and appropriate software stacks into a unified platform designed specifically for AI. NVIDIA is also a major player in this space – 90 percent of accelerator-based systems incorporate NVIDIA GPUs for computation.[1]

Recently, IBM and NVIDIA have been working together to develop modern IT infrastructure solutions that can help power AI well into the future. The synergies created by the IBM and NVIDIA collaboration have already been demonstrated at the largest scales. Currently, the two most powerful supercomputers on the planet – Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Laboratory – are built from IBM Power processors, NVIDIA GPUs, and IBM Storage. A key to these installations is the fact that they were assembled using only commercially available components. Building on that foundation, IBM and our Business Partners offer a range of solutions, from IBM-supported versions of the complete stack to SuperPOD reference architectures featuring IBM Spectrum Scale and NVIDIA DGX systems.

Developing technology designed to increase data pipeline bandwidth and throughput is only part of the story. These solutions provide comprehensive reference architectures that incorporate a wide range of IBM Spectrum Storage family members, including IBM Cloud Object Storage for scalable data, IBM Spectrum Discover to manage and enhance metadata, and IBM Spectrum Protect to provide modern multicloud system security. The focus is on user productivity and support for the entire data pipeline.

The announcement of NVIDIA Magnum IO highlights the benefits of ecosystem collaboration in bringing innovation to AI. As enterprises move rapidly toward adopting AI, they can do so with confidence and with the support of IBM Storage.

[1] Intersect360 Research HPC User Site Census survey data, 2019

The post Accelerating data for NVIDIA GPUs appeared first on IBM IT Infrastructure Blog.