NVIDIA To Talk Hopper GPU & Grace CPU Architecture at Hot Chips 34

NVIDIA will reveal brand new details about its Hopper GPU and Grace CPU at the next iteration of Hot Chips (34) in the coming week. Experienced engineers from the company will explain innovations in accelerated computing for modern data centers and advanced networking systems, with talks covering the Grace CPU, the Hopper GPU, the NVLink switch, and the Jetson Orin module.

NVIDIA to Unveil Details of Hopper GPU and Next-Gen Grace CPU at Hot Chips 34

Hot Chips is an annual event that brings together system and processor architects and allows companies to discuss details such as the architecture and current performance of their products. NVIDIA plans to discuss the company’s first server-class CPU (Grace), the new Hopper GPU, the NVSwitch interconnect chip, and the Jetson Orin system on a module, or SoM.

The four presentations during the two-day event will provide insight into how the company’s platform will achieve increased performance, efficiency, scalability and security.

NVIDIA hopes to “demonstrate a design philosophy of innovating across the entire stack of chips, systems, and software where GPUs, CPUs, and DPUs act as peer processors.” So far, the company has already created a platform that runs AI, data analytics, and high-performance computing jobs for cloud service providers, supercomputing centers, enterprise data centers, and autonomous AI systems.

Data centers demand flexible clusters of processors, graphics cards, and other accelerators sharing huge pools of memory to deliver the power-efficient performance that today’s workloads require.

Jonathon Evans, Distinguished Engineer and 15-year NVIDIA veteran, will describe the NVIDIA NVLink-C2C. It connects processors and graphics cards at 900 GB/s with five times the power efficiency of the existing PCIe Gen 5 standard, thanks to data transfers that consume just 1.3 picojoules per bit.
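
As a sanity check on those figures, the quoted energy per bit implies only a handful of watts to move data at full speed. A minimal back-of-the-envelope sketch in Python; the 900 GB/s and 1.3 pJ/bit numbers come straight from the article, the rest is arithmetic:

```python
# Back-of-the-envelope: power spent moving data at the full NVLink-C2C rate.
bandwidth_bytes_per_s = 900e9    # 900 GB/s, as quoted for NVLink-C2C
energy_per_bit_joules = 1.3e-12  # 1.3 picojoules per bit, as quoted

bits_per_second = bandwidth_bytes_per_s * 8
link_power_watts = bits_per_second * energy_per_bit_joules

print(f"Link power at full rate: {link_power_watts:.1f} W")  # -> ~9.4 W
```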

NVLink-C2C joins two CPU chips to create the NVIDIA Grace CPU Superchip with 144 Arm Neoverse cores. It’s a processor designed to solve the world’s most important computing problems.

The Grace processor uses LPDDR5X memory for maximum efficiency. The chip delivers one terabyte per second of memory bandwidth while keeping the power consumption of the entire complex to 500 watts.
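
Those two figures give a quick efficiency ratio. A trivial sketch, using only the article’s numbers (the per-core split is just division across the 144 cores):

```python
# Rough efficiency ratios for the Grace memory complex, per the article.
bandwidth_gb_per_s = 1000.0  # 1 TB/s of LPDDR5X bandwidth
complex_power_watts = 500.0  # quoted power for the entire complex
cores = 144                  # Arm Neoverse cores in the Grace Superchip

print(f"{bandwidth_gb_per_s / complex_power_watts:.1f} GB/s per watt")  # 2.0
print(f"{bandwidth_gb_per_s / cores:.1f} GB/s per core")                # ~6.9
```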

NVLink-C2C also connects Grace CPU and Hopper GPU chips as memory sharing peers in the NVIDIA Grace Hopper Superchip, providing maximum acceleration for performance-intensive tasks such as AI training.

Anyone can create custom chiplets using NVLink-C2C to seamlessly connect to NVIDIA GPUs, CPUs, DPUs, and SoCs, expanding this new class of embedded products. The interconnect will support the AMBA CHI and CXL protocols used by Arm and x86 processors.

The NVIDIA NVSwitch merges multiple servers into a single AI supercomputer using NVLink, an interconnect running at 900 gigabytes per second, more than seven times the bandwidth of PCIe 5.0.

NVSwitch allows users to link 32 NVIDIA DGX H100 systems into an AI supercomputer that delivers an exaflop of peak AI performance.
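
The exaflop claim is easy to reconstruct. A DGX H100 carries eight H100 GPUs; the ~4 PFLOPS-per-GPU FP8 peak below is an illustrative assumption, since the article only states the aggregate figure:

```python
# Sketch: how 32 DGX H100 systems add up to roughly an exaflop of peak AI.
systems = 32
gpus_per_system = 8     # eight H100 GPUs per DGX H100
pflops_per_gpu = 4.0    # assumed FP8 peak per GPU (illustrative)

total_gpus = systems * gpus_per_system           # 256 GPUs
total_exaflops = total_gpus * pflops_per_gpu / 1000
print(f"{total_gpus} GPUs -> ~{total_exaflops:.2f} EFLOPS peak AI")
```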

Alexander Ishii and Ryan Wells, two veteran NVIDIA engineers, will explain how the switch lets users build systems with up to 256 GPUs to tackle demanding workloads such as training AI models with more than a trillion parameters.
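
A rough memory-footprint estimate shows why trillion-parameter training wants that many GPUs. The bytes-per-parameter and per-GPU memory figures below are illustrative assumptions (mixed-precision weights plus optimizer state, 80 GB of HBM per GPU), not numbers from the article:

```python
# Why a trillion parameters calls for hundreds of GPUs: the state barely fits.
params = 1e12
bytes_per_param = 16  # assumed: fp16 weights + grads + fp32 Adam state
hbm_per_gpu = 80e9    # assumed 80 GB of HBM per GPU

total_tb = params * bytes_per_param / 1e12   # ~16 TB of training state
gpus_to_hold_it = params * bytes_per_param / hbm_per_gpu
print(f"~{total_tb:.0f} TB of state -> ~{gpus_to_hold_it:.0f} GPUs just to hold it")
```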


The switch includes engines that speed up data transfers using the NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), an in-network computing capability that debuted on NVIDIA Quantum InfiniBand networks. It can double the data rate on communication-intensive AI applications.
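
The “double the data rate” claim follows from a simple traffic model: in a classic ring allreduce, every GPU sends the gradient data roughly twice, while a switch that reduces in the network needs each GPU to send it only once. A simplified, illustrative sketch, not a model of SHARP’s actual implementation:

```python
# Simplified traffic model for in-network reduction on allreduce workloads.
def ring_allreduce_bytes(data_bytes: float, n_gpus: int) -> float:
    # Classic ring allreduce: each GPU sends ~2*(N-1)/N times the data.
    return 2 * (n_gpus - 1) / n_gpus * data_bytes

def in_network_reduce_bytes(data_bytes: float) -> float:
    # Reduction happens in the switch: each GPU sends the data ~once.
    return data_bytes

gradients = float(1 << 30)  # 1 GiB of gradients, purely illustrative
ratio = ring_allreduce_bytes(gradients, 256) / in_network_reduce_bytes(gradients)
print(f"Ring sends ~{ratio:.2f}x the bytes of in-network reduction")  # ~2x
```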

Jack Choquette, a senior distinguished engineer with 14 years at the company, will provide an in-depth tour of the NVIDIA H100 Tensor Core GPU, aka Hopper.

In addition to using the new interconnects to scale to unprecedented heights, the GPU packs many industry-leading features that improve performance, efficiency, and security.

Hopper’s new Transformer Engine and upgraded Tensor Cores deliver a 30x speedup over the previous generation on AI inference with the world’s largest neural network models. And it uses the world’s first HBM3 memory system to deliver a whopping three terabytes per second of memory bandwidth, NVIDIA’s largest generational increase ever.
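
The 3 TB/s figure also sets a useful roofline intuition: it determines how much arithmetic a kernel must do per byte fetched before compute, rather than memory, becomes the bottleneck. In this sketch the ~1 PFLOPS FP16 tensor peak is an illustrative assumption; the article only quotes the bandwidth:

```python
# Roofline-style ridge point: FLOPs per byte at which a kernel stops being
# memory-bound, given ~3 TB/s of HBM3 bandwidth.
hbm3_bytes_per_s = 3e12    # 3 TB/s, as quoted in the article
assumed_peak_flops = 1e15  # assumed FP16 tensor peak (illustrative)

ridge = assumed_peak_flops / hbm3_bytes_per_s
print(f"~{ridge:.0f} FLOPs per byte to become compute-bound")  # ~333
```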

Among other novelties:

  • Hopper adds virtualization support for multi-tenant and multi-user setups.
  • New DPX instructions speed up the recursive loops used in mapping, DNA, and protein-analysis applications (a sketch of this loop pattern follows the list).
  • Hopper packs support for confidential computing for enhanced security.
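
The DPX bullet refers to dynamic-programming recurrences, where each cell of a table is a max over a few neighbors plus a score. Below is a minimal pure-Python, Smith-Waterman-style example of that loop shape; the scoring constants are arbitrary, and it illustrates the pattern, not NVIDIA’s implementation:

```python
# Smith-Waterman-style local alignment: the max/add-over-neighbors inner
# loop is the dynamic-programming pattern DPX-class instructions target.
def smith_waterman(a: str, b: str, match=3, mismatch=-3, gap=-2) -> int:
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            score = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + score,  # align the two symbols
                          h[i - 1][j] + gap,        # gap in sequence b
                          h[i][j - 1] + gap)        # gap in sequence a
            best = max(best, h[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # best local-alignment score
```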

Choquette, one of the main chip designers on the Nintendo 64 system early in his career, will also describe the parallel computing techniques underlying some of Hopper’s advances.

Michael Ditty, an architect with 17 years at the company, will provide new performance specifications for NVIDIA Jetson AGX Orin, an engine for advanced artificial intelligence, robotics and autonomous machines.

The NVIDIA Jetson AGX Orin integrates 12 Arm Cortex-A78AE cores and an NVIDIA Ampere architecture GPU to deliver up to 275 trillion operations per second on AI inference jobs.
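
To put 275 trillion operations per second in context, here is a rough per-frame compute budget for a camera pipeline; the 30 fps rate and 60% sustained utilization are hypothetical assumptions, and only the 275 TOPS figure comes from the article:

```python
# Rough per-frame compute budget on Jetson AGX Orin.
peak_ops_per_s = 275e12  # 275 TOPS (INT8 inference), as quoted
fps = 30                 # assumed camera frame rate
utilization = 0.6        # assumed sustained fraction of peak

ops_per_frame = peak_ops_per_s * utilization / fps
print(f"~{ops_per_frame / 1e12:.1f} trillion ops per frame")  # ~5.5
```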


The latest production module packs up to 32 gigabytes of memory and is part of a compatible family that scales down to pocket-sized 5-watt Jetson Nano developer kits.

All the new chips support the NVIDIA software stack, which accelerates over 700 applications and is used by 2.5 million developers.

Based on the CUDA programming model, it includes dozens of NVIDIA SDKs for vertical markets such as automotive (DRIVE) and healthcare (Clara), as well as technologies such as recommendation systems (Merlin) and conversational AI (Riva).

The NVIDIA AI platform is available from all major cloud service providers and system makers.


Source: NVIDIA
