Chiplets for HPC: The Path to Specialization and Scalability

Latitude Design Systems
May 6, 2024
Introduction

Relying on general-purpose processors alone is no longer a sustainable path for high-performance computing (HPC). The average performance improvement for systems on the TOP500 list has been stagnating (Figure 1), which points to the need for a different approach to designing and building HPC systems.

Nature's Way: Specialization

The solution lies in embracing specialization, a concept that has been fundamental to the evolution of complex systems. Just as nature has evolved from powerful generalists to specialized species, the computing landscape is shifting towards a diverse ecosystem of specialized compute elements. The end of Dennard scaling and the impending demise of Moore's Law have ushered in an era of scarcity, where energy efficiency and performance density are paramount.
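The effect of losing Dennard scaling can be made concrete with a rough calculation. The sketch below assumes the textbook dynamic-power model P = C * V^2 * f per transistor and a linear shrink factor k per process generation. The function names, the normalized starting values, and the choice of k = 1.4 are illustrative assumptions, not figures from the talk, and the fixed-voltage case is only a simplified stand-in for the post-Dennard era.

```python
def scale_generation(cap, volt, freq, area, k, voltage_scales=True):
    """Apply one process shrink by linear factor k (> 1).

    Classic Dennard scaling: capacitance and voltage fall by 1/k,
    frequency rises by k, and transistor area falls by 1/k**2.
    With voltage_scales=False the supply voltage stays fixed
    (an assumed, simplified model of post-Dennard behavior).
    """
    new_volt = volt / k if voltage_scales else volt
    return cap / k, new_volt, freq * k, area / (k * k)


def power_density(cap, volt, freq, area):
    """Dynamic power per unit area, using P = C * V^2 * f."""
    return cap * volt ** 2 * freq / area


def density_after(generations, k=1.4, voltage_scales=True):
    """Power density relative to the starting node after n shrinks."""
    state = (1.0, 1.0, 1.0, 1.0)  # normalized C, V, f, area (illustrative)
    base = power_density(*state)
    for _ in range(generations):
        state = scale_generation(*state, k, voltage_scales=voltage_scales)
    return power_density(*state) / base


if __name__ == "__main__":
    print("Dennard scaling, 3 generations:", density_after(3))
    print("Fixed voltage, 3 generations:  ", density_after(3, voltage_scales=False))
```

Under these assumptions, power density stays constant while voltage scales with feature size, and grows by k^2 per generation once it does not. That growth is why energy efficiency, rather than transistor count, becomes the limiting resource.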

Following the Money: Domain-Specific Compute Driven by Hyperscalers

To understand the direction of the computing industry, we must "follow the money." The hyperscale companies, such as Apple, Google, and Amazon, are leading the charge in developing domain-specific accelerators tailored to their workloads. This trend presents an opportunity for HPC to leverage the economic model being created by these industry giants.

The Open Chiplet Marketplace: A New Economic Model

The emergence of the Open Chiplet Marketplace, driven by initiatives like the Open Domain Specific Architecture (ODSA) and Universal Chiplet Interconnect Express (UCIe), is lowering the barrier to entry for specialized computing. By providing licensable intellectual property (IP) and enabling third-party assembly, this model allows diverse compute elements to be integrated into a cohesive system.

For HPC, the Open Chiplet Marketplace presents a unique opportunity to "play" in this new ecosystem at a relatively low incremental cost. Rather than reinventing the wheel, HPC can leverage existing commercial IP where it makes sense and focus its efforts on developing open, reusable accelerators tailored to its specific needs. This approach aligns with the 80:20 rule, where open efforts should concentrate on the 20% of components that don't make commercial sense to license.

Architecture Specialization for Science

One compelling example of architecture specialization is the use of custom hardware accelerators for scientific applications. Figure 4 illustrates several such cases, including accelerators for Density Functional Theory (DFT), CryoEM, genomics, and digital fluid dynamics. By designing hardware tailored to the specific algorithms and computational patterns of these applications, significant performance and efficiency gains can be achieved.

Figure: Domain-specific compute driven by hyperscalers

The DFT case, which accounts for 25% of the Department of Energy's (DOE) workload, is particularly noteworthy. By targeting the LS3DF formulation of the DFT algorithm, which minimizes off-chip communication and scales linearly with system size, a Field-Programmable Gate Array (FPGA) or custom chiplet accelerator could potentially deliver 50 to 100 times the performance of a GPU.
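To see why a linearly scaling formulation matters, consider a toy cost model. The sketch below assumes the commonly cited cubic cost of conventional DFT in the number of atoms and compares it with a linear-cost method that carries a larger constant factor. The constants and function names are hypothetical placeholders chosen for illustration. They are not measurements of LS3DF or of any hardware, and the sketch says nothing about the 50 to 100 times accelerator figure above.

```python
import math


def cubic_cost(n_atoms, c=1.0):
    """Toy cost of a conventional O(N^3) method (arbitrary units)."""
    return c * n_atoms ** 3


def linear_cost(n_atoms, c=1.0):
    """Toy cost of a linearly scaling O(N) method (arbitrary units)."""
    return c * n_atoms


def cost_ratio(n_atoms, cubic_c=1.0, linear_c=1000.0):
    """How many times more expensive the cubic method is at this size."""
    return cubic_cost(n_atoms, cubic_c) / linear_cost(n_atoms, linear_c)


def crossover_size(cubic_c=1.0, linear_c=1000.0):
    """System size above which the linear method is cheaper.

    Solves cubic_c * N^3 = linear_c * N for N > 0.
    """
    return math.sqrt(linear_c / cubic_c)


if __name__ == "__main__":
    print("crossover at about", round(crossover_size()), "atoms")
    for n in (100, 1_000, 10_000):
        print(n, "atoms: cubic/linear cost ratio =", cost_ratio(n))
```

Even with a constant factor that is 1000 times worse in this toy model, the linear method wins beyond a few dozen atoms, and its advantage grows quadratically with system size.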

Chiplets: Enabling Specialization for HPC

Chiplets, the building blocks of the Open Chiplet Marketplace, are instrumental in making specialization accessible for HPC. Figure 5, adapted from the DARPA CHIPS program, illustrates the concept of chiplets and their role in enabling modular system design.

Figure: Chiplets make specialization accessible for HPC

By standardizing die-to-die (D2D) physical layer interfaces through initiatives like ODSA (Figure 6), chiplets can be seamlessly integrated into heterogeneous systems, enabling a more flexible and cost-effective approach to system design.

The Universal Chiplet Interconnect Express (UCIe) standard defines a die-to-die physical layer and adapter that carry existing protocols such as PCIe and Compute Express Link (CXL). This gives chiplets a standardized connection for use cases such as memory integration (CXL.mem) and accelerator integration (CXL.cache) (Figure 7).
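One way to picture this mapping is as a small lookup from a chiplet's role to the CXL sub-protocols it would use. The sketch below is purely illustrative: the class, role names, and chiplet names are hypothetical and are not part of any UCIe or CXL software API. CXL.io is included for both roles because the CXL specification requires it for all device types.

```python
from enum import Enum


class CxlProtocol(Enum):
    """The three CXL sub-protocols."""
    IO = "CXL.io"
    CACHE = "CXL.cache"
    MEM = "CXL.mem"


# Hypothetical role names mapped to the sub-protocols named in the text,
# plus CXL.io, which every CXL device must support.
ROLE_PROTOCOLS = {
    "memory": {CxlProtocol.IO, CxlProtocol.MEM},
    "accelerator": {CxlProtocol.IO, CxlProtocol.CACHE},
}


def protocols_for(role):
    """Return the set of sub-protocols for a chiplet role."""
    try:
        return ROLE_PROTOCOLS[role]
    except KeyError:
        raise ValueError(f"unknown chiplet role: {role!r}") from None


def describe_package(chiplets):
    """Map each chiplet name to a sorted list of protocol names.

    `chiplets` is a dict of {chiplet_name: role}.
    """
    return {
        name: sorted(p.value for p in protocols_for(role))
        for name, role in chiplets.items()
    }


if __name__ == "__main__":
    package = {"hbm0": "memory", "dft_accel": "accelerator"}
    for name, protos in describe_package(package).items():
        print(name, "->", ", ".join(protos))
```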

Photonic Multi-Chip Modules for High Bandwidth Memory Integration

One critical aspect of enabling specialization in HPC is the ability to integrate high-bandwidth memory with compute chiplets. Figure 8 depicts a photonic multi-chip module (MCM) approach being explored, which leverages silicon photonics and through-silicon vias to enable high escape bandwidth for remote memory integration.

Project 38: A Cross-Agency Architectural Exploration

Project 38 (P38) is a collaborative effort involving the Department of Defense (DoD), the DOE Office of Science, and the National Nuclear Security Administration (NNSA). Its mission is to demonstrate a high-performance, co-designed node optimized for GraphBLAS workloads and to explore the modular integration of LBNL/ANL IP with commercial chiplets using the Open Chiplet approach.

This project aims to create a new capability for the U.S. Government to rapidly assemble and prototype server-class chip designs, leveraging the flexibility and cost-effectiveness of the Open Chiplet Marketplace.

Conclusion

The future of HPC lies in embracing specialization and leveraging the emerging Open Chiplet Marketplace. By combining domain-specific accelerators with general-purpose processors in modular, heterogeneous systems, HPC can achieve unprecedented levels of performance and efficiency. This paradigm shift, driven by initiatives like ODSA, UCIe, and collaborative efforts like Project 38, presents a promising path forward for advancing scientific computing and addressing the grand challenges of our time.

Reference

G. Michelogiannakis, "Chiplets for HPC," presented at the Chiplet Summit, Open Chiplet Economy, OCP Sponsored Tutorial, Santa Clara, California, USA, Feb. 6, 2024.
