Introduction
The semiconductor industry is witnessing megatrends driven by the insatiable demands of high performance computing (HPC) and generative artificial intelligence (AI) applications. Data centers are projected to double their electricity consumption by 2026 to support the explosive growth of internet activity and the AI boom. Meanwhile, mobile devices and emerging technologies like autonomous vehicles, smart factories, and immersive communication services are generating unprecedented amounts of data that need to be processed and analyzed.
At the heart of this data revolution are breakthroughs in AI models like large language models (LLMs) and multimodal models that can generate human-like text, images, speech, video, and 3D content. Training these models requires massive computational capabilities only achievable through continuous innovation in semiconductor technology. From Deep Blue to AlphaGo to ChatGPT, each leap in AI has been enabled by advances in processors, memory, and system integration.
Overcoming Compute, Memory, and Power Walls
However, realizing the full potential of HPC and generative AI faces significant challenges known as the compute wall, memory wall, and power wall. As core counts and transistor densities increase, memory bandwidth, power consumption, and the limits of parallel scaling become bottlenecks.
The relentless pursuit of higher computing performance and energy efficiency has become the primary focus of the semiconductor industry for the next decade. Innovations are required at multiple levels to overcome these walls:
Transistor scaling through design technology co-optimization (DTCO) with novel architectures and materials
Advanced memory technologies like High Bandwidth Memory (HBM)
Advanced system integration using 2.5D and 3D integrated circuit (3DIC) packaging
New interconnect technologies (electrical and optical)
Disruptive thermal management materials and solutions
A holistic system integration approach tightly coupling compute, memory, and interconnects is key to unlocking substantial gains in performance, bandwidth, energy efficiency, and form factor.
DTCO for Energy-Efficient SoCs
At the transistor level, DTCO innovations across the 7nm, 5nm, and 3nm process nodes have delivered 1.6-1.8x higher logic density, an 11-13% performance boost, and 21-30% lower energy consumption relative to each preceding generation. As illustrated in the following figure, chip-level energy efficiency has improved dramatically from the 28nm node to the 3nm node.
Memory technologies like HBM are also advancing rapidly, with higher capacities through increased memory tiers (4-hi to 16-hi) and improved bandwidth from more I/O channels (1024 to 2048) and faster data rates (4Gbps to 16Gbps). The following figure shows the energy efficiency gains in advanced DRAM compared to conventional memory types.
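The per-stack bandwidth implied by these I/O counts and data rates follows from simple arithmetic: bandwidth ≈ I/O pin count × per-pin data rate ÷ 8 bits per byte. A minimal sketch (the 3.2 Gbps HBM2E pin rate is an assumption, chosen to be consistent with the ~410 GB/s figure cited later in this article):

```python
def hbm_bandwidth_gbps(io_pins: int, pin_rate_gbps: float) -> float:
    """Per-stack bandwidth in GB/s: total pin bandwidth in Gbit/s divided by 8."""
    return io_pins * pin_rate_gbps / 8

# HBM2E-class stack: 1024 I/O pins at an assumed 3.2 Gbps per pin
print(hbm_bandwidth_gbps(1024, 3.2))   # 409.6 GB/s, the ~410 GB/s cited below
# A wider, faster interface: 2048 pins at 16 Gbps per pin
print(hbm_bandwidth_gbps(2048, 16.0))  # 4096.0 GB/s
```

Doubling the channel count and quadrupling the pin rate thus compounds to an 8x per-stack bandwidth gain, which is why both levers are being pushed simultaneously.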
2.5D System Integration for HPC
To combat the memory bandwidth bottleneck, 2.5D system integration packages have emerged as a powerful solution. These interposer-based packages can integrate multiple chiplets (CPUs, GPUs, AI accelerators) and HBM stacks side-by-side using three mainstream technologies:
Silicon interposers with through-silicon vias (TSVs) and back-end-of-line (BEOL) redistribution layers (RDLs)
Substrate interposers with embedded local interconnect chips
Polymer RDL interposers, optionally with local interconnect chips
As depicted in the following figures, the key benefits of 2.5D packages include higher compute density from integrating more chiplets, increased memory bandwidth from incorporating more HBM stacks, and design partitioning for optimized yields and costs.
The Chip-on-Wafer-on-Substrate (CoWoS) technology has been instrumental in scaling up interposer sizes and BEOL layers to accommodate more chiplets and HBMs. Standards like the Universal Chiplet Interconnect Express (UCIe) are emerging to enable high-bandwidth, low-latency, and power-efficient chiplet-to-chiplet communication within these 2.5D packages.
However, as chip thermal design power (TDP) increases beyond 1500W for HPC applications, advanced thermal solutions like liquid cooling and immersion cooling systems are required to maintain reasonable power usage effectiveness (PUE) values below 1.2.
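PUE itself is a simple ratio: total facility power divided by the power delivered to IT equipment, so 1.0 is the ideal and anything above it is cooling and distribution overhead. A sketch with assumed example loads (the numbers are illustrative, not from the tutorial):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT load."""
    return total_facility_kw / it_equipment_kw

# Assumed example: 1000 kW of IT load plus 150 kW of cooling/power overhead
print(pue(1150, 1000))  # 1.15, under the 1.2 target mentioned above
```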
3DIC for Extreme Memory Bandwidth and Energy Efficiency
While 2.5D integration provides substantial improvements, the ultimate solution for eliminating the memory bottleneck is 3D integrated circuit (3DIC) technology. By vertically stacking logic and memory dies using fine-pitch micro-bump or bumpless bonding, 3DICs can achieve unprecedented interconnect densities exceeding 1 million bumps per square millimeter.
As illustrated in the following figures, 3DIC integration enables:
Dramatic increases in memory bandwidth, over 1000 GB/s for wafer-on-wafer (WoW) stacking compared to 410 GB/s for HBM2E
Drastic reductions in energy consumption per bit transferred due to shorter interconnect lengths
Flexible stacking configurations like face-to-face (F2F), face-to-back (F2B), chip-on-wafer (CoW), wafer-on-wafer (WoW), logic-on-logic (LoL), and logic-on-memory (LoM)
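The energy benefit of shorter interconnects can be made concrete with back-of-the-envelope arithmetic: I/O power is bandwidth times energy per bit. The pJ/bit values below are illustrative assumptions for an off-package link versus a short vertical 3DIC hop, not figures from the tutorial:

```python
def io_power_watts(bandwidth_gb_s: float, energy_pj_per_bit: float) -> float:
    """I/O power = bits moved per second x energy per bit."""
    bits_per_s = bandwidth_gb_s * 1e9 * 8
    return bits_per_s * energy_pj_per_bit * 1e-12

# Assumed 3.5 pJ/bit for an off-package HBM-class link at 410 GB/s
print(io_power_watts(410, 3.5))   # 11.48 W
# Assumed 0.5 pJ/bit for a short vertical 3DIC hop at 1000 GB/s
print(io_power_watts(1000, 0.5))  # 4.0 W
```

Under these assumptions, the 3D stack moves nearly 2.5x the data for roughly a third of the I/O power, which is the essence of the energy-per-bit argument.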
TSMC has demonstrated bonding pitches below 0.9 μm for its proprietary chip-on-wafer System-on-Integrated-Chips (SoIC) technology, enabling 3DIC solutions spanning from cost-sensitive mobile devices to extreme HPC applications.
The computational throughput and efficiency gains from 3DIC integration are remarkable. The following figure illustrates the bandwidth, latency, energy efficiency, and form factor improvements as systems transition from discrete packages to 2.5D interposers and finally to 3DIC configurations with sub-micron pitch bumpless bonding.
Some practical examples showcase the immense potential: A logic-on-memory 3DIC stack can deliver over 1000 GB/s bandwidth compared to 410 GB/s for HBM2E and just 64 GB/s for GDDR6. TSMC's latest chiplet-based processor with 2.5D and 3DIC integration claims 8X better AI performance and 5X better performance-per-watt over the previous generation.
Scaling the Third Dimension with SoIC and CoWoS Co-Integration
While 3DIC promises revolutionary gains, the ability to scale in the third dimension and integrate an increasing number of chiplets is limited by area constraints. This is where TSMC's innovative 3DFabric technology platform comes into play by combining the best of 2.5D (CoWoS) and 3DIC (SoIC) integration schemes.
The 3DFabric comprises wafer-level system integration technologies like CoWoS (Chip-on-Wafer-on-Substrate), InFO (Integrated Fan-Out), SoIC (System-on-Integrated-Chips), CoW (Chip-on-Wafer), and WoW (Wafer-on-Wafer). By leveraging existing infrastructure and combining 2.5D and 3DIC capabilities, the 3DFabric enables unprecedented compute density, bandwidth density, and energy-efficient performance (EEP) for HPC/AI workloads.
Co-Integrating Photonics for Data Center Bandwidth Growth
While electrical I/O scaling will continue, the explosive growth of data traffic in AI data centers is driving the adoption of silicon photonics for high-speed, high-bandwidth, low-energy data transfer over long distances. As Figure 38 shows, innovative chip-to-chip/chiplet and chip-to-package optical/electrical (OE) integration platforms like COUPE are emerging to meet the bandwidth needs across a wide range of applications and performance/power requirements.
Generative AI Drives Advanced Node and Packaging Innovation
Ultimately, the market forces propelling semiconductor innovation are the relentless demands of generative AI applications like large language models and multimodal models. With model sizes exceeding 175 billion parameters for text generation and 1.7 trillion parameters for multimodal tasks, the computational complexity and data movement requirements are skyrocketing.
This necessitates a multi-pronged approach: advanced logic node scaling to 3nm and beyond, advanced packaging technologies such as 2.5D and 3DIC, disruptive electrical and optical interconnect solutions, and revolutionary thermal management techniques. Only through such holistic innovation across transistors, memories, interconnects, architectures, and system integration can the semiconductor industry hope to meet the exponential performance and efficiency demands of the generative AI era.
As illustrated in Figure 35, combining advanced technology nodes like N3 and N2 with DTCO for faster, denser transistors, and system technology co-optimization (STCO) via 2.5D/3DIC integration, is critical to unlocking the full attainable performance of complex AI workloads like GPT-3. The gap between peak performance and the performance achievable at a workload's operational intensity is bridged by delivering 45x higher logic-to-logic and logic-to-memory bandwidth through innovations such as next-generation NVLink and Infinity Fabric interconnects and higher HBM stack counts.
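The operational-intensity argument follows the classic roofline model: attainable throughput is the lesser of peak compute and memory bandwidth × operational intensity (FLOPs per byte moved). A sketch with purely illustrative numbers (none are from the tutorial) showing why raising bandwidth closes the gap for bandwidth-bound kernels:

```python
def roofline_tflops(peak_tflops: float, bandwidth_tb_s: float,
                    operational_intensity: float) -> float:
    """Attainable throughput under the roofline model.

    operational_intensity is in FLOPs per byte; below the ridge point
    (peak / bandwidth) the workload is bandwidth-bound, not compute-bound.
    """
    return min(peak_tflops, bandwidth_tb_s * operational_intensity)

# Assumed machine: 1000 TFLOPS peak, 3 TB/s memory bandwidth
print(roofline_tflops(1000, 3, 100))      # 300: bandwidth-bound
print(roofline_tflops(1000, 3, 500))      # 1000: compute-bound
# 45x more bandwidth lifts the bandwidth-bound kernel to peak
print(roofline_tflops(1000, 3 * 45, 100))  # 1000
```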
Conclusion
The semiconductor megatrends catalyzed by HPC and generative AI applications are driving multi-dimensional innovations across the entire chip-package-system stack. Advanced nodes, memories, 2.5D and 3DIC packaging, new interconnect technologies, materials, and architectures are converging to overcome the compute, memory, and power walls. Intelligent system integration platforms like TSMC's 3DFabric are key enablers for scaling systems into the future while maximizing performance, bandwidth, energy efficiency, and form factor compaction. As data centers race to support the AI boom and devices generate unprecedented data volumes, the semiconductor industry's ability to sustain relentless innovation will be critically tested in the years ahead.
Reference
[1] "Advanced System Integration Technology Trend of HPC and GAI," in 2024 VLSI TSA Tutorial, pp. 60-81, 2024.