NVIDIA, according to The Information resource, is forced to delay the start of mass production of next-generation AI accelerators based on the Blackwell architecture, while maintaining the high pace of Hopper production. The problem is said to be related to TSMC’s Chip on Wafer on Substrate (CoWoS) packaging technology.

It is noted that NVIDIA recently informed Microsoft about delays affecting the most advanced solutions of the Blackwell family. We are talking, in particular, about Blackwell B200 products. Serial production of these accelerators may be delayed by at least three months – at best until the first quarter of 2025. This could impact plans by Microsoft, Meta✴ and other data center operators to expand capacity for AI and HPC workloads.

According to research firm SemiAnalysis, the delay is due to the physical design of Blackwell’s products. These are the first mainstream accelerators to use TSMC’s CoWoS-L packaging technology. This is a complex and highly accurate technique that involves the use of an organic interposer – the limit of the capabilities of the previous generation CoWoS-S technology was reached in the AMD Instinct MI300X. A silicon interposer suitable for the B200 would be too fragile. However, the organic interposer does not have the best electrical characteristics, so silicon bridges are used for communication.

The main problem lies in the materials used – due to the difference in the coefficient of thermal expansion of various components, bends appear that destroy the contacts and the chiplets themselves. At the same time, the accuracy and precision of connections is extremely important for the operation of the internal interconnect NV-HBI, which connects two computing tiles at a speed of 10 TB/s. Therefore, now NVIDIA and TSMC are busy reworking the bridges and, according to rumors, several layers of metallization of the tiles themselves.

However, TSMC is experiencing a shortage of CoWoS packaging capacity. The company has been ramping up CoWoS-S capabilities over the past two years, primarily to meet NVIDIA’s needs, but the latter is now transitioning its products to CoWoS-L. Therefore, TSMC is building an AP6 factory for the new packaging technology, and will also transfer existing AP3 capacities to CoWoS-L. At the same time, TSMC’s competitors cannot and are unlikely to be able to provide at least some alternative packaging technology that will suit NVIDIA in the near future.

Thus, it is reported that NVIDIA will have to decide how to use the available production capacity of TSMC. According to SemiAnalysis, the company is almost entirely focused on the GB200 NVL36/72 rack-mount super accelerators, which will go to hyperscalers and a small number of other players, while the B100 and B200 HGX solutions are “now effectively cancelled,” although small quantities of the latter should still hit the market. However, NVIDIA also has a backup plan.

The plan is to release simplified monolithic B200A chips based on a single B102 die, which will also form the basis for the B20 accelerator, aimed at China. The B200A will receive only four HBM3e stacks (144 GB, 4 TB/s), and its TDP will be 700 or 1000 W. An important advantage in this case is the possibility of using CoWoS-S packaging. The B200A chips will end up in mass HGX systems instead of the originally planned B100/B200.

The B200A will be replaced by the B200A Ultra, which will have increased performance, but there will be no memory upgrade. They will also be included in the HGX platform, but that is not the main thing. Based on them, NVIDIA will offer compromise super accelerators MGX GB200A Ultra NVL36. They will receive eight 2U nodes, each of which will have one Grace processor and four 700-W B200A Ultra. The accelerators will still be fully integrated by the NVLink5 bus (single-chip 1U switches), but inside the node all communication with the CPU will be tied to PCIe switches in two ConnectX-8 adapters.

The main advantage of the GX GB200A Ultra NVL36 will be air cooling due to its relatively low power – only 40 kW per rack. This is a lot, but it will still make it possible to place new products in many data centers without their radical re-equipment, albeit at the cost of losing placement density (for example, skipping rows). According to SemiAnalysis, these super accelerators, in case of a shortage of “full-fledged” GB200 NVL72/36, will also be purchased by hyperscalers.

Leave a Reply

Your email address will not be published. Required fields are marked *