Every modern graphics accelerator ships with a fixed amount of video memory set at the factory, and in the highest-performance models HBM is usually integrated on the same substrate as the main die. Memory capacity requirements, however, have been growing ever faster, and vendors are charging more and more for additional memory. Bolt Graphics, which recently announced its Zeus series of accelerators, offers a radically different approach.
Despite the “AI pandemic”, Bolt Graphics does not focus on artificial intelligence in its announcement; instead it calls Zeus the first GPU designed specifically for HPC, rendering, ray tracing and even computer games. Interestingly, Zeus is not based on a closed architecture: the scalar part of the new GPU follows the RISC-V RVA23 specification, while the vector part consists of FP64 ALUs based on a slightly modified RVV 1.0. Other functions are implemented through custom extensions and separate accelerator blocks, all of which share a common 128 MB cache. A telemetry block and an internal interconnect for communicating with other compute blocks complete the picture.
Zeus 1c26-032 (Image source: Bolt Graphics)
Zeus uses a chiplet design. The basic “building block”, the Zeus 1c26-032, consists of a GPU chiplet connected to 32 GB of on-board LPDDR5X memory (273 GB/s) and to a controller for external DDR5 memory (90 GB/s), so up to another 128 GB of RAM can be installed as two SO-DIMM modules if desired. The GPU chiplet has built-in DisplayPort 2.1a and HDMI 2.1b controllers and communicates with the outside world through an IO chiplet, to which it is linked by a 256 GB/s channel. The IO chiplet offers an unusual set of ports: in addition to two PCIe 5.0 x16 interfaces (64 GB/s each), there is a dedicated RJ-45 port for the BMC and a 400GbE QSFP-DD port. Finally, there is a hardware video encoding unit that can handle two 8K@60 AV1/H.264/H.265 streams.
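As a rough sanity check, the following Python sketch totals the memory and I/O figures for a single Zeus 1c26-032. All values are the ones quoted in this article; the variable names are illustrative, and the summed memory bandwidth is only a theoretical ceiling that assumes both memory pools can be streamed at the same time, which the announcement does not confirm.

```python
# Figures for one Zeus 1c26-032 as quoted in the article (vendor claims, not measurements).
LPDDR5X_GB, LPDDR5X_GB_PER_S = 32, 273    # soldered memory: capacity, bandwidth
DDR5_MAX_GB, DDR5_GB_PER_S = 128, 90      # two SO-DIMM modules: max capacity, bandwidth
PCIE_LINKS, PCIE_GB_PER_S_EACH = 2, 64    # PCIe 5.0 x16 interfaces
ETHERNET_GBIT_PER_S = 400                 # QSFP-DD port, in gigabits per second

max_capacity_gb = LPDDR5X_GB + DDR5_MAX_GB

# Assumption: both pools are accessed concurrently, so this is an upper bound.
peak_memory_gb_per_s = LPDDR5X_GB_PER_S + DDR5_GB_PER_S

pcie_total_gb_per_s = PCIE_LINKS * PCIE_GB_PER_S_EACH

# 8 bits per byte; protocol overhead is ignored.
ethernet_gb_per_s = ETHERNET_GBIT_PER_S / 8

print(f"Max memory capacity:   {max_capacity_gb} GB")
print(f"Peak memory bandwidth: {peak_memory_gb_per_s} GB/s (upper bound)")
print(f"PCIe bandwidth, total: {pcie_total_gb_per_s} GB/s")
print(f"Ethernet bandwidth:    {ethernet_gb_per_s:.0f} GB/s")
```

In this arithmetic the soldered LPDDR5X supplies about three quarters of the combined memory bandwidth but only a fifth of the maximum capacity.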
The claimed vector performance in FP64/FP32/FP16 is 5/10/20 Tflops, and matrix INT16/INT8 performance is 307.2/614.4 Tops. The hardware ray tracing (path tracing) unit delivers up to 77 gigarays per second. For comparison, the NVIDIA RTX 5090 manages 32 gigarays, and its FP64 performance is 1.6 Tflops. In low-precision calculations, however, current NVIDIA solutions are still faster than the Zeus 1c26-032. The new product does have one important advantage: its TDP is only 120 W. The second PCIe 5.0 x16 interface can be used to link two cards directly.
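To put the comparison in numbers, here is a small Python sketch that computes the ratios implied by the figures above. It uses only the values quoted in this article (vendor claims, not benchmark results); the dictionary keys are illustrative names. No per-watt comparison with the RTX 5090 is attempted, because its TDP is not given here.

```python
# Vendor-quoted figures from the article; nothing here is a measured result.
zeus_1c26_032 = {"fp64_tflops": 5.0, "gigarays": 77.0, "tdp_w": 120.0}
rtx_5090 = {"fp64_tflops": 1.6, "gigarays": 32.0}

# How many times higher the Zeus figure is than the RTX 5090 figure.
fp64_ratio = zeus_1c26_032["fp64_tflops"] / rtx_5090["fp64_tflops"]
ray_ratio = zeus_1c26_032["gigarays"] / rtx_5090["gigarays"]

# FP64 efficiency of the Zeus card alone, in Gflops per watt of TDP.
zeus_fp64_gflops_per_w = zeus_1c26_032["fp64_tflops"] * 1000 / zeus_1c26_032["tdp_w"]

print(f"FP64 ratio:        {fp64_ratio:.2f}x")
print(f"Ray tracing ratio: {ray_ratio:.2f}x")
print(f"Zeus FP64 per watt: {zeus_fp64_gflops_per_w:.1f} Gflops/W")
```

On these claimed numbers, the single-chiplet Zeus comes out at roughly 3.1 times the FP64 throughput and 2.4 times the ray throughput of the RTX 5090.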
The two-chiplet version of the accelerator is called Zeus 2c26-064/128, and the four-chiplet version is the Zeus 4c26-256. The final digits indicate the amount of soldered LPDDR5X memory. The number of SO-DIMM slots for expandable memory also depends on the model and goes up to eight, so in the flagship configuration the base 256 GB of LPDDR5X can be supplemented with as much as 2 TB of DDR5. Performance grows almost proportionally with the number of GPU chiplets, but there are a few other nuances. The Zeus 2c26-064 and Zeus 2c26-128 (both with a TDP of 250 W) have only one IO chiplet, and their GPU chiplets are linked by a 768 GB/s bus.
The Zeus 4c26-256 has four IO chiplets, which provide eight PCIe 5.0 x4 controllers (one chiplet, 32 lanes in total) and six 800GbE OSFP ports (three chiplets). The GPU chiplets are connected to each other by a 512 GB/s bus, and each of them is connected to its own IO chiplet at 256 GB/s. The flagship has a TDP of 500 W. According to Bolt Graphics, the accelerator delivers 20 Tflops in FP64 mode and almost 2500 Tflops in FP8 calculations, and it can process up to 307 gigarays.
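The near-linear scaling can be checked with a short Python sketch that multiplies the single-chiplet figures by four and compares them with the flagship numbers quoted above. The `Throughput` class and its field names are hypothetical, introduced only for this illustration. Note one assumption: the single-chiplet low-precision figure is quoted for INT8, while the flagship figure is quoted for FP8, and the article does not say whether the two rates are the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    """Peak figures for one configuration, as quoted by the vendor."""
    fp64_tflops: float
    int8_tops: float
    gigarays: float

    def scaled(self, chiplets: int) -> "Throughput":
        """Ideal linear scaling: every figure multiplied by the chiplet count."""
        return Throughput(
            self.fp64_tflops * chiplets,
            self.int8_tops * chiplets,
            self.gigarays * chiplets,
        )


# Single-chiplet Zeus 1c26-032 figures from the article.
SINGLE_CHIPLET = Throughput(fp64_tflops=5.0, int8_tops=614.4, gigarays=77.0)

# Idealized estimate for four chiplets (Zeus 4c26-256).
flagship_ideal = SINGLE_CHIPLET.scaled(4)

# The article quotes 307 gigarays for the flagship.
QUOTED_FLAGSHIP_GIGARAYS = 307.0
ray_scaling_efficiency = QUOTED_FLAGSHIP_GIGARAYS / flagship_ideal.gigarays

print(flagship_ideal)
print(f"Ray scaling efficiency vs. ideal: {ray_scaling_efficiency:.3f}")
```

The ideal estimate gives 20 Tflops in FP64, matching the quoted figure, and about 2458 for the low-precision rate, consistent with "almost 2500". The quoted 307 gigarays sits just below the ideal 308.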
The powerful network subsystem shows that the developers clearly designed Zeus with extensive clustering in mind. Supported configurations range from a modest pair of GPUs connected directly over 400GbE to large rack-scale systems with 80 Zeus 4c26-256 boards connected both to a switch and directly to one another. Such a cluster consumes 44 kW but, thanks to its huge 160 TB pool of shared memory, can run large physical simulations or train AI models. Its computing performance reaches 1.6 Pflops in FP64 mode and 196 Pops in FP8 mode.
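The cluster totals can be reproduced from the per-board figures with the following Python sketch. It rests on several assumptions that the article does not confirm: the per-board low-precision rate is taken as 2457.6 (four times the single-chiplet 614.4, where the article only says "almost 2500"), "2 TB" of DDR5 is read as 2000 GB, and board power is taken as the 500 W TDP.

```python
BOARDS = 80                      # Zeus 4c26-256 boards per rack-scale cluster

# Per-board figures (vendor claims quoted in the article).
FP64_TFLOPS = 20.0
LOW_PRECISION_TOPS = 2457.6      # assumption: 4 x 614.4; the article says "almost 2500"
TDP_W = 500
LPDDR5X_GB = 256                 # soldered memory
DDR5_MAX_GB = 2000               # assumption: "2 TB" in decimal units

# Aggregate compute, converted from tera- to peta-scale.
fp64_pflops = BOARDS * FP64_TFLOPS / 1000
low_precision_pops = BOARDS * LOW_PRECISION_TOPS / 1000

# Power drawn by the boards alone, in kilowatts.
board_power_kw = BOARDS * TDP_W / 1000

# Memory pools across the cluster, in terabytes.
soldered_tb = BOARDS * LPDDR5X_GB / 1000
expandable_tb = BOARDS * DDR5_MAX_GB / 1000

print(f"FP64:            {fp64_pflops:.1f} Pflops")
print(f"Low precision:   {low_precision_pops:.1f} Pops")
print(f"Board power:     {board_power_kw:.0f} kW")
print(f"Soldered memory: {soldered_tb:.2f} TB")
print(f"DDR5 (maximum):  {expandable_tb:.0f} TB")
```

Under these assumptions the FP64 and low-precision totals line up with the quoted 1.6 Pflops and 196 Pops. The boards alone account for 40 kW of the quoted 44 kW; the article does not break down the rest. The quoted 160 TB equals the DDR5 total in this arithmetic, though the article does not say how the figure is counted.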
One feature of the new products is the Glowstick ray tracer, which can run in real time inside almost all modern 3D modeling and video editing packages, such as Maya, 3ds Max, Blender, SketchUp, Houdini and Nuke. It will be complemented by the proprietary Bolt MaterialX library, which contains more than 5,000 high-quality textures, and OpenUSD support should make it easy to integrate into any rendering and post-processing pipeline. An electromagnetic simulator, Bolt Apollo, is also planned, and Bolt Graphics promises its own Vulkan and DirectX drivers as well as an LLVM-based SDK.
Bolt Graphics has scheduled early access to developer kits for Q4 of this year. 2U servers based on Zeus are expected to appear in Q3 2026, with mass shipments of servers and PCIe cards expected to begin no earlier than Q4 of the same year. It’s hard to say how well the new architecture will perform, but if preliminary Zeus tests are to be believed, the gains over existing accelerators are significant, especially in terms of power consumption.