Z Telum: IBM’s 5 GHz CPU with AI acceleration

All About Z Telum: IBM’s 5 GHz CPU with dedicated AI accelerator!

Back in 2019, IBM introduced its z15 processor, with 12 cores running at 5.2 GHz and a very large L3 cache. Now the company has presented the next generation of its Z series processors, the IBM Z Telum, which somewhat paradoxically has fewer physical cores (eight in this case) but is meant to be more versatile, flexible and powerful than its predecessor. Below, we go through everything IBM has presented.

It appears that, at least for IBM, the future of computing lies not in adding processor cores but in removing them. With this new generation of CPUs, the manufacturer has cut the core count while significantly improving other aspects, such as doubling the amount of L3 cache (256 MB compared to 128 MB in its predecessor) and introducing separate paging systems.

This is IBM Z Telum, the company’s vision of the future of computing

The die contains 8 physical cores, each of which can run two threads thanks to IBM’s SMT2 technology (comparable to Intel’s Hyper-Threading or AMD’s SMT), for a total of 16 hardware threads.

All of the processor’s cores use out-of-order execution, so they can reorder instructions to avoid stalls and raise the average number of instructions completed per clock cycle. The design is aimed at real-time applications that need very low response latency, which is why IBM has focused on maximizing single-threaded performance.

IBM Z Telum cache

To achieve this goal, IBM has implemented 32 MB of private L2 cache per core (256 MB across the eight cores), which the cores can also share among themselves to form the 256 MB of L3 cache mentioned earlier. In comparison, an Intel Core i7-10700K processor has only 16 MB of what Intel calls Smart Cache (its shared L3), so this is a massive amount of cache memory. Furthermore, IBM has configured four pipelines that serve the cores with a latency of just 19 clock cycles (3.8 ns at 5 GHz), so cache accesses should be extremely fast.
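The 3.8 ns figure follows directly from the 19-cycle latency if we assume the 5 GHz clock quoted in the headline. A minimal sketch of that conversion (the function name is ours, purely for illustration):

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency expressed in clock cycles to nanoseconds."""
    # At 1 GHz one cycle lasts 1 ns, so a cycle lasts 1 / clock_ghz ns.
    return cycles / clock_ghz


# 19 cycles at an assumed 5 GHz clock
latency_ns = cycles_to_ns(19, 5.0)
print(f"{latency_ns:.1f} ns")
```

The same helper shows how sensitive the figure is to frequency: at a lower clock the same 19 cycles would take proportionally longer.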

It is also worth mentioning that this IBM processor is manufactured by Samsung on its 7 nm process node. It packs 22.5 billion transistors into an area of 530 mm² and, mind you, it is built in 17 layers. IBM has not provided any further details on this, but we’ll keep an eye on it because it’s quite interesting.
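To put those two numbers in perspective, we can work out the average transistor density of the die. This is only a back-of-the-envelope sketch using the figures quoted above; the function name is our own, and a die-wide average says nothing about how dense the logic or cache areas are individually.

```python
def transistor_density_m_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


# 22.5 billion transistors on a 530 mm² die, as quoted in the text
density = transistor_density_m_per_mm2(22.5e9, 530.0)
print(f"{density:.1f} million transistors per mm²")
```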

Dedicated cores for AI applications

One of the most interesting features of the IBM Z Telum is that it integrates dedicated cores for artificial intelligence. In other words, although the number of general-purpose cores has dropped from 12 to 8 compared with the previous generation, the chip, on top of the improvements already described, actually gains cores of a different kind thanks to these specialized units.

According to IBM, these cores reach 6 TFLOPS in FP16 calculations, and it is worth noting that they are treated as AI accelerators. They can read data from and write data to the L2 cache directly at 120 GB/s. That data can be pre-processed before it reaches the AI accelerator itself, which raises the bandwidth to 600 GB/s.
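One rough way to read these figures is to ask how many FP16 operations the accelerator has to perform per byte it receives in order to stay busy. The sketch below applies a simple roofline-style estimate. The roofline framing, the function names and the example workload are our own assumptions, not something IBM has published; only the 6 TFLOPS, 120 GB/s and 600 GB/s figures come from the text.

```python
PEAK_TFLOPS = 6.0  # FP16 peak quoted by IBM


def ridge_point(bandwidth_gbs: float, peak_tflops: float = PEAK_TFLOPS) -> float:
    """FLOPs needed per byte to reach peak compute at a given bandwidth."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(flops_per_byte: float, bandwidth_gbs: float,
                      peak_tflops: float = PEAK_TFLOPS) -> float:
    """Roofline estimate: the lower of the compute and bandwidth limits."""
    bandwidth_limit = flops_per_byte * bandwidth_gbs / 1000.0  # in TFLOPS
    return min(peak_tflops, bandwidth_limit)


for bw in (120.0, 600.0):
    print(f"{bw:.0f} GB/s: ridge point {ridge_point(bw):.0f} FLOP/byte")

# Hypothetical workload doing 20 FLOPs per byte moved
print(attainable_tflops(20.0, 120.0), attainable_tflops(20.0, 600.0))
```

Under these assumptions, a workload needs about 50 FLOPs per byte to saturate the accelerator at 120 GB/s, but only about 10 at 600 GB/s, which is why the higher internal bandwidth matters for less compute-dense models.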

Although we compared the cache of this IBM processor to that of an Intel consumer CPU, it is built for real-time applications in sectors such as finance, stock trading, insurance, healthcare and infrastructure. These processors are also expected to be used in multi-chip systems, and IBM claims that they can even work in a dual-chip configuration, in racks of 8 to 32 processors.

