CPU Arms Race Heats Up — Literally!

In the last week or so we have seen a flurry of news releases from Cavium and NetLogic. These two heavy hitters in the high-end packet processing space have been promoting their latest technologies, which push performance ever higher:

  • Cavium announced it just shipped first samples of its high-end 32-core Octeon II
  • NetLogic announced the new XLP864 (with 64 VirtuCores), less than 12 months after it sampled the original high-end XLP832 in September 2010, and with ATCA boards shipping (from Continuous Computing, at least!) since January 2011

It is not my intention to debate the pros and cons of these two technologies in this blog; that is too big a topic for this discussion. My point here is how much this market has changed since the original versions of these technologies burst onto the scene some five years ago.

Back in 2006-2007 the XLR and Octeon dramatically outperformed general purpose processors in packet handling tasks. But over time the lack of new, radically optimized devices allowed general purpose CPUs to catch up with, or even surpass, their packet processing equivalents for certain tasks. Intel’s “tick-tock” approach has been delivering strong year-over-year improvements in performance, resulting in customers asking, “Even if new packet processors come out, can they realistically sustain an advantage?”

Well, I think these latest announcements go a long way toward demonstrating that the packet processing market now has two credible, mature and well-funded companies capable of applying that “tick-tock” approach to multi-core packet processing. In isolation these announcements may not seem like a big deal, but they matter as evidence that the packet processing vendors can deliver significant performance increases every 12 to 18 months.

Of course, that’s before you consider the additional performance they can deliver with their integrated security engines, RegEx (pattern matching) hardware accelerators and specialized memory connections for faster look-ups (TCAM such as NL11K KBP), not to mention the glueless XAUI connections to the latest 40G switches.

However, it is not all smooth sailing for the packet processing vendors. They are bound by the same laws of physics as the general purpose CPU vendors. It is all very well pushing the bounds in terms of number of cores and frequency, but they need to stay on the leading edge of process technology to keep power dissipation down. That means staying on par with the general purpose processor market, which means 40nm today and a path to 28nm and smaller geometries as soon as practical.

There is no point having an “arms race” over the number of cores if equipment designs have to dramatically reduce clock rate to keep power dissipation sensible, especially for telecom gear. For NEBS-capable ATCA, that means don’t be fooled by announcements touting MHz, n x cores and aggregate GHz: the reality is that if you want dual CPUs, a reasonable amount of memory, IO and other functions, then the maximum per-CPU power is under 90-100W!
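
To make that budgeting argument concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption rather than a vendor figure: a 200W board budget, 20W set aside for memory, IO and other functions, and a hypothetical 32-core part rated at 2.0GHz and 120W. It also assumes power scales linearly with clock at a fixed voltage and ignores leakage, so treat the result as indicative only.

```python
# Back-of-the-envelope sketch: headline "aggregate GHz" vs. what a board
# power budget actually allows. All inputs are illustrative assumptions.

def per_cpu_budget_w(board_budget_w, overhead_w, cpu_count):
    """Power left for each CPU after memory, IO and other functions."""
    return (board_budget_w - overhead_w) / cpu_count

def sustainable_clock_ghz(nominal_ghz, nominal_power_w, budget_w):
    """Clock the part can run at within budget.

    Assumes power scales linearly with frequency at fixed voltage
    (dynamic power ~ C * V^2 * f) and ignores static leakage.
    """
    if nominal_power_w <= budget_w:
        return nominal_ghz
    return nominal_ghz * budget_w / nominal_power_w

def aggregate_ghz(cores, clock_ghz):
    """The headline number: cores multiplied by clock."""
    return cores * clock_ghz

# Hypothetical dual-CPU blade and hypothetical 32-core device.
budget = per_cpu_budget_w(board_budget_w=200.0, overhead_w=20.0, cpu_count=2)
clock = sustainable_clock_ghz(nominal_ghz=2.0, nominal_power_w=120.0, budget_w=budget)
headline = aggregate_ghz(32, 2.0)
usable = aggregate_ghz(32, clock)

print(f"Per-CPU budget: {budget:.0f} W")
print(f"Sustainable clock: {clock:.2f} GHz")
print(f"Headline aggregate: {headline:.0f} GHz, usable: {usable:.0f} GHz")
```

With these made-up inputs, each CPU gets 90W, the clock has to drop from 2.0GHz to 1.5GHz, and the headline 64 aggregate GHz shrinks to 48. Swap in real datasheet and chassis numbers before drawing any conclusions.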
