Advanced Computing in the Age of AI | Friday, March 29, 2024

Calxeda Launches Midway ARM Server Chips, Extends Roadmap 

ARM server chip supplier Calxeda is just about to ship its second generation of EnergyCore processors for hyperscale systems and most of its competitors are still working on their first products. Calxeda is also tweaking its roadmap to add a new chip to its lineup, which will bridge between the current 32-bit ARM chips and its future 64-bit processors.

There is going to be a lot of talk about server-class ARM processors this week, particularly with ARM Holdings hosting its TechCon conference in Santa Clara.

A month ago, EnterpriseTech told you about the "Midway" chip that Calxeda had in the works and as well as its roadmap to get beefier 64-bit cores and extend its Fleet Services fabric to allow for more than 100,000 nodes to be linked together.

The details were a little thin on the Midway chip, but we now know that it will be commercialized as the ECX-2000, and that Calxeda is sending out samples to server makers right now. The plan is to have the ECX-2000 generally available by the end of the year, and that is why company is ready to talk about some feeds and speeds. Karl Freund, vice president of marketing at Calxeda, walked EnterpriseTech through the details.

calxeda-midway-block-diagram

The Midway chip is fabricated in the same 40 nanometer process as the existing "High Bank" ECX-1000 chip that Calxeda first put into the field in November 2011 in the experimental "Redstone" hyperscale servers from Hewlett-Packard. That 32-bit chip, based on the ARM Cortex-A9 core, was subsequently adopted in systems from Penguin Computing, Boston, and a number of other hyperscale datacenter operators who did proofs of concept with the chips. The ECX-1000 has four cores and was somewhat limited in its performance and was definitely limited in its main memory, which topped out at 4 GB across the four-core processor. But the ECX-2000 addresses these issues.

The ECX-2000 is based on ARM Holding's Cortex-A15 core and has the 40-bit physical memory extensions, which allows for up to 16 GB of memory to be physically attached to each socket. With the 40-bit physical addressing added with the Cortex-A15, the memory controller can, in theory, address up to 1 TB of main memory; this is called Large Physical Address Extension (LPAE) in the ARM lingo, and it maps the 32-bit physical addressing on the core to a 40-bit virtual address space. Each core on the ECX-2000 has 32 KB of L1 instruction cache and 32 KB of L1 data cache, and ARM licensees are allowed to scale the L2 cache as they see fit. The ECX-2000 has 4 MB of L2 cache shared across the four cores on the die. These are exactly the same L1 and L2 cache sizes as used in the prior ECX-1000 chips.

The Cortex-A15 design was created to scale to 2.5 GHz, but as you crank up the clocks on any chip, the amount of energy consumed and heat radiated grows progressively larger as clock speeds go up. At a certain point, it just doesn't make sense to push clock speeds. Moreover, every drop in clock speed gives a proportionately larger increase in thermal efficiency, and this is why, says Freund, Calxeda is making its implementation of the Cortex-A15 top out at 1.8 GHz. The company will offer lower-speed parts running at 1.1 GHz and 1.4 GHz for customers that need an even better thermal profile or a cheaper part where low cost is more important than raw performance or thermals.

What Calxeda and its server and storage array customers are focused on is the fact that the Midway chip running at 1.8 GHz has twice the integer, floating point, and Java performance of a 1.1 GHz High Bank chip. That is possible, in part, because the new chip has four times the main memory and three times the memory bandwidth as the old chip in addition to a 64 percent boost in clock speed. Calxeda is not yet done benchmarking systems using the chips to get a measure of their thermal efficiency, but is saying that there is as much as a 33 percent boost in performance per watt comparing old to new ECX chips.

The new ECX-2000 chip has a dual-core Cortex-A7 chip on the die that is used as a controller for the system BIOS as well as a baseboard management controller and a power management controller for the servers that use them. These Fleet Engines, as Calxeda calls them, eliminate yet another set of components, and therefore their cost, in the system. These engines also control the topology of the Fleet Services fabric, which can be set up in 2D torus, mesh, butterfly tree, and fat tree network configurations.

The Fleet Services fabric has 80 Gb/sec of aggregate bandwidth and offers multiple 10 Gb/sec Ethernet links coming off the die to interconnect server nodes on a single card, multiple cards in an enclosure, multiple enclosures in a rack, and multiple racks in a data center. The Ethernet links are also used to allow users to get to applications running on the machines.

Freund says that the ECX-2000 chip is aimed at distributed, stateless server workloads, such as web server front ends, caching servers, and content distribution. It is also suitable for analytics workloads like Hadoop and distributed NoSQL data stores like Cassandra, all of which tend to run on Linux. Both Red Hat and Canonical are cooking up commercial-grade Linuxes for the Calxeda chips, and SUSE Linux is probably not going to be far behind. The new chips are also expected to see action in scale-out storage systems such as OpenStack Swift object storage or the more elaborate Gluster and Ceph clustered file systems. The OpenStack cloud controller embedded in the just-announced Ubuntu Server 13.10 is also certified to run on the Midway chip.

Hewlett-Packard has confirmed that it is creating a quad-node server cartridge for its "Moonshot" hyperscale servers, which should ship to customers sometime in the first or second quarter of 2014. (It all depends on how long HP takes to certify the system board.) Penguin Computing, Foxconn, Aaeon, and Boston are expected to get beta systems out the door this year using the Midway chip and will have them in production in the first half of next year. Yes, that's pretty vague, but that is the server business, and vagueness is to be expected in such a young market as the ARM server market is.

Looking ahead, Calxeda is adding a new processor to its roadmap, code-named "Sarita." Here's what the latest system-on-chip roadmap looks like now:

calxeda-soc-roadmap

The future "Lago" chip is the first 64-bit chip that will come out of Calxeda, and it is based on the Cortex-A57 design from ARM Holdings –one of several ARMv8 designs, in fact. (The existing Calxeda chips are based on the ARMv7 architecture.)

Both Sarita and Lago will be implemented in TSMC's 28 nanometer processes, and that shrink from the current 40 nanometer to 28 nanometer processes is going to allow for a lot more cores and other features to be added to the die and also likely a decent jump in clock speed, too. Freund is not saying at the moment which way it will go.

But what Freund will confirm is that Sarita will be pin-compatible with the existing Midway chip, meaning that server makers who adopt Midway will have a processor bump they can offer in a relatively easy fashion. It will also be based on the Cortex-A57 cores from ARM Holdings, and will sport four cores on a die that deliver about a 50 percent performance increase compared to the Midway chips.

The Lago chips, we now know, will scale to eight cores on a die and deliver about twice the performance of the Midway chips. Both Lago and Sarita are on the same schedule, in fact, and they are expected to tape out this quarter. Calxeda expects to start sampling them to customers in the second quarter of 2014, with production quantities being available at the end of 2014.

Not Just Compute, But Networking, Too

As important as the processing is to a system, the Fleet Services fabric interconnect is perhaps the key differentiator in its design. The current iteration of that interconnect, which is a distributed Layer 2 switch fabric that is spread across each chip in a cluster, can scale across 4,096 nodes without requiring top-of-rack and aggregation switches.

calxeda-fabric-roadmap

Both of the Lago and Sarita chips will be using the Fleet Services 2.0 interconnect that is now being launched with Midway. This iteration of the interconnect has all kinds of tweaks and nips and tucks but no scalability enhancements beyond the 4,096 nodes in the original fabric.

Freund says that the Fleet Services 3.0 fabric, which allows the distributed switch architecture to scale above 100,000 nodes in a flat network, will probably now come with the "Ratamosa" chips in 2015. It was originally – and loosely – scheduled for Lago next year. The circuits that do the fabric interconnect is not substantially different, says Freund, but the scalability is enabled through software. It could be that customers are not going to need such scalability as rapidly as Calxeda originally thought.

The "Navarro" kicker to the Ratamosa chip is presumably based on the ARMv9 architecture, and Calxeda is not saying anything about when we might see that and what properties it might have. All that it has said thus far is that it is aimed at the "enterprise server era."

EnterpriseAI