Advanced Computing in the Age of AI | Friday, March 29, 2024

Green500 Founder on Getting to Exascale: ‘Something’s Gotta Change’ 

After the latest Green500 rankings were announced on Friday, Green Computing Report caught up with list founder Professor Wu Feng to ask him about the list's implications. While there's been some impressive movement at the top of the list, the overall energy-efficiency numbers do not bode well for exascale projections.

Green Computing Report: What stands out about this latest list?

Wu Feng: If you take a look at the efficiency of the devices themselves, you'll see that there are three sets of accelerators on the list. The NVIDIA K20 has an efficiency of 4.98 gigaflops per watt; the Intel Xeon Phi is 4.49 gigaflops per watt; and the AMD Radeon is only 3.94 gigaflops per watt. So you would expect that if they are housed in similar systems, the efficiency of the machines would line up the same way. That pattern is reflected in this list; it was not reflected on the last list.

If you look at the November 2012 list, the Xeon Phi system that was the greenest machine was heavily tuned, and kudos to that team for doing that. They tuned it to the point where the entire system overall was more efficient than the commodity NVIDIA GPU system.

GCR: What do you think of the progress that's been made?

Feng: The biggest thing that has come out over the last year and a half is that at the top of the list, the energy efficiency is increasing much faster than the mean and the median of the Green500. Looking at the top of the list, we think, "Oh yeah, we're doing really well, we're going to get to an exaflop in a 100MW power envelope," but that's only for the top-end machines; the rest of the list is really not improving that much.

Extrapolating to Exaflop (published November 2012)

If you look at the top 50, it's pretty much either Blue Gene/Q systems, which are custom homogeneous systems, or heterogeneous systems with GPUs or coprocessors. There is just a smattering of homogeneous commodity-based CPU systems in the top 50.

GCR: You've bemoaned the fact that power consumption is still on the rise...

Feng: The reason efficiencies are improving is that we are doing more with the power we are consuming. As an analogy, we're improving the fuel efficiency of our supercomputers much like we're improving the fuel efficiency of cars. Overall, as a society we're continuing to increase our consumption of fuels as the population increases and there are more drivers on the road, so our overall energy consumption is increasing but the efficiency with which we're using it is better.

GCR: Are we moving fast enough considering the context of climate change?

Feng: In the About section of our site, we hold that it's part of our mission to ensure that supercomputers are only simulating climate change and not creating it.

Relative to the goals that the DARPA study is shooting for – a 20MW exascale system, which is very optimistic – we're still on the path of having an exaflop supercomputer that will be on the order of 300+ MW and that's really not sustainable. Money-wise, we're looking at a million US dollars per MW per year, meaning a 330MW supercomputer would cost about 330 million dollars to operate annually.

We can work on continuing to make the systems more energy-efficient by bringing the power consumption down, but ultimately something's gotta change. I'm extrapolating from the top part of the list and that's the part of the list that's increasing the fastest in terms of its energy-efficiency. As we extrapolate those numbers out, the estimate is 300-400 MW for an exaflop system right now.
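As a rough check on these figures, the relationship between efficiency, power, and operating cost can be worked through in a few lines of Python. This is only a sketch, not the Green500's extrapolation method: it assumes efficiency holds constant as a machine scales to an exaflop, the function names are invented for illustration, and the 2.5 gigaflops-per-watt input is simply the tuned Beacon figure mentioned in this interview. The only cost input is the rate Feng quotes, one million US dollars per MW per year.

```python
EXAFLOP = 1e18                 # floating-point operations per second
GIGA = 1e9
WATTS_PER_MW = 1e6
COST_PER_MW_YEAR = 1_000_000   # US dollars, the rate quoted in the interview


def power_mw(gflops_per_watt, target_flops=EXAFLOP):
    """Power in MW needed to sustain target_flops at a fixed efficiency.

    Assumes efficiency does not degrade with scale (a best-case assumption).
    """
    watts = target_flops / (gflops_per_watt * GIGA)
    return watts / WATTS_PER_MW


def required_gflops_per_watt(power_budget_mw, target_flops=EXAFLOP):
    """Efficiency in gigaflops per watt needed to fit a power budget."""
    flops_per_watt = target_flops / (power_budget_mw * WATTS_PER_MW)
    return flops_per_watt / GIGA


def annual_cost_usd(megawatts):
    """Yearly operating cost at the quoted per-MW rate."""
    return megawatts * COST_PER_MW_YEAR


if __name__ == "__main__":
    print(f"Exaflop at 2.5 GFLOPS/W: {power_mw(2.5):.0f} MW")
    print(f"Efficiency for 20 MW:    {required_gflops_per_watt(20):.0f} GFLOPS/W")
    print(f"Cost of 330 MW per year: ${annual_cost_usd(330):,.0f}")
```

Under these assumptions, a machine sustaining 2.5 gigaflops per watt would need 400 MW to reach an exaflop, while a 20 MW budget implies roughly 50 gigaflops per watt.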

GCR: Could systems like Eurotech and Beacon maintain their FLOPS-per-watt at scale?

Feng: Probably not – as you grow in size, efficiencies will start to drop. It's just like with a car: as the car gets bigger, there is additional overhead, more parts to keep moving, and you need more power to start it and slow it down. So the numbers I'm giving you right now are a projection of the best-case scenario, but the hope is that technologically, there may be something out there that can help us get the power consumption down.

GCR: What do you think of the energy efficiency profile of the Tianhe-2 system?

Feng: It's pretty good. It's along the lines of many of the other Xeon Phi based systems. On the last list, the only Xeon Phi based system that was around 2.5 gigaflops-per-watt was the Beacon machine, and that was because it was heavily tuned. If you look more carefully at where the large swath of Phi systems was on the last list, and you ignore the number one machine, most of the machines were in the 1.9 gigaflops-per-watt range, and that's where the Tianhe-2 supercomputer is, so it's about what is expected.

GCR: Is it fair to compare low-core-count machines with high-core-count machines, for example comparing Eurotech with Titan?

Feng: It's much the same as we view the energy-efficiency of cars. So there's a minimum bar in terms of what you would consider a car that's drivable, and just like the Green500, applicants need to meet some minimum performance benchmark to be considered a supercomputer. Typically as cars get bigger or as computers get bigger, you would expect that the energy efficiency would go down. So usually when you get more and more cores, you have to have more networking, and there's more data movement, and with the network infrastructure and all that overhead to keep the bigger machine running it's going to be less efficient.

GCR: Final question. I noticed that Eurora had different metrics on the TOP500 versus the Green500, the Rmax was slightly higher and the energy was quite a bit higher.

Feng: When teams submit to the Green500, they submit a Linpack run that is at least as fast as the entry point into the TOP500, but they may wish to run in a more energy-efficient mode. When they do that, they'll get a lower performance number, but they'll often get a significantly lower power number, thus improving their energy efficiency.

GCR: Thank you for clarifying that.

Feng: Think about it this way, again with respect to cars. If you're going 72 miles per hour and you're driving into a lot of wind, and you cut your speed down to 70 miles per hour, your fuel efficiency is going to go up. If you do the same thing with a computer, you take some edge off the performance, and your energy efficiency is going to improve.
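The trade-off Feng describes comes down to a ratio: energy efficiency is Linpack performance divided by power drawn. A minimal sketch follows; the two runs are hypothetical numbers invented for illustration, not Eurora's actual submissions, and the function name is likewise an assumption.

```python
def gflops_per_watt(rmax_tflops, power_kw):
    """Energy efficiency in gigaflops per watt.

    TFLOPS per kW equals GFLOPS per W, since both units scale by 1,000.
    """
    return (rmax_tflops * 1_000) / (power_kw * 1_000)


# Two hypothetical Linpack runs of the same machine (made-up numbers).
full_speed = gflops_per_watt(rmax_tflops=100.0, power_kw=40.0)
tuned = gflops_per_watt(rmax_tflops=95.0, power_kw=32.0)

print(f"full speed: {full_speed:.2f} GFLOPS/W")
print(f"tuned:      {tuned:.2f} GFLOPS/W")
```

In this made-up case, performance falls by 5 percent while power falls by 20 percent, so efficiency rises from 2.5 to about 2.97 gigaflops per watt.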
