The whole concept of “green” computing has in recent times been focused on the metric introduced by The Green Grid: PUE, or Power Usage Effectiveness. PUE is simply the ratio of total facility power to the power consumed by the IT load itself. So a PUE of 2 represents a facility where, for every 1kW consumed by the IT load, the facility consumes 2kW – although PUE must be averaged over a full year and measured in terms of energy, i.e. annual total kWh.
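As a minimal sketch of that annual calculation (the energy figures below are made up for illustration, not taken from any real facility):

```python
# Annual PUE: total facility energy divided by IT energy over the same year.
# Illustrative numbers: a 1,000 kW average IT load over 8,760 hours.
it_energy_kwh = 8_760_000        # IT equipment energy for the year
facility_energy_kwh = 12_264_000 # everything: IT + cooling, UPS losses, lights...

pue = facility_energy_kwh / it_energy_kwh
print(f"Annual PUE = {pue:.2f}")  # -> Annual PUE = 1.40
```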
The industry has moved from an average PUE of well over 2.5 in the mid-90s to a typical 1.4 in 2011 – but with extreme examples from Google, Yahoo and Facebook jostling for supremacy between 1.07 and 1.13. In recent times there have been two major contributors to the improvement in efficiency: the mechanical cooling systems and the IT hardware itself.
Traditionally, data-centers, which originated in the mid-50s as mainframe machine rooms, imposed very tight limits on temperature and humidity to avoid either static electricity building up on magnetic tape-heads from air that was too dry, or swollen punch-cards from air that was too damp. For reasons that can only be rooted in the overly cautious engineering paranoia that has characterised the industry even to this day, thermal conditions have only very recently been relaxed – driven by ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, and its Thermal Guidelines for IT equipment. The general, albeit slow, drift away from precision air-conditioning and data-center interiors that resemble chilled cold-stores to environments where the air-flow is well managed and considerably warmer has enabled PUEs to be lowered without undue risk to the IT hardware. That is not to say that the IT hardware will run as reliably within the widened thermal limits (especially where high humidity combines with poor air quality to accelerate corrosion), but the modern trend for 3-4 year refresh cycles negates most of the potential downsides. It is worth noting that some organizations’ claims of IT power-efficiency improvements are based entirely on the rate of hardware refresh – something that often raises questions about embedded carbon and controlled recycling.
So here we are, arguing over single-digit improvements in PUE – e.g. from 1.11 to 1.10 – with marketing departments working overtime to “out-PUE” their competitors and find some sort of “green” differentiator, be it a renewable-energy contract or solar-PV on the roof that is only capable of powering the lights.
But the real target for energy efficiency lies within the “1” of the PUE – within the IT load itself – the real elephant in the room. As an efficiency metric PUE has proven very successful and universally accepted, but it does have drawbacks, the main one being that it assumes all of the IT power is being used to good effect. In reality we can question that assumption on a few simple facts, without going anywhere near any social-worth or low-carbon-enabler deliberations.

Firstly, average server utilization worldwide (excluding virus checking!) is down around the 10% mark, and ever-growing processor speed and compute capacity is doing nothing to help. Secondly, server virtualization, the darling of the IT industry at the moment, only lifts that poor 10% server load to 30-40%. Thirdly – and probably most importantly – most servers do not consume power in a linear fashion versus load: a typical server consumes 35-40% of its maximum power when idling (doing nothing). If you have a particular server in mind you can look it up on the SPECpower website, where idle power is listed by model – and you should be shocked to spot some very recent tin listed with 69% idle power. That said, the average is gradually falling, and most popular models listed in the last 18 months sit at 35-40% idle power.

This has improved, but when you put average server utilization together with the power demand at low compute load, you reach the crushing view that we should not be looking too hard at PUE. If – and it’s a big if, although I am sure the server OEMs are working hard on it – power draw were better related to server load, then the power demand of our existing data-centers would be a fraction of what it is today. Do the math: take a current facility running 5,000 servers whose peak power demand is 300W each, a total IT load of 1,500kW.
The design PUE might be 1.2 at full load, so the facility would draw 1,800kW from the utility at full tilt. At 20% chip utilisation we might see a server power draw of 50% of maximum – a 750kW IT load – and at that partial load a typical PUE of 1.4 would mean drawing 1,050kW from the grid.
Compare this to the same facility with servers drawing 20% power for 20% load: the grid draw would be 420kW – a 60% reduction. Compare it again if the servers could be loaded to 90%, drawing 90% of peak power. We would then need only around 1,111 servers, drawing 360kW from the grid, whilst needing a building and infrastructure of 20% of the original size and cost. So is IT the efficiency elephant in the room, or is the room itself the elephant?
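The arithmetic above can be sketched in a few lines. The helper function and its name are mine; the input figures (5,000 servers, 300W peak, the power fractions and PUEs) come straight from the text:

```python
# A sketch of the worked example: grid draw = IT load x PUE, where the IT
# load is the server count times some fraction of each server's 300W peak.

def grid_draw_kw(n_servers, power_frac, p_max_w=300.0, pue=1.4):
    """Grid power (kW): n servers at power_frac of peak, multiplied by PUE."""
    it_load_kw = n_servers * p_max_w * power_frac / 1000.0
    return it_load_kw * pue

# Today: 5,000 servers at 20% utilisation but drawing 50% of peak power
today_kw = grid_draw_kw(5000, 0.50)                      # -> 1050.0

# Energy-proportional servers: 20% power for 20% load, same partial-load PUE
proportional_kw = grid_draw_kw(5000, 0.20)               # -> 420.0

# Consolidated: the same work (5,000 x 20%) packed onto servers at 90% load,
# each drawing 90% of peak, in a facility running at its design PUE of 1.2
n_needed = round(5000 * 0.20 / 0.90)                     # -> 1111 servers
consolidated_kw = grid_draw_kw(n_needed, 0.90, pue=1.2)  # -> ~360

print(today_kw, proportional_kw, n_needed, round(consolidated_kw))
# -> 1050.0 420.0 1111 360
```

The 60% reduction falls straight out: 420kW against 1,050kW, before any consolidation at all.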
Ian Bitterlin is CTO of Ark Continuity – a developer of high-integrity, low-carbon data-centres based in Corsham, Wiltshire. With a strong real-estate portfolio, well over 100MVA of power and planning consent for >100,000m² of critical space in multiple UK locations, Ark are at the forefront of the low-energy data-centre market.