Data Center Liquid Cooling Boils Over
HAS DATA CENTER COOLING REACHED THE BOILING POINT?
As the summer heat tightens its hold on the northern hemisphere and temperatures rise, many data centers will test the limits of their cooling systems and, in some cases, also hope that the IT equipment can in fact operate at the upper boundaries of the 2011 ASHRAE Expanded Thermal Guidelines.
Of course, when we think of the ASHRAE charts we instinctively assume that we are discussing air-cooled IT equipment. The latest edition of the thermal guidelines, which is based on the consensus of the TC 9.9 committee, made it clear that modern IT equipment is much more tolerant of broader environmental conditions. While the “recommended” range remained unchanged at 64.4°F to 80.6°F (18°C to 27°C), it is the A2 “allowable” range of 50°F to 95°F, which covers the majority of IT hardware (except tape drives), that provides the greater opportunity to save cooling system energy. Most existing cooling systems can capture these savings simply by raising supply temperatures slowly, and new installations can do so by incorporating free cooling in the design.
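To make those ranges concrete, here is a minimal sketch (in Python) that checks whether a proposed supply-air temperature falls within the “recommended” band or only the A2 “allowable” band cited above. The function and its structure are purely illustrative, not part of any ASHRAE tooling.

```python
# Illustrative only: classify a supply-air temperature against the 2011 ASHRAE
# air-cooling ranges cited above (recommended 18-27°C, A2 allowable 10-35°C).

RECOMMENDED_C = (18.0, 27.0)   # 64.4°F to 80.6°F
A2_ALLOWABLE_C = (10.0, 35.0)  # 50°F to 95°F

def classify_supply_temp(temp_c: float) -> str:
    """Return which ASHRAE air-cooling band a supply temperature falls in."""
    lo, hi = RECOMMENDED_C
    if lo <= temp_c <= hi:
        return "within the recommended range"
    lo, hi = A2_ALLOWABLE_C
    if lo <= temp_c <= hi:
        return "outside recommended, but within A2 allowable"
    return "outside the A2 allowable range"

if __name__ == "__main__":
    for t in (22.0, 30.0, 38.0):
        print(f"{t}°C supply air: {classify_supply_temp(t)}")
```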
While I won’t belabor the pros and cons of different types of free cooling, one of the stated long-term goals of the new ASHRAE Thermal Guidelines is to minimize or eliminate mechanical cooling. Nonetheless, not everyone will feel comfortable with this, so while most IT hardware manufacturers and many in the data center industry now realize that there are many opportunities for energy savings, the majority will still run air-cooled IT equipment within the “recommended” ASHRAE ranges for quite some time to come.
So by now you’re probably wondering what this has to do with the provocative “Boiling Over” title. Well, besides the expanded A1-A4 guidelines, in 2011 ASHRAE also published the liquid cooling guidelines known as W1-W5, which clearly define and encourage the use of liquid cooling.
DATA CENTER HYDROPHOBIA?
Liquid cooling is not new in the data center. Some of the original mainframes and supercomputers utilized direct liquid cooling. However, in this “modern” age, the majority of IT hardware is air-cooled, and for most people, liquid cooling has become an antiquated concept. For some, there is a fear of water: data center hydrophobia.
Of course, the term liquid cooling has been used in the marketing of many cooling systems that provide so-called “supplemental” or “close coupled” cooling, such as in-row, contained cabinet, rear-door, or overhead style cooling units. While these systems are primarily aimed at dealing with high density heat loads, all of them are still meant to support conventional air-cooled IT hardware. And although these types of cooling units support higher heat loads for air-cooled servers, typically 15 to 30 kW per rack, there are some applications, such as high performance computing (HPC), that demand much higher power densities.
In addition, cooling system efficiency for air-cooled IT equipment has vastly improved, especially over the last few years. As a result, PUEs for new designs have dropped substantially, to 1.1 or less for some internet giants’ sites that utilize direct airside free cooling, and to roughly 1.3 for some sites not using outside air. Nonetheless, even as we approach the “perfect” PUE of 1.0, the fact remains that in most cases virtually all the energy used by the IT equipment still becomes waste heat. While there are some exceptions, such as using the warm air to heat administrative office spaces within the building, it is very difficult to make large scale use of the “low grade” heat in the IT exhaust air (typically 90°F to 110°F). Even using rivers, lakes, or an ocean to absorb the heat does not change the fact that the energy is wasted and heat is added to our environment.
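As a reminder, PUE is simply the ratio of total facility energy to IT equipment energy. The short sketch below uses made-up example figures to show how the 1.1 and 1.3 values cited above translate into cooling and power-distribution overhead.

```python
# Illustrative PUE arithmetic: PUE = total facility energy / IT equipment energy.
# The kW figures below are example numbers, not measurements from any real site.

def pue(it_kw: float, overhead_kw: float) -> float:
    """Total facility power divided by IT power."""
    return (it_kw + overhead_kw) / it_kw

print(pue(1000.0, 100.0))  # 1.10 -> ~10% overhead (e.g., direct airside free cooling)
print(pue(1000.0, 300.0))  # 1.30 -> ~30% overhead (no outside air)
```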
Moreover, in our quest for energy efficiency, we tend to forget that even the lowest PUE overlooks the fan energy within the IT equipment itself, which in most cases ranges from 2% to 5% of the power actually delivered to the IT components. It can reach 5% to 10% for 1U servers with eight to 10 small high speed fans operating at high intake temperatures. Of course, the IT fan energy is in addition to the fan energy of the facility equipment (free cooling or conventional).
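A quick sketch shows why PUE misses this: everything measured at the IT plugs counts as “useful” IT power, even the share consumed by the servers’ own fans. The 1,000 kW figure is an assumed example.

```python
# Illustrative only: PUE treats all power measured at the IT plugs as IT load,
# even though 2%-10% of it may be consumed by the servers' internal fans.

IT_PLUG_KW = 1000.0  # assumed power measured at the IT equipment (example figure)

for fan_share in (0.02, 0.05, 0.10):
    fan_kw = IT_PLUG_KW * fan_share
    component_kw = IT_PLUG_KW - fan_kw
    print(f"fan share {fan_share:.0%}: {fan_kw:.0f} kW spent on fans, "
          f"{component_kw:.0f} kW delivered to the IT components")
```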
So back to liquid cooling: let’s look at a brief summary of the five classes in the ASHRAE Liquid Cooling Guidelines and the developments in the IT equipment and cooling systems that use liquid cooling. Classes W1-W3, which cover facility water temperatures of roughly 35°F to 90°F (2°C to 32°C), are generally intended for incorporation into more traditional data center cooling systems. These are meant to be integrated with chilled water or condenser water systems, or, at the higher ranges of W3, primarily with cooling towers. This can involve the use of fluid-to-fluid heat exchangers and IT equipment that can accept liquid cooling, or air-to-fluid heat exchangers within an IT cabinet. It is the W4 and W5 classes that really expand the ranges and are the basis for so-called “warm water” and “hot water” cooling systems.
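A minimal lookup of the class boundaries, using the commonly cited upper bounds consistent with the figures in this article (the full guideline tables carry more detail, so treat these as approximate), might look like this:

```python
# Approximate facility-water temperature upper bounds (°C) for the ASHRAE liquid
# cooling classes, consistent with the ranges cited in this article; consult the
# guideline itself for the authoritative tables.

W_CLASS_MAX_C = {
    "W1": 17.0,
    "W2": 27.0,
    "W3": 32.0,   # upper end of "traditional" chilled/condenser water designs
    "W4": 45.0,   # "warm water" (roughly 95°F to 113°F)
    "W5": None,   # "hot water", above 45°C (113°F); no upper bound given here
}

def lowest_class_for(water_temp_c: float) -> str:
    """Return the lowest W class whose range still covers the given water temperature."""
    for w_class, max_c in W_CLASS_MAX_C.items():
        if max_c is None or water_temp_c <= max_c:
            return w_class
    return "W5"

print(lowest_class_for(30.0))  # W3
print(lowest_class_for(40.0))  # W4
print(lowest_class_for(50.0))  # W5
```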
HOT WATER COOLING?
While “hot water cooling” may seem like an oxymoron, or at least a strange statement, the Intel Xeon CPU is in fact designed to operate at up to 170°F at the heat transfer surface (its maximum case temperature). This allows the CPU chip to be “cooled” by “warm water” (ASHRAE Class W4: 95°F to 113°F) or even “hot water” (W5: above 113°F). Moreover, the highest class, W5, is specifically focused on recovery and reuse of the waste heat energy, a much sought after long-term goal of the data center industry in general and of HPC-supercomputing in particular. In fact, IBM, HP, and Cray currently offer liquid-cooled mainframes and supercomputers.
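The arithmetic behind the apparent oxymoron is simple: even “hot” W5 water leaves a substantial temperature difference to drive heat out of the silicon. A quick sketch, using only the 170°F case limit and the W4/W5 water temperatures cited above:

```python
# Illustrative only: even "hot water" leaves thermal headroom against the cited
# 170°F maximum CPU case temperature.

def f_to_c(temp_f: float) -> float:
    return (temp_f - 32.0) * 5.0 / 9.0

CASE_MAX_F = 170.0  # maximum case temperature cited for the Intel Xeon CPU
for water_f in (95.0, 113.0, 130.0):  # W4 low end, W4/W5 boundary, a W5 example
    headroom_f = CASE_MAX_F - water_f
    print(f"{water_f:.0f}°F ({f_to_c(water_f):.1f}°C) water: "
          f"{headroom_f:.0f}°F of headroom to the {CASE_MAX_F:.0f}°F case limit")
```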
In addition, immersion cooling of IT hardware has been on the periphery for the past several years. Originally introduced by Green Revolution Cooling, it involves submerging the IT equipment in a non-conductive dielectric fluid (such as mineral oil), which effectively absorbs the heat from all components. One implementation (which in 2010 I described as a deep-fryer for servers) used standard 1U servers that had been modified by removing the fans and modifying or removing the hard drives (which may void the OEM’s warranty). Technically speaking, from a heat transfer perspective it is very effective; however, as you can imagine, it raises a variety of issues from a practical viewpoint.
The equipment cooling cabinet resembles a large bathtub containing several hundred gallons of mineral oil. The modified IT hardware is lowered into the open-top fluid bath and then the various network and power cables (typically 120 to 240 VAC) are connected by the technical staff. The warmed fluid can then be pumped to various types of heat rejection systems (such as an evaporative cooling tower), which in most cases do not need any mechanical cooling, making the approach very energy efficient. Development has continued, and in some cases other manufacturers have built high-performance supercomputing hardware specifically designed for immersion cooling; one such system is the TSUBAME-KFC located at the Tokyo Institute of Technology in Japan (note: in Japanese, Tsubame can refer to a type of bird, and the “KFC” suffix was purely coincidental).
HITTING THE BOILING POINT
So while 113°F (or even 170°F at the chip) is not at the boiling point of water, other fluids “boil” (change phase) at lower temperatures. As an alternative to the immersion system based on mineral oil, another more recent design was developed in conjunction with Intel using 3M™ Novec™ Engineered Fluid (a dielectric fluid with a low boiling point of approximately 120°F) and a tub with an enclosed top. This was developed as part of a proof of concept for the SGI® ICE™ X, the fifth generation of a distributed memory supercomputer, using the Intel® Xeon® processor E5-2600. The submerged IT equipment “boils” the Novec fluid so that it becomes a vapor that rises to the top of the tub. The vapor then comes into contact with cooling coils, which absorb the heat and cause the Novec to condense back to a fluid.
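To see why phase change is such an effective heat transfer mechanism, a rough sketch helps. Both fluid property values below are assumed round numbers for illustration, not published specifications for mineral oil or Novec.

```python
# Rough, illustrative comparison of sensible vs. latent heat absorption per kg of
# coolant. Both property values are assumed round numbers, not vendor data.

MINERAL_OIL_CP_KJ_PER_KG_K = 1.9         # assumed specific heat of mineral oil
DIELECTRIC_LATENT_HEAT_KJ_PER_KG = 90.0  # assumed heat of vaporization of a low-boiling fluid

sensible_per_kg = MINERAL_OIL_CP_KJ_PER_KG_K * 10.0  # oil warmed by 10°C
latent_per_kg = DIELECTRIC_LATENT_HEAT_KJ_PER_KG     # absorbed at a nearly constant ~120°F

print(f"mineral oil, 10°C rise:   {sensible_per_kg:.0f} kJ per kg")
print(f"boiling dielectric fluid: {latent_per_kg:.0f} kJ per kg")
# Because boiling absorbs heat at a nearly constant temperature, the condensing
# coils (and the water loop behind them) can run hotter than in an air-cooled design.
```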
One of the primary advantages claimed is the improved heat transfer from the phase change, and also that the condensing coils can operate at higher temperatures. This allows the final external heat rejection system to avoid using water for evaporative cooling. It also provides an expanded range of heat reuse options due to the higher water temperatures.
However, one of the major drawbacks of immersion cooling is that personnel handling the IT equipment must come into contact with the fluids, as well as being exposed to vapors in open style phase change systems. While further development may continue to mitigate these issues, this is generally seen as a limiting factor for broad acceptance in mainstream applications.
Other vendors, such as Iceotope, have developed enclosed systems, which use sealed server modules that contain dielectric fluids and have connectors that attach to water lines, which circulate water through isolated passageways in the server module.
Meanwhile, others, such as HP, have developed “dry disconnect servers,” which transfer heat to warm-water-cooled thermal bus bars in the chassis of the new Apollo line of supercomputers, thus avoiding fluid contact with the server modules and perhaps alleviating the hydrophobic concerns of potential customers.
THE BOTTOM LINE
So is there a more mainstream future for liquid cooling? Its primary driver is support for much higher heat densities while using much less energy to remove the heat. Of course, the term mainstream computing is evolving as cloud computing and other internet-based services, such as search and social media, change the definition and boundaries. The hyperscale computing giants such as Google, Facebook, Amazon, and Microsoft only have to deliver computing functions as a service, making users less concerned with the form factor of the IT hardware, much less how it is cooled, and more concerned with performance, reliability, and price (not necessarily in that order). As we move ahead, we need to recognize that we are entering the age of “industrial computing,” and like most industrial processes, form follows function. The machinery that will deliver utility-scale computing is designed to optimize performance and efficiency, not human convenience or elegance.
As the summer heat peaks and we use more energy for our chillers and CRACs in the endeavor to keep our conventional servers “cool,” try to remember that the real long-term efficiency goal is not only to reduce or eliminate our dependence on mechanical cooling. We also need to consider how to recover and reuse the energy used by the IT equipment, rather than just continuing to dump the “waste” heat into our environment.
Setting aside how we cool our servers – by blowing air on them, “boiling” them or “frying” them in oil – the continuing developments and use of liquid cooling should be watched carefully. Perhaps in the not so distant future I will also need to consider updating my column to “Hot Tub Insight.”
[NOTE: This blog initially appeared in Mission Critical Magazine]