Hewlett Packard Enterprise (HPE) has announced an AI Ops R&D collaboration with the US Department of Energy’s National Renewable Energy Laboratory (NREL) to develop artificial intelligence (AI) and machine learning (ML) technologies that automate data center operations and improve their efficiency, including resiliency and energy usage, for the exascale era. The effort is part of NREL’s ongoing mission, as a world leader in advancing energy efficiency and renewable energy technologies, to create and implement new approaches that reduce energy consumption and lower operating costs.
The project is part of a 3-year collaboration that introduces monitoring and predictive analytics to power and cooling systems in NREL’s Energy Systems Integration Facility (ESIF) HPC Data Center.
HPE and NREL are using more than five years’ worth of historical data, totaling more than 16 terabytes, collected from sensors in NREL’s supercomputers, Peregrine and Eagle, and in its facility. The data is used to train anomaly detection models that predict and prevent issues before they occur.
The collaboration will also address future water and energy consumption in data centers, which in the US alone will reach approximately 73 billion kWh and 174 billion gallons of water by 2020. HPE and NREL will focus on monitoring energy usage to optimize energy efficiency and sustainability as measured by key metrics such as Power Usage Effectiveness (PUE), Water Usage Effectiveness (WUE), and Carbon Usage Effectiveness (CUE).
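For readers unfamiliar with these metrics, each is a ratio against the energy consumed by IT equipment, as defined by The Green Grid: PUE divides total facility energy by IT energy, WUE divides site water use by IT energy, and CUE divides carbon emissions by IT energy. The sketch below shows the arithmetic only. The class name, field names, and sample readings are hypothetical and are not taken from NREL or HPE.

```python
from dataclasses import dataclass


@dataclass
class FacilityReading:
    """Hypothetical totals for one reporting period (for example, a year)."""
    total_facility_kwh: float  # all energy entering the data center
    it_equipment_kwh: float    # energy delivered to IT equipment only
    water_liters: float        # site water consumption
    co2_kg: float              # CO2-equivalent emissions from energy use


def pue(r: FacilityReading) -> float:
    # Power Usage Effectiveness: dimensionless, 1.0 is the ideal lower bound.
    return r.total_facility_kwh / r.it_equipment_kwh


def wue(r: FacilityReading) -> float:
    # Water Usage Effectiveness: liters of water per kWh of IT energy.
    return r.water_liters / r.it_equipment_kwh


def cue(r: FacilityReading) -> float:
    # Carbon Usage Effectiveness: kg CO2e per kWh of IT energy.
    return r.co2_kg / r.it_equipment_kwh


# Made-up numbers, chosen only to make the ratios easy to read.
sample = FacilityReading(
    total_facility_kwh=1200.0,
    it_equipment_kwh=1000.0,
    water_liters=500.0,
    co2_kg=400.0,
)
print(f"PUE={pue(sample):.2f}  WUE={wue(sample):.2f} L/kWh  CUE={cue(sample):.2f} kgCO2e/kWh")
```

Lower values are better for all three metrics, which is why they are convenient optimization targets for an automated operations system.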
Early results based on models trained with historical data have successfully predicted or identified events that previously occurred in NREL’s data center, demonstrating the promise of using predictive analytics in future data centers.
The AI Ops project grew out of HPE’s R&D work on PathForward, a program backed by the US Department of Energy to accelerate the nation’s technology roadmap for exascale computing, which represents the next step in supercomputing. HPE saw a critical need to develop AI and automation capabilities to manage and optimize data center environments for the exascale era. Applying AI-driven operations to an exascale supercomputer, which will run at a speed representing a thousandfold increase over today’s systems, will enable energy-efficient operations and increase resiliency and reliability through smart and automated capabilities.
The project will use open source software and libraries such as TensorFlow, NumPy and scikit-learn to develop machine learning algorithms. The project will focus on the following key areas:
- Monitoring: Collect, process and analyze vast volumes of IT and facility telemetry from disparate sources before applying algorithms to data in real time
- Analytics: Big data analytics and machine learning will be used to analyze data from various tools and devices spanning the data center facility
- Control: Algorithms will be applied to enable machines to solve issues autonomously, intelligently automate repetitive tasks and perform predictive maintenance on both the IT equipment and the data center facility
- Data center operations: AI Ops will evolve to become a validation tool for continuous integration (CI) and continuous deployment (CD) for core IT functions that span the modern data center facility
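To make the monitoring and analytics steps concrete, here is a minimal sketch of one simple anomaly detection technique, a rolling z-score over a sensor stream, written with NumPy. It is an illustration only and is not HPE's or NREL's method. The function name, window size, threshold, and synthetic temperature series are all assumptions made for this example.

```python
import numpy as np


def rolling_zscore_anomalies(values, window=20, threshold=3.0):
    """Flag points that deviate strongly from the preceding `window` samples.

    Returns a boolean array the same length as `values`. The first `window`
    points are never flagged because they have no full history.
    """
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=bool)
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = history.mean()
        std = history.std()
        if std == 0.0:
            # Flat history: any change at all counts as a deviation.
            flags[i] = values[i] != mean
        else:
            flags[i] = abs(values[i] - mean) / std > threshold
    return flags


# Synthetic coolant temperature trace (hypothetical data): a slow
# oscillation around 20 degrees C with one injected spike at index 150.
temps = 20.0 + 0.1 * np.sin(np.arange(200) / 5.0)
temps[150] = 25.0

flags = rolling_zscore_anomalies(temps)
print("flagged indices:", np.flatnonzero(flags))
```

A production system would more likely rely on learned models, for example ones built with TensorFlow or scikit-learn as the article mentions, but the structure is the same: compare incoming telemetry against an expectation derived from history and raise a flag when the gap is too large.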
HPE plans to demonstrate additional capabilities in the future by enhancing the HPE High Performance Cluster Management (HPCM) system to provide complete provisioning, management, and monitoring for clusters scaling to 100,000 nodes at a faster rate. Other testing plans include exploring integration of HPE InfoSight, a cloud-based AI-driven management tool that monitors, collects and analyzes data on IT infrastructure. HPE InfoSight is used to predict and prevent probable events in order to maintain overall server health and performance.