The Smart Data Center

Written by Mark Campbell, Chief Innovation Officer, Trace3

It is a truism that artificial intelligence (AI) is pervading all facets of technology, from spotting early-stage cancer to understanding human speech to swapping your face with your cat's in real-time hi-res video. A stampede of consumer applications has fueled and funded the mainstream demand, social acceptance and growing ubiquity of AI, and now thinking systems are exploding into the enterprise IT landscape.

Enterprise IT has seen AI go mainstream in use cases such as cyber security, IT operations, monitoring, data analytics, business process automation and infrastructure provisioning, largely in response to the widening gap between a slow-growing skilled labor pool and the meteoric growth in IT workloads. Today, smart products are already augmenting operations and analytics by sifting through a dizzying amount of operational telemetry, spotting anomalies, correlating events and determining root cause. We are also seeing smarts injected into virtual and physical infrastructure provisioning and process automation, with new products hitting the streets every week that take AI into new frontiers.

As AI implementations mature, they invariably transform from a passive reporter that explains what happened, makes recommendations or identifies anomalies into a more active player that predicts failures, steps in autonomously to adjust processes and deploys or destroys capacity auto-magically. We have seen this trend take root in the cloud-native world and there are strong AI beachheads in the hybrid and private cloud worlds.

However, for data centers we are only now seeing the first raindrops of two gathering monsoons: AI for Data Centers and Data Centers for AI -- read that twice. 

AI for Data Centers - Many data centers are coupling AI with their data center infrastructure management (DCIM) systems to provide smart data center operations. In 2016, Google used DeepMind to observe and recommend control tweaks to fans, ventilation and cooling equipment across its data centers, cutting the energy used for cooling by 40% [1]. In 2018, Google went full Monty and turned over full control of cooling system operations to a self-taught algorithm that does not just recommend changes but autonomously adjusts the controls directly, observes the result, learns and gets smarter [2]. It's too early for quantified results, but early indications look promising.
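To make that observe-adjust-learn loop concrete, here is a minimal Python sketch of a cooling control loop of the kind described above. It is not Google's or DeepMind's actual system: read_telemetry() and apply_setpoint() are hypothetical stand-ins for whatever BMS/DCIM interface a given facility exposes, and the policy is deliberately simplistic.

```python
import random

def read_telemetry():
    """Hypothetical stand-in: return current readings from the cooling plant."""
    return {"inlet_temp_c": 24.0 + random.uniform(-1.0, 1.0),
            "cooling_power_kw": 180.0 + random.uniform(-10.0, 10.0)}

def apply_setpoint(setpoint_c):
    """Hypothetical stand-in: push a new cooling setpoint to the BMS/DCIM."""
    print(f"setpoint -> {setpoint_c:.2f} C")

TARGET_INLET_C = 25.0      # desired server inlet temperature
setpoint_c = 18.0          # current cooling setpoint
history = []               # (state, action, outcome) tuples a learner could train on

for step in range(10):
    state = read_telemetry()
    # Deliberately simplistic policy: relax cooling when there is thermal
    # headroom (saves energy), tighten it when inlet temperature drifts high.
    if state["inlet_temp_c"] < TARGET_INLET_C - 0.5:
        setpoint_c += 0.25
    elif state["inlet_temp_c"] > TARGET_INLET_C:
        setpoint_c -= 0.5
    apply_setpoint(setpoint_c)
    outcome = read_telemetry()                    # observe the effect of the action
    history.append((state, setpoint_c, outcome))  # raw material for a smarter policy
```

A learning-based system replaces the hand-written policy above with a model trained on the accumulated history, which is where the "gets smarter" part comes in.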

But we're just getting started. On the way are smart products that will virtually relocate heat-generating compute loads across row and rack locations for optimal temperature control. Other DCIM vendors are looking at using AI algorithms to vary data center target temperatures based on evolving hardware tolerances, power consumption/cost trends and transient workloads. Beyond cooling, the potential savings from AI-driven power distribution and management are equally compelling: data centers already consume 1.8% of all electricity in the US alone [3]. Scale this globally across all data centers and the impact is dinero grande.
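For a rough sense of scale, a back-of-envelope illustration (not a forecast): the Berkeley Lab report behind that 1.8% figure estimated US data centers consumed about 70 billion kWh in 2014 [3]; the 5% efficiency gain and $0.10/kWh rate below are purely illustrative assumptions.

```python
# Back-of-envelope scale check, not a forecast.
US_DC_KWH_2014 = 70e9        # Berkeley Lab estimate of 2014 US data center consumption [3]
ASSUMED_AI_SAVINGS = 0.05    # purely illustrative 5% gain from smarter power management
APPROX_PRICE_PER_KWH = 0.10  # rough US commercial electricity rate, USD

kwh_saved = US_DC_KWH_2014 * ASSUMED_AI_SAVINGS
print(f"~{kwh_saved / 1e9:.1f} billion kWh, roughly "
      f"${kwh_saved * APPROX_PRICE_PER_KWH / 1e9:.2f}B per year")
```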

Looking a bit further down the road, emerging smart DCIM systems incorporate data center IoT sensor data such as heat, airflow, vibration, ultrasound, power consumption, water and smoke detection into AI-based platforms that not only detect anomalous data center behavior but also determine the source and cause of the issue [4]. Soon these smart DCIM systems will not only say when, where and why something failed but will also be able to predictively alert operators before things go awry [5] and, in some cases, autonomously interdict.
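As a rough illustration of the kind of multi-sensor anomaly detection these platforms perform (not any particular vendor's implementation), the sketch below trains a scikit-learn IsolationForest on simulated rack telemetry and flags a reading that looks like a failing fan; all sensor values are fabricated for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Simulated telemetry columns: temperature (C), airflow (CFM), vibration (mm/s), rack power (kW)
normal = np.column_stack([
    rng.normal(24.0, 1.0, 1000),    # heat
    rng.normal(500.0, 25.0, 1000),  # airflow
    rng.normal(0.5, 0.1, 1000),     # vibration
    rng.normal(6.0, 0.4, 1000),     # power consumption
])
# A reading that looks like a failing fan: hot, low airflow, high vibration.
suspect = np.array([[31.0, 350.0, 2.5, 6.1]])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)
print(model.predict(suspect))  # -1 means flagged as anomalous
# A real smart DCIM layer would then correlate which sensors drove the score
# to localize the source (e.g., a specific CRAC fan) before alerting or acting.
```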

Data Centers for AI - As artificial intelligence changes almost every application housed in the data center, it is also reshaping the software development lifecycle (SDLC) itself. Traditional applications evolve via programmatic changes to their underlying code base, which are then verified through rigorous testing and deployed to production in a controlled, managed, repeatable (and unidirectional) fashion. AI-based applications, however, do not rely on code changes or one-way deployment. Rather, many evolve smarter and smarter models in a development environment and deploy these to production, while others train themselves in production, where they learn from real-world data and propagate those learnings back into the development environment. This bi-directional nuance has a fundamental impact on data center networking topologies.

AI algorithms, whether embedded in more traditional third-party applications or developed in-house, do best when trained on a boatload of data that is as real and relevant as possible. This means that in many cases live production data is best for training, while in other applications external data is used to train in a non-production environment and the resulting smart model is deployed into production. In both scenarios, AI applications are not just lobbed from non-production over into production, but are instead volleyed between the two, requiring the network segmentation between environments to become more of a permeable fabric than a defensible moat.
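A minimal sketch of that volley between environments, assuming a Python/scikit-learn shop: pull_production_data() and push_model_to_production() are hypothetical placeholders for whatever data pipeline and model registry an organization actually runs, and the model is an incremental learner so production learnings can keep flowing back into training.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def pull_production_data(n=200):
    """Hypothetical placeholder: fetch labeled telemetry/outcomes captured in production."""
    rng = np.random.default_rng()
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # fabricated labels for the sketch
    return X, y

def push_model_to_production(model, version):
    """Hypothetical placeholder: publish the model artifact across the environment boundary."""
    print(f"promoted model version {version}")

# Train an incremental model in the non-production environment...
model = SGDClassifier()
X, y = pull_production_data()
model.partial_fit(X, y, classes=[0, 1])
push_model_to_production(model, version=1)

# ...then periodically pull fresh production data back across the boundary,
# keep learning, and promote the improved model again.
for version in range(2, 5):
    X_new, y_new = pull_production_data()
    model.partial_fit(X_new, y_new)
    push_model_to_production(model, version=version)
```

The point of the sketch is the traffic pattern, not the model: data and model artifacts cross the non-production/production boundary in both directions, which is exactly what traditional one-way segmentation was never designed for.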

AI training takes a heap of computation and a mountain of data - the more of each, the better. To satisfy this gargantuan appetite for compute power, AI training is increasingly being conducted on non-CPU-centric servers built on massive arrays of GPUs, FPGAs, custom ASICs or purpose-built deep learning processing units that deliver orders-of-magnitude performance gains. These systems are, unfortunately, gluttonous power hogs, with today's systems gobbling up 30-50 kW/rack and next-gen systems estimated to reach upwards of a staggering 100 kW/rack. According to Jason Carolan, Chief Cloud Officer at Flexential, operator of over 40 data centers, "This is simply not supportable at scale by most of today's data centers without substantial re-engineering for cooling containment solutions such as liquid cooling." [6]

Beyond power, these super-crunchers only operate as fast as the training data provided to them. This is fueling demand for large, cheap and lightning-fast near-line storage, triggering a new storage arms race of faster controllers, protocols (e.g., NVMe and NVMe-oF) and media (e.g., 3D XPoint and 3D NAND).

AI-based applications, in many cases, require a non-production training environment with more compute and storage horsepower than production. This reverses the time-honored tradition of building non-production environments from retired production hand-me-downs. Instead, we will see these shiny new compute and storage platforms deployed into development and training environments along with state-of-the-art network, SAN and associated monitoring and management tools. These evolutions will require radical transformations for both server and storage infrastructure topologies throughout the data center. Even with cloud ubiquity, owned and co-location data centers are not going away anytime soon. However, they will certainly need to grow smarter at running themselves and adapt to support smarter architectures and automated infrastructures. Those data centers that are able to make this evolution will provide the foundation to support the next generation of smart solutions.

##

About the Author

Mark Campbell 

Mark Campbell is the Chief Innovation Officer at Trace3 where he combines an insider's advantage from leading venture firms with his 25 years of real-world IT experience to help enterprises discover, vet, and adopt emerging technologies. His 'from the trenches' perspective gives Mark the material for his frequent articles and speaking engagements.

Sources:

[1] The Verge, 2016, "Google Using Machine Learning to Boost Data Center Efficiency."

[2] MIT Technology Review, 2018, "Google just gave control over data center cooling to an AI."

[3] Berkeley Lab, Energy Technologies Area, 2016, "United States Data Center Energy Usage Report."

[4] LitBit, 2018, "Introducing Dac, the First AI-Powered Data Center Operator."

[5] Nlyte Software, 2018, "Predict the Future with our Premier Machine Learning Engine."

[6] Jason Carolan, Chief Cloud Officer, Flexential, 2018, interview.

Published Tuesday, February 19, 2019 7:30 AM by David Marshall