Virtualization Technology News and Information
Addressing the AI Engineering Gap

Written by Joe Hellerstein

Realizing the dream of AI is more process than magic. That's the good news. But the process itself is a journey.

Given the talk of AI in the general media, you could be forgiven for thinking that it will 'just happen'. It sometimes sounds as if machine learning is on an unstoppable roll, poised to revolutionize industry and commerce through sheer momentum of technology. But of course, it's not that simple for organizations that want to adopt AI methods. True, the technologies are maturing at an unprecedented rate. There are countless innovations already being developed to help diagnose deadly diseases, make million-dollar trading decisions, learn skills like driving, and automate many other tasks that have the power to transform whole industries.

But we know that AI is not magic. What's less widely appreciated is that, for the most part, modern AI is not even science or math, in the sense of being based on fundamental truths. Modern AI is an engineering discipline: a painstaking process for creating useful things. AI engineering is in some ways similar to software engineering, but with unique properties due to its deep dependencies on large amounts of data and relatively opaque software. Doing AI engineering well requires processes and tools that are still under development.

Mind the gap

We're living through watershed years for core AI methods. Development is happening at an unprecedented rate. But the tools and best practices for AI engineering are not maturing nearly as quickly. As a result, this engineering gap will slow the penetration of AI across industries, while offering opportunities for innovative software solutions that focus on big data and AI engineering processes.

Headline breakthroughs in AI have come fast and furious in recent years, fuelled by the rapid maturing of techniques using deep learning, the success of GPUs at accelerating these compute-hungry tasks, and the availability of open-source libraries like TensorFlow, Caffe, Theano and PyTorch. This has accelerated innovation and experimentation, and led to impressive new products and services from large tech vendors like Google, Facebook, Apple, Microsoft, Uber and Tesla.

However, I predict that these emerging AI technologies will be very slow to penetrate other industries. A handful of massive consumer tech companies already have the infrastructure in place to make use of the mountains of data they have access to, but the fact is that most other organizations don't - and won't for a while yet. 

There are two core hurdles to widespread adoption of AI: engineering big data management, and engineering AI pipelines. The two effectively go together, and the rate of their uptake will determine the pace of change. Big data engineering competency, for example, is only now beginning to take hold across industries - though this is very likely to accelerate in 2018 with the growing adoption of cloud-managed services. AI engineering competency is the next hurdle - and it's likely to be many years yet before it becomes widespread across industries beyond the tech giants.

Engineering big data

Let's not put too fine a point on it - success with modern AI depends on success with big data. And as a result it should be no surprise that data-rich consumer services like Google, Facebook and Uber are leading the AI charge. Recent advances in deep learning only work well when given massive training sets to play with. But accumulating this training data and having the right to use it can be difficult or infeasible outside of marquee application areas in consumer tech.

It's not just about acquiring data either - getting it is one thing, but managing that data effectively is quite another. Successful AI organizations tend to have engineering expertise with core big data technologies like massively scalable storage, data preparation, analytic engines and visualization tools. However, increasing numbers of traditional organizations have recently succeeded in moving up the big data learning curve, and the emerging acceptance of cloud-managed solutions should accelerate the spread of that competency across more industries in the near future.

Engineering AI

Many people don't appreciate that AI today is an experimental science. The latest successes in AI are not the result of mathematical proofs; they're the result of exhaustive experimentation and tuning of ad hoc machine learning models. That ongoing experimental process requires discipline and tools, or it breaks down quickly and can lead to the kind of false predictive behaviours, hidden feedback loops and data dependencies outlined in Google's paper "Machine Learning: The High Interest Credit Card of Technical Debt".

Engineering an AI application involves pipeline development on samples (usually the job of data scientists), training models on massive data sets (often the job of data engineers), and serving models for live predictions in production (usually managed by DevOps engineers). Monitoring of the serving tier then feeds back to the earlier phases in the pipeline. Each of these phases involves continuous iteration, and there needs to be regular iteration across phases and teams as well. Tools and processes need to be in place to manage daily evolution and debugging of code, pipeline specs, training and testing data, live data feeds, live experiments, and metrics of served models. It's a long and involved process, and one that requires a lot of intensive data wrangling to deliver satisfactory results.
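
To make the shape of that loop concrete, here is a deliberately tiny Python sketch of the phases described above: developing a pipeline spec on a sample, training on the full data set, serving predictions, and letting monitoring feed back into retraining. Everything in it is an illustrative assumption rather than a real system: the function names (develop_spec, train, serve, monitor, iterate), the one-parameter threshold "model", and the 0.9 accuracy cut-off are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Model:
    threshold: float  # the single "learned" parameter in this toy model
    version: int      # bumped every time the model is retrained


def develop_spec(sample_rows):
    # Pipeline development on a sample (data scientists).
    # Hypothetical: the spec only names which feature to use.
    return {"feature": "value"}


def train(spec, rows, version):
    # Training on the full data set (data engineers).
    # The toy model "learns" the mean of the chosen feature.
    feature = spec["feature"]
    return Model(threshold=mean(r[feature] for r in rows), version=version)


def serve(model, spec, row):
    # Serving a live prediction (DevOps): is the row above the threshold?
    return row[spec["feature"]] > model.threshold


def monitor(model, spec, live_rows, labels):
    # Monitoring the serving tier: fraction of correct live predictions.
    hits = sum(serve(model, spec, r) == y for r, y in zip(live_rows, labels))
    return hits / len(labels)


def iterate(training_rows, live_rows, labels, min_accuracy=0.9):
    # One pass through the phases, with monitoring feeding back to training.
    spec = develop_spec(training_rows[:2])
    model = train(spec, training_rows, version=1)
    accuracy = monitor(model, spec, live_rows, labels)
    if accuracy < min_accuracy:
        # Feedback loop: fold the live data into training and retrain once.
        model = train(spec, training_rows + live_rows, version=model.version + 1)
        accuracy = monitor(model, spec, live_rows, labels)
    return model, accuracy


training_rows = [{"value": 1.0}, {"value": 3.0}]
live_rows = [{"value": 5.0}, {"value": 7.0}, {"value": 9.0}, {"value": 11.0}]
labels = [False, False, True, True]
model, accuracy = iterate(training_rows, live_rows, labels)
```

Even in this toy, the code, the spec, the training data, the live data and the served metrics all change from one pass to the next, and a single retraining pass does not necessarily reach the target, which is why each of them needs to be tracked.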

Creating a brighter future for AI

To date, systems for streamlining these processes have not been a focus in AI research or in open source. But this issue is an active area of discussion in the tech community, and opportunity knocks for new projects and products in this space. The teams behind them will need the skills, know-how and vision to create radical data productivity, and their work will be essential to helping AI and machine learning be as good as they can be.


About the Author

Joe Hellerstein is Trifacta's Chief Strategy Officer and co-founder, and the Jim Gray Chair of Computer Science at UC Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT Technology Review magazine included his Bloom language for cloud computing on its TR10 list of the 10 technologies "most likely to change our world".
Published Friday, February 23, 2018 7:33 AM by David Marshall