By Josh Mesout, Chief Innovation Officer, Civo
Generative AI may be garnering all of the attention right now, but there are many branches of Artificial Intelligence (AI) that start-ups and enterprises have been looking to capitalize on for a long time. Machine Learning (ML) is no exception. Indeed, 74% of executives anticipate that AI will deliver more efficient business processes, and 55% expect it to enable the creation of new products and services.
Yet in reality, 85% of ML projects fail to deliver, and only 53% of projects make it from prototype to production. Engineers and developers know the high potential ML holds for their organizations, but there are huge challenges in realizing it.
The hurdles
Part of the issue with ML projects is that building the supporting infrastructure takes a vast amount of time and resources for only a minimal amount of ML insight. These components range from complex areas such as feature extraction to more labor-intensive tasks such as setting up process management tools.
According to D. Sculley at Google Research, eight hours of ML engineering depends on 96 hours of infrastructure engineering, 32 hours of data engineering, and 24 hours of feature engineering. Broken down by percentage, 60% of hours are spent on infrastructure engineering, 20% on data engineering, 15% on feature engineering, and only 5% on ML engineering.
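To make that imbalance concrete, the short Python sketch below derives those percentages from the hour counts quoted above; the figures are Sculley's, while the script itself is purely illustrative.

```python
# Illustrative only: derive the percentage split from the hour counts
# quoted above (96 infrastructure, 32 data, 24 feature, 8 ML engineering).
hours = {
    "infrastructure engineering": 96,
    "data engineering": 32,
    "feature engineering": 24,
    "ML engineering": 8,
}

total = sum(hours.values())  # 160 hours in total
for task, count in hours.items():
    print(f"{task}: {count} h ({count / total:.0%})")
# -> 60%, 20%, 15%, and 5% respectively
```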
Organizations are having to spend vast amounts of time reconfiguring their adjacent infrastructure to achieve their ML goals. Smaller organizations with more limited resources do not have this time to spare for such a small amount of reward.
Open sourcing
As a result of these significant internal demands, more and more engineers are turning to open source to resolve these issues. According to Anaconda's State of Data Science 2022 report, 65% of companies lack the investment in tooling needed to enable high-quality ML production, while 87% of organizations are already leveraging open source software.
Organizations look towards open source machine learning products for a variety of reasons. For start-ups or smaller organizations, open source delivers the most cost- and resource-effective method for running ML algorithms. Spending two months learning how to use AWS SageMaker before accessing ML insights isn't a feasible use of time for many businesses, especially when non-proprietary infrastructure is available instantly.
On top of being more economical, open source often offers the most in-demand tooling alongside superior product quality. Not being stuck behind proprietary dependencies means the tooling can be easily adjusted for specific use cases, reducing the complexity involved in extracting ML's valuable insights.
Open source also allows organizations to leverage the latest ML expertise available. Many of the most popular projects, such as Kubeflow, receive frequent contributions from some of the best and brightest minds in the industry, so organizations can capitalize on external knowledge that might otherwise be out of their reach and focus their own domain expertise on the problem.
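To give a sense of how approachable these projects are, here is a minimal sketch of a pipeline built with the Kubeflow Pipelines SDK (kfp v2); the component body and all names are hypothetical placeholders rather than a real training workload.

```python
# A minimal, illustrative Kubeflow Pipelines (kfp v2 SDK) sketch.
# The component logic and all names here are hypothetical placeholders.
from kfp import compiler, dsl


@dsl.component
def train_model(epochs: int) -> str:
    """Placeholder step; a real component would fit a model here."""
    return f"model trained for {epochs} epochs"


@dsl.pipeline(name="minimal-ml-pipeline")
def ml_pipeline(epochs: int = 5):
    train_model(epochs=epochs)


if __name__ == "__main__":
    # Compile to a YAML spec that a Kubeflow cluster can execute.
    compiler.Compiler().compile(ml_pipeline, "pipeline.yaml")
```

Compiling produces a portable pipeline definition, so the same workflow can be submitted to any cluster running Kubeflow Pipelines.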
All of these benefits go some way towards resolving the common issues found with ML. Yet there is always more that can be done: 32% of ML engineers want to see further simplification in the data science community, which would further smooth the learning curve for drawing insights from ML.
What else can be done?
To drive the accessibility of ML to the next level, a constructive ecosystem needs to be built and maintained. The users and resources are already there, but they need to be channeled in the right way. Investing in open source cloud ecosystems can remove barriers to the adoption of ML and make it more accessible.
The first port of call is developing more interoperable tooling. It is clear which tools developers favor, and the right infrastructure and maintenance can help support them sustainably. By increasing the accessibility of the tooling developers are already familiar with, time can be saved: developers don't have to go through a learning period when setting up algorithms for new use cases and instances.
Barriers to ML can also be reduced through a variety of solutions. GPU edge boxes allow ML to run just as effectively in on-prem, hybrid, and edge-based use cases, making them ideal for secure workloads that need to be kept in-house.
GPU instances provide streamlined methods for running ML, with fast launch times and bandwidth pooling. More importantly for organizations, GPU instances provide a transparent pricing model, so there are fewer unknown costs to take smaller companies by surprise and leave a huge dent in their budget.
Fractional GPU instances provide similar benefits to full GPU instances but may be more appropriate for those operating at a smaller scale, whether small businesses or hobbyists. By bringing those who traditionally may not have had access to ML into the ecosystem, understanding and accessibility can be increased for all.
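As one illustration of what fractional access looks like in practice, the sketch below uses the official Kubernetes Python client to request a MIG slice of an NVIDIA GPU rather than a whole card. The container image, pod names, and exact MIG resource string are assumptions here and will vary with how a given cluster is configured.

```python
# Illustrative sketch: requesting a fractional GPU (an NVIDIA MIG slice)
# for a pod via the official Kubernetes Python client. The image, the
# names, and the MIG resource string all depend on your cluster's setup.
from kubernetes import client, config

config.load_kube_config()  # authenticate using the local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="fractional-gpu-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:23.03-py3",  # hypothetical choice
                command=["python", "-c", "print('hello from a GPU slice')"],
                resources=client.V1ResourceRequirements(
                    # One 1g.5gb MIG slice instead of a whole GPU.
                    limits={"nvidia.com/mig-1g.5gb": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```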
ML shouldn't be a closed shop where its potential is only realized by those operating at scale. Through prioritization of developers' needs and open source tooling, ML can be made accessible to all.
##
To learn more about the transformative nature of cloud native applications and open source software, join us at KubeCon + CloudNativeCon Europe 2023, hosted by the Cloud Native Computing Foundation, taking place April 18-21.
ABOUT THE AUTHOR
Josh Mesout, Chief Innovation Officer, Civo
Mesout spent seven years at AstraZeneca, where he led the teams building the company's enterprise machine learning and AI platforms. In addition, Mesout led the technical implementation of deep learning-based clinical diagnostics using cloud native technologies, built rapid prototypes in its Innovation Lab, and won an award for his contribution towards implementing ML into the AstraZeneca COVID-19 vaccine. Mesout has a long history with cloud and ML, assisting in the creation of learning materials, qualifications, and exams for AWS' ML platform, SageMaker. Mesout now works to accelerate Civo's mission of building a better cloud-native world. He currently leads Civo's ML program, building ML infrastructure that will lessen the workload on developers, significantly reducing the time from ML idea to insight.