Written by Yashar Behzadi, CEO, Neuromation
It is projected that there will be 45 billion
connected cameras in the world by 2022. Coupled with breakthroughs in AI
computer vision, there is a tremendous opportunity to build a broad set of
high-value applications surrounding this new technology. Among other use cases,
AI computer vision will power deeper perception for autonomous vehicles, enable
more capable industrial robots, drive more accurate medical imaging
diagnostics, and provide higher levels of security in private and public
settings.
However, modern computer vision AI algorithms,
such as convolutional neural networks, require vast amounts of labeled data to
train the models. This data labeling is typically performed by humans and is
inherently limited. For example, a typical frame captured by an autonomous
vehicle needs to be meticulously labeled to identify objects (cars, buildings,
etc.), people, and environmental factors. The costs for this work can range from
$0.50 to $1 per frame, and labeling all the associated pixels can take a single
individual more than 30 minutes. In addition to being expensive, time-consuming,
and error-prone, human labeling cannot capture key attributes such as the
distance to an object, 3D bounding boxes, and object velocity. Looking ahead,
as the systems we build with AI become more sophisticated, more complex labels
like driver or pedestrian intent and identification of partially obstructed
objects will be increasingly important to build safe and high performing
systems. It is clear that the current paradigm of human-in-the-loop labeling is
limiting the advancement of computer vision AI.
To facilitate the creation of more capable AI
computer vision systems, our company is focused on pioneering the use of
Synthetic Data, or digitally created data that mimics real data. In the context
of computer vision, this data takes the form of relevant video, 3D environments
or images used for training deep learning models. We believe that Synthetic
Data will become an essential component of the future technology stack for AI computer
vision. By combining techniques from the movie and gaming industries, such as
simulation and computer-generated imagery (CGI), with emerging generative
neural networks, such as generative adversarial networks (GANs) and variational
autoencoders (VAEs), it is now possible to create vast, perfectly labeled,
realistic datasets that extend and enrich traditional datasets. The incremental
cost of each procedurally generated image is nearly zero, providing previously
unheard-of scalability;
and since the data is digitally created, all of its attributes are known to
pixel-perfect precision. Key labels such as depth, 3D position and partially
obstructed objects are provided by design.
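To illustrate why labels come "for free" with digitally created data, here is a minimal sketch (the class names, categories, and parameter ranges are illustrative, not our actual pipeline): because the scene is built from known parameters, ground-truth depth and 3D bounding boxes are read directly from those parameters rather than annotated by hand.

```python
import random
from dataclasses import dataclass

@dataclass
class SceneObject:
    """An object placed in a synthetic scene; every attribute is ground truth."""
    category: str
    position: tuple  # (x, y, z) in metres, with the camera at the origin
    size: tuple      # (width, height, depth) bounding-box extents in metres

def generate_scene(rng: random.Random, n_objects: int = 3) -> list:
    """Procedurally place objects; all labels are known exactly by construction."""
    categories = ["car", "pedestrian", "building"]
    return [
        SceneObject(
            category=rng.choice(categories),
            position=(rng.uniform(-10, 10), 0.0, rng.uniform(5, 50)),
            size=(rng.uniform(0.5, 4.0), rng.uniform(1.0, 3.0), rng.uniform(0.5, 4.0)),
        )
        for _ in range(n_objects)
    ]

def labels_for(scene: list) -> list:
    """Derive pixel-perfect labels directly from the scene parameters --
    no human annotation step is needed."""
    return [
        {
            "category": obj.category,
            "depth_m": obj.position[2],           # exact distance to the object
            "bbox_3d": (obj.position, obj.size),  # exact 3D bounding box
        }
        for obj in scene
    ]

rng = random.Random(0)
scene = generate_scene(rng)
for label in labels_for(scene):
    print(label)
```

Attributes that are impossible for a human annotator to recover from a 2D frame, such as exact depth, are simply copied out of the generator's own state.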
We are currently working with leading
technology companies to enable them to build better models with Synthetic Data.
Below are a few examples of use cases for which we see an almost ubiquitous
need among companies actively implementing AI development and transformation
programs:
- Rapid prototyping:
As companies build new AI products and solutions, it is often important to
do trade-off studies related to the overall system design and model
performance. For example, we see many retailers considering the use of
camera systems for inventory management, customer analytics, and
customer/product interactions, or handset manufacturers contemplating new
camera configurations and imaging modalities to improve facial
verification systems. By using Synthetic Data, AI developers can easily
understand the relative value of the number, type and location of cameras
without having to go through a prolonged process of building
representative hardware, acquiring data under various configurations,
labeling the images and building various models. Another key area for
rapid prototyping is in robotics where it is impractical to build hardware
variants and undergo long training sessions to mimic real-world scenarios.
The use of Synthetic Data together with reinforcement learning techniques
can cut months off development cycles and enable more capable robots.
- Reduction of
bias: Often companies lack sufficiently diverse data to build unbiased
algorithms. This is especially problematic when it comes to face
verification applications in which systems underperform or adversely
target certain demographics. To help with this issue, we are working with
a major technology company to develop a representative set of synthetic
identities. In addition to creating a balanced dataset representative of
all demographics, Synthetic Data approaches can create a wide range of
images for a particular identity, capturing variability associated with
viewpoint, facial hair, make-up, accessories (glasses, hats, etc.), and
environment (indoor, outdoor, etc.). The use of Synthetic Data also eliminates
the privacy concerns inherent with using real-world captured face data. In
addition to solving key technical issues, Synthetic Data may in this way
also address key ethical issues with how models are built.
- Greater
model robustness: Synthetic Data can be highly parameterized, allowing
precise control over aspects like object position, image background,
camera position and viewpoint, and light-source position and intensity. The
combinatorics of procedural generation enables near infinite variability
of images, leading to more robust and generalizable models.
- Understanding
complex environments: AI computer vision systems are beginning to excel at
characterizing more complex real-world scenarios, like the identification
of potential security situations, characterizing customer and product
interactions, or understanding the complex dynamics of crosswalks in a
crowded urban environment. Synthetic Data using agent-based simulations is
extremely promising as interactions and intent can be more precisely
labeled and understood.
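The combinatorics of parameterization mentioned above can be made concrete with a small sketch. Assuming a hypothetical parameter grid for a single retail-shelf scene (the parameter names and value ranges below are illustrative), the full Cartesian product of even a few modest axes already yields thousands of distinct scene configurations:

```python
import itertools

# Hypothetical parameter grid for one synthetic scene (names are illustrative).
scene_parameters = {
    "camera_height_m":   [1.5, 2.0, 2.5, 3.0],
    "camera_yaw_deg":    [-30, -15, 0, 15, 30],
    "light_intensity":   [0.4, 0.7, 1.0],
    "light_angle_deg":   [20, 45, 70],
    "background":        ["warehouse", "storefront", "stockroom"],
    "object_position_m": [(x, z) for x in (-1, 0, 1) for z in (2, 4)],
}

keys = list(scene_parameters)
# Cartesian product over all axes: every combination is a renderable variant.
variants = list(itertools.product(*(scene_parameters[k] for k in keys)))

print(len(variants))  # 4 * 5 * 3 * 3 * 3 * 6 = 3240 distinct configurations
first = dict(zip(keys, variants[0]))
print(first)
```

Each added axis multiplies the variant count, which is what makes the effective dataset size grow far faster than the cost of defining the parameters.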
However, although Synthetic Data is an extremely promising enabling technology
for AI, key technical challenges remain for its widespread adoption. The core
issue lies in the
difficulty of effectively matching the generated data to real data. Without
proper matching, the introduction of synthetic data can lead to poor model
performance and introduce bias. To solve this issue, we have developed core
intellectual property (IP) related to domain adaptation/randomization and adaptive
generation. The latter is a particularly interesting challenge and requires a
'closed-loop' view of data generation and model performance. To that end, we
have developed a fully integrated model development platform that leverages
insight from real data assets and model performance to inform the generation of
new data. We find that this intimate link between data and model is essential
to driving the key endpoints, which ultimately come down to model
performance.
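The closed-loop idea can be sketched in a few lines. Assume an upstream step has already measured per-category model accuracy on a held-out set of real data (the function name and the accuracy figures below are invented for illustration); the generator's sampling weights are then shifted toward the categories where the model is weakest:

```python
def adapt_generation_weights(per_category_accuracy: dict) -> dict:
    """Shift synthetic-data generation toward categories where the model is weak:
    each category's share of newly generated data is proportional to its error rate."""
    errors = {cat: max(1.0 - acc, 0.0) for cat, acc in per_category_accuracy.items()}
    total = sum(errors.values()) or 1.0  # avoid division by zero if the model is perfect
    return {cat: err / total for cat, err in errors.items()}

# Invented accuracies measured on real validation data: the model struggles
# most on pedestrians, so that category gets the largest generation share.
accuracy = {"car": 0.95, "pedestrian": 0.70, "cyclist": 0.85}
weights = adapt_generation_weights(accuracy)
print(weights)
```

Each round of generation, retraining, and re-evaluation then repeats the loop, so the synthetic data keeps targeting the model's current weaknesses rather than being sampled blindly.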
We are excited about the future of AI computer
vision and the key role Synthetic Data and closed-loop model development will
play to unblock the development of next generation models.
## About the Author
Yashar Behzadi, CEO, Neuromation
Yashar is an experienced entrepreneur who has
built transformative businesses in the AI, medical technology, and IoT space.
He comes to Neuromation after spending the last 12 years in Silicon Valley
building and scaling data-centric technology companies. His work at Proteus
Digital Health was recognized by Wired as one of the top 10 technological
breakthroughs of 2008 and as a Technology Pioneer by the World Economic Forum.
He has been recognized in Wired, Entrepreneur, WSJ, CNET, and numerous other
leading tech journals for his contributions to the industry. With 30 patents
and patents-pending and a PhD in Bioengineering from UCSD, he is a proven
technologist.