By Eugene Asahara, Evangelist - AI & Data Mesh,
Kyvos Insights
Computer Vision is a common component of a suite of A.I.
tools that can recognize objects within unstructured digital assets such as
photos and videos. It can also recognize properties of the objects, such as
color and their placement in the photo, and even compose a brief description of the
photo.
The recognized objects include more than what we'd normally think
of as "things" - such as cars, fruits, or furniture. Objects include the place
depicted in the photo (indoors, outdoors, or even a specific place), faces,
famous logos, and any written text (background graffiti, text from signs or
t-shirts, text on newspapers).
Regarding text, Computer Vision can recognize many
different languages. It can even read handwriting, such as notes on whiteboards
or paper expense receipts, albeit more prone to error. From
there, a language service can extract key phrases that will be treated like any
other object, and even provide a contextual description.
Now that it doesn't cost at least $20 (in 1990s dollars) to
purchase and process a 24-shot roll of film, we've become photo and video
crazy. That goes beyond our personal lives of vacation photos and TikToks into
our enterprise lives at work. The volume of photos and videos varies from
business to business. But the rise in remote work, growing social media
presence, and automation means that our libraries of photos and videos are
growing rapidly across the board. And since photos and videos are much larger
than other data, they constitute much more than their fair share of the
proverbial "exponential data explosion".
There are webcams all over the place, recording everything
for one reason or another. There are recorded remote meetings such as internal
brown bags, sales demos, external tutorials, and video support calls. And there
are all those photos we text our co-workers, for example, screenshots depicting
error messages we're getting, whiteboards from a meeting, or pictures of
something we're futilely trying to explain through words.
In this article, I describe the integration of Computer
Vision capabilities into current Business Intelligence (BI) and data warehouse
(DW) implementations. Although I'd ideally like to be platform-agnostic, I
mostly reference components of Azure Cognitive Services in this article.
Association Rules
The recognition of objects trapped in the myriad photos and
videos taken by all of us offers an opportunity to expand our awareness of the
co-occurrence of things across all domains of our businesses. That is, the
notion that "what fires together wires together" is fundamental to intelligence,
whether in the course of living our lives as human beings or on the job. The
idea is that associating things that happen together provides valuable
insights towards optimizing our situation.
Data scientists have been mining insights using
association rules for many years. Most commonly, this takes the form of "market
basket analysis" using the Apriori association algorithm. Market Basket
Analysis is a data science technique that identifies items frequently purchased
together, literally in shopping baskets and carts.
Based on the discovered frequent itemsets (products
purchased together), a retailer can optimize product placement on the shelves.
For example, someone at some point would notice that people buy graham
crackers, marshmallows, and chocolate bars together. Maybe we should place the
three products next to each other.
Figure 1 - Market Basket
Analysis is one of the prime data science techniques.
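The counting behind frequent itemsets can be sketched in a few lines of Python. This is a brute-force illustration, not the optimized candidate-generation of the real Apriori algorithm, and the baskets here are made up for the s'mores example:

```python
from itertools import combinations
from collections import Counter

# Toy baskets: each set is one shopping cart's items (invented data).
baskets = [
    {"graham crackers", "marshmallows", "chocolate bars"},
    {"graham crackers", "marshmallows", "chocolate bars", "milk"},
    {"milk", "bread"},
    {"graham crackers", "chocolate bars"},
]

def frequent_itemsets(baskets, size, min_support=0.5):
    """Return itemsets of the given size whose support (the fraction of
    baskets containing all of the itemset's items) meets min_support."""
    counts = Counter()
    for basket in baskets:
        for combo in combinations(sorted(basket), size):
            counts[combo] += 1
    n = len(baskets)
    return {combo: cnt / n for combo, cnt in counts.items() if cnt / n >= min_support}

# Graham crackers and chocolate bars co-occur in 3 of 4 baskets (support 0.75);
# the full s'mores trio appears in 2 of 4 (support 0.5).
pairs = frequent_itemsets(baskets, size=2)
trios = frequent_itemsets(baskets, size=3)
```

A production pipeline would use a proper Apriori implementation (for instance, the one in the mlxtend library), which prunes the search space rather than enumerating every combination.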
Additionally, manufacturers could create products of convenience
based on discovered associations. Continuing with the s'mores example, maybe
some manufacturer should make a s'mores product. In this case, what fires
together should be wired together.
It's hard to imagine any business that doesn't revolve
around some form of "basket". All businesses sell many products and/or services
to many customers. Other examples of "basket analysis" include the analysis of
key words using search engines, sets of shows watched by subscribers, sets of diagnosis
codes for patients, or events that happen together. Interestingly, events that happen
together often constitute a macro event. For example, a soccer game between
vicious rivals and a spike in beer sales could be the right mix of ingredients
for over-crowding at the local emergency room.
Although rather basic, analysis of frequent item sets is the
basis for a simple form of inference. For example, output from "recommenders"
such as "if you like Star Wars and Star Trek, you'll like Battlestar Galactica"
is merely a statistical correlation. Hopefully, we would discover less obvious
correlations that can give us a strategic edge. After all, at one time, the
correlation between smoking and lung cancer or obesity and diabetes was a
surprise to most.
However, collections of frequent item sets can take
us a step closer to deeper and more profound inferences. For example, in our
daily lives, we may have overheard a new friend mentioning he likes ceviche and
ramen. In this example, let's keep in mind that ceviche and ramen do not appear
in any groups we've yet recorded - even though they could both be considered
soups. With a little recursive logical magic, we'd find that since ceviche is
in a group of raw fish dishes and ramen is grouped in Japanese cuisine, it's
probably not such a stretch to recommend having sushi for lunch.
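That ceviche-and-ramen leap can be sketched as a recursive lookup over item groups. The group memberships below are invented purely for illustration:

```python
# Hypothetical group memberships: item or group -> groups it belongs to.
groups = {
    "ceviche": {"raw fish dishes"},
    "ramen": {"japanese cuisine"},
    "sushi": {"raw fish dishes", "japanese cuisine"},
    "raw fish dishes": {"seafood"},
}

def all_groups(item):
    """Recursively collect every group an item belongs to, directly or indirectly."""
    found = set()
    for g in groups.get(item, set()):
        found.add(g)
        found |= all_groups(g)
    return found

def recommend(likes, candidates):
    """Rank candidates by how many groups they share with the liked items."""
    liked_groups = set().union(*(all_groups(i) for i in likes))
    return max(candidates, key=lambda c: len(all_groups(c) & liked_groups))

# Sushi shares groups with both ceviche (raw fish) and ramen (Japanese cuisine).
pick = recommend({"ceviche", "ramen"}, ["sushi", "pizza"])
```

Real-world inference of this kind would draw group memberships from a knowledge graph or the item hierarchy discussed later, not a hard-coded dict.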
Such deeper inferences are at the heart of robust
intelligence that humans are still the best at. The act of inference is
inherently linked to a world of incomplete and imperfect information. It requires
us to make logical leaps and assumptions based on what we currently know
about things we have yet to connect.
Beyond Market Basket Analysis
Market Basket Analysis has been feasible and in use at BI
departments for many years since the item-level data from a shopping cart,
whether at a physical or virtual store, is held in a structured and readily
accessible relational database. Further, OCR (Optical Character Recognition)
technology has been around for decades, and it has been employed in
self-checkout for years. However, it still takes human effort to "supervise"
the input of the item data, whether it's a human cashier or a human customer
scanning the purchases.
But technologies such as Computer Vision (which includes OCR)
have never been as easy to use, inexpensive, and robust as they are today. Today we
can extract sets of items from massive and rapidly expanding volumes of
unstructured data with relative ease.
As we consider the information locked away in the
unstructured data, we'll quickly see how "set-based" our environment really is.
Implementation
Following is an overview of how this all works. I've
uploaded a related sample of Python code and a Jupyter notebook that can be downloaded
from GitHub. Requirements and instructions are provided there.
The sample is centered around Azure Computer Vision, a
service based on a number of pre-trained A.I. models for object detection,
image classification, and OCR that are useful and applicable to a broad range of
people; for example, food and retail items, famous places, famous brands, etc. The
development of these broadly useful pre-trained models is certainly a daunting
task, involving the processing of what must be very many petabytes of data.
Were you to attempt it yourself, you'd end up with a really big Azure bill!
I should mention that for domain-specific objects, there is
Azure Custom Vision, which enables you to train models beyond the
everyday stuff in Azure Computer Vision. The smaller scope of a custom
application means that it's more economically feasible to develop Custom Vision
models.
Figure 2 depicts a high-level pipeline of the souped-up
Computer Vision market basket analysis. The left steps (1 and 2) depict the
input: a camera or video recorder taking photos and videos and storing them in
some cloud storage. The Python program drives the engagement of Azure Computer
Vision components (3) that read the photos, parse out recognized objects, and
store them in a DW (4). Finally, analysts and data scientists consume the
information using visualization tools such as Tableau or PowerBI (5).
Figure 2 - The pipeline from photos or videos to the
analysis of items using BI tools.
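Step 3 of the pipeline amounts to analyzing each photo and flattening the recognized items into rows for the DW. The sketch below stands in for a live API call with a hard-coded dict shaped like a typical Azure Computer Vision tag response; a real pipeline would obtain it from the azure-cognitiveservices-vision-computervision SDK (something like `client.tag_image(photo_url)`) with proper credentials:

```python
# A response shaped like Azure Computer Vision's tag results; in production
# this dict would come from the service rather than being hard-coded.
sample_response = {
    "tags": [
        {"name": "apple", "confidence": 0.98},
        {"name": "grape", "confidence": 0.95},
        {"name": "mango", "confidence": 0.67},
    ]
}

def response_to_rows(photo_id, response, min_confidence=0.7):
    """Flatten one photo's recognized tags into (photo_id, item, confidence)
    rows, dropping low-confidence detections."""
    return [
        (photo_id, tag["name"], tag["confidence"])
        for tag in response["tags"]
        if tag["confidence"] >= min_confidence
    ]

# Each photo becomes a "basket" of rows ready to load into the DW (step 4).
rows = response_to_rows("photo_001.jpg", sample_response)
```

The confidence threshold is a judgment call; set it too low and noisy detections pollute the frequent itemsets, too high and real items go missing.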
For the sake of simplicity, the example consists of just ten
photos of collections of common fruits. In real life, there would be hundreds
to even millions of photos collectively exhibiting thousands of different kinds
of objects. The use case for this simplified scenario would be to monitor
popular, trending compositions of fruit in the gift baskets we sell.
Figure 3 - The ten photos from which we will derive
frequent itemsets (fruits that occur together).
The pipeline's processing of the ten photos
depicted above results in the information shown in Figure 4 below. On the
left are the itemsets identified. "Support" is the percentage of baskets the
itemset appeared in. On the right are a few of the sets represented
graphically. Take the example of the apple, grapefruit, and mango towards the
top-middle of Figure 4. The "2" means that two (out of the ten) photos exhibit this
group of three fruits.
Figure 4 - A sample of the frequent itemsets discovered
by the Apriori algorithm.
The most obvious observations might be that apples are the
most popular single fruit while apples and grapes are the most popular pair. A
more interesting observation would be that apple and mango appear together
more often than the proverbial apples and oranges.
Figure 5 below shows how this data might look in database dimension
tables.
Figure 5 - Dimension tables filled with items recognized
by Azure Computer Vision.
Figure 6 depicts a generic schema for the basket
analysis. I say "generic" because it is meant to accommodate a heterogeneous mix
of items. That is, "items" refers to any sort of item Computer Vision can
recognize, such as fruits, cars, pastries, furniture, or baseball equipment, as
opposed to objects of specific entity types with dedicated tables, such as
products, customers, or equipment, as you would normally find in a DW.
Figure 6 - The Basket Analysis Schema.
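The schema's core can be sketched in SQLite as below. The table names follow the article's item_dimension and basket_dimension, but the column names and the fact-table name are my own illustrative choices, not Figure 6's exact DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Items of any kind; the self-referencing key carries the item hierarchy.
CREATE TABLE item_dimension (
    item_id        INTEGER PRIMARY KEY,
    item_name      TEXT NOT NULL,
    parent_item_id INTEGER REFERENCES item_dimension(item_id)
);
-- Baskets (photos, video shots, carts), also hierarchical.
CREATE TABLE basket_dimension (
    basket_id        INTEGER PRIMARY KEY,
    basket_name      TEXT NOT NULL,
    parent_basket_id INTEGER REFERENCES basket_dimension(basket_id)
);
-- Fact table: which items appeared in which baskets, and how confidently.
CREATE TABLE basket_item_fact (
    basket_id  INTEGER REFERENCES basket_dimension(basket_id),
    item_id    INTEGER REFERENCES item_dimension(item_id),
    confidence REAL
);
""")

# One recognized item in one photo-basket.
conn.execute("INSERT INTO item_dimension VALUES (1, 'mango', NULL)")
conn.execute("INSERT INTO basket_dimension VALUES (1, 'photo_001.jpg', NULL)")
conn.execute("INSERT INTO basket_item_fact VALUES (1, 1, 0.98)")

items_in_basket = conn.execute("""
    SELECT i.item_name
    FROM basket_item_fact f
    JOIN item_dimension i ON i.item_id = f.item_id
    WHERE f.basket_id = 1
""").fetchall()
```

The bridge tables described next would simply add mappings such as (item_id, product_id) alongside this core.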
Thinking back to the s'mores example, there is more to
making a s'mores experience than three food items purchased together: graham
crackers, marshmallows, and chocolate bars. Making it happen involves a
heterogeneous mix of things such as a permit to camp, approval to start a
campfire, fire starter fluid, matches, firewood, long sticks, time off to go
camping ...
To the point just mentioned, this schema is intended to supplement
an existing DW as a loosely-coupled component. The two gray tables on the right
of Figure 6 are examples of "gateways" into your main DW schema. They are
bridge tables that link items to the typical product dimension and baskets to a
typical customer dimension. There could be more than one bridge table for each.
For example, linking "people items" to an employee, contact, or customer table.
Both the item_dimension and basket_dimension tables exhibit
a self-referencing key indicating parent-child relationships. The items and
baskets are hierarchies. For example, in the case of an item hierarchy, mangos
and guavas are each "elemental" (or leaf) items. However, they are both sweet
tropical fruits, which in turn are fruits, which in turn are foods. At times it
might be useful to know that a photo contains a mango specifically, and at other
times we might want to know that it includes any tropical fruit.
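Rolling a leaf item up its hierarchy is just a walk along the self-referencing key. A minimal sketch, with parent links invented to match the mango example:

```python
# Mirrors item_dimension's self-referencing parent key (invented data).
parent_of = {
    "mango": "sweet tropical fruit",
    "guava": "sweet tropical fruit",
    "sweet tropical fruit": "fruit",
    "fruit": "food",
}

def ancestors(item):
    """Walk the parent links from a leaf item up to the root."""
    chain = []
    while item in parent_of:
        item = parent_of[item]
        chain.append(item)
    return chain

def is_a(item, category):
    """True if the item is the category or rolls up to it."""
    return item == category or category in ancestors(item)

lineage = ancestors("mango")
```

In SQL, the same rollup is typically done with a recursive common table expression over the dimension table.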
An example of a basket hierarchy is the set of single frames within
a scene of a video. Perhaps not surprisingly, videos are handled as a series of
photos. Azure Video Analyzer breaks down a video into scenes (each a coherent
set of shots) and shots. The chronological order of the scenes and shots is
maintained. Therefore, we can tell when things appear, when they leave, and when
things appear together. So if a picture is worth a thousand words, a video can
be worth thousands of photos.
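Because shot order is preserved, the per-shot item sets can answer when something enters or leaves the frame. A toy sketch with invented shot data:

```python
# Item sets per shot, in chronological order (invented data).
shots = [
    {"ball", "player"},
    {"ball", "player", "referee"},
    {"player", "referee"},
]

def first_appearance(item, shots):
    """Index of the first shot containing the item, or None if absent."""
    for i, shot in enumerate(shots):
        if item in shot:
            return i
    return None

def last_appearance(item, shots):
    """Index of the last shot containing the item, or None if absent."""
    for i in range(len(shots) - 1, -1, -1):
        if item in shots[i]:
            return i
    return None

# The referee enters in shot 1; the ball is gone after shot 1.
```

In the schema above, the same questions become queries ordering baskets by their position within the parent scene.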
Conclusion
The incorporation
of Azure Computer Vision is the foundation for a significant enhancement of
current BI solutions. It enables extracting sets of objects from photos
and videos, grouping them into baskets, and storing them in a DW schema for
further analysis. By leveraging the power of machine learning and artificial
intelligence, this process allows for an automated approach to an expansion of
basket analysis. Using the Apriori association algorithm, businesses can gain
valuable insights into customer behavior and preferences, ultimately driving
more informed decision-making and improving overall business performance. With
the continued advancements in technology, the possibilities for leveraging Computer
Vision and data analysis are endless, and we can expect to see more innovative
solutions emerge in the years to come.
##