Basket Analysis with Computer Vision
By Eugene Asahara, Evangelist - AI & Data Mesh, Kyvos Insights 

Computer Vision is a common component of a suite of A.I. tools that can recognize objects within unstructured digital assets such as photos and videos. It can also recognize properties of the objects, such as color and their placement in the photo, and even compose a brief description of the photo.

The recognized objects include more than what we'd normally think of as "things," such as cars, fruits, or furniture. They also include the place depicted in the photo (indoors, outdoors, or even a specific place), faces, famous logos, and any written text (background graffiti, text from signs or t-shirts, text on newspapers).

Regarding text, Computer Vision can recognize many different languages. It can even read handwriting - from whiteboards, notes, and paper expense receipts - albeit more prone to error. From there, a language service can extract key phrases that are treated like any other object, and even provide a contextual description.

Now that it no longer costs at least $20 (in 1990s dollars) to purchase and process a 24-shot roll of film, we've become photo and video crazy. That goes beyond our personal lives of vacation photos and TikToks into our enterprise lives at work. The volume of photos and videos varies from business to business, but the rise in remote work, growing social media presence, and automation means that our libraries of photos and videos are growing rapidly across the board. And since photos and videos are much larger than other data, they constitute much more than their fair share of the proverbial "exponential data explosion".

There are webcams all over the place recording everything for whatever reason. There are recorded remote meetings such as internal brown bags, sales demos, external tutorials, and video support calls. And there are all those photos we text our co-workers - for example, screenshots depicting error messages we're getting, whiteboards from a meeting, or pictures of something we're futilely trying to explain through words.

In this article, I describe the integration of Computer Vision capabilities into current Business Intelligence (BI) and data warehouse (DW) implementations. Although I'd ideally like to be platform-agnostic, I mostly reference components of Azure Cognitive Services in this article.

Association Rules

The recognition of objects trapped in the myriad photos and videos taken by all of us offers an opportunity to expand our awareness of the co-occurrence of things across all domains of our businesses. The notion that "what fires together wires together" is fundamental to intelligence, whether in the course of living our lives as human beings or on the job. The idea is that associating things that happen together provides valuable insights towards optimizing our situation.

Data scientists have been mining insights with association rules for many years. Most commonly, this takes the form of "market basket analysis" using the Apriori association algorithm. Market Basket Analysis is a data science technique that identifies items frequently purchased together - literally, in shopping baskets/carts.

Based on the discovered frequent itemsets (products purchased together), a retailer can optimize product placement on the shelves. For example, someone at some time would notice that people buy graham crackers, marshmallows, and chocolate bars together. Maybe we should place the three products next to each other.


Figure 1 - Market Basket Analysis is one of the prime data science techniques.
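To make the mechanics concrete, here is a minimal, unpruned sketch of Apriori-style frequent-itemset mining in Python. The baskets are hypothetical; a production pipeline would more likely use a library such as mlxtend.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return frequent itemsets (as frozensets) mapped to their support."""
    n = len(baskets)
    # Start with candidate 1-itemsets.
    items = {item for basket in baskets for item in basket}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= set(b)) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(survivors)
        # Build (k+1)-item candidates by joining surviving k-itemsets.
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

# Hypothetical shopping baskets.
baskets = [
    {"graham crackers", "marshmallows", "chocolate"},
    {"graham crackers", "marshmallows", "chocolate", "milk"},
    {"milk", "bread"},
    {"graham crackers", "chocolate"},
]

freq = apriori(baskets, min_support=0.5)
print(freq[frozenset({"graham crackers", "marshmallows", "chocolate"})])  # 0.5
```

The s'mores trio surfaces with a support of 0.5: it appears in two of the four baskets.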

Additionally, manufacturers could create products of convenience based on discovered associations. Continuing with the S'mores example, maybe some manufacturer should make a S'mores product. In this case, what fires together should be wired together.

It's hard to imagine any business that doesn't revolve around some form of "basket". All businesses sell many products and/or services to many customers. Other examples of "basket analysis" include the analysis of key words using search engines, sets of shows watched by subscribers, sets of diagnosis codes for patients, or events that happen together. Interestingly, events that happen together often constitute a macro event. For example, a soccer game between bitter rivals and a spike in beer sales could be the right mix of ingredients for over-crowding at the local emergency room.

Although rather basic, analysis of frequent itemsets is the basis for a simple form of inference. For example, output from "recommenders" - such as "if you like Star Wars and Star Trek, you'll like Battlestar Galactica" - is merely statistical correlation. Hopefully, we would discover less obvious correlations that can give us a strategic edge. After all, at one time, the correlation between smoking and lung cancer or obesity and diabetes was a surprise to most.

However, collections of frequent item sets can take us a step closer to deeper and more profound inferences. For example, in our daily lives, we may have overheard a new friend mentioning he likes ceviche and ramen. In this example, let's keep in mind that ceviche and ramen do not appear in any groups we've yet recorded - even though they could both be considered soups. With a little recursive logical magic, we'd find that since ceviche is in a group of raw fish dishes and ramen is grouped in Japanese cuisine, it's probably not such a stretch to recommend having sushi for lunch.
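That recursive leap can be sketched as a tiny scoring function over group memberships. The dictionary below is hypothetical; in the schema described later in this article, these memberships would come from an item hierarchy rather than a hard-coded dict.

```python
# Hypothetical item-to-groups map.
groups = {
    "ceviche": {"raw fish dishes"},
    "sushi": {"raw fish dishes", "japanese cuisine"},
    "ramen": {"japanese cuisine"},
    "tacos": {"mexican cuisine"},
}

def recommend(liked, candidates):
    """Score each candidate by how many liked items share a group with it."""
    scores = {c: sum(1 for item in liked if groups[item] & groups[c])
              for c in candidates}
    return max(scores, key=scores.get)

print(recommend({"ceviche", "ramen"}, {"sushi", "tacos"}))  # sushi
```

Sushi wins because it shares a group with both liked items (raw fish with ceviche, Japanese cuisine with ramen), even though the three dishes never appeared together in any recorded basket.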

Such deeper inferences are at the heart of robust intelligence that humans are still the best at. The act of inference is inherently linked to a world of incomplete and imperfect information. It requires us to make logical leaps and assumptions based on what we currently know about things we have yet to connect.

Beyond Market Basket Analysis

Market Basket Analysis has been feasible and in use at BI departments for many years since the item-level data from a shopping cart, whether at a physical or virtual store, is held in a structured and readily accessible relational database. Further, OCR (Optical Character Recognition) technology has been around for decades and has been employed in self-checkout for years. However, it still takes human effort to "supervise" the input of the item data, whether it's a human cashier or a human customer scanning the purchases.

But technologies such as Computer Vision (which includes OCR) have never been as easy to use, inexpensive, and robust as they are today. We can now extract sets of items from massive and rapidly expanding volumes of unstructured data with relative ease.

As we consider the information locked away in the unstructured data, we'll quickly see how "set-based" our environment really is.


Following is an overview of how this all works. I've uploaded a related sample of Python code and a Jupyter notebook that can be downloaded from GitHub. Requirements and instructions are provided there.

The sample is centered around Azure Computer Vision, a service based on a number of pre-trained A.I. models for object detection, image classification, and OCR. The models cover objects useful and applicable to a broad range of people - for example, food and retail items, famous places, and famous brands. The development of these broadly useful pre-trained models is certainly a daunting task, involving the processing of what must be many petabytes of data. Attempt that yourself and you'll end up with a really big Azure bill!

I should mention that for domain-specific objects, there is Azure Custom Vision, which enables you to train models beyond the everyday stuff in Azure Computer Vision. The smaller scope of a custom application means that it's more economically feasible to develop Custom Vision models.

Figure 2 depicts a high-level pipeline of the souped-up Computer Vision market basket analysis. The left steps (1 and 2) depict the input - a camera or video recorder taking photos and videos and storing them in some cloud storage. A Python program drives the Azure Computer Vision components (3) that read the photos, parse out recognized objects, and store them in a DW (4). Finally, analysts and data scientists consume the information using visualization tools such as Tableau or Power BI (5).


Figure 2 - The pipeline from photos or videos to the analysis of items using BI tools.
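Step 3 of the pipeline boils down to turning each photo's analysis result into a basket of items. The snippet below operates on a dict shaped like the "tags" portion of an Azure Computer Vision analyze-image response; the sample values are made up for illustration, and in the real pipeline this structure would come back from the service.

```python
import json

# A response shaped like the "tags" portion of an Azure Computer Vision
# analyze-image result. Treat this sample as illustrative, not real output.
response = json.loads("""
{
  "tags": [
    {"name": "apple",  "confidence": 0.98},
    {"name": "grape",  "confidence": 0.91},
    {"name": "table",  "confidence": 0.40}
  ]
}
""")

def to_basket(analysis, min_confidence=0.5):
    """Turn one photo's recognized tags into a basket (a set of items)."""
    return {t["name"] for t in analysis["tags"] if t["confidence"] >= min_confidence}

print(sorted(to_basket(response)))  # ['apple', 'grape']
```

Filtering on a confidence threshold keeps incidental background objects (here, the table the fruit sits on) out of the basket before it is written to the DW.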

For the sake of simplicity, the example consists of just ten photos of collections of common fruits. In real life, there would be hundreds or even millions of photos collectively exhibiting thousands of different kinds of objects. The use case for this simplified scenario is to monitor which compositions of fruit seem popular and trending in the gift baskets we sell.


Figure 3 - The ten photos from which we will derive frequent itemsets (fruits that occur together).


Processing the ten photos depicted above results in the information shown in Figure 4 below. On the left are the itemsets identified; "Support" is the percentage of baskets in which the itemset appeared. On the right are a few of the sets represented graphically. Take the example of the apple, grapefruit, and mango towards the top-middle of Figure 4. The "2" means that two (of the ten) photos exhibit this group of three fruits.


Figure 4 - A sample of the frequent itemsets discovered by the Apriori algorithm.

The most obvious observations might be that apples are the most popular single fruit while apples and grapes are the most popular pair. A more interesting observation is that apple and mango appear together more often than the proverbial apples and oranges.
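Observations like these fall straight out of support counts. The baskets below are hypothetical stand-ins for the ten photos (the actual itemsets aren't reproduced here):

```python
from itertools import combinations
from collections import Counter

# Ten hypothetical fruit baskets standing in for the photos in Figure 3.
baskets = [
    {"apple", "grape"}, {"apple", "grape", "mango"}, {"apple"},
    {"apple", "grapefruit", "mango"}, {"apple", "grapefruit", "mango"},
    {"apple", "orange"}, {"grape", "orange"}, {"apple", "grape"},
    {"banana"}, {"apple", "grape", "banana"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in the itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

# Count every pair of fruits that co-occurs in a basket.
pair_counts = Counter(
    frozenset(p) for b in baskets for p in combinations(sorted(b), 2)
)

print(support({"apple", "grapefruit", "mango"}, baskets))  # 0.2
print(sorted(pair_counts.most_common(1)[0][0]))            # ['apple', 'grape']
```

With this toy data, the apple/grapefruit/mango trio appears in two of the ten baskets (support 0.2), and apple with grape is the most frequent pair.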

Figure 5 below shows how this data might look in database dimension tables.


Figure 5 - Dimension tables filled with items recognized by Azure Computer Vision.

Figure 6 depicts a generic schema for the basket analysis. I say "generic" because it is meant to accommodate a heterogeneous mix of items - "items" here refers to anything Computer Vision can recognize, such as fruits, cars, pastries, furniture, or baseball equipment, as opposed to objects of specific entity types with dedicated tables (for example, products, customers, or equipment) as you would normally find in a DW.


Figure 6 - The Basket Analysis Schema.

Thinking back to the s'mores example, there is more to making a s'mores experience than three food items purchased together: graham crackers, marshmallows, and chocolate bars. Making it happen involves a heterogeneous mix of things such as a permit to camp, approval to start a campfire, fire starter fluid, matches, firewood, long sticks, time off to go camping ...

To the point just mentioned, this schema is intended to supplement an existing DW as a loosely-coupled component. The two gray tables on the right of Figure 6 are examples of "gateways" into your main DW schema. They are bridge tables that link items to the typical product dimension and baskets to a typical customer dimension. There could be more than one bridge table for each. For example, linking "people items" to an employee, contact, or customer table.

Both the item_dimension and basket_dimension tables exhibit a self-referencing key indicating parent-child relationships. The items and baskets are hierarchies. For example, in the case of an item hierarchy, mangos and guavas are each "elemental" (or leaf) items. However, they are both sweet tropical fruits, which in turn are fruits, which in turn are foods. At times it might be useful to know that a photo contains a mango specifically, and at other times we might want to know that it includes any tropical fruit.
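The parent-child rollup can be sketched with a simple parent map (hypothetical values mirroring the self-referencing key in item_dimension):

```python
# Each item points at its parent; the root ("foods") has no parent.
parent = {
    "mango": "sweet tropical fruits",
    "guava": "sweet tropical fruits",
    "sweet tropical fruits": "fruits",
    "fruits": "foods",
    "foods": None,
}

def ancestors(item):
    """Walk the parent chain from an item up to the root."""
    chain = []
    while item is not None:
        chain.append(item)
        item = parent.get(item)
    return chain

print(ancestors("mango"))  # ['mango', 'sweet tropical fruits', 'fruits', 'foods']
```

Asking whether a photo contains any tropical fruit then reduces to checking whether "sweet tropical fruits" appears in an item's ancestor chain. In the DW itself, the same rollup would typically be done with a recursive query over the self-referencing key.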

An example of a basket hierarchy is the set of frames within a scene of a video. Perhaps not surprisingly, videos are handled as a series of photos. Azure Video Analyzer breaks down a video into scenes (a coherent set of shots) and shots (a contiguous run of frames). The chronological order of the scenes and shots is maintained. Therefore, we can tell when things appear, when they leave, and when things appear together. So if a picture is worth a thousand words, a video can be worth thousands of photos.


The incorporation of Azure Computer Vision is the foundation for a significant enhancement of current BI solutions. It enables extracting sets of objects from photos and videos, grouping them into baskets, and storing them in a DW schema for further analysis. By leveraging the power of machine learning and artificial intelligence, this process enables an automated expansion of basket analysis. Using the Apriori association algorithm, businesses can gain valuable insights into customer behavior and preferences, ultimately driving more informed decision-making and improving overall business performance. With continued advancements in technology, the possibilities for leveraging Computer Vision and data analysis keep growing, and we can expect more innovative solutions to emerge in the years to come.


Published Friday, March 17, 2023 7:29 AM by David Marshall