{"id":770,"date":"2022-08-28T11:57:35","date_gmt":"2022-08-28T11:57:35","guid":{"rendered":"https:\/\/zettaflow.tech\/blog\/?p=770"},"modified":"2025-07-13T19:49:47","modified_gmt":"2025-07-13T19:49:47","slug":"how-to-build-a-market-basket-analysis-dashboard-with-sql","status":"publish","type":"post","link":"https:\/\/tapegraph.io\/blog\/how-to-build-a-market-basket-analysis-dashboard-with-sql\/","title":{"rendered":"How to Build a Market Basket Analysis Dashboard with SQL"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-Market-Basket-Analysis-Dashboard-1024x576.png\" alt=\"Zettaflow blog. Figure 1. Market Basket Analysis Dashboard\" class=\"wp-image-771\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-Market-Basket-Analysis-Dashboard-1024x576.png 1024w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-Market-Basket-Analysis-Dashboard-300x169.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-Market-Basket-Analysis-Dashboard-768x432.png 768w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-Market-Basket-Analysis-Dashboard-1536x864.png 1536w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-Market-Basket-Analysis-Dashboard.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 1. Market Basket Analysis Dashboard. (<a href=\"https:\/\/tapegraph.io\/share\/dashboard\/afdbdac6ca9f\/bugNg5fn9eLDrV6ClRSO38AXaJKjzgtEP0gQaakXlPBKCSLV\" target=\"_blank\" rel=\"noreferrer noopener\">Follow the link for the actual dashboard<\/a>).<\/em><\/figcaption><\/figure>\n\n\n\n<!--more-->\n\n\n\n<p>This article is about market basket analysis, how to perform it using SQL, and how to present the result in the form of a real-time dashboard.&nbsp;<\/p>\n\n\n\n<p>For the purpose of the article, we use a retail dataset of a global superstore: <a href=\"https:\/\/www.kaggle.com\/datasets\/rohitsahoo\/sales-forecasting\" target=\"_blank\" rel=\"noopener\">Superstore Sales<\/a>. This is. We will work with data for 2014.&nbsp;<\/p>\n\n\n\n<p>If you find this article useful, please give it a boost by sharing it on your social media.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Index:<\/h2>\n\n\n\n<p><strong><a href=\"#application_of_mba\">Application of Market Basket Analysis<\/a><\/strong><\/p>\n\n\n\n<p><strong><a href=\"#mba_dashboard\">Market Basket Analysis Dashboard<\/a><\/strong><\/p>\n\n\n\n<p><a href=\"#Customer_Characteristics\">Customer Characteristics<\/a><\/p>\n\n\n\n<p><a href=\"#Order_Characteristics\">Order Characteristics<\/a><\/p>\n\n\n\n<p><a href=\"#Item_Popularity\">Item Popularity<\/a><\/p>\n\n\n\n<p><a href=\"#store_entity\">Store<\/a><\/p>\n\n\n\n<p><a href=\"#time_entity\">Time<\/a><\/p>\n\n\n\n<p><strong><a href=\"#association_analysis\">Association Analysis<\/a><\/strong><\/p>\n\n\n\n<p><a href=\"#support_measure\">Support<\/a><\/p>\n\n\n\n<p><a href=\"#confidence_measure\">Confidence<\/a><\/p>\n\n\n\n<p><a href=\"#lift_measure\">Lift<\/a><\/p>\n\n\n\n<p><a href=\"#Good_and_bad_findings\">Good and bad findings<\/a><\/p>\n\n\n\n<p><strong><a href=\"#Conclusion_MBA\">Conclusion<\/a><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"application_of_mba\">Application of Market Basket Analysis<\/h2>\n\n\n\n<p>Market basket analysis plays a crucial role in the analysis of retail businesses. The contents of the shopping carts of the customers may provide valuable information and help design promotions, ensure better stock management, identify the drivers of the purchase, build cross-sale models, and boost sales.&nbsp;<\/p>\n\n\n\n<p>The most commonly used technique for market basket analysis is association analysis, which generates association rules.&nbsp;<\/p>\n\n\n\n<p>Although market basket analysis and association rules originate from the analysis of retail transactions they may be applied to other business domains. For instance:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Electronic medical record (EMR) systems may suggest to the physicians the possible need for a back MRI for a patient who has had this procedure in the past; or remind the physician when prescribing antibiotics also to prescribe probiotics.<\/em><\/li>\n\n\n\n<li><em>Which other parts of a website are explored by the visitors that came from a particular landing page?<\/em><\/li>\n\n\n\n<li><em>If a passenger entered the Mall of the Emirates metro station and exited the same station two hours later paying the minimum fare, was there system abuse?<\/em><\/li>\n<\/ul>\n\n\n\n<p>Market basket analysis should not be limited to the application of association analysis.&nbsp;<\/p>\n\n\n\n<p>It should also not be limited to the <strong>items<\/strong> in the shopping cart.&nbsp;<\/p>\n\n\n\n<p>Since e-commerce with its user registration, as well as loyalty cards implemented by retail stores, allow customer identification and tracking of his or her purchases over time, the <strong>customer <\/strong>characteristics must be definitely considered.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Why does a particular customer always buy oil filters but never buy engine oil?<\/em><\/li>\n<\/ul>\n\n\n\n<p>And you should not miss the <strong>store <\/strong>entity.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Why is this beautiful reproduction of the Sagrada Familia a bestseller in Barcelona&#8217;s Eixample, but not a good seller in El Poblenou?<\/em><\/li>\n<\/ul>\n\n\n\n<p>However, the most important entity to watch is <strong>the order <\/strong>or <strong>the transaction<\/strong>. It includes the following characteristics in our dataset: order_id, customer_id, product_id, order_date, ship_mode, shipping_cost, order_priority, ship_date, sales, discount, and profit.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"998\" height=\"682\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Four-entities-of-Market-Basket-Analysis.png\" alt=\"Zettaflow blog. Figure 2. Four entities of Market Basket Analysis\" class=\"wp-image-775\" style=\"width:437px;height:298px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Four-entities-of-Market-Basket-Analysis.png 998w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Four-entities-of-Market-Basket-Analysis-300x205.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Four-entities-of-Market-Basket-Analysis-768x525.png 768w\" sizes=\"auto, (max-width: 998px) 100vw, 998px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 2. Four entities of Market Basket Analysis connected to each other via the order<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"mba_dashboard\">Market Basket Analysis Dashboard<\/h2>\n\n\n\n<p>Market basket analysis dashboard may be used to make decisions about product promotions, visual merchandising, store segmentation, customer targeting, and more.&nbsp;<\/p>\n\n\n\n<p>To build the market basket analysis dashboard we use the <a href=\"https:\/\/tapegraph.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Tepegraph<\/a> platform. We connect it to the dataset and use the platform to fetch the data with SQL queries and visualize the result.<\/p>\n\n\n\n<p>We start with some basic measures that do not require association rules.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"Customer_Characteristics\">Customer Characteristics:<\/h3>\n\n\n\n<p><strong>Proportion <\/strong>of <strong>customers <\/strong>and <strong>average purchase size <\/strong>of the customers who purchase the <strong>most popular products <\/strong>(100 best-selling items)<strong>.&nbsp;<\/strong><\/p>\n\n\n\n<p>To find out the <strong>proportion <\/strong>we use the following query:&nbsp;<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS \n(\n    SELECT \n        * \n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nbest_products AS \n(\n    SELECT\n        product_id,\n        SUM(quantity) AS quantity\n    FROM \n        orders\n    GROUP BY \n        product_id\n    ORDER BY \n        quantity DESC\n    LIMIT 100\n),\ncustomers AS \n(\n    SELECT  \n        customer_id\n    FROM \n        orders\n    GROUP BY \n        customer_id\n),\nselected_customers AS \n(\n    SELECT  \n        customer_id\n    FROM \n        orders\n    WHERE \n        product_id IN (SELECT product_id FROM best_products)\n    GROUP BY \n        customer_id\n),\nresult AS \n(\n    SELECT\n        COUNT(customer_id) AS yes,\n        ROUND((SELECT COUNT(customer_id) FROM customers),0) AS total,\n        ROUND(((SELECT COUNT(customer_id) FROM customers) - COUNT(customer_id)),0) AS no\n        FROM selected_customers\n),\npie AS \n(\n    SELECT\n        yes AS values,\n        'Yes' as labels\n    FROM result\n    UNION\n    SELECT  \n        no AS values,\n        'No' as labels\n    FROM result\n)\nSELECT * FROM pie<\/pre><\/div>\n\n\n\n<p>For the <strong>average purchase size<\/strong>:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS \n(\n    SELECT * \n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nbest_products AS (\n    SELECT\n        product_id,\n        SUM(quantity) AS quantity\n    FROM \n        orders\n    GROUP BY \n        product_id\n    ORDER BY \n        quantity DESC\n    LIMIT 100\n),\ncustomers AS \n(\n    SELECT  \n        customer_id,\n        SUM(sales) AS sales\n    FROM \n        orders\n    GROUP BY \n        customer_id\n),\nselected_customers AS \n(\n    SELECT  \n        customer_id,\n        SUM(sales) AS sales\n    FROM \n        orders\n    WHERE \n        product_id IN (SELECT product_id FROM best_products)\n    GROUP BY \n        customer_id\n),\nresult AS \n(\n    SELECT\n        ROUND(AVG(sales),2) AS avg_order_size \n    FROM \n        selected_customers\n)\nSELECT * FROM result<\/pre><\/div>\n\n\n\n<p><strong>Proportion <\/strong>of customers and <strong>average purchase size <\/strong>of customers who purchase the <strong>least popular products <\/strong>(100 worst selling items).<\/p>\n\n\n\n<p>To find out the <strong>proportion <\/strong>we use this query:&nbsp;<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS \n(\n    SELECT * \n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nworst_products AS \n(\n    SELECT\n        product_id,\n        SUM(quantity) AS quantity\n    FROM \n        orders\n    GROUP BY \n        product_id\n    ORDER BY \n        quantity\n    LIMIT 100\n),\ncustomers AS \n(\n    SELECT  \n        customer_id\n    FROM \n        orders\n    GROUP BY \n    customer_id\n),\nselected_customers AS \n(\n    SELECT  \n        customer_id\n    FROM \n        orders\n    WHERE \n        product_id IN (SELECT product_id FROM worst_products)\n    GROUP BY \n        customer_id\n),\nresult AS \n(\n    SELECT\n        COUNT(customer_id) AS yes,\n        ROUND((SELECT COUNT(customer_id) FROM customers),0) AS total,\n        ROUND(((SELECT COUNT(customer_id) FROM customers) - COUNT(customer_id)),0) AS no\n        FROM selected_customers\n),\npie AS \n(\n    SELECT\n        yes AS values,\n        'Yes' as labels\n    FROM \n        result\n    UNION\n    SELECT  \n        no AS values,\n        'No' as labels\n    FROM \n        result\n)\nSELECT * FROM pie<\/pre><\/div>\n\n\n\n<p>For the <strong>average purchase size<\/strong>:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS \n(\n    SELECT * \n        FROM superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nworst_products AS \n(\n    SELECT\n        product_id,\n        SUM(quantity) AS quantity\n    FROM \n        orders\n    GROUP BY \n        product_id\n    ORDER BY \n        quantity\n    LIMIT 100\n),\ncustomers AS \n(\n    SELECT  \n        customer_id,\n        SUM(sales) AS sales\n    FROM \n        orders\n    GROUP BY \n        customer_id\n),\nselected_customers AS \n(\n    SELECT  \n        customer_id,\n        SUM(sales) AS sales\n    FROM \n        orders\n    WHERE \n        product_id IN (SELECT product_id FROM worst_products)\n    GROUP BY \n        customer_id\n),\nresult AS \n(\n    SELECT\n        ROUND(AVG(sales),2) AS avg_order_size \n    FROM \n        selected_customers\n)\nSELECT * FROM result<\/pre><\/div>\n\n\n\n<p>To visualize the proportions I used the Pie Chart, and for the average order size &#8211; just a Scaled-up Number. As a result, I got such a visual block:&nbsp;<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"445\" height=\"571\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Customers-that-buy-100-best-and-100-worst-selling-items.png\" alt=\"Zettaflow blog. Figure 3. Customers that buy 100 best and 100 worst selling items\" class=\"wp-image-786\" style=\"width:337px;height:432px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Customers-that-buy-100-best-and-100-worst-selling-items.png 445w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog-Customers-that-buy-100-best-and-100-worst-selling-items-234x300.png 234w\" sizes=\"auto, (max-width: 445px) 100vw, 445px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 3. Customers that buy 100 best-selling and 100 worst-selling products.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<ul class=\"wp-block-list\">\n<li><em>Would replacing some of those worst-selling items with better-selling ones threaten the customer relationship?<\/em><\/li>\n<\/ul>\n\n\n\n<p>The <strong>breadth<\/strong> of the customer<strong> <\/strong>relationship (the number of unique items ever purchased) <strong>by<\/strong> the <strong>depth<\/strong> of the relationship (the number of orders) for customers who purchased more than one item:&nbsp;<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH t AS \n(\n    SELECT\n        customer_id,\n        COUNT(DISTINCT product_id) AS unique_items,\n        COUNT(DISTINCT order_id) AS unique_orders\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n    GROUP BY \n        customer_id\n    ORDER BY \n        customer_id\n)\nSELECT\n    customer_id,\n    unique_items,\n    unique_orders\nFROM t\nWHERE \n    unique_items > 1\n<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"968\" height=\"334\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-4.-Customer-Relationship..png\" alt=\"Zettaflow blog. Figure 4. Customer Relationship.\" class=\"wp-image-790\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-4.-Customer-Relationship..png 968w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-4.-Customer-Relationship.-300x104.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-4.-Customer-Relationship.-768x265.png 768w\" sizes=\"auto, (max-width: 968px) 100vw, 968px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Customer Relationship.<\/figcaption><\/figure>\n\n\n\n<p>The result of the query is visualized using a scatter. The concentration of the data points represents the number of <strong>customers <\/strong>who have that particular breadth and depth in their relationship. The largest concentration is on the lower left, and the smallest is on the higher up to the right.&nbsp;<\/p>\n\n\n\n<p>Quite a large number of customers in our case are buying little and rear, although better customers tend to purchase a greater diversity of goods and more often.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Who are those customers on the higher right?&nbsp;<\/em><\/li>\n\n\n\n<li><em>What products do they buy?<\/em><\/li>\n\n\n\n<li><em>How can we get more customers of this kind?<\/em><\/li>\n<\/ul>\n\n\n\n<p><strong>Orders per<\/strong> <strong>customer <\/strong>distribution 1Y:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT \n    customer_name, \n    COUNT(order_id) AS TotalOrders\nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date)= 2014\nGROUP BY \n    customer_name<\/pre><\/div>\n\n\n\n<p>The <strong>average <\/strong>number of <strong>orders per<\/strong> <strong>customer <\/strong>per month for the last month:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT \n    COUNT(DISTINCT order_id) * 1.0 \/ \n    NULLIF(COUNT(DISTINCT customer_id), 0) AS orders_per_customer \nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date)= 2014\n    AND EXTRACT(MONTH FROM order_date) = 12\n<\/pre><\/div>\n\n\n\n<p><strong>Change <\/strong>in the average number of <strong>orders per<\/strong> <strong>customer <\/strong>per month over the last year:&nbsp;<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH t AS \n(\n    SELECT \n        (\n            date_part('year', order_date) * 12 + \n            date_part('month', order_date) \n        ) \n        AS year_month,\n        order_id,\n        customer_id\n    FROM\n        superstore_orders\n)\nSELECT \n    year_month, COUNT(DISTINCT order_id) * 1.0 \/ \n    NULLIF(COUNT(DISTINCT customer_id), 0) AS orders_per_customer\nFROM\n    t\nGROUP BY\n    year_month\nORDER BY\n    year_month DESC\nLIMIT 12<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"451\" height=\"421\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-5.-Orders-per-customer..png\" alt=\"Zettaflow blog. Figure 5. Orders per customer.\" class=\"wp-image-796\" style=\"width:341px;height:318px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-5.-Orders-per-customer..png 451w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-5.-Orders-per-customer.-300x280.png 300w\" sizes=\"auto, (max-width: 451px) 100vw, 451px\" \/><figcaption class=\"wp-element-caption\">Figure 5. Orders per customer.<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"Order_Characteristics\">Order Characteristics:<\/h2>\n\n\n\n<p>Unique <strong>items per order <\/strong>distribution 1Y:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT \n    order_id, \n    COUNT(DISTINCT product_id) AS uniqueitems \nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date)= 2014\nGROUP BY \n    order_id<\/pre><\/div>\n\n\n\n<p>The <strong>average <\/strong>number of unique <strong>items per<\/strong> <strong>order <\/strong>per month for the last month:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH t AS \n(\n    SELECT\n        order_id,\n        COUNT(DISTINCT product_id) as product_count,\n        date_part('year', order_date) * 12 + \n        date_part('month', order_date) as year_month\n    FROM\n        superstore_orders\n    GROUP BY\n        order_id, year_month    \n)\nSELECT\n    year_month,\n    COUNT(order_id) as order_count,\n    SUM(product_count) AS product_count,\n    SUM(product_count) \/ COUNT(order_id) as products_per_order\nFROM\n    t    \nWHERE\n    year_month = (SELECT MAX(year_month) FROM t)\nGROUP BY\n    year_month<\/pre><\/div>\n\n\n\n<p><strong>Change <\/strong>in the average number of unique <strong>items per order<\/strong> over the last year:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH t AS (\n    SELECT\n        order_id,\n        COUNT(DISTINCT product_id) as product_count,\n        date_part('year', order_date) * 12 + \n        date_part('month', order_date) as year_month\n    FROM\n        superstore_orders\n    GROUP BY\n        order_id, year_month    \n)\nSELECT\n    year_month,\n    COUNT(order_id) as order_count,\n    SUM(product_count) AS product_count,\n    SUM(product_count) \/ COUNT(order_id) as products_per_order\nFROM\n    t    \nGROUP BY\n    year_month    \norder by\n    year_month DESC\nLIMIT 12<\/pre><\/div>\n\n\n\n<p><strong>Average order size <\/strong>for the last month:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT \n    AVG(sales)\nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date)= 2014\n    AND EXTRACT(MONTH FROM order_date) = 12<\/pre><\/div>\n\n\n\n<p><strong>Changes <\/strong>in average <strong>order size <\/strong>over the last year:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH t AS \n(\n    SELECT \n        (\n            date_part('year', order_date) * 12 + \n            date_part('month', order_date) \n        ) \n        AS year_month,\n        sales\n    FROM\n        superstore_orders\n)\nSELECT \n    year_month, AVG(sales) as avg_sales\nFROM\n    t\nGROUP BY\n    year_month\nORDER BY\n    year_month DESC\nLIMIT 12<\/pre><\/div>\n\n\n\n<p><strong>Average order size by market <\/strong>1Y (you may opt for any other important characteristics):<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT \n    market,\n    AVG(sales)\nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date) = 2014 \nGROUP BY \n    market<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"964\" height=\"424\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-6.-Order-Characteristics..png\" alt=\"Zettaflow blog. Figure 6. Order Characteristics.\" class=\"wp-image-804\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-6.-Order-Characteristics..png 964w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-6.-Order-Characteristics.-300x132.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-6.-Order-Characteristics.-768x338.png 768w\" sizes=\"auto, (max-width: 964px) 100vw, 964px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 6. Order Characteristics.<\/em><\/figcaption><\/figure>\n\n\n\n<p>The above visualizations give some insight into the business. We may see a great number of customers buy only one product per order.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>How to improve cross-selling in the sales process?&nbsp;<\/em><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"Item_Popularity\">Item Popularity:&nbsp;<\/h2>\n\n\n\n<p>It can be helpful to find out the <strong>most common item <\/strong>found in a one-item order. (As we don\u2019t have such orders in the dataset, we go for two-item orders).<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH orders AS \n(\n    SELECT * \n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nselected_orders AS \n(\n    SELECT \n        order_id, \n        SUM(quantity) AS total_quantity \n    FROM \n        orders \n    GROUP BY \n        order_id\n    HAVING \n        SUM(quantity) = 2\n),\nselected_products AS \n(\n    SELECT \n        product_name, \n        SUM(quantity) AS quantity\n    FROM \n        orders\n    WHERE order_id IN (SELECT order_id FROM selected_orders)\n    GROUP BY \n        product_name \n)\nSELECT * FROM selected_products ORDER BY quantity DESC\nLIMIT 3<\/pre><\/div>\n\n\n\n<p>What is the <strong>most common item <\/strong>found in a multi-item order?&nbsp;<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH orders AS \n(\n    SELECT * \n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nselected_orders AS \n(\n    SELECT \n        order_id, \n        SUM(quantity) AS total_quantity \n    FROM \n        orders \n    GROUP BY \n        order_id\n    HAVING \n        SUM(quantity) > 2\n),\nselected_products AS \n(\n    SELECT \n        product_name, \n        SUM(quantity) AS quantity\n    FROM \n        orders\n    WHERE \n        order_id IN (SELECT order_id FROM selected_orders)\n    GROUP BY \n        product_name \n)\nSELECT * FROM selected_products ORDER BY quantity DESC\nLIMIT 3\n<\/pre><\/div>\n\n\n\n<p>And the <strong>most common item <\/strong>found among customers who repeated purchasers:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH orders AS \n(\n    SELECT * \n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date)= 2014\n),\nselected_customers AS \n(\n    SELECT \n        customer_id, \n        COUNT(order_id) AS total \n    FROM \n        orders \n    GROUP BY \n        customer_id\n    HAVING \n        COUNT(order_id) > 10\n),\nselected_products AS \n(\n    SELECT \n        product_name, \n        SUM(quantity) AS quantity\n    FROM \n        orders\n    WHERE \n        customer_id IN (SELECT customer_id FROM selected_customers)\n    GROUP BY \n        product_name \n)\nSELECT * FROM selected_products ORDER BY quantity DESC\nLIMIT 3<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"954\" height=\"221\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-7.-Popular-Items..png\" alt=\"Zettaflow blog. Figure 7. Popular Items.\" class=\"wp-image-811\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-7.-Popular-Items..png 954w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-7.-Popular-Items.-300x69.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-7.-Popular-Items.-768x178.png 768w\" sizes=\"auto, (max-width: 954px) 100vw, 954px\" \/><figcaption class=\"wp-element-caption\">Figure 7. Popular Items.<\/figcaption><\/figure>\n\n\n\n<p>These visualizations may promote ideas for growing customer relationships.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"store_entity\">Store:<\/h2>\n\n\n\n<p>For the offline stores, it might be good to check how the popularity changes across the different stores. Different stores may have different sales models due to local management decisions, local advertising, demographics, regional trends, and even the region&#8217;s climate.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"time_entity\">Time<\/h2>\n\n\n\n<p>Time is one of the most important attributes of market basket data. It may point to the exact marketing conditions at the time of the sale (promotion, special campaign, PR event, etc.)&nbsp;<\/p>\n\n\n\n<p>Numerous questions about customer behavior can be answered by looking at sales over time.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>What is the best-selling item in this quarter? What about the previous quarter?&nbsp;<\/em><\/li>\n\n\n\n<li><em>How did a particular event affect the sales?&nbsp;<\/em><\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"association_analysis\">Association Analysis<\/h1>\n\n\n\n<p>When building association rules it is important to choose the proper item set.&nbsp;Figure 8 demonstrates partial product hierarchies in the supermarket. You may see how the <strong>level of detail <\/strong>increases from top to bottom.<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1024\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-8.-Product-hierarchies.-768x1024.png\" alt=\"Zettaflow blog. Figure 8. Product hierarchies.\" class=\"wp-image-816\" style=\"width:419px;height:559px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-8.-Product-hierarchies.-768x1023.png 768w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-8.-Product-hierarchies.-225x300.png 225w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-8.-Product-hierarchies..png 812w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><figcaption class=\"wp-element-caption\">Figure 8. Product hierarchies.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>For the stakeholders of a chain of ice cream parlors, it might be interesting not only the combination of the ice cream flavors, but also toppings as well as types: such as cup, stick, cone, etc., and even whether the ice cream eaten in the ice cream parlor or taken away.&nbsp;<\/p>\n\n\n\n<p>But when analyzing a large supermarket, probably the initial level of detail for the Ice Cream will be Frozen Desserts. Because as the number of items used in the analysis increases, the number of combinations to consider will grow very quickly.&nbsp;<\/p>\n\n\n\n<p>After the initial analysis, we can repeat rule generation to hone in on more specific items. And Because the analysis focuses on more specific items, we use only the subset of the transactions that contain those items.<\/p>\n\n\n\n<p>Market basket analysis works better when the items appear in approximately the same number of transactions. This prevents rules from being dominated by the most common items. That is why we may roll up rare items to higher levels in the hierarchy, so they become more frequent. Such generalized items also help find rules with sufficient support. There are more transactions supporting a given rule at higher levels of the hierarchy.<\/p>\n\n\n\n<p>But because some items are generalized does not mean that all items must be moved to the same level up.<strong><em> <\/em><\/strong>For instance, in the example shown in Figure 8, we have chosen to generalize all the types of ice creams as Frozen Desserts for the analysis done for a large supermarket, however, expensive items in the same supermarket, such as home appliances, may remain at a low level.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>What is more important: brand, size, or flavor of tea?<\/em><\/li>\n\n\n\n<li><em>Will we consider different volumes of shampoo as one product?<\/em><\/li>\n<\/ul>\n\n\n\n<p>The dataset that we are using to build our dashboard contains three levels of product hierarchies:<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"364\" height=\"442\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-9.-Superstores-product-hierarchies..png\" alt=\"Zettaflow blog. Figure 9. Superstore\u2019s product hierarchies.\" class=\"wp-image-817\" style=\"width:197px;height:239px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-9.-Superstores-product-hierarchies..png 364w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-9.-Superstores-product-hierarchies.-247x300.png 247w\" sizes=\"auto, (max-width: 364px) 100vw, 364px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 9. Superstore\u2019s product hierarchies.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>For the purposes of the dashboard that is designed to work in real-time, we choose sub_category,&nbsp;&nbsp;<\/p>\n\n\n\n<p>which hopefully, will be able to find some useful rules, or at least, promote the ideas that you can confirm with more deep analysis.&nbsp;<\/p>\n\n\n\n<p>So, we select the following transactions:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT\n    order_id AS order,\n    sub_category AS items,\n    quantity\nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date)= 2014\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"476\" height=\"722\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-10.-Transactions-for-Association-Analysis..png\" alt=\"Zettaflow blog. Figure 10. Transactions for Association Analysis.\" class=\"wp-image-820\" style=\"width:353px;height:535px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-10.-Transactions-for-Association-Analysis..png 476w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-10.-Transactions-for-Association-Analysis.-198x300.png 198w\" sizes=\"auto, (max-width: 476px) 100vw, 476px\" \/><figcaption class=\"wp-element-caption\">Figure 10. Transactions for Association Analysis.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>There were almost 30,000 transactions in 2014.&nbsp;<\/p>\n\n\n\n<p>We may group these transactions to see which items are purchased together.&nbsp;<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">SELECT\n    order_id AS orders,\n    string_agg(DISTINCT sub_category, ', ') AS items,\n    SUM(quantity)\nFROM \n    superstore_orders\nWHERE \n    EXTRACT(YEAR FROM order_date)= 2014\nGROUP BY \n    order_id\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"741\" height=\"724\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-11.-Online-orders..png\" alt=\"Zettaflow blog. Figure 11. Online orders.\" class=\"wp-image-823\" style=\"width:572px;height:558px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-11.-Online-orders..png 741w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-11.-Online-orders.-300x293.png 300w\" sizes=\"auto, (max-width: 741px) 100vw, 741px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 11. Online orders.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>Even in this fragment of the data, you may see that items <em>\u201cArt\u201d and \u201cBinders\u201d <\/em>are often sold together.<\/p>\n\n\n\n<p>We may count how many times each item was bought.<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS\n(\n    SELECT\n        order_id,\n        sub_category AS item\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n),\ntotal_orders AS \n(\n    SELECT \n        COUNT(DISTINCT order_id) AS value\n    FROM\n        orders\n),\nitem_orders AS \n(\n    SELECT\n        item,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders\n    GROUP BY \n        item\n)\nSELECT * FROM item_orders \nORDER BY order_count DESC \nLIMIT 100<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"545\" height=\"725\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-12.-Item-sales..png\" alt=\"Zettaflow blog. Figure 12. Item sales.\" class=\"wp-image-827\" style=\"width:425px;height:565px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-12.-Item-sales..png 545w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-12.-Item-sales.-226x300.png 226w\" sizes=\"auto, (max-width: 545px) 100vw, 545px\" \/><figcaption class=\"wp-element-caption\">Figure 12. Item sales.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Let&#8217;s associate items that are bought together and then count the number of such transactions.<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS\n(\n    SELECT\n        order_id,\n        sub_category AS item\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n),\ntotal_orders AS \n(\n    SELECT \n        COUNT(DISTINCT order_id) AS value\n    FROM\n        orders\n),\nassociated_orders AS \n(\n    SELECT\n        o1.item AS item1,\n        o2.item AS item2,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders AS o1\n    INNER JOIN\n        orders AS o2 ON \n            o1.order_id = o2.order_id AND \n            o1.item &lt; o2.item\n    GROUP BY \n        item1, item2\n)\nSELECT * FROM associated_orders\nORDER BY order_count DESC\nLIMIT 150<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"702\" height=\"728\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-13.-Association-Rules..png\" alt=\"Zettaflow blog. Figure 13. Association Rules.\" class=\"wp-image-830\" style=\"width:561px;height:581px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-13.-Association-Rules..png 702w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-13.-Association-Rules.-289x300.png 289w\" sizes=\"auto, (max-width: 702px) 100vw, 702px\" \/><figcaption class=\"wp-element-caption\">Figure 13. Association Rules.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>As we can see in Figure 13, there are 136 combinations that might be considered. Such a combination of items is called an item set. We do not necessarily consider the quantity of the items in a particular transaction. Item sets are interesting themselves.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>Why are those two items purchased together?<\/em><\/li>\n<\/ul>\n\n\n\n<p>Since we analyze online sales where customers purchase relatively few items at one time, two-item sets might be sufficient. We will not waste our efforts searching for the rules with three, four, or more items. However, in some cases, such as with supermarkets, where the average transaction is much larger, more complex rules will be useful.<\/p>\n\n\n\n<p><em>\u201cArt\u201d and \u201cBinders\u201d<\/em> were bought together 1,688 times, and are more likely to be purchased together than any other two items. The combination of <em>\u201cArt\u201d and \u201cFasteners\u201d<\/em> appears only 680 times. Does it mean that the rule \u201cIf Art, then Binders\u201d is better than \u201cIf Art, then Fasteners\u201d? We may learn from Figure 12 that the total sales of the Fasteners are only 1588, while the total sales of the \u201cBinders\u201d are 4048.<\/p>\n\n\n\n<p>So, how good is a particular rule?<\/p>\n\n\n\n<p>To find this out we use the following methods: <strong>support<\/strong>, <strong>confidence<\/strong>, and <strong>lift<\/strong>.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"support_measure\">Support<\/h2>\n\n\n\n<p>The most common way to measure support is to calculate the proportion of transactions that contain all the items in the rule.<\/p>\n\n\n\n<p>For the rule \u201cIf Art, then Binders\u201d the Support = Number of transactions when \u201cArt\u201d and \u201cBinders\u201d are bought together \/ Total number of transactions.&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center has-medium-font-size\">1688 \/ 7794= 0.2166 or 21.65%<\/p>\n\n\n\n<p>Let&#8217;s check Support for all the combinations:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS\n(\n    SELECT\n        order_id,\n        sub_category AS item\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n),\ntotal_orders AS \n(\n    SELECT \n        COUNT(DISTINCT order_id) AS value\n    FROM\n        orders\n),\nassociated_orders AS \n(\n    SELECT\n        o1.item AS item1,\n        o2.item AS item2,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders AS o1\n    INNER JOIN\n        orders AS o2 ON \n            o1.order_id = o2.order_id AND \n            o1.item &lt; o2.item\n    GROUP BY \n        item1, item2\n),\nassociated_support AS \n(\n    SELECT\n        item1,\n        item2,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        associated_orders\n)\nSELECT * FROM associated_support \nORDER BY support DESC \nLIMIT 100<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"798\" height=\"727\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-14.-Support..png\" alt=\"Zettaflow blog. Figure 14. Support.\" class=\"wp-image-835\" style=\"width:622px;height:566px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-14.-Support..png 798w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-14.-Support.-300x273.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-14.-Support.-768x700.png 768w\" sizes=\"auto, (max-width: 798px) 100vw, 798px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 14. Support.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"confidence_measure\">Confidence<\/h2>\n\n\n\n<p>Confidence helps to understand how good a rule is at predicting what is on the right-hand side.&nbsp;<\/p>\n\n\n\n<p>It is the ratio of the number of transactions with all the items in the rule to the number of transactions with just the items on the left-hand side. Alternatively, we may count the ratio of all the items in the rule support to the support of the item on the left-hand side.&nbsp;<\/p>\n\n\n\n<p>For the rule \u201cIf Art, then Binders\u201d the Confidence = Support (Number of transactions when \u201cArt\u201d and \u201cBinders\u201d are bought together) \/ Support (Number of transactions when \u201cArt\u201d is bought).&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center has-medium-font-size\">0.21658 \/ 0.41827 = 0.51782 or 51.78%<\/p>\n\n\n\n<p>It means that, when \u201cArt\u201d appears in an order, there is a 51.78% chance that \u201cBinders\u201d also appears in it.<\/p>\n\n\n\n<p>Let&#8217;s check Confidence for all the combinations:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS\n(\n    SELECT\n        order_id,\n        sub_category AS item\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n),\ntotal_orders AS \n(\n    SELECT \n        COUNT(DISTINCT order_id) AS value\n    FROM\n        orders\n),\nitem_orders AS \n(\n    SELECT\n        item,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders\n    GROUP BY \n        item\n),\nitem_support AS \n(\n    SELECT\n        item,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        item_orders\n),\nassociated_orders AS \n(\n    SELECT\n        o1.item AS item1,\n        o2.item AS item2,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders AS o1\n    INNER JOIN\n        orders AS o2 ON \n            o1.order_id = o2.order_id AND \n            o1.item &lt; o2.item\n    GROUP BY \n        item1, item2\n),\nassociated_support AS \n(\n    SELECT\n        item1,\n        item2,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        associated_orders\n),\nassociated_confidence AS \n(\n    SELECT\n        tas.item1,\n        tas.item2,\n        tas.support AS associated_support,\n        tis1.support AS item1_support,\n        ROUND(tas.support \/ tis1.support, 5) as confidence\n    FROM\n        associated_support AS tas\n    INNER JOIN\n        item_support AS tis1\n        ON tis1.item = tas.item1\n)\nSELECT * FROM associated_confidence\nORDER BY associated_support DESC \nLIMIT 100<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"877\" height=\"727\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-15.-Confidence..png\" alt=\"Zettaflow blog. Figure 15. Confidence.\" class=\"wp-image-840\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-15.-Confidence..png 877w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-15.-Confidence.-300x249.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-15.-Confidence.-768x637.png 768w\" sizes=\"auto, (max-width: 877px) 100vw, 877px\" \/><figcaption class=\"wp-element-caption\">Figure 15. Confidence.<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"lift_measure\">Lift<\/h2>\n\n\n\n<p>Lift measures the power of the rule by comparing the full rule to randomly guessing the right-hand side.&nbsp;<\/p>\n\n\n\n<p>If the item on the right-hand side already is very common, the rule is not telling us anything.&nbsp;<\/p>\n\n\n\n<p>For the rule \u201cIf Art, then Binders\u201d the Lift = Support (Number of transactions when \u201cArt\u201d and \u201cBinders\u201d are bought together) \/ (Support (Number of transactions when \u201cArt\u201d is bought) * Support (Number of transactions when \u201cBinders\u201d are bought)).&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center has-medium-font-size\">0.21658 \/ (0.41827 * 0.51937) = 0.99698<\/p>\n\n\n\n<p>Lift greater than 1 means that the product on the left-hand side increases the chance of buying the product on the right-hand side.&nbsp;<\/p>\n\n\n\n<p>When lift is less than 1, the rule is doing worse than informed guessing. So is true for the rule <em>\u201cIf Art, then Binders\u201d.<\/em><\/p>\n\n\n\n<p>Let&#8217;s find out Lift for all the rules:<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS\n(\n    SELECT\n        order_id,\n        sub_category AS item\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n),\ntotal_orders AS \n(\n    SELECT \n        COUNT(DISTINCT order_id) AS value\n    FROM\n        orders\n),\nitem_orders AS \n(\n    SELECT\n        item,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders\n    GROUP BY \n        item\n),\nitem_support AS \n(\n    SELECT\n        item,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        item_orders\n),\nassociated_orders AS \n(\n    SELECT\n        o1.item AS item1,\n        o2.item AS item2,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders AS o1\n    INNER JOIN\n        orders AS o2 ON \n            o1.order_id = o2.order_id AND \n            o1.item &lt; o2.item\n    GROUP BY \n        item1, item2\n),\nassociated_support AS \n(\n    SELECT\n        item1,\n        item2,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        associated_orders\n),\nassociated_lift AS \n(\n    SELECT\n        tas.item1,\n        tas.item2,\n        tas.support AS associated_support,\n        tis1.support AS item1_support,\n        tis2.support AS item2_support,\n        ROUND(tas.support \/ (tis1.support * tis2.support), 5) as lift\n    FROM\n        associated_support AS tas\n    INNER JOIN\n        item_support AS tis1\n        ON tis1.item = tas.item1\n    INNER JOIN\n        item_support AS tis2\n        ON tis2.item = tas.item2\n)\nSELECT * FROM associated_lift\nORDER BY associated_support DESC \nLIMIT 100<\/pre><\/div>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1017\" height=\"728\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-16.-Lift..png\" alt=\"Zettaflow blog. Figure 16. Lift.\" class=\"wp-image-846\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-16.-Lift..png 1017w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-16.-Lift.-300x215.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-16.-Lift.-768x550.png 768w\" sizes=\"auto, (max-width: 1017px) 100vw, 1017px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 16. Lift.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>For the dashboard I opted to display only the rules with support greater than 5%, confidence greater than 20% and lift greater than 1.<\/p>\n\n\n\n<div style=\"height: 250px; position:relative; margin-bottom: 50px;\" class=\"wp-block-simple-code-block-ace\"><div style=\"position:absolute;top:-20px;right:0px;cursor:pointer\" class=\"copy-simple-code-block\"><span class=\"dashicon dashicons dashicons-admin-page\"><\/span><\/div><pre class=\"wp-block-simple-code-block-ace\" style=\"position:absolute;top:0;right:0;bottom:0;left:0\" data-mode=\"sql\" data-theme=\"sqlserver\" data-fontsize=\"14\" data-lines=\"Infinity\" data-showlines=\"false\" data-copy=\"true\">WITH \norders AS\n(\n    SELECT\n        order_id,\n        sub_category AS item\n    FROM \n        superstore_orders\n    WHERE \n        EXTRACT(YEAR FROM order_date) = 2014\n),\ntotal_orders AS \n(\n    SELECT \n        COUNT(DISTINCT order_id) AS value\n    FROM\n        orders\n),\nitem_orders AS \n(\n    SELECT\n        item,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders\n    GROUP BY \n        item\n),\nitem_support AS \n(\n    SELECT\n        item,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        item_orders\n),\nassociated_orders AS \n(\n    SELECT\n        o1.item AS item1,\n        o2.item AS item2,\n        COUNT(*) AS order_count,\n        (SELECT value FROM total_orders) AS total_order_count\n    FROM \n        orders AS o1\n    INNER JOIN\n        orders AS o2 ON \n            o1.order_id = o2.order_id AND \n            o1.item &lt; o2.item\n    GROUP BY \n        item1, item2\n),\nassociated_support AS \n(\n    SELECT\n        item1,\n        item2,\n        order_count,\n        total_order_count,\n        ROUND((order_count::numeric \/ total_order_count), 5) AS support\n    FROM \n        associated_orders\n),\nassociated_confidence AS \n(\n    SELECT\n        tas.item1,\n        tas.item2,\n        tas.support AS associated_support,\n        tis1.support AS item1_support,\n        ROUND(tas.support \/ tis1.support, 5) as confidence\n    FROM\n        associated_support AS tas\n    INNER JOIN\n        item_support AS tis1\n        ON tis1.item = tas.item1\n),\nassociated_lift AS \n(\n    SELECT\n        tas.item1,\n        tas.item2,\n        tas.support AS associated_support,\n        tis1.support AS item1_support,\n        tis2.support AS item2_support,\n        ROUND(tas.support \/ (tis1.support * tis2.support), 5) as lift\n    FROM\n        associated_support AS tas\n    INNER JOIN\n        item_support AS tis1\n        ON tis1.item = tas.item1\n    INNER JOIN\n        item_support AS tis2\n        ON tis2.item = tas.item2\n),\nsummary AS \n(\n    SELECT\n        tao.item1,\n        tao.item2,\n        tao.order_count,\n        tas.support AS support,\n        tas.support * 100 AS support_pct,\n        tac.confidence AS confidence,\n        tal.lift AS lift\n    FROM\n        associated_orders AS tao\n    INNER JOIN\n        associated_support AS tas\n        ON tas.item1 = tao.item1 AND tas.item2 = tao.item2\n    INNER JOIN\n        associated_confidence AS tac \n        ON tac.item1 = tao.item1 AND tac.item2 = tao.item2\n    INNER JOIN\n        associated_lift AS tal\n        ON tal.item1 = tao.item1 AND tal.item2 = tao.item2\n)\nSELECT \n    concat(item1, ' + ', item2) AS items, *  \nFROM \n    summary\nWHERE \n    support > 0.05 AND \n    confidence > 0.2 AND \n    lift > 1\nORDER BY \n    order_count DESC \nLIMIT 100<\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"315\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-17.-Selected-association-rules.-1024x315.png\" alt=\"Zettaflow blog. Figure 17. Selected association rules.\" class=\"wp-image-850\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-17.-Selected-association-rules.-1024x315.png 1024w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-17.-Selected-association-rules.-300x92.png 300w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-17.-Selected-association-rules.-768x236.png 768w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-17.-Selected-association-rules.-1536x472.png 1536w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-17.-Selected-association-rules.-2048x629.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 17. Selected association rules.<\/figcaption><\/figure>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"445\" height=\"1024\" src=\"https:\/\/overfetch.com\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-18.-Association-rules-visualization.-445x1024.png\" alt=\"Zettaflow blog. Figure 18. Association rules visualization.\" class=\"wp-image-852\" style=\"width:261px;height:601px\" srcset=\"https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-18.-Association-rules-visualization.-445x1024.png 445w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-18.-Association-rules-visualization.-130x300.png 130w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-18.-Association-rules-visualization.-768x1767.png 768w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-18.-Association-rules-visualization.-668x1536.png 668w, https:\/\/tapegraph.io\/blog\/wp-content\/uploads\/2022\/08\/Zettaflow-blog.-Figure-18.-Association-rules-visualization..png 784w\" sizes=\"auto, (max-width: 445px) 100vw, 445px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 18. Association rules visualization.<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"Good_and_bad_findings\">Good and bad findings<\/h2>\n\n\n\n<p>Association analysis generates association rules automatically. It is a type of undirected data mining that finds patterns in the data where the target is not specified beforehand. Therefore it is our task to decide whether the patterns make sense.<\/p>\n\n\n\n<p><strong>Good findings <\/strong>are actionable. They help make decisions, such as creating promotional discounts on certain product bundles to improve cross-selling.<\/p>\n\n\n\n<p>Beware of trivial and inexplicable rules.<\/p>\n\n\n\n<p><strong>Trivial rules<\/strong> reproduce common knowledge about the business.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>The customer who has bought an extended warranty is likely to buy an expensive household appliance.<\/em><\/li>\n<\/ul>\n\n\n\n<p><strong>Inexplicable rules<\/strong> may not have explanations and recommendations for action. They just don\u2019t make sense.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><em>The bookstore&#8217;s most popular product among American Express credit card shoppers is Barack Obama&#8217;s Promised Land.<\/em><\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"Conclusion_MBA\">Conclusion<\/h1>\n\n\n\n<p>In this article, we have briefly discussed market basket analysis and association rules.&nbsp;<\/p>\n\n\n\n<p>We went through the process of analyzing data at three levels: an item, an order, and a customer who, if identified, connects all the orders together.&nbsp;<\/p>\n\n\n\n<p>Also, we have described how to select the right level of detail in product hierarchies, and how many items to include in the association rule.&nbsp;<\/p>\n\n\n\n<p>We ran an association analysis with SQL and described how to evaluate the rules using such measures as support, confidence, and lift.&nbsp;<\/p>\n\n\n\n<p>And finally, we built a visual dashboard that represents our findings.&nbsp;<\/p>\n\n\n\n<p>Hope this was helpful!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article is about market basket analysis, how to perform it using SQL, and how to present the result in the form of a real-time dashboard.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[29,14,28],"tags":[10,40,48,42,39,47,7,3],"class_list":["post-770","post","type-post","status-publish","format-standard","hentry","category-dashboard-examples","category-visualization","category-how-to-guides","tag-analysis","tag-dashboards","tag-data-analysis","tag-how-to","tag-intermediate","tag-retail","tag-sql","tag-visualization"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/posts\/770","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/comments?post=770"}],"version-history":[{"count":84,"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/posts\/770\/revisions"}],"predecessor-version":[{"id":998,"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/posts\/770\/revisions\/998"}],"wp:attachment":[{"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/media?parent=770"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/categories?post=770"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tapegraph.io\/blog\/wp-json\/wp\/v2\/tags?post=770"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}