In HatShop, you’ll implement the dynamic recommendations system in the visitor’s shopping cart and in the product details page. After adding the new bits to your shop, the product details page will contain the product recommendations list at the bottom of the page, as shown in Figure 10-1.
C H A P T E R 1 0
Figure 10-1. The product details page with the dynamic recommendations system implemented

The shopping cart page gets a similar addition, as shown in Figure 10-2.
Figure 10-2. The shopping cart details page with the dynamic recommendations system implemented
Implementing the Data Tier
Before writing any code, you first need to understand the logic you’ll implement for making product recommendations. We’ll focus here on the logic of recommending products that were ordered together with another specific product. Afterward, the recommendations for the shopping cart page will function in a similar way but will take more products into consideration.
So, you need to find out what other products were bought by customers who also bought the product for which you’re calculating the recommendations (in other words, determine the “customer who bought this product also bought…” information). Let’s build the SQL logic that produces the list of product recommendations step by step.
■ Tip Because SQL is very powerful, you can actually implement the same functionality in several ways.
Here, we’ll cover just one of the options, but when implementing the actual database functions, you’ll be shown other options as well.
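The counting idea behind these recommendations doesn’t depend on SQL. As an illustration only (this isn’t HatShop code), the Python sketch below assumes the orders are already in memory as lists of product IDs; the sample orders are hypothetical, chosen so the counts match the fictional results used throughout this section. It keeps only the orders that contain the target product, counts every other product in them, and returns the most frequent ones. The function name also_bought is made up for this example.

```python
from collections import Counter


def also_bought(orders, product_id, limit=5):
    """Return (product_id, count) pairs for products sharing orders with product_id."""
    counts = Counter()
    for items in orders:
        if product_id in items:
            # Count every other product in an order that contains product_id.
            counts.update(p for p in items if p != product_id)
    # most_common() sorts by count, highest first; limit=None returns everything.
    return counts.most_common(limit)


# Hypothetical sample orders: each inner list holds the product IDs of one order.
sample_orders = [
    [4, 5, 10, 43],
    [4, 5, 10, 23, 25, 28],
    [4, 10, 12, 14, 43],
    [7, 12],  # doesn't contain product 4, so it's ignored
]

print(also_bought(sample_orders, 4))
```

Counter.most_common() orders products with equal counts by first appearance, which is an artifact of this sketch rather than a meaningful ranking.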
To determine what other products were ordered together with a specific product, you need to join two instances of the order_detail table on their order_id fields. Feel free to review the “Joining Data Tables” section in Chapter 4 for a quick refresher about table joins.
Joining multiple instances of a single table (a self-join) works just like joining two different tables that happen to contain the same data.
You join two instances of order_detail, called od1 and od2, on their order_id fields, while filtering the product_id value in od1 for the ID of the product you’re looking for. This way, the od2 side of the join contains all the products from the orders that include the product you’re looking for.
The SQL code that retrieves all the products that were ordered together with the product identified by a product_id of 4 is
SELECT od2.product_id FROM order_detail od1 JOIN order_detail od2
ON od1.order_id = od2.order_id WHERE od1.product_id = 4;
This query returns a long list of products such as the following one, which also includes the product with the product_id of 4 itself:
product_id
----------
4
5
10
43
4
5
10
23
25
28
4
10
12
14
43
Starting from this list of results, you need to get the products that are most frequently bought along with this product. The first problem with this list of products is that it includes the product with the product_id of 4. To eliminate it from the list (because, of course, you can’t put it in the recommendations list), you simply add one more rule to the WHERE clause:
SELECT od2.product_id FROM order_detail od1 JOIN order_detail od2
ON od1.order_id = od2.order_id
WHERE od1.product_id = 4 AND od2.product_id != 4;
Not surprisingly, you get a list of products that is similar to the previous one, except it doesn’t contain the product with a product_id of 4 anymore:
product_id
----------
5
10
43
5
10
23
25
28
10
12
14
43
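To experiment with the two self-join queries above without a PostgreSQL server, the following sketch runs them against an in-memory SQLite database from Python. The simplified two-column order_detail table and its rows are hypothetical sample data (HatShop’s real table has more columns), and the helper name ordered_together is made up for this example. The results are sorted before printing because, without ORDER BY, SQL makes no promise about row order.

```python
import sqlite3

# In-memory SQLite stands in for HatShop's PostgreSQL database. The two-column
# order_detail table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
sample_orders = {
    1: [4, 5, 10, 43],
    2: [4, 5, 10, 23, 25, 28],
    3: [4, 10, 12, 14, 43],
    4: [7, 12],
}
conn.executemany(
    "INSERT INTO order_detail (order_id, product_id) VALUES (?, ?)",
    [(order_id, pid) for order_id, pids in sample_orders.items() for pid in pids],
)

# Self-join: od1 finds the orders containing the product, od2 lists
# every product in those same orders.
SELF_JOIN = """
    SELECT od2.product_id
    FROM order_detail od1
    JOIN order_detail od2 ON od1.order_id = od2.order_id
    WHERE od1.product_id = ?
"""


def ordered_together(product_id, exclude_self=True):
    sql, params = SELF_JOIN, [product_id]
    if exclude_self:
        # The extra WHERE rule that drops the product itself.
        sql += " AND od2.product_id != ?"
        params.append(product_id)
    return [row[0] for row in conn.execute(sql, params)]


print(sorted(ordered_together(4, exclude_self=False)))
print(sorted(ordered_together(4)))
```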
The list is now a bit shorter, but it still contains multiple entries for the products that appear in more than one of the orders containing product 4.
To get the most relevant recommendations, you need to see which products appear more frequently in this list. You do this by grouping the results of the previous query by product_id and sorting in descending order by how many times each product appears in the list (this number is given by the rank calculated column in the following code snippet):
SELECT od2.product_id, COUNT(od2.product_id) AS rank
FROM order_detail od1
JOIN order_detail od2
ON od1.order_id = od2.order_id
WHERE od1.product_id = 4 AND od2.product_id != 4
GROUP BY od2.product_id
ORDER BY rank DESC;
This query now returns a list such as the following:
product_id  rank
----------  ----
10          3
5           2
43          2
23          1
25          1
28          1
12          1
14          1
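The grouped query can be checked the same way. This sketch again assumes an in-memory SQLite database as a stand-in for PostgreSQL and loads the same hypothetical sample orders; the SQL is the query above with the product ID passed as a parameter.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
sample_orders = {
    1: [4, 5, 10, 43],
    2: [4, 5, 10, 23, 25, 28],
    3: [4, 10, 12, 14, 43],
}
conn.executemany(
    "INSERT INTO order_detail (order_id, product_id) VALUES (?, ?)",
    [(order_id, pid) for order_id, pids in sample_orders.items() for pid in pids],
)

# Group the self-join rows by product and count how often each appears.
RANK_SQL = """
    SELECT od2.product_id, COUNT(od2.product_id) AS rank
    FROM order_detail od1
    JOIN order_detail od2 ON od1.order_id = od2.order_id
    WHERE od1.product_id = ? AND od2.product_id != ?
    GROUP BY od2.product_id
    ORDER BY rank DESC
"""

ranking = conn.execute(RANK_SQL, (4, 4)).fetchall()
for product_id, rank in ranking:
    print(product_id, rank)
```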
If you don’t need the rank to be returned, you can rewrite this query by using the COUNT aggregate function directly in the ORDER BY clause. You can also use the LIMIT keyword to specify how many records you’re interested in. If you want the top five products of the list, this query does the trick:
SELECT od2.product_id
FROM order_detail od1
JOIN order_detail od2
ON od1.order_id = od2.order_id
WHERE od1.product_id = 4 AND od2.product_id != 4
GROUP BY od2.product_id
ORDER BY COUNT(od2.product_id) DESC
LIMIT 5;
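The results of this query are shown next. Products with equal counts (such as 5 and 43) can come back in either order, and LIMIT picks arbitrarily among the products tied at the cutoff: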
product_id
----------
10
43
5
23
28
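Because the ordering among tied products is unspecified, the top five can differ between runs or database versions. One option, assumed here rather than taken from HatShop, is to add a secondary sort key such as od2.product_id so that ties always resolve the same way. The sketch below uses SQLite in place of PostgreSQL, hypothetical sample data, and a made-up helper named top_recommendations.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
sample_orders = {
    1: [4, 5, 10, 43],
    2: [4, 5, 10, 23, 25, 28],
    3: [4, 10, 12, 14, 43],
}
conn.executemany(
    "INSERT INTO order_detail (order_id, product_id) VALUES (?, ?)",
    [(order_id, pid) for order_id, pids in sample_orders.items() for pid in pids],
)

TOP_SQL = """
    SELECT od2.product_id
    FROM order_detail od1
    JOIN order_detail od2 ON od1.order_id = od2.order_id
    WHERE od1.product_id = ? AND od2.product_id != ?
    GROUP BY od2.product_id
    -- product_id is an assumed tie-breaker so equal counts sort repeatably
    ORDER BY COUNT(od2.product_id) DESC, od2.product_id
    LIMIT ?
"""


def top_recommendations(product_id, how_many=5):
    rows = conn.execute(TOP_SQL, (product_id, product_id, how_many))
    return [row[0] for row in rows]


print(top_recommendations(4))
```

With this tie-breaker, products 12 and 14 take the last two slots simply because they have the lowest IDs among the products ordered only once; the choice of tie-breaker is arbitrary.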
Because this list of numbers doesn’t mean much to a human reader, you’ll also want to know the name and the description of the recommended products. The following query does exactly this by querying the product table for the IDs returned by the previous query (the description is left out to save space):
SELECT product_id, name
FROM product
WHERE product_id IN (
  SELECT od2.product_id
  FROM order_detail od1
  JOIN order_detail od2 ON od1.order_id = od2.order_id
  WHERE od1.product_id = 4 AND od2.product_id != 4
  GROUP BY od2.product_id
  ORDER BY COUNT(od2.product_id) DESC
  LIMIT 5
);
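Based on the data from the previous fictional results, this query returns something like the following. Because IN only tests membership, the outer query doesn’t inherit the subquery’s ORDER BY, so these rows may come back in any order: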
product_id  name
----------  -----------------------
10          Vinyl Policeman Cop Hat
43          Hussar Military Hat
5           Red Santa Cowboy Hat
23          Black Basque Beret
28          Moleskin Driver
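If you want the named results to stay in recommendation order, one alternative (not the query HatShop uses) is to join the product table against the ranked subquery and sort by the count in the outer query. The Python/SQLite sketch below assumes a simplified product table holding only the five hat names listed above, so it asks for the top three, all of which have names in this sample data.

```python
import sqlite3

# SQLite stands in for PostgreSQL; both tables are simplified, hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT)")
sample_orders = {
    1: [4, 5, 10, 43],
    2: [4, 5, 10, 23, 25, 28],
    3: [4, 10, 12, 14, 43],
}
conn.executemany(
    "INSERT INTO order_detail (order_id, product_id) VALUES (?, ?)",
    [(order_id, pid) for order_id, pids in sample_orders.items() for pid in pids],
)
# Only the five hats named in the text are loaded.
conn.executemany(
    "INSERT INTO product (product_id, name) VALUES (?, ?)",
    [
        (10, "Vinyl Policeman Cop Hat"),
        (43, "Hussar Military Hat"),
        (5, "Red Santa Cowboy Hat"),
        (23, "Black Basque Beret"),
        (28, "Moleskin Driver"),
    ],
)

# Join against the ranked subquery instead of using IN, so the outer
# ORDER BY can still sort by the count.
NAMED_SQL = """
    SELECT p.product_id, p.name
    FROM product p
    JOIN (
        SELECT od2.product_id, COUNT(od2.product_id) AS rank
        FROM order_detail od1
        JOIN order_detail od2 ON od1.order_id = od2.order_id
        WHERE od1.product_id = ? AND od2.product_id != ?
        GROUP BY od2.product_id
        ORDER BY rank DESC, od2.product_id
        LIMIT ?
    ) AS ranked ON p.product_id = ranked.product_id
    ORDER BY ranked.rank DESC, p.product_id
"""

for product_id, name in conn.execute(NAMED_SQL, (4, 4, 3)):
    print(product_id, name)
```

Because rank is a column of the joined subquery, the outer query can sort on it, which the IN version cannot do.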
Alternatively, you might want to calculate the product recommendations using only data from orders placed in the last n days. For this, you need an additional join with the orders table, which contains the created_on field. The following query calculates product recommendations based on orders placed in the past 30 days:
SELECT product_id, name
FROM product
WHERE product_id IN (
  SELECT od2.product_id
  FROM order_detail od1
  JOIN order_detail od2 ON od1.order_id = od2.order_id
  JOIN orders o ON od1.order_id = o.order_id
  WHERE od1.product_id = 4 AND od2.product_id != 4
    -- NOW() - created_on is an interval, so compare against an interval, not 30
    AND o.created_on > NOW() - INTERVAL '30 days'
  GROUP BY od2.product_id
  ORDER BY COUNT(od2.product_id) DESC
  LIMIT 5
);
We won’t use this trick in HatShop, but it’s worth keeping in mind as a possibility.
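SQLite has neither NOW() nor PostgreSQL’s INTERVAL syntax, so this sketch of the time-window variant uses SQLite’s datetime('now', '-30 days') instead. The orders table is reduced to order_id and created_on (stored as UTC text timestamps in the format datetime() returns), the order ages are hypothetical, and recent_ranking is a made-up helper name. The 60-day-old order falls outside the window, so products that appear only in it drop out of the ranking.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# SQLite stands in for PostgreSQL; both tables are simplified, hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, created_on TEXT)")

# order_id -> (age in days, product IDs)
sample_orders = {
    1: (10, [4, 5, 10, 43]),
    2: (60, [4, 5, 10, 23, 25, 28]),  # older than 30 days
    3: (5, [4, 10, 12, 14, 43]),
}
now = datetime.now(timezone.utc)
for order_id, (age_days, pids) in sample_orders.items():
    # Same "YYYY-MM-DD HH:MM:SS" UTC text format that SQLite's datetime() returns.
    created_on = (now - timedelta(days=age_days)).strftime("%Y-%m-%d %H:%M:%S")
    conn.execute("INSERT INTO orders (order_id, created_on) VALUES (?, ?)", (order_id, created_on))
    conn.executemany(
        "INSERT INTO order_detail (order_id, product_id) VALUES (?, ?)",
        [(order_id, pid) for pid in pids],
    )

RECENT_SQL = """
    SELECT od2.product_id, COUNT(od2.product_id) AS rank
    FROM order_detail od1
    JOIN order_detail od2 ON od1.order_id = od2.order_id
    JOIN orders o ON od1.order_id = o.order_id
    WHERE od1.product_id = ? AND od2.product_id != ?
      AND o.created_on >= datetime('now', ?)
    GROUP BY od2.product_id
    ORDER BY rank DESC, od2.product_id
"""


def recent_ranking(product_id, days=30):
    # SQLite date modifier such as "-30 days"
    modifier = f"-{days} days"
    return conn.execute(RECENT_SQL, (product_id, product_id, modifier)).fetchall()


print(recent_ranking(4))
```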