Association Analysis


The typical question behind association analysis, often also called basket analysis, is: Which products are bought together? This question matters because the answer suggests different measures: you could place those products together, increase the price of one product and lower the price of the other, advertise only one of them, or create combo offers.


To find products that depend on each other, create rules R of the form

R: If product A is bought, then also product B is bought

Here A is called the antecedent and B the consequent. To determine the importance of such a rule, three statistical key figures are defined:

$SUPPORT(R) := \frac{\text{number of baskets that support the rule}}{\text{total number of baskets}}$

$CONFIDENCE(A, B) := \frac{\text{number of baskets that support the rule}}{\text{number of baskets that contain A}}$

In many examples, both of these key figures can be high without the rule being useful (e.g. when product B is bought by 95% of the customers anyway). Therefore the lift, also called improvement, is introduced, where $SUPPORT(B)$ is the share of all baskets that contain B:
$$LIFT(A,B) := \frac{CONFIDENCE(A, B)}{SUPPORT(B)}$$
While the support and the lift are symmetric with respect to A and B, the confidence is not.
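The three key figures can be computed by hand on a small data set. The following is a minimal sketch, not a reference implementation: the toy baskets and the function names `support`, `confidence` and `lift` are assumptions made for illustration. It uses the common convention in which the confidence of "if A then B" is the share of baskets containing A that also contain B.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: each basket is the set of products in one purchase.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"milk"}),
    frozenset({"bread", "butter", "jam"}),
]


def support(items: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Share of all baskets that contain every product in `items`."""
    wanted = frozenset(items)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Share of the baskets containing the antecedent that also contain the consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


if __name__ == "__main__":
    a, b = {"bread"}, {"butter"}
    print("support   :", round(support(a | b, BASKETS), 3))
    print("confidence:", round(confidence(a, b, BASKETS), 3),
          "vs. reversed rule:", round(confidence(b, a, BASKETS), 3))
    print("lift      :", round(lift(a, b, BASKETS), 3),
          "vs. reversed rule:", round(lift(b, a, BASKETS), 3))
```

Running the script on this toy data shows the asymmetry of the confidence: the two rule directions give different confidence values, but the same lift.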

The lift then decides whether the rule is useful:

If the lift is < 1, the products are bought together less often than would be expected if they were independent, so the rule does not describe a positive association. For a lift of 1 the antecedent and the consequent are independent of each other, and a lift > 1 describes the degree to which the products depend on each other.


A typical example of an association algorithm is the so-called APRIORI algorithm, which creates rules from all item sets whose support reaches a chosen minimum. Its big advantage is that it produces clear, easily understandable results that can be used directly. However, the run time can grow exponentially with the number of products, and rare combinations that fall below the minimum support are not included in the analysis.
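The level-wise idea behind APRIORI can be sketched in a few lines. This is a simplified illustration and not an optimized implementation: the function name `frequent_itemsets` and the toy baskets are assumptions, and the sketch only finds the item sets with sufficient support. Rules would then be derived from these item sets in a separate step.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy data: each basket is the set of products in one purchase.
TOY_BASKETS: List[FrozenSet[str]] = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"milk"}),
    frozenset({"bread", "butter", "jam"}),
]


def frequent_itemsets(baskets: List[FrozenSet[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every item set whose support is at least `min_support`."""
    n = len(baskets)
    # Level 1 candidates: every single product that occurs in any basket.
    candidates: Set[FrozenSet[str]] = {
        frozenset({item}) for basket in baskets for item in basket
    }
    result: Dict[FrozenSet[str], float] = {}
    size = 1
    while candidates:
        # Keep only the candidates that reach the minimum support.
        level: Dict[FrozenSet[str], float] = {}
        for cand in candidates:
            sup = sum(1 for basket in baskets if cand <= basket) / n
            if sup >= min_support:
                level[cand] = sup
        result.update(level)

        # Build candidates one item larger by joining frequent sets.
        # A candidate is kept only if all of its subsets of the current
        # size are frequent, since a superset cannot be more frequent
        # than any of its subsets.
        size += 1
        candidates = set()
        for left, right in combinations(list(level), 2):
            union = left | right
            if len(union) == size and all(
                frozenset(sub) in level for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
    return result


if __name__ == "__main__":
    found = frequent_itemsets(TOY_BASKETS, min_support=0.5)
    for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), round(sup, 2))
```

The pruning step is what keeps the search manageable in practice, but with a low minimum support the number of candidates can still grow very quickly.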