**Case Study – Apriori algorithm**

The Apriori algorithm is an Association Rule Mining (ARM) algorithm for Boolean association rules. It exploits prior knowledge through the *Apriori property* of frequent itemsets, which states that all nonempty subsets of a frequent itemset must also be frequent. The algorithm applies two functions, *candidate generation* and *pruning*, at every iteration.

In general, an association rule is an expression of the form *X ⇒ Y*, where *X*, *Y* ⊆ *I*. Here, *X* is called the *antecedent* and *Y* is called the *consequent*. An association rule expresses how often *Y* occurs in transactions that already contain *X*, subject to the *minimum support* (*s*) and *minimum confidence* (*c*) thresholds.

**ARM Measures**

Support: The *support* of the rule *X ⇒ Y* in the transaction database *D* is the support of the itemset *X ∪ Y* in *D*:

*support(X⇒Y) = count(X ∪ Y) / N* –––> (1)

where ‘*N’* is the total number of transactions in the database and *count(X ∪ Y)* is the number of transactions that contain *X ∪ Y*.
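Equation (1) can be sketched in a few lines of plain Python; the toy transaction list below is illustrative only, not the case-study dataset:

```python
# Support of an itemset: fraction of transactions containing every item in it.
def support(itemset, transactions):
    count = sum(1 for t in transactions if itemset <= set(t))
    return count / len(transactions)

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Butter", "Milk"],
    ["Butter", "Milk"],
    ["Bread", "Butter"],
]

# support(Bread ⇒ Milk) = count({Bread, Milk}) / N = 2 / 4
print(support({"Bread", "Milk"}, transactions))  # → 0.5
```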

Confidence: The *confidence* of the rule *X ⇒ Y* in the transaction database *D* is the ratio of the number of transactions in *D* that contain *X ∪ Y* to the number of transactions in *D* that contain *X*:

*confidence(X⇒Y) = count(X ∪ Y) / count(X) = support(X ∪ Y) / support(X)* –––> (2)

It basically denotes the conditional probability *P(Y|X)*.
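Equation (2) can likewise be sketched on toy data (illustrative only, not the case-study dataset):

```python
# Confidence of X ⇒ Y: among transactions containing X, the fraction also containing Y.
def confidence(x, y, transactions):
    count_x = sum(1 for t in transactions if x <= set(t))
    count_xy = sum(1 for t in transactions if (x | y) <= set(t))
    return count_xy / count_x

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Butter", "Milk"],
    ["Butter", "Milk"],
    ["Bread", "Butter"],
]

# Bread appears in 3 transactions, {Bread, Milk} in 2, so P(Milk | Bread) = 2/3.
print(confidence({"Bread"}, {"Milk"}, transactions))  # → 0.6666666666666666
```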

Lift: The *lift* of the rule *X ⇒ Y*, often referred to as an interestingness measure, additionally takes the prior probability of the rule consequent into account, as follows:

*lift(X⇒Y) = support(X ∪ Y) / (support(X) ∗ support(Y))* –––> (3)

The measure ‘lift’ is newly introduced in this context. Its significance in ARM is given below:

- lift(X⇒Y) = 1 means that there is no correlation between X and Y,
- lift(X⇒Y) > 1 means that there is a positive correlation between X and Y, and
- lift(X⇒Y) < 1 means that there is a negative correlation between X and Y.

A greater lift value indicates a stronger association. We will use this measure in our experiment.
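A minimal sketch of equation (3) on illustrative toy data shows how a lift below 1 signals a negative correlation:

```python
# Lift of X ⇒ Y: support(X ∪ Y) divided by support(X) * support(Y).
def lift(x, y, transactions):
    n = len(transactions)
    sup = lambda s: sum(1 for t in transactions if s <= set(t)) / n
    return sup(x | y) / (sup(x) * sup(y))

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Butter", "Milk"],
    ["Butter", "Milk"],
    ["Bread", "Butter"],
]

# lift = (2/4) / ((3/4) * (3/4)) = 8/9 < 1: in this toy database, Bread and
# Milk are slightly negatively correlated.
print(lift({"Bread"}, {"Milk"}, transactions))  # → 0.8888888888888888
```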

**Dataset Description**

The following dataset (transaction.csv) contains transactional records of a departmental store on a particular day. The dataset has 30 records and involves six items: Juice, Chips, Bread, Butter, Milk, and Banana. A snapshot of the dataset, as viewed in MS Excel, is given below.

transaction.csv

```
Juice   Chips   Bread   Butter  Milk    Banana
Juice           Bread   Butter  Milk
                Bread   Butter  Milk
        Chips                           Banana
Juice   Chips   Bread   Butter  Milk    Banana
Juice   Chips                   Milk
Juice   Chips   Bread   Butter          Banana
Juice   Chips                   Milk
Juice           Bread                   Banana
Juice           Bread   Butter  Milk
        Chips   Bread   Butter          Banana
Juice                   Butter  Milk    Banana
Juice   Chips   Bread   Butter  Milk
Juice           Bread   Butter  Milk    Banana
Juice           Bread   Butter  Milk    Banana
Juice   Chips   Bread   Butter  Milk    Banana
        Chips   Bread   Butter  Milk    Banana
        Chips           Butter  Milk    Banana
Juice   Chips   Bread   Butter  Milk    Banana
Juice           Bread   Butter  Milk    Banana
Juice   Chips   Bread           Milk    Banana
Juice   Chips
                Bread   Butter          Banana
                Bread   Butter  Milk    Banana
Juice   Chips
                Bread   Butter          Banana
        Chips   Bread   Butter  Milk    Banana
Juice           Bread   Butter          Banana
        Chips   Bread   Butter  Milk    Banana
        Chips   Bread   Butter          Banana
```

**Python Environment Setup **

Before we start coding, we need to install the ‘apyori’ module:

```shell
pip install apyori
```

This step is mandatory because the ‘apriori’ function used below is a member of the ‘apyori’ module.

**Implementation of Apriori algorithm**

We provide here an implementation of the Apriori algorithm in Python. The objective is to discover association rules whose support, confidence, and lift are greater than or equal to *min_support*, *min_confidence*, and *min_lift*, respectively. See the code below.

arm.py

```python
# Step 1: Import the libraries
import pandas as pd
from apyori import apriori

# Step 2: Load the dataset (transaction.csv has no header row)
df = pd.read_csv('transaction.csv', header=None)

# Step 3: Display statistics of records
print("Display statistics: ")
print("===================")
print(df.describe())

# Step 4: Display shape of the dataset
print("\nShape:", df.shape)

# Step 5: Convert dataframe into a nested list, skipping empty (NaN) cells
# so that 'nan' strings do not enter the itemsets
database = []
for i in range(0, 30):
    database.append([str(df.values[i, j]) for j in range(0, 6)
                     if str(df.values[i, j]) != 'nan'])

# Step 6: Develop the Apriori model
arm_rules = apriori(database, min_support=0.5, min_confidence=0.7, min_lift=1.2)
arm_results = list(arm_rules)

# Step 7: Display the number of rule(s)
print("\nNo. of rule(s):", len(arm_results))

# Step 8: Display the rule(s)
print("\nResults: ")
print("========")
print(arm_results)
```

**Output:**

```
Display statistics: 
===================
            0      1      2       3     4       5
count      19     18     23      23    20      22
unique      1      1      1       1     1       1
top     Juice  Chips  Bread  Butter  Milk  Banana
freq       19     18     23      23    20      22

Shape: (30, 6)

No. of rule(s): 1

Results: 
========
[RelationRecord(items=frozenset({'Butter', 'Bread', 'Milk'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Bread', 'Milk'}), items_add=frozenset({'Butter'}), confidence=0.9375, lift=1.2228260869565217)])]
```
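The raw RelationRecord output is hard to read, so it helps to unpack each record into a one-line rule summary. The namedtuple stand-ins below merely mimic the structure printed above so the sketch is self-contained; with apyori installed, `print_rules(arm_results)` can be called on the real results instead:

```python
from collections import namedtuple

# Stand-ins mirroring the structure apyori prints (see the output above).
RelationRecord = namedtuple('RelationRecord',
                            ['items', 'support', 'ordered_statistics'])
OrderedStatistic = namedtuple('OrderedStatistic',
                              ['items_base', 'items_add', 'confidence', 'lift'])

def print_rules(results):
    # One readable line per rule: antecedent => consequent plus its measures.
    for record in results:
        for stat in record.ordered_statistics:
            print("%s => %s (support=%.4f, confidence=%.4f, lift=%.4f)" % (
                sorted(stat.items_base), sorted(stat.items_add),
                record.support, stat.confidence, stat.lift))

arm_results = [RelationRecord(
    items=frozenset({'Butter', 'Bread', 'Milk'}), support=0.5,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({'Bread', 'Milk'}),
        items_add=frozenset({'Butter'}),
        confidence=0.9375, lift=1.2228260869565217)])]

print_rules(arm_results)
# → ['Bread', 'Milk'] => ['Butter'] (support=0.5000, confidence=0.9375, lift=1.2228)
```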

**Explanation**

The program generates only one rule, based on the user-specified input measures *min_support* = 0.5, *min_confidence* = 0.7, and *min_lift* = 1.2.

The support value for the rule is 0.5. This number is calculated by dividing the number of transactions containing ‘Butter’, ‘Bread’, and ‘Milk’ together by the total number of transactions.

The confidence level for the rule is 0.9375, which shows that out of all the transactions that contain both ‘Bread’ and ‘Milk’, 93.75 % contain ‘Butter’ too.

The lift of 1.22 tells us that ‘Butter’ is 1.22 times more likely to be bought by customers who buy both ‘Bread’ and ‘Milk’ than by customers in general.
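As a cross-check, the three reported measures can be recomputed from the 30 transactions of transaction.csv with plain Python, applying equations (1)–(3) directly:

```python
# The 30 transactions from transaction.csv, with empty cells dropped.
database = [
    ['Juice', 'Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Bread', 'Butter', 'Milk'],
    ['Bread', 'Butter', 'Milk'],
    ['Chips', 'Banana'],
    ['Juice', 'Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Chips', 'Milk'],
    ['Juice', 'Chips', 'Bread', 'Butter', 'Banana'],
    ['Juice', 'Chips', 'Milk'],
    ['Juice', 'Bread', 'Banana'],
    ['Juice', 'Bread', 'Butter', 'Milk'],
    ['Chips', 'Bread', 'Butter', 'Banana'],
    ['Juice', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Chips', 'Bread', 'Butter', 'Milk'],
    ['Juice', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Chips', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Chips', 'Bread', 'Milk', 'Banana'],
    ['Juice', 'Chips'],
    ['Bread', 'Butter', 'Banana'],
    ['Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Chips'],
    ['Bread', 'Butter', 'Banana'],
    ['Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Juice', 'Bread', 'Butter', 'Banana'],
    ['Chips', 'Bread', 'Butter', 'Milk', 'Banana'],
    ['Chips', 'Bread', 'Butter', 'Banana'],
]

N = len(database)                                        # 30 transactions
count = lambda s: sum(1 for t in database if s <= set(t))

antecedent, consequent = {'Bread', 'Milk'}, {'Butter'}
support = count(antecedent | consequent) / N                     # eq. (1)
confidence = count(antecedent | consequent) / count(antecedent)  # eq. (2)
lift = confidence / (count(consequent) / N)                      # eq. (3)

print(round(support, 4), round(confidence, 4), round(lift, 4))
# → 0.5 0.9375 1.2228
```

These values agree with the support, confidence, and lift reported by apyori above.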