#### Case Study – Apriori algorithm

Apriori is an Association Rule Mining (ARM) algorithm for mining boolean association rules. It takes its name from its use of prior knowledge about frequent itemsets: by the Apriori property, every nonempty subset of a frequent itemset must also be frequent. At each iteration the algorithm performs two steps: candidate generation and pruning.
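The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the code used by any particular library: the function names `generate_candidates` and `prune_candidates` and the frequent 2-itemsets are hypothetical, chosen only to show how the Apriori property removes candidates.

```python
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset."""
    candidates = set()
    for a, b in combinations(list(frequent_prev), 2):
        union = a | b
        if len(union) == k:
            candidates.add(union)
    return candidates

def prune_candidates(candidates, frequent_prev, k):
    """Prune step: keep a candidate only if all its (k-1)-subsets are frequent."""
    return {
        c for c in candidates
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    }

# Hypothetical frequent 2-itemsets, used only for illustration
frequent_2 = {
    frozenset({"Bread", "Butter"}),
    frozenset({"Bread", "Milk"}),
    frozenset({"Butter", "Milk"}),
    frozenset({"Bread", "Juice"}),
}

candidates_3 = generate_candidates(frequent_2, 3)
pruned_3 = prune_candidates(candidates_3, frequent_2, 3)
print(len(candidates_3))  # three candidate 3-itemsets are generated
print(pruned_3)           # only {Bread, Butter, Milk} survives pruning
```

Here {Bread, Butter, Juice} and {Bread, Milk, Juice} are discarded without scanning the database, because each contains a 2-itemset that is not frequent.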

In general, an association rule is an expression of the form X⇒Y, where X, Y ⊆ I and I is the set of all items. Here, X is called the antecedent and Y is called the consequent. An association rule describes how often Y occurs in transactions that already contain X. A rule is reported only if it meets the user-specified minimum support (s) and minimum confidence (c).

#### ARM Measures

Support: The support of the rule X⇒Y in the transaction database D is the support of the itemset X ∪ Y in D:

support(X⇒Y) = count(X ∪ Y) / N  –––> (1)

where ‘N’ is the total number of transactions in the database and count(X ∪ Y) is the number of transactions that contain X ∪ Y.

Confidence: The confidence of the rule X⇒Y in the transaction database D is the ratio of the number of transactions in D that contain X ∪ Y to the number of transactions in D that contain X:

confidence(X⇒Y) = count(X ∪ Y) / count(X) = support(X ∪ Y) / support(X)   –––> (2)

It denotes the conditional probability P(Y|X).
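Equations (1) and (2) translate directly into code. The sketch below uses a small, made-up list of four transactions (not the article's dataset), and the helper names `count`, `support`, and `confidence` are illustrative only.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Equation (1): fraction of transactions containing the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """Equation (2): count(X ∪ Y) / count(X)."""
    return count(x | y, transactions) / count(x, transactions)

# Hypothetical toy transactions, for illustration only
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Bread", "Milk"},
    {"Milk", "Juice"},
]

x, y = {"Bread"}, {"Butter"}
print(support(x | y, transactions))    # 2 of 4 transactions contain both items
print(confidence(x, y, transactions))  # 2 of the 3 Bread transactions contain Butter
```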

Lift: Confidence alone ignores how common Y is in the database as a whole. The lift of the rule X⇒Y, an interestingness measure, takes this into account by incorporating the prior probability of the rule consequent as follows:

lift(X⇒Y) = support(X ∪ Y) / (support(X) ∗ support(Y))   –––> (3)

The measure ‘lift’ is newly introduced in this context. Its significance in ARM is as follows:

• lift(X⇒Y) = 1 means that there is no correlation between X and Y,
• lift(X⇒Y) > 1 means that there is a positive correlation between X and Y, and
• lift(X⇒Y) < 1 means that there is a negative correlation between X and Y.

The further the lift value lies above 1, the stronger the positive association. We will use this measure in our experiment.
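As a small sketch of equation (3) and the three cases listed above, the snippet below computes lift from raw counts and labels the result. The function names `lift` and `interpret`, the tolerance `tol`, and the counts are all hypothetical and serve only as an illustration.

```python
def lift(count_xy, count_x, count_y, n):
    """Equation (3): support(X ∪ Y) / (support(X) * support(Y))."""
    support_xy = count_xy / n
    support_x = count_x / n
    support_y = count_y / n
    return support_xy / (support_x * support_y)

def interpret(value, tol=1e-9):
    """Map a lift value to one of the three cases; `tol` absorbs float rounding."""
    if abs(value - 1.0) <= tol:
        return "no correlation"
    return "positive correlation" if value > 1.0 else "negative correlation"

# Hypothetical counts: 100 transactions, X in 40 of them, Y in 50 of them
for count_xy in (20, 30, 10):
    value = lift(count_xy, 40, 50, 100)
    print(count_xy, round(value, 2), interpret(value))
```

With X in 40% and Y in 50% of the transactions, independence would put both items together in 20% of them, so a joint count of 20 gives a lift of 1, while 30 and 10 fall above and below it.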

#### Dataset Description

The following dataset (transaction.csv) contains the transactional records of a departmental store on a particular day. The dataset has 30 records and covers six items: Juice, Chips, Bread, Butter, Milk, and Banana. A snapshot of the dataset, viewed in MS Excel, is given below.

transaction.csv

#### Python Environment Setup

Before we start coding, we need to install the ‘apyori’ module first.

`pip install apyori`

This step is required because the ‘apriori’ function is provided by the ‘apyori’ module.

#### Implementation of Apriori algorithm

We provide here an implementation of the Apriori algorithm in Python. The objective is to discover the association rules whose support, confidence, and lift are greater than or equal to min_support, min_confidence, and min_lift, respectively. See the code below.

arm.py

```python
# Step 1: Import the libraries
import pandas as pd
from apyori import apriori

# Step 2: Load the dataset (the file has no header row)
df = pd.read_csv('transaction.csv', header=None)

# Step 3: Display statistics of records
print("Display statistics: ")
print("===================")
print(df.describe())

# Step 4: Display shape of the dataset
print("\nShape:", df.shape)

# Step 5: Convert dataframe into a nested list (one list per transaction).
# Empty cells are skipped so that 'nan' is not treated as an item.
database = []
for row in df.values:
    database.append([str(item) for item in row if pd.notna(item)])

# Step 6: Develop the Apriori model
arm_rules = apriori(database, min_support=0.5, min_confidence=0.7, min_lift=1.2)
arm_results = list(arm_rules)

# Step 7: Display the number of rule(s)
print("\nNo. of rule(s):", len(arm_results))

# Step 8: Display the rule(s)
print("\nResults: ")
print("========")
print(arm_results)
```

Output:

```
Display statistics: 
===================
            0      1      2       3     4       5
count      19     18     23      23    20      22
unique      1      1      1       1     1       1
top     Juice  Chips  Bread  Butter  Milk  Banana
freq       19     18     23      23    20      22

Shape: (30, 6)

No. of rule(s): 1

Results: 
========
[RelationRecord(items=frozenset({'Butter', 'Bread', 'Milk'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Bread', 'Milk'}), items_add=frozenset({'Butter'}), confidence=0.9375, lift=1.2228260869565217)])]
```

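The raw record is hard to read at a glance. One possible way to print each rule on its own line is sketched below. To keep the snippet runnable without the library, it uses stand-in namedtuples that mirror the field names visible in the printed output; this assumes the real apyori records expose the same attributes. The helper `format_rules` and the variable `sample_results` are hypothetical names.

```python
from collections import namedtuple

# Stand-ins mirroring the field names in the printed output (assumption:
# apyori's own records expose the same attributes).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

def format_rules(results):
    """Return one readable line per rule: antecedent => consequent plus measures."""
    lines = []
    for record in results:
        for stat in record.ordered_statistics:
            antecedent = ", ".join(sorted(stat.items_base))
            consequent = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{{{antecedent}}} => {{{consequent}}} | "
                f"support={record.support:.2f}, "
                f"confidence={stat.confidence:.4f}, lift={stat.lift:.4f}"
            )
    return lines

# The single record from the output, rebuilt with the stand-in types
sample_results = [
    RelationRecord(
        items=frozenset({"Butter", "Bread", "Milk"}),
        support=0.5,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"Bread", "Milk"}),
                items_add=frozenset({"Butter"}),
                confidence=0.9375,
                lift=1.2228260869565217,
            )
        ],
    )
]

for line in format_rules(sample_results):
    print(line)
```

If the attribute names match, the list returned by the program (`arm_results`) could be passed to `format_rules` in place of `sample_results`.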
##### Explanation

The program generates only one rule, {Bread, Milk} ⇒ {Butter}, for the user-specified thresholds min_support = 0.5, min_confidence = 0.7, and min_lift = 1.2.

The support of the rule is 0.5. This number is calculated by dividing the number of transactions containing ‘Butter’, ‘Bread’, and ‘Milk’ (15) by the total number of transactions (30).

The confidence of the rule is 0.9375, which shows that of all the transactions that contain both ‘Bread’ and ‘Milk’, 93.75% also contain ‘Butter’.

The lift of 1.22 tells us that customers who buy both ‘Bread’ and ‘Milk’ are 1.22 times as likely to buy ‘Butter’ as customers in general.
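The reported values can be cross-checked against equations (1) to (3) using only numbers from the output. In the sketch below, the count of 23 for ‘Butter’ comes from the statistics table, while the counts 15 and 16 are not printed by the program: they are back-calculated from support = 0.5 and confidence = 0.9375 with N = 30, so treat them as derived values.

```python
from fractions import Fraction

n = 30                 # total number of transactions (from Shape)
count_rule = 15        # derived: support 0.5 * 30 transactions
count_antecedent = 16  # derived: 15 / 0.9375 transactions with Bread and Milk
count_butter = 23      # from the statistics table (column 3)

support = Fraction(count_rule, n)                    # equation (1)
confidence = Fraction(count_rule, count_antecedent)  # equation (2)
# Equation (3) rewritten as confidence(X => Y) / support(Y)
lift = confidence / Fraction(count_butter, n)

print(float(support))     # 0.5
print(float(confidence))  # 0.9375
print(float(lift))        # approximately 1.2228, matching the reported lift
```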