Support Vector Machine (SVM)

Support Vector Machine (SVM) is a promising model for classifying both linear and nonlinear data. SVM uses a nonlinear mapping to transform the original training data into a higher dimension. In this new dimension it searches for the optimal linear separating hyperplane. A hyperplane is a decision boundary that separates two classes. Support vectors are the essential training tuples in the training dataset: they are the tuples that determine the margins. With an appropriate nonlinear mapping to a sufficiently high dimension, two classes can be separated by a hyperplane, which SVM finds using the support vectors and the margins they define. Training an SVM can be slow, but SVMs are highly accurate because of their ability to model nonlinear decision boundaries. This is why SVM has been selected to classify the given dataset.

Linear SVM

To explain linear SVM, let a dataset D be given as [(X₁, y₁), (X₂, y₂), …, (X_|D|, y_|D|)], where X_i is a training tuple with the corresponding class label y_i. Each class label y_i takes the value +1 or –1. The task is then to find a separating hyperplane. An infinite number of hyperplanes can separate the two classes. SVM searches for the one with the largest margin, the maximum marginal hyperplane (MMH), as illustrated below in Figure 1.

Figure 1: An SVM showing the maximum marginal hyperplane between two classes

Here, Figure 1 considers the possible hyperplanes and their associated margins. A margin is the shortest distance from the hyperplane to one of its sides, where each side is parallel to the hyperplane. For accurate classification it is desirable to choose the hyperplane with the maximum possible distance between the two sides of the margin. A separating hyperplane can be given as:

W.X – b = 0 ———> (1)

where W is the weight vector and b is the bias. With W and b scaled appropriately, the tuples of the two classes lie on or beyond the margin boundaries H₁ and H₂:

H₁: W.X – b ≥ 1 for y_i = +1 and H₂: W.X – b ≤ –1 for y_i = –1 ———> (2)

Thus the tuples that fall on or above H₁ belong to the first class, and the tuples that fall on or below H₂ belong to the second class. Any tuples that fall exactly on H₁ or H₂ are the support vectors. It is important to note that the hyperplane is positioned midway between the two margins. The distance from the hyperplane to either H₁ or H₂ is 1 / ||W||, so the maximum margin is 2 / ||W||, where ||W|| is the Euclidean norm of W, that is √(W.W).
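
The quantities above can be checked on a small example. The following is a minimal sketch in plain Python, assuming a hypothetical two-dimensional toy dataset and hand-picked values W = (1, 1) and b = 3; these values are illustrative and are not the output of a trained SVM. It evaluates W.X – b for each tuple, identifies the tuples that lie exactly on H₁ or H₂, and computes the margin 2 / ||W||.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(w, b, x):
    """Signed score W.X - b, the left-hand side of equation (1)."""
    return dot(w, x) - b


def classify(w, b, x):
    """Assign +1 to tuples on the positive side of the hyperplane, else -1."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between H1 and H2, that is 2 / ||W||."""
    return 2.0 / math.sqrt(dot(w, w))


def support_vectors(w, b, data, tol=1e-9):
    """Tuples lying exactly on H1 or H2, i.e. y * (W.X - b) == 1."""
    return [x for x, y in data if abs(y * decision_value(w, b, x) - 1.0) <= tol]


# Hypothetical toy data: (tuple, class label) pairs.
D = [([0.0, 0.0], -1), ([1.0, 1.0], -1), ([2.0, 2.0], +1), ([3.0, 3.0], +1)]

# Hand-picked hyperplane x1 + x2 - 3 = 0 (an assumption for illustration).
W = [1.0, 1.0]
B = 3.0

if __name__ == "__main__":
    for x, y in D:
        print(x, "label", y, "score", decision_value(W, B, x))
    print("support vectors:", support_vectors(W, B, D))
    print("margin:", margin_width(W))
```

In this toy case the tuples (1, 1) and (2, 2) give scores of –1 and +1, so they sit on H₂ and H₁ respectively, and the margin equals the distance between them.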

Nonlinear SVM

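There are several nonlinear versions of SVM available. All of these are kernel based. The nonlinear version of an SVM can be represented by using a kernel function K, which computes the dot product of two tuples after they have been transformed by the nonlinear mapping φ: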

K(X_i, X_j) = φ(X_i) .φ(X_j) ———> (3)
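
To make equation (3) concrete, the sketch below compares an explicit mapping φ with the corresponding kernel computed directly on the original tuples. It assumes two-dimensional inputs and a polynomial kernel of degree 2; the function names phi and poly2_kernel and the sample tuples are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit mapping of a 2-D tuple into 6 dimensions.

    This particular mapping is the one whose dot product equals
    (x.z + 1)^2 for 2-D inputs.
    """
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel evaluated in the original 2-D space."""
    return (dot(x, z) + 1.0) ** 2


# Hypothetical sample tuples.
x = [1.0, 2.0]
z = [3.0, -1.0]

explicit = dot(phi(x), phi(z))   # dot product in the higher dimension
implicit = poly2_kernel(x, z)    # same value without computing phi

if __name__ == "__main__":
    print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is why the kernel can stand in for the mapping: the higher-dimensional dot product is obtained without ever constructing φ(X) explicitly.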

1. Polynomial kernel:

For SVM, a polynomial kernel of degree d is defined as:

K(X_i, X_j) = (X_i.X_j + 1)^d ———> (4)

2. Gaussian radial basis function kernel:

For SVM, a Gaussian radial basis function kernel is defined as:

K(X_i, X_j) = e^{–||X_i – X_j||² / (2σ²)} ———> (5)

3. Sigmoid kernel:

For SVM, a sigmoid kernel is defined as:

K(X_i, X_j) = tanh(κ X_i.X_j – δ) ———> (6)
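
The three kernels in equations (4) to (6) can be written as short functions. This is a minimal sketch in plain Python rather than the implementation of any particular library; the function names and the default values for d, sigma, kappa and delta are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(x, z, d=2):
    """Equation (4): (X_i.X_j + 1)^d."""
    return (dot(x, z) + 1.0) ** d


def gaussian_rbf_kernel(x, z, sigma=1.0):
    """Equation (5): exp(-||X_i - X_j||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x, z, kappa=1.0, delta=0.0):
    """Equation (6): tanh(kappa * X_i.X_j - delta)."""
    return math.tanh(kappa * dot(x, z) - delta)


if __name__ == "__main__":
    # Hypothetical sample tuples.
    a = [1.0, 2.0]
    b = [3.0, -1.0]
    print("polynomial:", polynomial_kernel(a, b, d=2))
    print("gaussian  :", gaussian_rbf_kernel(a, b, sigma=1.0))
    print("sigmoid   :", sigmoid_kernel(a, b, kappa=1.0, delta=0.0))
```

The Gaussian kernel returns 1 when the two tuples are identical and decays toward 0 as they move apart, with σ controlling how quickly; the polynomial and sigmoid kernels instead depend on the dot product X_i.X_j.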