Classification Performance Measures

To analyze the performance of classification models, several evaluation metrics are used. These performance measures are described below.


Root-Mean-Squared Error (RMSE)

RMSE is a widely used measure of the difference between the values classified by a classifier and the values actually observed in the system being modeled. The RMSE of a classifier’s estimate e_classified of the modeled variable is the square root of the Mean Squared Error (MSE):

     RMSE = sqrt( [∑k (e_actual,k – e_classified,k)^2] / n )   ———> (1)  [for k = 1, 2, …, n]

where e_actual,k is the actual value and e_classified,k is the classified value for the k-th data record, and n denotes the number of data records in the dataset.
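As a minimal sketch of the definition above (the square root of the mean squared error), the following Python function computes the RMSE of two equal-length sequences. The function name `rmse` and the sample data are illustrative assumptions, not part of the original text.

```python
import math


def rmse(actual, classified):
    """Square root of the mean squared difference between two sequences."""
    if len(actual) != len(classified):
        raise ValueError("sequences must have the same length")
    if len(actual) == 0:
        raise ValueError("sequences must not be empty")
    n = len(actual)
    squared_errors = ((a - c) ** 2 for a, c in zip(actual, classified))
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical data: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0]
classified = [1, 0, 0, 1, 1]
print(rmse(actual, classified))  # two of five records differ: sqrt(2 / 5), about 0.632
```

For class labels coded as 0 and 1, the squared error of each record is either 0 or 1, so the RMSE here is simply the square root of the misclassification rate.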


Kappa Statistic

Cohen’s Kappa statistic, denoted by κ, is a well-known performance metric in statistics. It measures the reliability (agreement) between two raters or judges who classify the same items, corrected for the agreement expected by chance. The value of κ is computed as:

      κ = (prob(O) – prob(C)) / (1 – prob(C))   ———> (2)

Here prob(O) is the observed proportion of agreement between the raters, and prob(C) is the proportion of agreement expected by chance.

The value of κ is at most 1 and can be negative. If κ = 1, the judges agree completely. If κ = 0, the agreement between the judges is no better than what would be expected by chance, and κ < 0 indicates agreement worse than chance.

Cohen suggested the Kappa result should be interpreted as follows:

  • κ ≤ 0 as indicating no agreement
  • κ within 0.01–0.20 as none to slight
  • κ within 0.21–0.40 as fair
  • κ within 0.41–0.60 as moderate
  • κ within 0.61–0.80 as substantial
  • κ within 0.81–1.00 as almost perfect agreement
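The following Python sketch shows one way to evaluate equation (2) for two raters who labeled the same items. The function name `cohen_kappa` and the sample ratings are hypothetical. The chance agreement prob(C) is estimated here from each rater's own label frequencies, which is the usual convention for Cohen's Kappa but is stated as an assumption since the text does not spell it out.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # prob(O): fraction of items on which the two raters gave the same label.
    prob_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # prob(C): chance agreement, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    prob_c = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if prob_c == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (prob_o - prob_c) / (1 - prob_c)


# Hypothetical ratings of 50 items by two raters.
rater_a = ["yes"] * 25 + ["no"] * 25
rater_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
print(cohen_kappa(rater_a, rater_b))  # prob(O) = 0.7, prob(C) = 0.5, so about 0.4
```

In this example the raters agree on 35 of 50 items, but half of that agreement would be expected by chance alone, so κ is considerably lower than the raw agreement of 0.7.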


Confusion Matrix

In machine learning, a confusion matrix is a table that summarizes a classification algorithm’s performance. It permits a more thorough analysis than accuracy alone. Each column of the matrix represents the patterns in a predicted class, while each row represents the patterns in an actual class. Table 1 below displays the confusion matrix for a two-class classifier with the following entries:

  • true positive (tp) is the number of ‘positive’ patterns classified as ‘positive.’
  • false positive (fp) is the number of ‘negative’ patterns classified as ‘positive.’
  • false negative (fn) is the number of ‘positive’ patterns classified as ‘negative.’
  • true negative (tn) is the number of ‘negative’ patterns classified as ‘negative.’

Table 1: A confusion matrix for a two-class classifier

                                 Predicted Class
                            Positive        Negative
Actual Class   Positive     tp              fn
               Negative     fp              tn

Several standard terms are defined from the entries of a two-class confusion matrix. They are described below one by one.


accuracy:

The accuracy is the number of correctly classified examples divided by the total number of examples, as calculated using the equation:

accuracy = (tp + tn) / (tp + tn + fp + fn)   ———> (3)
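To make the four counts and equation (3) concrete, the sketch below tallies tp, fp, fn, and tn from two label sequences and then computes the accuracy. The function names, the `positive` parameter, and the sample labels are illustrative assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a two-class problem."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive pattern classified as positive
        elif a != positive and p == positive:
            fp += 1  # negative pattern classified as positive
        elif a == positive and p != positive:
            fn += 1  # positive pattern classified as negative
        else:
            tn += 1  # negative pattern classified as negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Equation (3): correctly classified examples over all examples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one example is required")
    return (tp + tn) / total


# Hypothetical labels: 1 = positive, 0 = negative.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)            # 3 1 1 3
print(accuracy(tp, fp, fn, tn))  # 0.75
```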

tp-rate / recall:

The tp-rate or recall is the proportion of positive examples that are correctly classified as positive, as calculated using the equation:

tp-rate = recall = tp / (tp + fn)   ———> (4)


fp-rate:

The fp-rate is the proportion of negative examples that are incorrectly classified as positive, as calculated using the equation:

fp-rate = fp / (fp + tn)   ———> (5)
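Equations (4) and (5) each normalize by one row of the confusion matrix: the tp-rate by the actual positives and the fp-rate by the actual negatives. A small sketch, with hypothetical function names and counts:

```python
def tp_rate(tp, fn):
    """Equation (4): share of actual positives classified as positive."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("tp-rate is undefined without actual positive examples")
    return tp / positives


def fp_rate(fp, tn):
    """Equation (5): share of actual negatives classified as positive."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("fp-rate is undefined without actual negative examples")
    return fp / negatives


# Hypothetical counts: 50 actual positives and 50 actual negatives.
tp, fn, fp, tn = 40, 10, 5, 45
print(tp_rate(tp, fn))  # 40 of 50 positives found: 0.8
print(fp_rate(fp, tn))  # 5 of 50 negatives misclassified: 0.1
```

Because the two rates use disjoint denominators, neither one changes when the balance between positive and negative examples in the data changes.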


precision:

The precision is the proportion of examples predicted as positive that are actually positive, as calculated using the following equation:

precision = tp / (tp + fp)   ———> (6)


f-measure:

In some situations, high precision may be more important, while in others high recall may matter more. In most applications, however, one should try to improve both values. The f-measure combines them into a single value, usually expressed as their harmonic mean:

 f-measure = (2∗ precision∗ recall) / (precision + recall)   ———> (7)
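The sketch below evaluates equations (4), (6), and (7) from hypothetical counts. The function names are illustrative, and returning 0.0 when precision and recall are both zero is an assumed convention, since equation (7) is undefined in that case.

```python
def precision(tp, fp):
    """Equation (6): share of predicted positives that are actually positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return tp / predicted_positives


def recall(tp, fn):
    """Equation (4): share of actual positives classified as positive."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("recall is undefined without actual positive examples")
    return tp / positives


def f_measure(precision_value, recall_value):
    """Equation (7): harmonic mean of precision and recall."""
    if precision_value + recall_value == 0:
        return 0.0  # assumed convention; the formula is undefined here
    return 2 * precision_value * recall_value / (precision_value + recall_value)


# Hypothetical counts.
tp, fp, fn = 40, 5, 10
p = precision(tp, fp)   # 40 / 45
r = recall(tp, fn)      # 40 / 50
print(f_measure(p, r))  # equals 2*tp / (2*tp + fp + fn) = 80 / 95, about 0.842

# The harmonic mean is pulled toward the smaller of the two values.
print(f_measure(1.0, 0.1))  # about 0.18, far below the arithmetic mean of 0.55
```

The second call illustrates why the harmonic mean is used: a classifier cannot obtain a high f-measure by maximizing only one of precision or recall.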