Demystifying the Kernel Trick: A big picture of kernelized support vector machines

Disclaimer: This is a very high-level, intuitive overview of the topic. This article does not deal with all the technical subtleties.

Let's start with a one-dimensional binary classification problem. Here is a set of red and green points that lie along the X axis.

If we give this problem to a linear support vector machine, it will have no trouble separating the red and green balls by drawing a decision boundary between the two classes.
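Here is a minimal sketch of that easy case, assuming scikit-learn and a handful of made-up 1-D points (the exact values are only illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative 1-D points: greens on the left of the axis, reds on the right
X = np.array([-3.0, -2.0, -1.5, 1.0, 2.0, 3.5]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = green, 1 = red

# A linear SVM finds a single threshold on the axis that separates the classes
clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))  # 1.0
```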

Now imagine a scenario in which the linear support vector machine would have a hard time separating these two classes.

Can you figure out how to draw a line such that the balls of the two classes are separated?

In this scenario the classes are no longer linearly separable! This is where the kernel trick comes into play.

The idea is to transform the input data from a 1-dimensional space to a 2-dimensional space.

We feed the input data points into a function f(x) = x². The figure above shows the mapping between the original data points and the points after applying the function. Once the data is mapped into this new space, it is evident that the points are linearly separable.
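The sketch below, again assuming scikit-learn and made-up data, shows the same idea in code: a linear SVM fails on the raw 1-D points but separates them perfectly once each point x is mapped to (x, x²):

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative 1-D data: one class near the origin, the other further out on both sides
X_1d = np.array([-4.0, -3.0, -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5, 3.0, 4.0])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1])  # 0 = green, 1 = red

# A linear SVM on the raw 1-D points cannot separate the classes
linear_1d = SVC(kernel="linear").fit(X_1d.reshape(-1, 1), y)
print(linear_1d.score(X_1d.reshape(-1, 1), y))  # noticeably below 1.0

# Map each point x to (x, x^2): the 1-D input becomes 2-D
X_2d = np.column_stack([X_1d, X_1d ** 2])

# The same linear SVM now separates the classes perfectly in the transformed space
linear_2d = SVC(kernel="linear").fit(X_2d, y)
print(linear_2d.score(X_2d, y))  # 1.0
```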

The function f(x) = x² is our mapping into the new space (often called a feature map); the associated ‘kernel’ is the function that computes dot products between mapped points, which lets the SVM work in the new space without ever computing the mapping explicitly.
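To make this concrete, here is a tiny sketch using the classic degree-2 polynomial kernel and its feature map (a standard textbook pair, not part of the example above): the kernel evaluated on the raw inputs gives exactly the dot product of the explicitly mapped points.

```python
import numpy as np

# Feature map for 2-D inputs: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

# Degree-2 polynomial kernel: k(x, z) = (x . z)^2
def poly_kernel(x, z):
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Two routes to the same number: explicit mapping vs. kernel on the raw inputs
print(np.dot(phi(x), phi(z)))  # 121.0
print(poly_kernel(x, z))       # 121.0
```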

In practice, more sophisticated kernels are used; the polynomial kernel and the Gaussian kernel (also known as the radial basis function, or RBF, kernel) are among the most common.
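As an illustration (assuming scikit-learn and a synthetic two-circles dataset), the sketch below fits SVMs with an RBF kernel and a polynomial kernel to data that no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: one class inside a circle, the other outside -- not linearly separable
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gaussian / RBF kernel: similarity decays with squared distance, controlled by gamma
rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X_train, y_train)
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))

# A degree-2 polynomial kernel also captures this circular boundary
poly_svm = SVC(kernel="poly", degree=2, C=1.0).fit(X_train, y_train)
print("Polynomial kernel accuracy:", poly_svm.score(X_test, y_test))
```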

If we look back at the original data points, we find that they are now separated by a non-linear decision rule: the straight line in the transformed space corresponds to a quadratic (parabolic) decision function on the original axis.

Kernelized support vector machines can go beyond linear decision boundaries, which makes them more complex but also far more applicable to real-world problems.
