Demystifying the Kernel Trick: A big-picture view of kernelized support vector machines
Disclaimer: This is a high-level, intuitive overview of the topic. It does not deal with all the technical subtleties.
Let's start with a one-dimensional binary classification problem. Here is a set of red and green points that lie along the x-axis.
If we give this problem to a linear support vector machine, it has no trouble separating the red and green points by drawing a decision boundary.
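Here is a minimal sketch of that easy case, using scikit-learn and hypothetical 1-D data in which every red point sits to the left of every green point:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 1-D data: reds on the left, greens on the right.
X = np.array([[-3.0], [-2.0], [-1.5], [1.0], [2.0], [3.5]])  # one feature per row
y = np.array([0, 0, 0, 1, 1, 1])                             # 0 = red, 1 = green

clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))  # 1.0: a single threshold on x separates the classes
```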
Now imagine a scenario in which the linear support vector machine would have a hard time separating the two classes.
Can you figure out how to draw a line such that the points of the two classes are separated?
In this scenario the classes are no longer linearly separable! This is where the kernel trick comes into play.
The idea is to transform the input data from a one-dimensional space to a two-dimensional space.
We feed the input data points into the function f(x) = x². The figure above shows the mapping between the original data points and the points after applying the function. Once transformed, the data points are clearly linearly separable.
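Here is a sketch of that lift in code, with hypothetical numbers (green points clustered near zero, red points further out) so that no single threshold on x works, but a straight line in the new (x, x²) space does:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D data: green points near the origin, red points on both sides.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])        # 0 = red (outer), 1 = green (inner)

# Explicitly lift each point x into the 2-D space (x, x^2).
X_mapped = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear").fit(X_mapped, y)
print(clf.score(X_mapped, y))              # 1.0: linearly separable after the lift
```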
The function f(x) = x² is the mapping that lifts the data into the new space (loosely referred to here as the 'kernel'); strictly speaking, the kernel is the function that computes dot products between points in this transformed space without ever constructing the space explicitly.
In practice, more sophisticated kernels are used: the Gaussian (radial basis function) kernel and the polynomial kernel, to name a few.
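For intuition, these kernels can be written out directly. The gamma, degree and coef0 values below are arbitrary choices for illustration:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian / radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: (x . z + coef0) ** degree."""
    return (np.dot(x, z) + coef0) ** degree

x = np.array([1.0, 2.0])
z = np.array([2.0, 1.0])
print(rbf_kernel(x, z), polynomial_kernel(x, z))
```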
If we look back at the original data points, we find that they are separated by a parabolic, i.e. non-linear, decision boundary.
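The real trick is that we never have to build the transformed space ourselves. As a sketch (with the same hypothetical data as above), handing the original 1-D points to scikit-learn's SVC with a degree-2 polynomial kernel yields exactly this kind of non-linear boundary, with the mapping to x² handled implicitly by the kernel:

```python
import numpy as np
from sklearn.svm import SVC

# Same hypothetical 1-D data: green points inside, red points outside.
X = np.array([[-3.0], [-2.5], [-0.5], [0.0], [0.5], [2.5], [3.0]])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Degree-2 polynomial kernel: the lift to x^2 never happens explicitly.
clf = SVC(kernel="poly", degree=2, coef0=1.0).fit(X, y)
print(clf.predict([[-2.0], [0.2], [2.0]]))  # expected [0 1 0]: inner vs. outer regions
```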
Kernelized support vector machines can go beyond linear decision boundaries, which makes them more complex but also more applicable to real-world scenarios.