# Demystifying the Kernel Trick: A big picture of kernelized support vector machines

Disclaimer: This is a high-level, intuitive overview of the topic. This article does not deal with all the technical subtleties.

Let's start with a one-dimensional binary classification problem: a set of red and green points that lie along the X axis.

If we give this problem to a linear support vector machine, it will have no trouble separating the two classes of red and green balls by drawing a decision boundary.
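As a minimal sketch of this step, here is a linear SVM fit on a one-dimensional dataset using scikit-learn (the specific point values are illustrative, not from the article's figure):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Illustrative data: red points on the left, green points on the right
X = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = red, 1 = green

# A linear SVM easily finds a decision boundary near x = 0
clf = LinearSVC()
clf.fit(X, y)
print(clf.predict([[-2.5], [2.5]]))  # → [0 1]
```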

Now imagine a scenario in which the linear support vector machine would have a hard time separating these two classes.

Can you figure out how to draw a line such that the balls of the two classes are separated?

In this scenario the classes are no longer linearly separable! This is where the kernel trick comes into play.

The idea is to transform the input data from a one-dimensional space to a two-dimensional space.

We feed the input data points into a function f(x) = x². The figure above shows the mapping between the original data points and the points after applying the function. After the transformation, it is evident that the data points are linearly separable.
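The transformation can be sketched in code. The points below are illustrative: the red points surround the green ones on the X axis, so no single threshold on x separates them, but after adding the x² coordinate a linear boundary works:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Red points surround the green ones: not linearly separable in 1D
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])  # 0 = red, 1 = green

# Map each point to (x, x^2); the second coordinate separates the classes
X_mapped = np.column_stack([x, x ** 2])
clf = LinearSVC().fit(X_mapped, y)
print(clf.score(X_mapped, y))  # → 1.0 (separable after the mapping)
```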

Strictly speaking, the function f(x) = x² is the mapping, often called the feature map. The 'kernel' is the function that computes inner products between mapped points directly, which lets the SVM work in the higher-dimensional space without ever materializing the transformed coordinates.
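In practice, libraries apply the trick for you: a kernelized SVM takes a kernel function rather than explicitly mapped points. As a sketch, scikit-learn's SVC with a degree-2 polynomial kernel implicitly performs a squared-style mapping like the one above (the data is the same illustrative set, not from the article):

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 0, 0])  # 0 = red, 1 = green

# The degree-2 polynomial kernel computes inner products in the
# quadratic feature space via the kernel trick; no explicit mapping
clf = SVC(kernel="poly", degree=2, coef0=1)
clf.fit(x, y)
print(clf.score(x, y))  # → 1.0
```

Note that we never build the (x, x²) array here: the kernel evaluates the inner products of the mapped points directly, which is the whole point of the trick.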