We discuss the shortcomings of linear models on data that is far from linearly separable. We then show how to use non-linear feature transforms to create decision boundaries in the shape of balls, polynomial curves, and so on. Finally, we use learning-theoretic arguments to discuss the pitfalls of non-linear transforms.
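As a rough illustration of the feature-transform idea, the sketch below compares a linear classifier on raw coordinates with the same classifier on squared coordinates. Everything in it is an assumption made for illustration and not taken from the lecture: the toy data (two concentric rings), the hypothetical feature map `phi(x1, x2) = (1, x1^2, x2^2)`, and the choice of a plain perceptron as the linear learner.

```python
import math

def raw(x):
    """Baseline features: raw coordinates plus a bias term."""
    x1, x2 = x
    return (1.0, x1, x2)

def phi(x):
    """Hypothetical non-linear feature map: (x1, x2) -> (1, x1^2, x2^2)."""
    x1, x2 = x
    return (1.0, x1 * x1, x2 * x2)

def predict(w, z):
    """Linear classifier sign(w . z); ties are assigned to class -1."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1

def perceptron(features, labels, max_epochs=1000):
    """Plain perceptron; stops early once an epoch makes no mistakes."""
    w = [0.0] * len(features[0])
    for _ in range(max_epochs):
        mistakes = 0
        for z, y in zip(features, labels):
            if predict(w, z) != y:
                # Standard perceptron update on a misclassified point.
                w = [wi + y * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def accuracy(w, features, labels):
    """Fraction of points the linear classifier labels correctly."""
    hits = sum(predict(w, z) == y for z, y in zip(features, labels))
    return hits / len(labels)

# Toy data (assumed): inner ring of radius 0.5 labelled +1,
# outer ring of radius 1.5 labelled -1.
angles = [2 * math.pi * k / 40 for k in range(40)]
X = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
X += [(1.5 * math.cos(t), 1.5 * math.sin(t)) for t in angles]
y = [1] * 40 + [-1] * 40

raw_feats = [raw(x) for x in X]
phi_feats = [phi(x) for x in X]

w_raw = perceptron(raw_feats, y)   # linear boundary in the input space
w_phi = perceptron(phi_feats, y)   # linear in phi-space, curved in the input space

acc_raw = accuracy(w_raw, raw_feats, y)
acc_phi = accuracy(w_phi, phi_feats, y)
print("raw features:", acc_raw)
print("phi features:", acc_phi)
```

In the transformed space the two rings are separated by the hyperplane `1 - z1 - z2 = 0`, which corresponds to the unit circle in the original coordinates, so the perceptron converges there. No straight line separates the rings in the raw coordinates, so the baseline necessarily misclassifies some points.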