The same idea can be applied in Fourier instead of Polynomial space: A data point with given value x (e.g. from a time series) is transformed into Fourier feature space by evaluating at point x all sine and cosine functions that constitute the Fourier basis. The result is stored in a large 'feature' vector. The value y at x is a linear combination of the evaluated sine and cosine functions. i.e. the dot product of a coefficient vector with the feature vector. The linear regression models can now find the coefficient vector that best predicts the data points y for given x.

Seeing the Fourier transform from this perspective has the advantage that a plethora of linear regression models can be used to fit the data and to find the coefficients of the Fourier Basis (the spectrum). The following image demonstrates how a simple sine wave with outliers can be accurately fit using the robust linear estimators that are implemented in scikit-learn. The short code can be found here.

Another useful application of custom non-linear features in Scikit-Learn are B-Splines. B-Splines are built from non-linear piecewise basis functions. To use them in Scikit-Learn, we need to build a Custom Feature Transformer class that transforms the single feature x to the feature vector of B-Spline basis functions evaluated at x, as in the case of the Fourier transform. This Feature Transformer can be pipelined with regression models to build the robust spline regression. The results of this are shown in the following image, and the code is located here.

In a similar manner, 2D Bsplines can be used to regress images with outliers. For the next example, we want to use a 2d spline basis to a fit a Gaussian function that is sampled at random intervals. 10% of the data points are set to a value of 10 as outliers. We then run the same estimators as above to find the spline coefficients that best fit the image. Again, the robust estimators manage to fit the original input function fairly well, whereas the Least-Squares fit is strongly perturbed by the outliers. The code for this example is available here.

## No comments:

## Post a Comment