Saturday, May 26, 2018

Execute Code in Jupyter Kernels through HTTP Requests and Websockets

My last post demonstrated how to interact directly with a Jupyter kernel via the jupyter_client Python module. Communication was done via the ZeroMQ messaging protocol through a number of specific ports.

If we want to communicate with the kernel from a web application, as the Jupyter Notebook app does, there is a simpler way: the communication API is exposed as HTTP endpoints and websocket connections. Any client website or web server can then talk directly to a Jupyter kernel. The Jupyter application that provides this API is a Tornado server called kernelgateway. For our purposes, it can be thought of as a Jupyter Notebook without the notebook client itself.

The kernelgateway can be installed with conda install -c conda-forge jupyter_kernel_gateway or pip install jupyter_kernel_gateway.

The default API of the kernelgateway is described in its swagger.yaml specification. It is possible to add your own custom API on top of the preexisting one (a topic for another blog post).

Starting Kernels via Bash and Curl

The API can be explored with this bash snippet, which (a) starts a kernelgateway, (b) starts a new kernel via a POST request, and (c) gets a list of running kernels via a GET request:

When executed, this produces:

[...]/kernel_gateway_test$ ./ 

[KernelGatewayApp] Jupyter Kernel Gateway at

{"version": "5.4.0"}[I 180526 11:02:10 web:2064] 200 GET /api ( 1.40ms

==== START KERNEL ====
[KernelGatewayApp] Kernel started: 4d427307-10cb-44e8-a2ae-54dbd47fe05d
{"id": "4d427307-10cb-44e8-a2ae-54dbd47fe05d", "name": "python3", "last_activity": "2018-05-26T09:02:10.639301Z", "execution_state": "starting", "connections": 0}[I 180526 11:02:10 web:2064] 201 POST /api/kernels ( 456.17ms

[{"id": "4d427307-10cb-44e8-a2ae-54dbd47fe05d", "name": "python3", "last_activity": "2018-05-26T09:02:10.639301Z", "execution_state": "starting", "connections": 0}][I 180526 11:02:10 web:2064] 200 GET /api/kernels ( 0.59ms

==== ALL DONE ====
[KernelGatewayApp] Received signal to terminate.
[KernelGatewayApp] Kernel shutdown: 4d427307-10cb-44e8-a2ae-54dbd47fe05d
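The bash snippet itself is not reproduced on this page. For readers who prefer Python, here is a minimal sketch of the same requests (start, list, shut down) using only the standard library.

It rests on a few assumptions:
- The gateway listens on http://localhost:8888, the address that appears in the aiohttp output further down.
- For the listing to work, the gateway was started with --JupyterWebsocketPersonality.list_kernels=True, as in the terminal1 command of the next section. Kernel listing is disabled by default in the kernel gateway.
- The helper names are hypothetical.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8888"  # assumption: address of the running kernel gateway


def _request(url, method="GET", payload=None):
    """Send an HTTP request to the gateway and decode the JSON response, if there is one."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def start_kernel(kernel_name="python3", base_url=GATEWAY_URL):
    """POST /api/kernels: start a new kernel and return its description (id, name, ...)."""
    return _request(base_url + "/api/kernels", method="POST", payload={"name": kernel_name})


def list_kernels(base_url=GATEWAY_URL):
    """GET /api/kernels: list the running kernels (needs list_kernels=True on the gateway)."""
    return _request(base_url + "/api/kernels")


def shutdown_kernel(kernel_id, base_url=GATEWAY_URL):
    """DELETE /api/kernels/<id>: stop a kernel; returns None when the response body is empty."""
    return _request(base_url + "/api/kernels/" + kernel_id, method="DELETE")


def kernel_ids(kernels):
    """Extract the kernel ids from a decoded /api/kernels response."""
    return [k["id"] for k in kernels]
```

With a gateway running, start_kernel() should return a dictionary like the one in the POST /api/kernels log line above. kernel_ids(list_kernels()) should return the ids of all running kernels.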

Starting Kernels and Executing Code via Websockets and Python aiohttp

The same communication API can be used from Python with the aiohttp module and asynchronous HTTP requests. Additionally, the aiohttp client can connect to the gateway via a websocket (ws). The websocket connection opens a channel directly to the kernel. Through it we can send messages such as execute requests and receive messages from the kernel such as execute results. This is demonstrated in this script (output below):

Running the kernel gateway in terminal1 and the above script in terminal2 produces:

terminal1$  /gateway_aiohttp/jupyter kernelgateway --JupyterWebsocketPersonality.list_kernels=True

[KernelGatewayApp] Jupyter Kernel Gateway at
[I 180526 .... web:2064] 200 GET /api/kernels ( 0.78ms
[KernelGatewayApp] Kernel started: 16582749-b11e-4868-930f-7bb76ae10c96
[I 180526 .... web:2064] 201 POST /api/kernels ( 427.41ms
[KernelGatewayApp] WARNING | No session ID specified
[KernelGatewayApp] Adapting to protocol v5.1 for kernel 16582749-b11e-4868-930f-7bb76ae10c96
[I 180526 .... web:2064] 101 GET /api/kernels/16582749-b11e-4868-930f-7bb76ae10c96/channels ( 621.36ms
[KernelGatewayApp] Starting buffering for 16582749-b11e-4868-930f-7bb76ae10c96:1c08d7b3-97f4656534aede0cd4ddb32a
^C[KernelGatewayApp] Interrupted...
[KernelGatewayApp] Kernel shutdown: 16582749-b11e-4868-930f-7bb76ae10c96

terminal2$ /gateway_aiohttp/python

==== start or get kernel ====
get kernel list from:  http://localhost:8888/api/kernels
no kernel exists
starting new kernel 16582749-b11e-4868-930f-7bb76ae10c96

==== start communication ====

---- sending ----

{'header': {'username': '', 'version': '5.0', 'session': '', 'msg_id': '99bd09c585284c3f8f5e7ce1310d698b', 'msg_type': 'execute_request'}, 'parent_header': {}, 'channel': 'shell', 'content': {'code': '2+3', 'silent': False, 'store_history': False, 'user_expressions': {}, 'allow_stdin': False}, 'metadata': {}, 'buffers': {}}

---- receiving ----

< msg type: status >
{'execution_state': 'busy'}
< msg type: execute_input >
{'code': '2+3', 'execution_count': 1}
< msg type: execute_result >
{'data': {'text/plain': '5'}, 'metadata': {}, 'execution_count': 1}
< msg type: execute_reply >
{'status': 'ok', 'execution_count': 0, 'user_expressions': {}, 'payload': []}
< msg type: status >
{'execution_state': 'idle'}

We send a single execute request to the kernel, and it responds with a cascade of messages that can be summarized as:
  1. status -> busy
  2. execute_input -> code: '2+3'
  3. execute_result -> data: '5'
  4. execute_reply -> status: ok
  5. status -> idle
This corresponds to the standard Jupyter messaging protocol (see the Jupyter messaging documentation for details).
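The aiohttp script is not reproduced here. Its protocol-level part can be sketched with the standard library alone in two steps:
1. Build an execute_request with the same fields as the message printed above.
2. Consume the reply cascade until the kernel has answered and gone idle.

The function names are hypothetical. The sketch assumes that each websocket text frame carries one JSON-encoded message with header, parent_header and content fields, as in the output above.

```python
import json
import uuid


def make_execute_request(code, session=""):
    """Build an execute_request with the same layout as the message printed above."""
    return {
        "header": {"username": "", "version": "5.0", "session": session,
                   "msg_id": uuid.uuid4().hex, "msg_type": "execute_request"},
        "parent_header": {},
        "channel": "shell",
        "content": {"code": code, "silent": False, "store_history": False,
                    "user_expressions": {}, "allow_stdin": False},
        "metadata": {},
        "buffers": {},
    }


def collect_reply(frames, msg_id):
    """Consume JSON frames until both the execute_reply and the idle status for msg_id arrived.

    Returns (reply status such as 'ok' or 'error', list of text outputs).
    """
    status, outputs, idle = None, [], False
    for frame in frames:
        msg = json.loads(frame)
        if msg.get("parent_header", {}).get("msg_id") != msg_id:
            continue                                    # belongs to another request
        msg_type, content = msg["header"]["msg_type"], msg["content"]
        if msg_type == "execute_result":
            outputs.append(content["data"].get("text/plain", ""))
        elif msg_type == "stream":
            outputs.append(content["text"])             # print() output arrives as stream messages
        elif msg_type == "execute_reply":
            status = content["status"]
        elif msg_type == "status" and content["execution_state"] == "idle":
            idle = True
        if idle and status is not None:                 # shell and iopub order is not fixed
            break
    return status, outputs
```

In an aiohttp client, the request would be sent with ws.send_str(json.dumps(msg)) over the connection to /api/kernels/<kernel_id>/channels. The .data of each received text message would then be fed to collect_reply. The loop waits for both the execute_reply and the idle status, because messages on the shell and iopub channels are not guaranteed to arrive in a fixed relative order.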

Friday, May 18, 2018

Using Jupyter Kernels from Python

This notebook demonstrates how a Jupyter kernel can be started and controlled from Python (in this case from another Jupyter kernel with an attached notebook).
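The notebook itself is not embedded on this page. Here is a minimal sketch of the same idea with jupyter_client, assuming jupyter_client and ipykernel are installed:
- start_new_kernel starts a kernel and returns a kernel manager and a blocking client.
- execute sends the code on the shell channel.
- The results are read from the IOPub channel until the kernel reports idle.

run_code and handle_iopub are hypothetical names.

```python
import queue


def handle_iopub(msg, msg_id, outputs):
    """Process one IOPub message; return True once the kernel is idle again for msg_id."""
    if msg["parent_header"].get("msg_id") != msg_id:
        return False                                    # output of some other request
    msg_type, content = msg["msg_type"], msg["content"]
    if msg_type == "execute_result":
        outputs.append(content["data"].get("text/plain", ""))
    elif msg_type == "stream":
        outputs.append(content["text"])
    elif msg_type == "error":
        outputs.append("\n".join(content["traceback"]))
    return msg_type == "status" and content["execution_state"] == "idle"


def run_code(code, kernel_name="python3", timeout=30.0):
    """Start a kernel with jupyter_client, run code, shut the kernel down, return text outputs."""
    from jupyter_client.manager import start_new_kernel  # requires jupyter_client + ipykernel
    km, kc = start_new_kernel(kernel_name=kernel_name)
    try:
        msg_id = kc.execute(code)
        outputs = []
        while True:
            try:
                msg = kc.get_iopub_msg(timeout=timeout)
            except queue.Empty:
                break                                   # no message within timeout: give up
            if handle_iopub(msg, msg_id, outputs):
                break
        return outputs
    finally:
        kc.stop_channels()
        km.shutdown_kernel()
```

With both packages installed, run_code("2+3") should return ["5"], mirroring the execute_result of the websocket example.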

Saturday, November 25, 2017

Minimal Continuous Wavelet Transform (Python Function)

The continuous wavelet transform (CWT) is one of the handiest tools for examining time-frequency content. Many libraries implement the CWT using different wavelets and methods, but I often need to include the CWT in my code without adding a library dependency. I therefore wrote a minimal version of the CWT that can be copy-pasted into any Python code. It is flexible and well normalized for use in most standard settings. It turned out that it can be compressed to fewer than 11 lines of active code.
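The original function is not reproduced on this page. The sketch below shows one common way to write such a minimal CWT, and is not necessarily the author's version: the transform is computed in the Fourier domain with an analytic Morlet wavelet. The sketch assumes the following:
- a central frequency of w0 = 6;
- negative frequencies are zeroed;
- an amplitude normalization, so that a sinusoid of amplitude A gives |W| close to A at its own frequency.

```python
import numpy as np


def cwt_morlet(signal, dt, freqs, w0=6.0):
    """Minimal continuous wavelet transform with an analytic Morlet wavelet.

    signal: 1D array sampled every dt seconds; freqs: frequencies (Hz) to analyse.
    Returns a complex array of shape (len(freqs), len(signal)).
    """
    signal = np.asarray(signal, dtype=float)
    omega = 2.0 * np.pi * np.fft.fftfreq(signal.size, d=dt)        # angular frequency of each FFT bin
    scales = w0 / (2.0 * np.pi * np.asarray(freqs, dtype=float))   # scale whose wavelet peaks at f
    # Morlet wavelet in the Fourier domain: a Gaussian centred on omega = w0 / scale.
    # The factor 2 and the zero weight at non-positive frequencies make the wavelet
    # analytic, so |W| approximately equals the amplitude of a matching sinusoid.
    psi_hat = 2.0 * np.exp(-0.5 * (scales[:, None] * omega[None, :] - w0) ** 2)
    psi_hat *= omega[None, :] > 0
    # Convolution in time = multiplication in frequency (circular because of the FFT)
    return np.fft.ifft(np.fft.fft(signal)[None, :] * psi_hat, axis=1)


# Usage: a 5 Hz cosine with amplitude 3, sampled at 100 Hz for 10 s
dt = 0.01
t = np.arange(1000) * dt
x = 3.0 * np.cos(2.0 * np.pi * 5.0 * t)
W = cwt_morlet(x, dt, freqs=[2.0, 5.0, 10.0])
print(np.abs(W[1]).min(), np.abs(W[1]).max())   # both close to 3.0
```

Because the convolution is done with the FFT, it is circular. Values near the edges of a non-periodic signal are therefore affected by wrap-around, and padding the signal is a common remedy.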

Wednesday, October 11, 2017

Matplotlib Coordinate System Transformations

This image gives a short overview of coordinate transformations and the associated commands in matplotlib.
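The image itself is not included here. The same transformations can be sketched in code. The sketch assumes the non-interactive Agg backend and a single axes with limits chosen for illustration. It uses these transforms:
- transData maps data coordinates to display pixels.
- transAxes and fig.transFigure map axes and figure fractions to pixels.
- inverted() goes the other way.
- Transforms can be chained with +.

```python
import matplotlib
matplotlib.use("Agg")                                        # off-screen backend, no window needed
import matplotlib.pyplot as plt
import matplotlib.transforms as mtransforms
import numpy as np

fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
ax.set_xlim(0, 10)
ax.set_ylim(0, 5)

point_data = (5.0, 2.5)                                      # centre of the axes in data units
point_px = ax.transData.transform(point_data)                # data -> display (pixels)
point_axes = ax.transAxes.inverted().transform(point_px)     # display -> axes fraction
point_fig = fig.transFigure.inverted().transform(point_px)   # display -> figure fraction

# Transforms compose with "+": apply transData first, then the inverse of transAxes
data_to_axes = ax.transData + ax.transAxes.inverted()
print(point_axes, data_to_axes.transform(point_data))        # both approximately [0.5 0.5]

# Blended transform: x in data units, y in axes fraction (text pinned just above the axes)
blend = mtransforms.blended_transform_factory(ax.transData, ax.transAxes)
ax.text(5.0, 1.02, "x = 5 (data), y = 1.02 (axes)", transform=blend, ha="center")
```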

Friday, July 21, 2017

ND B-spline Basis Functions with Scipy

The following script shows how to extract N-dimensional B-spline basis functions from scipy.

1D Example:

2D Example:
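The original 1D and 2D example scripts are not reproduced on this page. One way to obtain the basis functions from scipy is sketched here as an assumption: evaluate scipy.interpolate.BSpline with an identity coefficient matrix, so that each output column is one basis function. The ND basis is then the tensor product of the 1D bases.

The clamped, uniformly spaced knots on [0, 1] and the helper names are choices of this sketch. Points outside [0, 1] evaluate to NaN because extrapolation is switched off.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis_1d(x, n_basis, degree=3, lo=0.0, hi=1.0):
    """Values of all n_basis B-spline basis functions at x -> array (len(x), n_basis)."""
    inner = np.linspace(lo, hi, n_basis - degree + 1)                # needs n_basis > degree
    knots = np.concatenate([[lo] * degree, inner, [hi] * degree])    # clamped knot vector
    # Identity coefficients: output column j is basis function j evaluated at x
    return BSpline(knots, np.eye(n_basis), degree, extrapolate=False)(np.asarray(x, dtype=float))


def bspline_basis_nd(points, n_basis, degree=3):
    """Tensor-product B-spline basis for points of shape (m, ndim) in the unit cube.

    n_basis lists the number of basis functions per dimension. The result has shape
    (m, prod(n_basis)); column order matches ravel() of an (n_1, ..., n_ndim) array.
    """
    points = np.asarray(points, dtype=float)
    m = points.shape[0]
    basis = np.ones((m, 1))
    for d, n_d in enumerate(n_basis):
        b_d = bspline_basis_1d(points[:, d], n_d, degree)            # (m, n_d)
        basis = (basis[:, :, None] * b_d[:, None, :]).reshape(m, -1)  # outer product per point
    return basis
```

A 2D surface with a (5, 6) coefficient array c is then evaluated as bspline_basis_nd(points, [5, 6]) @ c.ravel(). With clamped knots, the basis values at every point inside the domain sum to one.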

Monday, May 22, 2017

Robust B-spline Regression and Fourier Transform with Scikit-Learn

Real-world data is often cluttered with outliers. Such contaminated data points can strongly distort data fits and derived quantities such as the Fourier spectrum. Several algorithms can handle outliers. The Python library Scikit-Learn, for example, contains several robust models that identify outliers and reduce their influence on linear data fits. On their own, linear models are often insufficient to capture real-world data, but they can be combined with non-linear feature transformations. This example shows the idea: the data is non-linearly transformed into polynomial space and then fit linearly and robustly.

The same idea can be applied in Fourier space instead of polynomial space. A data point with a given value x (e.g. a sample time of a time series) is transformed into Fourier feature space by evaluating, at x, all sine and cosine functions that constitute the Fourier basis. The results are stored in a large 'feature' vector. The value y at x is a linear combination of the evaluated sine and cosine functions, i.e. the dot product of a coefficient vector with the feature vector. Linear regression models can then find the coefficient vector that best predicts the data points y for given x.

Seeing the Fourier transform from this perspective has an advantage: a plethora of linear regression models can be used to fit the data and to find the coefficients of the Fourier basis (the spectrum). The following image demonstrates how a simple sine wave with outliers can be accurately fit using the robust linear estimators implemented in scikit-learn. The short code can be found here.
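The linked code is not reproduced here. Below is a minimal sketch of the idea. The following settings are illustrative choices, not taken from the original:
- a sine with frequency 3 on [0, 1);
- Gaussian noise of 0.1;
- 10% outliers set to 10;
- five Fourier frequencies.

fourier_features builds the feature vector described above. HuberRegressor stands in for the robust estimators (scikit-learn also provides RANSACRegressor and TheilSenRegressor), and ordinary least squares serves for comparison.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression


def fourier_features(x, n_freqs, period=1.0):
    """Feature vector [cos(2 pi k x / P) ..., sin(2 pi k x / P) ...] for k = 1..n_freqs."""
    k = np.arange(1, n_freqs + 1)
    phase = 2.0 * np.pi * np.outer(np.asarray(x, dtype=float), k) / period   # (n_samples, n_freqs)
    return np.hstack([np.cos(phase), np.sin(phase)])                          # (n_samples, 2 * n_freqs)


rng = np.random.default_rng(0)
n = 200
x = rng.random(n)                                        # random sample times in [0, 1)
y = np.sin(2.0 * np.pi * 3.0 * x) + 0.1 * rng.standard_normal(n)
y[rng.choice(n, size=n // 10, replace=False)] = 10.0     # 10% outliers

X = fourier_features(x, n_freqs=5)
ols = LinearRegression().fit(X, y)
huber = HuberRegressor(max_iter=1000).fit(X, y)

# The coefficients are the Fourier spectrum; sin(2 pi 3 x) sits in column 5 + (3 - 1)
sin3 = 5 + 2
print("least squares:", ols.coef_[sin3], ols.intercept_)
print("Huber:        ", huber.coef_[sin3], huber.intercept_)
```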

Another useful application of custom non-linear features in Scikit-Learn is B-splines. B-splines are built from piecewise polynomial basis functions. To use them in Scikit-Learn, we need to build a custom feature transformer class. It transforms the single feature x into the vector of B-spline basis functions evaluated at x, as in the case of the Fourier transform. This feature transformer can be pipelined with regression models to build a robust spline regression. The results are shown in the following image, and the code is located here.
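Here is a minimal sketch of such a transformer. It makes these assumptions:
- a clamped, uniformly spaced knot vector on [0, 1];
- scipy's BSpline evaluated with identity coefficients, to obtain all basis functions at once;
- HuberRegressor as the robust estimator.

The class name BSplineFeatures is hypothetical. Since version 1.0, scikit-learn also ships a ready-made SplineTransformer.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import HuberRegressor
from sklearn.pipeline import make_pipeline


class BSplineFeatures(TransformerMixin, BaseEstimator):
    """Map a single feature x to the values of all B-spline basis functions at x."""

    def __init__(self, n_basis=10, degree=3, lo=0.0, hi=1.0):
        self.n_basis = n_basis
        self.degree = degree
        self.lo = lo
        self.hi = hi

    def fit(self, X, y=None):
        # Clamped knot vector: end knots repeated so the basis sums to one on [lo, hi]
        inner = np.linspace(self.lo, self.hi, self.n_basis - self.degree + 1)
        self.knots_ = np.concatenate([[self.lo] * self.degree, inner, [self.hi] * self.degree])
        return self

    def transform(self, X):
        x = np.asarray(X, dtype=float).ravel()                  # X has shape (n_samples, 1)
        # Identity coefficients: column j of the output is basis function j at x
        basis = BSpline(self.knots_, np.eye(self.n_basis), self.degree)
        return basis(x)                                         # (n_samples, n_basis)


rng = np.random.default_rng(1)
n = 300
x = rng.random(n)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(n)
y[rng.choice(n, size=n // 10, replace=False)] = 10.0           # 10% outliers

# No intercept: the basis functions already sum to one, so a constant is representable
model = make_pipeline(BSplineFeatures(n_basis=12),
                      HuberRegressor(fit_intercept=False, max_iter=1000))
model.fit(x[:, None], y)
grid = np.linspace(0.0, 0.99, 50)
y_fit = model.predict(grid[:, None])
```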

In a similar manner, 2D B-splines can be used to regress images with outliers. In the next example, we use a 2D spline basis to fit a Gaussian function that is sampled at random locations. 10% of the data points are set to a value of 10 to act as outliers. We then run the same estimators as above to find the spline coefficients that best fit the image. Again, the robust estimators fit the original input function fairly well, whereas the least-squares fit is strongly perturbed by the outliers. The code for this example is available here.
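Here is a sketch of the 2D case under illustrative assumptions:
- a Gaussian of width 0.15 centred in the unit square;
- 2000 random sample locations;
- small Gaussian noise of 0.02 added to the clean samples;
- an 8 x 8 tensor-product cubic B-spline basis;
- HuberRegressor as the robust estimator.

Each 2D basis function is the product of a 1D basis function in x and one in y.

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.linear_model import HuberRegressor, LinearRegression


def bspline_basis(x, n_basis, degree=3):
    """All clamped B-spline basis functions on [0, 1] evaluated at x -> (len(x), n_basis)."""
    inner = np.linspace(0.0, 1.0, n_basis - degree + 1)
    knots = np.concatenate([[0.0] * degree, inner, [1.0] * degree])
    return BSpline(knots, np.eye(n_basis), degree)(x)


def tensor_basis(x, y, nx, ny, degree=3):
    """2D basis B_ij(x, y) = B_i(x) * B_j(y), flattened to shape (n_points, nx * ny)."""
    bx, by = bspline_basis(x, nx, degree), bspline_basis(y, ny, degree)
    return (bx[:, :, None] * by[:, None, :]).reshape(len(x), nx * ny)


def gaussian(x, y, width=0.15):
    return np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / (2.0 * width ** 2))


rng = np.random.default_rng(2)
n = 2000
px, py = rng.random(n), rng.random(n)                        # random sample locations
z = gaussian(px, py) + 0.02 * rng.standard_normal(n)
z[rng.choice(n, size=n // 10, replace=False)] = 10.0         # 10% outliers with value 10

A = tensor_basis(px, py, 8, 8)
lsq = LinearRegression(fit_intercept=False).fit(A, z)
huber = HuberRegressor(fit_intercept=False, max_iter=1000).fit(A, z)

# Compare both fits with the true Gaussian on a regular 40 x 40 grid
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, 40), np.linspace(0.0, 1.0, 40))
G = tensor_basis(gx.ravel(), gy.ravel(), 8, 8)
truth = gaussian(gx.ravel(), gy.ravel())
err_lsq = np.abs(lsq.predict(G) - truth).mean()
err_huber = np.abs(huber.predict(G) - truth).mean()
print(f"mean abs error  least squares: {err_lsq:.3f}  Huber: {err_huber:.3f}")
```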

Tuesday, February 21, 2017

Animation of Secondary Microseismic Noise Sources

The interaction of ocean waves that run into each other is the major source of seismic noise in the frequency band between 0.1 and 0.3 Hz all over the Earth. This animation shows, from 2003 to 2015, the pressure fluctuations caused by this interaction (displayed as color lightness) and their dominant frequency (displayed as hue). The data shown in this video is based on the wave model WAVEWATCH III. Seismic recordings make it possible to monitor such microseisms directly.

For further information about secondary microseisms you can check out this talk.