Datapoint smoothing in Pyxplot

This is the first entry in a series of posts about Pyxplot, a free graph-plotting and vector-graphics software that is heavily inspired by Gnuplot. Pyxplot is amazing but it has one problem: It is almost unknown and there is hardly any information about it on the internet. It comes with a really good documentation so it is nevertheless very usable, but it still bothers me because I think that Pyxplot is actually better than Gnuplot in many ways: Nice integration with LaTeX, data structures (strings, lists, dictionaries, vectors, matrices), C-like control flow (ifs, loops, subroutines, ...), numerical integration & differentiation, and so much more. So let's raise some awareness for Pyxplot by blogging about it!

One thing that Pyxplot lacks is an equivalent of Gnuplot's smooth keyword, which makes it very easy to plot a smooth line that roughly follows a noisy set of datapoints. Pyxplot has an interpolate keyword that one could use to generate a differentiable function that goes exactly through each datapoint, but this is useless if one is interested in smoothing out the noise. I was at first a bit disappointed that Pyxplot didn't have this, but then I realized that it is actually not needed and that one can build something much more flexible than Gnuplot's smooth keyword using Pyxplot's basic mathematical tools!

Here is the result for a random old datafile I found on my computer:


So how did I do this? The basic idea is to use Pyxplots interpolate keyword to turn the discrete datapoints into a continuous function f(x) and then use Pyxplot's numerical integration to convolute f(x) with a suitable kernel K(x).

 f_\mathrm{smooth}(x) = B(x) \int_{x_\mathrm{min}}^{x_\mathrm{max}} \mathrm d x' K(x - x') f(x')

Here x_\mathrm{min} and x_\mathrm{max} span the plotted range on the x-axis. Dont worry about the prefactor B(x), we'll get to it later. A suitable kernel K(x) would be a function that has a peak at x = 0 and fulfills \int_{-\infty}^{\infty} \mathrm d x K(x) = 1. The obvious choice here is a Gaussian, which not only has the right shape and integral, but also has a single parameter \sigma we can use to control the strength of the smoothing.

 K(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp \left( -\frac{1}{2} \frac{x^2}{\sigma^2} \right)

The entire procedure is actually very similar to what any image editing program does if one applies a Gaussian blur to an image, except that we only have one value instead of three (red, green, blue) and only one dimension instead of two. The idea is the same though: Instead of determining the color of a pixel in the blurred image by mixing the colors of the surrounding pixels, we get the value of the smooth function f_\mathrm{smooth}(x) by averaging the values of f(x) over the surroundings of the point x.

(mouseover to blur)

But wait! What happens if the pixel is on the edge of the image and doesn't have neighboring pixels in every direction? That is exactly the reason why we need the prefactor B(x) in the above equation. At x_\mathrm{min} we integrate over only half of the area under the Gaussian peak, which would make the smoothed function too small by a factor of 2. Luckily it is easy to fix this: As we don't have a left side to include in the average we just increase the weight of the right side until we get a total weight of 1 again. More general, the correct prefactor B(x) is the inverse of the kernel's integral over the plotted interval.

 B(x) = \left[ \int_{x_\mathrm{min}}^{x_\mathrm{max}} \mathrm d x' K(x - x') \right]^{-1}

That's it! Putting these equations into Pyxplot, I ended up with the following script to produce the plot above:

set terminal png dpi 600 transparent
set size 8 ratio 0.5
set output 'smoothing.png'
set key top left

x_min = 400
x_max = 500
set xrange[x_min:x_max]

# convert the datapoints into a continuous function
interpolate linear data_linint() 'data.txt'

# smoothing kernel
K(x, sigma) = 1/(sigma*sqrt(2*pi)) * exp(-1/2*(x/sigma)**2)

# edge correction
B(x, sigma) = 1 / int_dxp(K(x-xp, sigma), x_min, x_max)

# calculate the convolution with the smoothing kernel
data_smooth(x, sigma) = B(x, sigma) * int_dxp(K(x-xp, sigma)*data_linint(xp), x_min, x_max)

plot \
    'data.txt' with pointslines lw 0.66 ps 0.5 title 'raw data', \
    data_smooth(x, 2) with lines lw 2 title 'smooth ($\sigma = 2$)', \
    data_smooth(x, 8) with lines lw 2 title 'smooth ($\sigma = 8$)'

I think this is a great way to smooth things in Pyxplot. It works on discrete datapoints as well as continuous functions, and is very flexible due to the fact that we can use whatever kernel we want. Instead of a Gaussian we might as well have used a Lorentzian or a rectangular function. One could also use an exponential kernel to obtain something like an exponential smoothing as it is usually done for incoming time series data. The possibilities are endless!

Stay tuned for more Pyxplot tricks!

You may also like...

6 Responses

  1. Sven says:

    Cool! Do you have an idea how to get this nicely typesetted fonts in pylab like the ones from your gnuplot plot? Sure pyplot is able to render latex in labels, but typically the axis are in some sans-serif font. I frequently think nothing reaches the plotting quality from gnuplot :-/

  2. robert says:

    @ Sven: Sorry, I don't know about pylab/matplotlib. I have heard that the combination numpy+scipy+matplotlib+ipython is awesome, but I've never used it myself. (It's on my todo list though ...)
    By the way: The plot above was made using Pyxplot, not Gnuplot.

  3. Sven says:

    Oh wow, I didn't realize Pyxplot is not Pyplot! I read "Py" and immediately thought of Python. So Pyxplot has a complete scripting language (in contrast to gnuplot), but it is not Python? WTF? So where are the use cases for an improved gnuplot which is capable of implementing more logic? Some guys in physics use ROOT for such tasks. Or matlab, which is actually equal to pylab (that is, pylab=numpy+scipy+matplotlib+ipython).

    Fun fact: - there is even PyX and Oh my.

    Back to the content of your blog post, I like the idea with the smoothing kernel, but I am not sure if it is suitable for a time series analysis where the data amplitude is modified so hardly. Funny enought I had the same problem some weeks ago and decided for spline interpolation which looks more like the data but is "unscientific" (not to say "unphysical") since splines can have convergence problems and suggest the presence of data which is not there. A mathematically motivated smoothing is more the way to go.

  4. robert says:

    The name of Pyxplot is indeed confusing. It doesn't have anything to do with Python. It's not even written in Python! ...
    I don't really understand your doubt about convoluting with a smoothing kernel. A simple spline interpolation won't do the trick here: It's an interpolation after all, so you might get something smooth in between the datapoints, but overall it will still be as noisy as the datapoints themselves.

  5. Wolfgang Engelmann says:

    thanks for the pyxplot smoothing example. Is the data file available somewhere?
    Do you have more examples re Pyxplot? I am particularly interested in an export
    with latex-ified text.

  6. robert says:

    I'm not sure what datafile I used when writing this post, sorry. I used Pyxplot quite excessively for all my plots. LaTeX is generally not problem with Pyxplot. All the text and labels go through LaTeX anyway,so you can just use LaTeX commands anywhere you like and it just works ...

Leave a Reply

Your email address will not be published. Required fields are marked *