Datapoint smoothing in Pyxplot
This is the first entry in a series of posts about Pyxplot, a free graph-plotting and vector-graphics software that is heavily inspired by Gnuplot. Pyxplot is amazing but it has one problem: It is almost unknown and there is hardly any information about it on the internet. It comes with a really good documentation so it is nevertheless very usable, but it still bothers me because I think that Pyxplot is actually better than Gnuplot in many ways: Nice integration with LaTeX, data structures (strings, lists, dictionaries, vectors, matrices), C-like control flow (ifs, loops, subroutines, ...), numerical integration & differentiation, and so much more. So let's raise some awareness for Pyxplot by blogging about it!
One thing that Pyxplot lacks is an equivalent of Gnuplot's
smooth keyword, which makes it very easy to plot a smooth line that roughly follows a noisy set of datapoints. Pyxplot has an
interpolate keyword that one could use to generate a differentiable function that goes exactly through each datapoint, but this is useless if one is interested in smoothing out the noise. I was at first a bit disappointed that Pyxplot didn't have this, but then I realized that it is actually not needed and that one can build something much more flexible than Gnuplot's
smooth keyword using Pyxplot's basic mathematical tools!
Here is the result for a random old datafile I found on my computer:
So how did I do this? The basic idea is to use Pyxplots
interpolate keyword to turn the discrete datapoints into a continuous function and then use Pyxplot's numerical integration to convolute with a suitable kernel .
Here and span the plotted range on the -axis. Dont worry about the prefactor , we'll get to it later. A suitable kernel would be a function that has a peak at and fulfills . The obvious choice here is a Gaussian, which not only has the right shape and integral, but also has a single parameter we can use to control the strength of the smoothing.
The entire procedure is actually very similar to what any image editing program does if one applies a Gaussian blur to an image, except that we only have one value instead of three (red, green, blue) and only one dimension instead of two. The idea is the same though: Instead of determining the color of a pixel in the blurred image by mixing the colors of the surrounding pixels, we get the value of the smooth function by averaging the values of over the surroundings of the point .
(mouseover to blur)
But wait! What happens if the pixel is on the edge of the image and doesn't have neighboring pixels in every direction? That is exactly the reason why we need the prefactor in the above equation. At we integrate over only half of the area under the Gaussian peak, which would make the smoothed function too small by a factor of . Luckily it is easy to fix this: As we don't have a left side to include in the average we just increase the weight of the right side until we get a total weight of again. More general, the correct prefactor is the inverse of the kernel's integral over the plotted interval.
That's it! Putting these equations into Pyxplot, I ended up with the following script to produce the plot above:
set terminal png dpi 600 transparent set size 8 ratio 0.5 set output 'smoothing.png' set key top left x_min = 400 x_max = 500 set xrange[x_min:x_max] # convert the datapoints into a continuous function interpolate linear data_linint() 'data.txt' # smoothing kernel K(x, sigma) = 1/(sigma*sqrt(2*pi)) * exp(-1/2*(x/sigma)**2) # edge correction B(x, sigma) = 1 / int_dxp(K(x-xp, sigma), x_min, x_max) # calculate the convolution with the smoothing kernel data_smooth(x, sigma) = B(x, sigma) * int_dxp(K(x-xp, sigma)*data_linint(xp), x_min, x_max) plot \ 'data.txt' with pointslines lw 0.66 ps 0.5 title 'raw data', \ data_smooth(x, 2) with lines lw 2 title 'smooth ($\sigma = 2$)', \ data_smooth(x, 8) with lines lw 2 title 'smooth ($\sigma = 8$)'
I think this is a great way to smooth things in Pyxplot. It works on discrete datapoints as well as continuous functions, and is very flexible due to the fact that we can use whatever kernel we want. Instead of a Gaussian we might as well have used a Lorentzian or a rectangular function. One could also use an exponential kernel to obtain something like an exponential smoothing as it is usually done for incoming time series data. The possibilities are endless!
Stay tuned for more Pyxplot tricks!