Plotting data with Asymptote

As a follow-up to the previous post, the graphics software Asymptote will be used to generate a data plot along with the model curve. Given the extra effort needed to create an honest data plot using Asymptote, it seems likely that it was not written with data plotting in mind. On the other hand, it is quite useful for creating non-data figures that would benefit from mathematical parameterization. Some examples of this latter use are forthcoming.

To compare with the last gnuplot output, the same Kirby2 data set will be used for this plot. The following script will generate a plot of the data along with the model curve. Some of the code is commented to assist with understanding the various steps.

// kirby2.asy
//-------------------------------------------------------------
// description:
//-------------------------------------------------------------
//   asymptote script to plot kirby2 data set
//   see Kirby2.txt for original data with description
//
//
//-------------------------------------------------------------
// asymptote script written by:
//-------------------------------------------------------------
//   Richard Duncan, Ph.D.
//   Georgia Perimeter College
//   2016 Feb 14
//
//
//-------------------------------------------------------------
// postprocessing notes:
//-------------------------------------------------------------
//   output is pdf
//   to convert pdf to jpg use this command:
//     convert -density 300 kirby2.pdf kirby2.jpg     // supplied by imagemagick
//
//
import graph;
settings.outformat="pdf"; 
size(360,252,IgnoreAspect);

//---------------------------------------------
// plot window characteristics:
//---------------------------------------------
real xmin = 0.0, xmax = 380;
real ymin = 0.0, ymax = 99;
fixedscaling(pic=currentpicture,
             min=(xmin,ymin),
             max=(xmax,ymax),
             p=nullpen, warn=false);

// point style for plotted data points:
marker datcircle = marker(g=scale(1.0mm)*unitcircle, p=blue);


//---------------------------------------------
// read in kirby2 data set:
//---------------------------------------------
file kdat = input("kirby2.dat");
real[][] a = kdat.dimension(0,2);
a = transpose(a);

// NB:  order in kirby2 is Y,X
real[] y=a[0], x=a[1];    // column arrays of x,y

// each datum gets plotted as an individual point:
pair[] fx;                // this will store the (x,y) data pairs
int j = 0;                // zero-based index for data pairs
while(j < y.length) {
  fx[j] = (x[j],y[j]);
  ++j;
}

// Asymptote plots an interpolating line by default
// use an invisible pen so that isn't drawn
// all points are drawn using datcircle, defined above:
draw(graph(fx), legend="data", p=invisible, marker=datcircle);


//---------------------------------------------
// model function:
//---------------------------------------------
// regression parameters fit-resolved elsewhere:
real b1 = 1.67437;
real b2 = -0.139266;
real b3 = 0.00259603;
real b4 = -0.00172429;
real b5 = 2.16644e-05;

// rational function with five regression parameters:
real y_model(real x) {
  real y = (b1 + b2*x + b3*x**2)/(1 + b4*x + b5*x**2);
  return y;
}

// plot the model using a red solid curve:
draw(graph(y_model, xmin, xmax), legend="model", p=red+1.5);


//---------------------------------------------
// supplement plot with legend and coordinate axes:
//---------------------------------------------
add(legend(xmargin=1mm, perline=1, p=invisible), 80N+20E);

xaxis("$x$", axis=Bottom, ticks=LeftTicks(Step=50), arrow=EndArrow);
yaxis("$y$", axis=Left, ticks=LeftTicks(Step=10), arrow=EndArrow);

// add a bit of whitespace around plot for aesthetics:
shipout(bbox(0.25cm));

The plot showing the data and the model curve generated with this Asymptote script is shown next:

Kirby2 data plot

Currently it seems that the only curve-fitting functionality built into Asymptote is linear regression, i.e., only a linear least squares function is available. It should be possible to write code to perform a nonlinear regression, but that requires some lower-level coding and is beyond the scope of this post.
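
One way to avoid copying the regression parameters by hand is to let gnuplot, which does have a nonlinear fit, write them out in Asymptote syntax. The following is only a sketch under that assumption: the output file name kirby2_params.asy is hypothetical, and this export step is not part of the workflow actually used for the plot above.

```gnuplot
# sketch: fit the Kirby2 model in gnuplot and export the parameters
# in Asymptote syntax (the file name kirby2_params.asy is hypothetical)

# rational model with five regression parameters:
y(x) = (b1 + b2*x + b3*x**2)/(1 + b4*x + b5*x**2)

# NB:  the data columns are ordered as Y X  (thus 'using 2:1')
fit y(x) 'kirby2.dat' using 2:1 via b1, b2, b3, b4, b5

# redirect 'print' from stderr to a file, one declaration per line:
set print "kirby2_params.asy"
print sprintf("real b1 = %.6g;", b1)
print sprintf("real b2 = %.6g;", b2)
print sprintf("real b3 = %.6g;", b3)
print sprintf("real b4 = %.6g;", b4)
print sprintf("real b5 = %.6g;", b5)

# restore 'print' to its default destination:
set print
```

If this behaves as intended, the five hard-coded declarations in kirby2.asy could probably be replaced by a single include "kirby2_params.asy"; statement, assuming Asymptote's include accepts a quoted file name as its documentation indicates.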

Plotting with spreadsheets is stupid

Too often, when I ask which software will be used to produce a data plot, the response is ‘Excel’. Let’s get something straight: Excel, or any other spreadsheet for that matter, is NOT plotting software. While it is true that such software does have some plotting capabilities, the plots produced therein are of very poor quality compared to the output of real plotting software. Furthermore, anyone who would suggest that a spreadsheet program is an appropriate choice for plotting data in anything more substantial than a preliminary exploratory phase of an analysis is either misguided, lazy, or ignorant. If one just needs a quick superficial visual of a set of data, then go ahead and slap it into your spreadsheet and plot away; however, do not use such a sloppy plot as a component in any serious reporting. Spreadsheets are appropriate for some tasks: production plotting is not one of them.

The reality is that there are quite a few software programs capable of producing professional-quality plots of either data or general mathematical functions. These programs are often free and open source software (FOSS), meaning that, unlike Microsoft Excel, no financial burden is imposed for using them. GPC pays for an expensive site license in order to legally retain MS Excel (and other MS Office products) on its machines. Such software licenses are probably funded, at least in part, by the ‘technology fee’ paid by students; perhaps too, GPC being a public institution, the taxpayers supply funds for such things as well.

In this post, I will demonstrate by example the use of one particular alternative to crappy spreadsheet plotting: gnuplot. As time permits, I may further develop this to show another one or two examples using other software as well.

gnuplot

As with many such technical matters, there is definitely a learning curve to developing one’s plotting skills using gnuplot. This is, after all, not dumbed-down point-and-click software: it is most useful when invoked from a prewritten script. Nevertheless, gnuplot is very nicely documented and it is usually straightforward to resolve how a desired effect can be achieved. To provide an example of using gnuplot, I grabbed some data from the NIST site then fitted a model curve to it and plotted both the data and model curve.

In particular, I’m using the Kirby2 data set.

Displayed here is the script to plot the data:

# kirby2.plt
#
#-------------------------------------------------------------
# description:
#-------------------------------------------------------------
#   gnuplot script to plot kirby2 data set
#   see Kirby2.txt for original data with description
#
#
#-------------------------------------------------------------
# gnuplot script written by:
#-------------------------------------------------------------
#   Richard Duncan, Ph.D.
#   Georgia Perimeter College
#   2016 Feb 10
#
#
#-------------------------------------------------------------
# postprocessing notes:
#-------------------------------------------------------------
#   output is encapsulated postscript (eps)
#   to convert eps to jpg or pdf, use one of these commands:
#     convert -density 300 kirby2-plot.eps kirby2-plot.jpg   # supplied by imagemagick
#     ps2pdf kirby2-plot.eps                                 # supplied by ghostscript
#
#
reset
set term postscript eps enhanced color solid "Times-Bold" 25
set output "kirby2-plot.eps"
set size 1.5,1.5
unset border

# the newer ps color palette is insane...revert to classic
set colorsequence classic

# define various line/point styles and colors:
set style line 1 lt 1 lw 5 pt 6 ps 3
set style line 3 lt 3 lw 1 pt 6 ps 2
set style line 7 lt 7 lw 4 pt 7 ps 1

#------------------------------------------------------------------
# modeling the data:
#------------------------------------------------------------------
# model curve is rational function with five parameters:
y(x) = (b1 + b2*x + b3*x**2)/(1 + b4*x + b5*x**2)

# assign parameter values by fitting the model curve to the data:
fit y(x) 'kirby2.dat' using 2:1 via b1, b2, b3, b4, b5


#------------------------------------------------------------------
# main plot:
#------------------------------------------------------------------
set origin 0,0     # position of bottom left corner
set key at 100,95   # place a key at specified graph position

# range of plot axes:
xmin = 0; xmax = 380
set xrange [xmin:xmax]

ymin = 0; ymax = 99
set yrange [ymin:ymax]


# x-axis details:
set arrow 1 from xmin,0 to xmax,0 ls 7
set xtics axis nomirror 50, 50, 380
set xlabel "input"
set label 1 "{/Times-Italic x}" at xmax,-0.03*(ymax-ymin)

# y-axis details:
set arrow 2 from 0,ymin to 0,ymax ls 7
set ytics axis nomirror 10, 10, 90
set ylabel "response"
set label 2 "{/Times-Italic y}" at -10, ymax center

# plot the data with blue circles (ls 3) and the model curve with a red line (ls 1)
# NB:  the data columns are ordered as Y X  (thus 'using 2:1' in plot command)
plot 'kirby2.dat' using 2:1 title 'data' w points ls 3, \
     y(x) title "model" w l ls 1

The plot showing the data and the model curve generated with this gnuplot script is shown next:

Kirby2 data plot

The model is a rational function with five regression coefficients. Under the hood, gnuplot uses an iterative damped least squares (Levenberg-Marquardt) method to perform the curve fits. For comparison with the NIST certified estimates, the gnuplot output is shown here:

#-----------------------------------------------
# gnuplot curve fit output:
#-----------------------------------------------
After 119 iterations the fit converged.
final sum of squares of residuals : 3.90507
rel. change during last iteration : -3.74562e-06

degrees of freedom    (FIT_NDF)                        : 146
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.163545
variance of residuals (reduced chisquare) = WSSR/ndf   : 0.0267471

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
b1              = 1.67437          +/- 0.08799      (5.255%)
b2              = -0.139266        +/- 0.004118     (2.957%)
b3              = 0.00259603       +/- 4.185e-05    (1.612%)
b4              = -0.00172429      +/- 5.888e-05    (3.415%)
b5              = 2.16644e-05      +/- 2.015e-07    (0.9303%)
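
For a comparison against the certified estimates, more digits than the default report shows may be wanted. A minimal sketch, assuming a gnuplot version that supports `set fit errorvariables` and the FIT_* summary variables (neither is used in the script above): the standard errors are stored in variables named b1_err, b2_err, and so on, and everything can then be printed at a chosen precision.

```gnuplot
# sketch: rerun the fit and print the results with more digits
y(x) = (b1 + b2*x + b3*x**2)/(1 + b4*x + b5*x**2)

# store each parameter's standard error in a variable named <param>_err:
set fit errorvariables
fit y(x) 'kirby2.dat' using 2:1 via b1, b2, b3, b4, b5

print sprintf("b1 = %.10e +/- %.10e", b1, b1_err)
print sprintf("b2 = %.10e +/- %.10e", b2, b2_err)
print sprintf("b3 = %.10e +/- %.10e", b3, b3_err)
print sprintf("b4 = %.10e +/- %.10e", b4, b4_err)
print sprintf("b5 = %.10e +/- %.10e", b5, b5_err)

# summary statistics set by the fit command:
print sprintf("WSSR = %.10e", FIT_WSSR)
print sprintf("ndf  = %g", FIT_NDF)
print sprintf("rms  = %.10e", FIT_STDFIT)
```

As a consistency check on the output listed above, sqrt(3.90507/146) is approximately 0.163545, which agrees with the reported rms of residuals.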

To see many other examples of using gnuplot, check out their demonstrations page.

other plotting software

There are many FOSS alternatives to dumbed-down spreadsheet plotting. The software programs I use most for my own plots are gnuplot and Asymptote. The latter is considerably more technical in how it is scripted; however, the increased complexity does facilitate more robust control over the plots created. Asymptote also (probably) requires one to have LaTeX installed, e.g., via a TeX Live distribution.

As time permits, I’ll generate the same graph using Asymptote for comparison. I’ll be interested to hear about any FOSS plotting software that you have found useful.