Here I discuss tools I developed to fit and plot XY data. We often want to plot (X,Y) data from a simple file containing columns and rows. We might want to compute basic statistics for a given column, or fit some curve to the (X,Y) data pairs. Often we just want to plot Y vs. X and see what is there. We need a way to do this simply and interactively, but we may also want to make plots in batch mode. All of these things are covered here. I have tried to include descriptions of some of the low- and high-level routines. Every routine discussed here should have a clear usage message as well as a more extensive help page if the "--help" flag is invoked. I include an appendix at the end of this doc that gives some fairly terse practical examples. Finally, it should always be true that you can replicate the steps described here using the sample files packed into this tarball file. Once you have grabbed this link you can get the sample files by:
% cp ~sco/Downloads/samples.tar .     # My downloads typically go here
% tar xvf samples.tar
./samples/
./samples/UT20160617-hetq-tz.file_1
./samples/UT20160617-hetq-tz.file_2
./samples/hetAZindo_dec01.dat
./samples/XY0.rst_plot
./samples/UT20160617-hetq-tz.file
% ls samples
hetAZindo_dec01.dat      UT20160617-hetq-tz.file_1  XY0.rst_plot
UT20160617-hetq-tz.file  UT20160617-hetq-tz.file_2

You can use these files to learn how my codes operate and also to confirm that they are still working. The plots made with the two examples are shown below.
Here is a list of the software tools discussed here. They should all have usage messages and useful online help messages.
pxy_SM_plot.py    == high-level python plot tool
colget.py         == extract a column from a table file
calstats.py       == compute statistics for an extracted column
xy_from_table     == interactive tool to get X,Y from a table file
xyplotter         == script to run pxy_SM_plot.py
make_fit_data     == generate sample test data
gen_curve.sh      == general curve fitting script
data_strip        == strip the "# data" delimited header from a table file
getNline          == pull the Nth line from a file
number_lines      == write a new version of an ASCII file with the lines numbered
list_cdfp_params  == use the header of a cdfp file to make a list of table parameters (i.e. column names)
cdfp2table        == read a cdfp file and create a corresponding table file

Whether a code is a bash script, fortran (otw) or python code, you should always get a fairly complete HELP message using show_help. Here is an example:
% show_help calstats.py
A table file (see many examples below) is a file of rows and columns, with a header that explains what is in each column. In many cases, the table file is accompanied by a "parameter names" file which gives a short name for each column in the table (as opposed to a long comment line). The data columns can be composed of numeric or string data.
A cdfp file (CoorDinate Floating Point) is a specialized file I use for coordinate related files. The first line is always a single string of header information. The columns of data are all numeric, and the first two columns are always the RA in hours and the DEC in degrees (both in floating point format).
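To make the cdfp layout concrete, here is a minimal Python sketch of a reader. It is not the code behind cdfp2table or list_cdfp_params: the function name read_cdfp, the header text, and the sample values are all made up for illustration. The only things taken from the description above are that line one is a header string, that the remaining columns are numeric, and that column one is RA in hours.

```python
def read_cdfp(text):
    """Parse cdfp-style text: one header line, then numeric rows.

    Assumes the first column is RA in hours and the second is DEC in
    degrees. Returns (header, rows); each row is a list of floats with
    RA converted from hours to degrees.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].strip()
    rows = []
    for ln in lines[1:]:
        values = [float(tok) for tok in ln.split()]
        values[0] *= 15.0  # 24 hours of RA span 360 degrees
        rows.append(values)
    return header, rows


# Hypothetical sample, only to show the layout
sample = """RA DEC MAG
10.5 -4.306543 12.3
0.25 30.0 14.1
"""
header, rows = read_cdfp(sample)
print(header)
for row in rows:
    print(row)
```

The factor of 15 is the only coordinate arithmetic here; anything else about a real cdfp header should be checked against an actual file.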
My most general (X,Y) plotting code is named pxy_SM_plot.py. A second routine is trs_plot.py. The "trs_" package is designed for the less general problem of visualizing coordinate transformations, but it includes some nice examples of point labeling, maintaining proper aspect ratio, etc... The links above will give you sample calls to these codes, but here are two quick examples using the sample files discussed above:
% pxy_SM_plot.py style.hetq-tz 0 360 86 91 SHOW
% cat style.hetq-tz
20160617 ACAM Data
HETQ (HET Parallactic Angle)
tz
UT20160617-hetq-tz.file
UT20160617-hetq-tz.file_1
UT20160617-hetq-tz.file_2

% trs_plot.py Style.file -30 30 -30 30 SHOW
% cat Style.file
X,Y Gaussian Data
X (0.0,5.0)
Y (0.0,12.0)
XY0.rst_plot

I show these mainly to demonstrate how similar the calls are. The first argument is the name of a file that specifies the style of the plot (axis labels etc...) and the data files to be plotted. The next four values are the axis limits, and the last argument indicates whether you just want a hard copy file or you want to use the matplotlib show() function to view and adjust the plot after it is initially generated. In the first example above I am plotting three sets of X,Y points (in the file UT20160617-hetq-tz.file, etc...) and in the second case I am plotting a single set (in the file XY0.rst_plot). In addition to plotting points, we might fit a curve to some of the (X,Y) sets and plot the fitted curve. The rest of this document is about how, given a table file, we can easily set up those Style and XY data files for doing these things.
This is the plot made with the pxy_SM_plot.py example shown above. We have plotted data from 3 different files using three basic formats: data points, a connected line (curve), and points with error bars. A legend in the upper right labels each set. The command line we used was:

% pxy_SM_plot.py style.hetq-tz 0 360 86 91 SHOW

The first argument is the file that specifies the axis label names and the names of the data files containing the X,Y values. The next arguments specify the X and Y axis limits to be used. The last argument (SHOW) indicates that we will use the matplotlib show() function to display the plot interactively. The nice feature here is that we can easily change the scale and range of each axis, and we have the option of producing a hardcopy plot.
This is the plot made with the trs_plot.py example shown above. We have plotted a set of (random) X,Y points, each of which is labeled with an identification string. The command line used was:

% trs_plot.py Style.file -30 30 -30 30 SHOW

As before, the four numeric arguments are the X and Y axis limits to be used. Here I have resized the axes and re-positioned the data in order to bring into full view some of the points that were initially under the legend.
I define a table file to be an ASCII file with a header section. Some would refer to this as a flat file. I always set off the header of my files with the "# data" string. Here is an example of the first portion of a table file.
% head -15 hetAZindo_dec01.dat
Col01 = STRUCTAZ, structure azimuth from header
Col02 = AZfromDEC, azimuth based on declination
Col03 = HETQfromAZ, parallactic angle from structure azimuth
Col04 = HETQfromDEC, parallactic angle from structure azimuth based on DEC
Col05 = DECDEG, declination in degree units
Col06 = STRUCTAZ - AZfromDEC
Col07 = HETQfromAZ - HETQfromDEC
Col09 = side of sky relative to meridian
STRUCTAZ, AZfromDEC, HETQfromAZ, HETQfromDEC, DECDEG, AZdif, HETQdif, direction
# data
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001036.7_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001021.3_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001032.8_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001029.0_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001025.2_acm_sci

The entire file is about 3600 lines long, so we just show the top of the file above. You can see the full file in the set of sample data files discussed in the first section above. The job before us is to use a tool that lets us pull columns and build the data files we need for our plot tool.
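To make the "# data" convention concrete, here is a minimal Python sketch that splits a table file into its header lines and its data rows. It is not the code inside colget.py or data_strip. The function name split_table is made up for illustration, and the sketch assumes the delimiter line contains exactly "# data" and nothing else.

```python
def split_table(lines):
    """Split table-file lines into (header_lines, data_rows).

    Everything before the "# data" line is header. Every non-blank
    line after it is split on whitespace into a list of strings.
    """
    header, rows, in_data = [], [], False
    for line in lines:
        if not in_data:
            if line.strip() == "# data":
                in_data = True
            else:
                header.append(line.rstrip("\n"))
        elif line.strip():
            rows.append(line.split())
    if not in_data:
        raise ValueError('no "# data" line found')
    return header, rows


# A shortened copy of the sample table shown above
sample = """Col01 = STRUCTAZ, structure azimuth from header
Col02 = AZfromDEC, azimuth based on declination
# data
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001036.7_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001021.3_acm_sci
"""
header, rows = split_table(sample.splitlines())
print(len(header), "header lines,", len(rows), "data rows")
print("DECDEG of first row:", rows[0][4])
```

The rows are kept as strings on purpose, since a table column may hold string data (the "E" flag and the image name in the sample).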
The first thing we have to do is pull the columns we want to plot from the table file. The python tool colget.py is good for this. Suppose I want to plot column 2 of our table (named AZfromDEC) on the Y axis as a function of column 5 (named DECDEG) on the X axis. With colget.py I can pull two files, one for each variable, and I can name the files using the variable names:
% colget.py hetAZindo_dec01.dat 2 AZ1 N
% colget.py hetAZindo_dec01.dat 5 DECDEG N
% ls
AZ1  DECDEG  head.lines  hetAZindo_dec01.dat  S/
% head -5 AZ1
177.75
177.75
177.75
177.75
177.75

In this way I can manually build the file (dat.1) with my X,Y data:

% head dat.1
point g o 90
AZ(from DEC)
-4.306543 177.75
-4.306543 177.75
-4.306543 177.75
-4.306543 177.75

Then I manually make the style file, style.1:

% cat style.1
Sample Azimuth Plot
Declination (degrees)
Azimuth (degrees)
dat.1

Finally I can make my plot!

% pxy_SM_plot.py style.1 -10 89 0 360 SHOW

The plot we generate this way is shown below.
Here is the plot we have generated with the two columns pulled from our table file. This plot of approximately 3500 points was generated in a couple of seconds (not including the time we used building the files style.1 and dat.1). My initial Azimuth range was 0 to 360, but because I used the show() function I was quickly able to interactively reset the axes and make a nicer version of the plot. Why my azimuths run from 0 to 180 instead of 0 to 360 is another question altogether! As a result of this little plot, I could quickly see that I had lost my Azimuth values between 180 and 360. I went to my routine "estimate_azhet_hetq" and quickly located the trouble: another missed dollar sign in front of a variable name. But, the problem is now fixed!
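The manual procedure above (pull two columns, then paste them together under a plot header) can be sketched in a few lines of Python. This is only an illustration, not how xy_from_table is written. The function name build_xy_text is hypothetical, and the sketch assumes the XY data file starts with two header lines (the style line, then the legend label), which is what the sample files in the tarball suggest.

```python
def build_xy_text(rows, xcol, ycol, style, label):
    """Return the text of an XY data file built from two table columns.

    rows  : list of table rows, each a list of column strings
    xcol  : 1-based column number for X (as on the colget.py command line)
    ycol  : 1-based column number for Y
    style : first header line, e.g. "point g o 90"
    label : second header line, used in the plot legend
    """
    out = [style, label]
    for row in rows:
        out.append(f"{row[xcol - 1]} {row[ycol - 1]}")
    return "\n".join(out) + "\n"


# Two rows copied from the sample table, already split on whitespace
rows = [
    "180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E".split(),
    "180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E".split(),
]
text = build_xy_text(rows, 5, 2, "point g o 90", "AZ(from DEC)")
print(text, end="")
```

Column numbers are 1-based here to match the way columns are counted everywhere else in these notes.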
The example in the previous section explains how things are done, but it is not very practical. What we want is a way to quickly extract columns from a file and build the plot file we need for pxy_SM_plot.py. Here is an easy-to-use tool for this:
% xy_from_table hetAZindo_dec01.dat 5 1
  * You are queried for the symbol style, marker symbol, color, etc...
  * The product is a file named XY.plot that can be fed to pxy_SM_plot.py

With this utility I could pretty quickly assemble plot files from a large, complicated table file (or files). With this I re-did the plot in the previous section after I had fixed the azimuth problem. Also, I was able to grab the AZSTRUCT values from headers and plot those with a different symbol.
Here is the plot from the previous section where I have now fixed my azimuth problem. I have also added the AZSTRUCT values obtained from ACAM images (these are plotted as small red dots). The plot files were easily generated with the script xy_from_table. The commands I used were:

% xy_from_table hetAZindo_dec01.dat 5 1
% mv XY.plot XY.plot_1
% xy_from_table hetAZindo_dec01.dat 5 2
% mv XY.plot XY.plot_2
# I manually create the style.1 file
% cat style.1
Sample Azimuth Plot
Declination (degrees)
Azimuth (degrees)
XY.plot_1
XY.plot_2
% pxy_SM_plot.py style.1 -10 89 0 360 SHOW

Again, I note that I changed the plot limits on each axis using the show() function. The values at high azimuth were running into the point legend in the upper-right of the plot. With show() it was trivial to correct this on the fly.
Finally, I'll mention that "xy_from_table" has a little trick built in. You can read the plot info automatically if you build a local file named "xy_from_table.input" that has the necessary parameters written to it. This is not terribly useful for a single manual run, but as we'll see in the next section, it can be very useful for building a higher-level tool. Here is an example of using this mode:
% cat xy_from_table.input
point r o 50
"My Comment"
[sco@mcs T1]$ xy_from_table hetAZindo_dec01.dat 5 1
[sco@mcs T1]$ head -5 XY.plot
point r o 50
"My Comment"
-4.306543 180.00
-4.306543 180.00
-4.306543 180.00

As you see, there are no interactive queries. You just get your plot file (XY.plot) right after the call. Now we could build something that would make repeated calls to xy_from_table, build our style file, and finally run our pxy_SM_plot.py code for us. Not terribly trivial, but a heck of a lot easier than the manual procedures we've been using thus far.
I have developed a fairly simple script for doing all of the tasks discussed above.
It pulls columns from a Table file and prepares the XY data files.
I have plotted four rather large data sets from a single ASCII file (the file hetAZindo_dec01.dat) using the "xyplotter" code.

[sco@mcs T1]$ cat List.axes
Multiple Sets Plotted with xyplotter
-20 130 Declination (degrees)
0 360 Azimuth , HETQ
[sco@mcs T1]$ cat List.data_files
hetAZindo_dec01.dat 5 1 point r . 10 AZ header
hetAZindo_dec01.dat 5 2 point g . 10 AZ DEC
hetAZindo_dec01.dat 5 3 pointopen b o 30 HETQ stru
hetAZindo_dec01.dat 5 4 pointopen m h 10 HETQ DEC
[sco@mcs T1]$ xyplotter List.data_files List.axes

NOTE: xyplotter only builds the STYLE file and the XY plot files for each set. To see the plot you run pxy_SM_plot.py in the usual way:

[sco@mcs T1]$ pxy_SM_plot.py STYLE -12 100 0 360 SHOW

This plot took just a few seconds to display.
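Each entry in List.data_files appears to carry everything needed for one plotted set: the table file, the X and Y column numbers, the plot type, color, marker, size, and a free-text legend label. The Python sketch below parses one such entry under that reading. The field order is inferred from the example above rather than from the xyplotter source, and the names PlotSet and parse_data_line are hypothetical.

```python
from collections import namedtuple

# Field order inferred from the List.data_files example (an assumption)
PlotSet = namedtuple("PlotSet", "table xcol ycol kind color marker size label")


def parse_data_line(line):
    """Parse one List.data_files entry into a PlotSet."""
    # Split at most 7 times so the trailing label keeps its spaces
    parts = line.split(None, 7)
    if len(parts) < 7:
        raise ValueError("expected at least 7 fields: " + line.strip())
    table, xcol, ycol, kind, color, marker, size = parts[:7]
    label = parts[7].strip() if len(parts) > 7 else ""
    return PlotSet(table, int(xcol), int(ycol), kind, color, marker,
                   int(size), label)


entry = parse_data_line("hetAZindo_dec01.dat 5 3 pointopen b o 30 HETQ stru")
print(entry.table, entry.xcol, entry.ycol)
print(entry.kind, entry.color, entry.marker, entry.size)
print(entry.label)
```

A driver script would loop over these entries, write one XY plot file per entry, and list those files in the STYLE file.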
Under construction. Here are my notes on fitting and plotting. A list of recognized functions is given below:
To get a list of functions fitted by the code gen_curve.sh, just type:

% gen_curve.sh L
% curve_fit L
# You get a lot of info, but at the end you get the desired list:
Recognized function names:
line, poly1, poly2, poly3, poly4, poly5, poly6, poly7, poly8, poly9, poly10, natexp, gauss3p

A simple example:

% make_fit_data poly5 c.5 22 1 12 0.5
# if the coefficient file (c.5) is not present, then the user is queried
% xyplotter List.1 Axes.1
% pxy_SM_plot.py STYLE 1 12 -808.58392 4.66443 SHOW

To fit a curve:

% curve_fit poly3 X.dat Y.dat 10

In reality, we usually start with a simple table file that has a column for the X data and a column for the Y data. A simple set of steps for extracting these columns, preparing a plot, and fitting a curve (that is also plotted) is given below. Note that the first (prep) script will query the user for things like which columns to grab and what plot axis labels are to be used.
# If our file is named Table1.dat:
% xyplotter_prep Table1.dat 1
# To view the plot:
% xyplotter List.1 Axes.1
# Fit and view the data for List.1 and Axes.1:
% curve_runner 1
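For readers who want to see what the simplest fit in the list (a straight line) amounts to, here is a minimal least-squares sketch in plain Python. It says nothing about how curve_fit or gen_curve.sh do the fit internally. The function name fit_line and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, roughly y = 1 + 2x with some scatter
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
a, b = fit_line(xs, ys)
print(f"intercept = {a:.3f}, slope = {b:.3f}")

# Evaluate the fitted line on the X values, e.g. to write a "line" plot file
fitted = [(x, a + b * x) for x in xs]
```

The (x, fitted y) pairs are the kind of thing that would go into an XY file of type "line" so the curve can be drawn over the points.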
Sometimes I want to plot X,Y values to the same scale on each axis. Also, I want to control the physical size of the output plot. Here are some notes on how to do this.
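The arithmetic behind "same scale on each axis" is simple: once the width of the plotting area is chosen, the height follows from the ratio of the Y span to the X span. The sketch below shows only that calculation. The function name equal_scale_size is hypothetical and the sketch does not describe what pxy_SM_plot.py or trs_plot.py actually do.

```python
def equal_scale_size(xlim, ylim, width_in):
    """Return (width, height) in inches of a plotting area in which one
    data unit covers the same physical length on both axes."""
    xspan = abs(xlim[1] - xlim[0])
    yspan = abs(ylim[1] - ylim[0])
    if xspan == 0 or yspan == 0 or width_in <= 0:
        raise ValueError("axis spans and width must be non-zero and positive")
    return width_in, width_in * yspan / xspan


# Square limits give a square plotting area
print(equal_scale_size((-30, 30), (-30, 30), 6.0))
# A Y range half as long as the X range gives half the height
print(equal_scale_size((0, 360), (0, 180), 8.0))
```

In matplotlib itself the usual handles are ax.set_aspect("equal") for the scale and the figsize=(width, height) argument of plt.figure() for the physical size in inches.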
Here I collect some hopefully useful explanations and examples.
point pointopen errorbar line
% head -15 hetAZindo_dec01.dat
Col01 = STRUCTAZ, structure azimuth from header
Col02 = AZfromDEC, azimuth based on declination
Col03 = HETQfromAZ, parallactic angle from structure azimuth
Col04 = HETQfromDEC, parallactic angle from structure azimuth based on DEC
Col05 = DECDEG, declination in degree units
Col06 = STRUCTAZ - AZfromDEC
Col07 = HETQfromAZ - HETQfromDEC
Col09 = side of sky relative to meridian
STRUCTAZ, AZfromDEC, HETQfromAZ, HETQfromDEC, DECDEG, AZdif, HETQdif, direction
# data
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001036.7_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001021.3_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001032.8_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001029.0_acm_sci
180.00 177.75 180.000 178.060 -4.306543 002.25 001.94 E 20161006T001025.2_acm_sci

I usually try to describe the contents of each column in a table file in the way above, but this is not a requirement. The only hard requirement for most of my software tools is that there be a "# data" line that indicates where the table data begins. I should note that the format of the above table is nice and neat: the columns are all aligned and easy to follow with the eye when you read it. The software does not care about this. All it wants is blank space between column entries.
% cat style.hetq-tz
20160617 ACAM Data
HETQ (HET Parallactic Angle)
tz
UT20160617-hetq-tz.file
UT20160617-hetq-tz.file_1
UT20160617-hetq-tz.file_2

The first three lines are the plot title, the X-axis label, and the Y-axis label. The next three lines are the names of the data point files to be plotted.
% head UT20160617-hetq-tz.file_1
line r : 30
Line Model
9.864 88.510
243.402 88.205
254.989 88.307
266.616 88.254
279.311 88.628
293.520 88.437
315.167 88.507
340.091 88.742
% head UT20160617-hetq-tz.file_2
errorbar b o 100
With Errors
50.0 5.0 88.0 1.3
180.0 10.0 88.4 0.8
120.0 7.0 87.4 1.0
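The data point files above share one layout: a style line (plot type, color, marker or line style, size), a legend label line, and then rows of numbers. Here is a minimal Python sketch of a reader for that layout. It is an illustration only, not code taken from pxy_SM_plot.py, and the function name read_xy_file is made up. The four numbers on each errorbar row look like X, X error, Y, Y error, but that reading is an assumption, so the sketch just returns the numbers without naming them.

```python
def read_xy_file(text):
    """Parse a data point file: style line, label line, numeric rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    kind, color, marker, size = lines[0].split()
    label = lines[1].strip()
    rows = [[float(tok) for tok in ln.split()] for ln in lines[2:]]
    return {
        "kind": kind,        # point, pointopen, errorbar or line
        "color": color,
        "marker": marker,    # marker symbol, or line style for "line"
        "size": int(size),
        "label": label,
        "rows": rows,
    }


# Contents of UT20160617-hetq-tz.file_2 as listed above
sample = """errorbar b o 100
With Errors
50.0 5.0 88.0 1.3
180.0 10.0 88.4 0.8
120.0 7.0 87.4 1.0
"""
data = read_xy_file(sample)
print(data["kind"], data["label"], len(data["rows"]), "rows")
print("values per row:", len(data["rows"][0]))
```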
Often we'll have information in the column of a table file that we wish to summarize with some simple statistics.
% calstats.py Az1
223.24486 90.64843 0.00000 351.27000 236.495000 3536 1.524634
% calstats.py -v Az1
223.24486 90.64843 0.00000 351.27000 236.495000 3536 1.524634
(mean,std,min,max,median,Npnts,m.e.)
Simple stats for numbers in: Az1
Mean = 223.24486
Median = 236.49500
Standard deviation = 90.64843
Minimum = 0.00000
Maximum = 351.27000
Number of values = 3536
Mean error of then mean = 1.52463

Of course, the user has to know enough to pull the proper column, and that column must consist of numerical data.
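The seven numbers reported above can be reproduced with the Python standard library. The sketch below is not the calstats.py source and the function name simple_stats is made up. One choice needs flagging: the sample output is numerically consistent with the mean error being std/sqrt(N-1) (90.64843/sqrt(3535) is about 1.52463), so the sketch uses the population standard deviation and that divisor. Whether calstats.py computes it exactly this way is an assumption.

```python
import math
import statistics


def simple_stats(values):
    """Return mean, std, min, max, median, N and mean error for a list."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    std = statistics.pstdev(values)  # population form (divide by N)
    return {
        "mean": statistics.mean(values),
        "std": std,
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "n": n,
        "mean_error": std / math.sqrt(n - 1),  # assumed definition
    }


# Hypothetical column of numbers
stats = simple_stats([1.0, 2.0, 3.0, 4.0])
for name, value in stats.items():
    print(f"{name:>10s} = {value}")
```

Reading the column from a file is one more line: values = [float(line) for line in open("Az1") if line.strip()].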
I wrote a script named mpl that provides a brief summary of point type and color properties. I wondered about how to change the symbol types. I googled "symbol attributes in matplotlib scatter plot" and found lots of things, the second of which was very useful! For the sake of completeness in my offline notes, I show a small part of a graphic from that webdoc below:
Examples of marker types.
These will work:
blue . 5        Blue point of size five.
red o 10        Red circle of size ten.
g d 12          Green thin-diamond of size twelve.

These will fail (in python 2.7):
blue point 5
red circle 10
g thin_diamond 12
Commands which take color arguments can use several formats to specify the colors. For the basic built-in colors, you should use a single letter:
b: blue
g: green
r: red
c: cyan
m: magenta
y: yellow
k: black
w: white

Gray shades can be given as a string encoding a float in the 0-1 range, e.g.:

color = '0.75'

For a greater range of colors, you have two options. You can specify the color using an html hex string, as in:

color = '#eeefff'
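As a quick check before writing a color into a plot file, the three forms described above (single letter, gray-level string, hex string) can be recognized with a few lines of Python. This is only a sketch: the function name classify_color is hypothetical, and a result of "other" does not mean the color is invalid, since full names such as blue and red also work in the marker examples earlier.

```python
import re

BASIC_COLORS = {
    "b": "blue", "g": "green", "r": "red", "c": "cyan",
    "m": "magenta", "y": "yellow", "k": "black", "w": "white",
}


def classify_color(spec):
    """Return 'letter', 'hex', 'gray' or 'other' for a color string."""
    if spec in BASIC_COLORS:
        return "letter"
    if re.fullmatch(r"#[0-9a-fA-F]{6}", spec):
        return "hex"
    try:
        level = float(spec)
    except ValueError:
        return "other"
    return "gray" if 0.0 <= level <= 1.0 else "other"


for spec in ["r", "0.75", "#eeefff", "blue", "1.5"]:
    print(spec, "->", classify_color(spec))
```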
Line types were a little harder to find out about. Python has a jillion options, but nobody ever lists or explains more than a few that work. Here are four that work with the type "line" in my codes:
These will work:
-
--
:
-.
Examples of line styles in matplotlib that seem to work in pxy_SM_plot.py (as of Mar2017).