Pull a column of data from an ASCII file. In the online help for this Python code, I tell the user the input file must be in a "table-style" format. All this really means is that I have a "# data" marker indicating where the header stops and the columns of data begin. I allow the user to write a "# data" header to the output file if desired (since a lot of my own codes use this convention). Also, I assume the columns are separated by white space. I show sample files in the examples below. This routine was initially developed in:
/home/sco/sco/codes/python/tables/colget.py

Why not use a bash script or a Fortran code for this seemingly simple job? I wanted a general tool. The readline() and split() tools in Python make reading column files very easy. As I show below, I can use colget.py to grab numeric and string data values. I can then use other processing scripts on the single-column data files as the need arises.
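A minimal sketch of the readline()/split() idea, not the actual colget.py source. The function name get_column and the in-memory sample are hypothetical; the sketch assumes the "# data" marker sits on a line of its own and that column numbers start at 1.

```python
import io

DATA_MARKER = "# data"  # marker convention described in the text


def get_column(stream, colnum):
    """Return (header_lines, values) for the 1-based column colnum."""
    header_lines = []
    # Read the header: everything up to the "# data" marker.
    while True:
        line = stream.readline()
        if line == "":
            raise ValueError('no "# data" marker found')
        if line.strip() == DATA_MARKER:
            break
        header_lines.append(line.rstrip("\n"))
    # Read the data rows: split on white space, keep one field per row.
    values = []
    while True:
        line = stream.readline()
        if line == "":
            break  # end of file
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values.append(fields[colnum - 1])
    return header_lines, values


# Hypothetical two-row sample standing in for a table-style file.
sample = io.StringIO(
    "# THis is my data table\n"
    "# data\n"
    "6.708496 27.066771 13.5835\n"
    "6.712891 27.061857 16.2000\n"
)
header, column = get_column(sample, 2)
print(column)
```

The values are kept as strings, so the same loop would serve for numeric and string columns alike. A column number larger than the number of fields in a row raises an IndexError in this sketch.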
% colget.py --help
usage: colget.py [-h] [-v] arg1 arg2 arg3

positional arguments:
  arg1           Input table-style file
  arg2           Column number of data to be extracted
  arg3           Name of output file
  arg4           Y/N for data header line in OUTPUT file

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Verbose responses

Below I show examples of how to use colget.py. The files and run scripts for these examples are also located in my $tdata/T_runs/colget.py/ directories. Notice that the header lines in any file processed by colget.py are preserved in a file named "head.lines". All but the "# data" line are written to this file. I did this anticipating that I might want to cut various columns from multiple files. When I paste these columns to form other files, I might want to save the header lines from all of the progenitor files.
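A command line like the one in the help text could be declared with argparse roughly as follows. This is a sketch under assumptions, not the colget.py source: the build_parser name, the type=int conversion for the column number, and the choices restriction on the Y/N flag are my own guesses.

```python
import argparse


def build_parser():
    """Build a parser with the arguments named in the help text."""
    parser = argparse.ArgumentParser(prog="colget.py")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose responses")
    parser.add_argument("arg1", help="Input table-style file")
    # Assumption: the column number is converted to an integer here.
    parser.add_argument("arg2", type=int,
                        help="Column number of data to be extracted")
    parser.add_argument("arg3", help="Name of output file")
    # Assumption: only the literal strings Y and N are accepted.
    parser.add_argument("arg4", choices=["Y", "N"],
                        help="Y/N for data header line in OUTPUT file")
    return parser


# Parse the argument list used in the first example further on.
args = build_parser().parse_args(["Table_file.1", "1", "RA", "N"])
print(args.arg1, args.arg2, args.arg3, args.arg4, args.verbose)
```

Passing an explicit list to parse_args() makes the parser easy to try out without touching the real command line.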
Here we'll use the case of a very simple file. All of the columns are floating point numbers separated by white-space.
% cat Table_file.1
# THis is my data table
RA DEC V sig_V B-V U-B V-R
# data
6.708496 27.066771 13.5835 0.0000 0.1824 0.1862 0.0000
6.712891 27.061857 16.2000 0.0000 0.5742 0.4227 0.0000
6.708603 27.057254 15.8663 0.1724 0.3551 0.1404 0.0000
6.707845 27.056499 14.8863 0.0000 0.2723 0.1956 0.0000
6.708209 27.056278 13.8898 0.0000 0.2566 0.2134 0.0000
6.709876 27.054192 12.6919 0.0000 0.3597 0.2571 0.0000

% colget.py Table_file.1 1 RA N
% ls
head.lines RA S/ Table_file.1

The new file RA contains the column of data.

% cat RA
6.708496
6.712891
6.708603
6.707845
6.708209
6.709876

% cat head.lines
# THis is my data table
RA DEC V sig_V B-V U-B V-R

% colget.py Table_file.1 1 RA Y
% cat RA
# data
6.708496
6.712891
6.708603
6.707845
6.708209
6.709876

As a further example, I might pull several columns from my file and paste them into a new file using a simple script:

% cat RUN_IT
#
colget.py Table_file.1 1 RA N
colget.py Table_file.1 2 DEC N
colget.py Table_file.1 5 BmV N
echo "# data" > New.File
paste RA DEC BmV >> New.File
\rm -f RA DEC BmV

% RUN_IT
% ls
New.File RUN_IT* Table_file.1

% cat New.File
# data
6.708496 27.066771 0.1824
6.712891 27.061857 0.5742
6.708603 27.057254 0.3551
6.707845 27.056499 0.2723
6.708209 27.056278 0.2566
6.709876 27.054192 0.3597
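For comparison, the echo-and-paste step of a script like RUN_IT could be sketched in Python as below. The paste_columns function is hypothetical and works on lists already held in memory. It joins fields with a tab, which is the default delimiter of the paste utility.

```python
def paste_columns(columns, sep="\t", data_header=True):
    """Join equal-length columns row by row, similar to paste."""
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    # Optionally start the output with the "# data" marker line.
    lines = ["# data"] if data_header else []
    for row in zip(*columns):
        lines.append(sep.join(row))
    return lines


# First two rows of the RA, DEC and B-V columns from Table_file.1.
ra = ["6.708496", "6.712891"]
dec = ["27.066771", "27.061857"]
bmv = ["0.1824", "0.5742"]
new_file_lines = paste_columns([ra, dec, bmv])
print("\n".join(new_file_lines))
```

Unlike paste, this sketch refuses columns of unequal length instead of padding the short ones, which seemed the safer choice for tables that are meant to line up row by row.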
Here we'll use a file with both numeric and string data.
% cat sep02_2015_A.dat
AZ = HET structure azimuth (may be approximate since this must be recorded manually
Q = HET parallactic angle
ps = derived arcsec/pix from WCS
N = direction CCW (in degrees) from +Y axis to North
E = direction CCW (in degrees) from +Y axis to East
sdRA = stan.dev. of RA residuals in WCS solution (arcsec)
sdDEC = stan.dev. of RA residuals in WCS solution (arcsec)
Nstar = number of stars used in WCS solution
AZ DEC Q Target ps m.e. N E sdRA sdDEC Nstars Image_file_rootname fit_method
# data
68 +37.23880 91.81 gsc1_2715.0724 0.2705 0.0003 184.9120 94.6388 0.254 0.302 6 20150812T041257.0_acm wcs_doall
68 +37.23930 91.81 gsc1_2715.0724 0.2707 0.0003 184.3591 94.8650 0.035 0.137 6 20150812T042308.2_acm wcs_doall
67 +37.49154 91.45 BSC5_8549 0.2706 0.0002 184.4207 94.3219 0.330 0.338 7 20150812T045350.0_acm wcs_doall
67 +37.39880 91.58 BSC5_8549 0.2710 0.0003 184.7464 94.6543 0.268 0.444 8 20150812T051838.2_acm wcs_doall
342.401 +62.59766 325.33 NGC7160 0.2736 0.0007 57.5834 327.9899 0.436 1.011 7 20150812T083408.1_acm wcs_doall
290 +35.93367 266.37 gsc1-2619.0796 0.2663 0.0002 358. 268. 0.25 0.25 0 20150805T063659.1_acm ast.net
83.42 +28.21667 104.06 ngc6940 0.2665 0.0002 196. 106. 0.25 0.25 0 20150808T033609.2_acm ast.net
290 +35.92152 266.35 gsc1-2651.1659 0.2665 0.0002 359. 269. 0.25 9.25 0 20150805T075207.8_acm ast.net
42.774 +51.24519 69.00 ngc1528 0.2716 0.0001 161. 70. 0.25 0.25 0 20150826T100618.7_acm ast.net

% cat RUN
#
colget.py sep02_2015_A.dat 5 PS N
colget.py sep02_2015_A.dat 4 Target_name N

% RUN
% cat PS
0.2705
0.2707
0.2706
0.2710
0.2736
0.2663
0.2665
0.2665
0.2716

% cat Target_name
gsc1_2715.0724
gsc1_2715.0724
BSC5_8549
BSC5_8549
NGC7160
gsc1-2619.0796
ngc6940
gsc1-2651.1659
ngc1528
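Because split() returns every field as a string, numeric and string columns can be pulled the same way, and any conversion can be left to a later step. The sketch below shows one way such a later step might look. The helper to_number_or_text is hypothetical and is not part of colget.py.

```python
def to_number_or_text(field):
    """Return the field as a float when it parses, else unchanged."""
    try:
        return float(field)
    except ValueError:
        return field


# First data row of sep02_2015_A.dat.
row = ("68 +37.23880 91.81 gsc1_2715.0724 0.2705 0.0003 184.9120 "
       "94.6388 0.254 0.302 6 20150812T041257.0_acm wcs_doall")
fields = row.split()
converted = [to_number_or_text(field) for field in fields]

# Column 5 (ps) becomes a float, column 4 (Target) stays a string.
print(converted[4], converted[3])
```

Note that float() also accepts strings such as "nan" and "inf", so a target that happened to carry one of those names would be converted by this sketch.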
What is with the "# data" biz? A lot of my codes use the "# data" line to locate the data table in a file. For instance, I may want to pull columns from different files and combine them with the oned_imarith.sh routine. This code requires files that have that header line.
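The optional header in the output file could be handled along these lines. This is a sketch only: write_column is a hypothetical name, and the Y/N string mirrors the fourth command-line argument rather than the actual colget.py internals. A temporary directory keeps the example from leaving files behind.

```python
import os
import tempfile


def write_column(path, values, data_header):
    """Write one value per line, preceded by "# data" when asked."""
    with open(path, "w") as out:
        if data_header == "Y":
            out.write("# data\n")
        for value in values:
            out.write(value + "\n")


with tempfile.TemporaryDirectory() as workdir:
    out_path = os.path.join(workdir, "RA")
    write_column(out_path, ["6.708496", "6.712891"], "Y")
    with open(out_path) as check:
        written = check.read()
print(written)
```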
In Oct2018 I thought my conditional test in colget.py was messing up. The conditional was fine, but I had my "break" statement indented incorrectly. Here is a test that validates the debugged code:
% cat A.table
# R50 from profile_gcurve = 3.9454 (pixels)
# parlab rad Radius in pixel units k Fraction of total Signal
# data
2.2460 0.2000
2.2933 0.2102
2.3414 0.2204

% colget.py A.table 1 X N
% cat X
2.2460
2.2933
2.3414
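The original buggy code is not shown here, so the following is only a hypothetical illustration of how a misplaced "break" can make a correct conditional look broken. Both function names are made up for the sketch, and the header lines come from A.table.

```python
def header_length_buggy(lines):
    """Hypothetical bug: break sits at loop level, not under the if."""
    count = 0
    for line in lines:
        if line.strip() != "# data":
            count += 1
        break  # runs on the first pass regardless of the test above
    return count


def header_length_fixed(lines):
    """Count header lines, stopping only at the "# data" marker."""
    count = 0
    for line in lines:
        if line.strip() == "# data":
            break  # stop only when the marker is reached
        count += 1
    return count


table = [
    "# R50 from profile_gcurve = 3.9454 (pixels)",
    "# parlab rad Radius in pixel units k Fraction of total Signal",
    "# data",
    "2.2460 0.2000",
]
print(header_length_buggy(table), header_length_fixed(table))
```

In the buggy version the loop always stops after one line, so the symptom looks like a failing comparison even though the comparison itself is right.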