colget.py
See complementary routine lineget.py.

Pull a column of data from an ASCII file. The online help for this Python code says the input file must be in a "table-style" format. All this really means is that a "# data" marker line indicates where the header stops and the columns of data begin. The user can have a "# data" header line written to the output file if desired (a lot of my own codes use this convention). I also assume the columns are separated by white space. Sample files are shown in the examples below. This routine was initially developed in:

 

/home/sco/sco/codes/python/tables/colget.py 
 
Why not use a bash script or a fortran code for this seemingly simple job? I wanted a general tool. The readline() and split() tools in python make reading column files very easy. As I show below, I can use colget.py to grab numeric and string data values. I can then use other processing scripts on the single-column data files as the need arises.
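The idea can be sketched in a few lines of Python. This is only an illustration of the split() approach described above, not the actual colget.py source; the function name extract_column and the sample rows are assumptions made for the example.

```python
def extract_column(lines, colnum):
    """Return column `colnum` (1-based) from the rows after the '# data' marker."""
    values = []
    in_data = False
    for line in lines:
        if not in_data:
            # Everything up to and including the marker line is header.
            if line.strip() == "# data":
                in_data = True
            continue
        fields = line.split()  # split on any run of white space
        if len(fields) >= colnum:
            # Values stay as strings, so numeric and text columns both work.
            values.append(fields[colnum - 1])
    return values


sample = [
    "# This is my data table\n",
    " RA         DEC         V\n",
    "# data \n",
    "  6.708496  27.066771  13.5835\n",
    "  6.712891  27.061857  16.2000\n",
]
print(extract_column(sample, 1))
```

Because the fields are never converted to numbers, the same loop handles string columns such as target names.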

 
 

% colget.py --help 
usage: colget.py [-h] [-v] arg1 arg2 arg3 arg4

positional arguments:
  arg1           Input table-style file
  arg2           Column number of data to be extracted
  arg3           Name of output file
  arg4           Y/N for data header line in OUTPUT file 

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Verbose responses

 
Below I show examples of how to use colget.py. The files and run scripts for these examples are also located in my $tdata/T_runs/colget.py/ directories. Notice that the header lines in any file processed by colget.py are preserved in a file named "head.lines". Every header line except the "# data" line is written to this file. I did this anticipating that I might want to cut various columns from multiple files. When I paste these columns to form other files, I might want to save the header lines from all of the progenitor files.
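The header handling described above might look roughly like the following sketch. It is an assumption about the logic, not the real colget.py code; the helper name split_header is hypothetical, and the file writing to "head.lines" is left out so that the example stays self-contained.

```python
def split_header(lines, marker="# data"):
    """Split lines into (header, data), dropping the marker line itself."""
    header, data = [], []
    seen_marker = False
    for line in lines:
        if seen_marker:
            data.append(line)
        elif line.strip() == marker:
            # The marker is neither header nor data, so it is not saved.
            seen_marker = True
        else:
            header.append(line)
    return header, data


table = [
    "# THis is my data table \n",
    " RA         DEC         V\n",
    "# data \n",
    "  6.708496  27.066771  13.5835\n",
]
header, data = split_header(table)
print("".join(header), end="")  # the text that would go into head.lines
```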


Example 1: A simple file

Here we'll use the case of a very simple file. All of the columns are floating point numbers separated by white-space.

 

% cat Table_file.1 
# THis is my data table 
 RA         DEC         V       sig_V   B-V      U-B      V-R
# data 
  6.708496  27.066771  13.5835 0.0000   0.1824   0.1862   0.0000
  6.712891  27.061857  16.2000 0.0000   0.5742   0.4227   0.0000
  6.708603  27.057254  15.8663 0.1724   0.3551   0.1404   0.0000
  6.707845  27.056499  14.8863 0.0000   0.2723   0.1956   0.0000
  6.708209  27.056278  13.8898 0.0000   0.2566   0.2134   0.0000
  6.709876  27.054192  12.6919 0.0000   0.3597   0.2571   0.0000

% colget.py Table_file.1 1 RA N 
% ls   
head.lines  RA  S/  Table_file.1

The new file RA contains the column of data.

% cat RA 
6.708496
6.712891
6.708603
6.707845
6.708209
6.709876

% cat head.lines 
# THis is my data table 
 RA         DEC         V       sig_V   B-V      U-B      V-R


Setting the fourth argument to Y writes the "# data" header line to the output file:

% colget.py Table_file.1 1 RA Y
% cat RA 
# data
6.708496
6.712891
6.708603
6.707845
6.708209
6.709876

As a further example, I might pull several columns from my file and paste them into a new file using a simple script:

% cat RUN_IT 
#
colget.py Table_file.1 1 RA N
colget.py Table_file.1 2 DEC N
colget.py Table_file.1 5 BmV N
echo "# data" > New.File 
paste RA DEC BmV >> New.File
\rm -f RA DEC BmV

% RUN_IT 

% ls 
New.File  RUN_IT*  Table_file.1

% cat New.File 
# data
6.708496	27.066771	0.1824
6.712891	27.061857	0.5742
6.708603	27.057254	0.3551
6.707845	27.056499	0.2723
6.708209	27.056278	0.2566
6.709876	27.054192	0.3597
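For readers who prefer to stay in Python, the echo and paste steps of RUN_IT can be approximated as below. This is a sketch under stated assumptions: the function name paste_columns is hypothetical, the columns are passed as lists of strings rather than read from the RA, DEC and BmV files, and zip() stops at the shortest column, whereas paste pads short columns with empty fields.

```python
def paste_columns(columns, data_header=True):
    """Join parallel columns into tab-separated rows, like `paste`."""
    # zip(*columns) walks the columns row by row and stops at the shortest one.
    rows = ["\t".join(fields) for fields in zip(*columns)]
    if data_header:
        rows.insert(0, "# data")  # same marker that RUN_IT writes with echo
    return rows


ra = ["6.708496", "6.712891"]
dec = ["27.066771", "27.061857"]
bmv = ["0.1824", "0.5742"]
for row in paste_columns([ra, dec, bmv]):
    print(row)
```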


Example 2: A complicated file

Here we'll use a file with both numeric and string data.

 

% cat sep02_2015_A.dat 

AZ    = HET structure azimuth (may be approximate since this must be recorded manually 
Q     = HET parallactic angle 
ps    = derived arcsec/pix from WCS 
N     = direction CCW (in degrees) from +Y axis to North  
E     = direction CCW (in degrees) from +Y axis to  East
sdRA  = stan.dev. of RA residuals in WCS solution (arcsec) 
sdDEC = stan.dev. of RA residuals in WCS solution (arcsec) 
Nstar = number of stars used in WCS solution 
  AZ       DEC       Q         Target         ps     m.e.       N          E       sdRA  sdDEC Nstars  Image_file_rootname     fit_method 
# data 
  68    +37.23880   91.81  gsc1_2715.0724   0.2705  0.0003   184.9120    94.6388  0.254  0.302   6     20150812T041257.0_acm   wcs_doall
  68    +37.23930   91.81  gsc1_2715.0724   0.2707  0.0003   184.3591    94.8650  0.035  0.137   6     20150812T042308.2_acm   wcs_doall
  67    +37.49154   91.45  BSC5_8549        0.2706  0.0002   184.4207    94.3219  0.330  0.338   7     20150812T045350.0_acm   wcs_doall
  67    +37.39880   91.58  BSC5_8549        0.2710  0.0003   184.7464    94.6543  0.268  0.444   8     20150812T051838.2_acm   wcs_doall
342.401 +62.59766  325.33  NGC7160          0.2736  0.0007    57.5834   327.9899  0.436  1.011   7     20150812T083408.1_acm   wcs_doall
 290     +35.93367 266.37  gsc1-2619.0796   0.2663  0.0002   358.       268.      0.25   0.25    0     20150805T063659.1_acm   ast.net
 83.42   +28.21667 104.06  ngc6940          0.2665  0.0002   196.       106.      0.25   0.25    0     20150808T033609.2_acm   ast.net
 290     +35.92152 266.35  gsc1-2651.1659   0.2665  0.0002   359.       269.      0.25   9.25    0     20150805T075207.8_acm   ast.net
 42.774  +51.24519  69.00  ngc1528          0.2716  0.0001   161.        70.      0.25   0.25    0     20150826T100618.7_acm   ast.net 

% cat RUN 
#
colget.py sep02_2015_A.dat 5 PS N
colget.py sep02_2015_A.dat 4 Target_name N 

% RUN

% cat PS 
0.2705
0.2707
0.2706
0.2710
0.2736
0.2663
0.2665
0.2665
0.2716

% cat Target_name 
gsc1_2715.0724
gsc1_2715.0724
BSC5_8549
BSC5_8549
NGC7160
gsc1-2619.0796
ngc6940
gsc1-2651.1659
ngc1528
 

What is with the "# data" biz? A lot of my codes use the "# data" line to locate the data table in a file. For instance, I may want to pull columns from different files and combine them with the oned_imarith.sh routine. That code requires files that have this header line.


Example 3: Another simple case

In Oct 2018 I thought my conditional test in colget.py was messing up. The conditional was fine, but I had my "break" statement indented incorrectly. Here is a test that validates the debugged code. The columns are floating point numbers separated by white-space.

 

% cat A.table
# R50 from profile_gcurve =       3.9454  (pixels)
# parlab 
 rad     Radius in pixel units 
 k       Fraction of total Signal 
# data 
      2.2460        0.2000
      2.2933        0.2102
      2.3414        0.2204

% colget.py A.table 1 X N

% cat X
  2.2460
  2.2933
  2.3414
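To show why a mis-indented "break" can look like a failing conditional, here is a generic illustration. It is not the actual colget.py code; both function names and the marker search loop are assumptions made up for this example.

```python
def find_marker_buggy(lines, marker="# data"):
    """Hypothetical bug: break sits at loop level, not inside the if."""
    index = None
    for i, line in enumerate(lines):
        if line.strip() == marker:
            index = i
        break  # BUG: runs on the first pass, so only line 0 is ever tested
    return index


def find_marker_fixed(lines, marker="# data"):
    """Same loop with the break indented under the conditional."""
    index = None
    for i, line in enumerate(lines):
        if line.strip() == marker:
            index = i
            break  # stop only once the marker has been found
    return index


table = ["# parlab \n", " rad     Radius in pixel units \n", "# data \n",
         "      2.2460        0.2000\n"]
print(find_marker_buggy(table), find_marker_fixed(table))
```

In the buggy version the conditional itself is correct, yet the marker is never found unless it happens to be the first line of the file.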



