
Process Data from Level0 to Level1 (raw2proc)

Process Level 0 (raw form) to Level 1 (monthly netCDF). This generally involves parsing ascii or binary data, combining measured parameters to derive specific parameters of interest, and performing some basic data quality checks. The framework for adding new platforms and packages to the processing stream is in place as of Version 1.0 (raw2proc-1.0), but much remains to be done and holes remain to fill.

Python Module Dependencies

  • numpy
  • pycdf
  • dateutil
  • udunits

Python Modules

  • raw2proc -- knows how-to process and what to process based on configuration files for platform and packages
  • ncutil -- create and update netCDF files, and get data from them
  • procutil -- lots of time handling functions

Processes raw ascii or binary data from different NCCOOS sensors (ctd, adcp, waves-adcp, met) in manual or automated operation. raw2proc determines which configuration files to use for the specified platform and month. Then, based on the specified package, it determines (via a rigid directory structure) which raw files to look for and process. Finally, raw2proc loads the parsing and output methods pre-determined in the config files for the specific package type, which handle the processing of specific data formats. For example, adcp data has multiple formats we might want to process (log data, spec data, and binary data for both currents and waves), and each format carries different information that dictates what can be output. raw2proc also organizes the data into monthly netCDF files in the Level1 dataset.
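
The config-driven dispatch described above can be sketched roughly as follows; all names here (find_raw, process_month, the config dictionary layout) are assumptions for illustration, not the actual raw2proc API:

```python
# Hypothetical sketch of raw2proc's dispatch: find raw files under the
# rigid directory structure, then hand each one to the parser named in
# the package config.  Names and layout are assumed, not the real API.

import glob
import os

def find_raw(raw_dir, platform, package, yyyy_mm):
    """Find raw files under the assumed platform/package/yyyy_mm layout."""
    pattern = os.path.join(raw_dir, platform, package, yyyy_mm, '*')
    return sorted(glob.glob(pattern))

def process_month(config, package, yyyy_mm):
    """Parse each raw file with the package's parser and collect records."""
    parser = config[package]['parser']   # callable set in the config file
    records = []
    for fn in find_raw(config['raw_dir'], config['platform'], package, yyyy_mm):
        with open(fn) as f:
            records.extend(parser(f.read()))
    return records

# Example: a trivial parser for one-value-per-line ascii data
config = {
    'platform': 'bogue',
    'raw_dir': '/tmp/rawdata',
    'met': {'parser': lambda text: [float(v) for v in text.split()]},
}
```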

Platform Configuration Files

  • build config_tester() for config-file compliance to run under raw2proc
  • start with example config

Config File Rules

  • one config file per platform and configuration change
  • config_end_date : None denotes that this config file is active, i.e. the current config file for the platform
  • change config_end_date : None to config_end_date : 'yyyy-mm-dd HH:MM:SS' in the old config file
  • create a new config for the same platform any time a change is made at the platform
  • config file naming convention platformID_config_YYYYMMDD.py, for example bogue_config_20060918.py
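
A minimal config file following these rules might look like the sketch below; everything beyond config_end_date and the naming convention is an assumption for illustration:

```python
# bogue_config_20060918.py -- hypothetical skeleton of a platform config
# file.  Field names other than config_end_date are assumed here, not
# taken from the real raw2proc config layout.

platform_id = 'bogue'
config_start_date = '2006-09-18 00:00:00'
config_end_date = None   # None marks this as the active (current) config

packages = {
    'met': {
        'raw_file_glob': '*.dat',        # assumed field
        'parser': 'parse_ascii_met',     # name of a parser in processors
    },
}
```

When a change is made at the platform, this file would get a concrete config_end_date and a new bogue_config_YYYYMMDD.py would be created alongside it.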

STILL TO DO

  • organize/brainstorm where to store configuration files (for now they are in the trunk)
  • ditto for processor modules

raw2proc

  • add to processors parse_last_time_stamp() from ascii data (so monitor can use this time)
  • add command-line interface thru getopt
  • in find_raw(), allow overriding the default "yyyy_mm" rawdata subdirectory lookup for multi-month adcp deployments (e.g. _RDI_000.000, _RDI_001.000, _RDI_003.000 in one subdirectory for one deployment)
  • add proc2latest hook if auto-mode for SECOORA commons ingest (should config-files drive it?)
  • add manual spin to reprocess all or portion of the level1
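
For the getopt item above, a minimal sketch of what the command-line interface could look like (the option names are assumptions, not the final interface):

```python
# Hypothetical getopt-based command line for raw2proc; option names
# (-a/--auto, -p/--platform, -k/--package, -m/--month) are assumed.

import getopt

def parse_args(argv):
    """Return (platform, package, yyyy_mm, auto) from command-line options."""
    opts, _args = getopt.getopt(argv, 'ap:k:m:',
                                ['auto', 'platform=', 'package=', 'month='])
    platform = package = month = None
    auto = False
    for opt, val in opts:
        if opt in ('-a', '--auto'):
            auto = True
        elif opt in ('-p', '--platform'):
            platform = val
        elif opt in ('-k', '--package'):
            package = val
        elif opt in ('-m', '--month'):
            month = val
    return platform, package, month, auto
```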

qaqc

  • time continuity check
  • simple range tests
  • establish qaqc tests and flags
  • establish a qaqc variable for each variable
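
A simple range test with a per-variable qaqc flag could be sketched like this (the flag convention, 1 = good / 4 = bad, is an assumption, not an established choice):

```python
# Sketch of a simple range test producing a qaqc flag for each sample.
# Flag values are assumed: 1 = good, 4 = bad.

import numpy

def range_test(data, valid_min, valid_max):
    """Return a flag array: 1 where data is within range, 4 where not."""
    data = numpy.asarray(data, dtype='f8')
    flags = numpy.ones(data.shape, dtype='i1')   # start with all good
    bad = (data < valid_min) | (data > valid_max)
    flags[bad] = 4
    return flags
```

The resulting flag array would be written to the netCDF file alongside its data variable.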

processors

  • build processors (this list will grow)

ncutil

  • investigate another way to get data into the netCDF array (currently using numpy.tolist()). You'd think we could hand the netCDF file an array directly, but that raises a TypeError ('f4' expected). The native type for the array is 'f8', and I could not figure out how to cast it down to 'f4' or initialize an 'f4' array in memory to match the file's array type.
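
One likely fix: numpy can cast an 'f8' array down to 'f4' with astype(), or allocate an 'f4' array directly via the dtype argument, so the in-memory array matches what the netCDF variable expects:

```python
# numpy arrays default to 'f8' (float64); astype() returns a new copy of
# the array in the requested type, and dtype='f4' allocates float32
# directly.  Whether pycdf then accepts the array untouched is untested
# here, but the type mismatch itself goes away.

import numpy

data = numpy.array([1.0, 2.0, 3.0])    # default dtype is 'f8'
data_f4 = data.astype('f4')            # cast down to 'f4' (float32)
buf = numpy.zeros(10, dtype='f4')      # or initialize 'f4' in memory
```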

procutil

  • use dateutil timezone functions to handle setting timezone in time field of data
  • implement interface to udunits conversion (ticket)
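
For the timezone item, dateutil's tz module can attach a zone to a naive timestamp and convert it to UTC; how procutil would wrap this is an assumption, but the dateutil calls themselves are standard:

```python
# Attach a timezone to a naive platform timestamp with dateutil.tz,
# then convert to UTC.  The choice of America/New_York here is just an
# example for an NC platform clock running on local time.

from datetime import datetime
from dateutil import tz

eastern = tz.gettz('America/New_York')
utc = tz.tzutc()

# interpret a naive timestamp as Eastern, then convert to UTC
local_dt = datetime(2006, 9, 18, 12, 0, 0, tzinfo=eastern)
utc_dt = local_dt.astimezone(utc)
```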

testing processors

  • test auto and manual processing with ADCP data from bogue (Wavesmon output)
  • test manual processing with data from internal cards on ADCP at LSRB (internal binary and Wavesmon output)

unit tests

  • Sara needs to learn more about writing test code and knows she should have been doing it all along
  • lots of testing needed