
Process Data from Level0 to Level1 (raw2proc)

Process Level 0 (raw form) to Level 1 (monthly netCDF). This generally involves parsing ascii or binary data, combining measured parameters to derive specific parameters of interest, and performing some basic data quality checks. The framework for adding new platforms and packages to the processing stream is in place as of Version 1.0 (raw2proc-1.0), but much remains to be done and holes remain to fill.

Python Module Dependencies

  • numpy
  • pycdf
  • dateutil
  • udunits

Python Modules

  • raw2proc -- knows how-to process and what to process based on configuration files for platform and packages
  • ncutil -- create and update netCDF files, and get data from them
  • procutil -- lots of time handling functions

Processes raw ascii or binary data from different NCCOOS sensors (ctd, adcp, waves-adcp, met) in manual or automated operation. raw2proc determines which configuration files to use for the specified platform and month. Then, based on the specified package, it determines (via a rigid directory structure) which raw files to look for and process. Finally, raw2proc loads the parsing and output methods pre-determined in the config files for the specific package type, which handle the processing of specific data formats. For example, adcp data has multiple formats we might want to process (log data, spec data, and binary data for both currents and waves), and each format carries different information that dictates what can be output. raw2proc also organizes the data into monthly netCDF files in the Level1 dataset.
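
The config-driven dispatch described above can be sketched roughly as follows; all names here (find_raw, process_month, the config dictionary layout) are assumptions for illustration, not the actual raw2proc API:

```python
# Hypothetical sketch of raw2proc's dispatch: find raw files under the
# rigid directory structure, then hand each one to the parser named in
# the package config.  Names and layout are assumed, not the real API.

import glob
import os

def find_raw(raw_dir, platform, package, yyyy_mm):
    """Find raw files under the assumed platform/package/yyyy_mm layout."""
    pattern = os.path.join(raw_dir, platform, package, yyyy_mm, '*')
    return sorted(glob.glob(pattern))

def process_month(config, package, yyyy_mm):
    """Parse each raw file with the package's parser and collect records."""
    parser = config[package]['parser']   # callable set in the config file
    records = []
    for fn in find_raw(config['raw_dir'], config['platform'], package, yyyy_mm):
        with open(fn) as f:
            records.extend(parser(f.read()))
    return records

# Example: a trivial parser for one-value-per-line ascii data
config = {
    'platform': 'bogue',
    'raw_dir': '/tmp/rawdata',
    'met': {'parser': lambda text: [float(v) for v in text.split()]},
}
```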

Platform Configuration Files

  • build config_tester() for config-file compliance to run under raw2proc
  • start with example config

Config File Rules

  • one config file per platform and configuration change
  • config_end_date : None denotes that this config file is active, i.e. the current config file for the platform
  • change config_end_date : None to config_end_date : 'yyyy-mm-dd HH:MM:SS' in the old config file
  • create a new config for the same platform any time a change is made at the platform
  • config file naming convention platformID_config_YYYYMMDD.py, for example bogue_config_20060918.py
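
A minimal config file following these rules might look like the sketch below; everything beyond config_end_date and the naming convention is an assumption for illustration:

```python
# bogue_config_20060918.py -- hypothetical skeleton of a platform config
# file.  Field names other than config_end_date are assumed here, not
# taken from the real raw2proc config layout.

platform_id = 'bogue'
config_start_date = '2006-09-18 00:00:00'
config_end_date = None   # None marks this as the active (current) config

packages = {
    'met': {
        'raw_file_glob': '*.dat',        # assumed field
        'parser': 'parse_ascii_met',     # name of a parser in processors
    },
}
```

When a change is made at the platform, this file would get a concrete config_end_date and a new bogue_config_YYYYMMDD.py would be created alongside it.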

STILL TO DO

  • organize/brainstorm where to store configuration files (for now they are in the trunk)
  • ditto for processor modules

raw2proc

  • add to processors parse_last_time_stamp() from ascii data (so monitor can use this time)
  • add command-line interface thru getopt
  • in find_raw(), allow overriding the default "yyyy_mm" rawdata subdirectory lookup for multi-month adcp deployments (e.g. _RDI_000.000, _RDI_001.000, _RDI_003.000 in one subdirectory for one deployment)
  • add proc2latest hook if auto-mode for SECOORA commons ingest (should config-files drive it?)
  • add manual spin to reprocess all or portion of the level1
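
For the getopt item above, a minimal sketch of what the command-line interface could look like (the option names are assumptions, not the final interface):

```python
# Hypothetical getopt-based command line for raw2proc; option names
# (-a/--auto, -p/--platform, -k/--package, -m/--month) are assumed.

import getopt

def parse_args(argv):
    """Return (platform, package, yyyy_mm, auto) from command-line options."""
    opts, _args = getopt.getopt(argv, 'ap:k:m:',
                                ['auto', 'platform=', 'package=', 'month='])
    platform = package = month = None
    auto = False
    for opt, val in opts:
        if opt in ('-a', '--auto'):
            auto = True
        elif opt in ('-p', '--platform'):
            platform = val
        elif opt in ('-k', '--package'):
            package = val
        elif opt in ('-m', '--month'):
            month = val
    return platform, package, month, auto
```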

qaqc

  • time continuity check
  • simple range tests
  • establish qaqc tests and flags
  • establish a qaqc variable for each variable
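
A simple range test with a per-variable qaqc flag could be sketched like this (the flag convention, 1 = good / 4 = bad, is an assumption, not an established choice):

```python
# Sketch of a simple range test producing a qaqc flag for each sample.
# Flag values are assumed: 1 = good, 4 = bad.

import numpy

def range_test(data, valid_min, valid_max):
    """Return a flag array: 1 where data is within range, 4 where not."""
    data = numpy.asarray(data, dtype='f8')
    flags = numpy.ones(data.shape, dtype='i1')   # start with all good
    bad = (data < valid_min) | (data > valid_max)
    flags[bad] = 4
    return flags
```

The resulting flag array would be written to the netCDF file alongside its data variable.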

processors

  • build processors (this list will grow)

ncutil

  • investigate another way to get data into the netCDF array (currently using numpy.tolist()). You'd think we could hand the netCDF file an array directly, but that raises a TypeError ('f4' expected). The native type for the array is 'f8', and I could not figure out how to cast it down to 'f4' or initialize an 'f4' array in memory to match the file's array type.
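
One likely fix: numpy can cast an 'f8' array down to 'f4' with astype(), or allocate an 'f4' array directly via the dtype argument, so the in-memory array matches what the netCDF variable expects:

```python
# numpy arrays default to 'f8' (float64); astype() returns a new copy of
# the array in the requested type, and dtype='f4' allocates float32
# directly.  Whether pycdf then accepts the array untouched is untested
# here, but the type mismatch itself goes away.

import numpy

data = numpy.array([1.0, 2.0, 3.0])    # default dtype is 'f8'
data_f4 = data.astype('f4')            # cast down to 'f4' (float32)
buf = numpy.zeros(10, dtype='f4')      # or initialize 'f4' in memory
```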

procutil

  • use dateutil timezone functions to handle setting timezone in time field of data
  • implement interface to udunits conversion (ticket)
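
For the timezone item, dateutil's tz module can attach a zone to a naive timestamp and convert it to UTC; how procutil would wrap this is an assumption, but the dateutil calls themselves are standard:

```python
# Attach a timezone to a naive platform timestamp with dateutil.tz,
# then convert to UTC.  The choice of America/New_York here is just an
# example for an NC platform clock running on local time.

from datetime import datetime
from dateutil import tz

eastern = tz.gettz('America/New_York')
utc = tz.tzutc()

# interpret a naive timestamp as Eastern, then convert to UTC
local_dt = datetime(2006, 9, 18, 12, 0, 0, tzinfo=eastern)
utc_dt = local_dt.astimezone(utc)
```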

testing processors

  • test auto and manual processing with ADCP data from bogue (Wavesmon output)
  • test manual processing with data from internal cards on ADCP at LSRB (internal binary and Wavesmon output)

unit tests

  • Sara needs to learn more about writing test code and knows she should have been doing it all along
  • lots of testing needed