date: Tue, 29 Jan 2008 10:44:31 +0000
from: Ian Harris <i.harrisatXYZxyz.ac.uk>
subject: Re: more thoughts on netCDF CRU TS 3.0
to: Tim Osborn <t.osbornatXYZxyz.ac.uk>
On 29 Jan 2008, at 9:49, Tim Osborn wrote:
> a couple more issues arose during my use of these netCDF files...
> (1) would it make the files much larger to use real*4 rather than
> int*4 for the data type of the main variable? If so this would be
> preferable, because most people will want to do calculations with
> the data after reading it that require real values. Reading the data
> as integer and then subsequently moving them into a real variable
> requires double the memory, and already we're talking > GB just to
> read one variable in full!
> (2) real*4 would also allow you to store the data without needing
> the scale factor to make them integers. Again, applying the
> scaling after reading requires another GB of memory, even if only
> temporarily when storing back into the same variable, if using
> whole-matrix calculations, i.e., alldata=alldata*scalefactor.
> Obviously one could avoid this by running through each element in a
> loop, but this is much slower.
> I appreciate that you wanted to replicate the values from the ASCII
> files as closely as possible, for the moment, but in the end I
> think it better to make the netCDF files as convenient as possible.
Point(s) taken. I think I'm happy to abandon the emulation of the
traditional format. INT and FLOAT take up the same space (4 bytes each;
they just have different permissible ranges). When I next start work on the
production programs I'll filter through the changes.
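For reference, the storage and scaling points can be checked with a small Python sketch using only the standard library. It confirms that a 4-byte integer and a 4-byte float occupy the same space, and it applies the usual netCDF/CF unpacking rule (value = packed * scale_factor + add_offset). The scale factor of 0.1 and the sample values are hypothetical illustrations, not taken from the CRU TS files.

```python
import struct

# Classic netCDF 'int' and 'float' are both 4 bytes per value;
# struct's '<' prefix forces standard (platform-independent) sizes.
INT4_BYTES = struct.calcsize("<i")
REAL4_BYTES = struct.calcsize("<f")


def unpack(packed, scale_factor=1.0, add_offset=0.0):
    """Convert packed integers to physical values (CF-style unpacking).

    A scale_factor of 0.1 (e.g. values stored as tenths of a unit) is a
    hypothetical example, not the factor used in any particular file.
    """
    return [p * scale_factor + add_offset for p in packed]


def as_real4(values):
    """Round-trip values through 4-byte floats, as a real*4 variable stores them."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}f", raw))


if __name__ == "__main__":
    packed = [125, -30, 0]            # hypothetical int*4 values
    physical = unpack(packed, scale_factor=0.1)
    print(INT4_BYTES, REAL4_BYTES)    # 4 4
    print(as_real4(physical))
```

Storing the physical values directly as real*4 removes the unpacking step (and its temporary copy) from every reader's code, without changing the file size.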
> (3) when the file is read by a package that uses the UDUNITS
> protocol for units of physical data, the time variable is somewhat
> weird. e.g. February 2006 in the file appears in ncview as 31-
> Jan-06 rather than 1-Feb-06. At first I just glanced at the month
> (since the data are monthly) and actually thought Feb 2006 was
> missing from the file because it went from 31-Jan-06 to 1-Mar-06
> for the next month. I think this is because in UDUNITS a month is
> defined, for the default 'standard' (=='gregorian') calendar as
> 365.2425/12 days and therefore some unusual rounding occurs
> differently depending on whether it is or isn't a leap year. For
> this reason, time units of "days since ------" is preferred to
> either "months since -----" or "years since -----". A few details
> are given here:
> I wonder whether the simplest workaround would be to change the
> time attribute 'calendar' to '365_day'? The alternative, requiring
> more calculation, would be to use "days since ------" together with
> the "standard" gregorian calendar to define the values; you'd need
> to convert months to days taking into account leap/noleap years and
> the exact individual month lengths.
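Both the rounding problem and the "days since" fix can be sketched in Python with the standard library. The reference date 1900-01-01 and the convention that month index 0 is January 1900 are hypothetical assumptions for illustration; the 365.2425/12-day month is the figure quoted above, not checked against the UDUNITS source, and the sketch is not guaranteed to reproduce the exact dates ncview displayed. It decodes a "months since" value for February 2006 with a fixed-length month, then computes exact day offsets for the 'standard' (Gregorian) and '365_day' calendars.

```python
from datetime import date, datetime, timedelta

REF = date(1900, 1, 1)              # hypothetical reference: "... since 1900-01-01"
MONTH_DAYS = 365.2425 / 12          # fixed month length as described above
NOLEAP_CUMDAYS = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def month_index(year, month):
    """Months elapsed since REF, with January 1900 as index 0 (assumed convention)."""
    return (year - REF.year) * 12 + (month - 1)


def decode_months_since(n):
    """Decode 'n months since REF' using a fixed-length month."""
    return datetime(REF.year, REF.month, REF.day) + timedelta(days=n * MONTH_DAYS)


def days_since_gregorian(year, month, day=1):
    """Exact day offset in the 'standard' (Gregorian) calendar, leap years included."""
    return (date(year, month, day) - REF).days


def days_since_365day(year, month, day=1):
    """Day offset in a '365_day' (no-leap) calendar; assumes REF is 1 January."""
    return (year - REF.year) * 365 + NOLEAP_CUMDAYS[month - 1] + (day - 1)


if __name__ == "__main__":
    n = month_index(2006, 2)
    print(n, decode_months_since(n))          # lands on 31 Jan 2006, not 1 Feb
    print(days_since_gregorian(2006, 2))      # 38747
    print(days_since_365day(2006, 2))         # 38721
```

Writing the Gregorian offsets with units of "days since 1900-01-01" avoids the fixed-length month entirely, at the cost of the leap-year bookkeeping mentioned above; the '365_day' variant is simpler but changes the calendar the data claim to use.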
No one issue with the NetCDF format has caused me greater pain than
the time variable, whether we're talking about this work, or QUEST. I
really thought I'd avoided the day counting by saying 'months
since..', especially as that's a valid format. I hadn't considered
that people might use UDUNITS (I don't unless forced because it only
caters for a subset of the available calendars), so yes I'll have to
cater for its quirks too. Wail!
Thanks for spotting it.. I'll think on.
Ian "Harry" Harris
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich NR4 7TJ