UPDATE: This is scientific?

Here, the expected 1990-2003 period is MISSING – so the correlations aren’t so hot! Yet
the WMO codes and station names /locations are identical (or close). What the hell is
supposed to happen here? Oh yeah – there is no ‘supposed’, I can make it up. So I have :-)

and

<DO YOU SEE? THERE’S THAT OH-SO FAMILIAR BLOCK OF MISSING CODES IN THE LATE 80S,
THEN THE DATA PICKS UP AGAIN. BUT LOOK AT THE CORRELATIONS ON THE RIGHT, ALL
GOOD AFTER THE BREAK, DECIDEDLY DODGY BEFORE IT. THESE ARE TWO DIFFERENT
STATIONS, AREN’T THEY? AAAARRRGGGHHHHHHH!!!!!>

and

Worked out an algorithm from scratch. It seems to give better answers than the others, so we’ll go
with that.

and

So the largest database, precip, contained 14397 stations with usable WMO codes (and 1540 without).
The TMin, (and TMax and DTR, which were tested then excluded as they matched TMin 100%) database only agreed
perfectly with precip for 1865 stations, nearby 3389, believable 57, worrying 77. TMean fared worse, with NO
exact matches (WMO misformatting again) and over 100 worrying ones.

and

I am seriously close to giving up, again. The history of this is so complex that I can’t get far enough
into it before my head hurts and I have to stop. Each parameter has a tortuous history of manual and
semi-automated interventions that I simply cannot just go back to early versions and run the update prog.
I could be throwing away all kinds of corrections – to lat/lons, to WMOs (yes!), and more.

Bad data, poorly collected, some of it missing and a programmer applying all types of kludges trying to make sense of it.  Dollars to doughnuts that anything produced by the CRU is about as reliable as its data.

———————————————

More information has surfaced as more people peruse the files from the University of East Anglia’s Climatic Research Unit.

One particularly interesting file is named HARRY_Read_Me.txt.

This from the Toronto Sun:

I’ve been poring over one of many leaked computer files from the “climategate” scandal.

It’s worse than those e-mails revealing leading climate scientists did a “trick” to “hide the decline” in global temperatures and privately called it a “travesty” they couldn’t explain recent cooling.

This document has the innocuous header “HARRY_READ_Me.txt.”

I’m indebted to Kate McMillan, the remarkable Canadian blogger who runs smalldeadanimals.com, for calling it to my attention.

You can easily find it online. I used www.anenglishmanscastle.com/HARRY_READ_Me.txt.

The file — 274 pages long — describes the efforts of a climatologist/programmer at the Climatic Research Unit (CRU) of the University of East Anglia to update a huge statistical database (11,000 files) of important climate data between 2006 and 2009.

The computer coding, along with the programmer’s apparently unsuccessful efforts to complete the project, involve data that are the foundation of the study of climate change — recordings from hundreds of weather stations around the world of temperature and precipitation measurements from 1901 to 2006, sun/cloud computer simulations, and the like.

PRESUMABLY PRECISE

These presumably precise data are the backbone of climate science.

Reading “HARRY_READ_ME.txt” it’s clear the CRU’s files were a mess. The programmer laments huge gaps in data, bug-filled programs and worries about all the guesswork he’s doing. His comments suggest the problems go back years.

The CRU at East Anglia University is considered by many as the world’s leading climate research agency. Here’s how CBSNews.com’s Declan McCullagh describes its enormous impact on policymakers:

“In global warming circles, the CRU wields outsize influence: It claims the world’s largest temperature data set, and its work and mathematical models were incorporated into the United Nations Intergovernmental Panel on Climate Change’s 2007 report. The report … is what the Environmental Protection Agency acknowledged it ‘relies on most heavily’ when concluding carbon dioxide emissions endanger public health and should be regulated.”

As you read the programmer’s comments below, remember, this is only a fraction of what he says.

- “But what are all those monthly files? DON’T KNOW, UNDOCUMENTED. Wherever I look, there are data files, no info about what they are other than their names. And that’s useless …” (Page 17)

- “It’s botch after botch after botch.” (18)

- “The biggest immediate problem was the loss of an hour’s edits to the program, when the network died … no explanation from anyone, I hope it’s not a return to last year’s troubles … This surely is the worst project I’ve ever attempted. Eeeek.” (31)

- “Oh, GOD, if I could start this project again and actually argue the case for junking the inherited program suite.” (37)

- “… this should all have been rewritten from scratch a year ago!” (45)

- “Am I the first person to attempt to get the CRU databases in working order?!!” (47)

- “As far as I can see, this renders the (weather) station counts totally meaningless.” (57)

- “COBAR AIRPORT AWS (data from an Australian weather station) cannot start in 1962, it didn’t open until 1993!” (71)

- “What the hell is supposed to happen here? Oh yeah — there is no ‘supposed,’ I can make it up. So I have : – )” (98)

- “You can’t imagine what this has cost me — to actually allow the operator to assign false WMO (World Meteorological Organization) codes!! But what else is there in such situations? Especially when dealing with a ‘Master’ database of dubious provenance …” (98)

- “So with a somewhat cynical shrug, I added the nuclear option — to match every WMO possible, and turn the rest into new stations … In other words what CRU usually do. It will allow bad databases to pass unnoticed, and good databases to become bad …” (98-9)

- “OH F— THIS. It’s Sunday evening, I’ve worked all weekend, and just when I thought it was done, I’m hitting yet another problem that’s based on the hopeless state of our databases.” (241).

- “This whole project is SUCH A MESS …” (266)

And based on stuff like this, politicians are going to blow up our economy and lower our standard of living to “fix” the climate?

Are they insane?

I went ahead and downloaded the “HARRY” file.  Much of it is technical jargon from one man trying to run the stats, but bit by bit it becomes clear that the data is compromised (poorly collected, poorly assembled), and that ultimately the goal becomes an endeavor to make the data fit the theory.

This isn’t a study of data.  Global warming is a science of data manipulation.

To that end, I’m going to post the entire text below, and throughout the day I’ll make bold the tell-tale signs of bad data that prove nothing and that shouldn’t be used to chart our economic future.

—HARRY TXT—

1. Two main filesystems relevant to the work:

/cru/dpe1a/f014
/cru/tyn1/f014

Both systems copied in their entirety to /cru/cruts/

Nearly 11,000 files! And about a dozen assorted ‘read me’ files addressing
individual issues, the most useful being:

fromdpe1a/data/stnmon/doc/oldmethod/f90_READ_ME.txt
fromdpe1a/code/linux/cruts/_READ_ME.txt
fromdpe1a/code/idl/pro/README_GRIDDING.txt

(yes, they all have different name formats, and yes, one does begin ‘_’!)

2. After considerable searching, identified the latest database files for
tmean:

fromdpe1a/data/cruts/database/+norm/tmp.0311051552.dtb
fromdpe1a/data/cruts/database/+norm/tmp.0311051552.dts

(yes.. that is a directory beginning with ‘+’!)

3. Successfully ran anomdtb.f90 to produce anomaly files (as per item 7
in the ‘_READ_ME.txt’ file). Had to make some changes to allow for the
move back to alphas (different field length from the ‘wc -l’ command).

4. Successfully ran the IDL regridding routine quick_interp_tdm.pro
(why IDL?! Why not F90?!) to produce ‘.glo’ files.

5. Currently trying to convert .glo files to .grim files so that we can
compare with previous output. However the program suite headed by
globulk.f90 is not playing nicely – problems with it expecting a defunct
file system (all path widths were 80ch, have been globally changed to 160ch)
and also no guidance on which reference files to choose. It also doesn’t
seem to like files being in any directory other than the current one!!

6. Temporarily abandoned 5., getting closer but there’s always another
problem to be evaded. Instead, will try using rawtogrim.f90 to convert
straight to GRIM. This will include non-land cells but for comparison
purposes that shouldn’t be a big problem… [edit] noo, that’s not gonna
work either, it asks for a ‘template grim filepath’, no idea what it wants
(as usual) and a search for files with ‘grim’ or ‘template’ in them does
not bear useful fruit. As per usual. Giving up on this approach altogether.

7. Removed 4-line header from a couple of .glo files and loaded them into
Matlab. Reshaped to 360r x 720c and plotted; looks OK for global temp
(anomalies) data. Deduce that .glo files, after the header, contain data
taken row-by-row starting with the Northernmost, and presented as ‘8E12.4’.
The grid is from -180 to +180 rather than 0 to 360.
This should allow us to deduce the meaning of the co-ordinate pairs used to
describe each cell in a .grim file (we know the first number is the lon or
column, the second the lat or row – but which way up are the latitudes? And
where do the longitudes break?)
There is another problem: the values are anomalies, whereas the ‘public’
.grim files are actual values. So Tim’s explanations (in _READ_ME.txt) are
incorrect..
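The layout deduced in item 7 can be sketched in a few lines of Python. This is only an illustration of the description above (4 header lines, fixed-width 12-character fields, rows running north to south, columns starting at 180W); the function names `read_glo` and `cell_centre` are hypothetical, and the cell-centre convention is an assumption, not something the log states.

```python
def read_glo(text, nrows=360, ncols=720, header_lines=4, width=12):
    """Parse .glo-style text into a list of rows, northernmost row first."""
    values = []
    for line in text.splitlines()[header_lines:]:
        # Fortran E12.4 fields are fixed width, so slice rather than split.
        for start in range(0, len(line), width):
            field = line[start:start + width].strip()
            if field:
                values.append(float(field))
    if len(values) != nrows * ncols:
        raise ValueError(f"expected {nrows * ncols} values, got {len(values)}")
    # Longitudes change fastest: each consecutive run of ncols values is one row.
    return [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]


def cell_centre(row, col, cell_deg=0.5):
    """Centre (lat, lon) of a cell, ASSUMING row 0 is the northernmost band
    and column 0 starts at 180W (the -180 to +180 layout noted in the log)."""
    lat = 90.0 - (row + 0.5) * cell_deg
    lon = -180.0 + (col + 0.5) * cell_deg
    return lat, lon
```

Slicing by width instead of splitting on whitespace matters because adjacent Fortran fields can run together when a value fills its whole field.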

8. Had a hunt and found an identically-named temperature database file which
did include normals lines at the start of every station. How handy – naming
two different files with exactly the same name and relying on their location
to differentiate! Aaarrgghh!! Re-ran anomdtb:

crua6[/cru/cruts/rerun1/data/cruts/rerun1work] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.tmp
> Select the .cts or .dtb file to load:
tmp.0311051552.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
rr2.txt
> Select the first,last years AD to save:
1901,2002
> Operating…
> Failed to find file.
> Enter the file, with suffix: .dts
tmp.0311051552.dts
Values loaded: 1255171542;  No. Stations:      12155
> NORMALS            MEAN percent      STDEV percent
>         .dtb    5910325    86.6
>         .cts     575661     8.4    6485986    95.0
> PROCESS        DECISION percent %of-chk
> no lat/lon        12043     0.2     0.2
> no normal        335741     4.9     4.9
> out-of-range      31951     0.5     0.5
> duplicated       341323     5.0     5.3
> accepted        6107721    89.4
> Dumping years 1901-2002 to .txt files…

crua6[/cru/cruts/rerun1/data/cruts/rerun1work]
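For readers unfamiliar with what the prompts in that session mean, here is a rough Python sketch of a normals-and-anomalies calculation. It is not the anomdtb code: the function names are hypothetical, and the rounding rule is an assumption chosen because it reproduces the "Data required for a normal: 23" line (30 years, 25% missing permitted).

```python
import math

MISSING = -9999  # missing-value code used in the database files


def required_count(start, end, max_missing_pct):
    """Valid years needed in the normals period (rounding up is an assumption)."""
    n_years = end - start + 1
    return math.ceil(n_years * (100 - max_missing_pct) / 100)


def monthly_anomalies(series, start=1961, end=1990, max_missing_pct=25):
    """series maps year -> value for ONE calendar month at one station.

    Returns year -> (value - normal), or None when too few valid years
    fall inside the normals period (the 'no normal' case in the log).
    """
    base = [v for y, v in series.items() if start <= y <= end and v != MISSING]
    if len(base) < required_count(start, end, max_missing_pct):
        return None
    normal = sum(base) / len(base)
    return {y: v - normal for y, v in series.items() if v != MISSING}
```

The standard-deviation rejection and duplicate-station checks from the session are left out; the log gives too little detail to sketch them honestly.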

9. Ran the IDL function:
IDL> quick_interp_tdm2,1901,2002,'rr2glofiles/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
1902
(etc)
2002
IDL>

This produces anomaly files even when given a normals-added
database.. doesn’t create the CLIMATOLOGY. However we do have
it, both in the ‘normals’ directory of the user data
directory, and in the dpe1a ‘cru_cl_1.0’ folder! The relevant
file is ‘clim.6190.lan.tmp’. Obviously this is for land
only.

10. Trying to compare .glo and .grim
Wrote several programs to assist with this process. Tried
creating anomalies from the .grim files, using the
published climatology. Then tried to compare with the glo
files I’d produced (this is all for 1961-1970). Couldn’t
get a sensible grid layout for the glo files! Eventually
resorted to visualisation – looks like the .glo files are
‘regular’ grid format after all (longitudes change
fastest). Don’t understand why the comparison program had
so much trouble getting matched cells!

11. Decided to concentrate on Norwich. Tim M uses Norwich
as the example on the website, so we know it’s at (363,286).
Wrote a prog to extract the relevant 1961-1970 series from
the published output, the generated .glo files, and the
published climatology. Prog is norwichtest.for. Prog also
creates anomalies from the published data, and raw data
from the generated .glo data. Then Matlab prog plotnorwich.m
plots the data to allow comparisons.
First result: works perfectly, except that the .glo data is
all zeros. This means I still don’t understand the structure
of the .glo files. Argh!

12. Trying something *else*. Will write a prog to convert
the 1961-1970 .glo files to a single file with 120 columns
and a row for each non-zero cell. It will be slow. It is a
nuisance because the site power is off this weekend (and
it’s Friday afternoon) so I will get it running at home.
Program is glo2vec.for, and yup it is slow. Started a second
copy on uealogin1 and it’s showing signs of overtaking the
crua6 version that started on Friday (it’s Tuesday now). I’m
about halfway through and the best correlation so far (as
tested by norwichcorr.for) is 0.39 at (170,135) (lon,lat).

13. Success! I would crack open a bottle of bubbly but it’s
only 11.25am. The program norwichcorr.for found a correlation
for the norwich series at (363, 286) of 1.00! So we have
found the published Norwich series in the grids I produced. A
palpable sense of relief pervades the office :-) It’s also the
grid reference given by Tim for Norwich. So how did I miss it
earlier??
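The brute-force search in items 12 and 13 amounts to correlating one known series against every grid cell and keeping the best match. A minimal Python sketch of that idea follows; it is not norwichcorr.for, and `pearson` and `best_cell` are hypothetical names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_cell(target, cells):
    """cells maps (lon_index, lat_index) -> series.

    Returns (correlation, key) for the cell that best matches target.
    """
    return max((pearson(target, series), key) for key, series in cells.items())
```

A correlation of 1.00 only says the two series move together exactly; an identical copy scaled or offset would score the same, which is why the later grimcmp runs count "100% matches" separately.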

14. Wrote a program (‘glo2grim.for’) to do what I cannot get
Tim’s ‘raw2grim.f90’ to do, ie, convert .glo files to GRIM format.
It’s slow but sure. In parallel, a quick prog called grimcmp.for
which compares two GRIM-format files. It produces brief stats.
At time of writing, just over 4000 cells have been converted,
and the output of grimcmp is:

uealogin1[/cru/cruts/rerun1/data/cruts/rerun1work] ./grimcmp

Welcome to the GRIM Comparer

Please enter the first grim file (must be complete!):  cru_ts_2_10.1961-1970.tmp
Please enter the second grim file (may be incomplete): glo2grim1.out

File glo2grim1.out terminated prematurely after     4037 records.

SUMMARY FROM GRIMCMP

Files compared:
1. cru_ts_2_10.1961-1970.tmp
2. glo2grim1.out

Total Cells Compared              4037
Total 100% Matches                   0
Cells with Corr. == 1.00             0  ( 0.0%)
Cells with 0.90<=Corr<=0.99       3858  (95.6%)
Cells with 0.80<=Corr<=0.89        119  ( 2.9%)
Cells with 0.70<=Corr<=0.79         25  ( 0.6%)

..which is good news! Not brilliant because the data should be
identical.. but good because the correlations are so high! This
could be a result of my mis-setting of the parameters on Tim’s
programs (although I have followed his recommendations wherever
possible), or it could be a result of Tim using the Beowulf 1
cluster for the f90 work. Beowulf 1 is now integrated in to the
latest Beowulf cluster so it may not be practical to test that
theory.

15. All change! My ‘glo2grim1’ program was presciently named as
it’s now up to v3! My attempt to speed up early iterations by
only reading as much of each glo file as was needed was really
stupidly coded and hence the poor results. Actually they’re
worryingly good as the data was effectively random :-0
We are now on-beam and initial results are very very promising:

uealogin1[/cru/cruts/rerun1/data/cruts/rerun1work] ./grimcmp3x

File glo2grim3.out terminated prematurely after      143 records.

SUMMARY FROM GRIMCMP

Files compared:
1. cru_ts_2_10.1961-1970.tmp
2. glo2grim3.out

Total Cells Compared               143
Total 100% Matches                  12
Cells with Corr. == 1.00            12  ( 8.4%)
Cells with 0.96<=Corr<=0.99        130  (90.9%)
Cells with 0.90<=Corr<=0.95          1  ( 0.7%)
Cells with 0.80<=Corr<=0.89          0  ( 0.0%)
Cells with 0.70<=Corr<=0.79          0  ( 0.0%)

..so all correlations are >= 0.9 and all but one are >=0.96!
with 12 complete (100% identical) matches I think we can safely
say we are producing the data Tim produced. The variations can
be accounted for as rounding errors due to different hardware
and compilers, I reckon..

16. So, it seemed like a good time to start a Precip run. With
a bit of luck this would go as smoothly as the Temperature run,
ho, ho, ho. The first problem was that anomdtb kept crashing:

crua6[/cru/cruts/rerun1/data/cruts/rerun2work] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.pre
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
pre.0312031600.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
rr2pre.txt
> Select the first,last years AD to save:
1901,2002
> Operating…
Values loaded: 1258818288;  No. Stations:      12732
> NORMALS            MEAN percent      STDEV percent
>         .dtb    2635549    29.6
forrtl: error (75): floating point exception
IOT trap (core dumped)
crua6[/cru/cruts/rerun1/data/cruts/rerun2work]

..not good! Tried recompiling for uealogin1.. AARGGHHH!!! Tim’s
code is not ‘good’ enough for bloody Sun!! Pages of warnings and
27 errors! (full results in ‘anomdtb.uealogin1.compile.results’).

17. Inserted debug statements into anomdtb.f90, discovered that
a sum-of-squared variable is becoming very, very negative! Key
output from the debug statements:

OpEn=   16.00, OpTotSq=    4142182.00, OpTot= 7126.00
DataA val =       93, OpTotSq=       8649.00
DataA val =      172, OpTotSq=      38233.00
DataA val =      950, OpTotSq=     940733.00
DataA val =      797, OpTotSq=    1575942.00
DataA val =      293, OpTotSq=    1661791.00
DataA val =       83, OpTotSq=    1668680.00
DataA val =      860, OpTotSq=    2408280.00
DataA val =      222, OpTotSq=    2457564.00
DataA val =      452, OpTotSq=    2661868.00
DataA val =      561, OpTotSq=    2976589.00
DataA val =    49920, OpTotSq=-1799984256.00
DataA val =      547, OpTotSq=-1799684992.00
DataA val =      672, OpTotSq=-1799233408.00
DataA val =      710, OpTotSq=-1798729344.00
DataA val =      211, OpTotSq=-1798684800.00
DataA val =      403, OpTotSq=-1798522368.00
OpEn=   16.00, OpTotSq=-1798522368.00, OpTot=56946.00
forrtl: error (75): floating point exception
IOT trap (core dumped)

..so the data value is unfeasibly large, but why does the
sum-of-squares parameter OpTotSq go negative?!!

Probable answer: the high value is pushing beyond the single-
precision default for Fortran reals?
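One reading that fits the logged numbers: 49920 squared is 2,492,006,400, which is above the largest signed 32-bit integer (2,147,483,647), so the square wraps to a negative value. The Python sketch below ASSUMES, without confirmation from the log, that each square is formed in 32-bit integer arithmetic and then accumulated in a single-precision real. The helper names are hypothetical; the input values are the sixteen DataA values from the debug output.

```python
import struct


def wrap_int32(n):
    """Wrap an arbitrary Python int into the signed 32-bit range."""
    return (n + 2**31) % 2**32 - 2**31


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum_of_squares(values):
    """Running sum of squares under the assumed arithmetic."""
    total = 0.0
    history = []
    for v in values:
        square = wrap_int32(v * v)          # assumed: integer square wraps at 32 bits
        total = to_float32(total + square)  # assumed: single-precision accumulator
        history.append(total)
    return history


# DataA values copied from the debug output above.
data = [93, 172, 950, 797, 293, 83, 860, 222, 452, 561,
        49920, 547, 672, 710, 211, 403]
history = running_sum_of_squares(data)
print(history[9], history[10], history[-1])
```

Under these assumptions the running total after 561 is 2976589, it drops to -1799984256 at the 49920 entry, and it finishes at -1798522368, matching the OpTotSq column. A negative sum of squares then plausibly feeds a square root, which would explain the floating point exception.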

Value located in pre.0312031600.dtb:

-400002  3513   3672  309 HAMA                 SYRIA         1985 2002   -999     -999
6190  842  479 3485  339  170  135  106    0    9  243  387  737
1985  887  582   93   16   17    0    0    0    0  352  221  627
1986  899  252  172  527  173   30    0    0    0   84  496  570
1987  578  349  950  191    4    0    0    0    0  343  462  929
1988 1044  769  797  399   11  903  218    0    0  163  517 1181
1989  269   62  293    3   13    0    0    0    0  101  292  342
1990  328  276   83  135  224    0    0    0    0   87  343  230
1991 1297  292  860  320   70    0    0    0    0  206  298  835
1992  712 1130  222   39  339  301    0    0    0    0  909  351
1993  726  609  452   82  672    3    0    0    0   34  183  351
1994  625  661  561   41  155    0    0    0   22  345  953 1072
1995  488-9999-9999  182-9999    0-9999    0    0    0  754-9999
1996-9999  40949920-9999   82    0-9999    0   36  414  112  312
1997-9999  339  547-9999  561-9999    0    0   54  155  265  962
1998 1148  289  672  496-9999    0    0-9999    9   21-9999 1206
1999  343  379  710  111    0    0    0-9999-9999-9999  132  285
2000 1518  399  211  354   27    0-9999    0   27  269  316 1057
2001  370-9999-9999  273  452    0-9999-9999-9999  290  356-9999
2002  871  329  403  111  233-9999    0    0-9999-9999  377 1287

(value is for March 1996)
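The rogue value is easy to miss by eye because the station rows are fixed-width with no separators: in the 1996 row, February (409) and March (49920) run together as "40949920". A small Python sketch of reading such a row, assuming a 4-character year followed by twelve 5-character fields (the widths are inferred from the listing above, and `parse_station_row` is a hypothetical name):

```python
def parse_station_row(line, year_width=4, field_width=5, n_fields=12):
    """Split one fixed-width data row into (year, [12 monthly values])."""
    year = int(line[:year_width])
    months = []
    for i in range(n_fields):
        start = year_width + i * field_width
        months.append(int(line[start:start + field_width]))
    return year, months


row_1996 = "1996-9999  40949920-9999   82    0-9999    0   36  414  112  312"
year, months = parse_station_row(row_1996)
print(year, months[1], months[2])
```

Splitting on whitespace would fail here, since "-9999" fields and five-digit values sit flush against their neighbours.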

Action: value replaced with -9999 and file renamed:

pre.0312031600H.dtb   (to indicate I’ve fixed it)

.dts file also renamed for consistency.

anomdtb then runs fine!! Producing the usual txt files.

18. Ran the IDL gridding routine for the precip files:

quick_interp_tdm2,1901,2002,'rr2preglofiles/rr2pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2pretxtfiles/rr2pre.'

..and this is where it gets CRAZY. Instead of running normally,
this time I get:

IDL> quick_interp_tdm2,1901,1910,'rr2glofiles2/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'

limit=glimit(/all) ; sets limit to global field
^
% Syntax error.
At: /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro, Line 38

lim=glimit(/all)
^
% Syntax error.
At: /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro, Line 122

r=area_grid(pts2(n,1),pts2(n,0),pts2(n,2),gs*2.0,bounds,dist,angular=angular)
^
% Syntax error.
At: /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro, Line 183
% Compiled module: QUICK_INTERP_TDM2.
% Attempt to call undefined procedure/function: 'QUICK_INTERP_TDM2'.
% Execution halted at:  $MAIN$
IDL>

.. WHAT?! Now it’s not precompiling its functions for some reason!
What’s more – I cannot find the ‘glimit’ function anywhere!!

Eventually (the following day) I found glimit and area_grid, they are
in Mark New’s folder: /cru/u2/f080/Idl. Since this is in $IDL_PATH I
have no idea why they’re not compiling! I manually compiled them with
.compile, and the errors vanished! Though not for long:

IDL> .compile /cru/u2/f080/Idl/glimit.pro
% Compiled module: GLIMIT.
IDL> .compile /cru/u2/f080/Idl/area_grid.pro
% Compiled module: AREA_GRID.
IDL> quick_interp_tdm2,1901,1910,'rr2glofiles2/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
% Compiled module: QUICK_INTERP_TDM2.
Defaults set
1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Variable is undefined: STRIP.
% Execution halted at:  QUICK_INTERP_TDM2  215 /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
%                       $MAIN$
IDL>

Was this a similar problem? Unfortunately not:

IDL> .compile /cru/u2/f080/Idl/strip.pro
% Compiled module: STRIP.
IDL> quick_interp_tdm2,1901,1910,'rr2glofiles2/rr2grid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2txtfiles/rr2.'
Defaults set
1901
% Variable is undefined: STRIP.
% Execution halted at:  QUICK_INTERP_TDM2  215 /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
%                       QUICK_INTERP_TDM2  215 /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
%                       $MAIN$
IDL>

..so it looks like a path problem. I wondered if the NFS errors that have
been plaguing crua6 work for some time now might have prevented IDL from
adding the correct directories to the path? After all the help file does
mention that IDL discards any path entries that are inaccessible.. so if
the timeout is a few seconds that would explain it. So I restarted IDL,
and PRESTO! It worked. I then tried the precip version – and it worked
too!

IDL> quick_interp_tdm2,1901,2002,'rr2preglofiles/rr2pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rr2pretxtfiles/rr2pre.'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
1902
(etc)
2001
2002
IDL>

I then ran glo2grim4.for to convert from percentage anomalies to real
(10ths of a mm) values. Initial results are not as good as temperature,
but mainly above 0.96 so obviously on the right track.
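The log does not show how glo2grim4.for does the conversion. As a rough guide only, if a percentage anomaly is defined as 100 * (value - normal) / normal (an assumption), then getting back to real values needs the climatological normal for the same cell and month. A hypothetical Python sketch:

```python
def percent_anomaly_to_absolute(anomaly_pct, normal):
    """Invert anomaly_pct = 100 * (value - normal) / normal (assumed definition)."""
    return normal * (1.0 + anomaly_pct / 100.0)


def absolute_to_percent_anomaly(value, normal):
    """Forward direction of the same assumed definition; normal must be non-zero."""
    return 100.0 * (value - normal) / normal


# Illustrative numbers only: a normal of 400 (tenths of a mm) and a -50% anomaly.
print(percent_anomaly_to_absolute(-50.0, 400))
```

Note that the forward direction is undefined wherever the normal is zero, which is a real concern for precipitation in dry months.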

However..

19. Here is a little puzzle. If the latest precipitation database file
contained a fatal data error (see 17. above), then surely it has been
altered since Tim last used it to produce the precipitation grids? But
if that’s the case, why is it dated so early? Here are the dates:

/cru/dpe1a/f014/data/cruts/database/+norm/pre.0312031600.dtb
- directory date is 23 Dec 2003

/cru/tyn1/f014/ftpfudge/data/cru_ts_2.10/data_dec/cru_ts_2_10.1961-1970.pre.Z
- directory date is 22 Jan 2004 (original date not preserved in zipped file)
- internal (header) date is also ’22.01.2004 at 17:57′

So what’s going on? I don’t see how the ‘final’ precip file can have been
produced from the ‘final’ precipitation database, even though the dates
imply that. The obvious conclusion is that the precip file must have been
produced before 23 Dec 2003, and then redated (to match others?) in Jan 04.

20. Secondary Variables – Eeeeeek!! Yes the time has come to attack what even
Tim seems to have been unhappy about (reading between the lines). To assist
me I have 12 lines in the gridding ReadMe file.. so par for the course.
Almost immediately I hit that familiar feeling of ambiguity: the text
suggests using the following three IDL programs:
frs_gts_tdm.pro
rd0_gts_tdm.pro
vap_gts_anom.pro
So.. when I look in the code/idl/pro/ folder, what do I find? Well:

3447 Jan 22  2004 fromdpe1a/code/idl/pro/frs_gts_anom.pro
2774 Jun 12  2002 fromdpe1a/code/idl/pro/frs_gts_tdm.pro

2917 Jan  8  2004 fromdpe1a/code/idl/pro/rd0_gts_anom.pro
2355 Jun 12  2002 fromdpe1a/code/idl/pro/rd0_gts_tdm.pro

5880 Jan  8  2004 fromdpe1a/code/idl/pro/vap_gts_anom.pro

In other words, the *anom.pro scripts are much more recent than the *tdm
scripts. There is no way of knowing which Tim used to produce the current
public files. The scripts differ internally but – you guessed it! – the
descriptions at the start are identical. WHAT IS GOING ON? Given that the
‘README_GRIDDING.txt’ file is dated ‘Mar 30  2004’ we will have to assume
that the originally-stated scripts must be used.

To begin with, we need binary output from quick_interp_tdm2, so it’s run
again for tmp and pre, and (for the first time) for dtr. This time, the
command line looks like this for tmp:
IDL> quick_interp_tdm2,1901,2002,'idlbinout/idlbin',1200,gs=2.5,dumpbin='dumpbin',pts_prefix='tmp_txt_4idl/tmp.'
This gives screen output for each year, typically:
1991
grid 1991 non-zero    0.9605    2.0878    2.1849 cells=    27048
And produces output files (in, in this case, ‘idlbinout/’), like this:
-rw-------   1 f098     cru       248832 Sep 21 12:20 idlbin_tmp/idlbin_tmp1991

At this point, did some logical renaming. So..
.txt files (pre-IDL) are typically ‘tmp.1901.01.txt’ in ‘tmp_txt_4idl/’
binary files (post-IDL) are typically ‘idlbin_tmp1991’ in ‘idlbin_tmp/’.
These changes rolled back to the quoted command lines, to avoid confusion.

Next, precip command line:
IDL> quick_interp_tdm2,1901,2002,'idlbin_pre/idlbin_pre',450,gs=2.5,dumpbin='dumpbin',pts_prefix='pre_txt_4idl/pre.'
(note new filenaming schema)
This gives example screen output:
1991
grid 1991 non-zero   -4.8533   36.2155   51.0738 cells=    51060
And produces output files like:
-rw-------   1 f098     cru       248832 Sep 21 12:50 idlbin_pre/idlbin_pre1991

Finally for the primaries, the first stab at dtr. Ran anomdtb with the
database file dtr.0312221128.dtb, and the standard/recommended responses.
Screen output:
> NORMALS            MEAN percent      STDEV percent
>         .dtb          0     0.0
>         .cts    3375441    84.1    3375441    84.1
> PROCESS        DECISION percent %of-chk
> no lat/lon         3088     0.1     0.1
> no normal        638538    15.9    15.9
> out-of-range      70225     1.7     2.1
> duplicated       135457     3.4     4.1
> accepted        3167636    78.9
> Dumping years 1901-2002 to .txt files…

Then for the gridding:
IDL> quick_interp_tdm2,1901,2002,'idlbin_dtr/idlbin_dtr',750,gs=2.5,dumpbin='dumpbin',pts_prefix='dtr_txt_4idl/dtr.'
Giving screen output:
1991
grid 1991 non-zero   -0.3378    1.6587    1.7496 cells=     3546
And files such as:
-rw-------   1 f098     cru       248832 Sep 21 13:39 idlbin_dtr/idlbin_dtr1991

And.. at this point, I read the ReadMe file properly. I should be gridding at
2.5 degrees not 0.5 degrees! For some reason, secondary variables are not
derived from the 0.5 degree grids. Re-did all three generations (the sample
command lines and outputs above have been altered to reflect this, to avoid
confusion).

So, to the generation of the synthetic grids.

Tried running frs_gts_tdm but it complained it couldn’t find the normals file:

IDL> frs_gts_tdm,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_frs/syngrid_frs'
% Compiled module: FRS_GTS_TDM.
% Attempt to call undefined procedure/function: 'FRS_GTS_TDM'.
% Execution halted at:  $MAIN$
IDL> frs_gts,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_frs/syngrid_frs'
% Compiled module: RDBIN.
% Compiled module: STRIP.
ls: /home/cru/f098/m1/gts/frs/glo/glo.frs.norm not found
ls: /home/cru/f098/m1/gts/frs/glo/glo.frs.norm.Z not found
ls: /home/cru/f098/m1/gts/frs/glo/glo.frs.norm.gz not found
% READF: End of file encountered. Unit: 99, File: foo
% Execution halted at:  RDBIN              25 /cru/u2/f080/Idl/rdbin.pro
%                       FRS_GTS            18 /cru/cruts/fromdpe1a/code/idl/pro/frs_gts_tdm.pro
%                       $MAIN$
IDL>

However when I eventually found what I hope is the normals file:

/cru/cruts/fromdpe1a/data/grid/twohalf/glo25.frs.6190

..and altered the IDL prog to read it.. same error! Turns out it’s preferring
to pick up Mark N’s version so tried explicitly compiling,
(‘.compile xxxxxx.pro’) that worked, in that the error changed:

IDL> frs_gts,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_frs/syngrid_frs'
% Compiled module: RDBIN.
% Compiled module: STRIP.
yes
% Variable is undefined: NF.
% Execution halted at:  RDBIN              68 /cru/u2/f080/Idl/rdbin.pro
%                       FRS_GTS            21 /cru/cruts/fromdpe1a/code/idl/pro/frs_gts_tdm.pro
%                       $MAIN$
IDL>

So what is this mysterious variable ‘nf’ that isn’t being set? Well strangely,
it’s in Mark N’s ‘rdbin.pro’. I say strangely because this is a generic prog
that’s used all over the place! Nonetheless it does have what certainly looks
like a bug:

38   if keyword_set(gridsize) eq 0 then begin
39    info=fstat(lun)
40    if keyword_set(seas) then info.size=info.size*2.0
41    if keyword_set(ann) then info.size=info.size*12.0
42    nlat=sqrt(info.size/48.0)
43    gridsize=180.0/nlat
44    if keyword_set(quiet) eq 0 then print,'filesize=',info.size
45    if keyword_set(quiet) eq 0 then print,'gridsize=',gridsize
46   endif
47   if keyword_set(had) then had=1 else had=0
48   if keyword_set(echam) then echam=1 else echam=0
49   if keyword_set(gfdl) then gfdl=1 else gfdl=0
50   if keyword_set(ccm) then ccm=1 else ccm=0
51   if keyword_set(csiro) then csiro=1 else csiro=0
52  ;create array to read data into
53   if keyword_set(seas) then nf=6 else nf=12
54   if keyword_set(ann) then nf=1
55   defxyz,lon,lat,gridsize,grid=grid,nf=nf,had=had,echam=echam,gfdl=gfdl,ccm=ccm,csiro=csiro
56   if keyword_set(quiet) eq 0 then help,grid
57   grid=fix(grid)
58  ;read data
59   readu,lun,grid
60   close,lun
61   spawn,string('rm -f ',fff)
62  endif else begin
63   openr,lun,fname
64  ; check file size and work out grid spacing if gridsize isn’t set
65   if keyword_set(gridsize) eq 0 then begin
66    info=fstat(lun)
67    if keyword_set(quiet) eq 0 then print,'yes'
68    nlat=sqrt((info.size/nf)/4.0)
69    gridsize=180.0/nlat
70    if keyword_set(quiet) eq 0 then print,'filesize=',info.size
71    if keyword_set(quiet) eq 0 then print,'gridsize=',gridsize
72   endif
72   endif
73   if keyword_set(seas) then nf=6.0 else nf=12.0
74   if keyword_set(ann) then nf=1

In other words, ‘nf’ is set in the first conditional set of statements, but in
the alternative (starting on #62) it is only set AFTER it’s used
(set #73,#74; used #68). So I shifted #73 and #74 to between #64 and #65, and..
with precompiling to pick up the local version of rdbin, too.. it worked!
Er, perhaps.
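
For illustration only, a minimal Python sketch of the corrected ordering in the second branch: 'nf' is fixed before the file size is used to work out the grid spacing. The function name and arguments are hypothetical. The arithmetic follows #68 and #69 of the listing, and the divide-by-4.0 is read here as nlon = 2*nlat with two bytes per value, which is an assumption.

```python
import math

def gridsize_from_filesize(size_bytes, seas=False, ann=False):
    """Infer grid spacing (degrees) from the size of a binary grid file.

    Mirrors the arithmetic at #68-#69 of the rdbin.pro listing, but with
    nf assigned BEFORE it is used.
    """
    # nf: number of fields per file (12 monthly, 6 seasonal, 1 annual)
    nf = 6 if seas else 12
    if ann:
        nf = 1
    # size / nf / 4 = nlat**2 (assumed: nlon = 2*nlat, 2 bytes per value)
    nlat = math.sqrt((size_bytes / nf) / 4.0)
    return 180.0 / nlat
```

For a 248832-byte monthly file this gives a spacing of 2.5 degrees.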

Lots of screen output, and lots of files. A set of synthetic grids in ‘syngrid_frs/’ as requested, typically:

-rw-------   1 f098     cru        20816 Sep 17 22:10 syngrid_frs/syngrid_frs1991.Z

..but also a set of binary files in the working directory! They look like this:

-rw-------   1 f098     cru        51542 Sep 17 22:10 glo.frs.1991.Z

Having read the program it looks as though the latter files are absolutes,
whereas the former are anomalies. With this in mind, they are renamed:

glo.frs.1991 -> glo.frs.abs.1991

..and put into folder syngrid_frs_abs/

Then – a real setback. Looked for a database file for frost.. nothing. Is
this a real secondary parameter? Answer: yes. Further digging revealed that
quick_interp_tdm2.pro has a ‘nostn’ command line option. It’s undocumented,
as usual, but it does seem to avoid the use of the ‘pts_prefix’ option.. so
I set it, and it at least *ran* for the full term (though very slow compared
to primary variables)!

IDL> quick_interp_tdm2,1901,2002,'glo_frs_grids/frs.grid.',750,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='syngrid_frs/syngrid_frs'

It does produce output grids. Without converting to absolutes with the normals file,
it’s hard to know if they’re realistic.

Then, I moved on to rd0 (wet-day frequency). This time, when I searched for the
normals files required (‘glo.pre.norm’ and ‘glo.rd0.norm’), I could not (as before)
find exact matches. The difference this time is that the program checks that the
normals file supplied is a 0.5-degree grid, so glo25.pre.6190 failed. This implies
to me that my approach to frs (above) was wrong as well. Where is the documentation
to explain all this?!

Finally – a breakthrough. A search of Mark New’s old directory hierarchy revealed
what look like the required files:

crua6[/cru/mark1/f080] find . -name 'glo.*.norm*'
./gts/cld/glo/glo.cld.norm.Z
./gts/dtr/glo_old/glo.dtr.norm.Z
./gts/frs/glo.frs.norm.Z
./gts/frs/glo/glo.frs.norm.Z
find: cannot open < ./gts/frs/glo_txt >
./gts/pre/glo_quick_abs/glo.pre.norm.Z
./gts/pre/glo_quick_log/glo.pre.norm.Z
./gts/pre/glo_spl/glo.pre.norm.Z
find: cannot open < ./gts/pre_perc/station_list >
./gts/rad/glo/glo.rad.norm.Z
./gts/rd0/glo/glo.rd0.norm.Z
./gts/rd0/glo_old/glo.rd0.norm.Z
./gts/sunp/glo/glo.sunp.norm
./gts/sunp/means/glo.sunp.norm.Z
./gts/tmp/glo/glo.tmp.norm.Z
./gts/tmp/glo_old/glo.tmp.norm.Z
find: cannot open < ./gts/tmp/station_list >
./gts/vap/glo/glo.vap.norm.Z
./gts/wnd/glo/glo.wnd.norm.Z

A listing of /cru/mark1/f080/gts gives:

drwxr-x---   2 f080     cru         1024 Sep 12  2005 cdrom
drwxr-x---  10 f080     cru        57344 Nov  1  2001 cld
drwxr-xr-x  19 f080     cru        24576 Feb 27  2001 dtr
drwxr-x---   2 f080     cru         8192 Feb 25  1998 elev
drwxr-x---   2 f080     cru         8192 Jun  8  1998 euroclivar
-rw-r-----   1 f080     cru            0 Aug  3  1999 foo
drwxr-x---   6 f080     cru         8192 Aug  6  2002 frs
-rw-r-x---   1 f080     cru          438 May 12  1998 gts.errors
-rw-r-----   1 f080     cru           10 Jul 21  1999 in
drwxr-x---   5 f080     cru         8192 Jan  6  1999 jiang
drwxr-x---   2 f080     cru         8192 Apr  7  1998 landsea
-rw-r-----   1 f080     cru          240 May 12  1998 normal.errors
drwxr-x---   5 f080     cru         8192 Aug  6  2002 plots
drwxr-xr-x  12 f080     cru       106496 May 22  2000 pre
drwxr-x---   9 f080     cru       114688 Aug  6  2002 pre_perc
drwxr-x---   4 f080     cru         1024 Jan  6  1999 rad
drwxr-x--x   6 f080     cru         8192 Nov  1  2001 rd0
-rwxr-xr--   1 f080     cru         1779 Dec  5  1997 readme.txt
drwxr-x---   8 f080     cru         1024 Apr  5  2000 reg_series
drwxr-x---   3 f080     cru         1024 Oct 18  1999 reh
drwxr-x---   2 f080     cru         8192 Jan 19  2000 scengen
drwxr-x---   5 f080     cru        24576 Nov  5  1998 sunp
drwxr-x---   2 f080     cru         1024 Aug  6  2002 test
drwxr-x---   4 f080     cru         1024 Aug  3  1999 tmn
drwxr-xr-x  20 f080     cru       122880 Mar 19  2002 tmp
drwxr-x---   4 f080     cru         1024 Aug  3  1999 tmx
drwxr-x---   6 f080     cru         1024 Jul  8  1998 ukcip
drwxr-x---   5 f080     cru         8192 Nov  5  2001 vap
drwxr-x---   4 f080     cru         1024 Jul  2  1998 wnd

And a listing of, for example, the ‘frs’ directory:

drwxr-x---   2 f080     cru        16384 Jul 18  2002 glo
-rw-r-x---   1 f080     cru       433393 Aug 12  1998 glo.frs.1961.Z
-rw-r-x---   1 f080     cru       321185 Aug 12  1998 glo.frs.ano.1961.Z
-rw-r-x---   1 f080     cru       740431 Aug 12  1998 glo.frs.norm.Z
drwxr-xr-x   2 f080     cru        16384 Jul 27  1999 glo25
drwx------   2 f080     cru         8192 Jul 18  2002 glo_txt
drwxr-xr-x   2 f080     cru         8192 Aug 28  1998 means

So, the following were copied to the working area:

cp /cru/mark1/f080/gts/frs/glo.frs.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/cld/glo/glo.cld.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/dtr/glo_old/glo.dtr.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/

precip looked like it might be a problem (3 matching files, see above),
but on investigation they were found to be identical! Wonderful.

cp /cru/mark1/f080/gts/pre/glo_quick_log/glo.pre.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/rad/glo/glo.rad.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/rd0/glo/glo.rd0.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/

There were two ‘sunp’ norm files, but one was 0 bytes in length.

cp /cru/mark1/f080/gts/sunp/means/glo.sunp.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/tmp/glo/glo.tmp.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/vap/glo/glo.vap.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/
cp /cru/mark1/f080/gts/wnd/glo/glo.wnd.norm.Z /cru/cruts/rerun1/data/cruts/rerun_synth/

The synthetics generation was then re-run for frs (records above have
been modified to reflect this).

Next, rd0. Synthetics generated OK..

IDL> rd0_gts,1901,2002,1961,1990,outprefix="syngrid_rd0/syngrid_rd0",pre_prefix="idlbin_pre/idlbin_pre"

..until the end:

2001
yes
filesize=      248832
gridsize=      2.50000
2002
yes
filesize=      248832
gridsize=      2.50000
% Program caused arithmetic error: Floating divide by 0
% Program caused arithmetic error: Floating illegal operand
IDL>

However, all synthetic grids appear to have been written OK, including 2002.

Grid generation proceeded without error:

IDL> quick_interp_tdm2,1901,2002,'glo_rd0_grids/rd0.grid.',450,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='syngrid_rd0/syngrid_rd0'

Onto vapour pressure, and the crunch. For here, the recommended program for
synthetic grid production is ‘vap_gts_anom.pro’. In fact, there is no sign
of a ‘vap_gts_tdm.pro’. And, in the program notes, it reads:

; required inputs are:
; ** vapour pressure and temperature normals on 2.5deg grid
;    (these come ready-supplied for a 1961-90 normal period)
; ** temp and dtr monthly anomalies on 2.5deg grid, including normal period

So, we face a situation where some synthetics are built with 0.5-degree
normals, and others are built with 2.5-degree normals. I can find no
documentation of this. There are ‘*_anom.pro’ versions of the frs and rd0
programs, both of which use 2.5-degree normals, however they are dated
Jan 2004, and Tim’s Read_Me (which refers to the ‘*_tdm.pro’ 0.5-degree
versions) is dated end March 2004, so we have to assume these are his
best suggestions.

The 2.5 normals are found here:

> ls -l /cru/cruts/fromdpe1a/data/grid/twohalf/
total 1248
-rwxr-xr-x   1 f098     cru       248832 Jan  9  2004 glo25.frs.6190
-rwxr-xr-x   1 f098     cru       248832 Jan  8  2004 glo25.pre.6190
-rwxr-xr-x   1 f098     cru       248832 Jan  8  2004 glo25.rd0.6190
-rwxr-xr-x   1 f098     cru       248832 Jan  7  2004 glo25.tmp.6190
-rwxr-xr-x   1 f098     cru       248832 Jan  6  2004 glo25.vap.6190
-rwxr-xr-x   1 f098     cru           86 Feb 25  2004 readme.txt

readme.txt:
2.5deg climatology files
Tim Mitchell, 25.2.04

These are in Mark New’s binary format
(end)

Set up the required inputs, and ran it:

IDL> vap_gts_anom,dtr_prefix='idlbin_dtr/idlbin_dtr',tmp_prefix='idlbin_tmp/idlbin_tmp',1901,2002,outprefix='syngrid_vap/syngrid_vap',dumpbin=1

Producing screen output like this:
1991 vap (x,s2,<<,>>):  0.000493031  0.000742087   -0.0595093      1.86497

And output files like this:
-rw-------   1 f098     cru       248832 Sep 22 10:56 syngrid_vap/syngrid_vap1991

On, without further ado, to the gridding. For this secondary, there *are* database
files, so the ‘nostn’ option is not used, and anomdtb.f is wheeled out again
to construct .txt files for the run:

crua6[/cru/cruts/rerun1/data/cruts/rerun_vap] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.vap
> Select the .cts or .dtb file to load:
vap.0311181410.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
vap.txt
> Select the first,last years AD to save:
1901,2002
> Operating…
Values loaded: 1239868112;  No. Stations:       7691
> NORMALS            MEAN percent      STDEV percent
>         .dtb     887754    46.9
>         .cts      34175     1.8     921929    48.7
> PROCESS        DECISION percent %of-chk
> no lat/lon          105     0.0     0.0
> no normal        969384    51.3    51.3
> out-of-range       2661     0.1     0.3
> duplicated        25557     1.4     2.8
> accepted         893711    47.3
> Dumping years 1901-2002 to .txt files…

crua6[/cru/cruts/rerun1/data/cruts/rerun_vap]

Moved straight onto the gridding, which, of course, failed:

IDL> quick_interp_tdm2,1901,2002,'glo_vap_grids/vap.grid.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='syngrid_vap/syngrid_vap',pts_prefix='../rerun_vap/vap_txt_4idl/vap.'
Defaults set
1901
1902
% Array dimensions must be greater than 0.
% Execution halted at:  QUICK_INTERP_TDM2   88 /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
%                       QUICK_INTERP_TDM2   88 /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
%                       $MAIN$
IDL>

This turns out to be because of the sparsity of VAP station measurements in the
early years. The program cannot handle anom files of 0 length, even though it
checks the length! Bizarre. The culprit is ‘vap.1902.03.txt’, the only month to
have no station reading at all (45 months have only 1 however). I decided to mod
the program to use the ‘nostn’ option if the length is 0. Hope that’s right – the
synthetics are read in first and the station data is added to that grid so this
should be OK.. and it looks OK:

IDL> quick_interp_tdm2,1901,2002,'vap.grid.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='syngrid_vap/syngrid_vap',pts_prefix='../rerun_vap/vap_txt_4idl/vap.'
% Compiled module: GLIMIT.
Defaults set
1901
1902
no stations found in: ../rerun_vap/vap_txt_4idl/vap.1902.03.txt
1903

(..etc..)
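
The guard added to the gridding program can be sketched roughly like this in Python. This is not the actual IDL change: the function name and the 'nostn'/'stations' return values are made up for illustration. The idea is simply to fall back to synthetic-only gridding when a month's anomaly file holds no station lines.

```python
def gridding_mode(anomaly_lines):
    """Pick the gridding mode for one month's anomaly file.

    anomaly_lines: the lines of the .txt file (any iterable of strings).
    Returns 'nostn' when there is nothing but blank lines (or no lines),
    otherwise 'stations'.
    """
    has_station = any(line.strip() for line in anomaly_lines)
    return "stations" if has_station else "nostn"
```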

Pause for reflection: the list of CRU_TS_2.1 parameters is as follows:
pre  primary, done
tmp  primary, done
tmx  derived, not done
tmn  derived, not done
dtr  primary, done
vap  secondary, done
cld/spc  secondary, not done
wet  secondary, done
frs  secondary, done

Now the interesting thing is that the ‘Read Me’ file for gridding only
mentions frs, rd0 (which I’m assuming == wet) and vap. How, then, do I
produce cld/spc and the two derived vars??

Well, there’s a /cru/cruts/fromdpe1a/code/idl/pro/cal_cld_gts_tdm.pro,
also:
/cru/cruts/fromdpe1a/code/idl/pro/cloudcorrspc.pro
/cru/cruts/fromdpe1a/code/idl/pro/cloudcorrspcann.pro
/cru/cruts/fromdpe1a/code/idl/pro/cloudcorrspcann9196.pro

Loading just the first program opens up another huge can o’ worms. The
program description reads:

pro cal_cld_gts_tdm,dtr_prefix,outprefix,year1,year2,info=info
; calculates cld anomalies using relationship with dtr anomalies
; reads coefficients from predefined files (*1000)
; reads DTR data from binary output files from quick_interp_tdm2.pro (binfac=1000)
; creates cld anomaly grids at dtr grid resolution
; output can then be used as dummy input to splining program that also
;  includes real cloud anomaly data

So, to me this identifies it as the program we cannot use any more because
the coefficients were lost. As it says in the gridding read_me:

Bear in mind that there is no working synthetic method for cloud, because Mark New
lost the coefficients file and never found it again (despite searching on tape
archives at UEA) and never recreated it. This hasn’t mattered too much, because
the synthetic cloud grids had not been discarded for 1901-95, and after 1995
sunshine data is used instead of cloud data anyway.

But, (Lord how many times have I used ‘however’ or ‘but’ in this file?!!), when
you look in the program you find that the coefficient files are called:

rdbin,a,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/a.25.7190',gridsize=2.5
rdbin,b,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/b.25.7190',gridsize=2.5

And, if you do a search over the filesystems, you get:

crua6[/cru/cruts] ls fromdpe1a/data/grid/cru_ts_2.0/_makecld/_constants/_7190/spc2cld/_ann/
a.25.01.7190.glo.Z  a.25.05.7190.glo.Z  a.25.09.7190.glo.Z  a.25.7190.eps.Z     b.25.04.7190.glo.Z  b.25.08.7190.glo.Z  b.25.12.7190.glo.Z
a.25.02.7190.glo.Z  a.25.06.7190.glo.Z  a.25.10.7190.glo.Z  b.25.01.7190.glo.Z  b.25.05.7190.glo.Z  b.25.09.7190.glo.Z  b.25.7190.eps.Z
a.25.03.7190.glo.Z  a.25.07.7190.glo.Z  a.25.11.7190.glo.Z  b.25.02.7190.glo.Z  b.25.06.7190.glo.Z  b.25.10.7190.glo.Z
a.25.04.7190.glo.Z  a.25.08.7190.glo.Z  a.25.12.7190.glo.Z  b.25.03.7190.glo.Z  b.25.07.7190.glo.Z  b.25.11.7190.glo.Z
crua6[/cru/cruts] ls fromdpe1a/data/grid/cru_ts_2.0/_makecld/_constants/_7190/spc2cld/_mon/
a.25.01.7190.glo.Z  a.25.05.7190.glo.Z  a.25.09.7190.glo.Z  a.25.7190.eps.Z     b.25.04.7190.glo.Z  b.25.08.7190.glo.Z  b.25.12.7190.glo.Z
a.25.02.7190.glo.Z  a.25.06.7190.glo.Z  a.25.10.7190.glo.Z  b.25.01.7190.glo.Z  b.25.05.7190.glo.Z  b.25.09.7190.glo.Z  b.25.7190.eps.Z
a.25.03.7190.glo.Z  a.25.07.7190.glo.Z  a.25.11.7190.glo.Z  b.25.02.7190.glo.Z  b.25.06.7190.glo.Z  b.25.10.7190.glo.Z
a.25.04.7190.glo.Z  a.25.08.7190.glo.Z  a.25.12.7190.glo.Z  b.25.03.7190.glo.Z  b.25.07.7190.glo.Z  b.25.11.7190.glo.Z

So.. we don’t have the coefficients files (just .eps plots of something). But
what are all those monthly files? DON’T KNOW, UNDOCUMENTED. Wherever I look,
there are data files, no info about what they are other than their names. And
that’s useless.. take the above example, the filenames in the _mon and _ann
directories are identical, but the contents are not. And the only difference
is that one directory is apparently ‘monthly’ and the other ‘annual’ – yet
both contain monthly files.

Lots of further investigation.. probably the most useful program found is
cal_cld_gts_tdm.pro, the description of which reads as follows:

pro cal_cld_gts_tdm,dtr_prefix,outprefix,year1,year2,info=info
; calculates cld anomalies using relationship with dtr anomalies
; reads coefficients from predefined files (*1000)
; reads DTR data from binary output files from quick_interp_tdm2.pro (binfac=1000)
; creates cld anomaly grids at dtr grid resolution
; output can then be used as dummy input to splining program that also
;  includes real cloud anomaly data

It also tellingly contains:
; unnecessary because 61-90 normals have already been created
; print, "@@@@@ looking for 2.5 deg DTR 1961-90 @@@@@"
; mean_gts,'~/m1/gts/dtr/glo25/glo25.dtr.',nor1,nor2
; mean_gts_tdm,'/cru/mark1/f080/gts/dtr/glo25/glo25.dtr.',nor1,nor2
;print, "@@@@@ looking for 2.5 deg DTR normal @@@@@"
;; rdbin,dtrnor,'~/m1/gts/dtr/glo25/glo25.dtr.'+string(nor1-1900,nor2-1900,form='(2i2.2)')
;dtrnorstr='/cru/mark1/f080/gts/dtr/glo25/glo25.dtr.'+string(nor1-1900,nor2-1900,form='(2i2.2)')
;rdbin,dtrnor,dtrnorstr

The above has seemingly been replaced with:
rdbin,a,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/a.25.7190',gridsize=2.5
rdbin,b,'/cru/tyn1/f709762/cru_ts_2.0/_constants/_7190/b.25.7190',gridsize=2.5

These are the files that have been lost according to the gridding read_me
(see above).

The conclusion of a lot of investigation is that the synthetic cloud grids
for 1901-1995 have now been discarded. This means that the cloud data prior
to 1996 are static.

Edit: have just located a ‘cld’ directory on Mark New’s disk, containing
over 2000 files. Most however are binary and undocumented..

Eventually find fortran (f77) programs to convert sun to cloud:

sh2cld_tdm.for     converts sun hours monthly time series to cloud percent
sp2cld_m.for       converts sun percent monthly time series to cloud oktas

There are also programs to convert sun parameters:

sh2sp_m.for        sun hours to sun percent
sh2sp_normal.for   sun hours monthly .nrm to sunshine percent
sh2sp_tdm.for      sun hours monthly time series to sunshine percent

AGREED APPROACH for cloud (5 Oct 06).

For 1901 to 1995 – stay with published data. No clear way to replicate
process as undocumented.

For 1996 to 2002:
1. convert sun database to pseudo-cloud using the f77 programs;
2. anomalise wrt 96-00 with anomdtb.f;
3. grid using quick_interp_tdm.pro (which will use 6190 norms);
4. calculate (mean9600 – mean6190) for monthly grids, using the
published cru_ts_2.0 cloud data;
5. add to gridded data from step 3.

This should approximate the correction needed.
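
Steps 4 and 5 amount to re-basing the anomalies from the 96-00 mean to the 61-90 mean. A minimal Python sketch of that arithmetic, assuming one list of grid-cell values per calendar month and None for a missing cell (the function name and data layout are hypothetical, not taken from any of the programs):

```python
def rebase_to_6190(anom_9600, mean_9600, mean_6190):
    """Shift anomalies relative to 1996-2000 so they are relative to 1961-1990.

    All three arguments are equal-length sequences of grid-cell values
    for one calendar month; None marks a missing cell.
    """
    out = []
    for a, m_new, m_old in zip(anom_9600, mean_9600, mean_6190):
        if a is None or m_new is None or m_old is None:
            out.append(None)
        else:
            # actual = a + m_new, so anomaly wrt 61-90 = a + (m_new - m_old)
            out.append(a + (m_new - m_old))
    return out
```

It is only an approximation in the sense already stated: the two means come from the published cru_ts_2.0 grids rather than from the new station data.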

On we go.. firstly, examined the spc database.. seems to be in % x10.
Looked at published data.. cloud is in % x10, too.
First problem: there is no program to convert sun percentage to
cloud percentage. I can do sun percentage to cloud oktas or sun hours
to cloud percentage! So what the hell did Tim do?!! As I keep asking.

Examined the program that converts sun % to cloud oktas. It is
complicated! Have inserted a line to multiply the result by 12.5 (the
result is in oktas*10 and ranges from 0 to 80, so the new result will
range from 0 to 1000).
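
The scaling is just 8 oktas = 100%, so one okta is 12.5%. As a one-line Python sketch (the function name is hypothetical; the inserted Fortran line itself is not shown here):

```python
def oktas10_to_percent10(oktas_x10):
    """Convert cloud cover in oktas*10 (0..80) to percent*10 (0..1000)."""
    return oktas_x10 * 12.5
```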

Next problem – which database to use? The one with the normals included
is not appropriate (the conversion progs do not look for that line so
obviously are not intended to be used on +norm databases). The non
normals databases are either Jan 03 (in the ‘_ateam’ directory) or
Dec 03 (in the regular database directory). The newer database is
smaller! So more weeding than planting in 2003. Unfortunately both
databases contain the 6190 normals line, just unpopulated. So I will
go with the ‘spc.0312221624.dtb’ database, and modify the already-
modified conversion program to process the 6190 line.

Then – comparing the two candidate spc databases:

spc.0312221624.dtb
spc.94-00.0312221624.dtb

I find that they are broadly similar, except the normals lines (which
both start with ‘6190’) are very different. I was expecting that maybe
the latter contained 94-00 normals, what I wasn’t expecting was that
they are in % x10 not %! Unbelievable – even here the conventions have
not been followed. It’s botch after botch after botch. Modified the
conversion program to process either kind of normals line.

Decided to go with the ‘spc.94-00.0312221624.dtb’ database, as it
hopefully has some of the 94-00 normals in. I just wish I knew more.

Conversion was hampered by the discovery that some stations have a mix
of % and % x10 values! So more mods to Hsp2cldp_m.for. Then conversion,
producing cldfromspc.94000312221624.dtb. Copied the .dts file across
as is, not sure what it does unfortunately (or can’t remember!).

After conversion, ran anomdtb:

crua6[/cru/cruts/rerun1/data/cruts/rerun_cld] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.cld
> Select the .cts or .dtb file to load:
cldfromspc.94000312221624.dtb

> Specify the start,end of the normals period:
1994,2000
> Specify the missing percentage permitted:
25
> Data required for a normal:            6
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
cldfromspc.txt
> Select the first,last years AD to save:
1994,2002
> Operating…

>         .cts      96309    19.6     280712    57.2
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        209619    42.8    42.8
> out-of-range     177298    36.2    63.2
> duplicated          154     0.0     0.1
> accepted         103260    21.1
> Dumping years 1994-2002 to .txt files…

crua6[/cru/cruts/rerun1/data/cruts/rerun_cld]

Then ran quick_interp_tdm2:

IDL> .compile /cru/cruts/fromdpe1a/code/idl/pro/quick_interp_tdm2.pro
% Compiled module: QUICK_INTERP_TDM2.
IDL> .compile /cru/cruts/fromdpe1a/code/idl/pro/rdbin.pro
% Compiled module: RDBIN.
IDL> quick_interp_tdm2,1994,2002,'glo_from_idl/cld.',600,gs=0.5,pts_prefix='txt_4_idl/cldfromspc.',dumpglo='dumpglo'
Defaults set
1994
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
1995
1996
1997
1998
1999
2000
2001
2002
IDL>

Tadaa: .glo files produced for 1994 to 2002.

Then retracked to produce regular 0.5-degree grids for dtr (having only
produced 2.5-degree binaries for synthetics earlier):

IDL> quick_interp_tdm2,1901,2002,'glo_dtr_grids/dtr.',750,gs=0.5,pts_prefix='dtr_txt_4idl/dtr.',dumpglo='dumpglo'

That went off without any apparent hitches, so I wrote a fortran prog,
‘maxminmaker.for’, to produce tmn and tmx grids from tmp and dtr. It ran.

However – yup, more problems – when I checked the inputs and outputs I found
that in numerous instances there was a value for mean temperature in the grid,
with no corresponding dtr value. This led to tmn = tmx = tmp for those cells.
NOT GOOD.

Actually, what was NOT GOOD was my grasp of context. Oh curse this poor
memory! For the IDL gridding program produces ANOMALIES not ACTUALS.
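
For reference, the relationship that a tmp+dtr to tmn/tmx conversion rests on, as a Python sketch. It assumes tmp = (tmn+tmx)/2 and dtr = tmx-tmn, that both inputs are absolutes in the same units, and a made-up missing-value code. What maxminmaker.for actually does with missing cells is not shown in these notes; returning missing for both outputs is just one way to avoid the tmn = tmx = tmp result described above.

```python
MISSING = -9999  # hypothetical missing-value code

def tmn_tmx(tmp, dtr, missing=MISSING):
    """Return (tmn, tmx) from mean temperature and diurnal range."""
    if tmp == missing or dtr == missing:
        # No range available: flag both outputs as missing
        return missing, missing
    half = dtr / 2.0
    return tmp - half, tmp + half
```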

Wrote a program, ‘glo2abs.for’, which does a file-for-file conversion of .glo
files (as produced by quick_interp_tdm2.pro) to absolute-value files (also
gridded and with headers). After some experiments realised that the .glo
anomalies are in degrees, but the normals are in 10ths of a degree :-)
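
The units mismatch matters for the conversion. A Python sketch for a single temperature cell, assuming simple additive anomalies (the function name is hypothetical; glo2abs.for itself is not reproduced here):

```python
def anomaly_to_absolute(anom_degrees, normal_tenths):
    """Absolute temperature (degrees) for one cell.

    anom_degrees:  anomaly from the .glo file, in degrees
    normal_tenths: 1961-90 normal, in tenths of a degree
    """
    return normal_tenths / 10.0 + anom_degrees
```

Adding the two without the divide-by-ten would put every absolute out by roughly nine times the normal.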

Produced absolutes for TMP. Then wrote a program, ‘cmpcruts.for’, to
compare the absolute grids with the published cru_ts_2.10 data. The
comparison simply measures absolute differences between old and new, and
categorises as either (1) identical, (2) within 0.5 degs, (3) within 1 deg,
(4) over 1 deg apart. Results for temperature (TMP):

Identical   <0.5deg  0.5-1deg     >1deg
30096176  48594200   2755281   1076423

And for temperature range (DTR):

45361058  31267870   3893754   1999398

These are very promising. The vast majority in both cases are within 0.5
degrees of the published data. However, there are still plenty of values
more than a degree out.

The total number of comparisons is 67420*102*12 = 82,522,080

It seems prudent to add percentage calculations..

TMP:
Final Diff Totals:   30096176  48594200   2755281   1076423
Percentages:            36.47     58.89      3.34      1.30
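
The banding and percentage steps can be sketched in Python as below. This is not cmpcruts.for: the function names are hypothetical, and treating the 0.5 and 1 degree boundaries as inclusive is an assumption.

```python
def categorise(old, new):
    """Return the difference band for one pair of values.

    0 = identical, 1 = within 0.5, 2 = within 1, 3 = over 1 apart.
    """
    diff = abs(old - new)
    if diff == 0:
        return 0
    if diff <= 0.5:
        return 1
    if diff <= 1.0:
        return 2
    return 3

def percentages(counts):
    """Express each band count as a percentage of the total, to 2 d.p."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]
```

Fed the four TMP totals above, percentages() gives 36.47, 58.89, 3.34 and 1.3, matching the figures shown.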

TMP has a comforting 95%+ within half a degree, though one still wonders
why it isn’t 100% spot on..

DTR:
Final Diff Totals:   45361058  31267870   3893754   1999398
Percentages:            54.97     37.89      4.72      2.42

DTR fares perhaps even better, over half are spot-on, though about
7.1% are outside a half.

However, it’s not such good news for precip (PRE):
Final Diff Totals:   11492331  21163924   9264554  40601271
Percentages:            13.93     25.65     11.23     49.20

21. A little experimentation goes a short way..

I tried using the ‘stn’ option of anomdtb.for. Not completely sure what
it’s supposed to do, but no matter as it didn’t work:

crua6[/cru/cruts/rerun1/data/cruts/rerun_pre] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.pre
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
pre.0312031600H.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
5
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
4
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the .stn file to save:
pre.fromanomdtb.stn
> Enter the correlation decay distance:
450
> Submit a grim that contains the appropriate grid.
> Enter the grim filepath:
cru_ts_2_10.1961-1970.pre

> Grid dimensions and domain size:      720     360   67420
> Select the first,last years AD to save:
1901,2002
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb    2635548    29.6
>         .cts    4711327    52.8    7325296    82.2
> PROCESS        DECISION percent %of-chk
> no lat/lon        20761     0.2     0.2
> no normal       1585342    17.8    17.8
> out-of-range      20249     0.2     0.3
> duplicated       317035     3.6     4.3
> accepted        6972308    78.2
> Calculating station coverages…
> ##### WithinRange: Alloc: DataB #####
forrtl: severe (174): SIGSEGV, segmentation fault occurred
crua6[/cru/cruts/rerun1/data/cruts/rerun_pre]

..knowing how long it takes to debug this suite – the experiment
endeth here. The option (like all the anomdtb options) is totally
undocumented so we’ll never know what we lost.

22. Right, time to stop pussyfooting around the niceties of Tim’s labyrinthine software
suites – let’s have a go at producing CRU TS 3.0! since failing to do that will be the
definitive failure of the entire project..

Firstly, we need to identify the updated data files. I acquired the following:

iran_asean_GHCN_WWR-CD_save50_CLIMAT_MCDW_updat_merged renamed to pre.0611301502.dat
newbigfile0606.dat renamed to tmp.0611301507.dat
glseries_tmn_final_merged renamed to tmn.0611301516.dat
glseries_tmx_final_merged renamed to tmx.0611301516.dat
anders9106m.dat renamed to tmp9106.0612011708.dat

..and established a directory hierarchy under /cru/cruts/version_3_0

Next step, convert the various db formats to the CRU TS one. Made a visual
comparison which indicated that it would work. Unfortunately it will mean
losing the ‘extra’ fields that have been tacked onto the headers willy-nilly
as they are undocumented. Furthermore the two extra fields in the CRU TS
format are undocumented, as far as I can see! So I wrote headergetter.for
to produce stats on the CRU TS headers. It looks for violations of the
mandatory blank spaces, and for variations in the two extra fields. Sample
output for temperature and precip:

Header report for tmp.0311051552.dtb
Produced by headgetter.for
Total Records Read:    12155

BLANKS (expected at 8,14,21,26,47,61,66,71,78)
position   missed
8        0
14        0
21        0
26        0
47        0
61        0
66        0
71        0
78        2

EXTRA FIELD 1 (72:77)
type detected   counted
Missing Value Code       12155
Possible F.P. Value          0
Possible Exp. Value          0
Integer Value Found          0
Real Value Found             0
Unidentifiable               0

EXTRA FIELD 2 (79:86)
type detected   counted
Missing Value Code         709
Possible F.P. Value        697
Possible Exp. Value          0
Integer Value Found      10749
Real Value Found             0
Unidentifiable               0

ENDS

Header report for pre.0312031600.dtb
Produced by headgetter.for
Total Records Read:    12732

BLANKS (expected at 8,14,21,26,47,61,66,71,78)
position   missed
8        0
14        0
21        0
26        0
47        0
61        0
66        0
71        0
78      154

EXTRA FIELD 1 (72:77)
type detected   counted
Missing Value Code       12732
Possible F.P. Value          0
Possible Exp. Value          0
Integer Value Found          0
Real Value Found             0
Unidentifiable               0

EXTRA FIELD 2 (79:86)
type detected   counted
Missing Value Code        3635
Possible F.P. Value        437
Possible Exp. Value          0
Integer Value Found       8660
Real Value Found             0
Unidentifiable               0

ENDS

As can be seen, there are no unidentifiable headers – hurrah! – but quite
a few violations of the boundary between the two extra fields, particularly
in the precip database. On examination, the culprits are all African
stations. The two tmp exceptions:

641080  -330   1735  324 BANDUNDU             DEM REP CONGO 1961 1990   -99908
642200  -436   1525  445 KINSHASA/BINZA       DEM REP CONGO 1960 1990   -99920

And samples of the pre exceptions:

-656002   698   -958  150 SUAKOKO              LIBERIA       1951 1970   -999123008050
-655327   727   -723  350 KOUIBLY              IVORY COAST   1977 1990   -999109001290
-655001  1320   -235  332 GOURCY               BURKINA FASO  1956 1980   -999120001240
-618504   788  -1118 -999 KENEMA/FARM          SIERRA LEONE  1951 1972   -999139003500
-612067  1407   -307  253 KORO                 MALI          1958 1989   -999127002650
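
The blank-position check can be sketched in a few lines of Python. This is not headergetter.for: the function name is made up, the column list is the one printed in the reports above, and ignoring columns beyond the end of a short header is an assumption.

```python
BLANK_POSITIONS = (8, 14, 21, 26, 47, 61, 66, 71, 78)  # 1-based columns

def missing_blanks(header):
    """Return the 1-based columns where a blank was expected but not found.

    Columns beyond the end of a short header are ignored.
    """
    return [p for p in BLANK_POSITIONS
            if p <= len(header) and header[p - 1] != " "]
```

A header whose first extra field runs straight into the second, as in the BANDUNDU line with ‘-99908’, comes back as [78].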

So the first extra field is apparently unused! It would be a handy place for
the 6-character data-code and valid-start-year from the temperature db.

On to a more detailed look at the cru precip format; not sure whether there
are two extra fields or one, and what the sizes are. A quick hack through
the headers is not pleasing. There appears to be only one field, but it can
have up to nine (9) digits in it, and at least three missing value codes:

6785300-1863  2700 1080HWANGE/N.P.A.       ZIMBABWE     19621996              40
8100100  680 -5820    2GEORGETOWN          GUYANA       18462006             -99
6274000 1420  2460 1160KUTUM               SUDAN        19291990             194
6109200-9999-99999 -999UNKNOWN             NIGER        19891989            -999
6542000  945    -2  197YENDI               GHANA        19071997            8010
6544200  672  -160  293KUMASI              GHANA        19062006           17009
6122306 1670  -299  267KABARA              MALI         19231989          270022
6193128   32   672 -999SAO TOME            SAO TOME     19391973         8888888
6266000 1850  3180  249KARIMA              SUDAN        19172006        18315801
6109905 1208  -367  315OUARKOYE            BURKINA FASO 19601980       120002470

*unimpressed*

This is irritating as it means precip has only 9 fields and I can’t do a
generic mapping from any cru format to cru ts.

As a glutton for punishment I then looked at the tmin/tmax db format. Looks
like two extra fields (i6,i7) with mvcs of 999999 and 8888888 respectively.
However *sigh* inspection reveals the following two possibilities:

851300 3775 -2568   17PONTA DELGADA       PORTUGAL     18652004 9999998888888
851500 3697 -2517  100SANTA MARIA A       ACORES       19542006 -77777  8888888

Isn’t that marvellous? These can’t even be read with a consistent header format!

So, the approach will be to read exactly ONE extra field. For cru tmp that
will be the i2+i4 anders/best-start codes as one. For cru pre it will be
the amazing multipurpose, multilength field. For cru tmnx it will be the
first field, which is at least stable at i6.

Conversions/corrections performed:

Temperature

Converted tmp.0611301507.dat to tmp.0612081033.dat

Found one corrupted station name:
BEFORE
911900   209   1564   20  HI*KAHULUI WSO (PUU NENE)         1954 1990 101954  -999.00
AFTER
911900   209   1564   20 KAHULUI ARPT/MAUI    HAWAII        1954 1990 101954  -999.00

Precipitation

Converted pre.0611301502.dat to pre.0612081045.dat

Found one corrupted station name:
BEFORE
4125600 2358  5828   15SEEB AP./=MUSCAT*0.9OMAN         18932006          301965
AFTER
4125600  2358   5828   15 SEEB INTL/MUSCAT     OMAN          1893 2006   -999  -999.00

(DL later reported that the name was intended to signify that the data had been
corrected by a factor of 0.9 when data from another station was incorporated
to extend the series – this was Mike Hulme’s work)

Wrote db2dtb.for, which converts any of the CRU db formats to the CRU TS format.

Started work on mergedb.for, which should merge a primary database with an incoming
database of the same (CRU TS) format. Quite complicated. No operator interventions,
just a log file of failed attempts – but hooks left in for op sections in case this
turns out to be the main programmatic deliverable to BADC!

23. Interrupted work on mergedb.for in order to trial a precip gridding for 3.0. This
required another new proglet, addnormline.for, which adds a normals line below each
header. It fills in the normals values if the conditions are met (75% of values, or
23 for the 30 year period).
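
A minimal sketch of the 75% rule, for illustration only. It assumes the rule is applied month by month, that -9999 marks a missing value, and that a station is held as a year-to-twelve-values mapping; none of that is confirmed by addnormline.for itself, and the names are hypothetical.

```python
import math

MISSING = -9999  # assumed missing-value code


def monthly_normals(series, first=1961, last=1990, min_fraction=0.75):
    """series maps year -> list of 12 monthly values; returns 12 normals."""
    n_years = last - first + 1
    required = math.ceil(min_fraction * n_years)  # 23 of 30 years
    normals = []
    for month in range(12):
        values = [
            series[year][month]
            for year in range(first, last + 1)
            if year in series and series[year][month] != MISSING
        ]
        if len(values) >= required:
            normals.append(round(sum(values) / len(values)))
        else:
            # Too few years: leave the normal as a missing value.
            normals.append(MISSING)
    return normals
```

With a 30 year period, 75% is 22.5 values, which rounds up to the 23 quoted above.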

Initial results promising.. ran it for precip, it added normals lines OK, a total of
15942 with 6003 missing value lines. No errors, and no ops interventions because the
file didn’t have normals lines before!

‘Final’ precip file: pre.0612151458.dtb

Tried running anomdtb.f90.. failed because it couldn’t find the .dts file! No matter
that it doesn’t need it – argh!

Examined existing .dts files.. not sure what they’re for. Headers are identical to
the .dtb file, all missing values are retained, all other values are replaced with
one of several code numbers, no idea what they mean.

Wrote ‘falsedts.for’ to produce dummy .dts files with all zeros in place of real
data values. Produced pre.0612151458.dts.

Added normals line, producing: pre.0612181221.dtb
Re-produced matching pre.0612181221.dts file.

Tried running anomdtb.f90 again. This time it crashed at record #1096. Wrote a proglet
‘findstn.for’ to find the n-th station in a dtb file, pulled out 1096:

0   486  10080 1036 BUKIT LARUT          MALAYSIA      1951 1988   -999  -999.00
6190 2094 2015 2874 3800 4619 3032 5604 3718 4626 5820 5035 3049
1951 3330 2530 2790 5660 4420 4030 1700 2640 8000 5950 6250 2020

(snipped normal years)

1979  110 1920 1150 5490 3140 308067100 2500 4860 4280 4960 1600

Uh-oh! That’s 6.7m of rain in July 1979? Looks like a factor-of-10 problem. Confirmed
with DL and changed to 6710.
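
The 1979 line also shows why splitting on whitespace is not enough for these files: 67100 fills its column completely and fuses with the 3080 before it. A sketch of reading the line by position, assuming a 4-character year followed by twelve 5-character integers (inferred from the lines above). The plausibility limit is hypothetical and would still need a human check.

```python
def parse_data_line(line):
    """Split 'YYYY' plus twelve 5-character integer fields by position."""
    year = int(line[0:4])
    values = [int(line[4 + 5 * i: 9 + 5 * i]) for i in range(12)]
    return year, values


def suspicious_months(values, limit=30000, missing=-9999):
    """Return 1-based month numbers whose value exceeds a hypothetical limit."""
    return [i + 1 for i, v in enumerate(values) if v != missing and v > limit]
```

A fixed limit like this can only flag candidates for inspection, since some stations genuinely record very large monthly totals.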

Next run, crashed at #4391, CHERRAPUNJI, the wettest place in the world. So here, the
high values are realistic. However I did notice that the missing value code was -10
instead of -9999! So modified db2dtb.for to fix that and re-produced the precip database
as pre.0612181214.dat. This then had to have normals recalculated for it (after fixing
#1096).

Finally got it through anomdtb.for AND quick_interp_tdm2 – without crashing! IDL was even
on the ball with the missing months at the end of 2006:

IDL> quick_interp_tdm2,1901,2006,'preglo/pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='preanoms/pre.'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
1902
1903
(etc)
2005
2006
no stations found in: preanoms/pre.2006.09.txt
no stations found in: preanoms/pre.2006.10.txt
no stations found in: preanoms/pre.2006.11.txt
no stations found in: preanoms/pre.2006.12.txt

All good. Wrote mergegrids.for to create the more-familiar decadal and full-series
files from the monthly *.glo.abs ones.

Then.. like an idiot.. I had to test the data! Duh.

Firstly, wrote mmeangrid.for and cmpmgrids.m to get a visual comparison of old and
new precip grids (old being CRU TS 2.10). This showed variations in ‘expected’ areas
where changes had been made, ie the Southern tip of Greenland.

Next, Phil requested some statistical plots of percentage change in annual totals,
and long-term trends. Wrote ‘anntots.for’ to convert monthly gridded files into
yearly totals files. Then tried to write precipchecker.m to do the rest in Matlab..
it wasn’t having it, OUT OF MEMORY! Bah. So wrote ‘prestats.for’ to calculate the
final stats, for printing with an emasculated precipchecker.m. BUT.. it wouldn’t
work, and on investigating I found 200-odd stations with zero precipitation for
the entire 1901-2006 period! Modified anntots.for to dump a single grid with those
cells that remained at zero marked, then plotted.

Zero cells in North Africa and the Western coast of South America. None in the
CRU TS 2.10 precip grids :-(

Next step, produce a list of cell centres of the offending cells. Wrote a quick
proglet, ‘idzerocells.for’. Then ‘getcellstations.for’, which, given a CRUTS DB
file and a list of lat/lon values, extracts all stations lying inside the cells
listed.
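
For illustration, the cell test behind something like getcellstations can be sketched as below. This is not the original Fortran: it assumes 0.5 degree cells identified by their centres, station coordinates already converted to decimal degrees (the db headers appear to store hundredths of a degree), and a half-open edge convention chosen arbitrarily.

```python
def stations_in_cells(stations, cell_centres, cell_size=0.5):
    """stations: list of (code, lat, lon) in degrees; cell_centres: list of (lat, lon)."""
    half = cell_size / 2.0
    hits = []
    for code, lat, lon in stations:
        for clat, clon in cell_centres:
            # Assumed convention: south/west edges belong to the cell,
            # north/east edges belong to the neighbouring cell.
            if clat - half <= lat < clat + half and clon - half <= lon < clon + half:
                hits.append((code, clat, clon))
                break
    return hits
```

Stations sitting exactly on a cell edge are the awkward case; whichever convention is used has to match the one used by the gridding code.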

Uh-oh. Looked in the new pre db and found 15 stations for 257 zero cells! They are:

6061170  2810    670  381 FT FLATTERS          ALGERIA       1925 1965   -999  -999.00
6064000  2650    840  559 FORT POLIGNAC        ALGERIA       1925 2006   -999  -999.00
6262000  2080   3260  470 STATION NO. 6        SUDAN         1950 1988   -999  -999.00
8450100  -810  -7900   26 TRUJILLO             PERU          1961 2006   -999  -999.00
8453100  -920  -7850   10 CHIMBOTE             PERU          1961 2006   -999  -999.00
8462800 -1200  -7710   13 LIMA-CALLAO/INTL.AP. PERU          1961 2006   -999  -999.00
8463100 -1210  -7700  137 LIMATAMBO/C.DE MARTE PERU          1927 1980   -999  -999.00
8469100 -1380  -7630    6 PISCO                PERU          1942 2006   -999  -999.00
8540600 -1850  -7030   29 ARICA/CHACALLUTA     CHILE         1903 2006   -999  -999.00
8541700 -2020  -7020    6 IQUIQUE/CAVANCHA     CHILE         1886 1986   -999  -999.00
8541800 -2053  -7018   52 IQUIQUE DIEGO ARACEN CHILE         1989 2006   -999  -999.00
8700494  -707  -7957  150 CAYALTI              PERU          1934 1959   -999  -999.00
8700562 -1203  -7703  137 LIMA                 PERU          1929 1963   -999  -999.00
8700581 -1207  -7717   13 LA PUNTA (NA         PERU          1939 1963   -999  -999.00
9932040  2810    670  381 FT FLATTER           ALGERIA       1925 1965   -999  -999.00

Looked for the same zero cell stations in the old pre db (pre.0312031600.dtb) and only
found 10:

-854031 -2021  -7015    5 IQUIQUE/CAVANCHA     CHILE         1899 1986   -999     0.00
-843002 -1210  -7700  135 LIMATAMBO            PERU          1927 1980   -999
-603550  2810    670  381 FT FLATTER           ALGERIA       1925 1965   -999  -999.00
606400  2650    841  558 ILLIZI/ILLIRANE      ALGERIA       1925 2002   -999     -999
626200  2075   3255  468 STATION NO. 6        SUDAN         1950 1988   -999  -999.00
845010  -810  -7903   30 TRUJILLO/MARTINEZ    PERU          1961 2002   -999     -999
845310  -916  -7851   11 CHIMBOTE/TENIENTE    PERU          1961 2001   -999
846280 -1200  -7711   13 LIMA/JORGE CHAVEZ    PERU          1961 2002   -999     -999
846910 -1375  -7628    7 PISCO (CIV/MIL)      PERU          1942 2002   -999     -999
854180 -2053  -7018   52 IQUIQUE/DIEGO ARAC   CHILE         1989 2002   -999  -999.00

So why does the old db result in no ‘zero’ cells, and the new db give us over 250? I
wondered if normals might be the answer, but none of the 10 stations from the old db
have in-db normals, whereas three of the new db have:

8453100  -920  -7850   10 CHIMBOTE             PERU          1961 2006   -999  -999.00
6190   19   59   36   18    5    0    3    0    0    1   10    5
8469100 -1380  -7630    6 PISCO                PERU          1942 2006   -999  -999.00
6190    3    0    3    0    0    1    1    3    1    4    0    0
8540600 -1850  -7030   29 ARICA/CHACALLUTA     CHILE         1903 2006   -999  -999.00
6190    1    3    0    0    0    2    2    2    2    0    0    0

So these alone ought to guarantee three of the cells being nonzero – they should have
the bloody normals in! So the next check has to be the climatology, that which provides
the cell-by-cell normals..

A check of the gridded climatology revealed that all 257 ‘zero cells’ have their
climatologies set to zero, too. This was partially checked in the GRIM-format climatology
just in case!

Next, a focus: on CHIMBOTE (see header line above). This has real data (not just zeros).
It is in cell (162,203), or (-9.25,-78.75) [lat, lon in both cases]. So we extract the
full timeseries for that cell from the published 2.10 (1901-2002) GRIM file:

Grid-ref= 203, 162
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    0    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    7    0    3
2    0    0    0    2    0    0    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    0    0    0    5    0    3
2    0    0    0    2    0    2    0    0    5    0    3
2    0    0    0    2    0    0    0    0    5    0    3
0    0    0    0    2    0    0    0    0    0    0    0
0    0    0    0    2    0    0    0    0    0    0    2
0    0    0    0    2    5    6    0    0    0    0    3
2    3    0    0    0    0   17    0    0    4    0    3
2    0    0    0    3    0    2    0    0    2    0    3
0    0    0    0    0    0   14    0    0    9    0    0
0    0    0    0    0    0    0    0    0    2    0    2
0    0    0    0    0    0   12    0    0    0    0    5
0    0    0    0    0    0    0    0    0    3    0    2
0    0    0    0    0    0   10    0    0    0    0    2
0    0    0    0    3    0   11    0    0    2    0    3
0    0    0    0    2    0    0    0    0    0    0    2
0    0    0    0    0    0    0    0    0    4    0    0
3    0    0    0    0    0   15    0    0    0    0    2
0    0    0    0    0    0    0    0    0    4    3    2
5    0    0    0    0    0    0    0    0   12    0    3
0    0    2    2    4    2    0    0    2    3    0    3
0    0    0    0    3    0    0    2    0    2    2    3
0    0    0    0    0    0    0    3    0    0    0    0
0    0    0    0    0    0    0    0    0    0    0    0
0    0    0    7    0    3    0    0    0    0    0    0
0    0    2    3    0    0    0    4    0    0   12    0
0    0    9    0    0    0    0    0    0    0    0    0
0    0    0    0    0    0    0    0    0    0    0    0
0    6    2    0    0    0    6    0    0    0    0    0
0    0    0    0    0    0    0    2    2    0    0    0
0    0    0    0    0    3    0    0    0    0    0    0
0    0    0    0    2    2    0    0    0    0    0    0
0    2    0    0    0    0    0    2    0    0    0    0
0    0    0    7    0    0    0    2    0    0    0    3
2    0    7    0    0    2    0    0    2    0    0    0
0    0    0    7    0    0    2    2    2    0    0    0
8    0    2    0    0    0    0    2    0    0    0    0
0    7    0    0    0    0    0    0    0    0    0    0
0    0    2    0    0    0    0    0    0    0    0    0
0    0    0    0    0    0    0    0    0    2    0    0
0    0    0    0    2    0    0    0    0    0    0    0
2    0    0    0    0    2    0    0    0    0   10    0
3    0    0    0    0    0    9    0    0    0    0    3
0    0    0    0    0    0    0    0    0    3    0    5
4    0    0    2   10    2    0    0    0    0    0    4
0    0    0    0    0    0    0    0    2    5    0    0
0    0    0    0    0    0    9    0    0    0    0    0
0    0    0    0    0    0    0    3    0    0    0    0
0    0    0    0    0    0    0    0    0    5    0    0
3    0    0    0    0    0    0    0    0    0    0    0
2    0    0    0    0    0    0    0    3    0    0    0
0    0    0    0    8    2    0    0    0    0    0    3
0    0    2    0    2    0    0    0    0    0    2    3
0    0    0    0    2    0    2    0    0    5    0    2
0    0    0    0    2    0    2    0    0    0    0    0
0    0    0    0    0    0    0    0    0    5    0    0
0    0    0    0    2    0    0    2    0    0    0    0
2    0    2    0    0    0    0    0    0    5    0    0
0    0    0    0   11    0    2    0    0    4    0    3
2    3    2    0   13    0    0    0    0    0    0    0
2    6    0    3    0    0    0    0    2    3    0    7
2    0    0    0    2    0    0    0    0    0    0    3
0    0    0    0    0    0    2    0    0    0    0    2
0    0    0    0    0    0    0    0    0    0    0    3

..yet in the 3.00 version, it’s all zeros!
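
The cell indices quoted for CHIMBOTE can be checked with a little arithmetic. The sketch below assumes a 0.5 degree grid with 1-based indices counted from -90 (latitude) and -180 (longitude), which is consistent with the figures quoted above; the function names are hypothetical. Note that the GRIM Grid-ref line gives the longitude index first.

```python
def cell_centre(index, origin, cell_size=0.5):
    """Centre coordinate of a 1-based cell index counted from origin (-90 or -180)."""
    return origin + (index - 0.5) * cell_size


def cell_index(coord, origin, cell_size=0.5):
    """1-based index of the cell containing coord, counting from origin."""
    return int((coord - origin) // cell_size) + 1


lat_centre = cell_centre(162, -90.0)    # latitude index 162
lon_centre = cell_centre(203, -180.0)   # longitude index 203
```

Under these assumptions, latitude index 162 and longitude index 203 give the centre (-9.25, -78.75) stated in the text.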

Only one thing for it.. examine the attempt at regenerating 2.10.
Unfortunately – well, interestingly then – this gave the same
zero cells as the 3.00 generation! So it’s something to do with
the process, not the database (or the climatology, assuming that
has remained constant, which I gather it has).

Update: aha! Phil pointed out that for precip the climatology
is used as a MULTIPLIER. So if the clim hasn’t changed, the
cells should always have been zero regardless of actual data.

As I should have remembered:

crua6[/cru/cruts/version_3_0/primaries/precip] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.pre
Enter a name for the gridded climatology file: clim.6190.lan.pre.grid
Enter the path and stem of the .glo files: preglo/pregrid.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: pregrid/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Right, erm.. off I jolly well go!
pregrid.01.1901.glo
pregrid.02.1901.glo
(etc)
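
Why a zero climatology forces a zero cell can be seen in a small sketch of the A/P choice. The percentage formula here is an assumption (anomaly expressed as a percentage departure from the normal), not taken from the glo2abs source, and the function name is hypothetical.

```python
def anomaly_to_absolute(anomaly, climatology, mode):
    """mode 'A': additive anomaly; mode 'P': percentage anomaly (assumed formula)."""
    if mode == "A":
        return climatology + anomaly
    if mode == "P":
        # Assumed form: the anomaly is a percentage departure from the normal,
        # so the normal acts as a multiplier.
        return climatology * (1.0 + anomaly / 100.0)
    raise ValueError("mode must be 'A' or 'P'")
```

In 'P' mode any anomaly applied to a climatology of 0 gives 0, so station data cannot lift such a cell above zero; in 'A' mode it could.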

Decided to read Mitchell & Jones 2005 again. Noticed that the
limit for SD when anomalising should be 4 for precip, not 3! So
re-ran with that:

crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.pre
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
pre.0612181221.dtb
pre.0612181221.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
4
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
pre4sd.txt
> Select the first,last years AD to save:
1901,2006
> Operating…
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dts

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/precip/pre.0612181221.dts

> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
made it to here
>         .cts     299359     3.0    7613600    76.8
> PROCESS        DECISION percent %of-chk
> no lat/lon        17527     0.2     0.2
> no normal       2355659    23.8    23.8
> out-of-range      13253     0.1     0.2
> duplicated       586206     5.9     7.8
> accepted        6934807    70.0
> Dumping years 1901-2006 to .txt files…

This is not as good a percentage as for 2.10:

> NORMALS            MEAN percent      STDEV percent
>         .dtb          0     0.0
>         .cts    3375441    84.1    3375441    84.1
> PROCESS        DECISION percent %of-chk
> no lat/lon         3088     0.1     0.1
> no normal        638538    15.9    15.9
> out-of-range      70225     1.7     2.1
> duplicated       135457     3.4     4.1
> accepted        3167636    78.9
> Dumping years 1901-2002 to .txt files…

But the actual number of accepted values is more than TWICE 2.10!

Of course, the same 257 gridcells are zeros, because the multiplicative
normals are still zero.

For reference, these are the results for the 3 SD limit of 3.00:

> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
made it to here
>         .cts     284160     2.9    7598401    76.7
> PROCESS        DECISION percent %of-chk
> no lat/lon        17527     0.2     0.2
> no normal       2370858    23.9    24.0
> out-of-range      32379     0.3     0.4
> duplicated       583193     5.9     7.8
> accepted        6903495    69.7
> Dumping years 1901-2006 to .txt files…

So we’ve only gained 0.3% of values, a real figure of 31312 values.

Conclusion: stick with a 3 Standard Deviation limit, like the
Read_Me says.
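
For reference, a standard deviation limit of this kind can be sketched as follows. This is a generic screen, assuming values are compared against the mean and sample standard deviation of their own series; how anomdtb actually computes its limit is not shown here, and the function name is hypothetical.

```python
import statistics


def screen_outliers(values, n_sd):
    """Split values into (kept, rejected) by distance from the mean in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; needs 2+ values
    kept, rejected = [], []
    for v in values:
        if sd > 0 and abs(v - mean) > n_sd * sd:
            rejected.append(v)
        else:
            kept.append(v)
    return kept, rejected
```

The gain quoted above is simply the difference between the two accepted counts: 6934807 - 6903495 = 31312.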

24. (cont of 22 really)

Restarted work on mergedb.for. Decided I was taking the wrong approach,
so the interruption was probably a GOOD THING.

The process now is to read in the header lines AND line numbers from
the main database, and to then process the incoming database one record
at a time. It’s more logical and having the line numbers will speed
things up enormously (well it has done on previous occasions).
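
The header-plus-line-number idea can be sketched as an index built in one pass. This is an illustration, not mergedb.for: it assumes data and normals lines are exactly 64 characters (a 4-character year or '6190' plus twelve 5-character values) and that anything longer is a station header whose first token is the station code.

```python
DATA_LINE_WIDTH = 4 + 12 * 5  # assumed width of a data or normals line


def index_headers(lines):
    """Map station code -> 0-based line number of its header line."""
    index = {}
    for lineno, line in enumerate(lines):
        text = line.rstrip("\n")
        if len(text) > DATA_LINE_WIDTH:  # assumed test for a header line
            code = text.split()[0]
            # Keep the first occurrence if a code appears twice.
            index.setdefault(code, lineno)
    return index
```

With such an index, each incoming record can be matched by code and the main database read from the right line directly, rather than rescanned for every station.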

The biggest immediate problem was the loss of an hour’s edits to the
program, when the network died.. no explanations from anyone, I hope
it’s not a return to last year’s troubles.

(some weeks later)

well, it compiles OK, and even runs enthusiastically. However there are
loads of bugs that I now have to fix. Eeeeek. Timesrunningouttimesrunningout.

(even later)

Getting there.. still ironing out glitches and poor programming.

25. Wahey! It’s halfway through April and I’m still working on it. This
surely is the worst project I’ve ever attempted. Eeeek. I think the main
problem is the rather nebulous concept of the automatic updater. If I
hadn’t had to write it to add the 1991-2006 temperature file to the ‘main’
one, it would probably have been a lot simpler. But that one operation has
proved so costly in terms of time, etc that the program has had to bend
over backwards to accommodate it. So yes, in retrospect it was not a
brilliant idea to try and kill two birds with one stone – I should have
realised that one of the birds was actually a pterodactyl with a temper
problem.

Success!
crua6[/cru/cruts/version_3_0/db/testmergedb] ./mergedb
**************************************************
*                  MERGEDB                       *
*                                                *
*  Merging of two database files                 *
*             Ops ID: f098xxxx                   *
*               Date:   12:17  25/04/07          *
*  The Session ID is: 0704251217.f098xxxx        *
*  (log file ‘mergedb.0704251217.f098xxxx.log’)  *
*                                                *
*  Please choose the mode of working.            *
*  This program can either run..                 *
*  [1] Interactively, (in which case an operator *
*  must be present throughout to make decision), *
*  or [2] in Batch mode, (in which case it may   *
*  be left unattended). If Batch mode is used, a *
*  file of outstanding issues will be saved for  *
*  later [3] resolution by an operator.          *
*                                                *
*  [1] Interactive (operator) processing         *
*  [2] Batch (no operator) processing            *
*  [3] Operator processing of saved batch        *
*  [4] Run a previously-saved action file        *
*                                                *
*  Please enter 1,2,3 or 4: 4
*  RUN ACTION FILE MODE                          *
*                                                *
*  Enter the ACTion filename, or ‘x’ for a list: x
*  The  1 most recent ACT files:
*   1. mergedb.0704201343.f098xxxx.act           *
*  Enter a number or 0 for none of the above: 1
*  Enter ‘Y’ to run this file or ‘N’ to abort: Y
*                                                *
*  Creation date/time:  13:43 20/04/07           *
*  Batch initiator was: f098                     *

*  Number of actions/requests:      2586
*  This ACT file derived from original OPS file: *
*  mergedb.0704201210.f098xxxx.ops               *
*  Main (existing) Database:      tmp.0702091122.dtb
*  Secondary (incoming) Database: tmp.0612081519.dat
*  Parameter is ‘tmp’ – confirm (Y/N): Y

*  Actions Completed!                            *
*          Thank You for using MERGEDB!          *
**************************************************

..well, ‘success’ in the sense that it ran and apparently all the data’s
in the right place, in tmp.0704251819.dtb.

26. OK, now to merge in the US stations. First, wrote ‘us2cru’ to convert
the marksusanonwmocru.dat file to the ‘standard’ format we’re using. That
worked OK. Then used ‘addnormline’ to, well – add a normals line. Only 17
out of 1035 stations ended up with missing normals, which is pretty good!

The with-normals US database file is tmp.0704251654.dat.

Now, I knew that using mergedb as it stands would not work. It expects to
be updating the existing records, and actions like ‘addnew’ require OPS
to confirm each one. So I thought it best to add an OPS clause to auto-
confirm additions where there’s no WMO match and the data density is OK,
say 50% or higher. Unfortunately, that didn’t work either, and rather than
spend even more time debugging mergedb.for, I knocked off simpleaddnew.for,
which adds two non-overlapping databases. The resultant file, with all
three partial databases, is tmp.0704271015.dtb.
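
The auto-confirm rule described above (no WMO match, data density of 50% or higher) could look something like this. It is a sketch of the rule as stated, not of the clause that failed in mergedb.for; it assumes density means the fraction of non-missing monthly values across the years present, and all names are hypothetical.

```python
MISSING = -9999  # assumed missing-value code


def data_density(series):
    """Fraction of non-missing monthly values; series maps year -> 12 values."""
    values = [v for months in series.values() for v in months]
    if not values:
        return 0.0
    return sum(1 for v in values if v != MISSING) / len(values)


def auto_confirm_addnew(code, existing_codes, series, min_density=0.5):
    """True when the station is new to the main database and dense enough."""
    return code not in existing_codes and data_density(series) >= min_density
```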

27. Well, enough excuses – time to remember how to do the anomalising and
gridding things! Firstly, ran ‘addnormline’ just to ensure all normals are
up to date. The result was 8 new sets of normals, so well worth doing. The
database is now:

tmp.0704292158.dtb

Ran ‘anomdtb’ – got caught out by the requirement for a companion ‘.dts’
file again, ran ‘falsedts.for’ and carried on.. would still be nice to be
sure that it’s not something meaningful **sigh**.

Output:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/temp] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.tmp
> Select the .cts or .dtb file to load:
tmp.0704292158.dtb
tmp.0704292158.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
8
> Select the generic .txt file to save (yy.mm=auto):
tmp.txt
> Select the first,last years AD to save:
1901,2006
> Operating…
/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dtb

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts

> Failed to find file.
> Enter the file, with suffix: .dts
tmp.0704292158.dts
tmp.0704292158.dts

/tmp_mnt/cru-auto/cruts/version_3_0/primaries/temp/tmp.0704292158.dts

> NORMALS            MEAN percent      STDEV percent
>         .dtb    3330007    81.3
made it to here
>         .cts      92803     2.3    3422810    83.6
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        671592    16.4    16.4
> out-of-range        744     0.0     0.0
> duplicated      4102723   100.2   119.9
> accepted        -680657   -16.6
> Dumping years 1901-2006 to .txt files…

crua6[/cru/cruts/version_3_0/primaries/temp]
<END QUOTE>

.. which is a trifle worrying! And looking at the .txt files, they look
rather odd as well – for instance, tmp.1953.03.txt starts like this:

7.09    0.87    10.0      0.10000  10010
7.83   -1.55    28.0     -4.80000  10080
6.97   -1.89    10.0      0.90000   -999
6.97   -1.89   100.0      0.50000  10260
7.45   -1.90    16.0     -3.10000  10280
6.95   -2.55   129.0      3.70000  10650
7.04   -3.11    14.0      0.00000  10980
6.60   -0.20     0.0      1.20000  11000
6.73   -1.44    13.0      1.60000   -999
6.68   -1.40    39.0      2.20000  11530

Now, do those first two columns look like lat & lon to you? Me neither,
here’s what the old version of the same file looks like:

60.00  -20.00  -999.0      0.40000-990007
62.00  -33.00  -999.0     -0.40000-990002
56.50  -51.00     0.0     -0.50000-990000
6.90  122.06     6.0     -0.60000   -999
13.13  123.73    17.0      0.20000   -999
14.52  121.00    15.0      0.60000   -999
18.37  121.63     4.0      1.10000   -999
6.90  122.00     6.0     -0.60000   -999
10.70  122.50    14.0     -0.10000   -999
13.13  123.73    19.0      0.10000   -999

In fact, the first two columns never get outside of +/- 30. Oh bugger.
What the HELL is going on?!

Decided to pursue that worrying (and impossible) ‘duplicates’ figure.

The function ‘sort’ was used to sort the database so that any duplicate
lines would be together – then ‘uniq’ was used to pull out duplicates.
There were quite a few dupes, and one or two triples too, like these:

crua6[/cru/cruts/version_3_0/primaries/temp] grep -n '1984  \-83  \-46   22   55  126  154  222  215  159   63   32  \-62' tmp.0704292158.dtb
195789:1984  -83  -46   22   55  126  154  222  215  159   63   32  -62
254265:1984  -83  -46   22   55  126  154  222  215  159   63   32  -62
254380:1984  -83  -46   22   55  126  154  222  215  159   63   32  -62

These are from the following stations:
720344   408   1158 1539 ELKO-FAA-AP---------USA---------   1870 1996 301870  -999.00
725837   408   1158 1549  NV ELKO FAA AP                    1930 1990 101930  -999.00
725910   401   1223  103 RED BLUFF            USA           1878 2006 101878  -999.00

The past two are consecutive stations.
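
The sort/uniq hunt for repeated lines can also be done in one pass while keeping the line numbers, which saves the follow-up grep. A sketch, assuming all-missing rows (a year followed only by -9999 values) should be ignored; the function name is hypothetical.

```python
from collections import defaultdict


def duplicate_data_lines(lines, missing_token="-9999"):
    """Return {line_text: [1-based line numbers]} for lines that occur more than once."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Skip blank lines and rows holding nothing but missing values after the year.
        fields = text.replace(missing_token, " ").split()
        if len(fields) <= 1:
            continue
        seen[text].append(number)
    return {text: nums for text, nums in seen.items() if len(nums) > 1}
```

The returned line numbers play the same role as the grep -n output above.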

Looking at the last two.. it seems that 725910 has 725837’s data!

1977   71  124  118  184  167  275  283  280  230  190  126   99
1978  107  114  149  144  208  248  289  282  232  220  118   72
1979   85   99  139  150  218  256  282  258  253  189  117   94
1980   99  121  119  156  192  216  275  262  241  196  128  102
1981   14   19   49   90  123  196  233  227  164   71   47   11
1982  -49  -14   32   57  114  164  206  214  148   74   11  -23
1983   -9   -1   54   59  114  167  204  223  170  104   25  -19
1984  -83  -46   22   55  126  154  222  215  159   63   32  -62

As can be seen, 1981 sees a complete change in range, especially for
Autumn/Winter. In fact, from 1981 to 1990, 725910 is a copy of
725837! It then reverts to the original range for the rest of the run.
So.. did the merging program do this? Unfortunately, yes. Check dates:

crua6[/cru/cruts/version_3_0/db/testmergedb] grep -n 'RED BLUFF' tmp.0*.*
tmp.0612081519.dat:28595: 725910   401   1223  103 RED BLUFF            USA           1991 2006 101991  -999.00
tmp.0702091122.dtb:171674: 725910   401   1223  103 RED BLUFF            USA           1878 1980 101878  -999.00
tmp.0704251819.dtb:200331: 725910   401   1223  103 RED BLUFF            USA           1878 2006 101878  -999.00
tmp.0704271015.dtb:254272: 725910   401   1223  103 RED BLUFF            USA           1878 2006 101878  -999.00
tmp.0704292158.dtb:254272: 725910   401   1223  103 RED BLUFF            USA           1878 2006 101878  -999.00
crua6[/cru/cruts/version_3_0/db/testmergedb]

The first file is the 1991-2006 update file. The second is the original
temperature database – note that the station ends in 1980.

It has *inherited* data from the previous station, where it had -9999
before! I thought I’d fixed that?!!!

/goes off muttering to fix mergedb.for for the five hundredth time

Miraculously, despite being dog-tired at nearly midnight on a Sunday, I
did find the problem. I was clearing the data array but not close enough
to the action – when stations were being passed through (ie no data to
add to them) they were not being cleaned off the array afterwards. Meh.
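
The failure mode is the classic stale buffer: a work array reused across stations and cleared in the wrong place. A minimal Python sketch of the safe pattern follows. It is not the mergedb.for code: it assumes one value per year for brevity, and the names are hypothetical.

```python
MISSING = -9999  # assumed missing-value code


def merge_stations(main, update, years):
    """main/update map station code -> {year: value}; returns the merged mapping."""
    merged = {}
    for code in main:
        # Fresh, all-missing buffer for EVERY station, including those with
        # nothing to add. Reusing one buffer and clearing it only after an
        # update lets the previous station's values leak into the gaps.
        buffer = {year: MISSING for year in years}
        buffer.update(main[code])
        buffer.update(update.get(code, {}))
        merged[code] = buffer
    return merged
```

Clearing at the top of the loop, right next to where the buffer is filled, means a station that is merely passed through cannot leave anything behind for the next one.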

Wrote a specific routine to clear halves of the data array, and back to
square one. Re-ran the ACT file to merge the x-1990 and 1991-2006 files.
Created an output file exactly the same size as the last time (phew!)
but with..

crua6[/cru/cruts/version_3_0/db/testmergedb] comm -12 tmp.0704292355.dtb tmp.0704251819.dtb |wc -l
285516
crua6[/cru/cruts/version_3_0/db/testmergedb] wc -l tmp.0704292355.dtb
285829 tmp.0704292355.dtb

.. 313 lines different. Typically:

14881,14886c14881,14886
< 1965-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1966-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1967-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1968-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1969-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1970-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

> 1965 -221 -177 -234 -182   -5    6   24   36  -15  -91 -100 -221
> 1966 -272 -194 -248 -192  -66   10   27   45  -12  -75 -139 -228
> 1967 -201 -243 -196 -158  -26    1   40   30  -18  -89 -183 -172
> 1968 -253 -256 -253 -107  -42   10   46   33  -21  -64 -134 -195
> 1969 -177 -202 -248 -165  -33    8   42   50   -1  -89 -157 -204
> 1970 -237 -192 -217 -160  -87    6   30   25   -5  -55 -143 -222

ie, what should have been missing data is now missing data again:

200436,200445c200436,200445
< 1981-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1982-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1983-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1984-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
< 1990-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

> 1981   14   19   49   90  123  196  233  227  164   71   47   11
> 1982  -49  -14   32   57  114  164  206  214  148   74   11  -23
> 1983   -9   -1   54   59  114  167  204  223  170  104   25  -19
> 1984  -83  -46   22   55  126  154  222  215  159   63   32  -62
> 1985  -57  -29   17   89  122  181  244  188  121   79  -11  -50
> 1986    2   31   66   72  113  187  194  214  116   78   11  -39
> 1987  -59   -5   30   97  131  177  193  192  153  101   21  -35
> 1988  -65  -15   29   80  108  184  222  198  138  116    8  -57
> 1989 -113  -54   53   94  113  164  215  186  143   78    8  -24
> 1990  -24  -30   49  100  100  166  214  194  177   77    9  -97

Hurrah!

So the interim database file is tmp.0704292355.dtb. Now to re-add
the US station dataset with simpleaddnew.for.

crua6[/cru/cruts/version_3_0/db/testmergedb] ./simpleaddnew

SIMPLYADDNEW – add stations to a database
This program assumes the two databases have
NO COMMON STATIONS and will fail (stop) if
any are found.

Please enter the main database: tmp.0704292355.dtb

Please enter the new database: tmp.0704251654.dat
Please enter a 3-character parameter code: tmp
Output database is: tmp.0704300053.dtb
crua6[/cru/cruts/version_3_0/db/testmergedb]

So now we have the combined database again, a bit quicker than
last time: tmp.0704300053.dtb. Pity we slid into May: I was hoping
to only be FIVE MONTHS late.

What’s worse – there are STILL duplicate non-missing lines, 210 of
them. The first example is this:

1835   92   73  141  187  260  279  281  288  241  195  183  106

Which belongs to this in the original database (tmp.0702091122.dtb):

722080   329    800   15 CHARLESTON, S. CAROL UNITED STATES 1823 1990 101823  -999.00
6190   84  100  142  180  224  257  274  270  245  191  145  104

..and to this in the US database (tmp.0704251654.dat):

720467   328    799    3 CHARLESTON-CITY-----USA---------   1835 1996 301835  -999.00
6190   91  106  144  186  227  260  277  272  249  199  154  112

These two stations obviously have a lot in common – though not
everything, as their normals (shown) differ. In fact, on examination
the US database record is a poor copy of the main database one, it
has more missing data and so forth. By 1870 they have diverged, so
in this case it’s probably OK.. but what about the others? I just do
not have the time to follow up everything. We’ll have to take 210
year repetitions as ‘one of those things’.

..actually, I decided in the end to follow up all 210 of them. The
likelihood is that the number is far greater, since the filtering
that gave the 210 figure excluded any lines with two or more
consecutive missing values (to avoid hundreds of just-missing-value
lines). Also I spotted some instances where data lines would be
identical but for one or more missing values in one of the stations.
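
The filtering described above can be sketched roughly as follows. This is an illustrative Python reconstruction, not the routine actually used: the function names and the in-memory layout (station id mapped to a list of year lines) are assumptions, and the -9999 missing code is taken from the data listings earlier in this log.

```python
from collections import defaultdict

MISSING = -9999  # missing-value code as it appears in the data lines above


def has_missing_run(values, run_length=2):
    """True if `values` holds `run_length` or more consecutive missing codes."""
    run = 0
    for value in values:
        run = run + 1 if value == MISSING else 0
        if run >= run_length:
            return True
    return False


def repeated_year_lines(stations):
    """Map each (year, values) line found in 2+ stations to the ids holding it.

    `stations` is a hypothetical layout:
        {station_id: [(year, [12 monthly values]), ...]}
    Lines with two or more consecutive missing values are skipped, which is
    the filter that produced the 210 figure.
    """
    seen = defaultdict(list)
    for station_id, lines in stations.items():
        for year, values in lines:
            if has_missing_run(values):
                continue
            seen[(year, tuple(values))].append(station_id)
    return {line: ids for line, ids in seen.items() if len(ids) > 1}
```

An exact-match test like this cannot see the other case noted above, where two lines agree except for a missing value in one station, so it can only undercount.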

After checking, I found that the majority of the duplications were
between the original database and the US database, with just a couple
of ‘linked’ stations within the original database, and half a dozen
in the 1991-2006 update file. One surprise was that stations I’m sure
I rejected ended up marked as ‘addnew’ in the .act file – quite
unsettling!

Rather foolishly, perhaps, I decided to have a go at interactively
incorporating the US data rather than using ‘simplyaddnew’. However,
progress was so slow (because of the high number of ‘near matches’)
that this approach was abandoned.

Tried ‘anomdtb’ with the fixed final file (tmp.0704300053.dtb)…
no better! The crucial bits:

<BEGIN QUOTE>
> NORMALS            MEAN percent      STDEV percent
>         .dtb    3323823    81.3
made it to here
>         .cts      91963     2.2    3415786    83.5
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        675037    16.5    16.5
> out-of-range        744     0.0     0.0
> duplicated      4100117   100.2   120.1
> accepted        -685075   -16.7
> Dumping years 1901-2006 to .txt files…
> Failed to create file. Try again.
> Enter the file, with suffix: .ann
tmp.ann
> Failed to create file. Try again.
> Enter the file, with suffix: .ann
h.ann

crua6[/cru/cruts/version_3_0/primaries/temp]
<END QUOTE>

So the ‘duplicated’ figure is slightly lower.. but what’s this
error with the ‘.ann’ file?! Never seen before. Oh GOD if I
could start this project again and actually argue the case for
junking the inherited program suite!!
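
The negative 'accepted' count in the quoted output is at least arithmetically consistent. The sketch below assumes (this is inferred from the printed numbers, not read from the anomdtb source) that 'accepted' is the count of values with a normal, minus the out-of-range count, minus the duplicated count, and that the first percentage column is relative to all values, with or without a normal.

```python
def accepted_count(with_normal, out_of_range, duplicated):
    """Values left after the out-of-range and duplicate checks (inferred rule)."""
    return with_normal - out_of_range - duplicated


def percent(part, whole):
    """Percentage rounded to one decimal place, as in the log columns."""
    return round(100.0 * part / whole, 1)


with_normal = 3415786            # .dtb + .cts total from the quoted run
no_normal = 675037
all_values = with_normal + no_normal

print(accepted_count(with_normal, 744, 4100117))   # -685075
print(percent(4100117, all_values))                # 100.2
```

On this reading, more values were flagged as duplicated than were ever checked, which is why the accepted total goes below zero.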

OK.. the .ann file was simply that it refuses to overwrite any
existing one. Meh. It’s happy to overwrite the log file of
course – nice bit of logic there.

and the duplicates? Well I inserted a debug line where the
decision is made. Here’s an example:

712600 vs.  727340:      4.7     8.4     4.7     8.4 ->      0.0km

Here the two WMO codes look OK (though others are -999 which
seems unlikely) but the two lat/lon pairs? Ooops. Here are the
actual headers:

712600   465    845  187 Sault Ste Marie A    CANADA        1945 2006 361945  -999.00
727340   465    844  220 SAULT-STE-MARIE----- USA---------  1888 2006 101888  -999.00

So, uhhhh.. what in tarnation is going on? Just how off-beam
are these datasets?!!

Not sure why the lats & lons are a factor of 10 too low – may
be intentional though it wasn’t happening before.

Ran with the original database:

<BEGIN QUOTE>
> NORMALS            MEAN percent      STDEV percent
>         .dtb    2113609    81.7
made it to here
>         .cts          0     0.0    2113608    81.7
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        474422    18.3    18.3
> out-of-range      68179     2.6     3.2
> duplicated       923258    35.7    45.1
> accepted        1122172    43.4
> Dumping years 1901-1990 to .txt files…
<END QUOTE>

The lats & lons look the same.. but a lot less duplicates!

WHY? Well, it could just be those pesky US stations.. so
why not compare the two bespoke log files (as excerpted above)?

Immediately, another baffler: the log file from the run of
the ‘final’ database has lots of ‘DEBUG DETAIL’ information,
but the log file from the run of the original database does not!
So cropping those away with a judicious ‘tail’.. I ran comm:

crua6[/cru/cruts/version_3_0/primaries/temp] comm -23 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat |wc -l
200
crua6[/cru/cruts/version_3_0/primaries/temp] comm -13 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat | wc -l
2572
crua6[/cru/cruts/version_3_0/primaries/temp] comm -12 log_anomdtb_H.0702091122.dat barelog_anomdtb_H.0704300053.dat | wc -l
1809

So 200 duplication events are unique to the older database,
and 2572 are unique to the new database – with 1809 common
to both. A quick look at the 2572 ‘new’ ones showed a majority
of those with the first WMO as -999: this is the key. The
databases do not have any records with WMO=-999 as far as I know,
so something is going on..
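
The three comm runs amount to a three-way split of the log lines. A rough Python equivalent, offered as a sketch only: it uses sets, so repeated lines collapse to one, whereas comm walks two sorted files line by line. The function names are made up for illustration.

```python
def comm_counts(old_lines, new_lines):
    """Counts of lines unique to each log and common to both (set semantics)."""
    old, new = set(old_lines), set(new_lines)
    return {
        "only_old": len(old - new),   # like comm -23
        "only_new": len(new - old),   # like comm -13
        "common": len(old & new),     # like comm -12
    }


def first_code_is_missing(debug_line, missing=-999):
    """True if the first station code on a 'A vs. B: ...' debug line is -999."""
    return int(debug_line.split("vs.")[0]) == missing
```

Counting how many of the 'only_new' lines satisfy first_code_is_missing would put a number on the "majority" mentioned above.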

28. With huge reluctance, I have dived into ‘anomdtb’ – and already I have
that familiar Twilight Zone sensation.

I have found that the WMO Code gets set to -999 if *both* lon and lat are
missing. However, the following points are relevant:

* LoadCTS multiplies non-missing lons by 0.1, so they range from -18 to +18
with missing value codes passing through AS LONG AS THEY ARE -9999. If they
are -999 they will be processed and become -99.9. It is not clear why lats
are not treated in the same way!

* The subroutine ‘Anomalise’ in anomdtb checks lon and lat against a simple
‘MissVal’, which is defined as -999. This will catch lats of -999 but not
lons of -9999.

* This still does not explain how we get so many -999 codes.. unless we don’t
and it’s just one or two?
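
The mismatch between the two missing-value conventions in the points above can be shown in a few lines. This is a sketch of the behaviour as described, not the Fortran itself; the function names are invented and the 0.1 multiplier is the one quoted for LoadCTS.

```python
LON_MISSING_IN_LOADER = -9999   # the only code the loader passes through untouched
MISSVAL_IN_ANOMALISE = -999     # the simple 'MissVal' used by the later check


def load_lon(raw_lon, factor=0.1):
    """Scale a raw header longitude unless it equals the loader's missing code."""
    if raw_lon == LON_MISSING_IN_LOADER:
        return float(raw_lon)
    return raw_lon * factor


def looks_missing(value):
    """The downstream test as described: plain equality with -999."""
    return value == MISSVAL_IN_ANOMALISE


for raw in (-9999, -999):
    loaded = load_lon(raw)
    print(raw, "->", loaded, "caught as missing:", looks_missing(loaded))
```

Neither code is caught: -9999 passes through unchanged and is not equal to -999, while -999 gets scaled to about -99.9 before the check ever sees it.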

And the real baffler:

* If the code is -999 because lat and lon are both missing – how the bloody
hell does it know there’s a duplication within 8km?!!!

.. ah, OK. well for a start, the last point above does not apply – not one
case of the code being set to -999 because of lat/lon missing. In fact, I
hate to admit it, but it is *sort of* clever – the code is set to -999 to
prevent it being used again, because the distance/duplication checker will
not make a distance comparison if either code is -999. So HOW COME loads of
the duplicates have a code of -999?!!!

The plot thickens.. I changed the exclusion tests in the duplication loops
from:
if (AStn(XAStn).NE.MissVal) then
to:
if (int(AStn(XAStn)).NE.-999) then

This made NO DIFFERENCE. So having tested to ensure that the first of the
pair hasn’t already been used – we then use it! What’s more I’ve noticed
that it’s usually the one ‘incorporated’ in the previous iteration!
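
A toy model of what reusing an already-incorporated station implies. This is not the anomdtb loop: it places hypothetical stations along a straight line and lets the data ride along with whichever station it was last merged into, to show that every hop can be under the threshold while the end points are not.

```python
def chain_absorb(positions_km, threshold_km=8.0):
    """Follow hop-by-hop absorption starting from the first station.

    `positions_km` are made-up 1-D positions along a line. Each time the
    current holder is within the threshold of the next station, the data
    moves on to that station.
    """
    current = 0
    path = [0]
    for j in range(1, len(positions_km)):
        if abs(positions_km[j] - positions_km[current]) <= threshold_km:
            path.append(j)
            current = j
    return path


positions = [0.0, 6.0, 12.0, 18.0, 40.0]
path = chain_absorb(positions)
print(path, positions[path[-1]] - positions[path[0]])
```

With these positions the data travels through four stations in 6km steps and finishes 18km from where it started, well outside an 8km radius.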

Consider:

67700 vs.  160660:      4.6    -0.9     4.6    -0.9 ->      5.4km
-999 vs.  160707:      4.6    -0.9     4.6    -0.9 ->      2.2km
-999 vs.  160800:      4.6    -0.9     4.5    -0.9 ->      7.3km
-999 vs.  160811:      4.6    -0.9     4.6    -0.9 ->      5.8km

Here we can see (check the first set of lat/lons) that, after being
incorporated into 160660, 67700 goes on to also be incorporated into
160707, 160800 and 160811! So the same data could end up in three
other stations. It gets worse!! Because later on, we find:

160660 vs.  160707:      4.6    -0.9     4.6    -0.9 ->      7.9km
-999 vs.  160800:      4.6    -0.9     4.5    -0.9 ->      7.0km
-999 vs.  160811:      4.6    -0.9     4.6    -0.9 ->      5.8km
160707 vs.  160800:      4.6    -0.9     4.5    -0.9 ->      7.9km
-999 vs.  160811:      4.6    -0.9     4.6    -0.9 ->      6.6km
160800 vs.  160811:      4.5    -0.9     4.6    -0.9 ->      2.2km

So three of those recipients have gone on to be incorporated into one
of them (160811). But although in this case 67700 is within 8km of
160811, there is no guarantee! Indeed, with this system, the ‘chosen’
station may hop all over the place in <8km steps, collecting data as
it goes. In a densely-packed area this could drastically reduce the
number of stations. Then there’s these:

85997 vs.  390000:    -10.0   -20.0   -10.0   -20.0 ->      0.0km
-999 vs.  685807:    -10.0   -20.0   -10.0   -20.0 ->      0.0km
-999 vs.  688607:    -10.0   -20.0   -10.0   -20.0 ->      0.0km
-999 vs.  967811:    -10.0   -20.0   -10.0   -20.0 ->      0.0km
-999 vs.  968531:    -10.0   -20.0   -10.0   -20.0 ->      0.0km

as might be guessed, they all end up incorporated into 968531 – but
no surprise seeing as their lats & lons are rubbish!!! Oh Tim what
have you done, man? [actually - what he's done is to let missing
lats & lons through. Missing lon code is -1999 not -9999 so these
figures are the roundings]
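
The rounding mentioned in the bracketed note is easy to check, assuming the raw header values were divided by 100 and printed to one decimal place (the divisor of 100 is an assumption at this point in the log):

```python
def scaled(raw, factor=100):
    """Raw header value divided by `factor`, rounded to one decimal place."""
    return round(raw / factor, 1)


print(scaled(-999), scaled(-1999))   # -10.0 -20.0
```

So a missing latitude of -999 and a missing longitude of -1999 would show up as the -10.0 and -20.0 pairs in the listing above.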

All that said, the biggest worry is still the lats & lons themselves.
They just don’t look realistic. Lats appear to have been reduced by
a factor of 10 too, even though I can’t find the code for that. And
(from the top example) is 67700 really 5.4km from 160660?

67700   460    -90  273 LUGANO               SWITZERLAND   1864 2006 101864  -999.00
160660   456    -87 -999 MILANO MALPENSA      ITALY         1961 1970 101961  -999.00

Of course not! It’s just over 50km. I do not understand why the lats
& lons have been scaled, when the stated distance threshold has not.
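
A quick check of both distances with a standard haversine formula. This is a sketch: the formula and the 6371km Earth radius are common choices, not necessarily what anomdtb uses, so only rough agreement with the debug output should be expected.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def header_distance_km(header_a, header_b, factor):
    """Distance between two raw (lat, lon) header pairs divided by `factor`."""
    (lat_a, lon_a), (lat_b, lon_b) = header_a, header_b
    return haversine_km(lat_a / factor, lon_a / factor,
                        lat_b / factor, lon_b / factor)


LUGANO = (460, -90)      # raw header lat, lon from the listing above
MALPENSA = (456, -87)

print(header_distance_km(LUGANO, MALPENSA, 10))    # headers read as tenths of a degree
print(header_distance_km(LUGANO, MALPENSA, 100))   # headers divided by 100 instead
```

Dividing by 10 gives roughly 50km; dividing by 100 gives roughly 5.5km, which slips under an 8km threshold and is close to the 5.4km in the debug line.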

At least I’ve found *where* they are scaled, in LoadCTS (crutsfiles.f90):

if (StnInfo(XStn,2).NE.LatMissVal) Lat (XStn) = real(StnInfo(XStn,2)) / real(LatFactor)
if (StnInfo(XStn,3).NE.LonMissVal) Lon (XStn) = real(StnInfo(XStn,3)) / real(LonFactor)

Looking at how LoadCTS is called from anomdtb..

subroutine LoadCTS (StnInfo,StnLocal,StnName,StnCty,Code,Lat,Lon,Elv,OldCode,Data,YearAD,&
NmlData,DtbNormals,CallFile,Hulme,Legacy,HeadOnly,HeadForm,LongType,Silent,Extra,PhilJ, &
YearADMin,YearADMax,Source,SrcCode,SrcSuffix,SrcDate, &
LatMV,LonMV,ElvMV,DataMV,LatF,LonF,ElvF,NmlYr0,NmlYr1,NmlSrc,NmlInc)

call LoadCTS (StnInfoA,StnLocalA,StnNameA,StnCtyA,Code=AStn,OldCode=AStnOld, &
Lat=ALat,Lon=ALon,Elv=AElv,DtbNormals=DtbNormalsA, &
Data=DataA,YearAD=AYearAD,CallFile=LoadFileA,silent=1)     ! get .dtb file

.. we see that Legacy is not passed. This means that.. (from LoadCTS):

LatFactor=100 ; LonFactor=100 ; ElvFactor=1            ! usual/hulme hdr factors
if (present(Legacy)) then
LatFactor=10 ; LonFactor=10 ; ElvFactor=1            ! legacy hdr factors
end if
if (present(LatF)) LatFactor = LatF                ! custom hdr factors
if (present(LonF)) LonFactor = LonF
if (present(ElvF)) ElvFactor = ElvF

..LatFactor and LonFactor are set to 100.
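
The same selection logic, transliterated into Python as a sketch. None stands in for an optional Fortran argument that was not passed; the function name is invented.

```python
def header_factors(legacy=None, lat_f=None, lon_f=None, elv_f=None):
    """Return (lat, lon, elv) divisors following the LoadCTS excerpt above."""
    lat_factor, lon_factor, elv_factor = 100, 100, 1      # usual/hulme
    if legacy is not None:          # mirrors present(Legacy): any value counts
        lat_factor, lon_factor, elv_factor = 10, 10, 1    # legacy
    if lat_f is not None:           # custom factors override either default
        lat_factor = lat_f
    if lon_f is not None:
        lon_factor = lon_f
    if elv_f is not None:
        elv_factor = elv_f
    return lat_factor, lon_factor, elv_factor


print(header_factors())                      # (100, 100, 1)
print(header_factors(lat_f=10, lon_f=10))    # (10, 10, 1)
```

With neither Legacy nor custom factors passed, a header latitude of 709 comes out as 7.09 rather than 70.9.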

So I added a specific pair of arguments, LatF=10,LonF=10, and got:

> NORMALS            MEAN percent      STDEV percent
>         .dtb    3323823    81.3
made it to here
>         .cts      91963     2.2    3415786    83.5
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        675037    16.5    16.5
> out-of-range        744     0.0     0.0
> duplicated        53553     1.3     1.6
> accepted        3361489    82.2
> Dumping years 1901-2006 to .txt files…

Hurrah! Looking at the log it is still ignoring the -999 Code and re-integrating stations..
but not to any extent worth worrying about. Not when duplications are down to 1.3% :-) ))

Then got a mail from PJ to say we shouldn’t be excluding stations inside 8km anyway – yet
that’s in IJC – Mitchell & Jones 2005! So there you go. Ran again with 0km as the distance:

> NORMALS            MEAN percent      STDEV percent
>         .dtb    3323823    81.3
made it to here
>         .cts      91963     2.2    3415786    83.5
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        675037    16.5    16.5
> out-of-range        744     0.0     0.0
> accepted        3415042    83.5
> Dumping years 1901-2006 to .txt files…

Which hasn’t saved much as it turns out. In fact, I must conclude that an inquiring mind is
a very dangerous thing – I decided to see what difference it made, turning off the proximity
duplicate detection and elimination:

crua6[/cru/cruts/version_3_0/primaries/temp] wc -l */*1962.12.txt
2773 oldtxt/old.1962.12.txt
3269 tmptxt0km/tmp.1962.12.txt
3308 tmptxt8km/tmp.1962.12.txt

So.. ‘oldtxt’ is before I fixed the lat/lon scaling problem. But look at the last two – I
got MORE results when I used an elimination radius! Whaaaaaaaaat?!!!

/goes home in a huff

/gets out of huff and goes into house, checks things and thinks hard

Okay, I guess if we don’t do the roll-duplicates-together thing, then we could lose data
because the ‘rolled’ station (ie the one subsumed into its neighbour) might have useful
years but no normals, so that data would be lost?

29. I suddenly thought – what about the Australian data? But luckily that’s just tmax/tmin
so I can roll that into the next database work.

30. Being an idiot much experience I decided to go back to the ‘perfectly-good’ precip
generation for v3.0 and re-do the anomalies with the new anomdtb. At 8km, we got the
duplicates down from 5.9% to 2.1%:

<OLD ANOMDTB WITH LATLON PROBS>
> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
made it to here
>         .cts     299359     3.0    7613600    76.8
> PROCESS        DECISION percent %of-chk
> no lat/lon        17527     0.2     0.2
> no normal       2355659    23.8    23.8
> out-of-range      13253     0.1     0.2
> duplicated       586206     5.9     7.8
> accepted        6934807    70.0
> Dumping years 1901-2006 to .txt files…

<NEW ANOMDTB WITH LATLON ‘FIXED’>
> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
made it to here
>         .cts     299359     3.0    7613600    76.8
> PROCESS        DECISION percent %of-chk
> no lat/lon        17527     0.2     0.2
> no normal       2355659    23.8    23.8
> out-of-range      13253     0.1     0.2
> duplicated       207391     2.1     2.8
> accepted        7313622    73.8
> Dumping years 1901-2006 to .txt files…

And, of course, all in with 0km range:

> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
made it to here
>         .cts     299359     3.0    7613600    76.8
> PROCESS        DECISION percent %of-chk
> no lat/lon        17527     0.2     0.2
> no normal       2355659    23.8    23.8
> out-of-range      13253     0.1     0.2
> accepted        7521013    75.9
> Dumping years 1901-2006 to .txt files…

Happy? well.. no. Because something is happening for precip that does not happen for
temp! But of course. Here are the first few lines from various 1962.12 text files..

tmptxt8km/tmp.1962.12.txt
70.90    8.70    10.0      2.10000  10010
78.30  -15.50    28.0     -3.30000  10080
69.70  -18.90    10.0     -1.40000   -999
69.70  -18.90   100.0     -1.50000  10260
74.50  -19.00    16.0     -1.20000  10280
69.50  -25.50   129.0     -3.10000  10650
70.40  -31.10    14.0     -0.20000  10980
66.00   -2.00     0.0      0.50000  11000
67.30  -14.40    13.0     -1.00000  11520
66.80  -14.00    39.0     -0.70000  11530

tmptxt0km/tmp.1962.12.txt
70.90    8.70    10.0      2.10000  10010
78.30  -15.50    28.0     -3.30000  10080
69.70  -18.90    10.0     -1.40000  10250
69.70  -18.90   100.0     -1.50000  10260
74.50  -19.00    16.0     -1.20000  10280
69.50  -25.50   129.0     -3.10000  10650
70.40  -31.10    14.0     -0.20000  10980
66.00   -2.00     0.0      0.50000  11000
67.30  -14.40    13.0     -1.00000  11520
66.80  -14.00    39.0     -0.70000  11530

preanoms/pre.1962.12.txt (old anomdtb output)
61.00   10.60   190.0     48.20000-511900
54.45   -6.07   116.0     -3.70000   -999
50.83   -4.55    15.0    -22.40000-389870
50.22   -5.30    76.0     39.70000   -999
50.63   -3.45     9.0    -28.10000-388730
51.43   -2.67    51.0    -36.90000   -999
51.05   -3.60   314.0    -27.80000-386030
51.72   -2.77   245.0    -37.70000-385850
51.62   -3.97    10.0    -46.10000-384130
52.35   -3.82   301.0     -4.40000-380860

pretxt8km/pre.1962.12.txt
610.00  106.00   190.0     48.20000-511900
544.50  -60.70   116.0     -3.70000-392380
508.30  -45.50    15.0    -22.40000-389870
502.20  -53.00    76.0     39.70000-389280
506.30  -34.50     9.0    -28.10000-388730
514.30  -26.70    51.0    -36.90000-386780
510.50  -36.00   314.0    -27.80000-386030
517.20  -27.70   245.0    -37.70000-385850
516.20  -39.70    10.0    -46.10000-384130
523.50  -38.20   301.0     -4.40000-380860

pretxt0km/pre.1962.12.txt
610.00  106.00   190.0     48.20000-511900
544.50  -60.70   116.0     -3.70000-392380
508.30  -45.50    15.0    -22.40000-389870
502.20  -53.00    76.0     39.70000-389280
506.30  -34.50     9.0    -28.10000-388730
514.30  -26.70    51.0    -36.90000-386780
510.50  -36.00   314.0    -27.80000-386030
517.20  -27.70   245.0    -37.70000-385850
516.20  -39.70    10.0    -46.10000-384130
523.50  -38.20   301.0     -4.40000-380860

..As a result of fixing the lats and lons for temperature, and indeed
precip it seems, we have buggered up the outputs!!! Obviously the
correction factor is expecting 100 not 10, but why isn’t this a problem
for temperature?! Went back and ran exactly the same version of anomdtb
on temperature – exactly the same as last time (2nd from top above). So
it is precip specific (or, erm, .not.temp specific?).

On the other hand, we’ve fixed the -999 WMO codes..

..and actually, those anomalies had better be percentage anomalies!

(checks a few) – yes, they are :-)

So oookay, LoadCTS reports the divisor is still 10 for lon/lat, so the
stored values for the first station (-511900, BIRI) should be 61 and 10.6,
sounds about right for Norway. The bit in anomdtb (actually the subroutine
‘Dumping’, LOL) that writes the .txt files just writes directly from the
arrays.. so they must have been modified somewhere in ‘Anomalise’ (there’s
nothing else in ‘Dumping’). Modified anomdtb to dump the first station’s
lat & lon at key stages – they were too high throughout, so LoadCTS assumed
to be the troublemaker. Modified LoadCTS in the same way, and it was
holding them at x100 from their true values, ie 61.0 -> 6100. It was about
now that I spotted something I’d not thought to examine before: precip
headers use two decimal places for their coordinates!

Temperature header:
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

Precipitation header:
100100  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

So.. this begs the question, how does the software suite know which it’s got?
By rights it should look at the most extreme values for each.. something tells
me that’s not the case. Decided to look at the ranges of values for different
versions of the databases, starting with temperature:

crua6[/cru/cruts] head -1 fromdpe1a/data/cruts/database/+norm/tmp.0311051552.dtb
-990017 -9999 -99999 -999 UNKNOWN              MARINE        1948 1990   -999  -999.00
crua6[/cru/cruts] head -1 fromdpe1a/data/cruts/database/+norm/_old/tmp.0310311715.dtb
-176000  3520   3330  220 NICOSIA              CYPRUS        1932 1974   -999   nocode
crua6[/cru/cruts] head -1 rerun1/data/cruts/rerun_tmp/tmp.0311051552.dtb
-990017 -9999 -99999 -999 UNKNOWN              MARINE        1948 1990   -999  -999.00
crua6[/cru/cruts] head -1 rerun1/data/cruts/rerun_tmp/tmp.0311051552n.dtb
-990017 -9999 -99999 -999 UNKNOWN              MARINE        1948 1990   -999  -999.00
crua6[/cru/cruts] head -1 rerun1/data/cruts/rerun_tmp/database/+norm/_old/tmp.0310311715.dtb
-176000  3520   3330  220 NICOSIA              CYPRUS        1932 1974   -999   nocode
crua6[/cru/cruts] head -1 rerun1/data/cruts/rerun_tmp/database/+norm/tmp.0311051552.dtb
-990017 -9999 -99999 -999 UNKNOWN              MARINE        1948 1990   -999  -999.00
crua6[/cru/cruts] head -1 rerun1/data/cruts/rerun_tmp/database/tmp.0311051552.dtb
-990017 -9999 -99999 -999 UNKNOWN              MARINE        1948 1990   -999  -999.00
crua6[/cru/cruts] head -1 version_3_0/primaries/temp/tmp.0702091122.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 1990 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/primaries/temp/tmp.0704300053.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/testmergedb/tmp.0702091122.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 1990 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/testmergedb/tmp.0704292355.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/testmergedb/badtimeline/tmp.0704251819.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/testmergedb/badtimeline/tmp.0704271015.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/testmergedb/badtimeline/tmp.0704292158.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/testmergedb/tmp.0704300053.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/tmp.0702091122.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 1990 341921  -999.00
crua6[/cru/cruts] head -1 version_3_0/db/tmp.0704300053.dtb
10010   709     87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

Without going any further, it’s obvious that LoadCTS is going to have to auto-
sense the lat and lon ranges. Missing value codes can then be derived – if it
always returns actual (unscaled) degrees (to one or two decimal places) then
any value lower than -998 will suffice for both parameters. However, this does
make me wonder why it wasn’t done like that. Is there a likelihood of the
programs being used on a spatial subset of stations? Say, English? Then lon
would never get into double figures, though lat would.. well let’s just hope
not! *laughs hollowly*

Okay.. so I wrote extra code into LoadCTS to detect Lat & Lon ranges. It excludes any
values for which the modulus of 100 is -99, so hopefully missing value codes do not
contribute. The factors are set accordingly (to 10 or 100). I had to default to 1 which
is a pity. Once you’ve got the factors, detection of missing values can be a simple
out-of-range test.
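
One way such range detection could look, as a sketch only. The thresholds, the function names and the decision to judge from latitudes alone are assumptions, not the code that went into LoadCTS. Note that Python's % operator differs from Fortran's mod for negative numbers, so math.fmod is used to keep the sign of the dividend.

```python
import math


def is_missing_code(value):
    """Fortran-style mod(value, 100) == -99, e.g. -999, -1999, -9999, -99999."""
    return int(math.fmod(value, 100)) == -99


def sense_factor(raw_lats):
    """Guess the header divisor (1, 10 or 100) from the largest real latitude."""
    real = [abs(v) for v in raw_lats if not is_missing_code(v)]
    if not real:
        return 1                # nothing to go on: fall back to 1
    peak = max(real)
    if peak > 900:
        return 100              # e.g. 7093 -> 70.93 degrees
    if peak > 90:
        return 10               # e.g. 709 -> 70.9 degrees
    return 1


print(sense_factor([709, 465, -999]))       # temperature-style headers
print(sense_factor([7093, 6100, -9999]))    # precip-style headers
```

The weakness is the one already worried about above: a spatial subset whose latitudes never exceed 9 degrees would be sensed with the wrong factor.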

However *sigh* this led me to examine the detection of ‘non-standard longitudes’ – a
small section of code that converts PJ-style reversed longitudes, or 0-360 ones, to
regular -180 (W) to +180 (E). This code is switched on by the presence of the
‘LongType’ flag in the LoadCTS call – the trouble is, THAT FLAG IS NEVER SET BY
ANOMDTB. There is a declaration ‘integer :: QLongType’ but that is never referred to
again. Just another thing I cannot understand, and another reason why this should all
have been rewritten from scratch a year ago!

So, I wrote ‘revlons.for’ – a proglet to reverse all longitude values in a database
file. Ran it on the temperature database (final):

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/testmergedb] ./revlons
REVLONS – Reverse All Longitudes!

This nifty little proglet will fix all of your
longitudes so that they point the right way, ie,
positive = East of Greenwich, negative = West.

..of course, if they are already fixed, this will
UNfix them. I am not that smart! So be careful!!

Please enter the database to be fixed: tmp.0704300053.dtb

Output file will be: tmp.0705101334.dtb
Confirm this filename (Y/N): Y

Log file will be:    tmp.0705101334.log

5065 stations written to tmp.0705101334.dtb

<END QUOTE>

Thus the ‘final’ temperature database is now tmp.0705101334.dtb.
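
For illustration, the core of a longitude reversal on one header line might look like this in Python. It is a sketch, not revlons.for: the column positions are assumed from the blank positions listed later in this log (8, 14, 21, ...), which would put longitude in columns 15-20, and the treatment of missing codes is a guess that the caller has to confirm.

```python
LON_START, LON_END = 14, 20   # 0-based slice for columns 15-20 (assumed layout)


def reverse_longitude(header, missing_codes=(-9999,)):
    """Return `header` with the sign of its longitude field flipped.

    Values listed in `missing_codes` are left alone; which codes really
    occur in a given database is an assumption supplied by the caller.
    """
    lon = int(header[LON_START:LON_END])
    if lon in missing_codes:
        return header
    return header[:LON_START] + "%6d" % (-lon) + header[LON_END:]
```

As the program banner warns, the operation is its own inverse: running it twice on the same file puts every longitude back where it started.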

Re-ran anomdtb – with working lat/lon detection and missing lat/lon value
detection – for both precip and temperature. This should ensure that all
WMO codes are present and all lats and lons are correct.

Temp:
<BEGIN QUOTE>
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.tmp
> Select the .cts or .dtb file to load:
tmp.0705101334.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
tmp.txt
> Select the first,last years AD to save:
1901,2006
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb    3323823    81.3
>         .cts      91963     2.2    3415786    83.5
> PROCESS        DECISION percent %of-chk
> no lat/lon         1993     0.0     0.0
> no normal        673044    16.5    16.5
> out-of-range        744     0.0     0.0
> accepted        3415042    83.5
> Dumping years 1901-2006 to .txt files…
<END QUOTE>

Precip:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.pre
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
pre.0612181221.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
4
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
pre.txt
> Select the first,last years AD to save:
1901,2006
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
>         .cts     299359     3.0    7613600    76.8
> PROCESS        DECISION percent %of-chk
> no lat/lon        17911     0.2     0.2
> no normal       2355275    23.8    23.8
> out-of-range      13253     0.1     0.2
> accepted        7521013    75.9
> Dumping years 1901-2006 to .txt files…
<END QUOTE>

Note that the precip accepted figure is up to 75.9%. I honestly don’t
think we’ll get higher.
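
Both runs quoted above report 'Data required for a normal: 23' for a 1961-1990 normals period with 25% missing permitted. One reading consistent with that figure, assuming the requirement is rounded up to a whole number of years (the rounding rule is a guess, not taken from the source):

```python
import math


def years_required(first_year, last_year, missing_percent):
    """Minimum number of years with data needed to form a normal."""
    n_years = last_year - first_year + 1
    return math.ceil(n_years * (1 - missing_percent / 100))


print(years_required(1961, 1990, 25))   # 23
```

Thirty years at 75% gives 22.5, so 23 is the smallest whole number of years that satisfies the limit.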

Decided to process temperature all the way. Ran IDL:

IDL> quick_interp_tdm2,1901,2006,'tmpglo/tmpgrid.',1200,gs=0.5,dumpglo='dumpglo',pts_prefix='tmp0km0705101334txt/tmp.'

then glo2abs, then mergegrids, to produce monthly output grids. It apparently worked:

-rw-------   1 f098     cru      138964083 May 13 20:42 cru_ts_3_00.1901.2006.tmp.dat.gz
-rw-------   1 f098     cru        7852589 May 13 20:42 cru_ts_3_00.2001.2006.tmp.dat.gz
-rw-------   1 f098     cru       13108065 May 13 20:39 cru_ts_3_00.1991.2000.tmp.dat.gz
-rw-------   1 f098     cru       13106515 May 13 20:36 cru_ts_3_00.1981.1990.tmp.dat.gz
-rw-------   1 f098     cru       13106963 May 13 20:33 cru_ts_3_00.1971.1980.tmp.dat.gz
-rw-------   1 f098     cru       13123939 May 13 20:30 cru_ts_3_00.1961.1970.tmp.dat.gz
-rw-------   1 f098     cru       13120586 May 13 20:26 cru_ts_3_00.1951.1960.tmp.dat.gz
-rw-------   1 f098     cru       13120691 May 13 20:23 cru_ts_3_00.1941.1950.tmp.dat.gz
-rw-------   1 f098     cru       13130077 May 13 20:20 cru_ts_3_00.1931.1940.tmp.dat.gz
-rw-------   1 f098     cru       13104881 May 13 20:16 cru_ts_3_00.1921.1930.tmp.dat.gz
-rw-------   1 f098     cru       13094948 May 13 20:13 cru_ts_3_00.1911.1920.tmp.dat.gz
-rw-------   1 f098     cru       13085509 May 13 17:08 cru_ts_3_00.1901.1910.tmp.dat.gz

As a reminder, these output grids are based on the tmp.0705101334.dtb database, with no
merging of neighbourly stations and a limit of 3 standard deviations on anomalies.

Decided to (re-) process precip all the way, in the hope that I was in the zone or
something. Started with IDL:

IDL> quick_interp_tdm2,1901,2006,'preglo/pregrid.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='pre0km0612181221txt/pre.'

Then glo2abs, then mergegrids.. all went fine, apparently.

31. And so.. to DTR! First time for generation I think.

Wrote ‘makedtr.for’ to tackle the thorny problem of the tmin and tmax databases not
being kept in step. Sounds familiar, if worrying. Am I the first person to attempt
to get the CRU databases in working order?!! The program pulls no punches. I had
already found that tmx.0702091313.dtb had seven more stations than tmn.0702091313.dtb,
but that hadn’t prepared me for the grisly truth:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/dtr] ./makedtr

MAKEDTR – Produce a DTR database

This program takes as its input a database of
of minimum temperatures and another of maximum
temperatures, and produces a database of diurnal
temperatures. If the input databases are found
to be out of synchronisation, the option is also
given to save synchronised versions.

So, may I please have the tmin database? tmn.0702091139.dtb

May I please now have the tmax database? tmx.0702091313.dtb

The output database will now be called:  dtr.0705152339.dtb

IMPORTANT: PLEASE READ! (it’s good for you)

The databases you gave are NOT synchronised!

tmn.0702091139.dtb has   42 ‘extra’ stations

tmx.0702091313.dtb has   49 ‘extra’ stations
You have the choice of quitting, or of allowing
me to create two new synchronised databases,
which will be saved and used to create the dtr db

Enter Q to Quit, S to Synchronise: S
New tmin database is: tmn.0705152339.dtb
Discarded tmin stations here: tmn.0702091139.dtb.del
New tmax database is: tmx.0705152339.dtb
Discarded tmax stations here: tmx.0702091313.dtb.del
Number of stations to process:    14267
<END QUOTE>

Yes, the difference is a lot more than seven! And the program helpfully dumps a listing
of the surplus stations to the log file. Not a pretty sight.

Unfortunately, it hadn’t worked either. It turns out that there are 3518 stations in
each database with a WMO Code of ‘      0’. So, as the makedtr program indexes on the
WMO Code.. you get the picture. *cries*

Rewrote as makedtr2, which uses the first 20 characters of the header to match:

<BEGIN QUOTE>
MAKEDTR2 – Produce a DTR database

This program takes as its input a database of
of minimum temperatures and another of maximum
temperatures, and produces a database of diurnal
temperatures. If the input databases are found
to be out of synchronisation, the option is also
given to save synchronised versions.

So, may I please have the tmin database? tmn.0702091139.dtb

May I please now have the tmax database? tmx.0702091313.dtb

The output database will now be called:  dtr.0705162028.dtb

IMPORTANT: PLEASE READ! (it’s good for you)

The databases you gave are NOT synchronised!

tmn.0702091139.dtb has  203 ‘extra’ stations

tmx.0702091313.dtb has  209 ‘extra’ stations
You have the choice of quitting, or of allowing
me to create two new synchronised databases,
which will be saved and used to create the dtr db

Enter Q to Quit, S to Synchronise: S
New tmin database is: tmn.0705162028.dtb
Discarded tmin stations here: tmn.0702091139.dtb.del
New tmax database is: tmx.0705162028.dtb
Discarded tmax stations here: tmx.0702091313.dtb.del
<END QUOTE>

The big jump in the number of ‘surplus’ stations is because we are no longer automatically
matching stations with WMO=0.
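
A rough sketch of matching on the first 20 header characters, in Python rather than the original Fortran. Going by the blank positions the log expects (8, 14, 21), those 20 characters would hold the WMO Code, latitude and longitude, but that column layout is an assumption, as are the helper names make_header and sync_by_prefix.

```python
def make_header(wmo, lat, lon, alt, name, country, start, end):
    # Column widths are assumed from the expected blank positions (8, 14, 21, 26, ...).
    return (f"{wmo:7d} {lat:5d} {lon:6d} {alt:4d} "
            f"{name:<20} {country:<13} {start:4d} {end:4d}")

def sync_by_prefix(tmin_headers, tmax_headers, width=20):
    """Split two header lists into common keys and 'extra' stations."""
    tmin_keys = {h[:width] for h in tmin_headers}
    tmax_keys = {h[:width] for h in tmax_headers}
    common = tmin_keys & tmax_keys
    extra_tmin = [h for h in tmin_headers if h[:width] not in common]
    extra_tmax = [h for h in tmax_headers if h[:width] not in common]
    return common, extra_tmin, extra_tmax

tmin = [
    make_header(0, -3800, 14450, 11, "AVALON AIRPORT", "AUSTRALIA", 2000, 2006),
    make_header(0, -2810, 11780, 407, "MOUNT MAGNET AERO", "AUSTRALIA", 2000, 2006),
]
tmax = [
    make_header(0, -3800, 14450, 8, "AVALON AIRPORT", "AUSTRALIA", 2000, 2006),
    make_header(0, -2810, 11790, 407, "MOUNT MAGNET AERO", "AUSTRALIA", 2000, 2006),
]

common, extra_tmin, extra_tmax = sync_by_prefix(tmin, tmax)
```

Under this key a zero-coded station only pairs up when latitude and longitude also agree, which is consistent with the jump in 'extra' stations: a 0.1 degree longitude difference is enough to push a station into both discard lists.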

Here’s what happened to the tmin and tmax databases, and the new dtr database:

Old tmin: tmn.0702091139.dtb      Total Records Read:    14309
New tmin: tmn.0705162028.dtb      Total Records Read:    14106
Del tmin: tmn.0702091139.dtb.del  Total Records Read:      203

Old tmax: tmx.0702091313.dtb      Total Records Read:    14315
New tmax: tmx.0705162028.dtb      Total Records Read:    14106
Del tmax: tmx.0702091313.dtb.del  Total Records Read:      209

New dtr:  dtr.0705162028.dtb      Total Records Read:    14107

*sigh* – one record out! Also three header problems:

BLANKS (expected at 8,14,21,26,47,61,66,71,78)
position   missed
       8        1
      14        1
      21        0
      26        0
      47        1
      61        0
      66        0
      71        0
      78        0

Why?!! Well the sad answer is.. because we’ve got a date wrong. All three ‘header’ problems
relate to this line:

6190   94   95   98  100  101  101  102  103  102   97   94   94

..and as we know, this is not a conventional header. Oh bum. But, but.. how? I know we do
muck around with the header and start/end years, but still..

Wrote filtertmm.for, which simply steps through one database (usually tmin) and
looks for a ‘perfect’ match in another database (usually tmax). ‘Perfect’ here
means a match of WMO Code, Lat, Lon, Start-Year and End-Year. If a match is
found, both stations are copied to new databases:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/dtr] ./filtertmm

FILTERTMM – Create GOOD tmin/max databases

Please enter the tmin database: tmn.0702091139.dtb

Please enter the tmax database: tmx.0702091313.dtb

working..

Old tmin database: tmn.0702091139.dtb had   14309 stations
New tmin database: tmn.0705182204.dtb has   13016 stations
Old tmax database: tmx.0702091313.dtb had   14315 stations
New tmax database: tmx.0705182204.dtb has   13016 stations
<END QUOTE>
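
The 'perfect' match rule described for filtertmm can be sketched as below. This is Python, not the original Fortran; the Header type and the function names are hypothetical, and the headers are assumed to be parsed already. Note that altitude and name take no part in the comparison.

```python
from typing import NamedTuple

class Header(NamedTuple):
    wmo: int
    lat: int    # hundredths of a degree, as stored in the headers
    lon: int
    alt: int
    name: str
    start: int
    end: int

def match_key(h):
    # The five fields named in the log: WMO Code, Lat, Lon, Start-Year, End-Year.
    return (h.wmo, h.lat, h.lon, h.start, h.end)

def filter_pairs(tmin, tmax):
    """Return (tmin, tmax) pairs whose keys agree exactly.

    If several tmax stations share a key, the last one wins in this sketch.
    """
    tmax_by_key = {match_key(h): h for h in tmax}
    pairs = []
    for h in tmin:
        other = tmax_by_key.get(match_key(h))
        if other is not None:
            pairs.append((h, other))
    return pairs

tmin = [
    Header(0, -4230, 14650, 585, "TARRALEAH VILLAGE", 2000, 2006),
    Header(0, -2810, 11780, 407, "MOUNT MAGNET AERO", 2000, 2006),
]
tmax = [
    Header(0, -4230, 14650, 595, "TARRALEAH CHALET", 2000, 2006),
    Header(0, -2810, 11790, 407, "MOUNT MAGNET AERO", 2000, 2006),
]

pairs = filter_pairs(tmin, tmax)
```

In this sample the pair with different names and altitudes is kept, while the pair with identical names is dropped because the longitudes differ by 0.1 degree.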

I am going to *assume* that worked! So now.. to incorporate the Australian
monthly data packs. Ow. Most future-proof strategy is probably to write a
converter that takes one or more of the packs and creates CRU-format databases
of them. Edit: nope, thought some more and the *best* strategy is a program
that takes *pairs* of Aus packs and updates the actual databases. Bearing in
mind that these are trusted updates and won’t be used in any other context.

From Dave L – who incorporated the initial Australian dump – for the tmin/tmax bulletins,
he used a threshold of 26 days/month or greater for inclusion.

Obtained two files from Dave – an email that explains some of the Australian
bulletin data/formatting, and a list of Australian headers matched with their
internal codes (the latter being generated by Dave).

Actually.. although I was going to assume that filtertmm had done the synching job OK, a
brief look at the Australian stations in the databases showed me otherwise. For instance,
I pulled all the headers with ‘AUSTRALIA’ out of the two 0705182204 databases. Now because
these were produced by filtertmm, we know that the codes (if present), lats, lons and dates
will all match. Any differences will be in altitude and/or name. And so they were:

crua6[/cru/cruts/version_3_0/db/dtr] diff tmn.0705182204.dtb.oz tmx.0705182204.dtb.oz | wc -l
336

..so roughly 100 don’t match. They are mostly altitude discrepancies, though there are an
alarming number of name mismatches too. Examples of both:

74c74
<       0 -3800  14450   11 AVALON AIRPORT           AUSTRALIA 2000 2006   -999  -999.00

>       0 -3800  14450    8 AVALON AIRPORT           AUSTRALIA 2000 2006   -999  -999.00

16c16
<       0 -4230  14650  585 TARRALEAH VILLAGE        AUSTRALIA 2000 2006   -999  -999.00

>       0 -4230  14650  595 TARRALEAH CHALET         AUSTRALIA 2000 2006   -999  -999.00

Examples of the second kind (name mismatch) are most concerning as they may well be
different stations. Looked for all occurrences in all tmin/tmax databases:

crua6[/cru/cruts/version_3_0/db/dtr] grep 'TARRALEAH' *dtb
tmn.0702091139.dtb:      0 -4230  14650  585 TARRALEAH VILLAGE        AUSTRALIA 2000 2006   -999  -999.00
tmn.0702091139.dtb:9597000 -4230  14645  595 TARRALEAH CHALET     AUSTRALIA     1991 2000   -999  -999.00
tmn.0705182204.dtb:      0 -4230  14650  585 TARRALEAH VILLAGE        AUSTRALIA 2000 2006   -999  -999.00
tmn.0705182204.dtb:9597000 -4230  14645  595 TARRALEAH CHALET     AUSTRALIA     1991 2000   -999  -999.00
tmx.0702091313.dtb:      0 -4230  14650  595 TARRALEAH CHALET         AUSTRALIA 2000 2006   -999  -999.00
tmx.0702091313.dtb:9597000 -4230  14645  595 TARRALEAH CHALET     AUSTRALIA     1991 2000   -999  -999.00
tmx.0705182204.dtb:      0 -4230  14650  595 TARRALEAH CHALET         AUSTRALIA 2000 2006   -999  -999.00
tmx.0705182204.dtb:9597000 -4230  14645  595 TARRALEAH CHALET     AUSTRALIA     1991 2000   -999  -999.00

This takes a little sorting out. Well first, recognise that we are dealing with four files: tmin
and tmax, early and late (before and after filtertmm.for). We see there are two TARRALEAH entries
in each of the four files. We see that ‘TARRALEAH VILLAGE’ only appears in the tmin file. We see,
most importantly perhaps, that they are temporally contiguous – that is, each pair could join with
minimal overlap, as one is 1991-2000 and the other 2000-2006. Also, we note that the ‘early’ one
of each pair has a slightly different longitude and altitude (the former being the thing that
distinguished the stations in filtertmm.for).

Finally, this, from the tmax.2005120120051231.txt bulletin:

95018, 051201051231, -42.30, 146.45,    18.0,        00,   31,  31,   585,    TARRALEAH VILLAGE

So we can resolve this case – a single station called TARRALEAH VILLAGE, running from 1991 to 2006.

But what about the others?! There are close to 1000 incoming stations in the bulletins, must
every one be identified in this way?!! Oh God. There’s nothing for it – I’ll have to write a prog
to find matches for the incoming Australian bulletin stations in the main databases. I’ll have to
use the databases from before the filtertmm application, so *0705182204.dtb. And it will only
need the Australian headers, so I used grep to create *0705182204.dtb.auhead files. The other
input is the list of stations taken from the monthly bulletins. Now these have a different number
of stations each month, so the prog will build an array of all possible stations based on the
files we have. Oh boy. And the program shall be called, ‘auminmaxmatch.for’.

Assembled some information:

crua6[/cru/cruts/version_3_0/db] wc -l *auhead
1518 glseries_tmn_final_merged.auhead
1518 tmn.0611301516.dat.auhead
1518 tmn.0612081255.dat.auhead
1518 tmn.0702091139.dtb.auhead
1518 tmn.0705152339.dtb.auhead
1426 tmn.0705182204.dtb.auhead

(the ‘auhead’ files were created with <grep 'AUSTRALIA'>)

Actually, stopped work on that. Trying to match over 800 ‘bulletin’ stations against over 3,000
database stations *in two unsynchronised files* was just hurting my brain. The files have to be
properly synchronised first, with a more lenient and interactive version of filtertmm. Or…
could I use mergedb?! Pretend to merge tmin into tmax and see what pairings it managed? No
roll through obviously. Well it’s worth a play.

..unfortunately, not. Because when I tried, I got a lot of odd errors followed by a crash. The
reason, I eventually deduced, was that I didn’t build mergedb with the idea that WMO codes might
be zero (many of the Australian stations have wmo=0). This means that primary matching on WMO
code is impossible. This just gets worse and worse: now it looks as though I’ll have to find WMO
Codes (or pseudo-codes) for the *3521* stations in the tmin file that don’t have one!!!

OK.. let’s break the problem down. Firstly, a lot of stations are going to need WMO codes, if
available. It shouldn’t be too hard to find any matches with the existing WMO coded stations in
the other databases (precip, temperature). Secondly, we need to exclude stations that aren’t
synchronised between the two databases (tmin/tmax). So can mergedb be modified to treat WMO codes
of 0 as ‘missing’? Had a look, and it does check that the code isn’t -999 OR 0.. but not when
preallocating flags in subroutine ‘countscnd’. Fixed that and tried running it again.. exactly
the same result (crash). I can’t see anything odd about the station it crashes on:

0 -2810  11790  407 MOUNT MAGNET AERO        AUSTRALIA 2000 2006   -999  -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2000  339  344  280  252  214  202  189  196  262  291  316  377
2001  371  311  310  300  235  212  201  217  249  262  314  333
2002-9999-9999  339  297  258  209  205  212  246  299  341  358
2003  365  367  336  296  249  195  193  200  238  287  325  368
2004  395  374  321  284  219  214  173  188  239  309  305  370
2005  389  396  358  315  251  182  189  201  233  267  332  341
2006  366  331  314  246  240-9999-9999-9999-9999-9999-9999-9999

.. it’s very similar to preceding (and following) stations, and the station before has even
less real data (the one before that has none at all and is auto-deleted). The nature of the
crash is ‘forrtl: error (65): floating invalid’ – so a type mismatch possibly. The station has
a match in the tmin database (tmn.0702091139.dtb) but the longitude is different:

tmn.0702091139.dtb:
0 -2810  11780  407 MOUNT MAGNET AERO        AUSTRALIA 2000 2006   -999  -999.00
tmx.0702091313.dtb:
0 -2810  11790  407 MOUNT MAGNET AERO        AUSTRALIA 2000 2006   -999  -999.00

It also appears in the tmin/tmax bulletins, eg:
7600, 070401070430, -28.12, 117.84,    16.0,        00,   30,  30,   407,    MOUNT MAGNET AERO

Note that the altitude matches (as distinct from the station below).

Naturally, there is a further ‘MOUNT MAGNET’ station, but it’s probably distinct:

tmn.0702091139.dtb:
9442800 -2807  11785  427 MOUNT MAGNET (MOUNT  AUSTRALIA     1956 1992   -999  -999.00
tmx.0702091313.dtb:
9442800 -2807  11785  427 MOUNT MAGNET (MOUNT  AUSTRALIA     1957 1992   -999  -999.00

I am at a bit of a loss. It will take a very long time to resolve each of these ‘rogue’
stations. Time I do not have. The only pragmatic thing to do is to dump any stations that are
too recent to have normals. They will not, after all, be contributing to the output. So I
knocked out ‘goodnorm.for’, which simply uses the presence of a valid normals line to sort.
The results were pretty scary:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/dtr] ./goodnorm

GOODNORM: Extract stations with non-missing normals

Please enter the input database name: tmn.0702091139.dtb
The output database will be called:   tmn.0705281724.dtb

(removed stations will be placed in: tmn.0705281724.del)

FINISHED.

Stations retained:   5026
Stations removed:    9283

crua6[/cru/cruts/version_3_0/db/dtr] ./goodnorm

GOODNORM: Extract stations with non-missing normals

Please enter the input database name: tmx.0702091313.dtb
The output database will be called:   tmx.0705281724.dtb

(removed stations will be placed in: tmx.0705281724.del)

FINISHED.

Stations retained:   4997
Stations removed:    9318

<END QUOTE>

Essentially, two thirds of the stations have no normals! Of course, this still leaves us with
a lot more stations than we had for tmean (goodnorm reported 3316 saved, 1749 deleted) though
still far behind precipitation (goodnorm reported 7910 saved, 8027 deleted).
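
A minimal sketch of the goodnorm test, in Python rather than Fortran. It assumes the normals line is a 4-character '6190' label followed by twelve 5-character integer fields with -9999 for missing, as in the station listings in this log, and it assumes 'valid' means no month is missing (the log does not spell this out). The function names are hypothetical.

```python
MISSING = -9999

def parse_values(line):
    """Read twelve 5-character integer fields after a 4-character label."""
    return [int(line[4 + 5 * i: 9 + 5 * i]) for i in range(12)]

def has_valid_normals(normals_line):
    # Assumption: a normals line is valid only if every month is present.
    return all(v != MISSING for v in parse_values(normals_line))

# Build sample lines with explicit widths to avoid whitespace mistakes.
good = "6190" + "".join(
    f"{v:5d}" for v in [94, 95, 98, 100, 101, 101, 102, 103, 102, 97, 94, 94]
)
bad = "6190" + "-9999" * 12

print(has_valid_normals(good), has_valid_normals(bad))
```

A station whose record only starts in 2000 cannot have 1961-1990 normals, so it fails this test regardless of how good its recent data are.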

I suspect the high percentage lost reflects the influx of modern Australian data. Indeed, nearly
3,000 of the 3,500-odd stations with missing WMO codes were excluded by this operation. This means
that, for tmn.0702091139.dtb, 1240 Australian stations were lost, leaving only 278.

This is just silly. I can’t dump these stations, they are needed to potentially match with the
bulletin stations. I am now going to try the following:

1. Attempt to pair bulletin stations with existing stations in the tmin database. Mark pairings in the
database headers and in a new ‘Australian Mappings’ file. Program auminmatch.for.

2. Run an enhanced filtertmm to synchronise the tmin and tmax databases, but prioritising the
‘paired’ stations from step 1 (so they are not lost). Mark the same pairings in the tmax
headers too, and update the ‘Australian Mappings’ file.

3. Add the bulletins to the databases.

OK.. step 1. Modified auminmaxmatch.for to produce auminmatch.for. Hit a semi-philosophical
problem: what to do with a positive match between a bulletin station and a zero-wmo database
station? The station must have a real WMO code or it’ll be rather hard to describe the match!

Got a list of around 12,000 wmo codes and stations from Dave L; unfortunately there was a problem
with its formatting that I just couldn’t resolve.

So.. current thinking is that, if I find a pairing between a bulletin station and a zero-coded
Australian station in the CRU database, I’ll give the CRU database station the Australian local
(bulletin) code twice: once at the end of the header, and once as the WMO code *multiplied by -1*
to avoid implying that it’s legitimate. Then if a ‘proper’ code is found or allocated later, the
mapping to the bulletin code will still be there at the end of the header. Of course, an initial
check will ensure that a match can’t be found, within the CRU database, between the zero-coded
station and a properly-coded one.
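
The tagging scheme can be sketched as follows (Python, with a hypothetical dictionary layout for a station header; the field name local_code stands in for the extra field at the end of the header).

```python
def tag_with_bulletin_code(station, bulletin_code):
    """Return a copy of a zero-coded station carrying the bulletin code twice."""
    if station["wmo"] != 0:
        raise ValueError("station already has a WMO code")
    if bulletin_code <= 0:
        raise ValueError("bulletin code must be positive")
    tagged = dict(station)
    tagged["wmo"] = -bulletin_code         # negative marks it as a pseudo-code
    tagged["local_code"] = bulletin_code   # survives if a real WMO code is assigned later
    return tagged

def is_pseudo_code(station):
    # Caveat: -999 is elsewhere treated as 'missing', so a bulletin code
    # of 999 would be ambiguous under this scheme.
    return station["wmo"] < 0

tagged = tag_with_bulletin_code({"wmo": 0, "name": "KURI BAY"}, 1009)
print(tagged["wmo"], tagged["local_code"])
```

Keeping the positive code in its own field means the mapping back to the bulletin is not lost when the negative placeholder is later replaced.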

Debated header formats with David. I think we’re going to go with (i8,a8) at the end of the header,
though really it’s (2x,i6,a8) as I remember the Anders code being i2 and the real start year being
i4 (both from the tmean database). This will mean post-processing existing databases of course,
but that’s not a priority.

A brief (hopefully) diversion to get station counts sorted. David needs them so might as well sort
the procedure. In the upside-down world of Mark and Tim, the numbers of stations contributing to
each cell during the gridding operation are calculated not in the IDL gridding program – oh, no! -
but in anomdtb! Yes, the program which reads station data and writes station data has a second,
almost-entirely unrelated function of assessing gridcell contributions. So, to begin with it runs
in the usual way:

crua6[/cru/cruts/version_3_0/primaries/precip] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.pre
> Will calculate percentage anomalies.
> Select the .cts or .dtb file to load:
pre.0612181221.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
4

But then, we choose a different output, and it all shifts focus and has to ask all the IDL
questions!!

> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
4
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the .stn file to save:
pre.stn
> Enter the correlation decay distance:
450
> Submit a grim that contains the appropriate grid.
> Enter the grim filepath:
clim.6190.lan.pre

> Grid dimensions and domain size:      720     360   67420
> Select the first,last years AD to save:
1901,2006
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb    7315040    73.8
>         .cts     299359     3.0    7613600    76.8
> PROCESS        DECISION percent %of-chk
> no lat/lon        17911     0.2     0.2
> no normal       2355275    23.8    23.8
> out-of-range      13253     0.1     0.2
> accepted        7521013    75.9
> Calculating station coverages…

And then.. it unhelpfully crashes:

> ##### WithinRange: Alloc: DataB #####
forrtl: severe (174): SIGSEGV, segmentation fault occurred

Ho hum. I did try this last year which is why I’m not tearing my hair out. The plan is to use the
outputs from the regular anomdtb runs – ie, the monthly files of valid stations. After all we need
to know the station counts on a per month basis. We can use the lat and lon, along with the
correlation decay distance.. shouldn’t be too awful. Just even more programming and work. So before
I commit to that, a quick look at the IDL gridding prog to see if it can dump the figures instead:
after all, this is where the actual ‘station count’ information is assembled and used!!

..well that was, erhhh.. ‘interesting’. The IDL gridding program calculates whether or not a
station contributes to a cell, using.. graphics. Yes, it plots the station sphere of influence then
checks for the colour white in the output. So there is no guarantee that the station number files,
which are produced *independently* by anomdtb, will reflect what actually happened!!

Well I’ve just spent 24 hours trying to get Great Circle Distance calculations working in Fortran,
with precisely no success. I’ve tried the simple method (as used in Tim O’s geodist.pro) and the
more complex and accurate method found elsewhere (wiki and other places). Neither gives me results
that are anything near reality. FFS.
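
For reference, the textbook haversine form of the great-circle distance looks like this in Python. It is not code from this log, and the mean Earth radius of 6371 km is an assumed value. Inputs are in degrees and are converted to radians before any trigonometry.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    p1 = math.radians(lat1)
    p2 = math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

one_degree = great_circle_km(0.0, 0.0, 1.0, 0.0)
pole_to_pole = great_circle_km(90.0, 0.0, -90.0, 0.0)
```

Two easy sanity checks are one degree of latitude (about 111.2 km) and pole to pole (half the circumference, about 20015 km).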

Worked out an algorithm from scratch. It seems to give better answers than the others, so we’ll go
with that.
Also decided that the approach I was taking (pick a gridline of latitude and reverse-
engineer the GCD algorithm so the unknown is the second lon) was overcomplicated, when we don’t
need to know where it hits, just that it does. Since for any cell the nearest point to the station
will be a vertex, we can test candidate cells for the distance from the appropriate vertex to the
station. Program is stncounts.for, but is causing immense problems.

The problem is, really, the huge numbers of cells potentially involved in one station, particularly
at high latitudes. Consider working out the possible bounding box when you’re within cdd of a pole (ie, for
tmean with a cdd of 1200, the N-S extent is over 20 cells (10 degs) in each direction). Maybe not a
serious problem for the current datasets but an example of the complexity. Also, deciding on the
potential bounding box is nontrivial, because of cell ‘width’ changes at high latitudes (at 61 degs
North, the half-degree cells are only 27km wide!). With a precip cdd of 450 km this means the
bounding box is dozens of cells wide – and will be wider at the Northern edge!
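
The figures above can be checked with a few lines of Python. This assumes a spherical Earth of radius 6371 km (about 111.2 km per degree); the function names are made up for the illustration.

```python
import math

KM_PER_DEG = 6371.0 * math.pi / 180.0  # assumed spherical Earth
CELL_DEG = 0.5

def cell_width_km(lat_deg):
    """East-west width of a half-degree cell at the given latitude."""
    return CELL_DEG * KM_PER_DEG * math.cos(math.radians(lat_deg))

def half_extent_cells(cdd_km, lat_deg):
    """Cells needed on each side of a station: (north-south, east-west).

    The east-west figure uses the width at the station's own latitude, so
    it underestimates the box on the poleward edge and fails at the poles.
    """
    ns = math.ceil(cdd_km / (CELL_DEG * KM_PER_DEG))
    ew = math.ceil(cdd_km / cell_width_km(lat_deg))
    return ns, ew

width_61n = cell_width_km(61.0)
precip_box = half_extent_cells(450, 61.0)
tmean_box = half_extent_cells(1200, 0.0)
```

At 61N the cell width comes out near 27 km, so a 450 km cdd spans 17 cells on each side east-west (roughly 34 across), and a 1200 km cdd spans 22 cells north-south in each direction.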

Clearly a large number of cells are being marked as covered by each station. So in densely-stationed
areas there will be considerable smoothing, and in sparsely-stationed (or empty) areas, there will be
possibly untypical data. I might suggest two station counts – one of actual stations contributing from
within the cell, one for stations contributing from within the cdd. The former being a subset of the
latter, so the latter could be used as the previous release was used.
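
A sketch of the two counts for a single cell, in Python. It measures distance to the cell centre for simplicity, which is an assumption and differs from the vertex test described earlier; the function names and the sample stations are invented.

```python
import math

def gcd_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Haversine distance in km (assumed spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

def station_counts(cell_lat, cell_lon, stations, cdd_km, cell_deg=0.5):
    """Return (direct, within_cdd) for the cell whose SW corner is given."""
    centre = (cell_lat + cell_deg / 2, cell_lon + cell_deg / 2)
    direct = within = 0
    for lat, lon in stations:
        inside = (cell_lat <= lat < cell_lat + cell_deg
                  and cell_lon <= lon < cell_lon + cell_deg)
        if inside:
            direct += 1
        # Every direct station also counts towards the cdd total.
        if inside or gcd_km(lat, lon, *centre) <= cdd_km:
            within += 1
    return direct, within

stations = [(61.0, 10.6), (62.0, 10.6), (10.0, 10.0)]
direct, within = station_counts(61.0, 10.5, stations, cdd_km=450)
```

Here one station sits inside the cell, a second lies well within 450 km, and the third is thousands of km away, giving a direct count of 1 and a cdd count of 2.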

Well, got stncounts.for working, finally. And, out of malicious interest, I dumped the first station’s
coverage to a text file and counted up how many cells it ‘influenced’. The station was at 10.6E, 61.0N.
The total number of cells covered was a staggering 476! Or, if you prefer, 475 indirect and one direct.

Ran for the first month (01/1901). Compared the resulting grid with that from CRU TS 2.1. Seems to
compare fine, some higher, some lower. Example:

2.10:    139   142   146   154   156   157   165   170
3.00:    141   148   154   153   153   159   163   168

(data are on latitude #265 and longitudes #163-170)

Wrote ‘makelsmask.for’ to, well, make a land-sea mask. It’ll work with any gridded
data file that uses -999 for sea. The mask is called ‘lsmask.halfdeg.dat’. Adapted
stncounts.for to read it and use it to mask the output files.
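
The masking idea is simple enough to sketch in a few lines of Python. The grids here are tiny made-up examples, and writing -999 back into masked cells of the output is an assumption about the file convention.

```python
SEA = -999

def make_land_sea_mask(grid):
    """Return 1 for land and 0 for sea, treating -999 as sea."""
    return [[0 if value == SEA else 1 for value in row] for row in grid]

def apply_mask(counts, mask, fill=SEA):
    """Keep counts over land and write the fill value over sea."""
    return [[c if m else fill for c, m in zip(count_row, mask_row)]
            for count_row, mask_row in zip(counts, mask)]

grid = [[141, SEA], [SEA, 148]]
mask = make_land_sea_mask(grid)
masked_counts = apply_mask([[3, 5], [2, 4]], mask)
```

Because the mask is derived from the data file itself, it is only as good as that file's use of -999: a land cell that happens to be missing would be treated as sea.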

Still a bit disturbed by the large number of cells marked as ‘influenced’ by a single station. IDL
seems to use the inbuilt ‘TRIGRID’ function to interpolate the grid, so there’s no way of getting
the station count for a particular cell that way anyway. Not that it would mean much, since there
is bound to be some kind of weighting (it’s not clear what that weighting is, though, from the IDL
website). So the figures in the station count files are really rather loose. What might be useful
as a companion dataset would be the ACTUAL station counts. Counts for cells with stations actually
INSIDE them. Of course, that might be rather sensitive information..

Managed a full run of stncounts. It took over five and a half hours, which is a bit much!

Back to the gridding. I am seriously worried that our flagship gridded data product is produced by
Delaunay triangulation – apparently linear as well. As far as I can see, this renders the station
counts totally meaningless. It also means that we cannot say exactly how the gridded data is arrived
at from a statistical perspective – since we’re using an off-the-shelf product that isn’t documented
sufficiently to say that.
Why this wasn’t coded up in Fortran I don’t know – time pressures perhaps?
Was so much effort expended on homogenisation that there wasn’t enough time to write a gridding
procedure? Of course, it’s too late for me to fix it too. Meh.

Well, it’s been a real day of revelations, never mind the week. This morning I
discovered that proper angular weighted interpolation was coded into the IDL
routine, but that its use was discouraged because it was slow! Aaarrrgghh.

There is even an option to tri-grid at 0.1 degree resolution and then ‘rebin’
to 720×360 – also deprecated! And now, just before midnight (so it counts!),
having gone back to the tmin/tmax work, I’ve found that most if not all of the
Australian bulletin stations have been unceremoniously dumped into the files
without the briefest check for existing stations. A classic example would be
these ‘two’ stations:

0 -1570  12870   31 KIMBERLEY RES.STATIO     AUSTRALIA 2000 2006   -999  -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2000  245  243  243  232  184  143  138  155  193  231  249  249
2001  245  247  241  216  156  167  163  129  201  238  246  247
2002  244  246  230  208  167  122   92  119  202  217  248  259
2003  253  249  222  220  169  151  144  158  203  216  248  250
2004  252  247  244  209  202  135  129  140  176  230  248  257
2005  245  246  237-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

0 -1565  12871   31 KIMBERLEY RES.STATIO AUSTRALIA     1971 2000   -999  -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1971  254  249  239  218  166  147  142  169  214  246  253  241
1972  246  244  226  198  175  158  126  182  200  222  244  259
1973  255  259  252  232  215  186  171  189  216  240  256  246
1974  247  243  240  217  183  144  134  171  216  247  248  246
1975  239  239  237  216  180  157  168  171  223  233  243  246
1976  235  244  227  190  148  142  142  144  177  236  252  250
1977  253  249  245  218  177  135  130  137  187  226  250  248
1978  247  244  239  199  218  174  162  186  195  233  245  253
1979  247  246  238  217  205  166  147  178  216  234  248  254
1980  249  245  240  221  186  161  141  171  192  241  249  252
1981-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1982-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1983-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1984-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1990-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1991  248  244  234  224  169  160  160  140  210  225  252  260
1992  253  251  247  239  206-9999  141  173  218  237  246  260
1993  247-9999  242  225  207  172  149  170  204  237  249  258
1994  253-9999  214  196  171  140  130  141  171  222  248  247
1995  245  249  234  205  186  155  148  151  198  217  245  244
1996  245  238  220  208  159  166  136  161  179  225  233  247
1997  245  243  217  195  186  149  138  156  195  230  242  247
1998  248  250  245  229  188  167  177  158  200  247  253  250
1999  250  245  242  216  144  150  123-9999  188  239  240  251
2000  245  243  243  232  184  143  138  154  194  231  249  249

Now, I admit the lats and lons aren’t spot on. But c’mon, what are the chances
of them being different? The two year 2000s are almost identical. What about:

0 -1550  12450   12 KURI BAY                 AUSTRALIA 2000 2006   -999  -999.00
9420800 -1548  12452   29 KURI BAY             AUSTRALIA     1965 1992   -999  -999.00

Or:

0 -1550  12810   11 WYNDHAM                  AUSTRALIA 2000 2006   -999  -999.00
0 -1550  12820    4 WYNDHAM AERO             AUSTRALIA 2000 2006   -999  -999.00
9421400 -1549  12812   11 WYNDHAM POST OFFICE  AUSTRALIA     1968 2000   -999  -999.00
9421401 -1547  12810   20 WYNDHAM (WYNDHAM POR AUSTRALIA     1898 1966   -999  -999.00

Come On!! This is one station isn’t it.

I’d be content to leave it – but I have to match the bulletins! And I can match
to the long, stable series or to the loose, flapping ones put in for the
purpose! Meh II.

So.. in the end I matched to the 2000-2006 stations, where they actually did match.
Unfortunately the huge bulk of the bulletins still had to have new entries created for
them, which is a shame, and begs the question of why the Australian update bulletins
don’t match the original ‘catch-up’ block they sent us.

For some reason, the auminmatch program is causing no end of grief. I thought I’d
managed a complete run, and it did produce a good-looking tmin database with lots of
new station stubs tacked on the end:

-1009 -6628  11054   12 KURI BAY                 AUSTRALIA 2007 2007    -999    1009
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
-1019 -6628  11054   23 KALUMBURU                AUSTRALIA 2007 2007    -999    1019
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
-1020 -6628  11054   51 TRUSCOTT                 AUSTRALIA 2007 2007    -999    1020
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

However, it doesn’t seem to have put the bulletin codes on the (a8) header field, for
some of the matches only!

Not sure why this is yet.. but have found also that there are cases of duplicated lat/lon pairs,
so multiple matches are being made.. argh.. will have to further augment auminmatch. Not happy.

An interesting aside.. David was looking at the v3.00 precip to help National Geographic with
an enquiry. I produced a second ‘station’ file with the ‘honest’ counts (see above) and he used
that to mask out cells with a 0 count (ie that only had indirect data from ‘nearby’ stations).
There were some odd results.. with certain months having data, and others being missing. After
considerable debate and investigation, it was understood that anomdtb calculates normals on a
monthly basis. So, where there are 7 or 8 missing values in each month (1961-1990), a station
may end up contributing only in certain months of the year, throughout its entire run! This was
noticed in the Seychelles, where only October has real data (the remaining months being relaxed
to the climatology but excluded by David using the ‘tight’ station mask). There is no easy
solution, because essentially it’s an honest result: only October has sufficient values to form
a normal, so only October gets anomalised. It’s an unfortunate coincidence that it’s the only
station in the cell, but it’s not the only one. A ‘solution’ could be for anomdtb to get a bit
more involved in the gridding, to check that if a cell only has one station (for one or more
years) then it’s all-or-nothing. Maybe if only one month has a normal then it’s dumped and the
whole reverts to climatology. Maybe if 4 or more months have normals.. maybe if >0 months have
normals and the rest can be brought in with a minor relaxation of the ’75% rule’.. who knows.
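
The per-month behaviour can be reproduced with a short Python sketch. It assumes the threshold is the 23 values reported by anomdtb for 1961-1990 with 25% missing permitted, obtained here by rounding 22.5 up; the function name and the synthetic station are invented.

```python
import math

MISSING = -9999

def monthly_normals(years, first=1961, last=1990, max_missing_pct=25):
    """years maps year -> 12 monthly values; returns 12 normals (None if too sparse)."""
    n_years = last - first + 1
    # 30 years at 75% gives 22.5, rounded up to 23 (assumed rounding rule).
    required = math.ceil(n_years * (100 - max_missing_pct) / 100)
    normals = []
    for month in range(12):
        vals = [years[y][month] for y in range(first, last + 1)
                if y in years and years[y][month] != MISSING]
        normals.append(sum(vals) / len(vals) if len(vals) >= required else None)
    return normals

# Synthetic station: eight fully missing years, except October 1968.
station = {}
for year in range(1961, 1991):
    row = [100] * 12
    if year < 1969:
        row = [MISSING] * 12
        if year == 1968:
            row[9] = 100  # October has 23 values, every other month 22
    station[year] = row

normals = monthly_normals(station)
months_with_normals = [i + 1 for i, n in enumerate(normals) if n is not None]
```

One extra value in a single month is enough to put that month over the threshold while the other eleven stay below it, so only that month is ever anomalised.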

Back to auminmatch.for, and a (philosophical) breakthrough. Built a loop to find ‘fuzzy’
matches and group them together. The user then processes one group at a time, pairing up
matches until the potential for further matches is zero (or the user decides it is). Uses a
FSM to work out each chain (all db matches for a bulletin, then all bulletins that match
each of those db stations, then.. etc). To understand it, either read the code (especially
the comments) or just look at this mind-boggling example from the first run of it:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
Bulletin stations:    8
1.    9021 -3193  11598   15 PERTH AIRPORT
2.    9225 -3192  11587   25 PERTH METRO
3.    9106 -3205  11598   10 GOSNELLS CITY
4.    9240 -3201  11614  384 BICKLEY
5.    9172 -3210  11588   30 JANDAKOT AERO
6.    9215 -3196  11576   41 SWANBOURNE
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   18
1.       0 -3190  11590   25 PERTH METRO          2000 2006
2.       0 -3190  11600   15 PERTH AIRPORT        2000 2006
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
6.       0 -3200  11580   41 SWANBOURNE           2000 2006
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
11.       0 -3210  11590   30 JANDAKOT AERO        2000 2006
12.       0 -3210  11600   10 GOSNELLS CITY        2000 2006
13.       0 -3200  11610  384 BICKLEY              2000 2006
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 1,2
Bulletin stations:    7
2.    9225 -3192  11587   25 PERTH METRO
3.    9106 -3205  11598   10 GOSNELLS CITY
4.    9240 -3201  11614  384 BICKLEY
5.    9172 -3210  11588   30 JANDAKOT AERO
6.    9215 -3196  11576   41 SWANBOURNE
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   17
1.       0 -3190  11590   25 PERTH METRO          2000 2006
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
6.       0 -3200  11580   41 SWANBOURNE           2000 2006
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
11.       0 -3210  11590   30 JANDAKOT AERO        2000 2006
12.       0 -3210  11600   10 GOSNELLS CITY        2000 2006
13.       0 -3200  11610  384 BICKLEY              2000 2006
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 2,1
Bulletin stations:    6
3.    9106 -3205  11598   10 GOSNELLS CITY
4.    9240 -3201  11614  384 BICKLEY
5.    9172 -3210  11588   30 JANDAKOT AERO
6.    9215 -3196  11576   41 SWANBOURNE
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   16
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
6.       0 -3200  11580   41 SWANBOURNE           2000 2006
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
11.       0 -3210  11590   30 JANDAKOT AERO        2000 2006
12.       0 -3210  11600   10 GOSNELLS CITY        2000 2006
13.       0 -3200  11610  384 BICKLEY              2000 2006
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 3,12
Bulletin stations:    5
4.    9240 -3201  11614  384 BICKLEY
5.    9172 -3210  11588   30 JANDAKOT AERO
6.    9215 -3196  11576   41 SWANBOURNE
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   15
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
6.       0 -3200  11580   41 SWANBOURNE           2000 2006
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
11.       0 -3210  11590   30 JANDAKOT AERO        2000 2006
13.       0 -3200  11610  384 BICKLEY              2000 2006
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 4,13
Bulletin stations:    4
5.    9172 -3210  11588   30 JANDAKOT AERO
6.    9215 -3196  11576   41 SWANBOURNE
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   14
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
6.       0 -3200  11580   41 SWANBOURNE           2000 2006
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
11.       0 -3210  11590   30 JANDAKOT AERO        2000 2006
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 5,11
Bulletin stations:    3
6.    9215 -3196  11576   41 SWANBOURNE
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   13
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
6.       0 -3200  11580   41 SWANBOURNE           2000 2006
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 6,6
Bulletin stations:    2
7.    9194 -3222  11581   14 MEDINA RESEARCH CENT
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   12
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
14.       0 -3220  11580   14 MEDINA RESEARCH CENT 2000 2006
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 7,14
Bulletin stations:    1
8.    9256 -3224  11568    6 GARDEN ISLAND HSF
Database stations:   11
3. 9461000 -3190  11600   20 PERTH AIRPORT COMPAR 1944 2006
4. 9461001 -3190  11600   18 PERTH AIRPORT        1944 2004
5. 9461501 -3198  11607  210 KALAMUNDA (KALAMUNDA 1908 1992
7.       0 -3200  11580   20 SUBIACO TREATMENT PL 2000 2006
8.       0 -3196  11579   20 SUBIACO TREATMENT PL 1991 1999
9. 9460800 -3195  11587   19 PERTH (PERTH REGIONA 1897 1992
10. 9460801 -3195  11587   19 PERTH-(PERTH-REGIONA 1897 1992
15.       0 -3220  11580    4 KWINANA BP REFINERY  2000 2006
16.       0 -3223  11576    4 KWINANA BP REFINERY  1961 2000
17. 9560800 -3222  11581   14 MEDINA RESEARCH CENT 1991 2000
18.       0 -3220  11570    6 GARDEN ISLAND HSF    2000 2006

Enter a matching pair, (bulletin,database) or ‘n’ to end: 8,18
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Amazing, huh? Most are actually more like this:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
Bulletin stations:    1
1.    9053 -3167  11602   40 PEARCE RAAF
Database stations:    2
1.       0 -3170  11600   40 PEARCE RAAF          2000 2006
2. 9461200 -3167  11602   49 BULLSBROOK (PEARCE A 1940 1992

Enter a matching pair, (bulletin,database) or ‘n’ to end: 1,1
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

However.. still teething troubles, with previously-paired stations reappearing
for a second chance sometimes! So more debugging.. fixed. Also added a test
before the user gets a chain, to anticipate what the user (er, I) would do. For
instance, I generally match to a 2000-2006, WMO=0 database station if the names
match, as they’re the ones David L put in from the Aus update files. I, er, the
user then gets ambiguities and nearby but unconnected stations. Fine, until you
get a nasty surprise like this one:

User Match Decision(s) Please!
Bulletin stations:    2
1.   58009 -2864  15364   95 BYRON BAY (CAPE BYRO
2.   58216 -2864  15364   95 BYRON BAY (CAPE BYRO
Database stations:    3
1.       0 -2860  15360   95 BYRON BAY (CAPE BYRO 2000 2006
2.       0 -2860  15360   95 BYRON BAY (CAPE BYRO 2000 2006
3. 9459500 -2863  15363   98 CAPE BYRON           1974 1992

Looking in the files I see that Bulletin 58009 is ‘BYRON BAY (CAPE BYRON LIGHTHOUSE)’,
and 58216 is ‘BYRON BAY (CAPE BYRON AWS)’. But the database stubs that have been
entered have not been intelligently named, just truncated – so I have no way of
knowing which is which! CRU NEEDS A DATA MANAGER. In this case I had to assume that
the updates were processed in .au code order, so 1-1 and 2-2. Argh. A few doubles
found, too:

Bulletin stations:    1
1.   33106 -2037  14895   59 HAMILTON ISLAND AIRP
Database stations:    3
1.       0 -2040  14900   23 HAMILTON ISLAND AIRP 2000 2006
2.       0 -2040  14900   59 HAMILTON ISLAND AIRP 2000 2006
3. 9436800 -2035  14895   23 HAMILTON ISLAND AIRP 1991 2000

Bulletin stations:    1
1.   90186 -3829  14245   71 WARRNAMBOOL AIRPORT
Database stations:    4
1.       0 -3830  14250   71 WARRNAMBOOL AIRPORT  2000 2006
2.       0 -3830  14240   75 WARRNAMBOOL AIRPORT  2000 2006
3.       0 -3840  14248   21 WARRNAMBOOL (POST OF 1961 1980
4.       0 -3828  14243   76 WARRNAMBOOL A        1983 1999

And the results? Strictly average, I thought.. but I’d forgotten to count the extra
‘anticipated match’ routine achievements! So I grepped the match-by-match file,
matches.0706281447.dat, and got:

crua6[/cru/cruts/version_3_0/db/dtr] grep 'AUTO\:' matches.0706281447.dat | wc -l
232
crua6[/cru/cruts/version_3_0/db/dtr] grep 'AUTO FROM CHAIN' matches.0706281447.dat | wc -l
514
crua6[/cru/cruts/version_3_0/db/dtr] grep 'MANUAL' matches.0706281447.dat | wc -l
12

In other words, all that sweat was worth it – 746 stations matched automatically, and
a further 12 manually! Only (797-758=) 39 bulletins unmatched! Wheeee! And here they are:

-6072 -2303  11504  111 EMU CREEK STATION        AUSTRALIA 2007 2007    -999    6072
-12044 -3355  12070  220 MUNGLINUP WEST           AUSTRALIA 2007 2007    -999   12044
-12241 -2888  12132  370 LEONORA AERO             AUSTRALIA 2007 2007    -999   12241
-17031 -2965  13806   50 MARREE COMPARISON        AUSTRALIA 2007 2007    -999   17031
-21118 -3323  13800   10 PORT PIRIE AERODROME     AUSTRALIA 2007 2007    -999   21118
-22801 -3575  13659  143 CAPE BORDA COMPARISO     AUSTRALIA 2007 2007    -999   22801
-23122 -3451  13868   65 ROSEWORTHY AWS           AUSTRALIA 2007 2007    -999   23122
-24521 -3512  13926   33 MURRAY BRIDGE COMPAR     AUSTRALIA 2007 2007    -999   24521
-25509 -3533  14052   99 LAMEROO COMPARISON       AUSTRALIA 2007 2007    -999   25509
-26026 -3716  13976    3 ROBE COMPARISON          AUSTRALIA 2007 2007    -999   26026
-32004 -1826  14602    5 CARDWELL MARINE PDE      AUSTRALIA 2007 2007    -999   32004
-35019 -2282  14764  260 CLERMONT SIRIUS ST       AUSTRALIA 2007 2007    -999   35019
-48243 -2943  14797  154 LIGHTNING RIDGE VISI     AUSTRALIA 2007 2007    -999   48243
-55024 -3103  15027  307 GUNNEDAH RESOURCE CE     AUSTRALIA 2007 2007    -999   55024
-56037 -3053  15167  987 ARMIDALE (TREE GROUP     AUSTRALIA 2007 2007    -999   56037
-60013 -3218  15251    4 FORSTER – TUNCURRY R     AUSTRALIA 2007 2007    -999   60013
-63039 -3371  15031 1015 KATOOMBA (MURRI ST)      AUSTRALIA 2007 2007    -999   63039
-63226 -3348  15013  900 LITHGOW (COOERWULL)      AUSTRALIA 2007 2007    -999   63226
-68257 -3406  15077  112 CAMPBELLTOWN (MOUNT      AUSTRALIA 2007 2007    -999   68257
-70263 -3475  14970  670 GOULBURN TAFE            AUSTRALIA 2007 2007    -999   70263
-82170 -3655  14600  171 BENALLA AIRPORT          AUSTRALIA 2007 2007    -999   82170
-84150 -3787  14801    4 LAKES ENTRANCE (EAST     AUSTRALIA 2007 2007    -999   84150
-85099 -3863  14581    3 POUND CREEK              AUSTRALIA 2007 2007    -999   85099
-88023 -3723  14591  230 LAKE EILDON              AUSTRALIA 2007 2007    -999   88023
-200001 -2166  15027  209 MIDDLE PERCY ISLAND      AUSTRALIA 2007 2007    -999  200001
-200100 -2066  11558   24 VARANUS ISLAND           AUSTRALIA 2007 2007    -999  200100
-200212 -1061  12598 -999 NORTHERN ENDEAVOUR       AUSTRALIA 2007 2007    -999  200212
-200283 -1629  14997    8 WILLIS ISLAND            AUSTRALIA 2007 2007    -999  200283
-200288 -2904  16794  112 NORFOLK ISLAND AERO      AUSTRALIA 2007 2007    -999  200288
-200731 -1176  13003    7 POINT FAWCETT            AUSTRALIA 2007 2007    -999  200731
-200783 -1772  14845    3 FLINDERS REEF            AUSTRALIA 2007 2007    -999  200783
-200790 -1045  10569  261 CHRISTMAS ISLAND AER     AUSTRALIA 2007 2007    -999  200790
-200824 -1753  21040    2 PAPEETE                  AUSTRALIA 2007 2007    -999  200824
-200838 -3922  14698  116 HOGAN ISLAND             AUSTRALIA 2007 2007    -999  200838
-200851   -52  16692    7 NAURU ARCS-2             AUSTRALIA 2007 2007    -999  200851
-200852  -206  14743    4 MANUS ARCS-1             AUSTRALIA 2007 2007    -999  200852
-300000 -6858   7797   18 DAVIS                    AUSTRALIA 2007 2007    -999  300000
-300001 -6760   6287   10 MAWSON                   AUSTRALIA 2007 2007    -999  300001
-300017 -6628  11054   40 CASEY                    AUSTRALIA 2007 2007    -999  300017

Resultant database: tmn.0707021605.dtb

[edit: found another fault, had to re-run. Headers weren't being modded if the WMO code was
already there]
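
The tallies from the three grep commands above, and the subtraction that leaves 39 unmatched bulletins, can be sketched in Python. This is only an illustration: the tag strings are taken from the grep patterns, but the function names and the idea of scanning the file line by line are assumptions, not the log's own code.

```python
from collections import Counter

# Tags copied from the grep patterns in the log; everything else is hypothetical.
TAGS = ("AUTO:", "AUTO FROM CHAIN", "MANUAL")


def tally_matches(lines):
    """Count lines containing each tag, roughly like `grep TAG file | wc -l`."""
    counts = Counter()
    for line in lines:
        for tag in TAGS:
            if tag in line:
                counts[tag] += 1
    return counts


def unmatched(total_bulletins, counts):
    """Bulletins left over once every tagged match is subtracted."""
    return total_bulletins - sum(counts.values())


# Counts reported in the log: 232 + 514 automatic, 12 manual, 797 bulletins.
reported = Counter({"AUTO:": 232, "AUTO FROM CHAIN": 514, "MANUAL": 12})
print(unmatched(797, reported))
```

Note that a line carrying two tags would be counted twice here, exactly as it would be by separate grep runs.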

32. The next stage *heart falls* will be to synchronise tmax *against* tmin, sweeping
up duplicates in the process. How long’s THIS gonna take? Well actually, it might be fairly easy,
if we use a similar approach. We can base it all around the user being given a ‘cloud’ of
related stations to pick pairs from, only they will be uniquely numbered so that two from the
same database can be selected. The user can in this way ‘pair up’ stations in groups.

Of course, this comes with the downside of complexity (and therefore bugs). And both databases
will almost certainly have to be preloaded in their entirety because of the need for the user to
be able to confirm header and data precedence info when stations within a database are merged.

Oh – and I’ll have to move bloody quick. So more bugs.

Well.. it’s written, and debugging. Around 1500 lines of code, or 1000 without all the comments ;-)
It does indeed read in all the data, so has to be compiled on uealogin1 (as crua6 doesn’t have
enough memory!). Reusing code from auminmatch.for did speed things up a bit, though two new
subroutines had to be written to carry out checking for merges (within a database) and for
matches (between the databases). Also introduced a user decision at the start to allow the TMin
database to take precedence in terms of station metadata. Here’s the current state of play:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/dtr] ./auminmaxsync

WELCOME TO THE TMIN/TMAX SYNCHRONISER

Before we get started, an important question: Should TMin header info take precedence over TMax?

This will significantly reduce user decisions later, but is a big step as TMax settings may be silently overridden!

To let TMin header values take precedence over those of TMax, enter ‘YES’: YES
Please enter the tmin database name: tmn.0707021605.dtb
Please enter the tmax database name: tmx.0702091313.dtb

Reading in both databases..
TMin database stations:    14349
TMax database stations:    14315

Processing one-to-one matches..

Initial scan found:
one-to-one matches:  7875
of which confirmed:  7691
in a station cloud:  6411 (tmin)
in a station cloud:  6392 (tmax)
unmatchable:    63 (tmin)
unmatchable:    48 (tmax)
Processing match clouds..
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations:    2
1. -401000  3178   3522  783 JERUSALEM            1863 2000    -999       0
2. 4018400  3178   3522  809 JERUSALEM            1977 1995    -999       0
TMax stations:    2
3. -401000  3178   3522  783 JERUSALEM            1863 2000    -999       0
4. 4018400  3178   3522  809 JERUSALEM            1977 1995    -999       0

*** Remember: Merge first, Match second! ***
Enter ANY pair to match or merge, or ‘n’ to end:
<END QUOTE>

So stats pretty much as expected/hoped. The one-to-one matches should, of course, be 100%.. but as
the databases aren’t synchronised, and as there are hundreds of ‘duplicate’ entries.. only around
50% match straight away. The situation isn’t as bleak as it looks, though – there is further
automatching at the beginning of each cloud, so the user can still be spared the obvious. If the
merging gets too onerous, though, I might have to automate that – with associated risks.

And of course – if you look closely – things are still a little offbeam :-/

Found another database bug by chance.. a <tab> instead of a space after ‘CRANWELL’:

-324320  5303    -50   62 CRANWELL                UK            1961 1995   -999  -999.00

Doesn’t show up in reads as it’s a white space character. Argh. Fixed in tmin & tmax. Now to find
out why some matched stations STILL don’t have the backref in the last header field!! ..found it,
not my problem, it’s the ones that *pre-existed* in the databases, there’s 84 in total I think. So
I can write a proglet to check that any with negative WMO codes have the positive version in that
last field. And I did – ‘fixtnxrefs.for’. Fixed:
tmn.0702091139.dtb (84 fixed)
tmn.0707021605.dtb (651 ‘fixed’ – includes all with negative WMOs regardless of end field)
tmx.0702091313.dtb (84 fixed)

So why, when we matched 758 bulletins in the first place, did this program only ‘fix’ 651, of which
84 were preexisting? Because, of course, the matches only get a negative WMO code if the original
WMO code is missing (zero). The ‘missing’ stations would be ones that already had a WMO code.
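
The back-reference rule described above (a header with a negative WMO code should carry the positive version in its last field) can be sketched as follows. This is not ‘fixtnxrefs.for’ itself; it is a minimal Python illustration that assumes the WMO code is the first whitespace-separated field and the back-reference is the last one. The function name is hypothetical.

```python
def fix_backref(header: str) -> str:
    """Return the header with its last field set to -WMO when WMO is negative.

    Assumes the WMO code is the first whitespace-separated field and the
    back-reference is the last one. Positive-WMO headers come back unchanged.
    """
    fields = header.split()
    wmo = int(fields[0])
    if wmo >= 0:
        return header
    body = header.rstrip()
    # Everything before the last field, with the padding in front of it removed.
    head = body[: len(body) - len(fields[-1])].rstrip()
    new_ref = str(-wmo)
    # Right-align the new value to the old end column, keeping at least one space.
    width = max(len(body) - len(head), len(new_ref) + 1)
    return head + new_ref.rjust(width)


prefix = "-401000  3178   3522  783 JERUSALEM            1863 2000    -999"
print(fix_backref(prefix + "       0"))
```

Like the 651 ‘fixed’ entries in the log, this sketch rewrites the last field of every negative-WMO header regardless of what was there before.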

So, try again, and it’s looking good!

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/dtr] ./auminmaxsync

WELCOME TO THE TMIN/TMAX SYNCHRONISER

Before we get started, an important question: Should TMin header info take precedence over TMax?

This will significantly reduce user decisions later, but is a big step as TMax settings may be silently overridden!

To let TMin header values take precedence over those of TMax, enter ‘YES’: YES
Please enter the tmin database name: tmn.0702091139.dtb
Please enter the tmax database name: tmx.0702091313.dtb

Reading in both databases..
TMin database stations:    14309
TMax database stations:    14315

Processing one-to-one matches..

Initial scan found:
one-to-one matches:  7889
of which confirmed:  7702
in a station cloud:  6365 (tmin)
in a station cloud:  6378 (tmax)
unmatchable:    55 (tmin)
unmatchable:    48 (tmax)
Processing match clouds..
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations:    2
1. -401000  3178   3522  783 JERUSALEM            1863 2000    -999  401000
2. 4018400  3178   3522  809 JERUSALEM            1977 1995    -999       0
TMax stations:    2
3. -401000  3178   3522  783 JERUSALEM            1863 2000    -999  401000
4. 4018400  3178   3522  809 JERUSALEM            1977 1995    -999       0

*** Remember: Merge first, Match second! ***
Enter ANY pair to match or merge, or ‘n’ to end: 1,2
Merging two stations from the TMin database:
Stn 1:      -401000  3178   3522  783 JERUSALEM            ISRAEL        1863 2000   -999   401000
Stn 2:      -401000  3178   3522  783 JERUSALEM            ISRAEL        1863 2000   -999   401000
Please resolve the following inconsistencies:
Overlap:  Station A) -401000  3178   3522  783 JERUSALEM            ISRAEL        1863 2000   -999   401000
Station B) 4018400  3178   3522  809 JERUSALEM            ISRAEL        1977 1995   -999  -999.00

You must decide which station’s data takes precedence.
The intercorrelation for the period is:  0.99
Enter A or B, or undo pair(X):

<END QUOTE>

Well.. it’s kinda working. I found some idiotic bugs, though it is a fearsomely complicated program with
lots of indirect pointers (though I do try and resolve them at the first opportunity). One thing that’s
making debugging frustratingly difficult is something that must be a uealogin1 feature, and I haven’t seen
it before: the program doesn’t actually flush the output channels whenever you write! For example, as I
write this the program has dispensed with auto-matching:

Initial scan found:
one-to-one matches:  7875
of which confirmed:  7691
in a station cloud:  6411 (tmin)
in a station cloud:  6392 (tmax)
unmatchable:    63 (tmin)
unmatchable:    48 (tmax)

(yes, it’s a little tighter now)

Anyway, since then I’ve merged two pair (JERUSALEM) then paired the remainder. That activity has generated
match reports on channel 31 BUT THEY ARE NOT IN THE FILE YET. Here is the tail of channel 31:

crua6[/cru/cruts/version_3_0/db/dtr] tail mat.0707121500.dat
TMax: 9929470  4330   1340  342 MACERATA             ITALY         1953 1975   -999  -999.00
AUTO PAIRING FROM ONE-TO-ONE SCAN:
TMin: 9929480  4030    880  585 MACOMER              ITALY         1952 1978   -999  -999.00
TMax: 9929480  4030    880  585 MACOMER              ITALY         1952 1978   -999  -999.00
AUTO PAIRING FROM ONE-TO-ONE SCAN:
TMin: 9929500  4010   1850   86 PALASCIA AERO        ITALY         1952 1978   -999  -999.00
TMax: 9929500  4010   1850   86 PALASCIA AERO        ITALY         1952 1978   -999  -999.00
AUTO PAIRING FROM ONE-TO-ONE SCAN:
TMin: 9929520  4060   1490   30 PONTECAGNANO         ITALY         1951 1978   -999  -999.00
TMax: 9929520  4060   1490   30 PONTECAGNANO         ITALY         1951 1978   -999  -999.00

In addition, the log file is EMPTY, yet at least 416 bytes have been written to it. How the hell can I
debug if I can’t monitor what’s being written to the log files?!! Of course, once I force-quit the program,
and wait a bit.. the missing info appears. Similarly if I carry on using the program, the files get more
info. It’s as if there’s a write buffer that runs FIFO. Must look at the ‘help’.. why is it that whenever I
crack the programming, the systems themselves step in to screw it up? And computer support is away of course.
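
The behaviour described here is what ordinary write buffering looks like: output sits in a buffer inside the process until it is flushed, so `tail` shows nothing new. The Python sketch below only illustrates that general effect and the usual remedy (flushing after each write). It says nothing about the f77 runtime on uealogin1, and the helper name and file name are hypothetical.

```python
import os
import tempfile


def log_line(handle, text, flush=True):
    """Write one log line; flushing hands the buffered text to the OS at once."""
    handle.write(text + "\n")
    if flush:
        handle.flush()


path = os.path.join(tempfile.mkdtemp(), "match.log")
with open(path, "w", buffering=8192, newline="\n") as log:
    log_line(log, "AUTO PAIRING FROM ONE-TO-ONE SCAN:", flush=False)
    size_buffered = os.path.getsize(path)  # text may still be held in the buffer
    log_line(log, "second line", flush=True)
    size_flushed = os.path.getsize(path)   # both lines are now visible on disk

print(size_buffered, size_flushed)
```

Flushing after every write costs some speed, but for an interactive matcher being debugged from its log files that trade is usually acceptable.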

Looked at f77 -help.. nothing. Well, nothing obvious. Anyway, more debugging and..

Seems to be working. But it’s going to take ages. Here is an example of the problem:

<BEGIN QUOTE>
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations:    2
1. -315770  5638   -287   10 LEUCHARS             UK            1959 1995    -999  315770
2.  317100  5640   -287   12 LEUCHARS             UNITED KINGDO 1997 2006    -999       0
TMax stations:    2
3. -315770  5638   -287   10 LEUCHARS             UK            1959 1995    -999  315770
4.  317100  5638   -287   12 LEUCHARS RAF         UK UK         1973 2006    -999       0

*** Remember: Merge first, Match second! ***
Enter ANY pair to match or merge, or ‘n’ to end:
<END QUOTE>

Not only do both databases have unnecessary duplicates, introduced for external mapping purposes
by the look of it, but the ‘main’ stations (2 and 4) have different station name & country. In fact
one of the country names is illegal! Dealing with things like this cannot be automated as they’re
the results of non-automatic decisions.

Something new – a listing of 147 Australian ‘bulletin’ stations, most of which have mappings to
WMO codes. Decided to xref against the (mapped) TMin database, for a laugh. Then decided to take it
more seriously. Wrote a prog to IMPOSE the mappings onto tmn.0707021605.dtb, overriding existing
mappings as necessary. What a bloody mess.

Decided to be vaguely sensible and let the program, auwmoxref.for, evolve. So to begin with it just
did a scan between the mappings file (au_mapping_to_wmo.dat) and the tmin database with my mappings
in (tmn.0707021605.dtb). Results:

crua6[/cru/cruts/version_3_0/db/dtr] ./auwmoxref

<BEGIN QUOTE>
AUWMOXREF: Check Australian cross-references

Enter the file of WMO mappings: au_mapping_to_wmo.dat
115 mappings read

Enter the mapped TMin database: tmn.0707021605.dtb
14349 database headers read

RESULTS:

WMO Matches:    92
(multiples)  (  0)
> Ref matches:  60
> Ref empty:    31
> Ref WRONG:     1

Ref Matches:   114
(multiples)  (  0)
> WMO matches:  60
> WMO -1*Ref:   41
> WMO WRONG:    13
<END QUOTE>
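
The two-way scan reported above can be sketched roughly as follows: for each mapping, look up the database entry holding the mapped WMO code and classify its ref, then look up the entry holding the ref and classify its WMO code. This is a guess at the logic from the output labels alone, not the contents of auwmoxref.for. The station numbers below are invented, and duplicate codes (‘multiples’) are ignored.

```python
from collections import Counter


def cross_reference(mappings, headers):
    """Classify each ref -> WMO mapping against (wmo, ref) database headers.

    `mappings` is {ref: wmo}; `headers` is a list of (wmo, ref) pairs where
    ref is None when the last header field is empty. Both layouts are assumed.
    """
    by_wmo = {wmo: ref for wmo, ref in headers}
    by_ref = {ref: wmo for wmo, ref in headers if ref is not None}
    wmo_side, ref_side = Counter(), Counter()
    for ref, wmo in mappings.items():
        if wmo in by_wmo:  # the mapped WMO code exists in the database
            found_ref = by_wmo[wmo]
            if found_ref == ref:
                wmo_side["ref matches"] += 1
            elif found_ref is None:
                wmo_side["ref empty"] += 1
            else:
                wmo_side["ref wrong"] += 1
        if ref in by_ref:  # the bulletin ref exists in the database
            found_wmo = by_ref[ref]
            if found_wmo == wmo:
                ref_side["wmo matches"] += 1
            elif found_wmo == -ref:
                ref_side["wmo is -ref"] += 1  # placeholder code, no real WMO yet
            else:
                ref_side["wmo wrong"] += 1
    return wmo_side, ref_side


# Invented toy data, not taken from au_mapping_to_wmo.dat.
toy_mappings = {1001: 9400100, 1002: 9400200, 1003: 9400300, 1004: 9400400}
toy_headers = [(9400100, 1001), (9400200, None), (-1002, 1002),
               (9400300, 1009), (9499900, 1003)]
wmo_side, ref_side = cross_reference(toy_mappings, toy_headers)
print(dict(wmo_side), dict(ref_side))
```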

So first the good news – no duplicates. Well there shouldn’t have been any anyway of course, but the
way things are going I’m taking nothing for granted. See, I count something turning out as expected
as ‘good news’. So anyway.. I also extracted the statistic that 26 mappings matched both Ref and WMO,
but to separate database entries. Thus the 115 mappings are allocated as follows:

60  Mapping found to be correctly implemented (over half, excellent)
41  WMO Missing, of which:
    26  WMO found elsewhere (one of which has an unmapped ref attached to it)
    15  WMO not in database (can add wmo codes for these)
13  WMO wrong, of which:
     5  Can be merged with real WMO (effectively same station)
     8  WMO not in database
 1  Completely unmatched (96003 -> 949500)

For the purposes of actions to take, the 13 ‘WMO Wrong’ refs can simply be unmapped from their incorrect
mappings and be rolled into the 41 ‘WMO Missing’ refs.

So:

60  Mapping found to be correctly implemented (over half, excellent)
54  WMO Missing or wrong, of which:
31  WMO found elsewhere (one of which has an unmapped ref attached to it)
23  WMO not in database but pairing made (can add wmo codes for these)
8  WMO not in database and no pairing (can add new stations for these)
1  Completely unmatched (96003 -> 949500)

So, actions to take:

1. For the first 60, no action required.
2. For the 13 with incorrectly-assigned WMOs, disengage and roll into the rest below
3. For the 1 WMO with an unmapped ref attached, disengage and roll into the rest below
4. For the 31 with dislocated WMOs, print a list and ref when doing the tmin/tmax syncing
5. For the 23 with WMO-less stations, add the WMO codes..
6. For the 8 with no WMO found and no pairing found, create new stations.

For the disengagements, decided to work directly with an editor rather than craft another program! So
changes made to tmn.0707021605.dtb (after a suitable backup was made of course!).

The following assignments were disengaged (and replaced with -999.00). Where a WMO code follows in
brackets, the ref was reassigned there.

1. 9460300 -3200  11550   43 ROTTNEST ISLAND          AUSTRALIA 1898 2006    -999    9193 (9460200)
2. 9464600 -3090  12810  159 FORREST                  AUSTRALIA 1946 2006    -999   11052 (no)
3. 9432200 -2020  13000  340 RABBIT FLAT              AUSTRALIA 1969 2006    -999   15666 (no)
4. 9557400 -2640  15300    6 TEWANTIN RSL PARK        AUSTRALIA 1949 2006    -999   40908 (no)
5. 9451600 -2810  14860  199 ST GEORGE AIRPORT        AUSTRALIA 1938 2006    -999   43109 (9451700)
6. 9452700 -2950  14990  213 MOREE AERO               AUSTRALIA 1964 2006    -999   53115 (9552700)
7. 9454100 -2980  15110  582 INVERELL (RAGLAN ST)     AUSTRALIA 1907 2006    -999   56242 (no)
8. 9478700 -3140  15290    4 PORT MACQUARIE AIRPO     AUSTRALIA 1907 2006    -999   60139 (no)
9. 9475800 -3210  15090  216 SCONE SCS                AUSTRALIA 2000 2006    -999   61089 (9473800)
10. 9494000 -3510  15080   85 JERVIS BAY (POINT PE     AUSTRALIA 1907 2006    -999   68151 (no)
11. 9491600 -3590  14840 1482 CABRAMURRA SMHEA AWS     AUSTRALIA 1962 2006    -999   72161 (no)
12. 9482700 -3630  14160  133 NHILL                    AUSTRALIA 1897 2006    -999   78031 (9582900)
13. 9597900 -4300  14710   63 GROVE (COMPARISON)       AUSTRALIA 1961 2006    -999   94069 (no)

The ‘mismatched WMO code’ station was disengaged from its reference and given 48027 instead:
1. 9471100 -3150  14580  218 COBAR AIRPORT AWS        AUSTRALIA 1962 2006    -999   48237 -> 48027

I mailed BOM as we have 94711 = COBAR AWS but they have *94710* for AWS and 94711 for COBAR MO. The
reply was as follows:

<BEGIN QUOTE>
On 18 Jul 2007, at 8:51, Matthew Bastin wrote:

Hi Ian,

I hope this table helps

Name                  BoM No. WMO No.      Opened           Closed
Cobar Comparison        48244   94711   1/11/1997       15/11/2000
Cobar MO                48027   94711   1/01/1962
Cobar Airport AWS       48237   94710  11/06/1993
Cobar PO                48030            1/1/1881       31/12/1965

The blank in the Closed column means that the site is still open
When Cobar Comparison site closed it transferred its WMO number to Cobar MO
A blank in the WMO No. column means that the site never had a WMO number.

I am not sure of the overlap between the assignment of 94711 between 48244 and 48027. I will find
out and get back to you.
<END QUOTE>

Here are our current ‘COBAR’ headers:

0 -3150  14580  260 COBAR COMPARISON     AUSTRALIA     2000 2006    -999 -999.00
0 -3150  14580  260 COBAR MO             AUSTRALIA     2000 2006    -999 -999.00
0 -3148  14582  265 COBAR                AUSTRALIA     1962 2004    -999 -999.00
0 -3150  14580  251 COBAR POST OFFICE    AUSTRALIA     1902 1960    -999 -999.00
9471100 -3150  14580  218 COBAR AIRPORT AWS    AUSTRALIA     1962 2006    -999   48027

Now looking at the dates.. something bad has happened, hasn’t it. COBAR AIRPORT AWS cannot start
in 1962, it didn’t open until 1993! Looking at the data – the COBAR station 1962-2004 seems to be
an exact copy of the COBAR AIRPORT AWS station 1962-2004, except that the latter has more missing
values. Now, COBAR AIRPORT AWS has 15 months of missing value codes beginning Oct 1993.. coincidence?
No. I think that that series should start there. Furthermore, the overlap between COBAR and COBAR MO
(2000-2004) is *almost* identical:

0 -3148  14582  265 COBAR                AUSTRALIA     1962 2004    -999 -999.00
2000  177  209  183  135   80   51   45   52  105  122  166  186
2001  223  214  159  126   72   61   43   52  105  110  148  181
2002  195  185  168  148   88   58   49   63  101  128  186  192
2003  222  216  161  137   97   71   56   61   92  113  159  208
2004  207  226  175  141   74   69   46   69   90  136  160  186

0 -3150  14580  260 COBAR MO             AUSTRALIA     2000 2006    -999 -999.00
2000  178  209  184  136   80   52   45   55  105  122  166  186  (7/12)
2001  223  214  159  126   72   61   43   52  105  110  148  181  (12/12)
2002  195  185  168  148   88   58   49   63  101  128  187  192  (11/12)
2003  222  216  161  137   97   71   56   61   92  113  159  208  (12/12)
2004  207  226  175  141   74   69   46   69   90  136  160  186  (12/12)
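
The bracketed (n/12) figures above are the number of months per year in which the two series agree exactly. A short Python sketch reproduces them from the values listed; the variable and function names are hypothetical, and the numbers are copied from the two tables as printed.

```python
# Monthly values copied from the two station listings above.
cobar = {
    2000: [177, 209, 183, 135, 80, 51, 45, 52, 105, 122, 166, 186],
    2001: [223, 214, 159, 126, 72, 61, 43, 52, 105, 110, 148, 181],
    2002: [195, 185, 168, 148, 88, 58, 49, 63, 101, 128, 186, 192],
    2003: [222, 216, 161, 137, 97, 71, 56, 61, 92, 113, 159, 208],
    2004: [207, 226, 175, 141, 74, 69, 46, 69, 90, 136, 160, 186],
}
cobar_mo = {
    2000: [178, 209, 184, 136, 80, 52, 45, 55, 105, 122, 166, 186],
    2001: [223, 214, 159, 126, 72, 61, 43, 52, 105, 110, 148, 181],
    2002: [195, 185, 168, 148, 88, 58, 49, 63, 101, 128, 187, 192],
    2003: [222, 216, 161, 137, 97, 71, 56, 61, 92, 113, 159, 208],
    2004: [207, 226, 175, 141, 74, 69, 46, 69, 90, 136, 160, 186],
}


def monthly_agreement(a, b):
    """For each year present in both series, count months with identical values."""
    return {
        year: sum(x == y for x, y in zip(a[year], b[year]))
        for year in sorted(a.keys() & b.keys())
    }


agreement = monthly_agreement(cobar, cobar_mo)
print(agreement)
```

Across the five overlapping years, 54 of the 60 months are identical, which is the ‘almost identical’ overlap noted above.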

I therefore propose to extend COBAR MO using COBAR, and to truncate COBAR AIRPORT AWS at 1993.
All BOM codes will be appended for completeness. So the new headers (with lat/lon from BOM too) are:

0 -3149  14583  260 COBAR COMPARISON     AUSTRALIA     2000 2006    -999   48244 (closed)
9471100 -3149  14583  260 COBAR MO             AUSTRALIA     1962 2006    -999   48027
0 -3150  14583  251 COBAR POST OFFICE    AUSTRALIA     1902 1960    -999   48030 (closed)
9471000 -3154  14580  218 COBAR AIRPORT AWS    AUSTRALIA     1995 2006    -999   48237

Deleted:
0 -3148  14582  265 COBAR                AUSTRALIA     1962 2004    -999 -999.00

The remaining 26 dislocated references were reassigned as for the 13 above. Legitimate mappings:

1.       3003   9420300
2.       4032   9431200
3.       5007   9430200
4.       7176   9431700
5.       9021   9461000
6.      14508   9415000
7.      14932   9413100
8.      17031   9448000
9.      22801   9480500
10.      26026   9481200
11.      27045   9417000
12.      32040   9429400
13.      40842   9457800
14.      50052   9470700
15.      55024   9474000
16.      67105   9575300
17.      68072   9475000
18.      71041   9590800
19.      86282   9486600
20.     200283   9429900
21.     200288   9499600
22.     200790   9699500
23.     200839   9499500
24.     300000   8957100
25.     300001   8956400
26.     300017   8961100

WMO codes were added to these uncoded sites as shown:

1. 9410000 -1430  12670   23 KALUMBURU                AUSTRALIA 2000 2006    -999    1019
2. 9562500 -3160  11720  217 CUNDERDIN AIRFIELD       AUSTRALIA 2000 2006    -999   10286
3. 9564000 -3270  11670  275 WANDERING                AUSTRALIA 2000 2006    -999   10917
4. 9567000 -3380  13820  109 SNOWTOWN (RAYVILLE P     AUSTRALIA 2000 2006    -999   21133
5. 9481400 -3530  13890   58 STRATHALBYN RACECOUR     AUSTRALIA 2000 2006    -999   24580
6. 9548200 -2590  13940   47 BIRDSVILLE AIRPORT       AUSTRALIA 2000 2006    -999   38026
7. 9552900 -2670  15020  305 MILES CONSTANCE STRE     AUSTRALIA 2000 2006    -999   42112
8. 9549200 -2800  14380  132 THARGOMINDAH AIRPORT     AUSTRALIA 2000 2006    -999   45025
9. 9578400 -3190  15250    8 TAREE AIRPORT AWS        AUSTRALIA 2000 2006    -999   60141
10. 9571900 -3220  14860  284 DUBBO AIRPORT AWS        AUSTRALIA 2000 2006    -999   65070
11. 9586900 -3560  14500   94 DENILIQUIN AIRPORT A     AUSTRALIA 2000 2006    -999   74258
12. 9495400 -4070  14470   94 CAPE GRIM BAPS           AUSTRALIA 2000 2006    -999   91245
13. 9596400 -4110  14680    3 LOW HEAD                 AUSTRALIA 2000 2006    -999   91293
14. 9595900 -4190  14670 1055 LIAWENEE                 AUSTRALIA 2000 2006    -999   96033

The following was corrected (ref had been mistyped as 78013):
1. 9582900 -3783  14206  200 HAMILTON RESEARCH ST AUSTRALIA     1971 1998    -999   78031

Now the results look like this:

WMO Matches:   106
> Ref matches: 106
> Ref empty:     0
> Ref WRONG:     0
Ref Matches:   106
> WMO matches: 106
> WMO -1*Ref:    0
> WMO WRONG:     0

In other words, there are (115-106=) 9 mappings unfulfilled: the ref hasn’t been matched and the
WMO code isn’t in the database. However, that didn’t mean they weren’t in the database with a
missing WMO code, did it? The following were found and augmented with both WMO code and ref.

9457000 -2639  15304    6 TEWANTIN RSL PARK    AUSTRALIA     2000 2004    -999   40908
9594000 -3509  15080   85 JERVIS BAY (PT PERP AWS) AUSTRALIA 2000 2006    -999   68151
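
The cross-check reported above (WMO matches, ref matches, and the unfulfilled remainder) could be tallied along these lines. This is a hypothetical sketch, not the program actually used: the function, the tuple layout and the demo states are invented for illustration, and it only covers the match / empty / wrong categories.

```python
def cross_check(mappings, headers):
    """Tally how a list of (ref, wmo) mappings is reflected in database headers.

    headers is a list of (wmo, ref) pairs; wmo 0 and ref None mean "not set".
    """
    ref_by_wmo = {wmo: ref for wmo, ref in headers if wmo}
    known_refs = {ref for _, ref in headers if ref is not None}
    tally = {"wmo_matches": 0, "ref_matches": 0, "ref_empty": 0,
             "ref_wrong": 0, "unfulfilled": []}
    for ref, wmo in mappings:
        if wmo in ref_by_wmo:
            tally["wmo_matches"] += 1
            stored = ref_by_wmo[wmo]
            if stored == ref:
                tally["ref_matches"] += 1
            elif stored is None:
                tally["ref_empty"] += 1
            else:
                tally["ref_wrong"] += 1
        elif ref not in known_refs:
            # Neither the WMO code nor the ref is in the database.
            tally["unfulfilled"].append((ref, wmo))
    return tally

# Codes taken from the log; the database states are made up for the demo.
mappings = [(78031, 9582900), (91293, 9596400), (40908, 9457000)]
headers = [(9582900, 78013),   # ref mistyped
           (9596400, None),    # WMO present, ref not yet filled in
           (0, None)]          # station present but with neither code
print(cross_check(mappings, headers))
```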

The following were added as new station stubs:
9532200 -2018  13001  340 RABBIT FLAT          AUSTRALIA     2007 2007    -999   15666
9554100 -2978  15111  582 INVERELL (RAGLAN ST) AUSTRALIA     2007 2007    -999   56242
9478600 -3143  15287    4 PORT MACQUARIE AIRPT AUSTRALIA     2007 2007    -999   60139
9591600 -3594  14838 1482 CABRAMURRA SMHEA AWS AUSTRALIA     2007 2007    -999   72161
9597100 -4298  14708   63 GROVE (COMPARISON)   AUSTRALIA     2007 2007    -999   94069

The following was complicated by the fact that two versions of the station appear to have been
concatenated. This is the station as it already exists in the TMin database:
9464600 -3085  12811  159 FORREST                  AUSTRALIA 1946 2006    -999 -999.00
However, the current ‘live’ FORREST station (11052) started in 1993, according to bom.au
records. And wouldn’t you know it, the data for this station is missing between 12/92
and 12/99 inclusive. So I reckon it’s the old FORREST AERO station (WMO 9464600, .au ID 11004),
with the new Australian bulletin updates tacked on (hence starting in 2000). Especially as the
old station started in 1946 (http://www.bom.gov.au/climate/averages/tables/cw_011004.shtml).
The trouble is that the bom.au mappings all agree that FORREST is now WMO=9564600. So.. do I
split off the 2000-present data to a new station with the new number, or accept that whoever
joined them (Dave?) looked into it and decided it would be OK? The BOM website says they’re
800m apart. Decided to be brave and split the data back into two stations, with both codes
attached (in case we ever get replacement data for the closed station, the site says it went to
1995 after all). So there are now two FORREST stations:

9464600 -3085  12811  159 FORREST AERO             AUSTRALIA 1946 1992    -999   11004
9564600 -3085  12811  159 FORREST                  AUSTRALIA 2000 2006    -999   11052

Hope that’s right..
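
A split like the FORREST one can be sketched as "cut the record wherever there is a long run of empty years". The function name, the dict-of-years layout and the five-year threshold are all assumptions for illustration; the actual split was done by hand.

```python
MISSING = -9999

def split_at_gap(data, min_gap_years=5):
    """Split {year: [12 values]} into segments separated by long empty gaps.

    Years with no data at all are dropped from the returned segments.
    """
    years = sorted(y for y, row in data.items()
                   if any(v != MISSING for v in row))
    segments, current, previous = [], {}, None
    for year in years:
        # Count the completely empty years between two years that hold data.
        if previous is not None and year - previous - 1 >= min_gap_years:
            segments.append(current)
            current = {}
        current[year] = data[year]
        previous = year
    if current:
        segments.append(current)
    return segments

# Toy record: data to 1992, nothing for 1993-1999, data again from 2000.
demo = {y: [100] * 12 for y in (1990, 1991, 1992, 2000, 2001)}
demo.update({y: [MISSING] * 12 for y in range(1993, 2000)})
for part in split_at_gap(demo):
    print(min(part), max(part))
```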

The following mapping was added, though the station does not currently feature in the bulletins.
9495900 -4228  14628 -999 BUTLERS GORGE        AUSTRALIA     2007 2007    -999   96003
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2007-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

Also ran a risky search&replace to left-justify the ‘AUSTRALIA’ in its field, provided the
field wasn’t touched by an extended station name. Seems to have been 100% successful.

All 115 refs now matched in the TMin database. Confidence in the fidelity of the Australian
stations in the database drastically reduced. Likelihood of invalid merging of Australian
stations high. Let’s go..

Well OK, made some final ‘improvements’ to the syncing program. Now, after it forms a cloud, it
should automatically merge stations provided the criteria are met and no other candidates are possible.
It also records, in a separate ‘action’ file (act.*), every relevant action performed during
the run, so that if interrupted I should be able to hack in something to enable a ‘resume’. It’s
been done a bit hastily so no guarantees that enough information’s been saved!
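
The idea of an action file that allows a later 'resume' can be sketched as an append-only log that is replayed on restart. The real act.* format is not shown in this log, so the one-JSON-object-per-line layout and both function names below are assumptions; only the 'automerg' and 'usermerg' labels come from the log itself.

```python
import io
import json

def record_action(handle, kind, **details):
    """Append one action to the log and flush, so an interruption loses nothing."""
    handle.write(json.dumps({"kind": kind, **details}) + "\n")
    handle.flush()

def replay_actions(handle):
    """Read back every recorded action, in the order it was performed."""
    return [json.loads(line) for line in handle if line.strip()]

# Demo with an in-memory file; a real run would open a file in append mode.
log = io.StringIO()
record_action(log, "automerg", db="tmx", keep=1, drop=2)
record_action(log, "usermerg", db="tmn", keep=11, drop=14)
log.seek(0)
for action in replay_actions(log):
    print(action["kind"], action["db"], action["keep"], action["drop"])
```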

Debugging is still a big issue, unfortunately. It’s a complicated program to sort out and the
possibilities for indexing errors are many. In fact, for the first time ever, it’s just locked up!
(It was due to getmos not defaulting to months 1 & 12 if the data was all missing.)

Another problem solved – spent ages wondering how the start & end years for a particular station
(WARATAH) were being corrupted. Turns out they weren’t – I’d written ‘getmos’ to trim empty years,
but forgot to check the return flag! Duh.
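
Both getmos problems come down to the same contract: return a sensible default span when everything is missing, and make the caller look at the flag. The sketch below is a hypothetical reconstruction of that contract in Python; the real getmos signature is not shown in this log.

```python
MISSING = -9999

def get_span(start_year, rows):
    """Return (first_year, first_month, last_year, last_month, has_data).

    rows holds one 12-value list per year, beginning at start_year.
    If every value is missing, fall back to months 1 and 12 and set the flag False.
    """
    present = [(start_year + i, m + 1)
               for i, row in enumerate(rows)
               for m, value in enumerate(row)
               if value != MISSING]
    if not present:
        end_year = start_year + max(len(rows) - 1, 0)
        return start_year, 1, end_year, 12, False
    (y0, m0), (y1, m1) = present[0], present[-1]
    return y0, m0, y1, m1, True

rows = [[MISSING] * 12,
        [MISSING] * 5 + [10] + [MISSING] * 6,
        [1, 2, 3] + [MISSING] * 9,
        [MISSING] * 12]
y0, m0, y1, m1, ok = get_span(1970, rows)
if ok:  # the check that was forgotten
    print(f"data runs from {m0}/{y0} to {m1}/{y1}")
```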

So.. perhaps a debugged run through? I’m quickly realising that the Australian stations are in
such a state that I’m having to constantly refer to the station descriptions on the BOM website,
which are individual PDFs:

http://www.bom.gov.au/climate/cdo/metadata/pdf/metadata088110.pdf

It takes time.. time I don’t have! Though I’m pleased to see that the second FSM is helpfully
chipping in to pair things up when possible.

Getting seriously fed up with the state of the Australian data. So many new stations have been
introduced, so many false references.. so many changes that aren’t documented. Every time a
cloud forms I’m presented with a bewildering selection of similar-sounding sites, some with
references, some with WMO codes, and some with both. And if I look up the station metadata with
one of the local references, chances are the WMO code will be wrong (another station will have
it) and the lat/lon will be wrong too. I’ve been at it for well over an hour, and I’ve reached
the 294th station in the tmin database. Out of over 14,000. Now even accepting that it will get
easier (as clouds can only be formed of what’s ahead of you), it is still very daunting. I go
on leave for 10 days after tomorrow, and if I leave it running it isn’t likely to be there when
I return! As to whether my ‘action dump’ will work (to save repetition).. who knows?

Yay! Two-and-a-half hours into the exercise and I’m in Argentina!

Pfft.. and back to Australia almost immediately :-(    .. and then Chile. Getting there.

Unfortunately, after around 160 minutes of uninterrupted decision making, my screen has started
to black out for half a second at a time. More video cable problems – but why now?!! The count is
up to 1007 though.

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as
Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO
and one with, usually overlapping and with the same station name and very similar coordinates. I
know it could be old and new stations, but why such large overlaps if that’s the case? Aarrggghhh!
There truly is no end in sight. Look at this:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations:    4
1.       0   153  12492   80 MENADO/DR. SA        INDONESIA     1960 1975    -999       0
2.       0   153  12492   80 MENADO/ SAM RATULANG INDONESIA     1986 2004    -999       0
4. 9701400   153  12492   80 MENADO/DR. SAM RATUL INDONESIA     1995 2006    -999       0
5. 9997418   153  12492   81 SAMRATULANGI MENADO  INDONESIA     1973 1989    -999       0
TMax stations:    4
6.       0   153  12492   80 MAPANGET/MANADO      INDONESIA     1960 1975    -999       0
7.       0   153  12492   80 MENADO/ SAM RATULANG ID ID         1957 2004    -999       0
9. 9701400   153  12492   80 MENADO/DR. SAM RATUL INDONESIA     1995 2006    -999       0
10. 9997418   153  12492   81 SAMRATULANGI MENADO  INDONESIA     1972 1989    -999       0

*** Remember: Merge first, then Match ***
Enter ANY pair to match or merge, ‘a’ to auto-match (no merges), or ‘x’ to end:

I honestly have no idea what to do here. And there are countless others of equal bafflingness.

I’ll have to go home soon, leaving it running and hoping none of the systems die overnight :-( ((

.. it survived, thank $deity. And a long run of duplicate stations, each requiring multiple
decisions concerning spatial info, exact names, and data precedence for overlaps. If for any reason
this has to be re-run, it can certainly be speeded up! Some large clouds, too – this one started
with 59 members from each database:

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
User Match Decision(s) Please!
TMin stations:    7
11. 7101965  4362  -7940   78 TORONTO ISLAND                     1905 1959    -999       0
14. 7163427  4363  -7940   77 TORONTO ISLAND A     CANADA        1957 1994    -999       0
23. 7101987  4380  -7955  194 TORONTO MET RES STN                1965 1988    -999       0
24. 7163434  4380  -7955  194 TORONTO MET RES STN  CANADA        1965 1988    -999       0
36.       0  4388  -7944  233 RICHMOND HILL                      1959 2003    -999       0
39. 7163408  4388  -7945  233 RICHMOND HILL        CANADA        1959 1990    -999       0
40. 7163409  4387  -7943  218 RICHMOND HILL WPCP                 1960 1981    -999       0
TMax stations:    8
70. 7101965  4362  -7940   78 TORONTO ISLAND                     1905 1959    -999       0
71. 7126500  4363  -7940   77 TORONTO ISLAND A                   1957 1994    -999       0
73. 7163427  4363  -7940   77 TORONTO ISLAND A     CANADA        1957 1990    -999       0
82. 7101987  4380  -7955  194 TORONTO MET RES STN                1965 1988    -999       0
83. 7163434  4380  -7955  194 TORONTO MET RES STN  CANADA        1965 1988    -999       0
95.       0  4388  -7944  233 RICHMOND HILL                      1959 2003    -999       0
98. 7163408  4388  -7945  233 RICHMOND HILL        CANADA        1959 1990    -999       0
99. 7163409  4387  -7943  218 RICHMOND HILL WPCP                 1960 1981    -999       0

There were even larger clouds later.

One thing that’s unsettling is that many of the assigned WMO codes for Canadian stations do
not return any hits with a web search. Usually the country’s met office, or at least the
Weather Underground, show up – but for these stations, nothing at all. Makes me wonder if
these are long-discontinued, or were even invented somewhere other than Canada! Examples:

7162040 brockville
7163231 brockville
7163229 brockville
7187742 forestburg
7100165 forestburg

Here’s a heartwarming example of a cloud which self-paired completely (debug lines included):

<BEGIN QUOTE>
DBG: cloud formed with ( 6, 6) members
DBG: automerging done, leaving ( 6, 6)
DBG: pot.auto i,j:    1   1
DBG: i,ncs2m,cs2m(1-5):    1   1           1    8578    8582    8596       0
DBG: paired:    1   1  108 MILE HOUSE ABEL

Attempting to pair stations:
From TMin:            0  5170 -12140  994 108 MILE HOUSE ABEL                1987 2002    -999 -999.00
From TMax:            0  5170 -12140  994 108 MILE HOUSE ABEL                1987 2002   -999  -999.00
DBG: AUTOPAIRED:    1   1
DBG: pot.auto i,j:    2   2
DBG: i,ncs2m,cs2m(1-5):    2   1           2    8578    8582    8596       0
DBG: paired:    2   2  100 MILE HOUSE

Attempting to pair stations:
From TMin:      7194273  5165 -12130 1059 100 MILE HOUSE       CANADA        1970 1999    -999 -999.00
From TMax:      7194273  5165 -12130 1059 100 MILE HOUSE       CANADA        1970 1999   -999  -999.00
DBG: AUTOPAIRED:    2   2
DBG: pot.auto i,j:    3   3
DBG: i,ncs2m,cs2m(1-5):    3   1           3    8578    8582    8596       0
DBG: paired:    3   3  HORSE LAKE

Attempting to pair stations:
From TMin:      7103611  5160 -12120  994 HORSE LAKE                         1983 1994    -999 -999.00
From TMax:      7103611  5160 -12120  994 HORSE LAKE                         1983 1994   -999  -999.00
DBG: AUTOPAIRED:    3   3
DBG: pot.auto i,j:    4   4
DBG: i,ncs2m,cs2m(1-5):    4   1           4    8578    8582    8596       0
DBG: paired:    4   4  LONE BUTTE 2

Attempting to pair stations:
From TMin:      7103629  5155 -12120 1145 LONE BUTTE 2                       1981 1991    -999 -999.00
From TMax:      7103629  5155 -12120 1145 LONE BUTTE 2                       1981 1991   -999  -999.00
DBG: AUTOPAIRED:    4   4
DBG: pot.auto i,j:    5   5
DBG: i,ncs2m,cs2m(1-5):    5   1           5    8578    8582    8596       0
DBG: paired:    5   5  100 MILE HOUSE 6NE

Attempting to pair stations:
From TMin:      7103637  5168 -12122  928 100 MILE HOUSE 6NE                 1987 2002    -999 -999.00
From TMax:      7103637  5168 -12122  928 100 MILE HOUSE 6NE                 1987 2002   -999  -999.00
DBG: AUTOPAIRED:    5   5
DBG: pot.auto i,j:    6   6
DBG: i,ncs2m,cs2m(1-5):    6   1           6    8578    8582    8596       0
DBG: paired:    6   6  WATCH LAKE NORTH

Attempting to pair stations:
From TMin:      7103660  5147 -12112 1069 WATCH LAKE NORTH                   1987 1996    -999 -999.00
From TMax:      7103660  5147 -12112 1069 WATCH LAKE NORTH                   1987 1996   -999  -999.00
DBG: AUTOPAIRED:    6   6
<END QUOTE>

Now arguably, the MILE HOUSE ABEL stations should have rolled into one of the other MILE HOUSE ones with
a WMO code.. but the lat/lon/alt aren’t close enough. Which is as intended.
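
The 'close enough' test can be pictured as a simple tolerance check on lat, lon and altitude. The tolerances below (0.05 degrees and 50 m) are invented for illustration; the log does not state the criteria the program actually uses. Coordinates are the values from the headers above, taken as hundredths of a degree.

```python
def close_enough(a, b, max_latlon=5, max_alt=50):
    """True if two stations sit within the (hypothetical) spatial tolerances.

    lat/lon are in the header's own units; alt is taken to be metres.
    """
    return (abs(a["lat"] - b["lat"]) <= max_latlon
            and abs(a["lon"] - b["lon"]) <= max_latlon
            and abs(a["alt"] - b["alt"]) <= max_alt)

abel_tmin = {"lat": 5170, "lon": -12140, "alt": 994}   # 108 MILE HOUSE ABEL (TMin)
abel_tmax = {"lat": 5170, "lon": -12140, "alt": 994}   # 108 MILE HOUSE ABEL (TMax)
mile_house = {"lat": 5165, "lon": -12130, "alt": 1059} # 100 MILE HOUSE

print(close_enough(abel_tmin, abel_tmax))   # identical position, so they pair
print(close_enough(abel_tmin, mile_house))  # too far apart under these tolerances
```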

*

*

Well, it *kind of* worked. Though the resultant files aren’t exactly what I’d expected:

-rw-------   1 f098     cru      12715138 Jul 25 15:25 act.0707241721.dat
-rw-------   1 f098     cru        435839 Jul 25 15:25 log.0707241721.dat
-rw-------   1 f098     cru       4126850 Jul 25 15:25 mat.0707241721.dat
-rw-------   1 f098     cru       6221390 Jul 25 15:25 tmn.0707021605.dtb.lost
-rw-------   1 f098     cru       2962918 Jul 25 15:25 tmn.0707241721.dat
-rw-------   1 f098     cru             0 Jul 25 15:25 tmx.0702091313.dtb.lost
-rw-------   1 f098     cru       2962918 Jul 25 15:25 tmx.0707241721.dat

act.0707241721.dat: hopefully-complete record of all activities

log.0707241721.dat: hopefully-useful log of odd happenings (and mergeinfo() trails)

mat.0707241721.dat: hopefully-complete list of all merges and pairings

tmn.0707021605.dtb.lost: too-small collection of unpaired stations

tmn.0707241721.dat: too-small output database

tmx.0702091313.dtb.lost: MUCH too-small collection of unpaired stations!!!

tmx.0707241721.dat: too-small (but hey, the same size as the twin) output database

ANALYSIS

Well, LOL, the reason the output databases are so small is that every station looks like this:

9999810  -748  10932  114 SEMPOR               INDONESIA     1971 2000    -999 -999.00
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1971  229  225  225  229  229-9999  223  221  222  225  224-9999

Yes – just one line of data. The write loops went from start year to start year. Ho hum :-/

Not as easy to fix as you might think, seeing as the data may well be the result of a merge and
so can’t just be pasted in from the source database.
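
The start-year-to-start-year slip is easy to reproduce. The sketch below is an illustration in Python rather than the original Fortran; the function name and the `buggy` switch are made up, and the line layout (4-character year plus twelve 5-character fields) is inferred from the SEMPOR data line above.

```python
MISSING = -9999

def write_station(header, data, start_year, end_year, buggy=False):
    """Return the text lines for one station: header, then one line per year."""
    lines = [header]
    # The bug: looping from start year to start year writes a single data line.
    last = start_year if buggy else end_year
    for year in range(start_year, last + 1):
        values = data.get(year, [MISSING] * 12)
        lines.append(f"{year}" + "".join(f"{v:5d}" for v in values))
    return lines

header = "9999810  -748  10932  114 SEMPOR               INDONESIA     1971 2000    -999 -999.00"
data = {1971: [229, 225, 225, 229, 229, -9999, 223, 221, 222, 225, 224, -9999]}

print(len(write_station(header, data, 1971, 2000, buggy=True)))  # header + 1 line
print(len(write_station(header, data, 1971, 2000)))              # header + 30 lines
```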

As for the ‘unbalanced’ ‘lost’ files: well for a start, the same error as above (just one line of data),
then on top of that, both sets were written to the same file. What time did I write that bit, 3am?!! Ecch.

33. So, as expected.. I’m gonna have to write in clauses to make use of the log, act and mat files. I so do
not want to do this.. but not as much as I don’t want to do a day’s interacting again!!

Got it to work.. sort of. Turns out I had included enough information in the ACT file, and so was able to
write auminmaxresync.for. A few teething troubles, but two new databases (‘tm[n|x].0707301343.dtb’)
created with 13654 stations in each. And yes – the headers are identical :-)

[edit: see below - the 'final' databases are tm*.0708071548.dtb]

Here are the header counts, demonstrating that something’s still not quite right..

Original:
14355 tmn.0707021605.dtb.heads

New:
13654 tmn.0707301343.dtb.heads

Lost/merged:
14318 tmn.0707021605.dtb.lost.heads (should be 14355-13654-37 = 664?)
37 tmn.0707021605.dtb.merg.heads (seems low)

Original:
14315 tmx.0702091313.dtb.heads

New:
13654 tmx.0707301343.dtb.heads

Lost/merged:
14269 tmx.0702091313.dtb.lost.heads (should be 14315-13654-46 = 615?)
46 tmx.0702091313.dtb.merg.heads (seems low)

In fact, looking at the original ACT file that we used:

crua6[/cru/cruts/version_3_0/db/dtr] grep 'usermerg' act.0707241721.dat | wc -l
258
crua6[/cru/cruts/version_3_0/db/dtr] grep 'automerg' act.0707241721.dat | wc -l
889

..so will have to look at how the db1/2xref arrays are prepped and set in the program. Nonetheless the
construction of the new databases looks pretty good. There’s a minor problem where the external reference
field is sometimes -999.00 and sometimes 0. Not sure which is best, probably 0, as the field will usually
be used for reference numbers/characters rather than real data values. Used an inline perl command to fix.

..after some rudimentary corrections:

uealogin1[/cru/cruts/version_3_0/db/dtr] wc -l *.heads
14355 tmn.0707021605.dtb.heads
122 tmn.0707021605.dtb.lost.heads
579 tmn.0707021605.dtb.merg.heads
13654 tmn.0708062250.dtb.heads
14315 tmx.0702091313.dtb.heads
93 tmx.0702091313.dtb.lost.heads
570 tmx.0702091313.dtb.merg.heads
13654 tmx.0708062250.dtb.heads

Almost perfect! But unfortunately, there is a slight discrepancy, and they have a habit of being tips of
icebergs. If you add up the header/station counts of the new tmin database, merg and lost files, you get
13654 + 579 + 122 = 14355, the original station count. If you try the same check for tmax, however, you get
13654 + 570 + 93 = 14317, two more than the original count! I suspected a couple of stations were being
counted twice, so using ‘comm’ I looked for identical headers. Unfortunately there weren’t any!! So I have
invented two stations, hmm. Got the program to investigate, and found two stations in the cross-reference
array which had cross refs *and* merge flags:

ERROR: db2xref(  126) =      127  -14010 :
126> 9596400 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999   91293
14010> 9596900 -4170  14710  150 CRESSY RESEARCH STAT AUSTRALIA     1971 2006    -999   91306

and

ERROR: db2xref(13948) =      227    -226 :
13948> 9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1971 2006    -999       0
226>       0 -3570  14560  110 FINLEY (CSIRO)       AUSTRALIA     2000 2001    -999       0

So in the first case, LOW HEAD has been merged with another station (#14010) AND paired with #127.
Similarly, NARRANDERA AIRPORT has been merged with #226 and paired with #227. However, these apparent
merges are false! As we see in the first case, 14010 is not LOW HEAD. Similarly for the second case.
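
A check for this kind of contradiction might look like the sketch below. The layout is inferred from the two ERROR lines above (first value a pairing index, second a negative merge index, 0 meaning unset); that reading, and the function name, are assumptions.

```python
def conflicting_xrefs(xref):
    """Return station indices that carry both a pairing and a merge flag.

    xref maps station index -> (pair_index, merge_index); merge indices are
    stored negated and 0 means "not set" (layout inferred from the log).
    """
    return sorted(i for i, (pair, merge) in xref.items()
                  if pair > 0 and merge < 0)

db2xref = {
    1: (1, 0),             # paired only
    2: (0, -3),            # merged only
    126: (127, -14010),    # LOW HEAD: paired AND flagged as merged
    13948: (227, -226),    # NARRANDERA AIRPORT: the same contradiction
}
print(conflicting_xrefs(db2xref))
```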

Looking in the relevant match file from the process (mat.0707241721.dat) we find:

AUTO MERGE FROM CHAIN:
TMax Stn 1:       0 -4110  14680    3 LOW HEAD                 AUSTRALIA 2000 2006   -999  -999.00
TMax Stn 2:       0 -4105  14678    4 LOW HEAD             AUSTRALIA     2000 2004   -999  -999.00
New Header:       0 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999       0
Note: Stn 1 data overwrote Stn 2 data

MANUAL PAIRING FROM CHAIN:
TMin:       9596400 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999   91293
TMax:             0 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999       0
New Header: 9596400 -4110  14680    3 LOW HEAD             AUSTRALIA     2000 2006    -999   91293

and

AUTO MERGE FROM CHAIN:
TMax Stn 1:       0 -3470  14650  145 NARRANDERA AIRPORT       AUSTRALIA 2000 2006   -999  -999.00
TMax Stn 2: 9570600 -3471  14651  145 NARRANDERA AIRPORT   AUSTRALIA     1972 1980   -999  -999.00
New Header: 9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1972 2006    -999       0
Note: Stn 2 data overwrote Stn 1 data

MANUAL PAIRING FROM CHAIN:
TMin:       9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1971 2003    -999       0
TMax:       9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1972 2006    -999       0
New Header: 9570600 -3470  14650  145 NARRANDERA AIRPORT   AUSTRALIA     1971 2006    -999       0

Found the problem – mistyping of an assignment.. and so:

crua6[/cru/cruts/version_3_0/db/dtr] wc -l *.heads

14355 tmn.0707021605.dtb.heads
122 tmn.0707021605.dtb.lost.heads
579 tmn.0707021605.dtb.merg.heads
13654 tmn.0708071548.dtb.heads

14315 tmx.0702091313.dtb.heads
93 tmx.0702091313.dtb.lost.heads
568 tmx.0702091313.dtb.merg.heads
13654 tmx.0708071548.dtb.heads

Phew! Well the headers are identical for the two new databases:

crua6[/cru/cruts/version_3_0/db/dtr] cmp tmn.0708071548.dtb.heads  tmx.0708071548.dtb.heads |wc -l
0
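
The bookkeeping rule used above (new + merged + lost should equal the original header count) is simple enough to script. A small sketch with the counts from the wc listings; the function name is made up.

```python
def surplus(original, new, merged, lost):
    """Headers unaccounted for: 0 means the books balance, >0 means invented stations."""
    return new + merged + lost - original

print(surplus(14355, 13654, 579, 122))  # tmin
print(surplus(14315, 13654, 570, 93))   # tmax before the assignment fix
print(surplus(14315, 13654, 568, 93))   # tmax after the fix
```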

34. So to the real test: converting to DTR! Wrote tmnx2dtr.for, which does exactly that. It reported
233 instances where tmin > tmax (all set to missing values) and a handful where tmin == tmax (no prob).
Looking at the 233 illogicals, most of the stations look as though considerable work is needed on them.
This highlights the fact that all I’ve done is to synchronise the tmin and tmax databases with each
other, and with the Australian stations – there is still a lot of data cleansing to perform at some
stage! But not right now :-)

Input Files
TMin: tmn.0708071548.dtb
TMax: tmx.0708071548.dtb

Output file
DTR: dtr.0708071924.dtb

Cases of identical values:  39
Cases of min > max (BAD!): 233
All illegals written to:   illdtr.0708071924.dat

Example of ‘illegal’ values to demonstrate quality of station data:

station:  9600100   587   9532  126 SABANG/CUT BAU       ID ID         1984 2006    -999       0
min data: 2006  203 -197  200-9999 -211  207  233-9999-9999-9999-9999-9999
max data: 2006  290 -299  307-9999 -315  309  308-9999-9999-9999-9999-9999

Doesn’t look very likely!
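
The rule described for tmnx2dtr (DTR = tmax - tmin, with tmin > tmax set to missing and tmin == tmax allowed) can be sketched as follows. This is a Python illustration, not the Fortran source; the function name is made up and the data are the SABANG values quoted above.

```python
MISSING = -9999

def to_dtr(tmin, tmax):
    """Return (dtr_values, illegal_count, identical_count) for one year."""
    dtr, illegal, identical = [], 0, 0
    for lo, hi in zip(tmin, tmax):
        if lo == MISSING or hi == MISSING:
            dtr.append(MISSING)
        elif lo > hi:
            illegal += 1           # min above max: not believable, drop it
            dtr.append(MISSING)
        else:
            if lo == hi:
                identical += 1     # allowed, just counted
            dtr.append(hi - lo)
    return dtr, illegal, identical

sabang_min = [203, -197, 200, -9999, -211, 207, 233] + [MISSING] * 5
sabang_max = [290, -299, 307, -9999, -315, 309, 308] + [MISSING] * 5
print(to_dtr(sabang_min, sabang_max))
```

Note that the two negative months fail the check because the sign has flipped on both series, which is exactly why they look unlikely.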

Normals added:

crua6[/cru/cruts/version_3_0/db/dtr] ./addnormline

****  ADDNORMLINE ****

Calculates monthly normals
for 1961-1990, provided at
least  75% of values are
present. Results go into a
normals line coming after
the header. Operator called
if different normals exist!

Please enter the input database: dtr.0708071924.dtb

Proposed output database name:   dtr.0708081052.dat

ACCEPT/REJECT (A/R): A
Output database name: dtr.0708081052.dat
Derived logfile name: dtr.0708081052.log

So the final DTR database is dtr.0708081052.dtb.
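
What addnormline says it does (a 1961-1990 monthly mean, provided at least 75% of the values are present) can be sketched like this. The function name and data layout are assumptions, and the rounding to an integer is a guess at how the normals line is stored.

```python
import math

MISSING = -9999

def monthly_normals(data, start=1961, end=1990, min_fraction=0.75):
    """Return 12 normals; a month gets MISSING if too few years have a value."""
    needed = math.ceil(min_fraction * (end - start + 1))   # 23 of 30 years
    normals = []
    for month in range(12):
        values = [data[y][month] for y in range(start, end + 1)
                  if y in data and data[y][month] != MISSING]
        if len(values) >= needed:
            normals.append(round(sum(values) / len(values)))
        else:
            normals.append(MISSING)
    return normals

# Toy station: January complete, February has 22 values, March has 23.
demo = {y: [100,
            50 if y < 1983 else MISSING,
            70 if y < 1984 else MISSING] + [MISSING] * 9
        for y in range(1961, 1991)}
print(monthly_normals(demo)[:3])
```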

And so to the main process:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/dtr] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.dtr
> Select the .cts or .dtb file to load:
dtr.0708081052.dtb

> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
dtr.txt
> Select the first,last years AD to save:
1901,2006
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb    3746373    65.9
>         .cts     178161     3.1    3924534    69.0
> PROCESS        DECISION percent %of-chk
> no lat/lon          650     0.0     0.0
> no normal       1763302    31.0    31.0
> out-of-range         24     0.0     0.0
> accepted        3924510    69.0
> Dumping years 1901-2006 to .txt files…

<END QUOTE>

So a lower percentage than last time (69.0 vs. 78.9), but then, more data overall so a better
result (3924510 vs. 3167636).
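
The percentages in the AnomDTB summary can be cross-checked from the counts. The sketch assumes the denominator is accepted + no normal + out-of-range; that is an inference, though it does reproduce the printed figures.

```python
# Counts copied from the AnomDTB summary above.
counts = {"accepted": 3924510, "no normal": 1763302, "out-of-range": 24}
from_dtb, from_cts = 3746373, 178161

total = sum(counts.values())  # assumed denominator for the percentages
percent = {name: round(100 * value / total, 1) for name, value in counts.items()}

for name, value in counts.items():
    print(f"{name:<14}{value:>9}{percent[name]:>8.1f}")

# Values with a normal, minus the out-of-range rejects, should equal 'accepted'.
print(from_dtb + from_cts - counts["out-of-range"] == counts["accepted"])
```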

Gridding:
IDL> quick_interp_tdm2,1901,2006,'dtrglo/dtr.',750,gs=0.5,pts_prefix='dtrtxt/dtr.',dumpglo='dumpglo'

Convert from .glo:
crua6[/cru/cruts/version_3_0/primaries/dtr] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.dtr
Enter a name for the gridded climatology file: clim.6190.lan.dtr.grid
Enter the path and stem of the .glo files: dtrglo/dtr.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files:
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
dtr.01.1901.glo
(etc)
dtr.12.2006.glo
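
The 'Addition or Percentage' choice in glo2abs comes down to how an anomaly is put back onto the climatology. The sketch below is a guess at the two modes, not the program's code: the percentage formula (anomaly taken as a percentage of the normal) is an assumption, as is the function name.

```python
def to_absolute(anomaly, normal, mode="A"):
    """Rebuild an absolute value from an anomaly and its climatological normal."""
    if mode == "A":                       # additive anomalies, as chosen for DTR
        return normal + anomaly
    if mode == "P":                       # assumed: anomaly is a percentage of the normal
        return normal * (1 + anomaly / 100)
    raise ValueError("mode must be 'A' or 'P'")

print(to_absolute(1.5, 10.0, "A"))
print(to_absolute(-20, 50.0, "P"))
```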

Finally, gridding:
Writing cru_ts_3_00.1901.1910.dtr.dat
Writing cru_ts_3_00.1911.1920.dtr.dat
Writing cru_ts_3_00.1921.1930.dtr.dat
Writing cru_ts_3_00.1931.1940.dtr.dat
Writing cru_ts_3_00.1941.1950.dtr.dat
Writing cru_ts_3_00.1951.1960.dtr.dat
Writing cru_ts_3_00.1961.1970.dtr.dat
Writing cru_ts_3_00.1971.1980.dtr.dat
Writing cru_ts_3_00.1981.1990.dtr.dat
Writing cru_ts_3_00.1991.2000.dtr.dat
Writing cru_ts_3_00.2001.2006.dtr.dat
Writing cru_ts_3_00.1901.2006.dtr.dat

35. Onto the secondaries, working from the rerun methodology (see section 20 above).

Began with temperature, using the anomaly txt files from the half-degree generation:

IDL> quick_interp_tdm2,1901,2006,'tmpbin/tmpbin',1200,gs=2.5,dumpbin='dumpbin',pts_prefix='tmp0km0705101334txt/tmp.'

This produced binaries such as ‘tmpbin1901’.

Then precipitation:

IDL> quick_interp_tdm2,1901,2006,'prebin/prebin',450,gs=2.5,dumpbin='dumpbin',pts_prefix='pre0km0612181221txt/pre.'

Finally, dtr:

IDL> quick_interp_tdm2,1901,2006,'dtrbin/dtrbin',50,gs=2.5,dumpbin='dumpbin',pts_prefix='dtrtxt/dtr.'

*** EEEK! Is that ‘50’ a mistype? Meaning that anything using binary DTR will need re-doing? (RAL, Dec 07) ***

And so to the synthetics.

FRS:

IDL> .compile /cru/cruts/fromdpe1a/code/idl/pro/rdbin.pro
% Compiled module: RDBIN.
IDL> .compile /cru/cruts/fromdpe1a/code/idl/pro/frs_gts_tdm.pro
% Compiled module: FRS_GTS.
IDL> frs_gts,dtr_prefix='dtrbin/dtrbin',tmp_prefix='tmpbin/tmpbin',1901,2006,outprefix='frssyn/frssyn'
IDL> quick_interp_tdm2,1901,2006,'frsgrid/frsgrid',750,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='frssyn/frssyn'

crua6[/cru/cruts/version_3_0/secondaries/frs] ../glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.frs
Enter a name for the gridded climatology file: clim.6190.lan.frs.grid
Enter the path and stem of the .glo files: frsgrid/frsgrid.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files:
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
frsgrid.01.1901.glo
(etc)
frsgrid.12.2006.glo

crua6[/cru/cruts/version_3_0/secondaries/frs] ../mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: frsgridabs/frsgrid.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.frs.dat
Writing cru_ts_3_00.1901.1910.frs.dat
Writing cru_ts_3_00.1911.1920.frs.dat
Writing cru_ts_3_00.1921.1930.frs.dat
Writing cru_ts_3_00.1931.1940.frs.dat
Writing cru_ts_3_00.1941.1950.frs.dat
Writing cru_ts_3_00.1951.1960.frs.dat
Writing cru_ts_3_00.1961.1970.frs.dat
Writing cru_ts_3_00.1971.1980.frs.dat
Writing cru_ts_3_00.1981.1990.frs.dat
Writing cru_ts_3_00.1991.2000.frs.dat
Writing cru_ts_3_00.2001.2006.frs.dat
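
The decadal file names follow directly from the SSSS/EEEE template. A small sketch of that substitution (the function name is made up; this is not the mergegrids source):

```python
def decadal_names(template, start, end):
    """Expand SSSS/EEEE in template into one name per ten-year block."""
    names = []
    first = start
    while first <= end:
        last = min(first + 9, end)        # final block may be shorter
        names.append(template.replace("SSSS", str(first))
                             .replace("EEEE", str(last)))
        first = last + 1
    return names

for name in decadal_names("cru_ts_3_00.SSSS.EEEE.frs.dat", 1901, 2006):
    print("Writing", name)
```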

RD0:

IDL> .compile /cru/cruts/fromdpe1a/code/idl/pro/rdbin.pro
% Compiled module: RDBIN.
IDL> .compile /cru/cruts/fromdpe1a/code/idl/pro/rd0_gts_tdm.pro
% Compiled module: RD0_GTS.
IDL> rd0_gts,1901,2006,1961,1990,outprefix='rd0syn/rd0syn',pre_prefix='prebin/prebin'
Reading precip and rd0 normals
% Compiled module: STRIP.
yes
filesize=     6220800
gridsize=     0.500000
% Compiled module: DEFXYZ.
yes
filesize=     6220800
gridsize=     0.500000
% Compiled module: DAYS.
Calculating synthetic Rd0 normal
1961
yes
filesize=      248832
gridsize=      2.50000
% Compiled module: RD0CAL.
1962
yes

(etc)

2006
yes
filesize=      248832
gridsize=      2.50000
% Program caused arithmetic error: Floating divide by 0
% Program caused arithmetic error: Floating illegal operand
IDL>

(as before, see section 20.)

IDL> quick_interp_tdm2,1901,2006,'rd0grid/rd0grid',450,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='rd0syn/rd0syn'

crua6[/cru/cruts/version_3_0/secondaries/rd0] ../glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: forrtl: error (69): process interrupted (SIGINT)
crua6[/cru/cruts/version_3_0/secondaries/rd0] mkdir rd0gridabs
crua6[/cru/cruts/version_3_0/secondaries/rd0] ../glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: clim.6190.lan.wet.grid
Enter the path and stem of the .glo files: rd0grid/rd0grid.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: rd0gridabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
rd0grid.01.1901.glo
(etc)
rd0grid.12.2006.glo

crua6[/cru/cruts/version_3_0/secondaries/rd0] ../mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: rd0gridabs/rd0grid.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.rd0.dat
Writing cru_ts_3_00.1901.1910.rd0.dat
(etc)

I have to admit, I still don’t understand secondary parameter generation. I’ve read the papers, and the
minuscule amount of ‘Read Me’ documentation, and it just doesn’t make sense. In particular, why use 2.5
degree grids of the primaries instead of 0.5? Why deliberately lose spatial resolution, only to have to
reinterpolate later?
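
To put numbers on the resolution loss: the file sizes printed by rd0_gts are consistent with a global grid of 12 monthly fields at 2 bytes per value. The 2-byte figure is an inference from those sizes, not something stated in the log.

```python
def grid_shape(step_degrees):
    """Number of (longitude, latitude) cells on a global regular grid."""
    return round(360 / step_degrees), round(180 / step_degrees)

def file_size(step_degrees, months=12, bytes_per_value=2):
    """Bytes for one year of monthly grids (bytes_per_value is an assumption)."""
    nlon, nlat = grid_shape(step_degrees)
    return nlon * nlat * months * bytes_per_value

for step in (0.5, 2.5):
    nlon, nlat = grid_shape(step)
    print(step, nlon * nlat, file_size(step))
```

So a 2.5-degree grid holds 25 times fewer cells than the 0.5-degree one.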

No matter; on to Vapour Pressure. Here’s the complete output from the initial binary gridding, using dtr and tmp:

IDL> vap_gts_anom,dtr_prefix='dtrbin/dtrbin',tmp_prefix='tmpbin/tmpbin',1901,2006,outprefix='vapsyn/vapsyn',dumpbin=1
% Compiled module: VAP_GTS_ANOM.
% Compiled module: RDBIN.
% Compiled module: STRIP.
% Compiled module: DEFXYZ.
Land,sea:       56016       68400
Calculating tmn normal
% Compiled module: TVAP.
Calculating synthetic vap normal
% Compiled module: ESAT.
Calculating synthetic anomalies
% Compiled module: MOMENT.
1901 vap (x,s2,<<,>>):  1.61250e-05  6.15570e-06    -0.160607     0.222689
% Compiled module: WRBIN.
1902 vap (x,s2,<<,>>): -0.000123188  3.46116e-05    -0.268891    0.0261283
1903 vap (x,s2,<<,>>):  6.86689e-05  4.52675e-06    -0.121429     0.123995
1904 vap (x,s2,<<,>>): -1.30788e-05  1.83887e-05    -0.454975    0.0919596
1905 vap (x,s2,<<,>>):  1.94645e-05  1.32224e-05    -0.408679    0.0498396
1906 vap (x,s2,<<,>>):  3.22279e-05  3.74796e-06    -0.178658    0.0261283
1907 vap (x,s2,<<,>>): -2.56545e-05  1.68228e-05    -0.268768    0.0498040
1908 vap (x,s2,<<,>>):  6.39573e-05  3.49149e-06    -0.173230     0.354836
1909 vap (x,s2,<<,>>):  3.50080e-05  3.21530e-06    -0.201157    0.0261283
1910 vap (x,s2,<<,>>):  3.45249e-05  6.15026e-06    -0.130285     0.144744
1911 vap (x,s2,<<,>>):  3.99470e-05  5.85673e-06    -0.360082    0.0261283
1912 vap (x,s2,<<,>>): -7.91931e-06  1.06891e-05    -0.279282    0.0261283
1913 vap (x,s2,<<,>>):  6.07153e-05  7.10663e-07   -0.0148902    0.0261283
1914 vap (x,s2,<<,>>):  7.22507e-05  2.52354e-06    -0.130205     0.124774
1915 vap (x,s2,<<,>>): -2.11176e-05  1.59592e-05    -0.308456    0.0579963
1916 vap (x,s2,<<,>>): -8.95735e-05  2.41852e-05    -0.247123     0.140438
1917 vap (x,s2,<<,>>): -0.000105104  2.43058e-05    -0.229282     0.282290
1918 vap (x,s2,<<,>>):  1.14711e-05  7.76188e-06    -0.248782    0.0261283
1919 vap (x,s2,<<,>>):  2.51597e-05  5.75406e-06    -0.295303     0.215085
1920 vap (x,s2,<<,>>): -2.78549e-06  1.81183e-05    -0.373193    0.0261283
1921 vap (x,s2,<<,>>):  6.07153e-05  7.10663e-07   -0.0148902    0.0261283
1922 vap (x,s2,<<,>>): -1.86602e-05  1.22345e-05    -0.275667    0.0261283
1923 vap (x,s2,<<,>>):  5.76800e-05  1.22728e-06    -0.170021    0.0261283
1924 vap (x,s2,<<,>>):  6.07153e-05  7.10663e-07   -0.0148902    0.0261283
1925 vap (x,s2,<<,>>):  8.32519e-05  5.55618e-06    -0.109315     0.186182
1926 vap (x,s2,<<,>>):  0.000106602  5.15263e-06    -0.105764     0.206929
1927 vap (x,s2,<<,>>):  5.23023e-05  2.64333e-06    -0.194649    0.0498040
1928 vap (x,s2,<<,>>):  5.50934e-05  2.47944e-06    -0.314917    0.0261283
1929 vap (x,s2,<<,>>): -0.000524952  0.000155755    -0.417342     0.215959
1930 vap (x,s2,<<,>>):  8.28323e-05  1.87314e-05    -0.328074     0.193805
1931 vap (x,s2,<<,>>): -7.80687e-05  3.63543e-05    -0.315060     0.215417
1932 vap (x,s2,<<,>>):  5.62579e-05  3.81547e-06    -0.249130     0.120583
1933 vap (x,s2,<<,>>): -3.47433e-05  1.69009e-05    -0.218800     0.148224
1934 vap (x,s2,<<,>>):  0.000156604  1.56121e-05    -0.173230     0.152809
1935 vap (x,s2,<<,>>):  6.69520e-05  4.91451e-06    -0.160529     0.120391
1936 vap (x,s2,<<,>>): -0.000255663  6.63373e-05    -0.398866    0.0261283
1937 vap (x,s2,<<,>>):  6.99402e-05  2.70766e-05    -0.328074     0.201202
1938 vap (x,s2,<<,>>):  5.91796e-05  6.70722e-06    -0.215017     0.155977
1939 vap (x,s2,<<,>>):  4.88266e-05  5.25789e-06    -0.173294    0.0893239
1940 vap (x,s2,<<,>>):  9.63896e-06  7.45103e-06    -0.214763    0.0758103
1941 vap (x,s2,<<,>>):  4.11127e-05  4.15525e-06    -0.234030    0.0261283
1942 vap (x,s2,<<,>>): -9.97969e-05  3.88466e-05    -0.288682     0.148893
1943 vap (x,s2,<<,>>):  8.38607e-05  3.48416e-06   -0.0148902     0.163562
1944 vap (x,s2,<<,>>):  7.96681e-05  7.91305e-06    -0.227413     0.104055
1945 vap (x,s2,<<,>>):  3.37215e-05  3.99524e-06    -0.248782    0.0261283
1946 vap (x,s2,<<,>>):  5.31976e-05  2.63755e-06    -0.128263     0.163584
1947 vap (x,s2,<<,>>):  0.000131113  1.66296e-05    -0.353903     0.193758
1948 vap (x,s2,<<,>>):  6.80941e-05  1.62353e-06   -0.0148902     0.163624
1949 vap (x,s2,<<,>>):  2.47925e-05  2.45819e-05    -0.328074     0.237848
1950 vap (x,s2,<<,>>): -9.57348e-05  7.78468e-05    -0.366764     0.726541
1951 vap (x,s2,<<,>>): -6.54446e-06  1.35656e-05    -0.446058    0.0261283
1952 vap (x,s2,<<,>>): -0.000158974  5.02732e-05    -0.262313     0.193617
1953 vap (x,s2,<<,>>):  1.18525e-05  4.22691e-05    -0.282204     0.230629
1954 vap (x,s2,<<,>>): -0.000151975  6.78713e-05    -0.373235     0.230602
1955 vap (x,s2,<<,>>): -0.000134153  5.23124e-05    -0.298578    0.0841820
1956 vap (x,s2,<<,>>): -9.61671e-05  5.20484e-05    -0.492004    0.0888951
1957 vap (x,s2,<<,>>): -1.18048e-05  1.31769e-05    -0.220902    0.0261283
1958 vap (x,s2,<<,>>): -8.61762e-06  1.12079e-05    -0.207799     0.148170
1959 vap (x,s2,<<,>>):  8.27399e-05  4.88857e-06   -0.0929929     0.170919
1960 vap (x,s2,<<,>>):  3.38773e-05  1.53901e-05    -0.207944     0.155940
1961 vap (x,s2,<<,>>):  5.72571e-05  9.01807e-07   -0.0653905    0.0261283
1962 vap (x,s2,<<,>>):  8.20891e-05  3.78016e-06    -0.240435     0.126662
1963 vap (x,s2,<<,>>): -0.000108489  3.85148e-05    -0.266356    0.0836364
1964 vap (x,s2,<<,>>):  3.02043e-05  6.37207e-06    -0.240547     0.150816
1965 vap (x,s2,<<,>>):  5.76898e-05  2.48022e-06    -0.279282     0.143283
1966 vap (x,s2,<<,>>): -0.000300312  5.32054e-05    -0.622719    0.0261283
1967 vap (x,s2,<<,>>):  6.43500e-05  8.58218e-07   -0.0148902    0.0496181
1968 vap (x,s2,<<,>>): -0.000241750  4.22773e-05    -0.214442     0.271730
1969 vap (x,s2,<<,>>): -0.000568502  9.92260e-05    -0.385322    0.0732047
1970 vap (x,s2,<<,>>):  6.07153e-05  7.10663e-07   -0.0148902    0.0261283
1971 vap (x,s2,<<,>>):  2.15333e-05  4.77100e-06    -0.188071    0.0261283
1972 vap (x,s2,<<,>>): -7.14160e-05  3.56948e-05    -0.365803     0.201611
1973 vap (x,s2,<<,>>):  5.77503e-05  1.17079e-06    -0.160550    0.0261283
1974 vap (x,s2,<<,>>):  3.49354e-05  4.93069e-06    -0.149678     0.144313
1975 vap (x,s2,<<,>>):  6.14429e-05  7.36204e-07   -0.0148902    0.0380432
1976 vap (x,s2,<<,>>):  6.49657e-05  3.25410e-06    -0.266356     0.165472
1977 vap (x,s2,<<,>>):  0.000107180  1.92804e-05    -0.304625     0.208459
1978 vap (x,s2,<<,>>): -4.80106e-05  3.28909e-05    -0.285492     0.105108
1979 vap (x,s2,<<,>>): -0.000102001  2.35900e-05    -0.214390     0.112952
1980 vap (x,s2,<<,>>):  4.16963e-05  2.70211e-06    -0.144913    0.0864268
1981 vap (x,s2,<<,>>):  0.000274196  1.86668e-05   -0.0148902     0.222522
1982 vap (x,s2,<<,>>):  8.57426e-07  7.08135e-06    -0.161781    0.0831981
1983 vap (x,s2,<<,>>): -5.84499e-06  1.76470e-05    -0.234194     0.128289
1984 vap (x,s2,<<,>>): -0.000106476  2.97454e-05    -0.335850     0.150833
1985 vap (x,s2,<<,>>):  9.32757e-06  4.35533e-05    -0.323331     0.222522
1986 vap (x,s2,<<,>>):  7.22110e-05  4.76179e-06    -0.141725     0.185658
1987 vap (x,s2,<<,>>): -2.27107e-05  2.09631e-05    -0.291446     0.103599
1988 vap (x,s2,<<,>>):  6.58090e-05  9.21014e-07   -0.0148902    0.0670816
1989 vap (x,s2,<<,>>):  9.54406e-05  1.72599e-05    -0.266297     0.160293
1990 vap (x,s2,<<,>>):  0.000218826  3.56583e-05    -0.174187     0.236204
1991 vap (x,s2,<<,>>):  5.93288e-05  8.18618e-07   -0.0776650    0.0261283
1992 vap (x,s2,<<,>>):  7.57687e-05  4.27091e-06    -0.174292     0.215085
1993 vap (x,s2,<<,>>): -1.69378e-05  2.36942e-05    -0.314882    0.0420169
1994 vap (x,s2,<<,>>):  6.36348e-05  1.18760e-06   -0.0148902     0.163543
1995 vap (x,s2,<<,>>):  0.000281573  6.09912e-05    -0.463574     0.259426
1996 vap (x,s2,<<,>>):  5.03362e-05  5.47691e-06    -0.224751     0.124774
1997 vap (x,s2,<<,>>):  0.000132649  2.97693e-05    -0.446455     0.281070
1998 vap (x,s2,<<,>>):  5.96544e-07  3.39098e-05    -0.359037     0.201228
1999 vap (x,s2,<<,>>):  5.91499e-05  2.37232e-06    -0.166206     0.215985
2000 vap (x,s2,<<,>>):  4.06034e-05  4.61604e-06   -0.0898572     0.191977
2001 vap (x,s2,<<,>>):  0.000138230  8.53512e-06   -0.0512625     0.206929
2002 vap (x,s2,<<,>>):  0.000218003  4.36873e-05    -0.760830     0.282290
2003 vap (x,s2,<<,>>):  7.00864e-05  7.67472e-06    -0.301868     0.237875
2004 vap (x,s2,<<,>>):  5.49200e-06  2.13246e-05    -0.500544     0.112129
2005 vap (x,s2,<<,>>):  6.05939e-06  5.83817e-05    -0.885566     0.199814
2006 vap (x,s2,<<,>>):  9.02885e-05  3.60834e-05    -0.455230     0.607388

How very useful! No idea what any of that means, although it’s heartwarming to see that it’s
nothing like the results of the 2.10 rerun, where 1991 looked like this:

1991 vap (x,s2,<<,>>):  0.000493031  0.000742087   -0.0595093      1.86497

Now, of course, it looks like this:

1991 vap (x,s2,<<,>>):  5.93288e-05  8.18618e-07   -0.0776650    0.0261283

From this I can deduce.. err.. umm..
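
For what it's worth, one plausible reading of the four columns (x,s2,<<,>>) is mean, variance, minimum and maximum of each year's anomaly field. That is a guess, not something the log confirms; it is suggested only by the run compiling IDL's MOMENT routine, whose first two results are the mean and the sample variance. A minimal sketch of such a summary, with a hypothetical function name and an assumed missing-value code:

```python
import statistics


def summarise(values, missing=-9999.0):
    """Return (mean, sample variance, min, max), ignoring missing values.

    The column meanings and the missing code are assumptions made for
    illustration only.
    """
    good = [v for v in values if v != missing]
    return (statistics.fmean(good),
            statistics.variance(good),  # divides by N-1
            min(good),
            max(good))
```

On that reading, the 2.10 rerun's 1991 line differs mainly in its variance and maximum, which would point at a few extreme cells rather than a shift in the whole field.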

Anyway, now I need to use whatever VAP station data we have. And here I’m a little flaky (again):
the vap database hasn’t been updated, and is it going to be? Asked Dave L and he supplied summaries
he’d produced of CLIMAT bulletins from 2000-2006. Slightly odd format but very useful all the
same.

And now, a brief interlude. As we’ve reached the stage of thinking about secondary variables, I
wondered about the CLIMAT updates, as one of the outstanding work items is to write routines to
convert CLIMAT and MCDW bulletins to CRU format (so that mergedb.for can read them). So I look at
a CLIMAT bulletin, and what’s the first thing I notice? It’s that there is absolutely no station
identification information apart from the WMO code. None. No lat/lon, no name, no country. Which
means that all the bells and whistles I built into mergedb, (though they were needed for the db
merging of course) are surplus to requirements. The data must simply be added to whichever station
has the same number at the start, and there’s no way to check it’s right. I don’t appear to have a
copy of a MCDW bulletin yet, only a PDF.. I wonder if that’s the same? Anyway, back to the main job.

As I was examining the vap database, I noticed there was a ‘wet’ database. Could I not use that to
assist with rd0 generation? Well.. it’s not documented, but then, none of the process is so I might
as well bluff my way into it! Units seem to vary:

CLIMAT bulletins have day counts:

SURFACE LAND ‘CLIMAT’ DATA FOR  2006/10.   MISSING DATA=-32768
MET OFFICE, HADLEY CENTRE CROWN COPYRIGHT
WMO BLK WMO STN   STNLP    MSLP    TEMP   VAP P DAYS RN  RAIN R   QUINT SUN HRS   SUN %   MIN_T   MAX_T
01     001   10152   10164      5     52      9       63        2   -32768  -32768     -12      20

Dave L’s CLIMAT update has days x 10:

100100 7093  -867    9JAN MAYEN(NOR-NAVY) NORWAY       20002006        -7777777
2000  150  120  180   60  150   20   30  130  120  150   70   70

The existing ‘wet’ database (wet.0311061611.dtb) has days x 100:

10010  7093   -866    9 JAN MAYEN(NOR NAVY)  NORWAY        1990 2003   -999     -999
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1990-9999-9999-9999-9999  400  600  600 1800 1500 1100  800 1800

The published climatology has days x 100 as well:

Tyndall Centre grim file created on 13.01.2004 at 15:22 by Dr. Tim Mitchell
.wet = wet day frequency (days)
0.5deg lan clim:1961-90 MarkNew but adj so that wet=<pre
[Long=-180.00, 180.00] [Lati= -90.00,  90.00] [Grid X,Y= 720, 360]
[Boxes=   67420] [Years=1975-1975] [Multi=    0.0100] [Missing=-999]
Grid-ref=   1, 148
1760 1580 1790 1270  890  510  470  290  430  400  590 1160

So I guess we go with days x100. Dave’s files will have to be reformatted anyway so it’s a
negligible overhead. Okaaaay..
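
The unit juggling above boils down to one multiplier per source. A minimal sketch follows; the source labels and function name are hypothetical, while the factors and missing codes are taken from the samples quoted above (CLIMAT in whole days with -32768 for missing, Dave L's composites in days x10, the wet database and climatology in days x100 with -9999 for missing).

```python
MISSING = -9999  # missing-value code used in the CRU-format data lines

# Multiplier needed to bring each source to days x 100
FACTOR_TO_X100 = {
    "climat": 100,  # raw CLIMAT bulletin: whole days
    "dave": 10,     # Dave L's composites: days x 10
    "wet_db": 1,    # existing wet database: already days x 100
}


def to_days_x100(value, source, missing=MISSING):
    """Convert a wet-day count to days x 100, preserving missing values."""
    if value == missing:
        return MISSING
    return value * FACTOR_TO_X100[source]
```

For example, the 9 rain days in the CLIMAT sample line become 900, and Dave L's 150 becomes 1500.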

Wrote dave2cru.for to convert Dave L’s CLIMAT composites to CRU-format files in the appropriate
units. One problem is the significant number of stations without names or countries: they are
simply ‘xxxxxxxxxx’ and I’m not sure how mergedb is going to take to that! Well only one way to
find out.. so I converted the rain days data:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db] ./dave2cru

DAVE2CRU – convert Dave L CLIMAT composites to dtb files
Enter the CLIMAT composite to be converted: CLIMAT_MCDW_MCDW_rdy_updat_merged

Example data line from that file:
2000  150  120  180   60  150   20   30  130  120  150   70   70

Please enter a factor to apply (or 1): 10
Please enter the 3-ch parameter code: rd0

The output file will be: rd0.0708151122.dtb

3411 stations written.
<END QUOTE>

Then tried to merge that into wet.0311061611.dtb, and immediately hit formatting issues – that pesky last
field has been badly abused here, taking values including:

-999.00
0.00
nocode     (yes, really!)

Had a quick review of mergedb; it won’t be trivial to update it to treat that field as a8. So reluctantly,
changed all the ‘nocode’ entries to ‘0’:

crua6[/cru/cruts/version_3_0/db/rd0] perl -pi -e 's/nocode/     0/g' wet.0311061611.dt*

Unfortunately, that didn’t solve the problems.. as there are alphanumerics in that field later on:

-712356  5492 -11782  665 SPRING CRK WOLVERINE CANADA        1969 1988   -999  307F0P9

So.. ***sigh***.. will have to alter mergedb.for to treat that field as alpha. Aaarrgghhh.
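
Treating the field as alpha amounts to reading it as text first and only then deciding whether it is a number. A minimal sketch, assuming the last whitespace-separated token of a header line is the field in question (the real mergedb.for presumably reads fixed columns, which the log does not spell out); both function names are hypothetical.

```python
def last_header_field(header_line):
    """Return the final field of a station header line as raw text."""
    return header_line.split()[-1]


def as_number_or_none(field):
    """Return the field as a float, or None if it is not numeric."""
    try:
        return float(field)
    except ValueError:
        return None
```

With the SPRING CRK WOLVERINE header above, the field comes back as the text 307F0P9 and the numeric conversion yields None instead of a read error.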

Did that. Next problem is best summarised with an example:

**************************************************
*                                                *
*  OPERATOR DECISION REQUIRED:                   *
*                                                *
100100  7093   -867    9 JAN MAYEN(NOR-NAVY)  NORWAY        2000 2006    -999       0
*                                                *
*  This incoming station has a possible match in *
*  the current database, but either the WMO code *
*  or the lat/lon values differ.                 *
*                                                *
*  Incoming:                                     *
100100  7093   -867    9 JAN MAYEN(NOR-NAVY)  NORWAY        2000 2006    -999       0
*  Potential match:                              *
10010  7093   -866    9 JAN MAYEN(NOR NAVY)  NORWAY        1990 2003    -999    -999

Yes, the ‘wet’ database features old-style 5-digit WMO codes. The best approach is probably to alter
mergedb again, to multiply any 5-digit codes by 10. Not sure if there is a similar problem with 7-digit
codes, hopefully not.

Oh, more bloody delays. Modified mergedb to ‘adjust’ the WMO codes, fine. But then a proper run of it
just demonstrated that it’s far too picky. Even a 0.01-degree difference in coordinates required ops
intervention. What we need for updates is an absolute priority for WMO codes, and only a shout if the
name or the spatial coordinates are waaay off. I am seriously considering scrapping mergedb in favour of
a version of auminmaxresync – its cloud-based approach and ‘intelligent’ matching is far more efficient
than mergedb’s brute-force attack, as you’d expect from a program built on top of that knowledge. And it
does save all its actions. But I don’t know that I have the wherewithal.. okay, I do.
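
The policy wanted here (WMO code first, and a shout only when the coordinates are way off) can be sketched as follows. This is not newmergedb's logic: the function name, the tuple layout and the 0.5-degree tolerance are all hypothetical, and the headers are assumed to hold latitude and longitude in hundredths of a degree.

```python
def pairing_decision(master, update, max_offset_deg=0.5):
    """Decide what to do with a (wmo, lat_x100, lon_x100) pair of headers.

    Returns "no-match" if the WMO codes differ, "ask" if they agree but the
    coordinates are far apart, and "merge" otherwise. Longitude wrap-around
    at +/-180 is ignored in this sketch.
    """
    m_wmo, m_lat, m_lon = master
    u_wmo, u_lat, u_lon = update
    if m_wmo != u_wmo:
        return "no-match"
    dlat = abs(m_lat - u_lat) / 100.0
    dlon = abs(m_lon - u_lon) / 100.0
    if dlat > max_offset_deg or dlon > max_offset_deg:
        return "ask"
    return "merge"
```

Under this rule a 0.01-degree wobble merges silently, while a same-code pair on different continents still stops for the operator.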

Derived newmergedb.for from auminmaxresync.for. Should be fairly robust. Doesn’t offer as many bells
and whistles as mergedb.for, but should be faster and more helpful all the same.

Well.. it works.. but the data doesn’t. It’s that old devil called WMO numbering again:

Comparing Update:  718000  4868    622  217 NANCY/ESSEY          FRANCE        2001 2002    -999       0
..with Master:  718000  4665  -5306   28 CAPE RACE (MARS)     CANADA        1920 1969   -999     -999

Now what’s happened here? Well the CLIMAT numbering only gives five digits (71 800) and so an extra zero
has been added to bring it up to six. Unfortunately, that’s the wrong thing to do, because that’s the code
of CAPE RACE. The six-digit code for NANCY/ESSEY is 071800. Mailed Phil and DL as this could be a big
problem – many of the Update stations have no other metadata!
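
One way to see how the clash arises: if codes are held as integers, the leading zero of a block number vanishes, and two different padding conventions can then land on the same value. The sketch below keeps the code as a zero-padded string built from the block and station numbers that CLIMAT bulletins list separately; the function name is hypothetical, and the exact padding steps that produced the two 718000 entries are my reading of the example, not something the log states outright.

```python
def climat_code(block, station):
    """Build the five-character WMO code, keeping any leading zero."""
    return f"{block:02d}{station:03d}"


nancy = climat_code(7, 180)       # "07180"
cape_race = climat_code(71, 800)  # "71800"

# As text the two stations stay distinct; as integers, padding one with two
# zeros and the other with one zero collides on 718000.
collide = int(nancy + "00") == int(cape_race + "0")
```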

Also noticed that some of the CLIMAT data seemed to be missing, eg for NANCY/ESSEY:

718000 4868   622  217NANCY/ESSEY         FRANCE       20002006        -7777777
2000-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2001-9999  110-9999-9999-9999-9999-9999  120  150  110  130   90
2002   80  160   70   70   80   30   60  120  100  130  180  140
2003-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2004-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2005-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
2006-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

I have the CLIMAT bulletin for 10/2006, which gives data for Rain Days (12 in this case). It doesn’t seem
likely that nothing was reported after 2002.

I am now wondering whether it would be best to go back to the MCDW and CLIMAT bulletins themselves and work
directly from those.

Well, information is always useful. And I probably did know this once.. long ago. All official WMO codes
are five digits, countrycountrystationstationstation. However, we use seven-digit codes, because when no
official code is available we improvise with two extra digits. Now I can’t see why we didn’t leave the rest
at five digits, that would have been clear. I also can’t see why, if we had to make them all seven digits,
we extended the ‘legitimate’ five-digit codes by multiplying by 100, instead of adding two numerically-
meaningless zeros at the most significant (left) end. But, that’s what happened, and like everything else
that’s the way it’s staying.

So – incoming stations with WMO codes can only match stations with codes ending ‘00’. Put another way, for
comparison purposes any 7-digit codes ending ‘00’ should be truncated to five digits.
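
That comparison rule is small enough to write down directly. A minimal sketch with a hypothetical function name, treating codes as integers in the seven-digit convention described above:

```python
def official_wmo(code7):
    """Return the official five-digit WMO code for a seven-digit code.

    Codes ending in 00 are official codes multiplied by 100; anything else
    is an improvised code with no official counterpart, so None is returned.
    """
    if code7 % 100 == 0:
        return code7 // 100
    return None
```

So the MURMANSK update code 2211300 compares as 22113, while a code such as 2211301 (made up here for illustration) can never match an incoming official station.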

Also got the locations of the original CLIMAT and MCDW bulletins.

CLIMAT are here:

http://hadobs.metoffice.com/crutem3/data/station_updates/

MCDW are here:
ftp://ftp1.ncdc.noaa.gov/pub/data/mcdw

http://www1.ncdc.noaa.gov/pub/data/mcdw/

Downloaded all CLIMAT and MCDW bulletins (CLIMAT 01/2003 to 07/2007; MCDW 01/2003 to 06/2007 (with a
mysterious extra called ‘ssm0302.Apr211542’ – which turns out to be identical to ssm0302.fin)).

Wrote mcdw2cru.for and climat2cru.for, just guess what they do, go on..

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru

MCDW2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest MCDW file: ssm0301.fin
Enter the latest MCDW file (or <ret> for single files): ssm0706.fin

All Files Processed
tmp.0709071541.dtb: 2407 stations written
vap.0709071541.dtb: 2398 stations written
pre.0709071541.dtb: 2407 stations written
sun.0709071541.dtb: 1693 stations written

Thanks for playing! Byeee!
<END QUOTE>

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru

CLIMAT2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest CLIMAT file: climat_data_200301.txt
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt

All Files Processed
tmp.0709071547.dtb: 2881 stations written
vap.0709071547.dtb: 2870 stations written
pre.0709071547.dtb: 2878 stations written
sun.0709071547.dtb: 2020 stations written
tmn.0709071547.dtb: 2800 stations written
tmx.0709071547.dtb: 2800 stations written

Thanks for playing! Byeee!
<END QUOTE>

Of course, it wasn’t quite that simple. MCDW has an inexplicably complex format, which I’m sure will vary
over time and eventually break the converter. For instance, most text is left-justified, except the month
names for the overdue data, which are right-justified. Also, there is no missing value code, just blank
space if a value is absent. This necessitates reading everything as strings and then testing for content.
Oh, and a small amount of rain is marked ‘T’.. as are small departures from the mean!!
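
The read-as-string-then-test approach looks roughly like this. It is a sketch, not mcdw2cru.for: the function name and the scale argument are hypothetical, and mapping a trace ‘T’ to zero is an assumption (the log does not say what the converter does with it).

```python
MISSING = -9999  # CRU-format missing-value code


def parse_mcdw_field(text, scale=1):
    """Convert one MCDW text field to a scaled integer.

    Blank means missing, 'T' means trace (assumed here to become 0),
    anything else is read as a number and multiplied by scale.
    """
    s = text.strip()  # copes with left- and right-justified fields alike
    if not s:
        return MISSING
    if s.upper() == "T":
        return 0
    return round(float(s) * scale)
```

Because ‘T’ is also used for small departures from the mean, a real converter would need to know which column it is reading before applying the trace rule.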

So moan over, now we have a set of updates for the secondary databases. And, indeed for the primary ones -
except that I’ve already processed those, as updated by Dave L.. er.. ah well. So as I’m running stupidly
late anyway – why not find out? It’s that Imp of the Perverse on my shoulder again.

Actually as I examined all the databases in the tree to work out what was wheat and what chaff, I had my
awful memory jogged quite nastily: WE NEED RAIN DAYS. So both conversion progs will need adjusting and
re-running!! Waaaaah! And frankly at 18:45 on a Friday evening.. it’s not gonna happen right now.

..okay, another week, another razorblade to slide down. Modified mcdw2cru to include rain days:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru

MCDW2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest MCDW file: ssm0301.fin
Enter the latest MCDW file (or <ret> for single files): ssm0706.fin

All Files Processed
tmp.0709111032.dtb: 2407 stations written
vap.0709111032.dtb: 2398 stations written
rdy.0709111032.dtb: 2407 stations written
pre.0709111032.dtb: 2407 stations written
sun.0709111032.dtb: 1693 stations written

Thanks for playing! Byeee!
<END QUOTE>

Checked, and the four preexisting databases match perfectly with their counterparts, so I didn’t break
anything in the adjustments. And the rdy file looks good too (actually the above is the *final* run;
there were numerous bugs as per).

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru

CLIMAT2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest CLIMAT file: climat_data_200301.txt
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt

All Files Processed
tmp.0709101706.dtb: 2881 stations written
vap.0709101706.dtb: 2870 stations written
rdy.0709101706.dtb: 2876 stations written
pre.0709101706.dtb: 2878 stations written
sun.0709101706.dtb: 2020 stations written
tmn.0709101706.dtb: 2800 stations written
tmx.0709101706.dtb: 2800 stations written

Thanks for playing! Byeee!
<END QUOTE>

Again, existing outputs are unchanged and the new rdy file looks OK (though see bracketed note above for MCDW).

So.. to the incorporation of these updates into the secondary databases. Oh, my.

Beginning with Rain Days, known variously as rd0, rdy, pdy.. this allowed me to modify newmergedb.for to cope
with various ‘freedoms’ enjoyed by the existing databases (such as six-digit WMO codes). And then, when run,
an unexpected side-effect of my flash correlation display thingy: it shows up existing problems with the data!

Here is the first ‘issue’ encountered by newmergedb, taken from the top and with my comments in <anglebrackets>:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
Should the incoming ‘update’ header info and data take precedence over the existing database?
Or even vice-versa? This will significantly reduce user decisions later, but is a big step!

Enter ‘U’ to give Updates precedence, ‘M’ to give Masters precedence, ‘X’ for equality: U
Please enter the Master Database name: wet.0311061611.dtb
Please enter the Update Database name: rdy.0709111032.dtb

Reading in both databases..
Master database stations:     4988
Update database stations:     2407

Looking for WMO code matches..

***** OPERATOR ADJUDICATION REQUIRED *****

In attempting to pair two stations, possible data incompatibilities have been found.

MASTER:  221130  6896   3305   51 MURMANSK             EX USSR       1936 2003   -999     -999
UPDATE: 2211300  6858   3303   51 MURMANSK             RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter ‘C’ for more information):
> -0.60 is minimum correlation coeff.
>  0.65 is maximum correlation coeff.
> -0.01 is mean correlation coeff.

Enter ‘Y’ to allow, ‘N’ to deny, or an information code letter: C

<OKAY – SO I’VE REQUESTED A DISPLAY OF THE LAGGED CORRELATIONS>

Master Data: Correlation with Update first year aligned to this year -v
1936  900  600 1000  800 1000  900 1300 1700 2100 1800  900 1000    0.27
1937  300 1400 1300  800 1400 1800  500 1200 1600 1000 1100 1500    0.15
1938  900 1000 1500 1800 1200 1500 1200 1700  500  700 1600  700   -0.13
1939 1500 1300 1100 1400 1200 1200 1000 1300 1800 1600 1100 1300    0.24
1940 1000 1500 1000 1200 1100 1700 2600 1500 1500 1400 1700 1100    0.15
1941 1800 1200 1000 1200  900 1100  900 1200 1900 1500 1000 1400    0.48
1942  900  900 1700  900 1600 1000  600 1100 1400 1300  700  700    0.51
1943  800 1000 1000 1300  900  800 1500 1600 1400 1500 1300 1200    0.44
1944 1000  400  900  800 1200  600  900 2000  900 1100 1000  900    0.32
1945  500  400  700  700  800 1800  900 1100 1200 1100 1300  700    0.19
1946 1200 1200  100  700  900 1200  400  900  800 1900 1300 1400    0.16
1947  900 1300 1300 1100 1600 1000  800 1400 1400 1700 2100 1900    0.09
1948 1100 1400 1400 1200 1300 1800 1200 1700 1500 2200 2100 1900    0.10
1949 1100 1100  500 1500 1600 1100 1500 1200 2200 2500  900 1600    0.04
1950 1300  800 1000 1100 1700 1200 1500  800 1100 1300 1500 1400   -0.04
1951 1100  600 1400 1400 1500 1600 2100 1300 1500 1700 2000 1700   -0.13
1952 2100  800 1100 1800 1300 1200 2400 2200 1600 1000 1000 2300   -0.23
1953 2100 1400 2100 1500  900  300 1300 1700 1500  800 1200  800   -0.24
1954 2100  600 1300 1000 1300 1700 1600 2000 1800 1300 1400 1200   -0.40
1955 2200 1300  900 1000 1600 2000 1100 1400 1000 2100 2300 1600   -0.20
1956 1300 1100 1300  400 1600 1300  900 1500 2000 1300 2000 1400   -0.30
1957 1700 1600 1100 1100 1900 1900 1400 1600 1400 1700 2300 2600   -0.27
1958 1300 2200 1900  700 1500 1200 2100 1000 1900 1700 1600 1000   -0.21
1959 2500 1800 1300  900  900 1600 1600 1500 2200 1700 1000  900   -0.33
1960 1800 1700 1500  400 1300 1500  400 1000 1300 1500 1000 1400   -0.21
1961 2100 1800 2200 1500  800 1400 1600 1100 1900 1200 1200 2100   -0.59
1962 2100 1100 1000 1500 1300 1100 1300 1700 1200 2000 1600 2300   -0.37
1963 2100 2100 2000 1000  700 2000 1400 1800 1400 1600 2000 2400   -0.56
1964 2400 1100 1000 1700 1100 1400 1400 1400 2000 1200 2100 1800   -0.42
1965 1400 2100 1300 1000 1700 1700 1400 2400 1300 2100 1900 2100   -0.41
1966 1600 1600 2000 2000 1700 1200 2000 2500 2500 2700 1600  600   -0.34
1967 2200 1700 1600 1200 1000 1400 1600 1300 1700 1500 1200 2100   -0.21
1968 1600 1800 1800 1800 1500 1800 1400 2100 1000 2000 2100 2000   -0.28
1969 1100  300 1900 1200 1000 1300 1500 1200 1200 2000 1700  800   -0.25
1970 1900 1400 1200  900  600 1200 1500  700 2300 1700 1700 2100   -0.23
1971 2000 1300 1600 1600 1200 1100 1400 1800 2000 1600 1700 1500   -0.39
1972 1300 1200 1300 1200 1700  800 1400 1800 1900 2000 1700 1600   -0.26
1973 1800 1100 1700  900 1200 1500  500 1800 1200 2000 2100 2100   -0.36
1974 1100 2400  700 1600 1300 1300 1800 2000 1900 1200 1400 2400   -0.29
1975 1500 2200 1400 1700 2500 2200 2300 1600 1700 2300 1800 2600   -0.47
1976 1900  800 1100 1500 1000  900 1300 1800 2200 1600 1400 1600   -0.33
1977 1800 1400 2200 1200 1600 1900 1300 1500 1500 1900 1500 2000   -0.40
1978 1500 1800 1400 2100  700 1000 1100 1900 1700 2300 1500 2200   -0.24
1979 1700 1700 1700 1200 1500 1800  900 1200 1800 1600 1500 2300   -0.39
1980 1900 1300 1300 1000 1400  900  700 1100 1300 1600 2200 1700   -0.36
1981 2600  500 1900 2000  800 1900 1500 2000 1400 1500 1800 1600   -0.46
1982 2200 1800 1100 1600 1500 2200 1800 1400 1700 1700 1900 1400   -0.60
1983 2400 1900 1700 1200  800 1500 1200 2000 1400 2100 2000 2500   -0.23
1984 1900  800 1500 2000 1100 1600 2000 1700 1100 1400 1000 1200
1985-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1986-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1987-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1988-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1989-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999    0.65
1990-9999-9999-9999-9999-9999  500 1300  900  700  900 1300  700    0.62
1991-9999  900  500  300  700 1000 1500  700 1700 1000 1300 1300    0.54
1992  800 1000  600  500  700  900-9999 1300-9999  700  900 1200    0.60
1993  600  900  400  500  900 1500 1000  800  800 1000  400 1000    0.55
1994 1300 1000  300  600  700 1000  900  600 1200    0 1400  600    0.43
1995  900  900  600  700  700  900 1100 1300  600 1800 1300  500    0.61
1996  500 1100  400  700  700 1200 1200 1100 1100  900 1000 1400    0.54
1997 1200  800 1300  600  600  100  500 1100  900-9999 1000  900    0.61
1998 1200 1300  800 1100 1100 1100  800  600 1200 1100  600 1200    0.52
1999  600  400  600 1000  700  700 1800 1400  700 1600  800 1200    0.62
2000 1100  600 1500 1700  900 1500  800  800 1000 1000  600  600    0.40
2001  600  500  700  700  600  500 1200 1200  700 1300  900 1000    0.63
2002 1000  800 1300  200  900 1100 1400 1200 1400 1800 1100  700
2003 1100-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
Update Data:
2003 1100  700  700  500 1000  400  700 1100 1200 2100  800 1900
2004  900  700  600  600 1300 1200 1000 1200 1400  900 1000 1000
2005 1000  400  800 1100  900  600 1200 1000 1600 1000 1300 1200
2006  700  500 1300  400  600 1200 1600  700 1000-9999  600 1500
2007 1400  400  400 1300 1200 1200-9999-9999-9999-9999-9999-9999

<DO YOU SEE? THERE’S THAT OH-SO FAMILIAR BLOCK OF MISSING CODES IN THE LATE 80S,
THEN THE DATA PICKS UP AGAIN. BUT LOOK AT THE CORRELATIONS ON THE RIGHT, ALL
GOOD AFTER THE BREAK, DECIDEDLY DODGY BEFORE IT. THESE ARE TWO DIFFERENT
STATIONS, AREN’T THEY? AAAARRRGGGHHHHHHH!!!!!>

MASTER:  221130  6896   3305   51 MURMANSK             EX USSR       1936 2003   -999     -999
UPDATE: 2211300  6858   3303   51 MURMANSK             RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter ‘C’ for more information):
> -0.60 is minimum correlation coeff.
>  0.65 is maximum correlation coeff.
> -0.01 is mean correlation coeff.

Enter ‘Y’ to allow, ‘N’ to deny, or an information code letter:
<END QUOTE>

So.. should I really go to town (again) and allow the Master database to be ‘fixed’ by this
program? Quite honestly I don’t have time – but it just shows the state our data holdings
have drifted into. Who added those two series together? When? Why? Untraceable, except
anecdotally.

It’s the same story for many other Russian stations, unfortunately – meaning that (probably)
there was a full Russian update that did no data integrity checking at all. I just hope it’s
restricted to Russia!!
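
For readers wondering what the right-hand column in the MURMANSK display is, here is a minimal sketch of a lagged correlation of that kind: the Update series is slid along the Master series, and for each Master year the months that are present in both are correlated. This is an assumption about the method, not newmergedb.for itself; the function names and the minimum-overlap threshold are hypothetical.

```python
import math

MISSING = -9999


def pearson(xs, ys):
    """Pearson correlation, or None if either series has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(master, update, min_pairs=24):
    """Correlate Update with Master for every alignment of its first year.

    master and update map year -> list of 12 monthly values. Returns a dict
    of master start year -> correlation, skipping alignments with fewer than
    min_pairs months present in both series (threshold is an assumption).
    """
    first_update_year = min(update)
    result = {}
    for start in sorted(master):
        xs, ys = [], []
        for u_year, u_row in update.items():
            m_row = master.get(start + (u_year - first_update_year))
            if m_row is None:
                continue
            for m_val, u_val in zip(m_row, u_row):
                if m_val != MISSING and u_val != MISSING:
                    xs.append(m_val)
                    ys.append(u_val)
        if len(xs) >= min_pairs:
            r = pearson(xs, ys)
            if r is not None:
                result[start] = r
    return result
```

A station that is genuinely the same throughout should show broadly similar correlations wherever the Update is aligned, since the seasonal cycle dominates; a step change in the column, as in the display above, suggests the Master record was stitched together from two sources.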

There are, of course, metadata issues too. Take:

<BEGIN QUOTE>
MASTER:  206740  7353   8040   47 DIKSON ISLAND        EX USSR       1936 2003   -999     -999
UPDATE: 2067400  7330   8024   47 OSTROV DIKSON        RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter ‘C’ for more information):
> -0.70 is minimum correlation coeff.
>  0.81 is maximum correlation coeff.
> -0.01 is mean correlation coeff.
<END QUOTE>
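
To put a number on the offset between the two DIKSON headers, a great-circle distance is easy to compute. The sketch below uses the standard haversine formula and assumes the header fields are latitude and longitude in hundredths of a degree (7353 8040 read as 73.53N 80.40E); the function name and the spherical Earth radius are illustrative choices.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# MASTER 7353 8040 versus UPDATE 7330 8024
dikson_offset_km = great_circle_km(73.53, 80.40, 73.30, 80.24)
```

On those assumptions the two positions are roughly 26 km apart. At this latitude a degree of longitude is short, so most of that comes from the 0.23-degree latitude difference and only a few kilometres from the longitude.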

This is pretty obviously the same station (well OK.. apart from the duff early period, but I’ve
got used to that now). But look at the longitude! That’s probably 20km! Luckily I selected
‘Update wins’ and so the metadata aren’t compared. This is still going to take ages, because although
I can match WMO codes (or should be able to), I must check that the data correlate adequately – and
for all these stations there will be questions. I don’t think it would be a good idea to take the
usual approach of coding to avoid the situation, because (a) it will be non-trivial to code for, and
(b) not all of the situations are the same. But I am beginning to wish I could just blindly merge
based on WMO code.. the trouble is that then I’m continuing the approach that created these broken
databases. Look at this one:

<BEGIN QUOTE>
***** OPERATOR ADJUDICATION REQUIRED *****

In attempting to pair two stations, possible data incompatibilities have been found.

MASTER:  239330  6096   6906   40 HANTY MANSIJSK       EX USSR       1936 1984   -999     -999
UPDATE: 2393300  6101   6902   46 HANTY-MANSIJSK       RUSSIAN FEDER 2003 2007    -999       0

CORRELATION STATISTICS (enter ‘C’ for more information):
> -0.42 is minimum correlation coeff.
>  0.39 is maximum correlation coeff.
> -0.02 is mean correlation coeff.

Enter ‘Y’ to allow, ‘N’ to deny, or an information code letter: C
Master Data: Correlation with Update first year aligned to this year -v
1936 1400  800 1700  900 1200  800  700  800 1800-9999-9999-9999    0.33
1937 1400  800  500 1700 1500  800 1200 1000 1700 1300  700 1200    0.32
1938 1000 1700 1200 1100 1100  800  800 1300 1400 1900 1800 1300    0.04
1939 1100 1700 1600 1800 1500  800 1500 1900 1700 1800 1300 1300    0.09
1940 1300  700  900  900 1800 1200  900 1300 1200 2200 1900 1800    0.08
1941 1400 1100 1800 1000 1400 1900 1400  700 1300 1200 1900 2000    0.02
1942 1700  900 1600  900 1200 1500 1300 1500 1200 1900 1500 1500   -0.06
1943 1400 1300 1300  800 1400 1600 1300 1500 1900 2000  700 1900   -0.17
1944 1900 1500 2000 1100 1200 1300 1500 1700 1800 1200 1500 1900   -0.32
1945 1300 1000 1400 2100 2000 1100 1700  700 1600 1800 2300 1700   -0.42
1946 2300 1900 1500 1100 1100 2000 1800 1000 1200 2100 2000 1800   -0.35
1947 1900 1400 1600 1000 2100 1900 2100 1000 1200 2000 2100 1500   -0.35
1948 1700 1500 1800  800 1300 1800 1700 1300 1800 2200 2000 2100   -0.15
1949 2300 2100 1000  700 1600 1400 1200  800 2100 2000 1100 1400   -0.07
1950 2100 2300 1000 1100 1500 1600 1600 2300 1900 1200 1100 1500    0.00
1951 1600 1000 1500  800 1500 1400 1200  600 1800 1800 1400 2400   -0.07
1952 1600  400 1100 1300 1100 1400  800 2000 1500 2300 1300 1600   -0.04
1953 2000 1200 1500  500 1300 1500 1100 1200 2300 2200 1600 2100   -0.02
1954 1700 1800  700  700 1000 1300 1200 1600 2000 1800 1800  600    0.01
1955 2400 1400 1000 1100 1700 1200 1000 1300 1500 1300 2300 1600   -0.08
1956 1300  800 1000 1100 1000 1000 1400 1800 1900 1900 2600 2000   -0.29
1957 1900 1200 1700 1000 1100 1100 1100  700  800 2300 1900 2200   -0.18
1958 1300 1600 1500  400 1500 1100 1300 1400 1900 2400 2000 1600   -0.28
1959 1700 1600  700 1300 1700 1100 1100 1600 2000 2100 1900 1600   -0.04
1960 1800 1600-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999    0.24
1961-9999-9999-9999-9999-9999-9999-9999 1600 1600 1700 1900 1600    0.33
1962 1700  800 1200  600  400 1100  900 2000 1100 1900 1700 1500    0.25
1963 1200 1300 1700  700 1100 1600  900 1000 1100 1400 1800 2000   -0.04
1964 1900  500 1300 1300 1200 1200 1100 1100 1700 1500 2000 1800    0.13
1965 1200 1400  700  900 1200 1100 1300 1400 1800 2500 1000 1700    0.23
1966 1800 1600 2100 1300 1500 2100  900 1800 1500 2400 1900  800    0.11
1967 1600 1200 1100  600  800 1100 1100  700 1300 1200 1300 1900    0.39
1968 1600 1400 1600 1200  900 1300 1400 1000 1700 1300 1400 1200    0.24
1969  900 1000 1100 1500 1700 1700 1000 1800 1200 1400 1900 1300    0.04
1970 1500 1200 1600 1400  700 1600  700 1600 1000 1500 1900 1600   -0.02
1971 1700  400 1100 1700 1300 1700  700 2000  900 2100 2000 1900   -0.11
1972 1200 1500 1400  800 1700 1300 1700 2000 2100 1700 2500 1900   -0.08
1973 1200 1100 1100  700  800 1300 2100 1000 2400 1900 1800 2300   -0.11
1974  700 1200 1800 1800 1400 1200 1000 1300 1100 1600 1900  700   -0.14
1975 2200 1800 1400 1300 1500 1500 1400 1500 1400 2300 1900 2100   -0.15
1976 2000 1500  600  700 1100 1600 1300 1100 1500 1800 1600 1200   -0.11
1977 1900 1700 1800 1400 1000 1100 1000 1300 1500 1800 1700 2100   -0.15
1978 1600 1000  800 1400 1400  800 1600 1600 2300 2200 2200 1800    0.03
1979 1600 1600 1600  900  900 1900 1200 1700 1200 2100 1600 2000    0.00
1980 1600 1200  500  800 1500 1100  800 1700 1200  600 2200 2200   -0.05
1981 2000 1000 1700 1300 1500 1100  800  400 1500  800 1500 1900    0.06
1982 2400 1800 1100 1200 1200 1100 1000 1700 1200 2100 1800 2000    0.03
1983 2500 2100 1800 1300 1400 1200 1200 1300 1300 1900 2300 1900    0.10
1984 1200  700  500 1300  900  800 1100 1000 1700 1600 1600 1300
Update Data:
2003 1500  900  600  400  900 1200  500  700 1100  600  700 1500
2004  700  600  700  400  600 1100  500  900  900 1400 1500  600
2005  700  400  800 1400  300  900  800  800  900  500 1200  600
2006  800  700  900 1000  800  500 1000  500 1300 1100  700 1600
2007 1100 1100  900  700 1300 1500-9999-9999-9999-9999-9999-9999
<END QUOTE>

Here, the expected 1990-2003 period is MISSING – so the correlations aren’t so hot! Yet
the WMO codes and station names /locations are identical (or close). What the hell is
supposed to happen here? Oh yeah – there is no ‘supposed’, I can make it up. So I have :-)
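To make the right-hand column of the ‘C’ listing above concrete, here is a minimal sketch of a sliding correlation. It is an illustration only, not the code in newmergedb: it assumes a plain Pearson correlation over all overlapping months, with -9999 treated as missing, and the function names (pearson, sliding_correlations) are made up for the example.

```python
import math

MISSING = -9999  # missing-value code used in the .dtb listings


def pearson(xs, ys):
    """Pearson correlation over pairs where neither value is missing.

    Returns None when fewer than 3 usable pairs remain or a series is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x != MISSING and y != MISSING]
    n = len(pairs)
    if n < 3:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def sliding_correlations(master, update):
    """master, update: dict mapping year -> list of 12 monthly values.

    For each master year, pretend the update's first year falls on that year
    and correlate whatever overlaps (assumed behaviour, not confirmed).
    """
    first_update_year = min(update)
    results = {}
    for start in sorted(master):
        xs, ys = [], []
        for update_year in sorted(update):
            master_year = start + (update_year - first_update_year)
            if master_year in master:
                xs.extend(master[master_year])
                ys.extend(update[update_year])
        results[start] = pearson(xs, ys)
    return results
```

With no common years at all (master ends 1984, update starts 2003), every figure produced this way compares unrelated years, which is one reason the coefficients hover around zero.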

If an update station matches a ‘master’ station by WMO code, but the data is unpalatably
inconsistent, the operator is given three choices:

<BEGIN QUOTE>
You have failed a match despite the WMO codes matching.
This must be resolved!! Please choose one:

1. Match them after all.
2. Leave the existing station alone, and discard the update.
3. Give existing station a false code, and make the update the new WMO station.

Enter 1,2 or 3:
<END QUOTE>

You can’t imagine what this has cost me – to actually allow the operator to assign false
WMO codes!! But what else is there in such situations? Especially when dealing with a ‘Master’
database of dubious provenance (which, er, they all are and always will be).

False codes will be obtained by multiplying the legitimate code (5 digits) by 100, then adding
1 at a time until a number is found with no matches in the database. THIS IS NOT PERFECT but as
there is no central repository for WMO codes – especially made-up ones – we’ll have to chance
duplicating one that’s present in one of the other databases. In any case, anyone comparing WMO
codes between databases – something I’ve studiously avoided doing except for tmin/tmax where I
had to – will be treating the false codes with suspicion anyway. Hopefully.
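The false-code rule described above can be sketched as follows. This is a hedged illustration, not the Fortran in the merge program: make_false_code and used_codes are hypothetical names, and codes are held as plain integers.

```python
def make_false_code(wmo5, used_codes):
    """Return an unused 7-digit 'false' code derived from a 5-digit WMO code.

    wmo5       -- the legitimate 5-digit code as an integer (e.g. 23933)
    used_codes -- set of integer codes already present in this database

    Starts at wmo5 * 100 and steps up by 1 until nothing in the database
    matches. Only this one database is checked, so a clash with a code in
    one of the other databases is still possible, as the text admits.
    """
    candidate = wmo5 * 100
    while candidate in used_codes:
        candidate += 1
    return candidate
```

Since the 7-digit form of a real code is the 5-digit code times 100 (2393300 for HANTY-MANSIJSK), that value is normally taken by the update station, so the first false code handed out would usually end in 01.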

Of course, option 3 cannot be offered for CLIMAT bulletins, there being no metadata with which
to form a new station.

This still meant an awful lot of encounters with naughty Master stations, which I really suspect
nobody else gives a hoot about. So with a somewhat cynical shrug, I added the nuclear option -
to match every WMO possible, and turn the rest into new stations (er, CLIMAT excepted). In other
words, what CRU usually do. It will allow bad databases to pass unnoticed, and good databases to
become bad, but I really don’t think people care enough to fix ‘em, and it’s the main reason the
project is nearly a year late.
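A rough sketch of what the ‘nuclear option’ amounts to, under stated assumptions: stations are held in a dict keyed by WMO code, update values overwrite master values for the same year (an assumption borrowed from the ‘Update wins’ setting), and an update with no metadata cannot become a new station. The names blind_merge, "meta" and "data" are illustrative only.

```python
def blind_merge(master, updates):
    """Blindly merge updates into master on WMO code alone.

    master  -- dict: wmo code -> {"meta": ..., "data": {year: values}}
    updates -- list of dicts with keys "wmo", "data" and optionally "meta"

    No data or metadata checks are made, which is the whole point (and the
    whole danger) of the approach.
    """
    counts = {"matched": 0, "added": 0, "rejected": 0}
    rejects = []
    for upd in updates:
        code = upd["wmo"]
        if code in master:
            # Assumed: the update's values win where years overlap.
            master[code]["data"].update(upd["data"])
            counts["matched"] += 1
        elif upd.get("meta"):
            # Unmatched but has metadata: becomes a new master station.
            master[code] = {"meta": upd["meta"], "data": dict(upd["data"])}
            counts["added"] += 1
        else:
            # Unmatched and no metadata (the CLIMAT case): rejected.
            rejects.append(upd)
            counts["rejected"] += 1
    return counts, rejects
```

The three counters correspond to the ‘Matched’, ‘Added as new Master stations’ and ‘Rejected’ lines that the updater prints at the end of a run.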

And there are STILL WMO code problems!!! Let’s try again with the issue. Let’s look at the first
station in most of the databases, JAN MAYEN. Here it is in various recent databases:

dtr.0705152339.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
pre.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0
sun.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0
tmn.0702091139.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
tmn.0705152339.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
tmp.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0
tmx.0702091313.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
tmx.0705152339.dtb: 100100  7093   -867    9 JAN MAYEN            NORWAY        1998 2006   -999  -999.00
vap.0709111032.dtb:0100100  7056   -840    9 JAN MAYEN            NORWAY        2003 2007    -999       0

As we can see, even I’m cocking it up! Though recoverably. DTR, TMN and TMX need to be written as (i7.7).

Anyway, here it is in the problem database:

wet.0311061611.dtb:  10010  7093   -866    9 JAN MAYEN(NOR NAVY)  NORWAY        1990 2003   -999     -999

You see? The leading zero’s been lost (presumably through writing as i7) and then a zero has been added at
the trailing end. So it’s a 5-digit WMO code BUT NOT THE RIGHT ONE. Aaaarrrgghhhhhh!!!!!!
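The difference between writing a code as i7 and as (i7.7) can be illustrated in Python. This is an analogy, not the original Fortran: for non-negative integers, the format specs below behave like the two edit descriptors, and the helper names are made up.

```python
def fortran_i7(n):
    """Mimic Fortran I7 for non-negative n: right-justified, blank-padded."""
    return f"{n:7d}"


def fortran_i7_7(n):
    """Mimic Fortran I7.7 for non-negative n: zero-padded to 7 digits."""
    return f"{n:07d}"


jan_mayen = 1001            # WMO 01001 held as an integer: leading zero gone
seven_digit = jan_mayen * 100

print(repr(fortran_i7(seven_digit)))    # ' 100100'  (leading zero lost)
print(repr(fortran_i7_7(seven_digit)))  # '0100100'  (leading zero kept)
print(repr(fortran_i7(jan_mayen * 10))) # '  10010'  (the form seen in the wet database)
```

Once the code is stored as an integer, the leading zero only survives if the output format puts it back, which is why the blank-padded and zero-padded headers above refer to the same station.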

I think this can only be fixed in one of two ways:

1. By hand.

2. By automatic comparison with other (more reliable) databases.

As usual – I’m going with 2. Hold onto your hats.

Actually, a brief interlude to churn out the tmin & tmax primaries, which got sort-of
forgotten after dtr was done:

<BEGIN ABRIDGED QUOTES (separated by ‘#####’)>
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
> Enter the suffix of the variable required:
.tmn
> Select the .cts or .dtb file to load:
tmn.0708071548.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
tmn.txt
> Select the first,last years AD to save:
1901,2006
> Operating…
> NORMALS            MEAN percent      STDEV percent
>         .dtb    3814210    65.5
>         .cts     210801     3.6    4025011    69.2
> PROCESS        DECISION percent %of-chk
> no lat/lon          650     0.0     0.0
> no normal       1793923    30.8    30.8
> out-of-range        976     0.0     0.0
> accepted        4024035    69.1
> Dumping years 1901-2006 to .txt files…
#####
IDL> quick_interp_tdm2,1901,2006,'tmnglo/tmn.',750,gs=0.5,pts_prefix='tmntxt/tmn.',dumpglo='dumpglo'
#####
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: gunzip clim.6190.lan.tmn
FILE NOT FOUND – PLEASE TRY AGAIN: clim.6190.lan.tmn
Enter a name for the gridded climatology file: clim.6190.lan.tmn.grid
Enter the path and stem of the .glo files: tmnglo/tmn.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: tmnabs
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
tmn.01.1901.glo
(etc)
tmn.12.2006.glo
#####
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.
Enter a gridfile with YYYY for year and MM for month: tmnabs/tmn.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12
Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.tmn.dat
Writing cru_ts_3_00.1901.1910.tmn.dat
(etc)
#####
> ***** AnomDTB: converts .dtb to anom .txt for gridding *****
> Enter the suffix of the variable required:
.tmx
> Select the .cts or .dtb file to load:
tmx.0708071548.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
tmx.txt
> Select the first,last years AD to save:
1901,2006
> Operating…
> NORMALS            MEAN percent      STDEV percent
>         .dtb    3795470    65.4
>         .cts     205607     3.5    4001077    68.9
> PROCESS        DECISION percent %of-chk
> no lat/lon          652     0.0     0.0
> no normal       1805313    31.1    31.1
> out-of-range        471     0.0     0.0
> accepted        4000606    68.9
> Dumping years 1901-2006 to .txt files…
#####
IDL> quick_interp_tdm2,1901,2006,'tmxglo/tmx.',750,gs=0.5,pts_prefix='tmxtxt/tmx.',dumpglo='dumpglo'
#####
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.tmx
Enter a name for the gridded climatology file: clim.6190.lan.tmx.grid
Enter the path and stem of the .glo files: tmxglo/tmx.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: tmxabs
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Right, erm.. off I jolly well go!
tmx.01.1901.glo
(etc)
tmx.12.2006.glo
#####
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.
Enter a gridfile with YYYY for year and MM for month: tmxabs/tmx.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12
Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.tmx.dat
Writing cru_ts_3_00.1901.1910.tmx.dat
(etc)
<END ABRIDGED QUOTES>

This took longer than hoped.. running out of disk space again. This is why Tim didn’t save more of
the intermediate products – which would have made my detective work easier. The ridiculous process
he adopted – and which we have dutifully followed – creates hundreds of intermediate files at every
stage, none of which are automatically zipped/unzipped. Crazy. I’ve filled a 100gb disk!

So, anyway, back on Earth I wrote wmocmp.for, a program to – you guessed it – compare WMO codes from
a given set of databases.  Results were, ah.. ‘interesting’:

<BEGIN QUOTE>
REPORT:

Database Title                Exact Match  Close Match  Vague Match  Awful Match  Codes Added      WMO = 0
../db/pre/pre.0612181221.dtb          n/a          n/a          n/a          n/a        14397         1540
../db/dtr/tmn.0708071548.dtb         1865         3389           57           77         5747         2519
../db/tmp/tmp.0705101334.dtb            0            4           28          106         4927            0
<END QUOTE>

So the largest database, precip, contained 14397 stations with usable WMO codes (and 1540 without).
The TMin database (and TMax and DTR, which were tested then excluded as they matched TMin 100%) only agreed
perfectly with precip for 1865 stations, nearby 3389, believable 57, worrying 77. TMean fared worse, with NO
exact matches (WMO misformatting again) and over 100 worrying ones.

The big story is the need to fix the tmean WMO codes. For instance:

10010   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

is illegal, and needs to become one of:
01001   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
0001001   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
0100100   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

I favour the first as it’s technically accurate. Alternatively we seem to have widely adopted the third, which
at least has the virtue of being consistent. Of course it’s the only one that will match the precip:

100100  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

..which itself should be either:

0100100  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

or:

01001  7093   -867   10 JAN MAYEN            NORWAY        1921 2006   -999  -999.00

Aaaaarrrggghhhh!!!!

And the reason this is so important is that the incoming updates will rely PRIMARILY on matching the WMO codes!
In fact CLIMAT bulletins carry no other identification, of course. Clearly I am going to need a reference set
of ‘genuine WMO codes’.. and wouldn’t you know it, I’ve found four!

Location                                                N. Stations      Notes
http://weather.noaa.gov/data/nsd_bbsss.txt              11548            Full country names, ‘;’ delim
http://www.htw-dresden.de/~kleist/wx_stations_ct.html   13000+           *10, leading zeros kept, fmt probs
From Dave Lister                                        13080            *10 and leading zeros lost, country codes
From Philip Brohan                                      11894            2+3, No countries

The strategy is to use Dave Lister’s list, grabbing country names from the Dresden list. Wrote
getcountrycodes.for and extracted an imperfect but useful-as-a-reference list. Hopefully in the main the country
will not need fixing or referring to!!

Wrote ‘fixwmos.for’ – probably not for the first time, but it’s the first prog of that name in my repository so I’ll
have to hope for the best. After an unreasonable amount of teething troubles (due to my forgetting that the tmp
database stores lats & lons in degs*100 not degs*10, and also to the presence of a ‘-99999’ as the lon for GUATEMALA
in the reference set) I managed to sort-of fix the tmp database:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] ./fixwmos

FIXWMOS – Fix WMO Codes in a Database

Enter the database to be fixed: tmp.0705101334.dtb

The operation completed successfully.

2263 WMO Codes were ‘fixed’ and all were rewritten as (i7.7)

The output database is tmp.0709281456.dtb

crua6[/cru/cruts/version_3_0/db/tmp]
<END QUOTE>

The first records have changed as follows:

crua6[/cru/cruts/version_3_0/db/tmp] diff tmp.0705101334.dtb tmp.0709281456.dtb |head -30
1c1
<   10010   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
---
> 0100100   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00

So far so good.. but records that weren’t matched with the reference set didn’t fare so well:

89c89
<   10050   780    142    9 ISFJORD RADIO        NORWAY        1912 1979 101912  -999.00
---
> 0010050   780    142    9 ISFJORD RADIO        NORWAY        1912 1979 101912  -999.00

This is misleading because, although there probably won’t BE any incoming updates for ISFJORD RADIO, we can’t say for
certain that there will never be updates for any station outside the current reference set. In fact, we can say with
confidence that there will be!

So, what to do? Do we assume a particular factor to adjust ALL codes by, based on the matches? Or do we attempt (note
careful use of verb) to use the country codes database to work out the most significant ‘real’ digits of these codes?

Well, I fancy the first one. We’ll make two passes through the data, the first pass changes nothing but saves counts of
the successful factors in bins: *0.01, *0.1, *1, *10, *100 should do it. I sure hope all the results are in one bin!

It worked. An initial ‘verbose’ run showed a consistent choice of factor, though it’ll exit with an error code if multiple
factors are registered in one database.
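The two-pass idea can be sketched like this. It is a simplified illustration, not fixwmos.for: it assumes a code counts as ‘successful’ for a factor when the scaled code appears in the reference set, whereas the real program may well compare locations too. Factors are kept as integer ratios to avoid floating-point surprises, and all names are hypothetical.

```python
from collections import Counter

# (numerator, denominator) pairs for the bins *0.01, *0.1, *1, *10, *100.
FACTORS = {
    "*0.01": (1, 100),
    "*0.1": (1, 10),
    "*1": (1, 1),
    "*10": (10, 1),
    "*100": (100, 1),
}


def scale(code, factor):
    """Apply a factor to an integer code; None if the result is not whole."""
    num, den = FACTORS[factor]
    scaled = code * num
    return scaled // den if scaled % den == 0 else None


def choose_factor(db_codes, reference_codes):
    """Pass 1: change nothing, just count which factors produce matches."""
    votes = Counter()
    for code in db_codes:
        for name in FACTORS:
            if scale(code, name) in reference_codes:
                votes[name] += 1
    if len(votes) != 1:
        # Mirrors the 'exit with an error' behaviour on mixed factors.
        raise ValueError(f"expected exactly one factor, got {dict(votes)}")
    return next(iter(votes))


def fix_codes(db_codes, reference_codes):
    """Pass 2: apply the single winning factor to ALL codes, matched or not."""
    factor = choose_factor(db_codes, reference_codes)
    return factor, [scale(code, factor) for code in db_codes]
```

For example, fix_codes([10010, 10050, 10080], {100100, 100800}) picks "*10" from the two matched codes and then also rescales the unmatched 10050 to 100500, which is the behaviour wanted for stations outside the reference set.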

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] ./fixwmos

FIXWMOS – Fix WMO Codes in a Database

Enter the database to be fixed: tmp.0705101334.dtb
locfac set to: 10
First ref: 0100100

The operation completed successfully.

2263 WMO Codes were ‘matched’
All codes were modified with a factor of  10
Lons/lats were modified with a factor of  10

The output database is tmp.0710011359.dtb

crua6[/cru/cruts/version_3_0/db/tmp]
<END QUOTE>

Example results:
<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/tmp] diff tmp.0705101334.dtb tmp.0710011359.dtb | head -12
1c1
<   10010   709    -87   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
---
> 0100100  7090   -870   10 Jan Mayen            NORWAY        1921 2006 341921  -999.00
89c89
<   10050   780    142    9 ISFJORD RADIO        NORWAY        1912 1979 101912  -999.00
---
> 0100500  7800   1420    9 ISFJORD RADIO        NORWAY        1912 1979 101912  -999.00
159c159
<   10080   783    155   28 Svalbard Lufthavn    NORWAY        1911 2006 341911  -999.00
---
> 0100800  7830   1550   28 Svalbard Lufthavn    NORWAY        1911 2006 341911  -999.00
<END QUOTE>

Then.. attacked the wet database! And immediately found this beauty:

0 -9999 -99999 -999 UNKNOWN              UNKNOWN       1994 2003   -999        0
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1994  500  800  600  400  600  100    0  100  200  400 1000 1300
1995  400  100 1100  900 1200  800  200  100  200  400  800  500
1996  500 1100 1500  600  900-9999    0  300  400  700    0 1100
1997  800 1000  700 1000 1000 1000  200  200  400  700  200 1000
1998  700  700 1000 1000-9999  800  100  100    0  200  400  700
1999  300 1000  800-9999  700  800    0  200-9999  600  400  200
2000 1100  600  900  900 1000  400-9999  100  200  300    0  400
2001    0  800  300  500 1200    0    0    0  200  200  500  800
2002  800  300  600 1300  800  500  400  100  300  400  400  600
2003  300-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999

Gotta love the system! Like this is ever going to be a blind bit of use. Modified the code to
leave such stations unmolested, but identified in a separate file so they can be ‘cleansed’, it
being a little too risky to auto-cleanse such things.

Hopefully the final attack on ‘wet’:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/db/rd0] ./fixwmos

FIXWMOS – Fix WMO Codes in a Database

Enter the database to be fixed: wet.0311061611.dtb

The operation completed successfully.

1920 WMO Codes were ‘matched’
All codes were modified with a factor of  10
Lons/lats were modified with a factor of   1

The output database is wet.0710021341.dtb

IMPORTANT: the following WMO codes were not altered:
False codes (wmo<0):          2917
Illegal codes (0<=wmo<1000):     1
(illegals written to wet.0311061611.bad)
crua6[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

I then removed the sole illegal (see above) from wet.0710021341.dtb, which becomes the ‘new old’
wet/rd0 database.

So.. to incorporate the updates! Finally. First, the MCDW, metadata-rich ones:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710021341.dtb
Please enter the Update Database name: rdy.0709111032.dtb

Reading in both databases..
Master database stations:     4987
Update database stations:     2407

Looking for WMO code matches..
* new header 0100100  7056   -840    9 JAN MAYEN            NORWAY        1990 2007    -999    -999 *
2 reject(s) from update process 0710041559

Writing wet.0710041559.dtb

OUTPUT(S) WRITTEN

New master database: wet.0710041559.dtb

Update database stations:         2407
> Matched with Master stations:  1556
(automatically:  1556)
(by operator:     0)
> Added as new Master stations:     0
> Rejected:                         2
Rejects file:                 rdy.0709111032.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact;  Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

(also knocked up rrstats.for at this stage, to analyse replication rates by
latitude band for a given database – needs a Matlab prog to drive really)

[a bit of debugging here as the last records weren't being written properly,
filenames adjusted above accordingly]

Then, the CLIMAT, nothing-but-the-code ones:

*WARNING: ignore this, the CLIMAT bulletins were later improved with metadata and newmergedb rerun*

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0709101706.dtb

Reading in both databases..
Master database stations:     5836
Update database stations:     2876

Looking for WMO code matches..
378 reject(s) from update process 0710081508

Writing wet.0710081508.dtb

OUTPUT(S) WRITTEN

New master database: wet.0710081508.dtb

Update database stations:         2876
> Matched with Master stations:  2498
(automatically:  2498)
(by operator:     0)
> Added as new Master stations:     0
> Rejected:                       378
Rejects file:                 rdy.0709101706.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact;  Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

Now of course, we can’t add any of the CLIMAT bulletin stations as ‘new’ stations
because we don’t have any metadata! so.. is it worth using the lookup table? Because
although I’m thrilled at the high match rate (87%!), it does seem worse when you
realise that you lost the rest..

* see below, CLIMAT metadata fixed! *

At this stage I knocked up rrstats.for and the visualisation companion tool, cmprr.m. A simple process
to show station counts against time for each 10-degree latitude band (with 20-degree bands at the
North and South extremities). A bit basic and needs more work – but good for a quick & dirty check.
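The banding scheme just described can be sketched as below. This is not rrstats.for, only an illustration of the arithmetic: it assumes latitudes already converted to degrees (the headers store them as scaled integers) and a station counted for every year from its first to its last, whether or not data exist for that year. Function names are invented for the example.

```python
import bisect
from collections import Counter


def band_edges():
    """Edges in degrees: 20-degree bands at each pole, 10-degree bands between."""
    return [-90] + list(range(-70, 71, 10)) + [90]


def band_index(lat):
    """Index (0 = southernmost) of the band containing latitude lat."""
    edges = band_edges()
    i = bisect.bisect_right(edges, lat) - 1
    # Clamp so that lat == 90 falls in the top band rather than off the end.
    return min(max(i, 0), len(edges) - 2)


def station_counts(stations):
    """stations: iterable of (lat_degrees, first_year, last_year).

    Returns a Counter keyed by (band_index, year).
    """
    counts = Counter()
    for lat, first_year, last_year in stations:
        band = band_index(lat)
        for year in range(first_year, last_year + 1):
            counts[(band, year)] += 1
    return counts
```

This gives 16 bands in total; Jan Mayen at about 70.9N lands in the top one (index 15).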

Wrote dllist2headers.for to convert the ‘Dave Lister’ WMO list to CRU header format – the main difficulty
being the accurate conversion of the two-character ‘country codes’ – especially since many are actually
state codes for the US! Ended up with wmo.0710151633.dat as our reference WMO set.

Incorporated the reference WMO set into climat2cru.for. Successfully reprocessed the CLIMAT bulletins
into databases with at least SOME metadata:

pre.0710151817.dtb
rdy.0710151817.dtb
sun.0710151817.dtb
tmn.0710151817.dtb
tmp.0710151817.dtb
tmx.0710151817.dtb
vap.0710151817.dtb

In fact, it was far more successful than I expected – only 11 stations out of 2878 without metadata!

Re-ran newmergedb:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/rd0] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: wet.0710041559.dtb
Please enter the Update Database name: rdy.0710151817.dtb

Reading in both databases..
Master database stations:     5836
Update database stations:     2876

Looking for WMO code matches..
71 reject(s) from update process 0710161148

Writing wet.0710161148.dtb

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

OUTPUT(S) WRITTEN

New master database: wet.0710161148.dtb

Update database stations:         2876
> Matched with Master stations:  2498
(automatically:  2498)
(by operator:     0)
> Added as new Master stations:   307
> Rejected:                        71
Rejects file:                 rdy.0710151817.dtb.rejected
Note: IEEE floating-point exception flags raised:
Inexact;  Invalid Operation;
See the Numerical Computation Guide, ieee_flags(3M)
uealogin1[/cru/cruts/version_3_0/db/rd0]
<END QUOTE>

307 stations rescued! and they’ll be there in future of course, for metadata-free CLIMAT bulletins
to match with.

So where were we.. Rain Days. Family tree:

wet.0311061611.dtb
+
rdy.0709111032.dtb  (MCDW composite)
+
rdy.0710151817.dtb  (CLIMAT composite with metadata added)
V
V
wet.0710161148.dtb

Now it gets tough. The current model for a secondary is that it is derived from one or more primaries,
plus their normals, plus the normals for the secondary.

The IDL secondary generators do not allow ‘genuine’ secondary data to be incorporated. This would have
been ideal, as the gradual increase in observations would have gradually taken precedence over the
primary-derived synthetics.

The current stats for the wet database were derived from the new proglet, dtbstats.for:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./dtbstat

DTBSTAT: Database Stats Report

Please enter the (18ch.) database name: wet.0710161148.dtb

Report for: wet.0710161148.dtb

Stations in Northern Hemisphere:     5365
Stations in Southern Hemisphere:      778
Total:     6143

Maximum Timespan in Northern Hemisphere: 1840 to 2007
Maximum Timespan in Southern Hemisphere: 1943 to 2007
Global Timespan: 1840 to 2007

crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>

So, without further ado, I treated RD0 as a Primary and derived gridded output from the database:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.rd0
> Select the .cts or .dtb file to load:
wet.0710161148.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
rd0.txt
> Select the first,last years AD to save:
1901,2007
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb          0     0.0
>         .cts     731118    45.4     730956    45.4
> PROCESS        DECISION percent %of-chk
> no lat/lon            0     0.0     0.0
> no normal        878015    54.6    54.6
> out-of-range         56     0.0     0.0
> accepted         731062    45.4
> Dumping years 1901-2007 to .txt files…

crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>

Not particularly good – the bulk of the data being recent, less than half had valid normals (anomdtb
calculates normals on the fly, on a per-month basis). However, this isn’t so much of a problem as the
plan is to screen it for valid station contributions anyway.
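As a hedged sketch of what ‘normals on the fly, on a per-month basis’ could look like: the rounding rule below (ceiling of 75% of 30 years) is an assumption chosen because it reproduces the ‘Data required for a normal: 23’ line in the transcript, and monthly_normals is a hypothetical name, not a routine from anomdtb.

```python
import math

MISSING = -9999  # missing-value code used in the .dtb files


def monthly_normals(series, start=1961, end=1990, missing_pct=25):
    """Per-month normals for one station.

    series -- dict mapping year -> list of 12 monthly values
    Returns a list of 12 normals; an entry is None when that calendar month
    has too few valid values in the normals period.
    """
    n_years = end - start + 1
    # Assumed rounding: 30 years with 25% missing allowed -> 23 values needed.
    required = math.ceil(n_years * (100 - missing_pct) / 100)
    normals = []
    for month in range(12):
        values = [
            series[year][month]
            for year in range(start, end + 1)
            if year in series and series[year][month] != MISSING
        ]
        if len(values) >= required:
            normals.append(sum(values) / len(values))
        else:
            normals.append(None)
    return normals
```

A station whose record only starts in the 1990s gets None for all twelve months under this rule, which is the ‘no normal’ fate of more than half the values in the run above.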

<BEGIN QUOTE>
IDL> quick_interp_tdm2,1901,2007,'rd0glo/rd0.',450,gs=0.5,dumpglo='dumpglo',pts_prefix='rd0txt/rd0.'
% Compiled module: QUICK_INTERP_TDM2.
% Compiled module: GLIMIT.
Defaults set
1901
% Compiled module: MAP_SET.
% Compiled module: CROSSP.
% Compiled module: STRIP.
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.
1902
(etc)
2007
no stations found in: rd0txt/rd0.2007.08.txt
no stations found in: rd0txt/rd0.2007.09.txt
no stations found in: rd0txt/rd0.2007.10.txt
no stations found in: rd0txt/rd0.2007.11.txt
no stations found in: rd0txt/rd0.2007.12.txt
IDL>
<END QUOTE>

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: clim.6190.lan.wet.grid2
Enter the path and stem of the .glo files: rd0glo/rd0.
Enter the starting year: 1901
Enter the ending year:   2007
Enter the path (if any) for the output files: rd0abs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A         ! this was a guess! We’ll see how the results look
Right, erm.. off I jolly well go!
rd0.01.1901.glo
(etc)
<END QUOTE>

Then.. wait a minute! I checked back, and sure enough, quick_interp_tdm.pro DOES allow both synthetic and ‘real’ data
to be included in the gridding. From the program description:

<BEGIN QUOTE>
; TDM: the dummy grid points default to zero, but if the synth_prefix files are present in call,
;  the synthetic data from these grids are read in and used instead
<END QUOTE>

And so.. (after some confusion, and renaming so that anomdtb selects percentage anomalies)..

IDL> quick_interp_tdm2,1901,2006,'rd0pcglo/rd0pc',450,gs=0.5,dumpglo='dumpglo',synth_prefix='rd0syn/rd0syn',pts_prefix='rd0pctxt/rd0pc.'

The trouble is, we won’t be able to produce reliable station count files this way. Or can we use the same strategy,
producing station counts from the wet database route, and filling in ‘gaps’ with the precip station counts? Err.

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: clim.grid
Enter the path and stem of the .glo files: rd0pcglo/rd0pc.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: rd0pcgloabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Right, erm.. off I jolly well go!
rd0pc.01.1901.glo
(etc)
<END QUOTE>

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: rd0pcgloabs/rd0pc.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.rd0.dat
Writing cru_ts_3_00.1901.1910.rd0.dat
Writing cru_ts_3_00.1911.1920.rd0.dat
Writing cru_ts_3_00.1921.1930.rd0.dat
Writing cru_ts_3_00.1931.1940.rd0.dat
Writing cru_ts_3_00.1941.1950.rd0.dat
Writing cru_ts_3_00.1951.1960.rd0.dat
Writing cru_ts_3_00.1961.1970.rd0.dat
Writing cru_ts_3_00.1971.1980.rd0.dat
Writing cru_ts_3_00.1981.1990.rd0.dat
Writing cru_ts_3_00.1991.2000.rd0.dat
Writing cru_ts_3_00.2001.2006.rd0.dat
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>
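
For reference, the decadal file naming seen in the mergegrids output can be sketched as below. This is Python rather than the Fortran of mergegrids itself, and the function name and the rule "ten-year blocks counted from the start year, last block truncated at the end year" are assumptions read off the filenames above, not taken from the program source.

```python
def decadal_names(start_year, end_year, template):
    """Expand an SSSS/EEEE filename template into one name per ten-year block.

    Assumption: blocks run start_year..start_year+9 and so on, with the
    final block cut short at end_year (as in the 2001.2006 file above).
    """
    names = []
    first = start_year
    while first <= end_year:
        last = min(first + 9, end_year)
        name = template.replace("SSSS", f"{first:04d}").replace("EEEE", f"{last:04d}")
        names.append(name)
        first = last + 1
    return names


if __name__ == "__main__":
    for name in decadal_names(1901, 2006, "cru_ts_3_00.SSSS.EEEE.rd0.dat"):
        print("Writing", name)
```

For 1901 to 2006 this gives eleven names, ending with the short 2001.2006 block.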

All according to plan.. except the values themselves!

For January, 2001:

Minimum      =      0
Maximum      =  32630
Vals >31000  =      1

For the whole of 2001:

Minimum      =      0
Maximum      =  56763
Vals >31000  =      5

Not good. We’re out by a factor of at least 10, though the extremes are few enough to just cap at DiM. So where has
this factor come from?

Well here’s the January 2001 climatology:

Minimum      =      0
Maximum      =   3050
Vals >3100   =      0

That all seems fine for a percentage normals set. Not entirely sure about 0 though.

So let’s look at the January 2001 gridded anomalies file:

Minimum      =    -48.046
Maximum      =      0.0129

This leads to a show-stopper, I’m afraid. It looks as though the calculation I’m using for percentage anomalies is,
not to put too fine a point on it, cobblers.

This is what I use to build actuals from anomalies in glo2abs.for:

absgrid(ilon(i),ilat(i)) = nint(normals(i,imo) +
*                  anoms(ilon(i),ilat(i)) * normals(i,imo) / 100)

or, to put it another way, V = N(A+100)/100

This is what anomdtb.f90 uses to build anomalies from actuals:

DataA(XAYear,XMonth,XAStn) = nint(1000.0*((real(DataA(XAYear,XMonth,XAStn)) / &
real(NormMean(XMonth,XAStn)))-1.0))

or, in the same terms, A = 1000((V/N)-1)

which reverses to: V = N(A+1000)/1000
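
The mismatch between the two formulas can be checked with a small round trip. This is a sketch in Python, not the Fortran of glo2abs.for or anomdtb.f90; the function names are made up for illustration, and nint() here is a stand-in for the Fortran intrinsic (round half away from zero).

```python
import math


def nint(x):
    """Stand-in for Fortran NINT: round to nearest, halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def anomaly_from_value(value, normal):
    """Forward step as quoted from anomdtb.f90: A = 1000((V/N)-1)."""
    return nint(1000.0 * (value / normal - 1.0))


def value_from_anomaly(anomaly, normal):
    """Matching inverse: V = N(A+1000)/1000."""
    return nint(normal * (anomaly + 1000.0) / 1000.0)


def value_from_anomaly_old(anomaly, normal):
    """Original glo2abs.for form: V = N + A*N/100, which treats A as whole percent."""
    return nint(normal + anomaly * normal / 100.0)


if __name__ == "__main__":
    normal, value = 1000, 1500          # illustrative numbers, not from the data
    a = anomaly_from_value(value, normal)
    print(a, value_from_anomaly(a, normal), value_from_anomaly_old(a, normal))
```

With a normal of 1000 and a value of 1500, the anomaly is 500 (tenths of a percent). The matching inverse returns 1500, while the old form returns 6000, because it reads 500 as 500 percent.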

This could well explain things. It could also mean that I have to reproduce v3.00 precip AFTER it’s been used (against
my wishes) by Dave L and Dimitrious.

Well to start with, I’ll try the new calculation in glo2abs to reproduce the rd0 data.

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: c.grid
Enter the path and stem of the .glo files: rd0pcglo/rd0pc.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: rd0pcgloabs
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Right, erm.. off I jolly well go!
rd0pc.01.1901.glo
(etc)
<END QUOTE>

This *does* improve matters considerably. Now, for January 2001:

Minimum      =      0
Maximum      =   5090  (a little high but not fatal)
Vals >3100   =    556
Vals >3500   =    110
Vals >4000   =      2  (so the bulk of the excessions are only a few days over)

In fact the 2nd highest Max is 4369, well below 5090.

So, good news – but only in the sense that I’ve found the error. Bad news in that it’s a further confirmation that my
abilities are short of what’s required here.

Rushed back to precip. Found the .glo files in /cru/cruts/version_3_0/primaries/precip/pre0km0612181221glo/, and
re-ran glo2abs with the revised percentage anomaly equation:

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.pre
Enter a name for the gridded climatology file: clim.6190.lan.pre.gridded2
Enter the path and stem of the .glo files: pre0km0612181221glo/pregrid.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: pre0km0612181221abs/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Right, erm.. off I jolly well go!
pregrid.01.1901.glo
(etc)
<END QUOTE>

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/primaries/precip] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: pre0km0612181221abs/pregrid.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.pre.dat
Writing cru_ts_3_00.1901.1910.pre.dat
Writing cru_ts_3_00.1911.1920.pre.dat
Writing cru_ts_3_00.1921.1930.pre.dat
Writing cru_ts_3_00.1931.1940.pre.dat
Writing cru_ts_3_00.1941.1950.pre.dat
Writing cru_ts_3_00.1951.1960.pre.dat
Writing cru_ts_3_00.1961.1970.pre.dat
Writing cru_ts_3_00.1971.1980.pre.dat
Writing cru_ts_3_00.1981.1990.pre.dat
Writing cru_ts_3_00.1991.2000.pre.dat
Writing cru_ts_3_00.2001.2006.pre.dat
crua6[/cru/cruts/version_3_0/primaries/precip]
<END QUOTE>

Then back to finish off rd0. Modified glo2abs to allow the operator to set minima and maxima, with a
specific option to set wet day limits (DiM*100):

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.wet
Enter a name for the gridded climatology file: clim…grid
Enter the path and stem of the .glo files: rd0pcglo/rd0pc.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: rd0pcgloabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? P
Do you wish to limit the output values? (Y/N): Y
1. Set minimum to zero
2. Set a single minimum and maximum
3. Set monthly minima and maxima (for wet/rd0)

Choose: 3
Right, erm.. off I jolly well go!
rd0pc.01.1901.glo
(etc)
<END QUOTE>

Output was checked.. and as expected, January 2001 had 556 values of 3100 :-)
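
The monthly limit (option 3) amounts to clamping each cell to the range 0 to DiM*100. A minimal Python sketch of that idea follows; the function name is hypothetical, and taking the month length from the calendar (so February is 29 days in leap years) is an assumption, since the log does not say how glo2abs treats leap years.

```python
import calendar


def clamp_wet_days(values, year, month, scale=100):
    """Clamp wet-day values (days * scale) to 0..days_in_month * scale.

    Assumption: month length comes from the real calendar for that year.
    """
    upper = calendar.monthrange(year, month)[1] * scale
    return [min(max(v, 0), upper) for v in values]


if __name__ == "__main__":
    # Illustrative cells only; 5090 is the January 2001 maximum quoted above.
    print(clamp_wet_days([0, 3050, 3100, 4369, 5090], 2001, 1))
```

Every value above the cap lands exactly on the cap, which is why the 556 cells that were over 3100 show up as 556 values of 3100 afterwards.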

<BEGIN QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/rd0] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: rd0pcgloabs/rd0pc.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.rd0.dat
Writing cru_ts_3_00.1901.1910.rd0.dat
Writing cru_ts_3_00.1911.1920.rd0.dat
Writing cru_ts_3_00.1921.1930.rd0.dat
Writing cru_ts_3_00.1931.1940.rd0.dat
Writing cru_ts_3_00.1941.1950.rd0.dat
Writing cru_ts_3_00.1951.1960.rd0.dat
Writing cru_ts_3_00.1961.1970.rd0.dat
Writing cru_ts_3_00.1971.1980.rd0.dat
Writing cru_ts_3_00.1981.1990.rd0.dat
Writing cru_ts_3_00.1991.2000.rd0.dat
Writing cru_ts_3_00.2001.2006.rd0.dat
crua6[/cru/cruts/version_3_0/secondaries/rd0]
<END QUOTE>

Back to where this all started – Vapour Pressure.

We have:

1. ‘Master’ (ie original) database                    vap.0311181410.dtb
2. MCDW updates database                              vap.0709111032.dtb
3. CLIMAT updates database *with added metadata*      vap.0710151817.dtb

so first we incorporate the MCDW updates..

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/vap] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: vap.0311181410.dtb
Please enter the Update Database name: vap.0709111032.dtb

Reading in both databases..
Master database stations:     7691
Update database stations:     2398

Looking for WMO code matches..
2 reject(s) from update process 0710241541

Writing vap.0710241541.dtb

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

OUTPUT(S) WRITTEN

New master database: vap.0710241541.dtb

Update database stations:         2398
> Matched with Master stations:  1847
(automatically:  1847)
(by operator:     0)
> Added as new Master stations:   549
> Rejected:                         2
Rejects file:                 vap.0709111032.dtb.rejected
uealogin1[/cru/cruts/version_3_0/db/vap]
<END QUOTE>

Then, the CLIMAT ones:

<BEGIN QUOTE>
uealogin1[/cru/cruts/version_3_0/db/vap] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: vap.0710241541.dtb
Please enter the Update Database name: vap.0710151817.dtb

Reading in both databases..
Master database stations:     8240
Update database stations:     2870

Looking for WMO code matches..
68 reject(s) from update process 0710241549

Writing vap.0710241549.dtb

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

OUTPUT(S) WRITTEN

New master database: vap.0710241549.dtb

Update database stations:         2870
> Matched with Master stations:  2599
(automatically:  2599)
(by operator:     0)
> Added as new Master stations:   203
> Rejected:                        68
Rejects file:                 vap.0710151817.dtb.rejected
uealogin1[/cru/cruts/version_3_0/db/vap]
<END QUOTE>

So, not as good as the MCDW update.. lost 68.. but then of course we are talking about station data that
arrived with NO metadata AT ALL.
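
The bookkeeping of a blind merge can be sketched as follows. This is Python, not the newmergedb source; the record layout (a dict with "meta" and "data" keys), the rule that an unmatched update without metadata is rejected, and the choice to let update values overwrite master values for the same year are all assumptions made for illustration.

```python
import copy


def blind_merge(master, update):
    """Merge update stations into master, matching on WMO code alone.

    master, update: dicts keyed by WMO code; each station is a dict
    {"meta": dict or None, "data": {year: values}} (hypothetical layout).
    Returns (new_master, rejects, counts).
    """
    merged = copy.deepcopy(master)
    rejects = {}
    counts = {"matched": 0, "added": 0, "rejected": 0}
    for wmo, stn in update.items():
        if wmo in merged:
            # Assumption: update data replaces master data year by year.
            merged[wmo]["data"].update(stn["data"])
            counts["matched"] += 1
        elif stn.get("meta"):
            # Unmatched, but has enough metadata to stand as a new station.
            merged[wmo] = copy.deepcopy(stn)
            counts["added"] += 1
        else:
            rejects[wmo] = stn
            counts["rejected"] += 1
    return merged, rejects, counts
```

Whatever the real rules are, the three counts must add up to the update total, and they do in both runs above: 1847 + 549 + 2 = 2398 and 2599 + 203 + 68 = 2870. The new master size also checks out: 7691 + 549 = 8240.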

So we will try the unaltered rd0 process on vap. It should be the same; a mix of synthetic and observed.

*************************************************************************************
* PRIORITY INTERRUPT * PRIORITY INTERRUPT * PRIORITY INTERRUPT * PRIORITY INTERRUPT *
*************************************************************************************

After an email enquiry from Wladimir J. Alonso (alonsow@mail.nih.gov), in which unusual behaviour of CRU TS 2.10
Vapour Pressure data was observed, I discovered that some of the Wet Days and Vapour Pressure datasets had been
swapped!!

The files I was looking at were decadal, 1981-1990.

Vapour Pressure, January:     Min 0       Max 310
Vapour Pressure, February:    Min 0       Max 280

Wet Days, January:            Min 0       Max 3220
Wet Days, February:           Min 0       Max 3240

So I wrote crutsstats.for, which returns monthly and annual minima, maxima and means for any gridded output file.
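
A rough Python sketch of that kind of summary is given here. It is not crutsstats.for: the function name, the input layout (twelve flat lists of cell values per year) and the absence of any missing-value handling are assumptions.

```python
def year_stats(monthly_grids):
    """Return 13 (min, max, mean) triples: one per month, then the whole year.

    monthly_grids: twelve sequences of cell values for one year
    (hypothetical layout; missing-value codes are not handled).
    """
    rows = []
    all_cells = []
    for cells in monthly_grids:
        cells = list(cells)
        rows.append((min(cells), max(cells), round(sum(cells) / len(cells))))
        all_cells.extend(cells)
    rows.append((min(all_cells), max(all_cells),
                 round(sum(all_cells) / len(all_cells))))
    return rows


if __name__ == "__main__":
    grids = [[0, 10, 20]] * 11 + [[0, 30, 60]]   # toy data
    print(year_stats(grids))
```

Each output row in the listings that follow has this shape: the year, twelve min/max/mean triples, and a final annual triple.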

Tried it on the full runs, and they look OK:

crua6[/cru/cruts/vap_wet_investigation] head -90 cru_ts_2_10.1901-2002.vap.grid.stats |tail -10
1981       0     322      82       0     324      84       0     320      90       0     335      99       0     352     111       0     356     130       0     349     144       0     344     143       0     360     124       0     323     105       0     320      90       0     321      83       0     360     107
1982       0     312      80       0     323      83       0     318      88       0     329      98       0     348     111       0     357     126       0     365     143       0     364     140       0     355     124       0     318     105       0     321      90       0     318      84       0     365     106
1983       0     348      82       0     340      85       0     330      90       0     505      99       0     348     112       0     364     130       0     360     145       0     362     143       0     368     126       0     323     105       0     318      91       0     317      82       0     505     108
1984       0     312      80       0     320      82       0     315      89       0     329      97       0     347     112       0     359     130       0     353     144       0     343     140       0     353     122       0     324     105       0     318      89       0     316      81       0     359     106
1985       0     314      80       0     320      81       0     319      88       0     359      98       0     352     111       0     367     128       0     358     141       0     355     141       0     353     123       0     323     105       0     322      90       0     323      82       0     367     106
1986       0     312      81       0     330      83       0     316      89       0     321      99       0     366     112       0     394     129       0     371     143       0     342     139       0     354     122       0     323     104       0     318      90       0     316      82       0     394     106
1987       0     320      81       0     318      85       0     318      88       0     335      98       0     363     112       0     366     130       0     397     147       0     356     142       0     354     126       0     345     105       0     325      91       0     365      84       0     397     107
1988       0     413      83       0     324      84       0     352      90       0     323      99       0     346     113       0     363     131       0     367     148       0     358     144       0     387     126       0     342     105       0     320      89       0     315      83       0     413     108
1989       0     336      80       0     320      83       0     327      90       0     324      98       0     343     112       0     366     130       0     365     145       0     349     142       0     353     124       0     323     105       0     324      90       0     332      84       0     366     107
1990       0     320      83       0     323      85       0     476      92       0     413     101       0     361     113       0     363     132       0     371     146       0     353     143       0     371     124       0     327     106       0     318      93       0     317      84       0     476     108

crua6[/cru/cruts/vap_wet_investigation] head -90 cru_ts_2_10.1901-2002.wet.grid.stats | tail -10
1981       0    3100    1018       0    2800     919       0    3100     980       0    3000     911       0    3100     945       0    3000    1010       0    3100    1051       0    3100    1040       0    3000     981       0    3100    1017       0    3000    1021       0    3100    1003       0    3100     992
1982       0    3100     983       0    2800     894       0    3100     967       0    3000     925       0    3100     927       0    3000     941       0    3100     979       0    3100    1054       0    3000    1007       0    3100    1055       0    3000     996       0    3100    1044       0    3100     981
1983       0    3100    1035       0    2800     863       0    3100     941       0    3000     919       0    3100     929       0    3000     949       0    3100     990       0    3100    1039       0    3000     996       0    3100    1026       0    3000    1034       0    3100    1057       0    3100     982
1984       0    3100     981       0    2900     848       0    3100     920       0    3000     841       0    3100     932       0    3000     973       0    3100    1048       0    3100    1057       0    3000    1023       0    3100    1057       0    3000     992       0    3100    1016       0    3100     974
1985       0    3100     969       0    2800     896       0    3100     952       0    3000     896       0    3100     928       0    3000     938       0    3100    1057       0    3100    1043       0    3000     993       0    3100    1043       0    3000    1066       0    3100    1029       0    3100     984
1986       0    3100     988       0    2800     908       0    3100     950       0    3000     895       0    3100     922       0    3000     962       0    3100    1022       0    3100    1052       0    3000    1037       0    3100    1052       0    3000    1048       0    3100     986       0    3100     985
1987       0    3100    1011       0    2800     909       0    3100     930       0    3000     856       0    3100     954       0    3000     972       0    3100    1021       0    3100    1064       0    3000     978       0    3100     991       0    3000    1002       0    3100    1047       0    3100     978
1988       0    3100    1033       0    2900     924       0    3100     971       0    3000     903       0    3100     938       0    3000     980       0    3100    1039       0    3100    1101       0    3000    1014       0    3100    1017       0    3000    1007       0    3100    1054       0    3100     998
1989       0    3100    1019       0    2800     936       0    3100    1015       0    3000     892       0    3100     978       0    3000    1020       0    3100    1054       0    3100    1075       0    3000    1023       0    3100    1070       0    3000    1046       0    3100    1053       0    3100    1015
1990       0    3100     996       0    2800     959       0    3100    1011       0    3000     953       0    3100     928       0    3000     907       0    3100     983       0    3100     986       0    3000     915       0    3100     968       0    3000     949       0    3100     959       0    3100     960

So the monthly maxima are fine here. But for the decadal files?

crua6[/cru/cruts/vap_wet_investigation] cat cru_ts_2_10.1981-1990.vap.grid.stats0
1981       0     310     102       0     280      92       0     310      98       0     300      91       0     310      95       0     300     101       0     310     105       0     310     104       0     300      98       0     310     102       0     300     102       0     310     100       0     310      99
1982       0     310      98       0     280      89       0     310      97       0     300      93       0     310      93       0     300      94       0     310      98       0     310     105       0     300     101       0     310     106       0     300     100       0     310     104       0     310      98
1983       0     310     104       0     280      86       0     310      94       0     300      92       0     310      93       0     300      95       0     310      99       0     310     104       0     300     100       0     310     103       0     300     103       0     310     106       0     310      98
1984       0     310      98       0     290      85       0     310      92       0     300      84       0     310      93       0     300      97       0     310     105       0     310     106       0     300     102       0     310     106       0     300      99       0     310     102       0     310      97
1985       0     310      97       0     280      90       0     310      95       0     300      90       0     310      93       0     300      94       0     310     106       0     310     104       0     300      99       0     310     104       0     300     107       0     310     103       0     310      98
1986       0     310      99       0     280      91       0     310      95       0     300      90       0     310      92       0     300      96       0     310     102       0     310     105       0     300     104       0     310     105       0     300     105       0     310      99       0     310      99
1987       0     310     101       0     280      91       0     310      93       0     300      86       0     310      95       0     300      97       0     310     102       0     310     106       0     300      98       0     310      99       0     300     100       0     310     105       0     310      98
1988       0     310     103       0     290      92       0     310      97       0     300      90       0     310      94       0     300      98       0     310     104       0     310     110       0     300     101       0     310     102       0     300     101       0     310     105       0     310     100
1989       0     310     102       0     280      94       0     310     101       0     300      89       0     310      98       0     300     102       0     310     105       0     310     107       0     300     102       0     310     107       0     300     105       0     310     105       0     310     101
1990       0     310     100       0     280      96       0     310     101       0     300      95       0     310      93       0     300      91       0     310      98       0     310      99       0     300      91       0     310      97       0     300      95       0     310      96       0     310      96

crua6[/cru/cruts/vap_wet_investigation] cat cru_ts_2_10.1981-1990.wet.grid.stats
1981       0    3220     819       0    3240     842       0    3200     903       0    3350     992       0    3520    1113       0    3560    1304       0    3490    1440       0    3440    1427       0    3600    1236       0    3230    1048       0    3200     898       0    3210     833       0    3600    1071
1982       0    3120     801       0    3230     827       0    3180     881       0    3290     982       0    3480    1108       0    3570    1264       0    3650    1432       0    3640    1405       0    3550    1239       0    3180    1048       0    3210     901       0    3180     835       0    3650    1060
1983       0    3480     820       0    3400     850       0    3300     898       0    5050     993       0    3480    1125       0    3640    1295       0    3600    1451       0    3620    1428       0    3680    1259       0    3230    1050       0    3180     912       0    3170     822       0    5050    1075
1984       0    3120     803       0    3200     823       0    3150     887       0    3290     971       0    3470    1124       0    3590    1299       0    3530    1437       0    3430    1404       0    3530    1218       0    3240    1053       0    3180     894       0    3160     812       0    3590    1060
1985       0    3140     803       0    3200     815       0    3190     882       0    3590     978       0    3520    1113       0    3670    1277       0    3580    1405       0    3550    1411       0    3530    1233       0    3230    1048       0    3220     900       0    3230     821       0    3670    1057
1986       0    3120     809       0    3300     827       0    3160     889       0    3210     990       0    3660    1120       0    3940    1294       0    3710    1428       0    3420    1393       0    3540    1220       0    3230    1041       0    3180     895       0    3160     821       0    3940    1061
1987       0    3200     810       0    3180     849       0    3180     880       0    3350     980       0    3630    1124       0    3660    1296       0    3970    1466       0    3560    1423       0    3540    1260       0    3450    1054       0    3250     910       0    3650     844       0    3970    1075
1988       0    4130     829       0    3240     835       0    3520     902       0    3230     989       0    3460    1133       0    3630    1311       0    3670    1475       0    3580    1441       0    3870    1264       0    3420    1054       0    3200     889       0    3150     832       0    4130    1079
1989       0    3360     804       0    3200     825       0    3270     898       0    3240     978       0    3430    1120       0    3660    1301       0    3650    1447       0    3490    1421       0    3530    1240       0    3230    1052       0    3240     900       0    3320     836       0    3660    1069
1990       0    3200     827       0    3230     853       0    4760     918       0    4130    1005       0    3610    1127       0    3630    1322       0    3710    1462       0    3530    1428       0    3710    1236       0    3270    1062       0    3180     930       0    3170     844       0    4760    1084

Much confusion! The orders of magnitude have changed to reflect the expected ranges – but the data have clearly been swapped!

Another decade:

crua6[/cru/cruts/vap_wet_investigation] cat cru_ts_2_10.1921-1930.vap.grid.stats
1921       0     310     102       0     280      89       0     310     100       0     300      88       0     310      95       0     300      97       0     310     101       0     310     104       0     300     102       0     310     104       0     300      97       0     310     101       0     310      98
1922       0     310      95       0     280      93       0     310      97       0     300      89       0     310      95       0     300      98       0     310     105       0     310     107       0     300      98       0     310     104       0     300     102       0     310     103       0     310      99
1923       0     310     100       0     280      88       0     310      97       0     300      90       0     310      97       0     300      98       0     310     101       0     310     101       0     300     100       0     310     104       0     300     101       0     310     103       0     310      98
1924       0     310      97       0     290      89       0     310      95       0     300      90       0     310      91       0     300      97       0     310     100       0     310     102       0     300     101       0     310     102       0     300     102       0     310     100       0     310      97
1925       0     310      98       0     280      89       0     310      98       0     300      87       0     310      90       0     300      96       0     310     101       0     310     103       0     300     103       0     310     101       0     300     103       0     310     100       0     310      97
1926       0     310      99       0     280      87       0     310      95       0     300      87       0     310      95       0     300      93       0     310     103       0     310     104       0     300      99       0     310     102       0     300     102       0     310     101       0     310      97
1927       0     310      96       0     280      87       0     310      96       0     300      89       0     310      94       0     300      97       0     310     103       0     310     104       0     300     102       0     310     103       0     300     102       0     310      99       0     310      98
1928       0     310      97       0     290      89       0     310      91       0     300      88       0     310      90       0     300      96       0     310     101       0     310     104       0     300      97       0     310      99       0     300      99       0     310      96       0     310      96
1929       0     310      95       0     280      84       0     310      95       0     300      86       0     310      91       0     300      95       0     310     100       0     310     102       0     300      98       0     310     102       0     300      98       0     310      98       0     310      95
1930       0     310      98       0     280      88       0     310      97       0     300      88       0     310      93       0     300      93       0     310      99       0     310     103       0     300      99       0     310     105       0     300     101       0     310      97       0     310      97
crua6[/cru/cruts/vap_wet_investigation] cat cru_ts_2_10.1921-1930.wet.grid.stats
1921       0    3120     805       0    3190     814       0    3140     874       0    3210     969       0    3800    1106       0    3590    1289       0    3600    1439       0    3440    1390       0    3530    1220       0    3230    1032       0    3180     877       0    3160     824       0    3800    1053
1922       0    3120     794       0    3220     813       0    3140     874       0    3210     971       0    3470    1104       0    3590    1280       0    3560    1420       0    3440    1387       0    3530    1211       0    3230    1025       0    3180     896       0    3140     812       0    3590    1049
1923       0    3070     799       0    3140     808       0    3140     871       0    3210     947       0    3460    1082       0    3660    1276       0    3560    1410       0    3440    1392       0    3530    1222       0    3230    1048       0    3180     907       0    3160     826       0    3660    1049
1924       0    3270     792       0    3230     817       0    3160     879       0    3340     955       0    3460    1094       0    3710    1264       0    3560    1415       0    3440    1386       0    3530    1228       0    3160    1034       0    3180     892       0    3140     806       0    3710    1047
1925       0    3110     786       0    3190     815       0    3140     873       0    3210     966       0    3470    1084       0    3590    1253       0    3560    1408       0    3460    1397       0    3530    1231       0    3230    1025       0    3160     896       0    3220     828       0    3590    1047
1926       0    3260     815       0    3290     842       0    3310     889       0    3310     957       0    3460    1085       0    3950    1266       0    3560    1406       0    3450    1402       0    3530    1237       0    3230    1042       0    3250     899       0    3150     811       0    3950    1054
1927       0    3120     795       0    3300     822       0    3170     873       0    3360     959       0    3540    1096       0    3610    1271       0    3550    1424       0    3450    1390       0    3530    1233       0    3230    1053       0    3180     897       0    3280     814       0    3610    1052
1928       0    3200     809       0    3240     823       0    3140     875       0    3400     963       0    3470    1095       0    3590    1263       0    3560    1425       0    3450    1397       0    3530    1228       0    3230    1039       0    3180     902       0    3160     824       0    3590    1054
1929       0    3150     794       0    3190     802       0    3160     867       0    3310     950       0    3600    1084       0    3580    1250       0    3550    1399       0    3440    1385       0    3530    1218       0    3230    1049       0    3180     897       0    3160     806       0    3600    1042
1930       0    3190     798       0    3190     824       0    3150     881       0    3210     965       0    3470    1099       0    3590    1276       0    3530    1424       0    3440    1409       0    3540    1220       0    3200    1042       0    3300     907       0    3280     829       0    3590    1056

The same story. And the final two years:

crua6[/cru/cruts/vap_wet_investigation] cat cru_ts_2_10.2001-2002.vap.grid.stats
2001       0     310      87       0     280      84       0     310      90       0     300      81       0     310      87       0     300      93       0     310      95       0     310      95       0     300      89       0     310      95       0     300      95       0     310      87       0     310      90
2002       0     310      91       0     280      85       0     310      92       0     300      83       0     310      88       0     300      89       0     310      93       0     310      94       0     300      92       0     310      93       0     300      88       0     310      86       0     310      90
crua6[/cru/cruts/vap_wet_investigation] cat cru_ts_2_10.2001-2002.wet.grid.stats
2001       0    3320     834       0    3250     841       0    3180     913       0    3490    1010       0    3490    1147       0    4380    1323       0    3660    1487       0    5120    1466       0    3530    1266       0    3460    1088       0    3620     932       0    3410     843       0    5120    1096
2002       0    3310     837       0    3390     863       0    3270     918       0    3370    1012       0    3930    1151       0    4140    1339       0    3750    1503       0    5110    1453       0    3530    1261       0    3310    1067       0    3470     922       0    3300     833       0    5110    1096

It looks like a consistent problem: all the decadal VAP and WET files should be discarded, and only the ‘full run’ 1901-2002
files used. But my theory that the error occurred when the 1901-2002 files were converted to decadal doesn’t sound true now,
because why would the precision levels change? Surely, if the decadal files are derived from the 1901-2002 files, it’s just
a case of copying data across?

Let’s look at *just* 1981, to try and assess this issue:

FULL 1901-2002 FILE
VAP:
1981       0     322      82       0     324      84       0     320      90       0     335      99       0     352     111       0     356     130       0     349     144       0     344     143       0     360     124       0     323     105       0     320      90       0     321      83       0     360     107
WET:
1981       0    3100    1018       0    2800     919       0    3100     980       0    3000     911       0    3100     945       0    3000    1010       0    3100    1051       0    3100    1040       0    3000     981       0    3100    1017       0    3000    1021       0    3100    1003       0    3100     992

DECADAL 1981-1990 FILE
VAP:
1981       0     310     102       0     280      92       0     310      98       0     300      91       0     310      95       0     300     101       0     310     105       0     310     104       0     300      98       0     310     102       0     300     102       0     310     100       0     310      99
WET:
1981       0    3220     819       0    3240     842       0    3200     903       0    3350     992       0    3520    1113       0    3560    1304       0    3490    1440       0    3440    1427       0    3600    1236       0    3230    1048       0    3200     898       0    3210     833       0    3600    1071

It’s evident that the data have not only been swapped – they’ve been scaled too. Aaaarrrgghhhhhh!!!!!
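A quick check of the swap-and-scale reading, as a minimal Python sketch. The four rows are copied from the 1981 stats above; the helper names (triplets, maxima) are made up for illustration and are not part of any CRU program.

```python
# Rows copied from the 1981 stats above: year, then (min, max, mean) for each
# of the 12 months, followed by the annual triplet.
FULL_VAP = "1981 0 322 82 0 324 84 0 320 90 0 335 99 0 352 111 0 356 130 0 349 144 0 344 143 0 360 124 0 323 105 0 320 90 0 321 83 0 360 107"
FULL_WET = "1981 0 3100 1018 0 2800 919 0 3100 980 0 3000 911 0 3100 945 0 3000 1010 0 3100 1051 0 3100 1040 0 3000 981 0 3100 1017 0 3000 1021 0 3100 1003 0 3100 992"
DECADAL_VAP = "1981 0 310 102 0 280 92 0 310 98 0 300 91 0 310 95 0 300 101 0 310 105 0 310 104 0 300 98 0 310 102 0 300 102 0 310 100 0 310 99"
DECADAL_WET = "1981 0 3220 819 0 3240 842 0 3200 903 0 3350 992 0 3520 1113 0 3560 1304 0 3490 1440 0 3440 1427 0 3600 1236 0 3230 1048 0 3200 898 0 3210 833 0 3600 1071"


def triplets(row):
    """Return the (min, max, mean) triplets from a stats row, dropping the year."""
    values = [int(tok) for tok in row.split()][1:]
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def maxima(row):
    """Return just the maximum from each triplet."""
    return [mx for _mn, mx, _mean in triplets(row)]


# Swapped and scaled: decadal 'VAP' maxima times 10 equal the full-run WET maxima...
vap_is_scaled_wet = [m * 10 for m in maxima(DECADAL_VAP)] == maxima(FULL_WET)
# ...and decadal 'WET' maxima equal the full-run VAP maxima times 10.
wet_is_scaled_vap = maxima(DECADAL_WET) == [m * 10 for m in maxima(FULL_VAP)]

print(vap_is_scaled_wet, wet_is_scaled_vap)
```

Only the maxima are compared exactly. The means agree only approximately after the factor of ten, presumably because of rounding.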

*******************************************************************************
* PRIORITY INTERRUPT ENDS * PRIORITY INTERRUPT ENDS * PRIORITY INTERRUPT ENDS *
*******************************************************************************

Now, where were we.. ah yes, Vapour Pressure. So far:

Original:          vap.0311181410.dtb
+
MCDW:              vap.0709111032.dtb
v
v
Intermediate:      vap.0710241541.dtb
+
CLIMAT:            vap.0710151817.dtb
v
v
Final:             vap.0710241549.dtb

Produce anomalies:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.vap
> Select the .cts or .dtb file to load:
vap.0710241549.dtb
> Specify the start,end of the normals period:
1961,1990
> Specify the missing percentage permitted:
25
> Data required for a normal:           23
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
vap.txt
> Select the first,last years AD to save:
1901,2006
> Operating…

> NORMALS            MEAN percent      STDEV percent
>         .dtb     908812    45.2
>         .cts      35390     1.8     944202    47.0
> PROCESS        DECISION percent %of-chk
> no lat/lon          105     0.0     0.0
> no normal       1064261    53.0    53.0
> out-of-range         49     0.0     0.0
> accepted         944153    47.0
> Dumping years 1901-2006 to .txt files…

crua6[/cru/cruts/version_3_0/secondaries/vap]
<END_QUOTE>

Well.. 47% accepted, 53% no normals.. pretty much as expected, and unlikely to improve no matter how many new CLIMAT
and MCDW updates there are. We need back data for 1961-1990.
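For reference, the figures in that anomdtb report can be reproduced with a few lines of Python. This is a sketch under two assumptions that are mine, not anomdtb's documented behaviour: that 'Data required for a normal' is the normals period length times the permitted fraction rounded up, and that the percentages are shares of the four DECISION counts added together.

```python
import math


def years_needed(start, end, missing_pct):
    """Years of data needed for a normal (assumed rule: round up)."""
    n_years = end - start + 1
    return math.ceil(n_years * (100 - missing_pct) / 100)


# DECISION counts copied from the anomdtb output above.
counts = {
    "no lat/lon": 105,
    "no normal": 1064261,
    "out-of-range": 49,
    "accepted": 944153,
}
total = sum(counts.values())
shares = {name: round(100.0 * n / total, 1) for name, n in counts.items()}

print(years_needed(1961, 1990, 25))  # 23
print(shares["accepted"], shares["no normal"])  # 47.0 53.0
```

On these assumptions the report is internally consistent, and the 53% can only shrink if more stations reach 23 years of data inside 1961-1990.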

Synthetic production:

<BEGIN_QUOTE>
IDL> vap_gts_anom,dtr_prefix='../dtrbin/dtrbin',tmp_prefix='../tmpbin/tmpbin',1901,2006,outprefix='vapsyn/vapsyn',dumpbin=1
% Compiled module: VAP_GTS_ANOM.
% Compiled module: RDBIN.
% Compiled module: STRIP.
% Compiled module: DEFXYZ.
Land,sea:       56016       68400
Calculating tmn normal
% Compiled module: TVAP.
Calculating synthetic vap normal
% Compiled module: ESAT.
Calculating synthetic anomalies
% Compiled module: MOMENT.
1901 vap (x,s2,<<,>>):  1.61250e-05  6.15570e-06    -0.160607     0.222689
% Compiled module: WRBIN.
1902 vap (x,s2,<<,>>): -0.000123188  3.46116e-05    -0.268891    0.0261283
1903 vap (x,s2,<<,>>):  6.86689e-05  4.52675e-06    -0.121429     0.123995
(etc)
<END_QUOTE>

(also produced, vapsyn/vapsyn1901 .. vapsyn/vapsyn2006)
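The log only shows that a tmn normal is calculated and that modules named TVAP and ESAT get compiled; their contents are not reproduced here. As a rough idea of the kind of calculation that could turn tmp and dtr into a vapour pressure, here is a Python sketch. Everything in it is an assumption: that tmn is tmp minus half of dtr, that minimum temperature stands in for the dew point, and that a Magnus-type formula is used for saturation vapour pressure. The function names are hypothetical.

```python
import math


def esat_hpa(temp_c):
    """Saturation vapour pressure over water in hPa (Magnus approximation).

    Assumption: the real ESAT routine may use a different formula entirely.
    """
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def synthetic_vap_hpa(tmp_c, dtr_c):
    """Estimate vapour pressure (hPa) from mean temperature and diurnal range.

    Assumptions: tmn = tmp - dtr/2, and tmn is treated as the dew point.
    """
    tmn_c = tmp_c - dtr_c / 2.0
    return esat_hpa(tmn_c)


# Hypothetical cell: mean 10 degC with an 8 degC diurnal range, so tmn = 6 degC.
print(round(synthetic_vap_hpa(10.0, 8.0), 2))
```

The synthetic anomaly would then be the difference between a value like this and the same quantity computed from the 1961-1990 normals.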

Gridding with both observed and synthetic data:

IDL> quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn',pts_prefix='vaptxt/vap.'

Create absolute grids from anomaly grids:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.vap
Enter a name for the gridded climatology file: clim.6190.lan.vap.grid
Enter the path and stem of the .glo files: vapglo/vap.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: vapabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Do you wish to limit the output values? (Y/N): Y
1. Set minimum to zero
2. Set a single minimum and maximum
3. Set monthly minima and maxima (for wet/rd0)

Choose: 1
Right, erm.. off I jolly well go!
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>

and finally, create the output files:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.YYYY.vap.dat
Try again.. read instructions this time?

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.dat
Writing cru_ts_3_00.1901.1910.vap.dat
Writing cru_ts_3_00.1911.1920.vap.dat
Writing cru_ts_3_00.1921.1930.vap.dat
Writing cru_ts_3_00.1931.1940.vap.dat
Writing cru_ts_3_00.1941.1950.vap.dat
Writing cru_ts_3_00.1951.1960.vap.dat
Writing cru_ts_3_00.1961.1970.vap.dat
Writing cru_ts_3_00.1971.1980.vap.dat
Writing cru_ts_3_00.1981.1990.vap.dat
Writing cru_ts_3_00.1991.2000.vap.dat
Writing cru_ts_3_00.2001.2006.vap.dat
<END_QUOTE>
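The filename prompt in that session is easy to get wrong, so here is a minimal Python sketch of how the SSSS/EEEE template appears to expand into decadal names. It is inferred from the 'Writing' lines above only; decadal_names is a hypothetical helper, not the MERGEGRIDS code.

```python
def decadal_names(template, start_year, end_year):
    """Expand an SSSS/EEEE template into one name per decade."""
    if "SSSS" not in template or "EEEE" not in template:
        # Corresponds to the 'Try again..' response in the session above.
        raise ValueError("template needs both SSSS and EEEE")
    names = []
    first = start_year
    while first <= end_year:
        last = min(first + 9, end_year)  # final block may be shorter than 10 years
        names.append(template.replace("SSSS", str(first)).replace("EEEE", str(last)))
        first = last + 1
    return names


names = decadal_names("cru_ts_3_00.SSSS.EEEE.vap.dat", 1901, 2006)
print(names[0])   # cru_ts_3_00.1901.1910.vap.dat
print(names[-1])  # cru_ts_3_00.2001.2006.vap.dat
```

Passing the first attempt from the session (with YYYY in place of EEEE) raises ValueError in this sketch.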

Ah – and I was really hoping this time that it would just WORK. But of course not – nothing works first
time in this project. I ran crutsstats on cru_ts_3_00.1901.2006.vap.dat, and:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./crutsstats

CRUTSSTATS: Stats for CRU TS gridded files

Enter the monthly gridded data file: cru_ts_3_00.1901.2006.vap.dat

Please enter the start year: 1901

106 years from 1901 to 2006

Output file is cru_ts_3_00.1901.2006.vap.dat.stats
1901       1     358     106
1902       1     358     106
1903       1     358     106
1904       1     358     106
1905       1     358     106
(etc)
2002       1     358     106
2003       1     358     106
2004       1     358     106
2005       1     358     106
2006       1     358     106
<END_QUOTE>

What?! Every year has the same min (fine, VAP of 0 is probably impossible), max (I can just about believe,
if there’s a cell with no stations inside the cdd and the normal for it happens to be the highest value), and
MEAN (oh no, NO WAY!). What’s odder is that the .glo files are different:

crua6[/cru/cruts/version_3_0/secondaries/vap/vapabs] diff vap.06.1974.glo.abs.nh vap.06.1975.glo.abs.nh |wc -l
56

Admittedly, 56 lines different out of 360 isn’t hugely different. And looking, they are only slight and
infrequent differences. But the monthly stats are all cloned as well:

1901       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1902       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1903       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1904       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1905       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1906       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1907       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
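Spotting this kind of cloning by eye is tedious, so a check along the following lines could be run on any .stats file. It is a Python sketch with a hypothetical helper name (cloned_years). The sample rows are the annual columns only: the first list is from the new run above, the second from the v2.10 stats quoted further down in this log.

```python
def cloned_years(stats_lines):
    """Group years whose statistics (everything after the year) are identical."""
    groups = {}
    for line in stats_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        groups.setdefault(tuple(fields[1:]), []).append(int(fields[0]))
    return [years for years in groups.values() if len(years) > 1]


new_run = ["1901 1 358 106", "1902 1 358 106", "1903 1 358 106"]
old_run = ["1901 0 411 105", "1902 0 413 104", "1903 0 465 104"]

print(cloned_years(new_run))  # [[1901, 1902, 1903]]
print(cloned_years(old_run))  # []
```

A long list of identical years is the warning sign: real data should not give the same min, max and mean every year.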

Well the first thing to do, after the inevitable wailing and gnashing of teeth, is to re-run glo2abs
without the ‘zero minimum’ flag (just in case I coded that badly, I was in a hurry):

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.vap
Enter a name for the gridded climatology file: clim.6190.lan.vap.grid2
Enter the path and stem of the .glo files: vapglo/vap.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: vapabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Do you wish to limit the output values? (Y/N): N
Right, erm.. off I jolly well go!
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.dat
Writing cru_ts_3_00.1901.1910.vap.dat
Writing cru_ts_3_00.1911.1920.vap.dat
Writing cru_ts_3_00.1921.1930.vap.dat
Writing cru_ts_3_00.1931.1940.vap.dat
Writing cru_ts_3_00.1941.1950.vap.dat
Writing cru_ts_3_00.1951.1960.vap.dat
Writing cru_ts_3_00.1961.1970.vap.dat
Writing cru_ts_3_00.1971.1980.vap.dat
Writing cru_ts_3_00.1981.1990.vap.dat
Writing cru_ts_3_00.1991.2000.vap.dat
Writing cru_ts_3_00.2001.2006.vap.dat
<END_QUOTE>

Sadly, that gave the same result. So what of the published (v2.10) VAP dataset? That looks ~ok:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./crutsstats

CRUTSSTATS: Stats for CRU TS gridded files

Enter the monthly gridded data file: cru_ts_2_10.1901-2002.vap.grid

Please enter the start year: 1901

102 years from 1901 to 2002

Output file is cru_ts_2_10.1901-2002.vap.grid.stats
1901       0     411     105
1902       0     413     104
1903       0     465     104
1904       0     359     104
1905       0     383     104
1906       0     376     105
1907       0     387     104
(etc)
<END_QUOTE>

Not good at all. Or, rather, good that it must be a solvable problem. Except that it’s 10 to 5 on a Sunday
afternoon and it’s me that’s got to solve it.

Where to start? Well, retrace your steps, that’s how you get out of a minefield. So first up, to compare
similar months in the anomaly files. Though I already know what I’m going to find, don’t I? Because glo2abs
isn’t going to do anything unusual, it just adds the normal and there you go. So if the absolutes are very
similar, the anomalies will be, too.. hmm. Well, I *suppose* I could try producing two more copies of the
output files – one with just synthetic data and one with just observed data? It’s only a couple of re-runs
of the quick_interp_tdm2.pro IDL routine..

Started with the synthetic-only run:

<BEGIN_QUOTE>
IDL> quick_interp_tdm2,1901,2006,'vapsynglo/vapsyn.',1000,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='vapsyn/vapsyn'

crua6[/cru/cruts/version_3_0/secondaries/vap/syn_only] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: ../clim.6190.lan.vap
Enter a name for the gridded climatology file: clim.6190.lan.vap.grid
Enter the path and stem of the .glo files: vapsynglo/vapsyn.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: vapsynabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Do you wish to limit the output values? (Y/N): N
Right, erm.. off I jolly well go!
vapsyn.01.1901.glo
vapsyn.02.1901.glo
(etc)

crua6[/cru/cruts/version_3_0/secondaries/vap/syn_only] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: vapsynabs/vapsyn.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.syn.dat
Writing cru_ts_3_00.1901.1910.vap.syn.dat
Writing cru_ts_3_00.1911.1920.vap.syn.dat
Writing cru_ts_3_00.1921.1930.vap.syn.dat
Writing cru_ts_3_00.1931.1940.vap.syn.dat
Writing cru_ts_3_00.1941.1950.vap.syn.dat
Writing cru_ts_3_00.1951.1960.vap.syn.dat
Writing cru_ts_3_00.1961.1970.vap.syn.dat
Writing cru_ts_3_00.1971.1980.vap.syn.dat
Writing cru_ts_3_00.1981.1990.vap.syn.dat
Writing cru_ts_3_00.1991.2000.vap.syn.dat
Writing cru_ts_3_00.2001.2006.vap.syn.dat
<END_QUOTE>

And then the observed-only:

<BEGIN_QUOTE>
IDL> quick_interp_tdm2,1901,2006,'vapobsglo/vapobs.',1000,gs=0.5,dumpglo='dumpglo',pts_prefix='vaptxt/vap.'

crua6[/cru/cruts/version_3_0/secondaries/vap/obs_only] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: ../clim.6190.lan.vap
Enter a name for the gridded climatology file: clim.6190.lan.vap.grid
Enter the path and stem of the .glo files: vapobsglo/vapobs.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: vapobsabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Do you wish to limit the output values? (Y/N): N
Right, erm.. off I jolly well go!
vapobs.01.1901.glo
vapobs.02.1901.glo
(etc)

crua6[/cru/cruts/version_3_0/secondaries/vap/obs_only] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: vapobsabs/vapobs.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.obs.dat
Writing cru_ts_3_00.1901.1910.vap.obs.dat
Writing cru_ts_3_00.1911.1920.vap.obs.dat
Writing cru_ts_3_00.1921.1930.vap.obs.dat
Writing cru_ts_3_00.1931.1940.vap.obs.dat
Writing cru_ts_3_00.1941.1950.vap.obs.dat
Writing cru_ts_3_00.1951.1960.vap.obs.dat
Writing cru_ts_3_00.1961.1970.vap.obs.dat
Writing cru_ts_3_00.1971.1980.vap.obs.dat
Writing cru_ts_3_00.1981.1990.vap.obs.dat
Writing cru_ts_3_00.1991.2000.vap.obs.dat
Writing cru_ts_3_00.2001.2006.vap.obs.dat
<END_QUOTE>

So.. how do the stats look for these two datasets?

Synthetic-only:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap/syn_only] ./crutsstats

CRUTSSTATS: Stats for CRU TS gridded files

Enter the monthly gridded data file: cru_ts_3_00.1901.2006.vap.syn.dat

Please enter the start year: 1901

106 years from 1901 to 2006

Output file is cru_ts_3_00.1901.2006.vap.syn.dat.stats
1901       1     358     106
1902       1     358     106
1903       1     358     106
1904       1     358     106
1905       1     358     106
1906       1     358     106
(etc)
<END_QUOTE>

Observed-only:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap/obs_only] ./crutsstats

CRUTSSTATS: Stats for CRU TS gridded files

Enter the monthly gridded data file: cru_ts_3_00.1901.2006.vap.obs.dat

Please enter the start year: 1901

106 years from 1901 to 2006

Output file is cru_ts_3_00.1901.2006.vap.obs.dat.stats
1901       1     358     106
1902       1     358     106
1903       1     358     106
1904       1     358     106
1905       1     358     106
1906       1     358     106
(etc)
<END_QUOTE>

Oh, GOD. What is going on? Are we data sparse and just looking at the climatology? How can a synthetic
dataset derived from tmp and dtr produce the same statistics as a ‘real’ dataset derived from observations?

Let’s be logical. Here are the two ‘separated’ gridding runs:

IDL> quick_interp_tdm2,1901,2006,'vapsynglo/vapsyn.',1000,gs=0.5,dumpglo='dumpglo',nostn=1,synth_prefix='vapsyn/vapsyn'
IDL> quick_interp_tdm2,1901,2006,'vapobsglo/vapobs.',1000,gs=0.5,dumpglo='dumpglo',pts_prefix='vaptxt/vap.'

Well they look fine. The synthetic run has no other data inputs (‘nostn=1’), and the observed run has no references to
the synthetic data. So.. either quick_interp_tdm2.pro is doing something ‘unusual’, or, or.. hang on, let’s try the
climatology for stats:

1961       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106

Ah, Bingo was his name-o! As I was hoping (well OK it’s a bad kind of hope), the reason it’s all the same is that it is
by and large defaulting to the climatology. Which means that not much (any?) data is getting through, no matter if we
use synthetic, observed, or both together. What’s odd about that conclusion is that the synthetic data is derived from
TMP and DTR, two very well-populated datasets! So synthetics alone should pretty much fill the.. hang on, just thought
of something horrendous.. oh, okay, probably not that. I was wondering if glo2abs.for was factoring the normals so that
the anomalies were insignificant, but the equation is:

      absgrid(ilon(i),ilat(i)) =
     *          nint(anoms(ilon(i),ilat(i))*10) + normals(i,imo)

..so the anomaly is getting the weight! But still, not a wise thing to leave to automatics. So glo2abs should prompt
the user.. but with what? Just one anomaly and normal? Several? The same one from different timesteps? Eeek. Let’s look
at this actual case.

January 1961, lines 11103, 11104 in the glo file (11099, 11100 without header, putting it on about 33.5 degs N)
0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  4.7173E-04  4.7224E-03
5.4273E-03  6.1323E-03  6.8372E-03  7.5422E-03  8.2472E-03  1.9677E-03  0.0000E+00  0.0000E+00

Those anomalies are mighty tiny, given that the absolutes are three-digit integers! Hardly surprising they’re not really
appearing on the radar when added to normals typically two orders of magnitude higher! Even with the *10 in the glo2abs
prog, we’re still looking at values around 0.06.
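To make the arithmetic concrete, here is the glo2abs expression applied to the eight nonzero anomalies listed above, as a Python sketch. The nint helper imitates Fortran's NINT (halves round away from zero); the normal of 291 is an illustrative three-digit value, not necessarily the normal for that cell.

```python
import math


def nint(x):
    """Fortran-style NINT: nearest integer, halves rounded away from zero."""
    magnitude = int(math.floor(abs(x) + 0.5))
    return magnitude if x >= 0 else -magnitude


def old_absolute(anomaly, normal):
    """absgrid = nint(anoms*10) + normals, as in the expression quoted above."""
    return nint(anomaly * 10) + normal


# Nonzero anomalies copied from the two January 1961 .glo lines above.
anoms = [4.7173e-04, 4.7224e-03, 5.4273e-03, 6.1323e-03,
         6.8372e-03, 7.5422e-03, 8.2472e-03, 1.9677e-03]
normal = 291  # illustrative normal in hPa*10 (assumption)

print([old_absolute(a, normal) for a in anoms])
```

Every result equals the normal, because an anomaly has to reach at least 0.05 before nint(anomaly*10) becomes nonzero.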

Looked at the observed anomalies (output from anomdtb.f90) – here the anomalies are larger! Between -5 and +5, roughly,
which is what I’m used to seeing in .txt files.

To investigate the synthetics, I needed to look at, and re-run, vap_gts_tdm.pro. It says,

; Note that anomalies are in hPa*10 (bin) or hPa (glo)

So the binary file anomaly units – the ones we’re using – are in hPa*10. Let’s get one o’ them synthetic glo files:

IDL> vap_gts_anom,dtr_prefix='../dtrbin/dtrbin',tmp_prefix='../tmpbin/tmpbin',1961,1961,outprefix='vapsynglo/vapsyn.',dumpglo=1
Land,sea:       56016       68400
Calculating tmn normal
% Compiled module: TVAP.
Calculating synthetic vap normal
% Compiled module: ESAT.
Calculating synthetic anomalies
% Compiled module: MOMENT.
1961 vap (x,s2,<<,>>):  5.72571e-05  9.01807e-07   -0.0653905    0.0261283
% Compiled module: SAVEGLO.
% Compiled module: SELECTMODEL.

For Jan 1961 (may as well stick with it), -999 is the missing value code. The range is -0.0149 to +0.0222 (remember this is
an anomaly in hPa according to the program comment). So if it’s telling the truth, the binary anomalies presented to
quick_interp_tdm2.pro will range from roughly -0.3 to +0.3. Still not going to impinge on normals between 1 and 358, is it?

So, what are the normals in? Well according to clim.6190.lan.vap:

crua6[/cru/cruts/version_3_0/secondaries/vap] head -11 clim.6190.lan.vap
Tyndall Centre grim file created on 12.01.2004 at 11:47 by Dr. Tim Mitchell
.vap = vapour pressure (hPa)
0.5deg lan clim:1961-90 MarkNew
[Long=-180.00, 180.00] [Lati= -90.00,  90.00] [Grid X,Y= 720, 360]
[Boxes=   67420] [Years=1975-1975] [Multi=    0.1000] [Missing=-999]
Grid-ref=   1, 148
291  294  296  293  287  279  265  262  271  279  286  287
Grid-ref=   1, 311
14   11   13   21   44   69   92   90   65   37   22   14
Grid-ref=   1, 312
13   10   12   20   43   67   90   87   63   35   21   13

That’s what I’ve been missing! D’oh. That ‘[Multi=    0.1000]’. That would still only give a range of 0.1 to 35.8 hPa, and
my anomalies are still around 0.006 (or 0.3 for synthetics).
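Pulling the factor out of the header is straightforward. The Python sketch below parses the fifth header line shown above and rescales the 1 to 358 range; header_fields is a hypothetical helper, and the bracketed Key=value layout is assumed from this one file only.

```python
import re

# Header line copied from clim.6190.lan.vap above.
HEADER = "[Boxes=   67420] [Years=1975-1975] [Multi=    0.1000] [Missing=-999]"


def header_fields(line):
    """Return the [Key=value] pairs of a header line as a dict of strings."""
    pairs = re.findall(r"\[([^=\]]+)=([^\]]*)\]", line)
    return {key.strip(): value.strip() for key, value in pairs}


fields = header_fields(HEADER)
multi = float(fields["Multi"])
missing = int(fields["Missing"])

# Stored integers 1..358 become hPa once the factor is applied.
print(round(1 * multi, 1), round(358 * multi, 1))  # 0.1 35.8
```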

Two things, then. Firstly to get glo2abs to read the multiplicative factor from the climatology header and impose it on the
output. Secondly to work out why all the anomalies have different magnitudes! Or is vapour pressure really so teeny?

Working on glo2abs. Well my theory for additive anomalies is this: I read in the normals, and apply the multiplicative factor
in the header (for VAP it’s 0.1). I assume the anomalies are already in the relevant units (ie require no factoring). This
looks to be the case for .txt files anyway. So I can add the anomaly to the adjusted normal. Then (because I need integer
output) I can DIVIDE by the factor (because that got us from integer to real before). Fine in theory but it all depends on
the anomalies being in regular ‘units’ (why wouldn’t they be? They’re reals!). OK, check from the beginning, obs first:

Database:       hPa*10 (typically 3-digit integers)

anomdtb.for calls subroutine CheckVariSuffix, which contains:

<BEGIN_QUOTE>
else if (Suffix.EQ.".vap") then
Variable="vapour pressure (hPa)"
Factor = 0.1
<END_QUOTE>

And how does anomdtb.f90 use the Factor? Well, in the original version:

<BEGIN_QUOTE>
crua6[/cru/cruts/untouched/code/linux/cruts] grep 'Factor' anomdtb.f90
real :: MissThresh,StdevThresh,DistanceThresh,Factor, ExeSpace,WyeSpace
call CheckVariSuffix (LoadSuffix,Variable,Factor)
OpTot = OpTot + (real(DataA(XAYear,XMonth,XAStn))/Factor)
OpTotSq = OpTotSq + ((real(DataA(XAYear,XMonth,XAStn))/Factor) ** 2)
NormMean  (XMonth,XAStn) = Factor*OpTot/OpEn
if (OpTotSq.GT.0) NormStdev (XMonth,XAStn) = Factor*sqrt((OpTotSq/OpEn)-((OpTot/OpEn)**2))
OpTot = OpTot + (real(DataA(XAYear,XMonth,XAStn))/Factor)
OpTotSq = OpTotSq + ((real(DataA(XAYear,XMonth,XAStn))/Factor) ** 2)
NormMean  (XMonth,XAStn) = Factor*OpTot/OpEn
NormStdev (XMonth,XAStn) = Factor*sqrt((OpEn/(OpEn-1))*((OpTotSq/OpEn)-((OpTot/OpEn)**2)))
OpTot = OpTot + (DataA(XAYear,XMonth,XAStn)/Factor)
OpTotSq = OpTotSq + (DataA(XAYear,XMonth,XAStn)/Factor) ** 2
OpStDev = Factor*sqrt((OpEn/(OpEn-1))*((OpTotSq/OpEn)-((OpTot/OpEn)**2)))
OpMean  = Factor*(OpTot/OpEn)
ALat(XAStn),ALon(XAStn),AElv(XAStn),real(DataA(XAYear,XMonth,XAStn))*Factor,AStn(XAStn)
<END_QUOTE>

I *think* the factor is being used multiplicatively. I don’t understand why it’s being used as a divisor though.. I must
have understood last December because I managed to rewrite the ‘standard deviation’ section, also using it as a divisor!
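One way to read the divide-then-multiply pattern: going only by the grepped lines, the Factor cancels out of both the mean and the standard deviation. The Python sketch below mirrors the first OpTot/OpTotSq/NormMean/NormStdev group from the grep output; the input values are hypothetical database integers (hPa*10), and the surrounding loops of anomdtb.f90 are not reproduced.

```python
import math


def normal_mean_and_stdev(values, factor):
    """Accumulate value/factor, then multiply the results by factor again."""
    op_tot = 0.0
    op_tot_sq = 0.0
    for v in values:
        op_tot += v / factor
        op_tot_sq += (v / factor) ** 2
    op_en = len(values)
    mean = factor * op_tot / op_en
    stdev = factor * math.sqrt((op_tot_sq / op_en) - (op_tot / op_en) ** 2)
    return mean, stdev


values = [71, 69, 76, 86]  # hypothetical database values, hPa*10

with_factor = normal_mean_and_stdev(values, 0.1)
without_factor = normal_mean_and_stdev(values, 1.0)
print(with_factor, without_factor)
```

Up to floating-point noise the two results are the same, so on this reading the normals stay in database units and only the final write (the line ending in *Factor) converts values to hPa.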

One obvious thing to try is to use the revised glo2abs. That should now be working in ‘units’ (but saving in whatever
range the normals are in). After that I could try comparing the old and ‘new’ (ie modded by me) versions of anomdtb.f90
to ensure I didn’t break something (sure I didn’t, but still..)

So, I revised glo2abs. It now reads the ‘Multi’ factor from the climatology header, and applies it to the normals before
they’re used.
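The revised calculation, as described, can be sketched in Python like this. It follows the description in this log only (scale the normal by Multi, add the anomaly in real units, divide by Multi, round to an integer); the actual Fortran is not shown here, and the names and sample values are hypothetical.

```python
import math


def nint(x):
    """Fortran-style NINT: nearest integer, halves rounded away from zero."""
    magnitude = int(math.floor(abs(x) + 0.5))
    return magnitude if x >= 0 else -magnitude


def revised_absolute(anomaly_units, normal_int, multi):
    """Absolute value in stored integer form, from an anomaly in real units."""
    normal_units = normal_int * multi           # e.g. 291 -> 29.1 hPa
    absolute_units = normal_units + anomaly_units
    return nint(absolute_units / multi)         # back to the stored integer range


# Hypothetical cell: stored normal 291 (29.1 hPa), anomaly of -0.5 hPa.
print(revised_absolute(-0.5, 291, 0.1))  # 286
```

Note that with Multi = 0.1 this gives the same integers as the nint(anoms*10) + normals expression quoted earlier, so for VAP the change matters mainly for variables with a different factor.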

So, re-ran quick_interp_tdm2.pro:

IDL> quick_interp_tdm2,1901,2006,'vapglo/vap.',1000,gs=0.5,dumpglo='dumpglo',synth_prefix='vapsyn/vapsyn',pts_prefix='vaptxt/vap.'

A sample of the outputs, vap.12.1962.glo, had a range of values from -2.3006 to +1.8388, with the majority being 0. A total
of 56387 cells were nonzero, which given that there are 67420 land cells, isn’t too bad. It’s a pretty Gaussian distribution,
too. It still seems like a small variation (typically +/- 0.5). For the cell where I live (Norwich, 363,286), the normals are:

Grid-ref= 363, 286
71   69   76   86  107  129  147  149  135  115   88   77

Or in hPa:

Grid-ref= 363, 286
7.1   6.9   7.6   8.6   10.7  12.9  14.7  14.9  13.5  11.5  8.8   7.7

The nearest station (well based on a quick search) is LOWESTOFT. Taking 1962 and 1963 and scaling:

62  7.6   6.9   6.5   9.2  10.9  12.6  14.4  15.0  13.6  12.3   8.9   6.5
63  5.4   5.5   7.9   9.9  11.1  14.8  15.8  15.1  14.6  11.7  10.3   6.9

The ranges:
2.2   1.4   1.4   0.7   0.2   2.2   1.4   0.1   1.0   0.6   1.4   0.4

Well our sample December 1962 range of anomalies was -2.3006 to +1.8388, and the January range is -3.3640 to +2.1250. So, I
have to admit, that’s the same order of magnitude for our particular cell, year and month(s).
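The 'ranges' row is just the absolute difference between the two LOWESTOFT years, month by month. A short Python sketch to reproduce it, using the scaled values copied from above:

```python
# LOWESTOFT vapour pressure in hPa, copied from the two rows above.
y1962 = [7.6, 6.9, 6.5, 9.2, 10.9, 12.6, 14.4, 15.0, 13.6, 12.3, 8.9, 6.5]
y1963 = [5.4, 5.5, 7.9, 9.9, 11.1, 14.8, 15.8, 15.1, 14.6, 11.7, 10.3, 6.9]

# Month-by-month absolute difference, rounded to the one decimal the data carry.
ranges = [round(abs(a - b), 1) for a, b in zip(y1962, y1963)]

print(ranges)
print(max(ranges))
```

The largest year-to-year difference is 2.2 hPa, which is comparable to the gridded anomaly extremes quoted for December 1962 and January.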

So, assuming these .glo files are OK, we’ll try glo2abs again:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.vap
Enter a name for the gridded climatology file: deleteme1
Enter the path and stem of the .glo files: vapglo/vap.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: vapabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Do you wish to limit the output values? (Y/N): Y
1. Set minimum to zero
2. Set a single minimum and maximum
3. Set monthly minima and maxima (for wet/rd0)
Choose: 1
Right, erm.. off I jolly well go!
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>

..and the results.. look good! For (again) December 1962:

Min   0 (well I did set that, see above)
Max 315

Number of zeros: 1078, perfectly respectable although I do wonder if VAP=0 is illegal.. hmm.. OK, added an option in glo2abs:

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./glo2abs
Welcome! This is the GLO2ABS program.
I will create a set of absolute grids from
a set of anomaly grids (in .glo format), also
a gridded version of the climatology.
Enter the path and name of the normals file: clim.6190.lan.vap
Enter a name for the gridded climatology file: deleteme3
Enter the path and stem of the .glo files: vapglo/vap.
Enter the starting year: 1901
Enter the ending year:   2006
Enter the path (if any) for the output files: vapabs/
Now, CONCENTRATE. Addition or Percentage (A/P)? A
Do you wish to limit the output values? (Y/N): Y
1. Set minimum to zero
2. Set a single minimum and maximum
3. Set monthly minima and maxima (for wet/rd0)
4. Set all values >0, (ie, positive)
Choose: 4
Right, erm.. off I jolly well go!
vap.01.1901.glo
vap.02.1901.glo
(etc)
<END_QUOTE>

Result for December 1962: Min 1, Max 315. A good spread of values, without a disproportionate number of ‘1’s, I’m pleased
to say.
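For the record, the difference between limit options 1 and 4 can be sketched as below. This is Python, not the glo2abs source, and it assumes option 4 simply floors each integer cell value at 1, which is consistent with the 'Min 1' result but not confirmed by anything shown here. Options 2 and 3 are left out because their prompts are not shown.

```python
def limit_value(value, option):
    """Apply a glo2abs-style output limit to one integer cell value."""
    if option == 1:
        # '1. Set minimum to zero'
        return max(value, 0)
    if option == 4:
        # '4. Set all values >0': assumed to mean a floor of 1 for integer output
        return max(value, 1)
    raise ValueError("only options 1 and 4 are sketched here")


cells = [-3, 0, 1, 315]  # hypothetical cell values before limiting
print([limit_value(v, 1) for v in cells])  # [0, 0, 1, 315]
print([limit_value(v, 4) for v in cells])  # [1, 1, 1, 315]
```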

So, to generate the output files. Again.

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./mergegrids
Welcome! This is the MERGEGRIDS program.
I will create decadal and full gridded files
from the output files of (eg) glo2abs.for.

Enter a gridfile with YYYY for year and MM for month: vapabs/vap.MM.YYYY.glo.abs
Enter Start Year:  1901
Enter Start Month: 01
Enter End Year:    2006
Enter End Month:   12

Please enter a sample OUTPUT filename, replacing
start year with SSSS and end year with EEEE: cru_ts_3_00.SSSS.EEEE.vap.dat
Writing cru_ts_3_00.1901.1910.vap.dat
Writing cru_ts_3_00.1911.1920.vap.dat
Writing cru_ts_3_00.1921.1930.vap.dat
Writing cru_ts_3_00.1931.1940.vap.dat
Writing cru_ts_3_00.1941.1950.vap.dat
Writing cru_ts_3_00.1951.1960.vap.dat
Writing cru_ts_3_00.1961.1970.vap.dat
Writing cru_ts_3_00.1971.1980.vap.dat
Writing cru_ts_3_00.1981.1990.vap.dat
Writing cru_ts_3_00.1991.2000.vap.dat
Writing cru_ts_3_00.2001.2006.vap.dat
<END_QUOTE>

And what of the statistics? Well, by now I’ve realised that we don’t have complete coverage! So the normals are
bound to poke through quite a bit. In fact, the story is as it was in the beginning! *cries*

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/vap] ./crutsstats

CRUTSSTATS: Stats for CRU TS gridded files

Enter the monthly gridded data file: cru_ts_3_00.1901.2006.vap.dat

Please enter the start year: 1901

106 years from 1901 to 2006

Output file is cru_ts_3_00.1901.2006.vap.dat.stats
1901       1     358     106
1902       1     358     106
1903       1     358     106
1904       1     358     106
1905       1     358     106
1906       1     358     106
1907       1     358     106
1908       1     358     106
(etc)
<END_QUOTE>

Now admittedly, the 106 mean does vary.. it hits the dizzying heights of 107 on occasion! With a couple of 105s
thrown in to balance the books. Had a look at the stats in detail, compared to those for CRU TS 2.10. And guess
what? Yes.. the old stats are better! Here’s the first decade:

CRU TS 2.10
1901       0     324      79       0     338      82       0     314      88       0     321      97       0     411     110       0     378     128       0     358     143       0     343     140       0     353     122       0     332     103       0     318      88       0     314      81       0     411     105
1902       0     312      80       0     319      82       0     314      87       0     321      96       0     413     109       0     366     125       0     356     141       0     343     138       0     353     122       0     323     102       0     318      88       0     315      80       0     413     104
1903       0     314      79       0     331      82       0     315      88       0     334      95       0     465     109       0     359     125       0     371     141       0     359     139       0     353     122       0     323     102       0     318      88       0     315      80       0     465     104
1904       0     310      78       0     319      81       0     312      86       0     321      95       0     347     109       0     359     126       0     355     140       0     344     138       0     354     121       0     323     103       0     318      89       0     316      81       0     359     104
1905       0     314      79       0     319      79       0     321      86       0     326      95       0     346     109       0     383     127       0     356     142       0     344     139       0     353     122       0     330     103       0     318      90       0     321      82       0     383     104
1906       0     328      80       0     330      81       0     323      87       0     335      98       0     376     111       0     359     128       0     356     142       0     343     140       0     353     122       0     323     103       0     318      89       0     316      82       0     376     105
1907       0     312      79       0     327      80       0     314      87       0     321      94       0     387     106       0     359     125       0     379     140       0     343     139       0     353     122       0     323     104       0     318      87       0     316      81       0     387     104
1908       0     312      79       0     323      81       0     330      86       0     338      95       0     346     109       0     359     127       0     353     142       0     343     138       0     353     122       0     316     102       0     318      87       0     316      81       0     359     104
1909       0     312      79       0     319      81       0     323      87       0     321      94       0     346     107       0     359     125       0     355     141       0     343     140       0     354     122       0     320     103       0     318      90       0     316      81       0     359     104
1910       0     312      80       0     319      82       0     315      86       0     321      95       0     347     109       0     359     126       0     383     142       0     343     139       0     353     122       0     318     102       0     318      87       0     316      80       0     383     104

CRU TS 3.00
1901       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1902       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1903       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1904       1     311      80       1     320      82       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1905       1     311      80       1     320      83       1     315      88       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1906       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1907       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     141       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1908       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     129       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1909       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106
1910       1     311      80       1     320      83       1     315      89       1     320      98       1     346     111       1     358     128       1     356     143       1     342     140       1     354     123       1     323     104       1     318      90       1     315      82       1     358     106

..and here’s a more recent decade:

CRU TS 2.10
1991       0     314      82       0     322      84       0     331      90       0     672     100       0     523     113       0     540     134       0     607     147       0     424     143       0     353     125       0     328     106       0     386      91       0     350      83       0     672     108
1992       0     337      82       0     383      84       0     450      90       0     613      98       0     347     112       0     359     128       0     373     140       0     345     140       0     353     122       0     347     103       0     414      89       0     384      83       0     613     106
1993       0     324      81       0     403      83       0     449      90       0     622      98       0     518     113       0     534     131       0     652     147       0     398     143       0     353     122       0     333     105       0     408      89       0     339      84       0     652     107
1994       0     346      82       0     396      82       0     457      90       0     626     100       0     524     113       0     507     132       0     605     146       0     416     143       0     349     125       0     332     107       0     397      93       0     341      84       0     626     108
1995       0     369      83       0     406      86       0     461      90       0     686     100       0     505     114       0     565     134       0     673     146       0     492     147       0     364     127       0     342     108       0     427      91       0     339      82       0     686     109
1996       0     334      81       0     431      83       0     548      88       0     634      97       0     524     113       0     530     131       0     645     147       0     422     142       0     366     124       0     337     106       0     413      91       0     344      84       0     645     107
1997       0     367      82       0     322      84       0     348      90       0     323      99       0     344     113       0     484     133       0     426     147       0     523     145       0     353     126       0     348     108       0     345      93       0     370      86       0     523     109
1998       0     339      84       0     345      89       0     338      92       0     355     104       0     361     116       1     531     137       1     356     152       0     560     149       0     370     128       0     347     108       0     369      92       0     334      85       0     560     111
1999       0     323      83       0     334      86       0     324      90       0     336     100       0     362     113       0     487     132       0     362     148       0     357     143       1     353     127       0     331     107       0     337      91       0     316      85       0     487     109
2000       0     319      82       0     319      85       0     319      91       0     328     102       0     356     114       0     476     133       0     358     146       0     520     146       0     353     124       0     333     107       0     335      91       0     334      84       0     520     109

CRU TS 3.00
1991       1     311      81       1     320      83       1     320      90       1     320     100       1     346     113       1     358     132       1     356     146       1     342     143       1     354     125       1     323     105       1     318      91       1     315      82       1     358     108
1992       1     311      82       1     319      84       1     315      90       1     320      97       1     346     111       1     358     127       1     356     141       1     342     140       1     354     122       1     323     102       1     317      89       1     315      83       1     358     106
1993       1     313      81       1     315      83       1     315      89       1     320      98       1     346     112       1     358     131       1     356     146       1     342     142       1     354     122       1     323     103       1     323      88       1     317      83       1     358     106
1994       1     311      82       1     322      82       1     315      89       1     320      99       1     346     112       1     358     131       1     356     146       1     346     142       1     354     125       1     323     106       1     318      92       1     315      83       1     358     107
1995       1     311      82       1     318      85       1     320      90       1     324      99       1     346     112       1     358     131       1     356     146       1     345     144       1     354     124       1     323     107       1     321      90       1     315      81       1     358     108
1996       1     311      80       1     321      82       1     320      87       1     320      96       1     346     111       1     358     130       1     356     145       1     343     141       1     354     122       1     323     105       1     318      90       1     319      82       1     358     106
1997       1     311      81       1     320      84       1     315      90       1     320      99       1     346     113       1     358     131       1     356     145       1     342     143       1     354     123       1     323     106       1     318      90       1     315      83       1     358     107
1998       1     311      81       1     334      85       1     326      89       1     338     100       1     346     114       1     358     134       1     356     148       1     342     145       1     354     125       1     323     105       1     318      89       1     315      84       1     358     108
1999       1     316      82       1     320      85       1     322      88       1     320      99       1     346     112       1     358     131       1     356     148       1     342     142       1     354     125       1     323     106       1     318      91       1     315      84       1     358     108
2000       1     317      82       1     320      84       1     315      90       1     320     100       1     346     113       1     358     131       1     356     146       1     342     144       1     354     123       1     323     105       1     318      90       1     315      83       1     358     108

I DON’T UNDERSTAND!!!!!

Well, OK – I see that a VAP of zero is acceptable. Though as it’s a pressure, I don’t believe it! I’ll stick with 1.

The issue is that the earlier dataset has a variability (in the maximum) that we just don’t have in the new one. And
I feel that I’ve been through every bloody phase of the process and checked we’re doing it right!!!

~~~

Right. Let’s look at the distributions of values in each dataset. We’ll take Jan 1910 and Jun 2000. And as this is
a textual document, I’ll have to describe the results.

Offsets. Well each month has 360 lines, so each year has 4320 lines. So for Jan 1910 we need to skip nine years,
or 38880 lines, then take the next 360. For Jun 2000 we need to skip 99 years, or 427680 lines, then another five
months, or 1800 lines, then take the next 360. So:

head -39240 cru_ts_2.10.1901-2002.vap.dat |tail -360 > cru_ts_2.10.Jan.1910.vap.dat
head -39240 cru_ts_3.00.1901.2006.vap.dat |tail -360 > cru_ts_3.00.Jan.1910.vap.dat

head -428040 cru_ts_2.10.1901-2002.vap.dat |tail -360 > cru_ts_2.10.Jun.2000.vap.dat
head -428040 cru_ts_3_00.1901.2006.vap.dat |tail -360 > cru_ts_3_00.Jun.2000.vap.dat

I loaded the resultant monthly files into Matlab, and played with them mercilessly.

Well to start with, they all look the same. Truly. I’ve got a 4-plot page with TS 2.10 in the left-hand column,
and TS 3.00 on the right. January 1910 on the top, June 2000 on the bottom. And they look pretty much inseparable,
though if I had to Spot The Difference, the TS 2.10 June 2000 distribution is a little flatter (that is, the
massive spike at the low end is a little shorter, and the rest of the entourage are a little taller).

What are particularly worthy of note are the maximums. Because they don’t match those produced by crutsstats.for.

Month        Model    Max (Matlab)   Max (crutsstats)
Jan 1910   TS 2.10             312                312
Jan 1910   TS 3.00             311                311
Jun 2000   TS 2.10             319                476
Jun 2000   TS 3.00             317                358

Not entirely sure why the latter ones would be wrong. But I suspect crutsstats – because otherwise I miscounted
the line numbers to extract June 2000 with! Actually, OK, that does seem more likely.

Let’s try it from the 1991-2000 files. The offset will be 9*4320 + 5*360 + 360 = 41040.

gunzip -c /cru/cruts/fromtyn1/data/cru_ts_2.10/newly_gridded/data_dec/cru_ts_2_10.1991-2000.vap.grid.gz | head -41040 | tail -360 > cru_ts_2_10.Jun.2000.vap.dat
gunzip -c cru_ts_3_00.1991.2000.vap.dat.gz | head -41040 | tail -360 > cru_ts_3_00.Jun.2000.vap.dat
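
The offset arithmetic is easy to get wrong by hand, so here is a minimal Python sketch of it. It assumes only what is stated above (360 lines per month, 12 months per year) plus the assumption that the data files have no header lines. The function name `head_count` is invented for illustration and is not part of any CRU program.

```python
LINES_PER_MONTH = 360                    # each month has 360 lines, as stated above
LINES_PER_YEAR = 12 * LINES_PER_MONTH    # 4320

def head_count(first_year, year, month):
    """Number of lines to give `head` so that `tail -360` yields the wanted month.

    Assumes the file starts in January of first_year and has no header lines.
    """
    skipped = (year - first_year) * LINES_PER_YEAR + (month - 1) * LINES_PER_MONTH
    return skipped + LINES_PER_MONTH

print(head_count(1901, 1910, 1))   # 39240, the value used for Jan 1910
print(head_count(1991, 2000, 6))   # 41040, the value used on the decadal files
print(head_count(1901, 2000, 6))   # 429840, not the 428040 used on the full files
```

On this arithmetic, 428040 lines into a file starting in 1901 is the end of January 2000 (99 years skipped, no months), which would be consistent with the first June 2000 extraction having picked up the wrong month.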

Well – looks like I did miscount, because the new files are different! And so are the Maxima:

Month        Model    Max (Matlab)   Max (crutsstats)
Jun 2000   TS 2.10             300                476
Jun 2000   TS 3.00             358                358

..so almost perfect. At least the stats for the file I’m creating match.

And now the June 2000 histograms are much more interesting! And of course (for this is THIS project), much
more worrying. The June 2000 plot for the new data (3.00) shows a fall at VAP ->0. This is in contrast to the
other three, which show a more exponential decline from a high near 0 (though admittedly the 2.10 version does have a second
peak at around 120). In fact, the June 2000 3.00 series has peaks at ~90 and ~300! Oh, help.

The big question must be, why does it have so little representation in the low numbers? Especially given that I’m rounding
erroneous negatives up to 1!!

Oh, sod it. It’ll do. I don’t think I can justify spending any longer on a dataset, the previous version of which was
completely wrong (misnamed) and nobody noticed for five years.

So.. one week to go before handover, and I’m just STARTING the Sun/Cloud parameter, the one I thought would cause the most
trouble! Oh, boy. Let’s try and work out the scenario.

Historically, we’ve issued Cloud:

crua6[/cru/cruts/fromtyn1/data/cru_ts_2.10/data_all] gunzip -c cru_ts_2_10.1901-2002.cld.Z |head -10
Tyndall Centre grim file created on 22.01.2004 at 13:52 by Dr. Tim Mitchell
.cld = cloud cover (percentage)
CRU TS 2.1
[Long=-180.00, 180.00] [Lati= -90.00,  90.00] [Grid X,Y= 720, 360]
[Boxes=   67420] [Years=1901-2002] [Multi=    0.1000] [Missing=-999]
Grid-ref=   1, 148
725  750  750  700  638  600  613  613  663  675  713  725

..so data is in % x10.

Then, of course, there’s the relevant read_me text (from /cru/cruts/fromdpe1a/code/idl/pro/read_me_GRIDDING.txt):

“Bear in mind that there is no working synthetic method for cloud, because Mark New
lost the coefficients file and never found it again (despite searching on tape
archives at UEA) and never recreated it. This hasn’t mattered too much, because
the synthetic cloud grids had not been discarded for 1901-95, and after 1995
sunshine data is used instead of cloud data anyway.”

So that’s alright then! See also the earlier attempts to recreate TS 2.10 cloud.

The main gridding prog for cloud appears to be cal_cld_gts_tdm.pro:

pro cal_cld_gts_tdm,dtr_prefix,outprefix,year1,year2,info=info

; calculates cld anomalies using relationship with dtr anomalies
; reads coefficients from predefined files (*1000)
; reads DTR data from binary output files from quick_interp_tdm2.pro (binfac=1000)
; creates cld anomaly grids at dtr grid resolution
; output can then be used as dummy input to splining program that also
;  includes real cloud anomaly data

As for converting sun hours to cloud cover.. we only appear to have interactive, file-by-file
programs. Herewith all the relevant progs I can find:

IDL
./idl/pro/cal_cld_gts_tdm.pro        (synthetic cloud from DTR)
./idl/pro/cloudcorr.pro              (construct cloud correlation coefficients with DTR)
./idl/pro/cloudcorrspc.pro           (construct cloud correlation coefficients with sunshine %)
./idl/pro/cloudcorrspcann.pro        (construct cloud correlation coefficients with sunshine %)
./idl/pro/cloudcorrspcann9196.pro    (construct cloud correlation coefficients with sunshine %)

(the ‘ann’ versions above include the assumption that the relationships remain constant through the year)

F77
./f77/mnew/sh2cld_tdm.for            (this one needs to be modded as for sp2cldp_m.for I think)
./f77/mnew/Hsp2cldp_m.for            (one I wrote last year which seems to almost do what we need)
./f77/mnew/sp2cld_m.for              (this one needs to be modded as for sp2cldp_m.for I think)
./f77/mnew/sh2sp_m.for
./f77/mnew/sh2sp_normal.for
./f77/mnew/sh2sp_tdm.for

Aaaand – another head-banging shocker! The program sh2cld_tdm.for, which describes itself thusly:

program sunh2cld
c converts sun hours monthly time series to cloud percent (n/N)

Does NO SUCH THING!!! Instead it creates SUN percentages! This is clear from the variable names and
user interactions.

So.. if I add the sunh -> sun% process from sh2cld_tdm.for into Hsp2cldp_m.for, I should end up with a
sun hours to cloud percent convertor. Possibly. Except that the sun% to cld% engine looks like it’s
creating oktas instead:

      do im=1,12
        ratio = (real(sunp(im))/100)
        if (ratio.ge.0.95)           cldp(im) = 0
        if (ratio.lt.0.95.and.ratio.ge.0.35)
     *                               cldp(im) = (0.95-ratio)*100
        if (ratio.lt.0.35.and.ratio.ge.0.15)
     *                               cldp(im) = ((0.35-ratio)*50)+60
        if (ratio.lt.0.15)           cldp(im) = ((0.15-ratio)*100)+70
        if (cldp(im).gt.80.0)        cldp(im) = 80.0
        if (ratio.lt.0)              cldp(im) = -9999
      enddo

Added the previous ‘*12.5’ mod to approximate true percentages (*10).
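
To make the piecewise mapping easier to see, here is a rough Python transliteration of the Fortran loop above for a single value. It is a sketch, not the CRU code: the function name, the `MISSING` constant and the `scale` argument are made up for illustration, and exactly where the ‘*12.5’ factor was applied in the real program is an assumption. The cap at 80 is consistent with output in oktas x10, and 80 * 12.5 = 1000 gives percent x10.

```python
MISSING = -9999

def sun_pct_to_cloud(sun_pct, scale=1.0):
    """Map sunshine percent (0..100) to a cloud value, following the if-chain above.

    With scale=1.0 the result runs from 0 to 80; scale=12.5 is a hypothetical
    stand-in for the '*12.5' modification (assumed to apply to valid values only).
    """
    ratio = sun_pct / 100.0
    if ratio < 0:                       # negative input is flagged as missing
        return MISSING
    if ratio >= 0.95:
        cld = 0.0
    elif ratio >= 0.35:
        cld = (0.95 - ratio) * 100      # 0 .. 60
    elif ratio >= 0.15:
        cld = (0.35 - ratio) * 50 + 60  # 60 .. 70
    else:
        cld = (0.15 - ratio) * 100 + 70 # 70 .. 85 before the cap
    cld = min(cld, 80.0)                # cap, as in the Fortran
    return cld * scale

for pct in (100, 65, 25, 0):
    print(pct, round(sun_pct_to_cloud(pct), 1), round(sun_pct_to_cloud(pct, 12.5), 1))
```

Note that the mapping is continuous at the 0.35 and 0.15 break points (60 and 70 respectively), and that any sunshine ratio of 0.05 or less ends up at the cap.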

Looking back I see we found cloud and sunpercent databases (line counts shown):

228936 cld.0301081434.dtb
104448 cld.0312181428.dtb
111989 combo.cld.dtb
57395 spc.0301201628.dtb
51551 spc.0312221624.dtb
51551 spc.94-00.0312221624.dtb

And agreed a strategy:

<BEGIN_QUOTE>
AGREED APPROACH for cloud (5 Oct 06).

For 1901 to 1995 – stay with published data. No clear way to replicate
process as undocumented.

For 1996 to 2002:
1. convert sun database to pseudo-cloud using the f77 programs;
2. anomalise wrt 96-00 with anomdtb.f;
3. grid using quick_interp_tdm.pro (which will use 6190 norms);
4. calculate (mean9600 – mean6190) for monthly grids, using the
published cru_ts_2.0 cloud data;
5. add to gridded data from step 3.

This should approximate the correction needed.
<END_QUOTE>

This is confusing. I can only use one (observed) cloud database in the final gridding. The above
agreement seems to assume that all data after 1996 will come from sun. But dtbstat.for reports:

<BEGIN_QUOTE>
Report for: spc.0312221624.dtb (it’s similar for the other spcs, except the earlier one goes to 2002)

Stations in Northern Hemisphere:     1750
Stations in Southern Hemisphere:      350
Total:     2100

Maximum Timespan in Northern Hemisphere: 1889 to 2003
Maximum Timespan in Southern Hemisphere: 1944 to 2003
Global Timespan: 1889 to 2003

Minimum Data Value:     0
Maximum Data Value:  1000
<END_QUOTE>

So the Sun Percent databases run for long periods. Similarly, for cloud:

<BEGIN_QUOTE>
Report for: cld.0312181428.dtb

Stations in Northern Hemisphere:     3286
Stations in Southern Hemisphere:      319
Total:     3605

Maximum Timespan in Northern Hemisphere: 1905 to 1996
Maximum Timespan in Southern Hemisphere: 1959 to 1996
Global Timespan: 1905 to 1996

Minimum Data Value:     0
Maximum Data Value:  1000
<END_QUOTE>

Not as long a run, and it sure ends at 1996! So 1901 to 1995 will, as agreed, remain untouched.

Well.. let’s try converting the MCDW and CLIMAT Sun hours to Sun percents, then adding to the
SPC database (spc.0312221624.dtb). Modified Hsh2cld.for to save sun percent too. Lots of debugging..
eventually dug out:

Doorenbos, J., Pruitt, W.O., 1977. Guidelines for predicting crop water requirements. FAO irrigation
and drainage paper no. 24. Food and Agriculture Organization of the United Nations, Rome.

This was used to inform the Fortran conversion programs by indicating the latitude-potential_sun and
sun-to-cloud relationships. It also assisted greatly in understanding what was wrong – Tim was in
fact calculating Cloud Percent, despite calling it Sun Percent!! Just awful.

And so..

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/db/cld] ./Hsh2cld

Hsh2cld – Convert a Sun Hours database to a Cloud Percent one

Please enter the Sun Hours database: sun.0709111032.dtb
Data Factor detected: *1.000

Completed -     1693 stations converted.

Sun Percentage Database:   spc.0711271420.dtb
Cloud Percentage Database: cld.0711271420.dtb

crua6[/cru/cruts/version_3_0/db/cld] ./Hsh2cld

Hsh2cld – Convert a Sun Hours database to a Cloud Percent one

Please enter the Sun Hours database: sun.0710151817.dtb
Data Factor detected: *0.100

Completed -     2020 stations converted.

Sun Percentage Database:   spc.0711271421.dtb
Cloud Percentage Database: cld.0711271421.dtb

crua6[/cru/cruts/version_3_0/db/cld]
<END_QUOTE>

So, now the luxury of a little experiment.. I merged the MCDW and CLIMAT ‘spc’ databases into
the existing one *separately*. Here were the results:

MCDW:
<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/db/cld] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: spc.0312221624.dtb
Please enter the Update Database name: spc.0711271420.dtb

Reading in both databases..
Master database stations:     2100
Update database stations:     1693

New master database: spc.0711271504.dtb

Update database stations:         1693
> Matched with Master stations:   867
(automatically:   867)
(by operator:     0)
> Added as new Master stations:   826
> Rejected:                         0
<END_QUOTE>

CLIMAT:
<BEGIN_QUOTE>
Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: spc.0312221624.dtb
Please enter the Update Database name: spc.0711271421.dtb

Reading in both databases..
Master database stations:     2100
Update database stations:     2020

98 reject(s) from update process 0711271505

New master database: spc.0711271505.dtb

Update database stations:         2020
> Matched with Master stations:   917
(automatically:   917)
(by operator:     0)
> Added as new Master stations:  1005
> Rejected:                        98
Rejects file:                 spc.0711271421.dtb.rejected
<END_QUOTE>

So, as expected, a few of the CLIMAT stations couldn’t be matched for metadata.. no worries.
What’s interesting is that roughly the same ratio of stations were matched with existing in both
cases (867/1693 vs 917/2020). Slightly better for MCDW though.

Now, as our updates only start in 2003, that means we’ve just lost between 826 and 1005 sets of
data (added as new). We can’t be exact as we don’t know the overlap between the MCDW and the CLIMAT
bulletins.. but we will have a better idea when I try the anomdtb experiment on the combined update.
First, add the CLIMAT update again, this time to the MCDW-updated database:

CLIMAT:
<BEGIN_QUOTE>
Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: spc.0711271504.dtb
Please enter the Update Database name: spc.0711271421.dtb

Reading in both databases..
Master database stations:     2926
Update database stations:     2020

38 reject(s) from update process 0711271514

New master database: spc.0711271514.dtb

Update database stations:         2020
> Matched with Master stations:  1736
(automatically:  1736)
(by operator:     0)
> Added as new Master stations:   246
> Rejected:                        38
Rejects file:                 spc.0711271421.dtb.rejected
<END_QUOTE>

Note several bits of good news! Firstly, rejects are down to 38 (60 having matched with MCDW stations).
That’s not *that* good of course – those will be new and so 2003 onwards only. Similarly, (1005-246=)
759 CLIMAT bulletins matched MCDW ones, they will also be 2003 onwards only. In other words, there were
only (1736-759=) 977 updates to existing stations. So.. yes I’m being sidetracked again.. I found and
downloaded ALL the MCDW bulletins, back to 1994!
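
The station bookkeeping from the three newmergedb reports above can be cross-checked with a few lines of Python. This is only a sanity check on the printed numbers; the dictionary layout and variable names are invented here, and the assumption is that every update station ends up matched, added as new, or rejected.

```python
# (master_before, update_total, matched, added, rejected) as reported above
runs = {
    "MCDW into spc.0312221624":   (2100, 1693,  867,  826,  0),
    "CLIMAT into spc.0312221624": (2100, 2020,  917, 1005, 98),
    "CLIMAT into spc.0711271504": (2926, 2020, 1736,  246, 38),
}

for name, (master, update, matched, added, rejected) in runs.items():
    # every update station should be accounted for exactly once
    assert matched + added + rejected == update, name
    print(f"{name}: match ratio {matched / update:.3f}, "
          f"new master size {master + added}")

rejects_recovered = 98 - 38                      # rejects that found an MCDW station
climat_matching_mcdw = 1005 - 246                # CLIMAT stations matching MCDW-only ones
updates_to_existing = 1736 - climat_matching_mcdw
print(rejects_recovered, climat_matching_mcdw, updates_to_existing)   # 60 759 977
```

The first run's new master size (2100 + 826 = 2926) agrees with the master count reported at the start of the third run.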

<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru

MCDW2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest MCDW file: ssm9409.fin
Enter the latest MCDW file (or <ret> for single files): ssm0708.fin

All Files Processed
tmp.0711271645.dtb: 2785 stations written   *** SEE LATER RUNS ***
vap.0711271645.dtb: 2786 stations written   *** SEE LATER RUNS ***
rdy.0711271645.dtb: 2781 stations written   *** SEE LATER RUNS ***
pre.0711271645.dtb: 2791 stations written   *** SEE LATER RUNS ***
sun.0711271645.dtb: 2184 stations written   *** SEE LATER RUNS ***

Thanks for playing! Byeee!
<END_QUOTE>

Now I’m not planning to re-run all the previous parameters! Hell, they should have had the older data
in already! But for sun/cloud, this could help enormously. Here’s the plan:

1. Merge the CLIMAT-sourced database into the new MCDW-sourced database.
2. Convert this modern sun hours database into a modern cloud percent database.
3. Add normals for 95-02.
4. Use the new program ‘normshift.for’ to calculate 95-02 normals from TS 2.10 CLD.
5. Calculate difference between TS 2.10 6190 normals and the above.
6. Modify the in-database normals (step 3) with the difference (step 5).
7. Carry on as before?

No.. this won’t work. anomdtb.for calculates normals on the fly – it would have to know too much.

The next opportunity comes at the output from anomdtb – the normalised values in the *.txt files that
the IDL gridder reads. These are just files – one per month – with lists of coordinates and values, so
ideal to add normalised values to. Decided that this will be the process:

Modern SunH DB  ->  Hsh2cld.for  ->  Modern Cld% DB
Modern Cld% DB  ->  newprog.for  ->  6190anomalies.txt

..meanwhile, as before..

Normal Cld% DB  ->  anomdtb.for  ->  6190anomalies.txt

So we then just have to merge the two 6190 anomaly sets! Which could just be a concatenation.
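
If the merge really is a plain concatenation, it could look something like the Python sketch below. The function name and the file names in the usage comment are hypothetical, and the sketch assumes the monthly .txt files hold one station record per line with no header. It does not deal with a station appearing in both inputs, which would need handling if the two databases overlap.

```python
from pathlib import Path

def merge_anomaly_files(normal_path, modern_path, out_path):
    """Concatenate two per-month anomaly text files into one.

    Lines are treated as opaque records; blank lines are dropped and
    no de-duplication of stations is attempted.
    Returns the number of records written.
    """
    lines = []
    for path in (normal_path, modern_path):
        text = Path(path).read_text()
        lines.extend(line for line in text.splitlines() if line.strip())
    Path(out_path).write_text("\n".join(lines) + "\n")
    return len(lines)

# Hypothetical usage, one call per monthly file:
# merge_anomaly_files("normal/cld.2003.01.txt", "modern/cld.2003.01.txt",
#                     "merged/cld.2003.01.txt")
```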

Easy, then.. the only thing we need is the miraculous ‘newprog.for’! With three days before delivery.

No, no, no – HANG ON. Let’s not try and boil the ocean! How about:

1901-2002      Static, as published, leave well alone (or recalculate with better DTR).
2003-2006/7    Calc from modern SunH and use the suggested mods after gridding.

This is what was originally intended. But there will be problems:

1. MCDW only goes back to 2006, so what’s the data density for 2003-2005? Should this also use synthetic
cloud from DTR? I guess yes.

2. No guarantee of continuity from 2002 to 2003. This could be the real stickler. Moving from one system
to the other – this is why it might be better to re-run 1901-2002 as well.

OKAY.. normshift.for now creates a gridded set of conversion data between whatever period you choose
and 1961-1990. Such that it can be added to the gridded output of the process run with the ‘false’
normalisation period.
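
The idea behind the shift is that an anomaly taken against the ‘false’ period differs from one taken against 1961-1990 by a constant per cell and calendar month: (x - mean6190) = (x - meanALT) + (meanALT - mean6190). The Python sketch below illustrates that for a single grid cell. It is not normshift.for; the function names and the toy series are invented, and missing values are not handled.

```python
def period_mean(series, first_year, start, end):
    """Mean of each calendar month over the years start..end inclusive.

    `series` is a flat list of monthly values for one grid cell,
    beginning in January of first_year.
    """
    means = []
    for month in range(12):
        vals = [series[(y - first_year) * 12 + month] for y in range(start, end + 1)]
        means.append(sum(vals) / len(vals))
    return means

def norm_shift(series, first_year, alt_start, alt_end, base_start=1961, base_end=1990):
    """Per-month offset: mean over the alternative period minus mean over the base period."""
    alt = period_mean(series, first_year, alt_start, alt_end)
    base = period_mean(series, first_year, base_start, base_end)
    return [a - b for a, b in zip(alt, base)]

# Toy cell covering 1901-2002: constant 600 up to 1994, constant 650 from 1995.
series = [600.0 if y < 1995 else 650.0 for y in range(1901, 2003) for _ in range(12)]
shift = norm_shift(series, 1901, 1995, 2002)
print(shift[0])   # 50.0
```

In this toy case an anomaly of +10 relative to the 1995-2002 mean becomes +60 relative to 1961-1990 once the shift is added.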

So.. first, merge your bulletins:

Well FIRSTLY, you realise that your databases don’t have normals lines, so you modify mcdw2cru.for and
climat2cru.for to optionally add them, then you re-run them on the bulletins, ending up with:

<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/MCDW] ./mcdw2cru

MCDW2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest MCDW file: ssm9409.fin
Enter the latest MCDW file (or <ret> for single files): ssm0708.fin
Add a dummy normals line? (Y/N): Y

All Files Processed
tmp.0711272156.dtb: 2785 stations written
vap.0711272156.dtb: 2786 stations written
rdy.0711272156.dtb: 2781 stations written
pre.0711272156.dtb: 2791 stations written
sun.0711272156.dtb: 2184 stations written

Thanks for playing! Byeee!
<END_QUOTE>

<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/CLIMAT] ./climat2cru

CLIMAT2CRU: Convert MCDW Bulletins to CRU Format

Enter the earliest CLIMAT file: climat_data_200301.txt
Enter the latest CLIMAT file (or <ret> for single file): climat_data_200707.txt
Add a dummy normals line? (Y/N): Y

All Files Processed
tmp.0711272219.dtb: 2881 stations written
vap.0711272219.dtb: 2870 stations written
rdy.0711272219.dtb: 2876 stations written
pre.0711272219.dtb: 2878 stations written
sun.0711272219.dtb: 2020 stations written
tmn.0711272219.dtb: 2800 stations written
tmx.0711272219.dtb: 2800 stations written

Thanks for playing! Byeee!
<END_QUOTE>

So.. NOW can I merge CLIMAT into MCDW?!

As expected, thank goodness:

<BEGIN_QUOTE>
uealogin1[/cru/cruts/version_3_0/incoming/merge_CLIMAT_into_MCDW] ./newmergedb

WELCOME TO THE DATABASE UPDATER

Before we get started, an important question:
If you are merging an update – CLIMAT, MCDW, Australian – do
you want the quick and dirty approach? This will blindly match
on WMO codes alone, ignoring data/metadata checks, and making any
unmatched updates into new stations (metadata permitting)?

Enter ‘B’ for blind merging, or <ret>: B
Please enter the Master Database name: sun.0711272156.dtb
Please enter the Update Database name: sun.0711272219.dtb

Reading in both databases..
Master database stations:     2184
Update database stations:     2020

Looking for WMO code matches..
28 reject(s) from update process 0711272225

Writing sun.0711272225.dtb

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

OUTPUT(S) WRITTEN

New master database: sun.0711272225.dtb

Update database stations:         2020
> Matched with Master stations:  1775
(automatically:  1775)
(by operator:     0)
> Added as new Master stations:   217
> Rejected:                        28
Rejects file:                 sun.0711272219.dtb.rejected
<END_QUOTE>
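
For the record, the blind merge that prompt describes can be sketched roughly as below. This is a minimal Python sketch, not newmergedb itself: the record layout and the helpers `combine` and `has_usable_metadata` are made up for illustration, and the rule that master values win where both databases hold data is an assumption.

```python
def has_usable_metadata(station):
    # Hypothetical test: a station needs a lat and lon to become a new entry.
    return station.get("lat") is not None and station.get("lon") is not None


def combine(master_rec, update_rec):
    # Assumed rule: update fills gaps, master values win on overlapping years.
    data = dict(update_rec["data"])
    data.update(master_rec["data"])
    return {**master_rec, "data": data}


def blind_merge(master, update):
    """Merge update into master, matching on WMO code alone.

    master, update: dicts mapping WMO code -> station record.
    Returns (merged database, counts, list of rejected WMO codes).
    """
    merged = dict(master)
    counts = {"matched": 0, "added": 0, "rejected": 0}
    rejects = []
    for wmo, station in update.items():
        if wmo in merged:
            # Blind match: no data or metadata comparison at all.
            merged[wmo] = combine(merged[wmo], station)
            counts["matched"] += 1
        elif has_usable_metadata(station):
            # Unmatched, but metadata permits: becomes a new station.
            merged[wmo] = station
            counts["added"] += 1
        else:
            rejects.append(wmo)
            counts["rejected"] += 1
    return merged, counts, rejects
```

The counts in the log are at least self-consistent: 1775 + 217 + 28 = 2020 update stations, and 2184 + 217 = 2401 stations in the new master.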

Wahey! Lots of stations to play with!

So, next.. convert to cloud!

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/db/cld] ./Hsh2cld

Hsh2cld – Convert a Sun Hours database to a Cloud Percent one

Please enter the Sun Hours database: sun.0711272225.dtb
Data Factor detected: *1.000

Completed -     2401 stations converted.

Sun Percentage Database:   spc.0711272230.dtb
Cloud Percentage Database: cld.0711272230.dtb
<END_QUOTE>

So.. bated breath..

..and yay!

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./anomdtb

> ***** AnomDTB: converts .dtb to anom .txt for gridding *****

> Enter the suffix of the variable required:
.cld
> Select the .cts or .dtb file to load:
cld.0711272230.dtb

> Specify the start,end of the normals period:
1995,2002
> Specify the missing percentage permitted:
12.5
> Data required for a normal:            7
> Specify the no. of stdevs at which to reject data:
3
> Select outputs (1=.cts,2=.ann,3=.txt,4=.stn):
3
> Check for duplicate stns after anomalising? (0=no,>0=km range)
0
> Select the generic .txt file to save (yy.mm=auto):
cld.txt
> Select the first,last years AD to save:
1995,2007
> Operating…

/tmp_mnt/cru-auto/cruts/version_3_0/secondaries/cld/cld.0711272230.dts

> NORMALS            MEAN percent      STDEV percent
>         .dtb          0     0.0
>         .cts      83961    49.3      83961    49.3
> PROCESS        DECISION percent %of-chk
> no lat/lon           95     0.1     0.1
> no normal         86174    50.6    50.7
> out-of-range         28     0.0     0.0
> accepted          83933    49.3
> Dumping years 1995-2007 to .txt files…
<END_QUOTE>

Well.. a ‘qualified’ yay.. only half got normals! But I don’t like to raise the ‘missing percentage’
limit to 25% because we’re only talking about 8 values to begin with!!
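
To put numbers on that: a quick Python sketch of how the missing percentage turns into a count of required values. It assumes the limit is applied as a plain percentage of the period length, rounded down, which is a guess at what anomdtb does but does reproduce the 7 it reports for 12.5% of 8 years. `values_required` is a hypothetical helper.

```python
import math


def values_required(n_years, missing_pct):
    """Minimum number of present values needed to form a normal.

    Assumes the permitted number of missing values is
    floor(n_years * missing_pct / 100).
    """
    allowed_missing = math.floor(n_years * missing_pct / 100.0)
    return n_years - allowed_missing


for pct in (12.5, 25.0):
    print(pct, values_required(8, pct))
```

So at 12.5% a station may miss one year of the eight, and at 25% it could miss two and still get a normal from only six values.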

The output files look OK.. between 400 and 600 values in each, not a lot really but hey, better than
nowt. So onto the conversion data (must stop calling ‘em factors, they’re not multiplicative).

<BEGIN_QUOTE>
crua6[/cru/cruts/version_3_0/secondaries/cld] ./normshift

NORMSHIFT – Normals from any period

Please enter the source file:   cru_ts_2_10.1901-2002.cld.grid
Enter the start year of this file:  1901
Enter the end year of this file:    2002
Enter the normal period start year: 1995
Enter the normal period end year:   2002
Enter the 3-character parameter:    cld

Normals file will be: clim.9502.to.6190.grid.cld
<END_QUOTE>
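
The core of a 'normals from any period' step is just a per-month mean over the chosen years. A minimal Python sketch for one grid cell follows, assuming the series is held as one 12-value list per year; `period_normals` and that layout are illustrative only, and whatever normshift then does to relate the result to 1961-90 (as the output filename hints) is not shown in the log and not attempted here.

```python
def period_normals(monthly, first_year, norm_start, norm_end):
    """Mean of each calendar month over norm_start..norm_end inclusive.

    monthly: list of 12-value lists, one per year, starting at first_year.
    Missing values are None and are skipped.
    """
    lo = norm_start - first_year
    hi = norm_end - first_year + 1
    years = monthly[lo:hi]
    normals = []
    for m in range(12):
        vals = [yr[m] for yr in years if yr[m] is not None]
        normals.append(sum(vals) / len(vals) if vals else None)
    return normals
```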

So, erm.. now we need to create our synthetic cloud from DTR. Except that’s the thing we CAN’T do because
cal_cld_gts_tdm.pro needs those bloody coefficients (a.25.7190, etc) that went AWOL. Frustratingly we
do have some of the outputs from the program (ie, a.25.01.7190.glo), but that’s obviously no use.

So, erm. We need synthetic cloud for 2003-2007, or we won’t have enough data to run with. And yes it’s
taken me this long to realise that. Oh, bugger.

Had a detailed search around Mark New’s old disk (still online thankfully). Found this:

<BEGIN_QUOTE>
crua6[/cru/mark1/markn/gts/cld/val] ls -l
total 7584
lrwxrwxrwx   1 f080     cru           25 Sep 12  2005 c1 -> /cru/u1/f080/isccp/c1_mon
-rw-r--r--   1 f080     cru         1290 Mar 24  1998 cld_corr.j
-rw-r--r--   1 f080     cru          938 Mar 17  1998 cld_scat.j
-rw-r-----   1 f080     cru       922584 Mar 24  1998 cru_hahn_corr.ps
-rw-r-----   1 f080     cru       922588 Mar 24  1998 cru_isccp_corr.ps
-rw-r-----   1 f080     cru          533 Mar 27  1998 cruobs_hahn_corr.j
-rw-r-----   1 f080     cru       868561 Mar 27  1998 cruobs_hahn_corr.ps
-rw-r--r--   1 f080     cru          697 Mar 20  1998 dtr_corr.j
-rw-r-----   1 f080     cru           50 Mar 27  1998 foo
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1980
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1981
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1982
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1983
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1984
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1985
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1986
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1987
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1988
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1989
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1990
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1991
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1992
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1993
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1994
-rw-r-----   1 f080     cru       248832 Mar 27  1998 glo25.cld.1995
-rw-r-----   1 f080     cru       922592 Mar 24  1998 hahn_isccp_corr.ps
-rw-r-----   1 f080     cru         2378 Mar 24  1998 test.j
<END_QUOTE>

..which looks to me like the place where he calculated the coefficients. The *.j files are IDL ‘Journal’ files,
so can be run from within IDL. This was my first attempt:

<BEGIN_QUOTE>
IDL> .run cld_corr.j
% Compiled module: $MAIN$.
% Compiled module: RD25_GTS.
YEAR:    1981
% Compiled module: RDBIN.
% Compiled module: STRIP.
foo: Permission denied.
foo: Permission denied.
foo: Permission denied.
% OPENR: Error opening file. Unit: 99, File: /home/cru/f098/u1/hahn/hahn25.1981
No such file or directory
% Execution halted at:  RDBIN              63 /cru/u2/f080/Idl/rdbin.pro
%                       RD25_GTS           11 /cru/u2/f080/Idl/rd25_gts.pro
%                       $MAIN$              1 /tmp_mnt/cru-auto/mark1/f080/gts/cld/val/cld_corr.j
IDL>
<END_QUOTE>

I then had to chase around to find three sets of missing files.. to fulfil these five conditions:

if keyword_set(hgrid) eq 0 then rd25_gts,$
hgrid,'~/u1/hahn/hahn25.',1981,1991
if keyword_set(rgrid) eq 0 then rd25_gts,$
rgrid,'../glo_reg_25/glo.cld.',1981,1991
if keyword_set(hgrid2) eq 0 then rd25_gts,$
hgrid2,'~/u1/hahn/hahn25.',1983,1991
if keyword_set(igrid) eq 0 then rdisccp_gts,$
igrid,'c1/isccp.',1983,1991
if keyword_set(rgrid2) eq 0 then rd25_gts,$
rgrid2,'../glo_reg_25/glo.cld.',1983,1991

I managed to find the hahn25 files (on Mark’s disk), and some likely-looking isccp files (also on Mark’s disk).
But although there were plenty of files with ‘glo’, ‘cld’ and ‘25’ in them, there were none matching the filename
construction above. However, as some of those were in the same directory – I’ll take that chance!!
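
The 'filename construction' can be spelled out. Going by the error message earlier (hahn25.1981), rd25_gts appears to append the year to the prefix it is given. A small Python sketch that lists what a call would look for and reports what is absent; `expected_files` and `missing_files` are hypothetical helpers, and the inclusive year range is an assumption.

```python
import os


def expected_files(prefix, start_year, end_year):
    """Paths a reader would open if it simply appends each year to the prefix."""
    return [f"{prefix}{year}" for year in range(start_year, end_year + 1)]


def missing_files(prefix, start_year, end_year, exists=os.path.exists):
    """Expected paths that are not present ('~' is expanded before checking)."""
    return [
        path
        for path in expected_files(prefix, start_year, end_year)
        if not exists(os.path.expanduser(path))
    ]
```

Run against the prefix '../glo_reg_25/glo.cld.', this would flag every year as missing in a directory that only holds names like glo25.cld.1981, which is exactly the mismatch described above.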

I did try, honestly. Very hard. I found all the files, and put them in directories. I made a local copy of the job
file, ‘H_cld_corr.j’, with the local directory refs in. Hell, I even precompiled the correct version of rdbin!

All for no