Friday, April 27, 2012


cc: "Thorne, Peter" <>, Leopold Haimberger <>, Karl Taylor <>, Tom Wigley <>, John Lanzante <>, "'Susan Solomon'" <>, Melissa Free <>, peter gleckler <>, "'Philip D. Jones'" <>, Thomas R Karl <>, Steve Klein <>, carl mears <>, Doug Nychka <>, Gavin Schmidt <>, Frank Wentz <>,
date: Fri, 25 Apr 2008 12:55:28 -0700
from: Ben Santer <>
subject: Re: [Fwd: JOC-08-0098 - International Journal of Climatology]
to: Steve Sherwood <>

Dear Steve,

Thanks very much for these comments. They will be very helpful in
responding to Reviewer #1.

Best regards,


Steve Sherwood wrote:
> Ben,
> It sounds like the reviewer was fair. If (s)he misunderstood or didn't
> catch things, the length of the manuscript may have been a factor, and I
> am definitely sympathetic to that particular complaint.
>> CONCERN #1: Assumption of an AR-1 model for regression residuals.
> I also am no great fan of AR1 models parameterized by the lag-1
> autocorrelation, because if the time step is too short they can go greatly
> astray at longer lags where it matters. But if you choose the
> persistence parameter to give a good fit to the entire autocorrelation
> function--i.e. make sure it decays to 1/e at about the right lag--it
> should work fine. I suggest trying this to see whether it changes
> anything much, and if not, leaving it at that. I think that for simply
> generating confidence intervals on a scalar measure there is no reason
> to go to higher-order AR processes, as a matter of principle.
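Steve's suggestion can be sketched in a few lines: instead of taking the lag-1 autocorrelation at face value, pick the AR(1) persistence parameter so that the implied autocorrelation function phi**k decays to 1/e at the same lag as the empirical one, then use the usual effective-sample-size correction for the trend test. This is only a minimal illustration under stated assumptions (a detrended anomaly series in a NumPy array; the function names are hypothetical), not the paper's actual procedure.

```python
import numpy as np

def fit_ar1_by_decorrelation(x, max_lag=50):
    """Choose an AR(1) persistence parameter phi so that the implied
    autocorrelation phi**k decays to 1/e at the same lag as the
    empirical autocorrelation function, rather than using lag 1 alone."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    acf = np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                    for k in range(max_lag + 1)])
    # First lag at which the empirical ACF drops below 1/e
    below = np.where(acf < 1.0 / np.e)[0]
    L = below[0] if below.size else max_lag
    return np.exp(-1.0 / L)   # AR(1) with phi**L = 1/e

def effective_sample_size(n, phi):
    """Standard AR(1) correction to the sample size used when forming
    confidence intervals on a least-squares trend."""
    return n * (1.0 - phi) / (1.0 + phi)
```

Matching the decorrelation lag rather than the lag-1 value is what protects against the "too short a time step" problem: two series with the same lag-1 autocorrelation can have very different decorrelation times, but this fit pins down the lag that actually matters for trend uncertainty.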
>> CONCERN #2: No "attempt to combine data across model runs."
> The only point of doing this would seem to be to test whether there are
> any individual models that can be falsified by the data. It is a
> judgment call whether to go down this road--my judgment would be, no,
> that is a subject for a model evaluation/intercomparison paper. The
> question at issue here is whether GCMs or the CMIP3 forcings share some
> common flaw; the implication of the Douglass et al paper is that they
> do, and that future climate may therefore venture outside the range
> simulated by GCMs. The appropriate null hypothesis is that the observed
> data record could with nonnegligible probability have been produced by a
> climate model---not that it could be reproduced by every climate model.
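One way to operationalize that null hypothesis is a per-model paired-trend statistic: the difference between the observed and simulated trends, normalized by the combined trend standard errors, which is roughly N(0,1) under the null that this particular model could have produced the observations. The sketch below is an illustrative assumption, not the manuscript's exact test; the function names are hypothetical and the AR(1) adjustment enters through the phi arguments.

```python
import numpy as np

def trend_and_stderr(y, phi=0.0):
    """OLS trend and its standard error, with an AR(1) effective-sample-size
    adjustment applied to the residual degrees of freedom."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    b, a = np.polyfit(t, y, 1)          # slope, intercept
    resid = y - (a + b * t)
    n_eff = n * (1.0 - phi) / (1.0 + phi)
    s2 = np.sum(resid ** 2) / (n_eff - 2.0)
    sb = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    return b, sb

def paired_trend_d(obs, model, phi_obs=0.0, phi_mod=0.0):
    """Normalized trend difference for one (obs, model) pair; under the
    null that this model could have produced the observations, d is
    approximately standard normal."""
    bo, so = trend_and_stderr(obs, phi_obs)
    bm, sm = trend_and_stderr(model, phi_mod)
    return (bo - bm) / np.hypot(so, sm)
```

The point about the null hypothesis then becomes concrete: the models-versus-data consistency claim fails only if |d| is large for essentially every model, not merely for some of them.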
>> The Reviewer seems to be arguing that the main advantage of his
>> approach #2 (use of ensemble-mean model trends in significance
>> testing) relative to our paired trends test (his approach #1) is that
>> non-independence of tests is less of an issue with approach #2. I'm
>> not sure whether I agree. Are results from tests involving GFDL CM2.0
>> and GFDL CM2.1 temperature data truly "independent" given that both
>> models were forced with the same historical changes in anthropogenic
>> and natural external forcings? The same concerns apply to the high-
>> and low-resolution versions of the MIROC model, the GISS models, etc.
> (S)he seems to have been referring to the fact that all models are
> tested with the same data. I also fail to see how any change in
> approach would affect this issue.
>> I am puzzled by some of the comments the Reviewer has made at the top
>> of page 3 of his review. I guess the Reviewer is making these comments
>> in the context of the pair-wise tests described on page 2. Crucially,
>> the comment that we should use "...the standard error if testing the
>> average model trend" (and by "standard error" he means DCPS07's
>> sigma{SE}) IS INCONSISTENT with the Reviewer's approach #3, which
>> involves use of the inter-model standard deviation in testing the
>> average model trend.
> I also am puzzled. The standard error is appropriate if you have a
> large ensemble of observed time series, but not if you have only one.
> Computing the standard error of the model mean is useless when you have
> no good estimate of the mean of the real world to compare it to. The
> essential mistake of DCPS was to assume that the single real-world time
> series was a perfect estimator of the mean.
>> And I disagree with the Reviewer's comments regarding the superfluous
>> nature of Section 6. The Reviewer states that, "when simulating from a
>> known (statistical) model... the test statistics should by definition
>> give the correct answer." The whole point of Section 6 is that the
>> DCPS07 consistency test does NOT give the correct answer when applied
>> to randomly-generated data!
> Maybe there is a more compact way to show this?
>> In order to satisfy the Reviewer's curiosity, I'm perfectly willing to
>> repeat the simulations described in Section 6 with a higher-order AR
>> model. However, I don't like the idea of simulation of synthetic
>> volcanoes, etc. This would be a huge time sink, and would not help to
>> illustrate or clarify the statistical mistakes in DCPS07.
> I wouldn't advise any of that.
> -SS
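The essential mistake discussed above (using DCPS07's sigma{SE}, the standard error of the multi-model mean trend, to judge a single observed realization) can be demonstrated with a very compact Monte Carlo in the spirit of Section 6. This is a minimal white-noise illustration under assumed parameters, not the paper's actual synthetic-data setup: all "models" and the "observations" are trends fitted to independent realizations of the same process, so a correct 5% test should reject about 5% of the time.

```python
import numpy as np

def dcps_rejection_rate(n_models=19, n_trials=2000, series_len=240, seed=1):
    """Monte Carlo estimate of how often a DCPS07-style consistency test
    (|b_obs - mean(b_models)| > 2 * sigma_SE, with sigma_SE the standard
    error of the model-mean trend) rejects a TRUE null hypothesis."""
    rng = np.random.default_rng(seed)
    t = np.arange(series_len, dtype=float)
    rejections = 0
    for _ in range(n_trials):
        # Row 0 plays the role of "observations"; rows 1..N are "models".
        data = rng.standard_normal((n_models + 1, series_len))
        trends = np.polyfit(t, data.T, 1)[0]     # slope for each series
        b_obs, b_mod = trends[0], trends[1:]
        sigma_se = b_mod.std(ddof=1) / np.sqrt(n_models)
        if abs(b_obs - b_mod.mean()) > 2.0 * sigma_se:
            rejections += 1
    return rejections / n_trials
```

Because sigma_SE shrinks like 1/sqrt(N) while the scatter of any single realization's trend does not, the rejection rate here lands far above the nominal 5% level, and it gets worse as more models are added: exactly the pathology the manuscript's Section 6 exposes.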

Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675

