cc: "Thorne, Peter" <peter.thorneatXYZxyzoffice.gov.uk>, Leopold Haimberger <leopold.haimbergeratXYZxyzvie.ac.at>, Karl Taylor <taylor13atXYZxyzl.gov>, Tom Wigley <wigleyatXYZxyz.ucar.edu>, John Lanzante <John.LanzanteatXYZxyza.gov>, "'Susan Solomon'" <ssolomonatXYZxyznoaa.gov>, Melissa Free <Melissa.FreeatXYZxyza.gov>, peter gleckler <gleckler1atXYZxyzl.gov>, "'Philip D. Jones'" <p.jonesatXYZxyz.ac.uk>, Thomas R Karl <Thomas.R.KarlatXYZxyza.gov>, Steve Klein <klein21atXYZxyzl.llnl.gov>, carl mears <mearsatXYZxyzss.com>, Doug Nychka <nychkaatXYZxyzr.edu>, Gavin Schmidt <gschmidtatXYZxyzs.nasa.gov>, Frank Wentz <frank.wentzatXYZxyzss.com>, ssolomonatXYZxyzi.com
date: Fri, 25 Apr 2008 12:55:28 -0700
from: Ben Santer <santer1atXYZxyzl.gov>
subject: Re: [Fwd: JOC-08-0098 - International Journal of Climatology]
to: Steve Sherwood <steven.sherwoodatXYZxyze.edu>

<x-flowed>

Dear Steve,

Thanks very much for these comments. They will be very helpful in
responding to Reviewer #1.

Best regards,

Ben

Steve Sherwood wrote:

> Ben,
>
> It sounds like the reviewer was fair. If (s)he misunderstood or didn't
> catch things, the length of the manuscript may have been a factor, and I
> am definitely sympathetic to that particular complaint.

>>
>> CONCERN #1: Assumption of an AR-1 model for regression residuals.
> I also am no great fan of AR1 models parameterized by the lag-1
> autocorrelation, because if the time step is too short they can go
> greatly astray at longer lags, where it matters. But if you choose the
> persistence parameter to give a good fit to the entire autocorrelation
> function--i.e. make sure it decays to 1/e at about the right lag--it
> should work fine. I suggest trying this to see whether it changes
> anything much, and if not, leaving it at that. I think that, for simply
> generating confidence intervals on a scalar measure, there is no reason
> to go to higher-order AR processes as a matter of principle.
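A minimal sketch of the e-folding fit suggested above, assuming evenly spaced data; the function name and the crude ACF estimator are illustrative choices, not anything from the manuscript:

```python
import numpy as np

def ar1_phi_from_efolding(x, dt=1.0, max_lag=200):
    """Fit an AR(1) persistence parameter by matching the e-folding time
    of the sample autocorrelation function (ACF), rather than taking the
    lag-1 autocorrelation at face value."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    # Crude biased sample ACF out to max_lag
    acf = np.array([np.dot(x[:n - k], x[k:]) / denom
                    for k in range(min(max_lag, n - 1))])
    # e-folding time: first lag at which the ACF falls below 1/e
    below = np.where(acf < 1.0 / np.e)[0]
    tau = float(below[0]) if below.size else float(len(acf))
    # An AR(1) ACF is phi**k, which equals 1/e at lag -1/log(phi),
    # so matching that lag gives phi = exp(-dt/tau)
    return np.exp(-dt / tau)
```

For an AR(1) process the autocorrelation at lag k is phi^k, which falls to 1/e at lag tau = -1/ln(phi); matching that lag, rather than trusting the lag-1 value alone, is what guards against a too-short time step.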

>

>> CONCERN #2: No "attempt to combine data across model runs."
> The only point of doing this would seem to be to test whether there are
> any individual models that can be falsified by the data. It is a
> judgment call whether to go down this road--my judgment would be no;
> that is a subject for a model evaluation/intercomparison paper. The
> question at issue here is whether GCMs or the CMIP3 forcings share some
> common flaw; the implication of the Douglass et al. paper is that they
> do, and that future climate may therefore venture outside the range
> simulated by GCMs. The appropriate null hypothesis is that the observed
> data record could, with non-negligible probability, have been produced
> by a climate model--not that it could be reproduced by every climate
> model.

>

>>
>> The Reviewer seems to be arguing that the main advantage of his
>> approach #2 (use of ensemble-mean model trends in significance
>> testing) relative to our paired trends test (his approach #1) is that
>> non-independence of tests is less of an issue with approach #2. I'm
>> not sure whether I agree. Are results from tests involving GFDL CM2.0
>> and GFDL CM2.1 temperature data truly "independent", given that both
>> models were forced with the same historical changes in anthropogenic
>> and natural external forcings? The same concerns apply to the high-
>> and low-resolution versions of the MIROC model, the GISS models, etc.
> (S)he seems to have been referring to the fact that all models are
> tested with the same data. I also fail to see how any change in
> approach would affect this issue.

>>
>> I am puzzled by some of the comments the Reviewer has made at the top
>> of page 3 of his review. I guess the Reviewer is making these comments
>> in the context of the pair-wise tests described on page 2. Crucially,
>> the comment that we should use "...the standard error if testing the
>> average model trend" (and by "standard error" he means DCPS07's
>> sigma{SE}) IS INCONSISTENT with the Reviewer's approach #3, which
>> involves use of the inter-model standard deviation in testing the
>> average model trend.
> I also am puzzled. The standard error is appropriate if you have a
> large ensemble of observed time series, but not if you have only one.
> Computing the standard error of the model mean is useless when you have
> no good estimate of the mean of the real world to compare it to. The
> essential mistake of DCPS was to assume that the single real-world time
> series was a perfect estimator of the mean.
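The mistake described above can be shown with a small Monte Carlo sketch in the spirit of Section 6 (ensemble size, trial count, and spread are arbitrary choices, not values from DCPS07 or the manuscript). The single "observation" is drawn from the same distribution as the model trends, so a well-calibrated 5% test should reject about 5% of the time:

```python
import numpy as np

rng = np.random.default_rng(42)
n_models, n_trials = 19, 2000  # illustrative sizes only
sigma = 1.0                    # common spread of trends about a shared mean

rejections_se = 0
rejections_sd = 0
for _ in range(n_trials):
    # Model trends and the single "observed" trend come from the SAME
    # distribution, so any rejection here is a false positive.
    models = rng.normal(0.0, sigma, n_models)
    obs = rng.normal(0.0, sigma)
    diff = abs(obs - models.mean())
    sd = models.std(ddof=1)        # inter-model standard deviation
    se = sd / np.sqrt(n_models)    # DCPS07-style sigma_SE
    rejections_se += diff > 1.96 * se
    rejections_sd += diff > 1.96 * sd

print("sigma_SE test rejects:", rejections_se / n_trials)        # far above 0.05
print("inter-model sd test rejects:", rejections_sd / n_trials)  # near 0.05
```

The sigma_SE criterion shrinks as the ensemble grows, so with enough model runs it will reject a single realization drawn from the very distribution the models define; the inter-model standard deviation does not have this pathology.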

>>
>> And I disagree with the Reviewer's comments regarding the superfluous
>> nature of Section 6. The Reviewer states that, "when simulating from a
>> known (statistical) model... the test statistics should by definition
>> give the correct answer." The whole point of Section 6 is that the
>> DCPS07 consistency test does NOT give the correct answer when applied
>> to randomly-generated data!
> Maybe there is a more compact way to show this?

>> In order to satisfy the Reviewer's curiosity, I'm perfectly willing to
>> repeat the simulations described in Section 6 with a higher-order AR
>> model. However, I don't like the idea of simulating synthetic
>> volcanoes, etc. This would be a huge time sink, and would not help to
>> illustrate or clarify the statistical mistakes in DCPS07.
> I wouldn't advise any of that.
>
> -SS

>

--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675
email: santer1atXYZxyzl.gov
----------------------------------------------------------------------------

</x-flowed>
