Gavin Schmidt and Michael Mann
Readers may recall a flurry of excitement in the blogosphere concerning the McShane and Wyner paper in August. Well, the discussions of the McShane and Wyner paper in AOAS have now been put online. There are a stunning 13 different discussion pieces, an editorial and a rebuttal. The invited discussions and rebuttal were basically published ‘as is’, with simple editorial review rather than proper external peer review. This is a relatively unusual way of doing things in our experience, but it does seem to have been effective at getting rapid responses with a wide variety of perspectives, though, in the absence of peer review, a large number of unjustified, unsupportable and irrelevant statements have also got through.
A few of these discussions were already online, i.e. from Martin Tingley, Schmidt, Mann and Rutherford (SMR), and one from Smerdon. Others, including contributions from Nychka & Li, Wahl & Ammann, McIntyre & McKitrick, Smith, Berliner and Rougier, are newly available on the AOAS site and we have not yet read them as carefully.
Inevitably, focus in the discussions is on problems with MW, but it is worth stating upfront here (as is also stated in a number of the papers) that MW made positive contributions to the discussion as well – they introduced a number of new methods (and provided code that allows everyone to try them out), and their use of the Markov Chain Monte Carlo (MCMC) Bayesian approach to assess uncertainties in the reconstructions is certainly interesting. This does not excuse their rather poor framing of the issues, and the multiple errors they made in describing previous work, but it does make the discussions somewhat more interesting than a simple error-correcting exercise might have been. MW are also to be commended on actually following through on publishing a reconstruction and its uncertainties, rather than simply pointing to potential issues and never working through the implications.
The discussions raise some serious general issues with MW’s work – with respect to how they use the data, the methodologies they introduce (specifically the ‘Lasso’ method), the conclusions they draw, whether there are objective methods to decide whether one method of reconstruction is better than another, and whether the Bayesian approach outlined in the last part of the paper really is what it is claimed to be. But there are also a couple of issues very specific to the MW analysis; for instance, the claim that MW used the same data as Mann et al, 2008 (henceforth M08).
On that specific issue, presumably just an oversight, MW apparently used the “Start Year” column in the M08 spreadsheet instead of the “Start Year (for recon)” column. The difference between the two is related to the fact that many tree ring reconstructions only have a small number of trees in their earliest periods, which greatly inflates their uncertainty (and therefore reduces their utility). To reduce the impact of this problem, M08 only used tree ring records once they had at least 8 individual trees, which left 59 series in the 1000 AD frozen network. The fact that there were only 59 series in the AD 1000 network of M08 was stated clearly in the paper, and the criterion regarding the minimal number of trees (8) was described in the Supplementary Information. The difference in results between the correct M08 network and the spurious 95-record network MW actually used is unfortunately quite significant. Using the correct data substantially reduces the estimates of peak medieval warmth shown by MW (as well as reducing the apparent spread among the reconstructions). This is even more true when the frequently challenged “Tiljander” series are removed, leaving a network of 55 series. In their rebuttal, MW claim that the M08 quality control is simply ‘ad hoc’ filtering and deny that they made a mistake at all. This is not really credible, and it would have done them much credit to simply accept this criticism.
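For readers who want to see what this screening amounts to in practice, here is a minimal sketch (ours, for illustration only – not the M08 code), using a hypothetical metadata table whose two column names mirror the spreadsheet columns quoted above:

# Toy illustration of the network-selection issue (hypothetical data).
meta <- data.frame(
  id               = paste0("proxy_", 1:5),
  start_year       = c(950, 980, 1000, 990, 760),   # "Start Year"
  start_year_recon = c(1040, 980, 1000, 1210, 760)  # "Start Year (for recon)"
)

# Screening on the raw start year admits series whose early portions
# fail the minimum-tree criterion:
nrow(subset(meta, start_year <= 1000))        # 5 series in this toy case

# The "(for recon)" column already encodes that quality screening:
nrow(subset(meta, start_year_recon <= 1000))  # 3 series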
With just this correction, applying MW’s own procedures yields strong conclusions regarding how anomalous recent warmth is in the longer-term context. MW found recent warmth to be unusual in a long-term context: they estimated an 80% likelihood that the decade 1997-2006 was warmer than any other for at least the past 1000 years. Using the more appropriate 55-proxy dataset with the same estimation procedure (which involved retaining K=10 PCs of the proxy data) yields a higher probability of 84% that recent decadal warmth is unprecedented for the past millennium.
However, K=10 principal components is almost certainly too many, and the resulting reconstruction likely suffers from statistical over-fitting. Objective selection criteria applied to the M08 AD 1000 proxy network, as well as independent “pseudoproxy” analyses (discussed below), favor retaining only K=4 PCs. (Note that MW correctly point out that SMR made an error in calculating this, but correct application of the Wilks (2006) method fortunately does not change the result: 4 PCs should be retained in each case.) Importantly, this choice yields a very close match with the relevant M08 reconstruction. It also yields considerably higher probabilities (up to 99%) that recent decadal warmth is unprecedented for at least the past millennium. These posterior probabilities imply substantially higher confidence than the “likely” assessment by M08 and IPCC (2007) (a 67% level of confidence). Indeed, a probability of 99% not only exceeds the IPCC “very likely” threshold (90%), but reaches the “virtually certain” (99%) threshold. In this sense, the MW analysis, using the proper proxy data and proper methodological choices, yields inferences regarding the unusual nature of recent warmth that are even more confident than expressed in past work.
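The flavour of such an objective selection rule can be shown with a toy calculation (an illustrative sketch of the general idea only – not the SMR implementation of the Wilks (2006) method): fit the log-eigenvalue spectrum of the trailing, noise-dominated PCs, and retain the leading PCs that rise clearly above it.

# Toy PC selection via the log-eigenvalue spectrum (synthetic data):
# inject 4 common signals into 55 noisy series and count the PCs that
# stand above a log-linear fit to the noise tail of the spectrum.
set.seed(1)
n <- 1000; p <- 55; k_true <- 4
signals  <- matrix(rnorm(n * k_true), n, k_true)
loadings <- matrix(rnorm(p * k_true, sd = 0.8), p, k_true)
proxy    <- signals %*% t(loadings) + matrix(rnorm(n * p), n, p)

ev       <- prcomp(proxy, center = TRUE, scale. = TRUE)$sdev^2
rank_idx <- 11:p                                   # presumed noise tail
fit      <- lm(log(ev[rank_idx]) ~ rank_idx)
noise    <- exp(predict(fit, newdata = data.frame(rank_idx = 1:p)))
(K <- which(ev <= noise)[1] - 1)                   # recovers ~4 in this toy case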
An important real issue is whether proxy data provide more information than naive models (such as the mean of the calibrating data, for instance) or outperform random noise of various types. This is something that has been addressed in many previous studies, which have come to very different conclusions than MW, and so the reasons why MW came to their conclusion are worth investigating. Two factors appear to be important – their exclusive use of the “Lasso” method to assess this, and their use of short holdout periods (30 years) for both extrapolated and interpolated validation periods.
So how do you assess how good a method is? This is addressed in almost half of the discussion papers. Tingley in particular gives strong evidence that Lasso is not in fact a very suitable method and is outperformed by his Composite Regression method in test cases, while Kaplan points out that noise with significant long-term trends will also perform well in interpolation. Both Smith and the paper by Craigmile and Rajaratnam also address this point.
In our submission, we tested all of the MW methods in “pseudoproxy” experiments based on long climate simulations (a standard benchmark used by practitioners in the field). Again, Lasso was outperformed by almost every other method, especially the EIV method used in M08, but even in comparison with the other methods MW introduced. The only support for ‘Lasso’ comes from McIntyre and McKitrick who curiously claim that the main criteria in choosing a method should be how long it has been used in other contexts, regardless of how poorly it performs in practice for a specific new application. A very odd criteria indeed, which if followed would lead to the complete cessation of any innovation in statistical approaches.
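To make the pseudoproxy protocol concrete, here is a skeletal version of such a test (a toy sketch on synthetic data, using the standard glmnet package – not our actual experiments, which used long climate-model simulations): build noisy pseudoproxies from a known “temperature” series, hold out a 30-year block, and compare holdout errors across methods.

# Skeletal pseudoproxy test (toy data; assumes the glmnet package).
library(glmnet)
set.seed(42)
n <- 150; p <- 55
temp  <- as.numeric(arima.sim(list(ar = 0.5), n)) + 0.02 * (1:n)  # pseudo "temperature"
proxy <- sapply(1:p, function(j) temp + rnorm(n, sd = 2))         # noisy pseudoproxies

train <- 1:(n - 30); test <- (n - 30 + 1):n                       # 30-year holdout block

lasso      <- cv.glmnet(proxy[train, ], temp[train])
pred_lasso <- as.numeric(predict(lasso, newx = proxy[test, ], s = "lambda.min"))
pred_naive <- rep(mean(temp[train]), length(test))                # naive calibration mean

rmse <- function(pred) sqrt(mean((pred - temp[test])^2))
c(lasso = rmse(pred_lasso), naive_mean = rmse(pred_naive))

Which method ‘wins’ on a toy like this depends entirely on the noise structure and the holdout choice; the substantive point is the protocol itself, which, run with realistic pseudoproxies, is what ranks Lasso poorly.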
The MW rebuttal focuses a lot on SMR and we will take the time to look into the specifics more closely, but some of their criticism is simply bogus. They claim our supplemental code was not usable, but in fact we provided a turnkey R script for every single figure in our submission – something not true of their code – so that is a little cheeky of them [as is declaring that one of us to be a mere blogger, rather than a climate scientist ;-) ]. They make a great deal of the fact that we only plotted the ~50 year smoothed data rather than the annual means. But this seems to be more a function of their misconstruing what these reconstructions are for (or are capable of) than a real issue. Not least, the smoothing allows the curves and methods to be more easily distinguished – it is not a ‘correction’ to plot noisy annual data in order to obscure the differences in results!
Additionally, MW make an egregiously wrong claim about centering in our calculations. All the PC calculations use prcomp(proxy, center=TRUE, scale=TRUE) to specifically deal with that, while the plots use a constant baseline of 1900-1980 for consistency. They confuse a plotting convention with a calculation.
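The distinction is easy to verify directly (a toy demonstration, not the paper’s code):

# Centering inside prcomp is part of the calculation: adding a constant
# offset to the input leaves the principal components unchanged.
set.seed(7)
x  <- matrix(rnorm(200 * 10), 200, 10)
p1 <- prcomp(x,       center = TRUE, scale. = TRUE)
p2 <- prcomp(x + 100, center = TRUE, scale. = TRUE)
all.equal(p1$x, p2$x)    # TRUE (up to floating point)

# An anomaly baseline, by contrast, only enters at the plotting stage:
recon <- ts(cumsum(rnorm(200)), start = 1811)
plot(recon - mean(window(recon, start = 1900, end = 1980)))  # same curve, shifted zero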
There is a great deal to digest in these discussions, and so we would like to open the discussion here to all of the authors to give their thoughts on how it all stacks up, what can be taken forward, and how such interactions might be better managed in future. For instance, we are somewhat hesitant to support non-peer reviewed contributions (even our own) in the literature, but perhaps others can make a case for it.
In summary, there is much sense in these contributions, and Berliner’s last paragraph sums this up nicely:
The problem of anthropogenic climate change cannot be settled by a purely statistical argument. We can have no controlled experiment with a series of exchangeable Earths randomly assigned to various forcing levels to enable traditional statistical studies of causation. (The use of large-scale climate system models can be viewed as a surrogate, though we need to better assess this.) Rather, the issue involves the combination of statistical analyses and, rather than versus, climate science.
Hear, hear.
PS: The full code, data, and additional supplements from SM&R are available here.
David B. Benson says
(something called for by )
??
Otherwise quite clear.
[Response: oops. Fixed-thanks! -mike]
Pinko Punko says
I do appreciate the back and forth, but when MW gets both the initial paper and the rejoinder, the appearance of having the last word favors them. I’m glad you have some thoughts on it, and I hope this will be further addressed here and in the lit proper.
apeescape says
On code that didn’t work, isn’t MW talking about the RegEM code in MATLAB? Out of curiosity, I was able to run the MATLAB code in Octave, but I had to fiddle around with the folders to make it right. It looks like the R code only takes in output from the EM stuff.
btw I liked the symphony metaphor from Tingley :)
steven mosher says
Gavin,
can I suggest a forum where you limit the commenters to the authors who submitted and one designated “second” for each team. It’s worth a shot, maybe some fruitful dialog would ensue. There are plenty of other blogs for the peanut gallery to engage each other.
Say hi, when you hit SF for AGU.
Andy says
Well, the last statement really does cover it. I first heard this phrase from Hank Shugart (who was a global change scientist long before any such thing really existed): ‘The problem with studying the Earth is that n=1 and df=0.’ So stats won’t help us solve this problem of 20th century warmth in the context of the last few centuries. But fortunately we have 100+ years of radiative physics to help point the way…
Lazarus says
Interesting post. I can’t claim to be a whiz at statistics, but I remember telling some skeptics on another forum, Accuweather / climate change I believe, that the major point and problem with this paper were that the results still showed a ‘hockey stick’, indicating current warming was pretty anomalous, and that the authors were not climatologists, nor did they seem to consult any to discuss why certain methods were used over the ones they chose. That criticism seems to be borne out here.
CM says
I stumbled on this sentence. It seemed to say MW’s use of data gave a lower estimate of peak medieval warmth. Anyway, it’s clear from the context (and your paper) that it’s the other way around.
[Response: Thanks–rephrased this for clarity. – mike]
Ibrahim says
I know of papers with peer review, that have a large number of unjustified, unsupportable and irrelevant statements.
[Response: My experience is that they go down tremendously as a function of the quality of the reviews. – gavin]
Ray Ladbury says
Nice. A really good glimpse of how science actually gets done. Of course it will be ignored by the denialati.
Hugh Laue says
“The problem of anthropogenic climate change cannot be settled by a purely statistical argument. … Rather, the issue involves the combination of statistical analyses and, rather than versus, climate science.”
I’m happy to see this point made (again), one obvious to scientists who use statistics as a tool to discover cause and effect relationships and build their theories (models) of bio-physical reality. That is why the denialist “argument” that climate change is “natural” is anti-science, since it is never supported by a credible theory (or any predictive theory, for that matter) that explains how natural causes lead to the observed evidence.
Nick Barnes says
I welcome the fact that all the contributions include all their data and code, and the clear call, in the accompanying editorial, for this to be a requirement.
MartinM says
The long version of M&W’s rejoinder doesn’t appear to be up on the AOAS site yet, but McShane has posted a copy here.
Alexandre says
Hugh Laue #10
Oh, denialists do come up with predictive theories now and then. They just don’t match the reality they tried to predict. Maybe that’s why they avoid doing it very often…
Neven says
Criteria -> criterium
[Response: I bet you’re a bike racer…jim]
Christopher Hogan says
You see this type of work by economists all the time. I’ve come to call it the one-size-fits-all approach.
Why did they choose that particular method? In the prior version of the paper, the motivation was along the lines of “there’s no best way, so we’re justified in using the lasso”.
In my experience, their choice of method was dictated by one key constraint: they had to pick a method that didn’t require any actual detailed understanding of the subject matter. Hence the lasso. No need to know the relationships among the data series, perform any preliminary data reduction, select or reject items, and so on.
Just toss them all in and let the machine sort it out for you.
And those one-size-fits-all approaches typically yield statistically inefficient estimators. It’s no different from leaving sets of nearly multicollinear regressors in a regression. You’ll get an answer, probably even a mostly reasonable set of predicted values, but you won’t have minimized the confidence intervals around your predicted values.
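A toy example of that inefficiency, for anyone who wants to see it (simulated data, nothing domain-specific):

# Unbiased but inefficient: a nearly collinear regressor inflates the
# standard errors (and hence the confidence intervals).
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # nearly collinear with x1
y  <- x1 + rnorm(n)

se <- function(m) summary(m)$coefficients[-1, "Std. Error"]
se(lm(y ~ x1))                   # tight standard error
se(lm(y ~ x1 + x2))              # both standard errors blow up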
Normally, that’s not a huge deal. In normal statistical analysis (even in the social sciences), you’re trying to reject a null hypothesis. As long as your estimator is unbiased, if you’re dumb enough to inflate your confidence intervals, that’s your tough luck. So there’s a nice alignment of incentives — sloppiness costs you.
Here, normal science is stood on its head. They succeed by failing to reject the null hypothesis (that current temperatures are no different from the historical reconstruction). The large size of the confidence interval is the point of the research. The incentives here are perverse — the sloppier the analysis, the better.
Upshot: In this topsy-turvy analysis, where they are seeking to fail to reject the null, they want the least efficient estimator they can get away with. That gives them the highest chance of failing to reject.
So here, if they used anything other than the most efficient estimator available, that’s just bad science. Basically, if they screw it up, they get the result they are after.
So, IMHO, what they proved is not that Mann et al. are wrong, but that a statistically inefficient estimator … is inefficient.
Bob (Sphaerica) says
A brief grammar note: “as is declaring that one of us to be a mere blogger” should either lose the “that” or else change “to be” to “is.”
I’d also (in a way) strongly second Steve Mosher’s preference that we respect Gavin and Michael’s plea “to open the discussion here to all of the authors to give their thoughts…” Only the authors should post comments (at least initially, and for some time after that), and the peanut gallery should watch.
I’m not suggesting at all that the moderators should enforce this. It should be a self-imposed ban. Listen and learn for a while (although I think brief, meaningful inquiries should be entertained, as long as they don’t involve long winded dissertations or combative replies by the person posing the question). Keep it short and polite, and only to address one’s confusion or misunderstanding.
CTG says
At least criterium is singular, which makes it less wrong than criteria.
Now, can we all just agree on criterion?
Scientific American says
Thanks once again to RealClimate for digesting a long and complicated set of papers and comments into something that is clear and comprehensible to an interested and scientifically literate layman.
I remember the issue of deciding how many principal components to use in my thesis work involving automated land use classification based on remote sensing, and it is clear that using too many PCs will begin to NOT explain the variability, while using the RIGHT number gets us the best answer. So our understanding of different climate factors gets better with time, now all we need to do is ACT on our collective knowledge. Why do we act as if our descendants do not matter?
J Bowers says
DC has an interesting comment:
http://deepclimate.org/2010/12/10/open-thread-7/#comment-6909
David Beach says
Re comment 13: The singular of criteria is criterion. Sorry, I am a classics educated physicist, which makes me a nit-picking nuisance – but I am a very good editor! I see criteria used instead of criterion so often that I had to say something….
Steven Sullivan says
I’m curious to know which flaws of MW2010, if any, were identified by *multiple* discussion pieces in the AOAS set? I would assume these to be the most egregious.
Neven says
Ah yes, indeed, criterion. In Dutch we say ‘criterium’. I should’ve been more specific:
“that the main criteria in choosing a method should be how long it has been used”
and
“A very odd criteria”
Carmen S says
“Second, we take the data as given and do not account for uncertainties, errors, and biases in selection, processing, in-filling, and smoothing of the data as well as the possibility that the data has been “snooped” (subconsciously or otherwise) based on key features of the first and last block.”
[Response: Except that they didn’t. They used their own set of data, and are complaining about ‘ad hockery’ when it was pointed out. – gavin]
Bill says
Re #17 :
“So our understanding of different climate factors gets better with time, now all we need to do is ACT on our collective knowledge. Why do we act as if our descendants do not matter?”
Why ask a question like this and then not allow any discussion of the answers? Like many people, we would like to hear the expert views on what should be done. If it’s not allowed on here, then why allow the original post?
Troy Ca says
As you suggest, a main point of contention between MW and SMR seems to be the use of 95 vs. 59 series. MW claim the M08 choice of series was ad-hoc, whereas you mention here that:
“To reduce the impact of this problem, M08 only used tree ring records when they had at least 8 individual trees, which left 59 series in the 1000 AD frozen network.”
Perhaps elaborating on why the specific number 8 was chosen for the minimum number of trees can put this issue to rest?
[Response: Mike might be better placed to answer – but the main reason is that reconstructions with small numbers of trees have much greater variance. However, in this case it’s irrelevant why that was done because if someone is purporting to analyse the same data as a previous paper, they should do so – or provide some justification why not. Neither option was followed by M&W, and all we did was point it out. – gavin]
Nicolas Nierenberg says
One thing I dislike in these discussions is the constant assumption that the other person is just wrong on a specific point. This has been happening on both sides in this discussion and it makes it painful to follow. It would be much better if you guys were emailing each other when you thought you noticed something, and resolving the issue, rather than playing a game of gotcha. If in the end you couldn’t resolve it that way, then you could report that, explaining each view. The prime example at the moment is centering. They thought you were dead wrong, so they should have emailed and asked. You now think they are dead wrong, so you should have emailed and asked.
[Response: I agree it’s stupid. When I found an error in their code, I emailed McShane directly and they fixed it without a fuss. The first I heard of them having an issue with a technical part of our code was reading the rejoinder two days ago. They are correct in that I implemented the fit to the log eigenvalue spectrum in fig S4 incorrectly, but fortunately it makes no difference (as stated above) – I have no idea why they didn’t let me know when they found it. As for the centering issue, there is no issue. All PCs are calculated with “center=true” – and again, if they thought there was something silly that would get in the way of the scientific issue (which I still presume they are interested in), I have no idea why they didn’t email me. McShane has emailed me previously (for explanations on the area weighting we used, and the locations and use of the GCM data), so it is not as if he’s shy. Right now I’m not anywhere I can look into these issues in detail – or decide what to do about it – but to reiterate, I’d much rather be talking about something serious than dealing with supposed gotchas. – gavin]
AMac says
In their “A Comment on ‘A statistical analysis…'” (PDF), Schmidt, Mann, and Rutherford mention the Lake Korttajarvi varved sediment (Tiljander) data series twice.
Line 38ff:
Lines 159/160:
There are not four Tiljander data series — only three. The primary series recorded by Tiljander et al. were X-Ray Density, varve Thickness, and Lightsum. Lightsum is the portion of varve thickness contributed by mineral matter. (Varve Thickness and Lightsum can each be measured in millimeters; diagrammed here.)
Darksum is taken to be the portion of varve thickness contributed by organic matter. It was calculated as:
Darksum (mm) = Thickness (mm) – Lightsum (mm)
There are only two degrees of freedom among Thickness, Lightsum, and Darksum.
The authors of Tiljander et al. (2003) suggested that the pre-1720 portions of XRD, Lightsum, and Darksum contain climate-related signals. They made no such claim for Thickness.
Septic Matthew says
Why the concern with peer review? All of your peers are now reviewing all of the articles, and they are reading this thread and the threads at Climate Audit, Watts Up With That, and others. This might be the most thoroughly peer-reviewed paper and commentary in your field. They just are not the peers that you would have preferred. Not only that, almost all the peers will have access to almost all of the data and code used in preparation of the paper and commentaries.
Personally, I am awaiting the print version (I am a member of the IMS which publishes AOAS, and I pay for it as it has become my favorite periodical), with which I shall spend much time before downloading and running the available code on the available data.
[Response: Because it will be extremely difficult for future readers to see where the discussion leads – fatuous statements that are now printed will be quoted for a long time, while their rebuttals (on blogs, in future papers, etc.) won’t be. It is far more efficient to have less error in the first place. – gavin]
Anne van der Bom says
Bill,
14 December 2010 at 6:0 PM
Like many people, we would like to hear the expert views on what should be done. If it’s not allowed on here, then why allow the original post?
The most common source of confusion: it drives those false skeptics who are trying to sell us the idea that we have no problem because they don’t like the solution. Problem and solution are not the same, and they are different debates.
This post is, like the majority of posts on RealClimate, not about ‘views what should be done’, but analysis of how the planetary climate system works and what consequences we can expect from our collective actions.
With regard to solutions, you’ll not easily see RealClimate going further than a simple: “reduce CO2 emissions, and other warming agents too. ASAP.” How that should be done is of course an issue that merits debate. But it is a different debate, about political, societal, economic and technological changes. This is not the venue for such a discussion.
Scientific American merely expressed the common frustration that no real action is being taken, although the analysis has been showing, beyond a reasonable doubt and for decades already, that there is a problem.
Dr. Shooshmon, phd. says
“MW found recent warmth to be unusual in a long-term context: they estimated an 80% likelihood that the decade 1997-2006 was warmer than any other for at least the past 1000 years.”
Oh yes, they are to be commended for anything that refutes the Medieval Warm Period. A simple question for Gavin and Michael Mann: why is the last 1000 years important, given that it is so insignificant in the earth’s lifetime? I mean, there’s a 100% chance that the temperature was much higher in early periods of earth’s history. There is a 100% chance that co2 was much much higher, especially during the time of the dinosaurs.
Basically, you scientists are telling us that something that has happened before cannot be allowed to happen again because it will somehow be worse. Despite the fact that we have had co2 concentrations much higher than the 780ppm doubling that you all fear. So given that the earth has sustained higher co2 levels, higher temperature, why is 780ppm now too high? Did I miss something? Did history begin when I was born?
[Response: The short answer is that there is now an advanced civilization in which many millions of people are dependent on an agricultural system that was designed and implemented to work within a relatively stable climate regime, especially that of the last couple hundred years. There are now many, and complex, dependencies on a stable climate. Also, the doubling typically referred to is 560 ppm, not 780.–Jim]
Christopher Hogan says
I thought that the editorial accompanying the paper and responses was quite revealing. I read it as being, basically, an apology for having accepted the paper.
First, the editor carefully describes the review process, in great detail. Editors, assistant editors, reviewers, incoming new editor, rounds of review, ending with this:
“Acceptance of a paper reflects our opinion that the work represents a meaningful contribution to applied statistics, broadly construed, and that the authors have made a good faith effort to respond to the concerns of the reviewers.”
That is, he takes great pains to show that they did all normal due diligence, and his only guarantee is that the authors met those standard of due diligence.
A few other things stand out.
First, he says it’s so obvious that CO2 warms the earth that it’s pointless to test an hypothesis of no warming. (Which is, I think, the hypothesis this paper just tested, isn’t it?)
“I particularly object to the testing of sharp null hypotheses when there is no plausible basis for believing the null is true. An example of an implausible sharp null hypothesis would be that a large increase in the concentration of CO2 in the atmosphere has exactly zero effect on the global mean temperature.”
Second, there’s the clear statement that getting the data right is the most important thing. Is this in response to the commenter’s pointing out that MW got the data wrong?
“One claim I frequently make is that, in terms of what is most important about using statistics to answer scientific questions, data are more important than models and models are more important than specific modes of inference.”
Third, he clearly states that the right way to do this is with teams that include climatologists. But that’s saying that MW is exactly the wrong model for how to make real progress in this area:
“Greater cooperation between the climatological and statistical communities would benefit both disciplines and be invaluable in the broader public discussion of climate change. There have been great strides made in this regard in recent years, which is reflected in the diversity of affiliations of the discussants and the extent to which they demonstrate their understanding of both statistics and climatology.”
Finally, it ends with a strong, unambiguous policy recommendation unsupported by the analysis, in what I’m pretty sure is not actually a policy journal:
“Thus, while research on climate change should continue, now is the time for individuals and governments to act to limit the consequences of greenhouse gas emissions on the Earth’s climate over the next century and well beyond.”
My paraphrase:
“We did our normal due diligence. In hindsight, statisticians shouldn’t undertake this alone. For example, how could we know they’d screwed up the data, until after we’d accepted the paper and gotten the commentaries from climatologists. And in any event, it’s so obvious that CO2 warms the earth that it isn’t worth testing the “sharp null” that it doesn’t. Ignore the thrust of what we just published here (that we can’t say that current temperatures are anomalous), and start restricting GHG emissions now.”
is(de) says
@25:
a very basic concept in dendrochronology is replication of proxy series – the usual conceptual visualisation for this is the “linear aggregate model” of Ed Cook (Cook, E., Kairiūkštis, L. 1990, Methods of dendrochronology: applications in the environmental sciences, pp. 98). It basically says that ringwidth is a function of:
age trend + climate signal + endogenous (local) disturbances + exogenous (standwide) disturbances + unexplained variation
Basically, this concept also underlies all other dendro proxies, be it maximum latewood density or stable isotope ratios, even though there are variations (e.g. the age trend in density data is treated differently from ring width data). Of those factors, the climate signal (and possibly exogenous disturbances) comprises the common signal of all trees in one site, while endogenous disturbances and unexplained variations are supposed to occur randomly. Averaging as many trees as possible will strengthen the common (climate) signal and reduce the noise, and therefore improve the reconstruction quality (without some weighted mean statistics, each of two trees will contribute 1/2 of the variance of a proxy time series – for 8 trees, only 1/8, and so on). Working “hands on” with dendro data, this means that a chronology comprising 8 or 10 trees in one time span will not change significantly, even if you add some other trees. The other way round is more problematic – having only 1 or 2 trees in a chronology means that dating is not completely certain (missing/false rings) and that there is no guarantee that the growth depression you see is not caused by some disturbance instead of temperature. This becomes especially important once you get to the juvenile growth phase of those trees: keep in mind that what was a 1200-year-old mighty tree at sampling in 1980 or whenever was much thinner 1000 years earlier, and therefore much likelier to be influenced by competition, insect outbreaks, avalanches, rock movements, fire or other disturbances during its youth.
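The variance argument is easy to check with a toy simulation (illustrative only):

# Common signal plus independent per-tree noise: the mean chronology
# tracks the signal better as trees are added (noise variance ~ 1/n).
set.seed(11)
years  <- 500
signal <- rnorm(years)
chron  <- function(n_trees) rowMeans(replicate(n_trees, signal + rnorm(years)))
sapply(c(2, 8, 32), function(n) cor(chron(n), signal))  # rises towards 1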
[Response: Thanks for the nice explanation. The biggest issue with small vs large trees is typically the influence of tree size on ring characteristics, but the things you mention can play a role too.–Jim]
Nicolas Nierenberg says
Gavin, Thank you for your response. You say that there is no issue on the centering. I’ll assume you are correct. Did you email McShane to confirm your belief? If you did then it would make it much easier on the reader to include the result of the exchange and I could know that the issue had been settled. As it stands it is just an argument.
[Response: No it’s not. The code is online and anyone can check it for themselves. As I stated, right now I’m not anywhere where I can do any work on this or engage in technical back-and-forths, but when I am, I certainly will be engaging on this further. – gavin]
On a higher level topic. I understand that you don’t agree with the interpretation that these statisticians have on the results and uncertainties in various reconstructions. How about enlisting some other prominent statisticians to do a joint paper on the issues? Cross field collaboration is a very big thing these days and clearly there are issues here that are deeply statistical in nature rather than having to do with expertise in climatology.
[Response: There are already lots of statisticians working on this – Rougier, Nychka, Tingley for instance all submitted commentary on M&W, who are by no means the voice of ‘statisticians’. Their paper was just not done very well, and our addition of the pseudo-proxy analysis from long GCM runs showed that clearly. I’m confident that other people will take this forward, and if I can be useful to that I will be, but that remains to be seen. – gavin]
dhogaza says
Dr Shooshman:
Yeah, you missed the point that some of us don’t want our species to go the way of the dinosaurs for as long as we can put it off.
Maya from the peanut gallery says
To Dr. Shooshmon @30 – the last 1,000 years isn’t significant in the lifetime of the planet, but it’s extremely significant in the lifetime of civilization.
As I understand it, not just the change in CO2 and temperature is a problem, but the rate of change. It’s astronomical, in geological time.
CM says
#30 said: “the earth has sustained higher co2 levels…Did history begin when I was born?”
Dude, I didn’t see you around 15 million years ago.
Steve Metzler says
30 Dr. Shooshmon, phd. says:
This is about the third time just this week that I’ve seen this ‘CO2 doubling = 780ppm’ meme appearing in an AGW comments thread. As Jim already pointed out, the (first) doubling that climatologists are concerned about is the one from the *pre-industrial* concentration of 280ppm to 560ppm. NOT from the 390ppm we have today. So of course, we will reach 560ppm, with all the resultant consequences thereof, *much quicker* than 780ppm.
What happens is these memes spring up on contrarian sites and get repeated without question, and then they are impossible to kill off without a lengthy explanation. *Every* place they crop up. Whack-A-Mole indeed.
SecularAnimist says
Berliner’s last paragraph: “The problem of anthropogenic climate change cannot be settled by a purely statistical argument …”
I’m sorry, but what exactly is the “problem” that needs to be “settled”?
[Response: The epistemological problem of how much statistics alone, when not tightly integrated with physical knowledge, can tell you. Sort of an age-old question.–Jim]
We know what the problem is and we know what needs to be done to solve it. We have known both of those things for years.
And we know that we are still not doing what needs to be done to solve it. Not even close.
gavin says
Since this is likely to come up anyway, there is another conspiracy-laden post at CA with regard to the peer review of a new publication by McKitrick and Nierenberg. This had its genesis in a ‘comment’ on my 2009 paper on spurious correlations in work by McKitrick and Michaels and, separately, de Laat and Maurellis. Instead of submitting a comment on my paper, M&N submitted a ‘new’ paper that was in effect simply a comment, and specifically asked that I not be a reviewer. I was therefore not chosen as a reviewer (and I have no idea who the reviewers were). Nonetheless, since the submission was so highly related to my paper, and used some of the data I had uploaded as part of my earlier paper, the editor of IJOC asked me to prepare a counter-point to their submission. I did so, and in so doing pointed out a number of problems in the M&N paper (comparing the ensemble mean of the GCM simulations with a single realisation from the real world, ignoring the fact that the single GCM realisations showed very similar levels of ‘contamination’, misunderstandings of the relationships between model versions, continued use of a flawed experimental design, etc.). I had no further connection to the review process and at no time did I communicate directly with the reviewers.
The counter-point I submitted was fair and to the point (though critical), and in no way constituted any kind of improper influence. Editors make decisions about who reviews what paper – not authors, and they make the decisions about what gets accepted or not, not reviewers. Authors who seek to escape knowledgeable scrutiny of their work often come up with lists of people who they claim are unable to give a fair review, and editors need to use their discretion in assessing whether this is a genuine issue, or simply an attempted end run around the review process.
I have not yet seen the ‘new’ M&N paper, but it is very likely to be more of the same attempts to rescue a flawed analysis. It should be noted that the main objection to my 2009 paper was that I didn’t show that the residuals from McKitrick’s regression contained auto-correlation. This could have been usefully added (and can be seen here), and in any case was admitted by McKitrick in yet another flawed paper on the topic earlier this year. The overwhelming reason why McKitrick is wrong, though, is that he is using an incorrect null hypothesis to judge the significance of his results. A much more relevant null is whether the real data exhibit patterns related to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.
Thomas Lee Elifritz says
The short answer is that there is now an advanced civilization in which many millions of people are dependent on an agricultural system that was designed and implemented to work within a relatively stable climate regime, especially that of the last couple hundred years.
You meant ‘billions’, right?
[Response: The larger point is that many of us could stand to have a better awareness of how much of societal/global security is dependent on a climate stability that we mostly, like so many things, take for granted.–Jim]
HAS says
Gavin @ 8.15PM 15 Dec
It would help clear the air if you could share the counter-point you submitted. Is that possible?
[Response: I’ll think about it. – gavin]
James Killen says
#40
Not really, it’s “many millions,” but only “a handful of billions” (unless you are using UK billions in which case you have only “a tiny fraction of a billion”).
Whatever, the point is that, to the best of our current knowledge, many millions of people will die and otherwise suffer as a result of our not taking the actions we should have back in 1990.
Gilles says
“The only support for ‘Lasso’ comes from McIntyre and McKitrick who curiously claim that the main criteria in choosing a method should be how long it has been used in other contexts, regardless of how poorly it performs in practice for a specific new application. A very odd criteria indeed, which if followed would lead to the complete cessation of any innovation in statistical approaches.”
Gavin, Mike, this assertion sounds reasonable, but does it imply that the reliability of your results depends entirely on the novelty of the method you introduced? This may be a questionable point on which to rest their solidity.
[Response: No, as any elementary class in logic would have taught you. – gavin]
steven mosher says
The overwhelming reason why McKitrick is wrong though is because he is using an incorrect null hypothesis to judge the significance of his results. A much more relevant null is whether the real data exhibit patterns to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.
#####
That’s a good point Gavin. With his code in hand could you do that?
[Response: Yes, and I have for each iteration I have looked at. And McKitrick’s conclusions fail to hold up every time. There is a limit to how many times I’m going to do it again. Anyone else could do so themselves using the archived data for Schmidt (2009) or data from IPCC AR4. Note that this requires looking at individual runs, not ensemble means. – gavin]
ICE says
@gavin #39
I particularly liked McKitrick’s complaint about not being “given a chance to reply” to the “inane” reviews, and about the editor “refusing” to “reconsider their paper” after it was rejected…
Stephen says
Gavin,
are you saying that your approach would be to mine through individual runs of individual models to find something that matches? That sounds odd. Are you not bound to find examples of matching patterns in very noisy data if you look hard enough? Sorry if I’m misunderstanding.
[Response: Huh? Where did I say that? No, I am saying that the patterns of temperature trends are spatially complex due to a whole host of reasons – internal variability of the climate system, regionally specific forcings, local and micro-climate issues etc. Deciding that a spatial pattern that is correlated to a ‘socio-economic’ variable is causative requires an understanding of what the distribution of that pattern is under a null hypothesis of no ‘contamination’. GCMs can produce such a distribution (albeit imperfectly), and so should be used for the null. In Schmidt (2009), I used the 5 runs I had easily available to demonstrate that the significance test that McKitrick had used vastly overstated the importance of his correlations. I speculated that this was due to him not appreciating the spatial auto-correlation structure of the variables and over-estimating the degrees of freedom. This was true (as he has now admitted in McKitrick (2010) and the new paper). He claims that this can be corrected for, but he still isn’t using the proper null – in M&N they show the results from the ensemble means (of the GISS model and the full AR4 model set), but seem to be completely ignorant of the fact that ensemble mean results remove the spatial variations associated with internal variability – which are exactly what you need to include! Now, I haven’t done a full analysis of the AR4 individual runs to do this properly, and I am not motivated to do so, but if McKitrick was serious, he would have done it already, and of course he still could. A toy illustration of the degrees-of-freedom point is sketched below. – gavin]
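Two series that are statistically independent, but smoothed (auto-correlated), will pass a naive correlation significance test far more often than the nominal rate – which is why a proper null distribution is needed. A toy simulation of just that point (not the Schmidt (2009) analysis):

# Naive significance tests overstate correlations between autocorrelated
# fields: two *independent* smoothed series come out "significant at 5%"
# a large fraction of the time.
set.seed(9)
smooth_series <- function(n, k = 20) {
  x <- rnorm(n + k)
  as.numeric(stats::filter(x, rep(1/k, k), sides = 1))[k:(n + k - 1)]
}
pvals <- replicate(1000, cor.test(smooth_series(300), smooth_series(300))$p.value)
mean(pvals < 0.05)   # far above 0.05, despite zero true correlation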
Philip Machanick says
Gavin, at least they acknowledge you are a “popular” blogger :)
In their rejoinder MW claim they didn’t agree with reducing the data set to 59 as follows: “the application of ad hoc methods to screen and exclude data increases model uncertainty in ways that are unmeasurable and uncorrectable.” I would have thought that excluding data that fail to meet quality criteria is reasonable. The method for excluding data appears to me quite clearly described in the supplement to Mann et al. though the reason for requiring specific features could have been explained (e.g. exactly why “there must be at least eight samples during the screened period 1800–1960 and for every year used.”)
Yvan Dutil says
There have been some reports that the area weighting might not have been done correctly. This would have overweighted the arctic region and, in consequence, the slope of the baseline. Could anyone who has actually read the code comment on it?
[Response: No idea what you are referring to. Can you be more precise? – gavin]
J Bowers says
Re. Gavin 39: Thanks for that response. Useful already.
Dan H. says
Regarding the relatively stable climate regime of the last couple hundred years, has anyone asked whether that was the anomaly? Is it not possible that the recent agricultural boom was the result of a preferential climate? History tells us of mass chaos caused by massive downfalls in agricultural yields, occurring numerous times. Granted, technology has contributed, but how many millions of deaths have been averted because of the abundance of food?
I know some will point to various droughts and floods that ruined crops, which happen frequently. However, recent crop failures pale in comparison to those that have occurred throughout history. It may just be possible that whatever direction we move from here – hotter, colder, wetter, or dryer – agriculture will suffer.