In a recent paper, McKitrick and Michaels (2004, or “MM04”) argue that non-climatic factors such as economic activity may contaminate climate station data and thus render invalid any estimates of surface temperature trends derived from these data. They propose that surface temperature trends may be linked to various local economic factors, such as national coal consumption, income per capita, GDP growth rate, literacy rates, and whether or not temperature stations were located within the former Soviet Union. If their conclusions were correct, this would have implications for the reliability of the modern surface temperature record, an important piece of evidence indicating 20th century surface warming. However, numerous flaws in their analysis, some of them absolutely fundamental, render their conclusions invalid.
First of all, there are a number of issues that they did not address but that logically must be addressed for their conclusions to be tenable. MM04 failed to acknowledge other independent data supporting the instrumental thermometer-based land surface temperature observations, such as satellite-derived temperature trend estimates over land areas in the Northern Hemisphere (Intergovernmental Panel on Climate Change, Third Assessment Report, Chapter 2, Box 2.1, p. 106) that cannot conceivably be subject to the non-climatic sources of bias they consider. Furthermore, they failed to reconcile their hypothesis with the established large-scale warming evident from global sea surface temperature data that, again, cannot be influenced by the local, non-climatic factors they argue contaminate evidence for surface warming. By focusing on thermometer-based land observations only, and ignoring other evidence conflicting with their hypothesis, MM04 left basic flaws in their arguments unaddressed.
Perhaps even more troubling, it has been noted elsewhere that MM04 confused “degrees” and “radians” in their calculations of areal weighting factors, rendering all of their calculations incorrect, and their conclusions presumably entirely invalid.
The focus of this piece, however, is on yet another fundamental problem with their analysis, identified by Benestad (2004), who repeated their analysis using a different statistical model (linear and generalised multiple regression models) and the same data set. Benestad (2004) first reproduced the basic results of MM04 (i.e., established similar coefficients for the various factors used by MM04) using the full data set, which established an appropriate baseline for further tests of the robustness of their statistical model. As described below, their statistical model failed these tests dramatically.
For one thing, the statistical significance they cited for their results was vastly overstated. One of the most basic assumptions in statistical modeling is that the data used as predictors in the model are Independent and Identically Distributed (‘IID’). It is well known, however, that temperatures from neighboring stations are not independent. Due to the large-scale structure of surface temperature variations, nearby measurements partly describe the same phenomenon. Any statistical analysis using such temperature data must account for the fact that the actual degrees of freedom in the data are far lower than the nominal number of stations (see e.g. Wilks, 1995). McKitrick and Michaels, however, failed to account for this issue in estimating the statistical significance of their results. Had they accounted for this “spatial correlation”, as Benestad (2004) points out, they would have found their results to be statistically insignificant.
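To see how much this matters, consider the following minimal Python sketch. It is purely illustrative: the numbers are invented, it uses a simple correlation rather than MM04’s full multiple regression, and the station counts are hypothetical. Twenty independent ‘regional’ values are each copied across ten nearby ‘stations’, and the very same correlation is then judged once against the nominal number of stations and once against the effective number of independent values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical setup (not the MM04 data): n_regions independent regional
# temperature signals, each reported by n_copies nearby stations that
# essentially duplicate one another, plus an unrelated "economic" predictor.
n_regions, n_copies = 20, 10
temperature = np.repeat(rng.normal(size=n_regions), n_copies)  # 200 "stations"
predictor = np.repeat(rng.normal(size=n_regions), n_copies)

# Naive test: every station treated as independent (df = 200 - 2).
r, p_naive = stats.pearsonr(predictor, temperature)

# Adjusted test: same correlation, but judged against the effective
# sample size (df = 20 - 2).
t_eff = r * np.sqrt((n_regions - 2) / (1 - r**2))
p_adjusted = 2 * stats.t.sf(abs(t_eff), df=n_regions - 2)

print(f"r = {r:.2f}, naive p = {p_naive:.4f}, adjusted p = {p_adjusted:.4f}")
```

For the same correlation coefficient, the naive test always yields a smaller p-value than the adjusted one, and typically a dramatically smaller one, simply because it credits the analysis with ten times more independent information than actually exists. That, in essence, is the inflation of significance at issue here.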
Benestad (2004) then tested the skill of the model through a ‘validation’ experiment. Such an experiment seeks to construct a statistical model using part of the dataset, and then independently test the model’s validity by seeing how well it predicts the rest of the data that weren’t used. Benestad (2004) thus divided the data into two independent batches. Temperature station data between 75.5S and 35.2N were used to calibrate the statistical model, while the remaining data (stations north of 35.2N, representing less than 25% of the earth’s surface) were used for validation of the model. It is clear that the model was not able to reproduce the trends in the independent data (see Figure 1). The conclusion of McKitrick and Michaels, that surface temperature measurements are significantly influenced by the non-climatic factors used in their statistical model, hence appears to be false.
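For readers unfamiliar with the procedure, here is a minimal sketch in Python of what such a split-sample test looks like. This is not Benestad’s code: the station values below are synthetic, and the only detail carried over from the text is the idea of calibrating an ordinary least-squares regression on the stations south of 35.2N and checking its predictions against the withheld northern stations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a station table (hypothetical values only):
# latitude, a few "economic" predictors, and an observed temperature trend.
n_stations = 400
lat = rng.uniform(-75.5, 80.0, n_stations)
econ = rng.normal(size=(n_stations, 3))            # e.g. GDP growth, literacy, coal use
trend = 0.15 + 0.05 * rng.normal(size=n_stations)  # trends unrelated to the predictors

# Split as described in the text: calibrate south of 35.2N, validate north of it.
calib = lat <= 35.2
valid = ~calib

# Fit ordinary least squares on the calibration half.
X_calib = np.column_stack([np.ones(calib.sum()), econ[calib]])
beta, *_ = np.linalg.lstsq(X_calib, trend[calib], rcond=None)

# Predict the withheld northern stations and compare with the "observed" trends.
X_valid = np.column_stack([np.ones(valid.sum()), econ[valid]])
predicted = X_valid @ beta
rmse = np.sqrt(np.mean((predicted - trend[valid]) ** 2))
baseline = np.sqrt(np.mean((trend[calib].mean() - trend[valid]) ** 2))

print(f"validation RMSE: {rmse:.3f} deg C/decade (constant baseline: {baseline:.3f})")
```

A model that has captured a real relationship should predict the withheld trends clearly better than a trivial constant baseline; as Figure 1 shows, the MM04-style model did not pass this kind of test.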
In their reply to Benestad (2004), McKitrick and Michaels (2004b, or “MM04b”) argue that such validation experiments (i.e., splitting up the data to test the validity of statistical modelling) are not common in the refereed climatological literature. That argument is puzzling indeed, as such tests are standard in statistical modeling exercises and have been used and documented in many peer-reviewed articles in the meteorological and climatological literature (see this list of publications by just one researcher alone, or even the introductory textbook by Wilks, 1995).
MM04b also complain that in Benestad (2004) the statistical model was calibrated with the ‘worst’ data (and that the ‘better’ data, covering less than 25% of the earth’s surface, should have been used instead). This too is puzzling, since any hypothesised deterioration of data quality should in principle, as we understand the very premise of their hypothesis, be taken into account in the statistical model through the use of factors such as literacy or GDP.
In their reply to Benestad (2004), McKitrick and Michaels (2004b) claim that I do not dispute their approach (i.e., multivariate regression using economic variables as potential predictors of surface temperature). That claim is peculiar, and it misses the point. A method is only valid when applied correctly, and, as described above, MM04 failed egregiously in this regard. The purpose of my paper was simply to demonstrate that, whether or not one accepts the merits of their approach, a correct and more careful repetition of their analysis alone is sufficient to falsify their results and their conclusions.
The conclusions of McKitrick and Michaels (2004) thus clearly do not stand up to independent scrutiny. That alone does not mean their analysis could not have been a useful contribution to the field: a critical analysis of past work by other researchers can provide independent quality control on scientific undertakings, provided the analysis is performed properly. Unfortunately, that does not appear to have been the case for the McKitrick and Michaels (2004) analysis.
FIGURE 1. Results of regression analyses with different models using half the data for calibration and half for prediction. The blue dots represent the calibration interval; the red, green and black symbols (circles, crosses and triangles) show the values predicted for the independent data using different model configurations (red corresponds to McKitrick and Michaels’ analysis). The grey dots are the actual trend data that the model tries to predict; if the model were valid, the predicted values would match them. The y-axis is in deg C per decade. After Benestad (2004).
References:
Benestad, R.E. (2004). Are temperature trends affected by economic activity? Comment on McKitrick & Michaels. Climate Research, 27: 171–173.
McKitrick, R., and Michaels, P.J. (2004). A test of corrections for extraneous signals in gridded surface temperature data. Climate Research, 26: 159–173.
McKitrick, R., and Michaels, P.J. (2004b). Are temperature trends affected by economic activity? Reply to Benestad (2004). Climate Research, 27: 175–176.
Wilks, D. S. (1995). Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, New York, 467 pp.
William says
If this is the same paper, then http://cgi.cse.unsw.edu.au/~lambert/cgi-bin/blog/2004/08#mckitrick6 is relevant: the original M&M paper got their latitudes wrong hence all their results need revision…
[Response:Point well taken! Please see the revised version of 12/14/04.]
Steve Snyder says
Excellent new site, and much needed.
Michael Tobis says
Let me be frank. While I’m hopeful about the realclimate.org effort I’m not especially optimistic that it will work.
[Response: Thanks. We need just this kind of constructive criticism at this early stage of the site’s development ;) ]
This article reinforces my fears. For instance, while this article has text that appears to critique some of McKitrick’s work, the following quote appears to corroborate it:
“Benestad (2004) repeated their analysis using a different statistical model (linear and generalised multiple regression model), and found coefficients of similar magnitude for the factors McKitrick and Michaels proposed as important.”
I am guessing there is some qualifier missing above. As it stands, I find it hard to place this in the semantic context of the article. This tends to indicate that the article was assembled with insufficient care.
To make matters worse, we have this:
“In their reply to my comment, McKitrick and Michaels (2004b) note that I do not dispute their choice of data or their methodology. While that is true, they should not confuse this with my vouching for their approach. I have not commented on their choice of methodology or data simply because that is beside the point.”
This is perfectly coherent in a logical way, but not everyone reads with precision. Many people, reading casually as web users are wont to do, will see “I do not dispute” to mean “I agree”. The following sentences will then be lost in the shuffle in ways that they would not be in a purely academic environment. It would be better to say something like:
“McKitrick and Michaels (2004b) misleadingly claim that I do not dispute their choice of methodology. A methodology is only valid when correctly applied, and this M&M 2004 have egregiously failed to do.” (Or something like that, presuming that I take the meaning correctly.)
[Response: Again, thanks for your constructive criticism. We have revised the posting significantly to address, among other things, the various issues you’ve raised. Please see our revised version as of 12-14-04.]
Worst of all, even though I work in the climate field at a major and well-endowed university, I have no access to the journal in question, and am therefore unable to make much of the argument without going to considerable effort. Consider how much more difficult this will be for the general readership.
[Response: Well, this we can’t do much about. However, we have linked to a pdf version of the Benestad (2004) article (we don’t have access to the other articles cited).]
Something like this article may belong in the journal in question, but the purpose of this site is outreach to the general public, both directly and through journalists. This article does not succeed in this effort.
I appreciate that the effort is finally being made (and for what it’s worth I’m willing to help), but one should have no illusions that the task is easy. I hope that in future even the primary contributors will not simply allow but demand some editorial input to ensure that their contributions are appropriate for a general audience. More articles like the current one will reduce realclimate.org to just another professional mailing list with negligible impact on public discourse.
[Response: Again, we hope you find the revised version of our posting an improvement over the original. We would certainly like to encourage you to consider contributing guest postings if you feel so inclined. And, again, we appreciate very much your helpful feedback on the current content of the site.]
Jim Dukelow says
It is interesting that McKitrick and Michaels would not account for the effect of correlations between nearby stations on the size of confidence intervals and on statistical significance, since Ross McKitrick lectured me (quite correctly) several years ago on the similar effect of autoregressivity in temperature time series.
Jim Dukelow
James Acres says
I hate to be blunt and mean: you guys are wasting your time. No one who doesn’t agree with you is going to fight through your essays as currently written. You are fighting the good fight, but you’ve no chance of winning.
I used the readability tool in MS Word to check your entry “Are Temperature Trends Affected by Economic Activity.” Your piece is less readable than journal articles in Harvard Law Review. Your piece is slightly easier to read than the average insurance policy.
——
Clarity and efficiency in writing are important on the web and in email, even more so than on the printed page. Computer screens cause substantially more eye fatigue than printed pages do, so people scan to compensate.
For clarity of writing on the web I’ve found the work of Dr. Rudolf Flesch helpful. He developed a formula for readability back in the 1940s. I’ve begun using it for my own web writing and email and have noticed real improvement in audience comprehension.
A quick review of his work is here:
http://pages.stern.nyu.edu/~wstarbuc/Writing/Flesch.htm
Newer versions of MS Word include a readability-checker with the other grammar tools.
——
Suggestions:
1) Revise your pieces using some sort of readability tool in your word processor. While imperfect, it will give an objective idea of how readable your pieces are.
2) Get a grad student from the Lit or Writing department to write for you. (Personally, I wish all researchers would do this with all their publications! :-))
—-
Apologies for being so brutal. What you guys are doing is great! Research is of much greater utility when it’s more accessible. I just worry you’ll get discouraged by the lack of success.
James Acres
rasmus says
[Response: (to James Acres)
I understand your comment, and yes, this piece is very technical indeed. It also discusses several points, which makes it even more complex. I felt it was important to include the technicalities in order to convince. By popularising the piece, the article loses its edge. I guess there is a similar reason why articles in Harvard Law Review are not popularised either. Maybe I can write a more understandable ‘translation’ for the lay person? Or, to recap in one sentence: the analysis by MM04 can be thought of as analogous to conducting a survey in which 10 individuals are asked the same question 100 times, and the results are then presented as a statistical sample from 1000 independent polls. -Rasmus]
Michael French says
I find some of the work on this site to be a challenge to read, and it takes some work to understand the technical references; however, the information and the broadening of my understanding through this effort make it well worth the investment. The time and thought that you have put into this effort are greatly appreciated. Thank you, and please keep it up!!!
CharlieT says
I notice that Maurellis has put an animation of their Industrial Activity index (CO2 emission) vs warming, under fig 4 at http://www.sron.nl/www/code/eos/atmos/h2o/h2o.php?1=1&menuID=1120
-see William’s comment in the UHI discussion
(Correct!) Link to the base paper:
http://www.sron.nl/~josl/Documents/2003GL019024.pdf
[Response: Thank you for bringing the paper by De Laat and Maurellis (DLM04) and the URL to my attention. After having read their paper, I must admit that I’m left with a number of unanswered questions. DLM04 argue that there is a correlation between industrial activity (local CO2 emission) and temperature trends, which may therefore seem to support MM04. I noticed, however, that their Fig. 2 shows a systematic difference between the trends in the global climate models (GCMs) corresponding to the regions where the CO2 emissions are higher than the given threshold value and those where they are lower (henceforth referred to as the ‘above’ and ‘below’ curves). I presume that the CO2 emission data are the same as for the real world. The GCMs from IPCC (2001) do not, unless I’m very mistaken, account for the urban heat island effect. Hence, I see their results as supporting the contention by Benestad (2004) that the overlap between economic activity and temperature trends was coincidental and misleading due to high spatial correlation.
I also find it hard to fit the conclusion of DLM04 into the broader picture: the SST trends are still positive and are not affected by local industrial effects, and there was a strong temperature trend in Russia even after the collapse of the Soviet Union, which also affected its industry. If DLM04 were correct, should not the temperature trends there have dropped after 1990? On the contrary; indeed, MM04 themselves proposed that the strong trends there were due to a deterioration in the quality of the stations.
One would expect an urban heat island to produce a local warming, but it is difficult to see physically how industrial activity can lead to an ongoing warming trend unless that activity grows quite dramatically. Why should there be an ongoing trend if the level of heat spillage is stable?
I also have some misgivings about the analysis and the figures in the DLM04 paper:
(a) I find it hard to reconcile the ‘below’ and ‘above’ curves: why doesn’t the gap between the two increase with the level of CO2? Fig. 1a in DLM04 shows how the mean surface temperature trends (y-axis) vary with different threshold values for the CO2 emissions (x-axis), and even over several orders of magnitude (0.01 … 30 GT/year), the trends only change by ~0.1 according to their figure. To me, it seems the trends are not all that sensitive to changes in the CO2 threshold after all!
(b) Why do the lower-tropospheric MSU trends (Fig. 1b) increase so much more rapidly with threshold values above 2 GT/year than the surface trends (Fig. 1a)? It is almost tempting to conclude that, according to the observations, the enhanced greenhouse effect is (locally) important after all (i.e. that the lower troposphere warms more rapidly than the surface)!
(c) The trends in the temperatures from the GCMs (Fig. 2 in DLM04) are strangely insensitive to the CO2-threshold (x-axis). Presumably the way the analysis is done, the CO2-thresholds are represented by different regions for which the industrial activity differs. The GCMs tend to produce temperature trends that vary geographically, and I would expect to see at least some changes in the trend estimates when the fractional surface area becomes small. I find it a bit suspicious that all of the GCMs produce a constant trend level for all different intervals and yet there are clear differences between the ‘above’ and ‘below’ curves.
Finally, I note that DLM04 observe that “Bengston et al [1999] have shown that model-predicted surface temperature trends are much larger (by about a factor of two) than what has been observed over the last two decades” but do not discuss the fact that the observed and simulated surface trends in Fig. 1a (0.19 K/decade) and Fig. 2 (NCAR-DOE-PCM: ~0.2 K/decade) seem to agree very well (and even the surface trend from ECHAM-OPYC3 suggests ~0.3 K/decade). Furthermore, DLM04 argue that there are important differences between the real world and the GCMs in terms of how the trends change with the x-axis, but do not mention the similarities: both the GCMs and the real-world data indicate higher trends for the ‘above’ curves. -rasmus]