At Jim Hansen’s now famous congressional testimony given in the hot summer of 1988, he showed GISS model projections of continued global warming assuming further increases in human produced greenhouse gases. This was one of the earliest transient climate model experiments and so rightly gets a fair bit of attention when the reliability of model projections is discussed. There have however been an awful lot of mis-statements over the years – some based on pure dishonesty, some based on simple confusion. Hansen himself (who is, for full disclosure, my boss) revisited those simulations in a paper last year, where he showed a rather impressive match between the recently observed data and the model projections. But how impressive is this really? And what can be concluded from the subsequent years of observations?
In the original 1988 paper, three different scenarios were used: A, B, and C. They consisted of hypothesised future concentrations of the main greenhouse gases – CO2, CH4, CFCs etc. together with a few scattered volcanic eruptions. The details varied for each scenario, but the net effect of all the changes was that Scenario A assumed exponential growth in forcings, Scenario B was roughly a linear increase in forcings, and Scenario C was similar to B, but had close to constant forcings from 2000 onwards. Scenarios B and C had an ‘El Chichon’ sized volcanic eruption in 1995. Essentially, a high, middle and low estimate were chosen to bracket the set of possibilities. Hansen specifically stated that he thought the middle scenario (B) the “most plausible”.
These experiments were started from a control run with 1959 conditions and used observed greenhouse gas forcings up until 1984, and projections subsequently (NB. Scenario A had a slightly larger ‘observed’ forcing change to account for a small uncertainty in the minor CFCs). It should also be noted that these experiments were single realisations. Nowadays we would use an ensemble of runs with slightly perturbed initial conditions (usually a different ocean state) in order to average over ‘weather noise’ and extract the ‘forced’ signal. In the absence of an ensemble, this forced signal will be clearest in the long term trend.
How can we tell how successful the projections were?
Firstly, since the projected forcings started in 1984, that should be the starting year for any analysis, giving us just over two decades of comparison with the real world. The delay between the projections and the publication is a reflection of the time needed to gather the necessary data, churn through the model experiments and get results ready for publication. If the analysis uses earlier data, i.e. from 1959, it will be affected by the ‘cold start’ problem – i.e. the model is starting with a radiative balance that the real world was not in. After a decade or so that is less important. Secondly, we need to address two questions – how accurate were the scenarios and how accurate were the modelled impacts.
So which forcing scenario came closest to the real world? Given that we’re mainly looking at the global mean surface temperature anomaly, the most appropriate comparison is for the net forcings for each scenario. This can be compared with the net forcings that we currently use in our 20th Century simulations based on the best estimates and observations of what actually happened (through to 2003). There is a minor technical detail which has to do with the ‘efficacies’ of various forcings – our current forcing estimates are weighted by the efficacies calculated in the GCM and reported here. These weight CH4, N2O and CFCs a little higher (factors of 1.1, 1.04 and 1.32, respectively) than the raw IPCC (2001) estimate would give.
The results are shown in the figure. I have deliberately not included the volcanic forcing in either the observed or projected values since that is a random element – scenarios B and C didn’t do badly since Pinatubo went off in 1991, rather than the assumed 1995 – but getting volcanic eruptions right is not the main point here. I show three variations of the ‘observed’ forcings – the first which includes all the forcings (except volcanic) i.e. including solar, aerosol effects, ozone and the like, many aspects of which were not as clearly understood in 1984. For comparison, I also show the forcings without solar effects (to demonstrate the relatively unimportant role solar plays on these timescales), and one which just includes the forcing from the well-mixed greenhouse gases. The last is probably the best one to compare to the scenarios, since they only consisted of projections of the WM-GHGs. All of the forcing data has been offset to have a 1984 start point.
Regardless of which variation one chooses, the scenario closest to the observations is clearly Scenario B. The difference in scenario B compared to any of the variations is around 0.1 W/m2 – around a 10% overestimate (compared to > 50% overestimate for scenario A, and a > 25% underestimate for scenario C). The overestimate in B compared to the best estimate of the total forcings is more like 5%. Given the uncertainties in the observed forcings, this is about as good as can be reasonably expected. As an aside, the match without including the efficacy factors is even better.
What about the modelled impacts?
Most of the focus has been on the global mean temperature trend in the models and observations (it would certainly be worthwhile to look at some more subtle metrics – rainfall, latitudinal temperature gradients, Hadley circulation etc. but that’s beyond the scope of this post). However, there are a number of subtleties here as well. Firstly, what is the best estimate of the global mean surface air temperature anomaly? GISS produces two estimates – the met station index (which does not cover a lot of the oceans), and a land-ocean index (which uses satellite ocean temperature changes in addition to the met stations). The former is likely to overestimate the true global surface air temperature trend (since the oceans do not warm as fast as the land), while the latter may underestimate the true trend, since the air temperature over the ocean is predicted to rise at a slightly higher rate than the ocean temperature. In Hansen’s 2006 paper, he uses both and suggests the true answer lies in between. For our purposes, you will see it doesn’t matter much.
As mentioned above, with a single realisation, there is going to be an amount of weather noise that has nothing to do with the forcings. In these simulations, this noise component has a standard deviation of around 0.1 deg C in the annual mean. That is, if the models had been run using a slightly different initial condition so that the weather was different, the difference in the two runs’ mean temperature in any one year would have a standard deviation of about 0.14 deg C (0.1 deg C times the square root of 2, since the noise in the two runs is independent), but the long term trends would be similar. Thus, comparing specific years is very prone to differences due to the noise, while looking at the trends is more robust.
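The step from 0.1 to 0.14 deg C can be checked with a short sketch. It assumes the noise in each run is independent and Gaussian, which is an illustrative simplification (real model noise may be correlated from year to year); the variable names and the random seed are arbitrary.

```python
import math
import random
import statistics

SIGMA_SINGLE = 0.1  # deg C, annual-mean weather noise quoted in the post

# Analytic result: variances of two independent runs add, so the standard
# deviation of their difference is sqrt(2) times that of a single run.
sigma_diff = math.sqrt(2) * SIGMA_SINGLE

# Monte Carlo check using synthetic Gaussian noise (an assumption, not
# model output): difference of two independent draws, repeated many times.
rng = random.Random(0)
diffs = [
    rng.gauss(0, SIGMA_SINGLE) - rng.gauss(0, SIGMA_SINGLE)
    for _ in range(100_000)
]
sigma_mc = statistics.pstdev(diffs)

print(f"analytic:  {sigma_diff:.3f} deg C")
print(f"simulated: {sigma_mc:.3f} deg C")
```

Both numbers should land close to 0.14 deg C, which is why a single-year comparison between one model run and the observations says little on its own.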
From 1984 to 2006, the trends in the two observational datasets are 0.24 +/- 0.07 and 0.21 +/- 0.06 deg C/decade, where the error bars (2σ) are derived from the linear fit. The ‘true’ error bars should be slightly larger given the uncertainty in the annual estimates themselves. For the model simulations, the trends are for Scenario A: 0.39 +/- 0.05 deg C/decade, Scenario B: 0.24 +/- 0.06 deg C/decade and Scenario C: 0.24 +/- 0.05 deg C/decade.
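For readers who want to see how a trend and its error bar come out of a linear fit, here is a minimal sketch using ordinary least squares. It is not the calculation behind the numbers above: the series is synthetic (an assumed 0.24 deg C/decade trend plus Gaussian noise of 0.1 deg C), the function name `linear_trend` is made up for this example, and plain least squares ignores any year-to-year correlation in the residuals.

```python
import math
import random


def linear_trend(years, temps):
    """Return (slope, 2 * standard error of slope) from an OLS fit."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares about the fitted line.
    resid_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(years, temps)
    )
    # Standard error of the slope with n - 2 degrees of freedom.
    stderr = math.sqrt(resid_ss / (n - 2) / sxx)
    return slope, 2 * stderr


# Synthetic annual anomalies for 1984-2006 (illustrative only).
rng = random.Random(1)
years = list(range(1984, 2007))
TRUE_TREND = 0.024  # deg C/year, i.e. 0.24 deg C/decade (assumed)
temps = [TRUE_TREND * (y - 1984) + rng.gauss(0, 0.1) for y in years]

slope, two_sigma = linear_trend(years, temps)
print(f"{10 * slope:.2f} +/- {10 * two_sigma:.2f} deg C/decade")
```

With only 23 annual values and noise of this size, the 2σ range on the fitted slope is a substantial fraction of the trend itself, which is the reason the comparison is made on trends with error bars rather than on individual years.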
The bottom line? Scenario B is pretty close and certainly well within the error estimates of the real world changes. And if you factor in the 5 to 10% overestimate of the forcings in a simple way, Scenario B would be right in the middle of the observed trends. It is certainly close enough to provide confidence that the model is capable of matching the global mean temperature rise!
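One simple way to factor in the forcing overestimate is to scale the Scenario B trend down by the same fraction. The sketch below does this with the numbers quoted above; treating the trend as proportional to the forcing is an assumption made here for illustration, and the variable names are arbitrary.

```python
SCENARIO_B_TREND = 0.24          # deg C/decade, from the model run
OBSERVED_TRENDS = (0.24, 0.21)   # deg C/decade, the two observational indices

adjusted_trends = []
for overestimate in (0.05, 0.10):
    # Assume the trend scales linearly with the net forcing.
    adjusted = SCENARIO_B_TREND / (1 + overestimate)
    adjusted_trends.append(adjusted)
    print(f"{overestimate:.0%} overestimate -> {adjusted:.3f} deg C/decade")

low, high = min(OBSERVED_TRENDS), max(OBSERVED_TRENDS)
print(all(low <= t <= high for t in adjusted_trends))
```

Under that assumption both adjusted values fall between the two observed central estimates.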
But can we say that this proves the model is correct? Not quite. Look at the difference between Scenario B and C. Despite the large difference in forcings in the later years, the long term trend over that same period is similar. The implication is that over a short period, the weather noise can mask significant differences in the forced component. This version of the model had a climate sensitivity of around 4 deg C for a doubling of CO2. This is a little higher than what would be our best guess (~3 deg C) based on observations, but is within the standard range (2 to 4.5 deg C). Is this 20 year trend sufficient to determine whether the model sensitivity was too high? No. Given the noise level, a trend 75% as large would still be within the error bars of the observation (i.e. 0.18 +/- 0.05), assuming the transient trend would scale linearly. Maybe with another 10 years of data, this distinction will be possible. However, a model with a very low sensitivity, say 1 deg C, would have fallen well below the observed trends.
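The sensitivity argument can be laid out as a small calculation. This sketch takes the linear-scaling assumption stated above at face value and uses the trends and 2σ ranges already quoted; the helper name `scaled_trend` is made up for this example.

```python
MODEL_SENSITIVITY = 4.0   # deg C per CO2 doubling, the 1988 model
SCENARIO_B_TREND = 0.24   # deg C/decade

# Observed trends as (central value, 2-sigma half-width), deg C/decade.
OBSERVED = [(0.24, 0.07), (0.21, 0.06)]


def scaled_trend(sensitivity):
    """Trend expected if the transient trend scales linearly with sensitivity."""
    return SCENARIO_B_TREND * sensitivity / MODEL_SENSITIVITY


for sensitivity in (3.0, 1.0):
    trend = scaled_trend(sensitivity)
    inside = [c - h <= trend <= c + h for c, h in OBSERVED]
    print(f"sensitivity {sensitivity} deg C: trend {trend:.2f} deg C/decade, "
          f"within observed 2-sigma ranges: {inside}")
```

A sensitivity of 3 deg C gives 0.18 deg C/decade, inside both observed ranges, while 1 deg C gives 0.06 deg C/decade, below the lower end of both.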
Hansen stated that this comparison was not sufficient for a ‘precise assessment’ of the model simulations and he is of course correct. However, that does not imply that no assessment can be made, or that stated errors in the projections (themselves erroneous) of 100 to 400% can’t be challenged. My assessment is that the model results were as consistent with the real world over this period as could possibly be expected and are therefore a useful demonstration of the model’s consistency with the real world. Thus when asked whether any climate model forecasts ahead of time have proven accurate, this comes as close as you get.
A couple of replies:
James (comment 58), as you know, we have discussed this bet previously, see your posting on it.
And I have admitted that we would have lost. The same holds true today. Yes, we cherry-picked the start of the period of the bet – that was the point. And yes, we would have lost despite that, as indeed there will be no statistically significant downward trend in the global average temperature in the lower troposphere as measured by satellites (take your pick of either UAH or RSS datasets) from January 1998 through December 2007 (the period of our proposed bet). In fact, the UAH data from January 1998 through present (April 2007) is (a non-significant) 0.072 ºC/decade and the RSS trend for the same period is (a non-significant) 0.012 ºC/decade (overall, from the start of the record until now, the UAH and RSS global lower tropospheric trends are (a significant) 0.15ºC/dec and 0.18ºC/dec, respectively). Is there anything more that you would like me to add?
Perhaps another wager? How about this – I’ll take the low end of the IPCC range of projected warming and you take the high of the range, and whoever reality proves to be closer to will be the winner? (note I would have won this bet for a period of the past 20 years using Dr. Hansen’s 1988 projections to define the range – as proven by Gavin’s analysis).
My point here is I personally believe, and probably so do a lot of other folks, that the high end of the IPCC temperature (and forcing) range is unrealistic. My question is why include it? Why did Dr. Hansen include his scenario A, when now, Gavin contends that the real “forecast” was scenario B? What is the real “forecast” now?
Eli (Comment 48, 64), do you think the emissions scenario used in Knutson and Tuleya (2004) was reasonable? (hint, they assumed a 1%/yr increase in atmospheric CO2 concentration from present day for 80 years – the current rate (depending on how you define “current”) is somewhere between 0.5 and 0.6%/yr). If they used an “idealized” scenario, then they should have only given an “idealized” conclusion – not one with a date attached to it that is likely to be far sooner than observations suggest. And as far as the “hot Virginia sun” goes, we’ve been overall a bit chillier than normal since about February – kind of unpleasant during winter, but kind of nice, now! :^)
Urs, (comment 55), thanks for the information about global CO2 emissions. I have seen the numbers for 2003 and 2004(?) but nothing since, do you have, or can you point me to, the global CO2 emissions numbers for more recent years? Thanks!
-Chip Knappenberger
to some degree, supported by the fossil fuels industry since 1992
Mr Roberts (re: 83),
In Gavin’s response that you point to, he concludes “I have previously suggested that it would have been better if they had come with likelihoods attached – but they didn’t and there is not much I can do about that.” I take that to mean that he thinks that all scenarios are not equally likely. I agree. I am taking the position that I don’t think the high end scenarios are very likely at all – for instance, SRES A1FI results in a CO2 concentration of somewhere around 970ppm by 2100, SRES A2 produces ~850ppm. In comment 81, Thomas Lee Elifritz thinks the real forecast for CO2 concentration is 383 + 2ppm/yr which produces ~570ppm by 2100 – an amount less than nearly all SRES scenarios! In 1988, Dr. Hansen wrote that his scenario B was “perhaps the most plausible.” So obviously people have opinions as to which scenarios are more likely than others.
Gavin concluded his original post with “Thus when asked whether any climate model forecasts ahead of time have proven accurate, this comes as close as you get.”
So why is everyone being so coy? I am just asking what people think are the most likely scenarios, i.e. which climate model forecasts you think “ahead of time” will prove to be most accurate? My hunch is that not many will give much credence to the high-end IPCC scenarios. Perhaps I will be wrong. You all have opinions, so what are they? And how do they square with the IPCC SRES scenarios?
-Chip Knappenberger
to some degree, supported by the fossil fuels industry since 1992
