It is a truism that all models are wrong. Just as no map can capture the real landscape and no portrait the true self, numerical models necessarily approximate the complexity of the real world and so can never be perfect replications of reality. Similarly, any specific observations are only partial reflections of what is actually happening and have multiple sources of error. It is therefore to be expected that there will be discrepancies between models and observations. However, why these arise and what one should conclude from them are more interesting and subtle questions than most people realise. Indeed, such discrepancies are the classic way we learn something new – and it often isn’t what people first thought of.
The first thing to note is that any climate model-observation mismatch can have multiple (non-exclusive) causes which (simply put) are:
- The observations are in error
- The models are in error
- The comparison is flawed
In climate science there have been multiple examples of each possibility and multiple ways in which each set of errors has arisen, and so we’ll take them in turn.
1. Observational Error
These errors can be straight-up mistakes in transcription, instrument failure, or data corruption etc., but these are generally easy to spot and so I won’t dwell on this class of error. More subtly, most of the “observations” that we compare climate models to are actually syntheses of large amounts of raw observations. These data products are not just a function of the raw observations, but also of the assumptions and the “model” (usually statistical) that go into building the synthesis. These assumptions can relate to space or time interpolation, corrections for non-climate related factors, or inversions of the raw data to get the relevant climate variable. Examples of these kinds of errors being responsible for a climate model/observation discrepancy range from the omission of orbital decay effects in producing the UAH MSU data sets, to the no-modern-analog problem in the CLIMAP reconstruction of ice age ocean temperatures.
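As a purely illustrative sketch (with made-up “station” data, not any real product), the following shows how a single processing assumption – here the interpolation radius used to grid sparse observations – changes the “global” mean of the resulting synthesised product:

```python
# Toy illustration (not a real dataset): the same sparse "station" anomalies
# give different gridded products depending on the interpolation assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Fake station data: longitudes, latitudes and temperature anomalies (K)
n_stations = 200
lons = rng.uniform(-180, 180, n_stations)
lats = rng.uniform(-60, 80, n_stations)          # deliberately sparse at high latitudes
anoms = 0.5 + 0.01 * lats + rng.normal(0, 0.3, n_stations)

# Target 5x5 degree grid
glon, glat = np.meshgrid(np.arange(-177.5, 180, 5), np.arange(-87.5, 90, 5))

def gridded_mean(radius_km):
    """Inverse-distance weighted interpolation within a cutoff radius,
    then an area-weighted global mean. Grid boxes with no nearby station
    are left empty rather than filled."""
    field = np.full(glon.shape, np.nan)
    for i in range(glon.shape[0]):
        for j in range(glon.shape[1]):
            dx = np.radians(lons - glon[i, j]) * 6371 * np.cos(np.radians(glat[i, j]))
            dy = np.radians(lats - glat[i, j]) * 6371
            d = np.hypot(dx, dy)
            near = d < radius_km
            if near.any():
                w = 1.0 / np.maximum(d[near], 1.0)
                field[i, j] = np.sum(w * anoms[near]) / np.sum(w)
    area_w = np.cos(np.radians(glat))
    valid = ~np.isnan(field)
    return np.sum(field[valid] * area_w[valid]) / np.sum(area_w[valid])

for r in (500, 1500, 3000):
    print(f"interpolation radius {r:5d} km -> 'global' mean {gridded_mean(r):+.3f} K")
```

The point is not the particular numbers (they are synthetic), but that the “observation” being compared to a model already embeds choices like this one.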
In other fields, these kinds of issues arise in unacknowledged laboratory effects or instrument calibration errors. Examples abound, most recently for instance, the supposed ‘observation’ of ‘faster-than-light’ neutrinos.
2. Model Error
There are of course many model errors. These include the inability to resolve sub-grid features of the topography, approximations made for computational efficiency, the necessarily incomplete physical scope of the models, and inevitable coding bugs. Sometimes model-observation discrepancies can be easily traced to such issues. However, more often, model output is a function of multiple aspects of a simulation, and so even if the model is undoubtedly biased (a good example is the persistent ‘double ITCZ’ bias in simulations of tropical rainfall) it can be hard to associate this with a specific conceptual or coding error. The most useful comparisons are then those that allow for the most direct assessment of the cause of any discrepancy. “Process-based” diagnostics – where comparisons are made for specific processes, rather than specific fields – are becoming very useful in this respect.
When a comparison is being made in a specific experiment though, there are a few additional considerations. Any particular simulation (and hence any diagnostic from it) arises from a collection of assumptions – in the model physics itself, in the forcings of the simulation (such as the history of aerosols in a 20th Century experiment), and in the initial conditions used in the simulation. Each potential source of the mismatch needs to be independently examined.
3. Flawed Comparisons
Even with a near-perfect model and accurate observations, model-observation comparisons can show big discrepancies because the diagnostics being compared, while similar in both cases, actually end up being subtly (and perhaps importantly) biased. This can be as simple as assuming an estimate of the global mean surface temperature anomaly is truly global when it in fact has large gaps in regions that are behaving anomalously. This can be dealt with by masking the model fields prior to averaging, but it isn’t always done. Other examples have involved assuming the MSU-TMT record can be compared to temperatures at a specific height in the model, instead of using the full weighting profile. Yet another might be comparing satellite retrievals of low clouds with the model averages, but forgetting that satellites can’t see low clouds if they are hiding behind upper level ones. In paleo-climate, simple transfer functions of proxies like isotopes can often be complicated by other influences on the proxy (e.g. Werner et al, 2000). It is therefore incumbent on the modellers to try and produce diagnostics that are commensurate with what the observations actually represent.
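To make the masking point concrete, here is a minimal sketch with synthetic arrays standing in for the model field and the observational coverage (none of the numbers correspond to a real dataset): the same model field gives a different “global” mean once it is masked to a gappy observational grid.

```python
# Minimal sketch: area-weighted global mean of a model field, with and without
# masking to the observational coverage. All arrays here are synthetic.
import numpy as np

rng = np.random.default_rng(1)
nlat, nlon = 36, 72
lat = np.linspace(-87.5, 87.5, nlat)
weights = np.cos(np.radians(lat))[:, None] * np.ones((nlat, nlon))

# Synthetic "model" anomaly field with amplified Arctic warming
model = 0.6 + 1.5 * np.clip((lat[:, None] - 60) / 30, 0, 1) + rng.normal(0, 0.2, (nlat, nlon))

# Synthetic "observational" coverage: poorly sampled poleward of 70N/60S
obs_mask = (lat[:, None] < 70) & (lat[:, None] > -60) & (rng.random((nlat, nlon)) > 0.1)

def wmean(field, w, mask=None):
    """Area-weighted mean, optionally restricted to the observed grid boxes."""
    if mask is not None:
        field, w = np.where(mask, field, np.nan), np.where(mask, w, 0.0)
    return np.nansum(field * w) / np.sum(w)

print("model global mean, full coverage:", round(wmean(model, weights), 3))
print("model global mean, obs coverage :", round(wmean(model, weights, obs_mask), 3))
# The two differ because the unsampled Arctic is warming faster than average --
# comparing an unmasked model mean with a gappy observational mean is a flawed
# comparison even if both the model and the observations are "right".
```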
Flaws in comparisons can be more conceptual as well – for instance comparing the ensemble mean of a set of model runs to the single realisation of the real world, or comparing a single run with its own weather to a short term observation. These are not wrong so much as potentially misleading – since it is obvious why there is going to be a discrepancy, albeit one that doesn’t have many implications for our understanding.
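As a toy illustration of the ensemble-mean point (entirely synthetic numbers, not output from any GCM): each “run” below shares the same forced trend but has its own internal variability, so single runs routinely show decade-long trends well away from the ensemble mean – just as the single realisation of the real world can.

```python
# Synthetic illustration: the ensemble mean is smoother than any single run,
# so a short-term mismatch between one realisation (e.g. the real world) and
# the ensemble mean is expected, not surprising. Numbers are made up.
import numpy as np

rng = np.random.default_rng(2)
n_runs, n_years = 30, 30
trend = 0.02                      # underlying forced trend, K/yr (illustrative)
years = np.arange(n_years)

# Each run = same forced trend + its own AR(1) "internal variability"
runs = np.empty((n_runs, n_years))
for r in range(n_runs):
    noise = np.zeros(n_years)
    for t in range(1, n_years):
        noise[t] = 0.6 * noise[t - 1] + rng.normal(0, 0.1)
    runs[r] = trend * years + noise

ensemble_mean = runs.mean(axis=0)

def decadal_trend(series):
    """OLS trend (K/yr) over the last 10 years."""
    return np.polyfit(years[-10:], series[-10:], 1)[0]

single_trends = [decadal_trend(run) for run in runs]
print(f"forced trend:              {trend:+.3f} K/yr")
print(f"ensemble-mean 10yr trend:  {decadal_trend(ensemble_mean):+.3f} K/yr")
print(f"single-run 10yr trends:    {min(single_trends):+.3f} to {max(single_trends):+.3f} K/yr")
# Individual runs routinely show decades that are flat or even cooling while
# the ensemble mean tracks the forced trend almost exactly.
```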
Implications
The implications of any specific discrepancy therefore aren’t immediately obvious (for those who like their philosophy a little more academic, this is basically a rephrasing of the Quine/Duhem position on scientific underdetermination). Since any actual model prediction depends on a collection of hypotheses together, as do the ‘observation’ and the comparison, there are multiple chances for errors to creep in. It takes work to figure out where though.
The alternative ‘Popperian’ view – well encapsulated by Richard Feynman:
… we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong.
actually doesn’t work except in the purest of circumstances (and I’m not even sure I can think of a clean example). A recent obvious counter-example in physics was the fact that the ‘faster-than-light’ neutrino experiment did not falsify special relativity – despite Feynman’s dictum.
But does this exposition help in any current issues related to climate science? I think it does – mainly because it forces one to think about what the other ancillary hypotheses are. For three particular mismatches – sea ice loss rates being much too low in CMIP3, tropical MSU-TMT rising too fast in CMIP5, or the ensemble mean global mean temperatures diverging from HadCRUT4 – it is likely that there are multiple sources of these mismatches across all three categories described above. The sea ice loss rate seems to be very sensitive to model resolution and has improved in CMIP5 – implicating aspects of the model structure as the main source of the problem. MSU-TMT trends have a lot of structural uncertainty in the observations (note the differences in trends between the UAH and RSS products). And global mean temperature trends are quite sensitive to observational products, masking, forcings in the models, and initial condition sensitivity.
Working out what is responsible for what is, as they say, an “active research question”.
Update: From the comments:
“our earth is a globe
whose surface we probe
no map can replace her
but just try to trace her”
– Steve Waterman, The World of Maps
References
- M. Werner, U. Mikolajewicz, M. Heimann, and G. Hoffmann, "Borehole versus isotope temperatures on Greenland: Seasonality does matter", Geophysical Research Letters, vol. 27, pp. 723-726, 2000. http://dx.doi.org/10.1029/1999GL006075
Alex says
This all makes sense, but will the IPCC report, which influences investment and policy decisions, reflect your statement that the reasons for the mismatch between modeling and observation are an active research question? Will it retain the high degree of confidence regarding catastrophic anthropogenic global warming without clear answers as to what is responsible for the mismatch?
[Response: You are not following the argument. That models and observations do not match in all respects is normal and expected. It was true for TAR, AR4 and will be for AR5. There is nothing new in this general issue. If you think that policies are being made based on exact numbers coming from a climate model, I’d have to ask for some evidence. Policies are being made (or at least considered) on the strongly evidence-based premise that climate sensitivity is non-negligible, but that conclusion doesn’t depend on models as much as paleo-climate and so is unlikely to change. PS. I have no idea what you mean by “high confidence” in “catastrophic anthropogenic global warming”. – gavin]
Victor Venema says
Good explanation. Thanks.
Related to this, at conferences many modelers compare their results relative to “observations”, but do not mention what those observations are. That makes it hard for a listener who is knowledgeable about observations to judge whether the deviations are due to the model or the observations. Thus my plea to the modelers: please mention the name of the observational dataset in your legend.
Lichanos says
I have to take issue with your citing of Quine, and using it to set aside the Popper-Feynman point of view. After all, further down in the article you cite is this bit:
“Thus, if we accept Quine’s general picture of knowledge, it becomes quite difficult to disentangle normative from descriptive issues, or questions about the psychology of human belief revision from questions about the justifiability or rational defensibility of such revisions.”
Other criticisms are also discussed, including one that the whole issue, underdetermination, is overblown. My point, however, is that you should not use this argument unless you are willing to accept Quine’s general picture of knowledge. I for one, am not.
Quine is a powerful representative of an epistemological tradition of thought that currently dominates the English-speaking world, but it is a tradition that has basically reached a bad dead end. More detail than that would not be appropriate for a blog dedicated to climate science.
Hank Roberts says
http://www.washingtonpost.com/rf/image_606w/2010-2019/WashingtonPost/2013/09/11/Editorial-Opinion/Graphics/Tsketch11.jpg
Stripmining the future is its own reward, thus far.
Watcher says
Gavin,
I think your example concerning neutrinos and relativity is off the mark. Implicit in Feynman’s dictum is that the observations have to be correct. That’s ‘correct’ in a scientific sense, not a philosophical one. In the latter sense anyone who’s stayed awake in their Phil 101 course knows there’s no such thing as ‘correct’ (though you keep arguing anyway); while in the former sense the new airplane design either crashes or it doesn’t.
[Response: Don’t agree. The neutrino example is spot-on. No experiment is so clean or pure that there are no ancillary hypotheses (which is what would be required for Feynman’s dictum to be accurate). Indeed, Feynman made some of his best discoveries by challenging experimental data which proved to be dubious (so-called ‘Feynman points’). – gavin]
By the way, I thought it was admirable when the authors of the neutrino experiment more or less said at the time, “this is what we found but we could easily be wrong”, inviting others to help figure out why.
Tim Osborn says
Hi Gavin, good to see these things set out in a good, logical manner.
I wonder if it works better to separate the “model” from the “simulation”? i.e. don’t lump errors in the forcings (boundary conditions) and initial conditions in with the model errors.
I prefer to think of it this way: you have a model and you want to test whether it behaves like the real world. So you try to simulate some aspect of the real world and compare the simulation with observations. This is evaluating the simulation, not the model per se. If there is a discrepancy between simulation and observations, it might be (partly) because of errors in the forcings or initial conditions or in some other aspect of the experimental design. You say all this, of course, in your item 2 (model error) — but the point is that it isn’t model error.
Separating things out into more components like this is necessary if we want to build a useful statistical model of the data-model comparison, i.e. one that doesn’t just answer the question “is the model right?” (since if we look closely enough the answer will always be “no”) but also the more interesting question “how wrong is the model?”
[Response: Yes. This would be a good distinction. These points are lumped together in my point 2, but it would be clearer to break it out. Thanks. – gavin]
SecularAnimist says
Alex wrote: “high degree of confidence regarding catastrophic anthropogenic global warming”
A high degree of confidence is appropriate, given that catastrophic anthropogenic global warming is already occurring right before our eyes, all over the world. Deniers have to work very hard to ignore it.
Ray Ladbury says
Lichanos,
And yet, what we have is quite clearly a case of holistic underdetermination. Even if we do find a discrepancy, it may not be possible to state exactly which of the hypotheses underlying our model is at fault, and sometimes we cannot even state that there is a statistically significant discrepancy.
The Popperian paradigm only works for logically simple theories, and it works best when you have multiple theories with which to compare an observation. After all, even after the negative findings of Michelson and Morley, physicists did not reject the aether. The Lorentz equations were initially derived assuming that motion causes compression of the aether.
Hank Roberts says
> Quine … dead end
You mean dead because “there may be little at stake” since the “fantasy of irresolubly rival systems of the world” doesn’t get anywhere useful?
Those claiming AGW can’t be true because:FREEDOM don’t think their position is a fantasy. They do seem to think their position is irresolvably at odds with the science and economics that show we’ve been stripmining the future to make money faster today.
Well, this ought to be looked at by the Metaphysics Research Lab
Tim Beatty says
In observational error, you’ve omitted the biggest one: the sample isn’t a representation of the population. There are statistical tools to measure that but they also rely on assumptions about the population. There are many measures used in climate science but limitations on accessibility or funding can often create a variety of sampling techniques not all of which have the same certainty wrt the population they represent.
Ray Ladbury says
Tim Beatty,
Random sampling errors are actually fairly well understood, even for “distribution-free” cases, and if you have enough data, this isn’t really a problem. Climate science is quite fortunate in this regard, as data are not scarce.
The trick comes in interpreting the data, and that requires models. In general, the longer a result has stood, the more you are likely to be able to take it to the bank, precisely because more data will have accumulated and any errors in interpretation will likely have been found.
The idea that a single observational disagreement will make the problem go away is sheer fantasy, and ignorant fantasy at that.
NickC says
I have been reading this blog for nearly ten years and think that Gavin taking us on a trip through a summary of academic theories relating to gaps between prediction and observation is very disappointing, partly due to the assumption that he is an expert in those areas now. When was the last time we had some straight shooting regarding the implied power of the predictions made by models, the championing of them here and the advocacy power they have? Very disappointing to see the emphasis of conviction shift to paleo-climate alone when the tide turns.
[Response: Given your extensive reading of the blog, you surely can’t be unaware that I have consistently stated that the best constraints on sensitivity come from the paleo record – and most importantly the last glacial period. Thus I’m a little puzzled as to what inconsistency you think you have detected. I am still a strong advocate for the usefulness of climate modelling, and models are consistent with inferences from paleo. – gavin]
Alex says
Gavin, regarding your response in #1: If you think that policies are being made based on exact numbers coming from a climate model, I’d have to ask for some evidence.
I would direct you to the IPCC FAR Summary for Policymakers in which the bold-faced paragraph headlining the ‘Projections in Future Changes in Climate’ section reads:
For the next two decades, a warming of about 0.2 C per decade is projected for a range of SRES emission scenarios. Even if the concentration of all greenhouse gases and aerosols had been kept constant at year 2000 levels, a further warming of about 0.1 C per decade would be expected.
The table immediately below said paragraph predicts sea level rise with two significant digits under a variety of scenarios. You must be aware that this report has been cited numerous times by Nancy Pelosi, Harry Reid, Al Gore, and many lobbyist organizations pushing cap and trade legislation and many other EPA regulations. While I agree that any mismatch between models and observation “…in all respects is normal and expected.”, it is the very quantity in bold-face in the FAR summary that is both mismatched and driving these policy considerations.
What I was hoping for in this post was some technical leads for the mismatch, specific to your bullet point #2, above. Is it the fundamental CO2 forcing prediction, based on effective radiation temperature to space? Or the indirect CO2 forcing predicted due to H2O increases at high altitudes that have not materialized? I am probably as aware as any reader here of modeling challenges in general, and can appreciate the work your groups have performed, but I can also appreciate the implications of the mismatch that prompted your post: there is fundamental uncertainty in the interaction of the complex mechanisms that drive climate change, including the human effect.
[Response: The IPCC report is far more than a single line about the short term ensemble mean trend. Even the SPM is substantially more detailed, let alone the rest of the report. Your claim of a ‘fundamental’ mechanism that is at play here is simply wrong. As I outlined above, there are many reasons for mismatches, and the shorter the time period, the more reasons there are – forcings, initial conditions, internal variability are all likely playing a role as has been demonstrated in a number of recent papers. We don’t yet have a full synthesis (but people are working on it), but for you to automatically assume the answer says more about prior beliefs than it does about the evidence. – gavin]
Tim Beatty says
Ray Ladbury:
Maybe it’s a chicken/egg problem, but how do you test a basic assumption like gridding? I would think model disagreement with a sample could be attributed to a number of things unrelated to the model: 1) the natural event is extreme, so the actual population including its sample is outside model limits, 2) the sample is not an accurate representation of the population, or 3) the population is more complex or dynamic than the sample methods. As an example (and I don’t have data, just a thought experiment), when we estimate average global temperature and we grid up the planet, how do we test that the grid size is appropriate to sample? How do we assess whether the grid size required to represent the population accurately is potentially seasonally or geographically (or both) dependent? Or whether the grid is oversampled in certain areas? If the model is the only test, it could be revealing an extreme population, an extreme sample, or model error. How do we know whether a model could be exactly accurate but the data it needs are 10×10 sq mi sample sets in the ENSO SST region and only 100×100 sq mi sample sets in Ukraine? Maybe it’s my ignorance of available data, but I don’t know how to measure sensitivity to that kind of sampling error. It seems the goal is to get the model to agree with the sample, but how do we test the sample against the population, and how do we estimate the variance in the population from the variance in the sample? 2012 is different from 2013. Both are represented as a population (nature) and as a sample of that population (our measurement of nature). Is one of the populations extreme? Is the sample correlated well enough to the population? Are sampling methods dependent on conditions? Do we treat sampling as part of the model or separate?
AIC says
Please, a glossary for the various abbreviations in your post.
Thanks!
[Response: Here. Let me know if there is anything that is not clear. – gavin]
NickC says
Gavin, would it be safe to assume that you have enough confidence in paleo inferences that a mismatch of models to observations would not lessen your conviction that we remain on a worrying path? In other words, does it lessen your certainty, or just point to gaps in knowledge that will eventually still bear out the overall thrust that the anthropogenic contribution is worrying? I think it is pertinent to the discussion.
[Response: Yes. I’m not quite sure what specific mismatch you are referring to (or just making a general point), but on the basic issue – should one be concerned about future anthropogenic climate change – I have not changed my opinion, mainly because that is not based on models at all. Models are there to help us quantify the changes and without them we would have much larger uncertainties. – gavin]
Also, in 12 you say “and models are consistent with inferences from paleo” – could you elaborate? On face value it seems incorrect; I must be missing something.
[Response: Charney sensitivity from paleo is around 3ºC, models are in the same range (the latest GISS model for instance is around 2.5ºC). – gavin]
sue says
“PS. I have no idea what you mean by ‘high confidence’ in ‘catastrophic anthropogenic global warming’. – gavin”
SecularAnimist (#7) wrote:
“Alex wrote: ‘high degree of confidence regarding catastrophic anthropogenic global warming’
A high degree of confidence is appropriate, given that catastrophic anthropogenic global warming is already occurring right before our eyes, all over the world. Deniers have to work very hard to ignore it.”
???? Can you clarify, Gavin, since you let this comment through moderation…
[Response: I have high confidence that anthropogenic effects are dominating current climate change and will increasingly do so in the decades to come. The changes that we have seen so far are not catastrophic on a global scale, though future changes are going to be much larger and there is a very real risk of substantive damages. However, when people use the term ‘catastrophic anthropogenic global warming’ they are not referring to any real science but are attempting to paint anyone who talks about the science as an alarmist. AGW is real and growing, but whether it turns into a catastrophe is very much up to us. – gavin]
Joe says
Gavin, I’ll have to read up on Quine to grasp your deeper epistemological point here, but your examples seem far off.
The faster-than-light neutrino experiment was an *error*, so of course it didn’t falsify special relativity. I’m sure Feynman didn’t claim that errors falsify theories, so there’s no contradiction of Feynman’s dictum here.
[Response: You are missing the point entirely. Since there is never a perfect experiment or observation, error is always a possibility. And since you (correctly) note that erroneous claims do not falsify anything, there is always the possibility that an experimental result, however conflicting on its face, was actually in error. Therefore it is never as simple as Feynman’s dictum implies. – gavin]
Saying that a map doesn’t capture the true landscape or a portrait the true self is very confusing as a lead-in to a discussion of climate models. Maps actually do capture what they are supposed to capture, quite accurately. I don’t know what you mean by a portrait, but it’s not going to be a good analogy for a science like climate science.
[Response: Sometimes I like to use metaphors. Sue me. (But listen to the Neil Gaiman story first). – gavin]
I think it would be bad, bad news for climate scientists to start talking this way, to start retreating behind vague figurative/artistic analogies to describe their ability to cohere with reality.
Adopting an epistemology of lower standards, one where hypotheses or theories can’t be falsified, creates too much room for bias and motivated reasoning. Perhaps it wasn’t your intention to suggest a sloppy, low-standards epistemology. Climate science is viewed by outsiders — and described by Judith Curry — as a biased, groupthink-driven field. There is very little that scientists can learn from 20th century epistemologies — many of which would make science impossible. The last thing climate science needs right now is some wishy-washy epistemology.
[Response: The last thing any science needs are false epistemologies that are just hoisted up the flagpole in order to ignore the balance of evidence. I illustrated my points with real cases where different resolutions have been found to previous mismatches, assuming that future mismatches will all be resolved in a single way is a-historical and extremely unlikely. – gavin]
(BTW, Alex said there’s “fundamental uncertainty…”, not that there’s a fundamental *mechanism*.)
Dag Flolo says
(Fixed a few typos)
Hi Gavin
Maybe the following example is useful as a clean example ref:
… we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong.
actually doesn’t work except in the purest of circumstances (and I’m not even sure I can think of a clean example).
Here is a clean example:
Distance traveled (m) = velocity (m/s) * time (s)
The dimensions are traceable to international standards.
I can measure distance, velocity and time in SI units.
I can determine the standard uncertainty for all the measured variables from statistics.
It is falsifiable – I can move a body at a certain velocity for a certain time and measure the traveled distance.
If the traveled distance does not fit the calculated distance within the uncertainty calculated using the international standard Guide to the Expression of Uncertainty in Measurement, the model might be wrong.
[Response: This is simply a definition of velocity and so the statement is a tautology – it cannot be otherwise. Thus it doesn’t come into the discussion of testing theories. – gavin]
The model is useful if I can make predictions which affect my choice of action.
In my everyday life it is useful, even with uncertainties of 10–20% (caused by uncertainty in the predicted velocity), when estimating time of arrival.
If the uncertainty became too high it would not be useful.
The model is analogous to:
Increase in global average atmospheric temperature (K) = Effect from CO2 (K/ppm CO2) * Increase in CO2 level (ppm CO2)
For the model to be useful it must be correct within some level of standard uncertainty for some averaging period.
And repeatably so for many periods.
[Response: You have a very impoverished view of what utility is. Is it useful to know that a medical treatment improves outcomes by 0 to 30% in different trials? The answer for the FDA is very different than for a patient or a researcher. – gavin]
dhogaza says
Gavin:
“Flaws in comparisons can be more conceptual as well – for instance comparing the ensemble mean of a set of model runs to the single realisation of the real world. Or comparing a single run with its own weather to a short term observation. These are not wrong so much as potentially misleading – since it is obvious why there is going to be a discrepancy, albeit one that doesn’t have much implications for our understanding.”
This is almost worth a post in itself, as these fundamental misunderstandings are the basis for so many skeptical arguments (in particular combining the single realization of the real world with an ensemble mean).
In fact, in this thread, alex and nickc are both arguing, to some extent, from a fundamental misunderstanding of this (along with an apparent belief in model/reality mismatches that don’t actually exist to the extent they believe) …
Frank Davis says
So, if the model doesn’t work, there’s nothing to worry about?
[Response: No. If there is a mismatch, there is maybe an interesting reason for it, and people should try and find out what it is. – gavin]
Watcher says
Gavin,
Since I’m spreading admiration around, I reckon you should get some, too: you had to know a discussion of this type would be lively!
So let’s go to your responses to #18. Just because v = D/t is a tautology doesn’t mean that all predictions based on Newton’s laws of motion are tautologies. I agree that for some ‘simple’ theories the predictions are so obvious in retrospect that they seem that way. For example, if I measure v using D/t as my rocket leaves the atmosphere, it is not a tautology to predict where it will be 5 or 6 years from now so that it can drop a probe on Jupiter.
[Response: I didn’t say that Newton’s Law did not give testable predictions. Only that your specific example (v=D/t) was a tautology (and has nothing specific to do with Newton’s Law in any case). – gavin]
Ah! But maybe it didn’t make it to Jupiter. Does this falsify the theory? No, I forgot to account for the orbital motion of Earth and Jupiter, so indeed this expensive miss does not invalidate Newton’s theory. If I understand you correctly this is what you are getting at.
[Response: Yes – this is more to the point. ]
Nevertheless, it is surely possible to take into account the appropriate factors, calling into use gravitational constants and what-not, and come up with a better prediction. If I include only Earth and Jupiter the result will still be off, but I reckon that if I start taking the Sun into account my prediction will start looking a lot better. Maybe it needs tweaking a bit due to the lunar fly-by before it starts looking really good.
[Response: Yup. But how do you tell whether any remaining mismatch is due to a missing body or to the difference between Newtonian gravity and general relativity? – gavin]
Nevertheless, while each of the ‘pieces’ of this construction comprises equally simple things like D=vt and F=GMm/r^2, the final prediction is not a tautology.
[Response: F=GMm/r^2 is a theory, D=vt is a definition. There is a difference. ]
I would venture to say that this is how science is supposed to work, and it is perverse to insist that Feynman’s philosophy be judged on the basis of the first simplistic ‘experiment’. It is the job of a scientist to know which theory/hypothesis is being tested by a given experiment, and is at the root of what I would call a properly designed experiment. Maybe we can say that a good scientist is able to take the world of Quine and reduce it to the world of Popper, at least to an extent that is ‘good enough’.
[Response: Agreed. ]
Just to belabour the point: notice that my Jupiter prediction failed to take into account either the colour of the rocket’s paint or the newly discovered earthlike planet around Alpha Centauri (or wherever), because my scientific judgement tells me that while this leaves me open to the criticism that my model is incomplete, I have good reason to believe that these things don’t matter in the current context. Indeed, if pressed I can estimate the impact using the same D=vt and F= etc. and show this to be the case. Maybe I launch several probes that all arrive safely, and in my mind I elevate my model to a ‘theory of Earth-Jupiter space travel’.
Now, suppose that next year I launch another probe, and this time it misses. Does that falsify Newton’s laws? Perhaps, but it’s more likely that my theory was incorrect. I check things out a bit and notice that this year Mars has moved close to the flight path, so previously I got the right answer with an incorrect theory that posited no influence from the planet Mars. Scientific honesty requires me to admit to another expensive error and revise my theory to include the new factor, and once again attempt the conversion of a Quine to a Popper situation.
My, my, I have gone on and I should probably get to the point. There is philosophy and there is science. If scientists behaved like philosophers nobody would ever get anything done because they’d all be too worried about having missed some factor, and anyway what if I’m just imagining the space probe in the first place? In a scientific sense it MATTERS very much what measurements say, and I will say again that measurements are the only things that really do matter. It’s not sufficient to say that they might be wrong, or they might be measuring something different from what they seem to, and so therefore I might be right even though my theory doesn’t agree with them.
The job of a scientist is to sort through the mess and develop a theory that can account for the measurements. Furthermore, for that theory to be useful it must be capable of producing verifiable predictions (e.g. the probe will get to Jupiter no matter what year I launch it). If the predictions don’t work out then the theory must be modified or abandoned.
Anything less is not science.
[Response: I am a scientist, not a philosopher, and anything I am talking about here comes directly from the practice of science, not theorising about it. However, as I’m sure the philosophers reading will be happy to know, there is some connection between what scientists actually do and how it is modelled by philosophers. It’s not a perfect model though (of course). – gavin]
Jan Galkowski says
This is a personal perspective on the subject, from that of a practicing statistician and only a very amateur climate guy.
In the case of Earth’s climate as a source of observations, there’s an additional difficulty. As Slava Kharin observed in slides for the Banff Summer School, 2008,
“There is basically one observational record in climate research.” (See Slide 5, http://www.atmosp.physics.utoronto.ca/C-SPARC/ss08/lectures/Kharin-lecture1.pdf) And this is an issue. For there is enough variability in Earth’s climate that if the system were “initialized” again, say, 50 years back and somehow magically all the external inputs to the system kept exactly the same, the result would be a little different. There is a debate about how big this “internal variability” is (see Kumar, Chen, Hoerling, Eischeid, “Do Extreme Climate Events Require Extreme Forcings?”, http://dx.doi.org/10.1002/grl.50657), with climate amateur but statistician me coming down on the side of “not as much as you might think”. (My reasons are complicated, and I’ll write them up in an upcoming paper I’m putting on arXiv.org, that being a critical review of the statistics in the recent NATURE CLIMATE CHANGE paper by Fyfe, Gillett, and Zwiers, shared first with those authors. There are different flavors of variability beyond internal and external. See http://hypergeometric.wordpress.com/2013/08/28/overestimated-global-warming-over-the-past-20-years-fyfe-gillett-zwiers-2013/ for more.) But the point is, such variability makes modeling even harder, for not only are the general parameters of the physical system necessary to get right, but, if prediction is a goal, actually TRACKING the actual realization Earth is taking is part of the job. Slava Kharin argues, and I agree, that the one-observational-record reality means a Bayesian approach is the only sensible one. That’s not universally held in geophysical work, however.
Nevertheless, it’s important, I think, to parse properly what this all means. The reason why we want models is to help understand what data means, and what physical effects are important, how much, and how. We, of course, also want to use them for policy predictions, but using these as predictive devices is a tricky business. Statistically speaking, NONE of that should be taken to mean the long term projections are off in expected values in any significant way. Forcings are forcings, and AB INITIO physics says that extra energy needs to go someplace and be dissipated throughout the (primarily) fluid systems of Earth somehow. The devil is in the latter details, as are the impacts. But they will occur, even if amounts and timings will be off, as they necessarily must be.
So-called “two-sample comparisons” are tricky in complicated systems. Most direct techniques for doing so assume constant variation over large swaths of samples. That kind of approach tends to give large Regions of Probable Equivalence (ROPEs) which, of course, are less useful than otherwise. When this is done for predicting elections, say, something called “stratification” is used, where observations are qualified by (in this case) spatial extent, time of day, and other auxiliary variables, the response state of the atmosphere is considered as conditioned on these, and the model is evaluated comparably, where it can be. Alas, sometimes doing that leaves few observations or few model runs to compare. That’s okay if a Bayesian approach is used. Not so much otherwise.
Gavin said all this, but I wanted to second his view, giving mine, as well as put a note about my ongoing hard look at Fyfe, Gillett, and Zwiers.
Watcher says
I have to echo some of the comments made above concerning the reliance on paleo studies.
The notion that proxy ESTIMATES of temperature 1000 years ago when there was no anthropogenic CO2 are a superior test of AGW theory than current temperature MEASUREMENTS in the presence of a significant anthropogenic CO2 component strikes me as absurd.
[Response: Here’s a test: If you read something written by someone who basically knows what they are talking about and it seems absurd to you, ponder – at least for a second or two – that it might be your interpretation that is at fault rather than the statement. If you did (and perhaps followed the links), you would realise that my comment had nothing whatever to do with temperatures 1000 years ago. But nice try. – gavin]
prokaryotes says
Models tap into physical world processes but focus only on a given range of frequencies. Therefore any interpretation or conclusion is prone to human error. To understand future states better, it appears we need to involve as much data as possible (which would also increase the error rate). It would help to identify tipping-point systems of the spectrum better. The conclusiveness and reliability should increase with the data spectrum ratio. I would really like to read another post on CMIP5, combined modelling with all methane forcings.
But even a small data model (for instance analogous albedo “Daisyworld”) seems to be reliable in predicting trends.
Also I find this interesting
Link
When it comes to science messaging i think it would help to point out more often general agreements/predictions and underestimation (and why).
csoeder says
re: maps and models
I think that James Gleick was spot on in his book Chaos:
“Only the most naive scientist believes that the perfect model is the one that perfectly represents reality. Such a model would have the same drawbacks as a map as large and detailed as the city it represents, a map depicting every park, every street, every building, every tree, every pothole, every inhabitant, and every map. Were such a map possible, its specificity would defeat its purpose: to generalize and abstract. Mapmakers highlight such features as their clients choose. Whatever their purpose, maps and models must simplify as much as they mimic the world.” (Gleick p.278-279)
Radge Havers says
Joe:
“Saying that a map doesn’t capture the true landscape or a portrait the true self is very confusing as a lead-in to a discussion of climate models. Maps actually do capture what they are supposed to capture, quite accurately. I don’t know what you mean by a portrait, but it’s not going to be a good analogy for a science like climate science.”
Huh?! Here Be Dragons…
“The good cartographer is both a scientist and an artist. He must have a thorough knowledge of his subject and model, the Earth…. He must have the ability to generalize intelligently and to make a right selection of the features to show. These are represented by means of lines or colors; and the effective use of lines or colors requires more than knowledge of the subject – it requires artistic judgement.”
— Erwin Josephus Raisz (1893 – 1968)
—–
“The foremost cartographers of the land have prepared this for you; it’s a map of the area that you’ll be traversing.”
[Blackadder opens it up and sees it is blank]
“They’ll be very grateful if you could just fill it in as you go along.”
— Blackadder II, British Comedy set in Elizabethan times.
—–
“A map is the greatest of all epic poems. Its lines and colors show the realization of great dreams.”
— Gilbert H. Grosvenor, Editor of National Geographic (1903- 1954)
—–
“When our maps do not fit the territory, when we act as if our inferences are factual knowledge, we prepare ourselves for a world that isn’t there. If this happens often enough, the inevitable result is frustration and an ever-increasing tendency to warp the territory to fit our maps. We see what we want to see, and the more we see it, the more likely we are to reinforce this distorted perception, in the familiar circular and spiral feedback pattern.”
— Professor Harry L. Weinberg, 1959 in Levels of Knowing and Existence: Studies in General Semantics
—–
“There is no such thing as information overload, only bad design.”
— Edward Tufte
—–
“If you want a database that has everything, you’ve got it. It’s out there. It’s called reality.”
— Scott Morehouse, Director of Software Development, ESRI
—–
“our earth is a globe
whose surface we probe
no map can replace her
but just try to trace her”
— Steve Waterman, The World of Maps
Hank Roberts says
“data spectrum ratio”?
SecularAnimist says
Gavin wrote (in reply to #17): “The changes that we have seen so far are not catastrophic on a global scale”
To paraphrase Tip O’Neill, all catastrophe is local. And when “local” catastrophes are occurring everywhere at once, that’s “global”.
The millions of people all over the world who have already experienced mass destruction of their homes, livelihoods, food supply and/or water supply as a result of AGW-driven climate change and extreme weather might not agree that the changes we have seen so far are “not catastrophic”.
Which is, of course, why the primary “mission” of the deniers at this point is to deny any link between global warming and these ongoing and rapidly escalating effects — to argue, in essence, that yes, the world is warming; and yes, we are experiencing exactly the sort of effects that climate science has predicted for a generation would result from that warming; but no, those effects are not the result of the warming.
So what is causing them? According to the deniers, nothing. They are just our imagination.
SecularAnimist says
I wrote yesterday (#7): “A high degree of confidence is appropriate, given that catastrophic anthropogenic global warming is already occurring right before our eyes, all over the world. Deniers have to work very hard to ignore it.”
And right on cue, for a perfect example, see the piece in today’s Washington Post by Bjorn Lomborg, perhaps the hardest working denier in show business.
Sure, your decades of smoking cigarettes have given you lung cancer. And yes, you are coughing up blood. But you can’t attribute every bloody cough to the cancer. There are always bloody coughs every once in a while. It’s just natural variation, you see. And it doesn’t mean you are going to experience “globally catastrophic” effects from the cancer, like, you know, death.
prokaryotes says
Hank Roberts, Re “Spectrum Ratio” see also “Vautard, R., and M. Ghil (1989): “Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series”, Physica D, 35, 395–424.” Link or Singular spectrum analysis
prokaryotes says
And Spectral signal-to-noise ratio
Lichanos says
“If you want a database that has everything, you’ve got it. It’s out there. It’s called reality.” – Scott Morehouse, Director of Software Development, ESRI
Gads! As a daily user of ESRI software for more than 20 years, I shudder at the thought of their executives being taken as authorities on anything but sales.
I suppose he thought he was being clever, but the notion of reality as a database is absurd. After decades of producing books along the lines of “Modeling Our World: The ESRI Way,” I guess they believe their own propaganda.
Berényi Péter says
I wonder if there is a non-equilibrium quasi steady state non-reproducible thermodynamic system, one with a vast number of internal degrees of freedom (other than the terrestrial climate system), which is successfully described by a computational model. If its dimensions are small enough to make it fit into the lab and it can be studied that way in controlled experimental runs, so that the model is verified properly, even better.
– A system is reproducible if for any pair of macrostates (A;B) A either always evolves to B or never.
[Response: Define “successfully”. – gavin]
Martin Vermeer says
A model doesn’t have to be perfect… just better than the competition. Like, you don’t have to out-run the lion, just the other guy…
Lee A. Arnold says
Gavin, you have hit upon one of my favorite topics. There was a related discussion here:
Naomi Oreskes, Kristin Shrader-Frechette, Kenneth Belitz, “Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences,” SCIENCE Vol. 263, No. 5147 (Feb. 4, 1994), pp. 641-646.
Note that your argument applies analogously to natural language too.
To help “climate change communication” in the public debate, I have been trying to create a natural-language-style flow-chart cartoon language that illustrates the principles of the lack of precise prediction in complex systems, for purposes of elementary pedagogy. This is not climatology; it is (non-mathematical) general systems, used to picture different things in the same format.
I am happy to report that it has had SUCCESS in counteracting denialist arguments in the comments section under the new Lomborg opinion column in the Washington Post. Here is the thing I did:
http://www.youtube.com/watch?v=SIvcQTXdjTg&list=PLT-vY3f9uw3AcZVEOpeL89YNb9kYdhz3p
And here is the complete list of the series:
http://www.youtube.com/playlist?list=PLT-vY3f9uw3AcZVEOpeL89YNb9kYdhz3p
They are all exactly one-minute long. The “food web” cartoon (#3) takes a similar approach.
Abhay says
With regard to comparisons (evaluation may be a better word here) of models to (or using) satellites, it is worth pointing out that this is a big research field in its own right. There are various approaches for comparing the two, and each approach has its own advantages and limitations.
For example, 1) one can do a “traditional” comparison whereby one compares means, standard deviations etc. with satellite-based estimates. This will tell you if a model captures the overall range of values and spatial variability, but will not tell you anything about how well any particular process is simulated.
2) Another way would be to carry out a process-oriented comparison, wherein one focuses on a set of processes or natural variabilities (e.g. ENSO, NAO or the Indian Ocean Dipole) and investigates how well a particular model reproduces the climatology of certain variables during those processes/variabilities (in reference to similar climatology from the satellites). But this approach will not have the advantage of the first one.
3) One could also employ satellite simulators so as to avoid comparing apples to oranges. The simulators take model data for a certain geophysical variable and carefully simulate it in the way a particular satellite sensor would have seen that variable. This ensures a fair comparison. And it not only takes care of mismatches and sampling issues between models and satellites, but also of the different sensitivities of different satellite sensors to geophysical variables.
4) Eventually one could combine any or all of the approaches above, which I think would be the most stringent litmus test of the models.
All of this, of course, only applies if you have satellite based data sets (which in most cases go back to 1979) for comparison.
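As a minimal sketch of approach (1) – a “traditional” comparison of area-weighted means, standard deviations and pattern correlation – here is an illustrative example using synthetic placeholder fields rather than any real model or satellite product:

```python
# Minimal sketch of a "traditional" comparison (approach 1 above): compare the
# mean, standard deviation and pattern correlation of a model field against a
# satellite-based estimate. Both fields here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
nlat, nlon = 36, 72
lat = np.linspace(-87.5, 87.5, nlat)
w = np.cos(np.radians(lat))[:, None] * np.ones((nlat, nlon))   # area weights

truth = 10 + 15 * np.cos(np.radians(lat))[:, None] * np.ones((nlat, nlon))
satellite = truth + rng.normal(0, 1.0, (nlat, nlon))           # noisy "retrieval"
model = 1.02 * truth + 2.0 + rng.normal(0, 1.5, (nlat, nlon))  # biased "model"

def wstats(field):
    """Area-weighted mean and standard deviation."""
    mean = np.sum(field * w) / np.sum(w)
    std = np.sqrt(np.sum(w * (field - mean) ** 2) / np.sum(w))
    return mean, std

for name, field in (("satellite", satellite), ("model", model)):
    m, s = wstats(field)
    print(f"{name:9s} mean={m:6.2f}  std={s:5.2f}")

# Area-weighted pattern correlation between model and satellite
m_mean, m_std = wstats(model)
s_mean, s_std = wstats(satellite)
cov = np.sum(w * (model - m_mean) * (satellite - s_mean)) / np.sum(w)
print("pattern correlation:", round(cov / (m_std * s_std), 3))
# As noted above, statistics like these say nothing about whether any
# individual process is simulated for the right reasons -- that needs
# process-oriented diagnostics or satellite simulators (approaches 2 and 3).
```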
Ray Ladbury says
At this point it would not seem out of place to quote Richard Hamming:
“The purpose of computing is insight, not numbers.”
The same can be said of models. A model need not even be the best to accomplish this–Tamino’s 2-box model is a case in point, as its simplicity allows the important contributors to climate to be isolated and assessed.
On the other hand, the denialist model… Oh, yeah. There is no denialist model. And you guys wonder why no one intelligent takes you seriously?
Berényi Péter says
[Response: Define “successfully”. – gavin]
In case the system can be studied experimentally in a controlled lab environment, definition of “successfully” is straightforward. Both the experiment and model simulation can be run as many times as necessary with controlled parameters. As the system is supposed to be non-reproducible, only statistics of macroscopic state variables are comparable, of course, but with enough runs that can be made to converge to an arbitrary degree, provided the model is “correct”. If it is not, divergence is clearly visible, that is, the model is falsified.
[Response: Interesting, but not relevant. This presupposes a perfectly known set of basic equations that we can test for convergence as scales get arbitrarily small. That isn’t the case for climate models – too many magnitudes of scale between cloud microphysics or under-ice salt fingering and grid box averages. – gavin]
In case of modelling a single run of a unique physical instance, I have no idea what “successfully” means.
[Response: Similar to your first point – coherent statistics over time periods, robust patterns of teleconnections, process by process similarities, coherent emergent properties, quantitative matches in response to large perturbations (volcanoes, orbital forcing, continental configurations etc.). – gavin]
However, theories in physics are usually supposed to hold for a wide class of systems, some of which may be studied in the lab. In that case it is a must to do so, because it is the easiest way to verify a theory. This is what one would expect in this particular branch of nonequilibrium thermodynamics, but I must admit I am ignorant enough to be unaware of any such attempt.
Can you give a pointer? Or explain why it is not done, should that be the case.
[Response: Only specific processes can be examined in the lab. Radiative transfer, aerosol formation, some aspects of cloud microphysics, ocean diffusion etc. – but the real world has many good experiments that the numerical models can be evaluated against (some mentioned above). – gavin]
BTW, for reproducible systems we know quite a lot. Unfortunately the terrestrial climate system does not belong to this class.
Roderick Dewar (2003), “Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states”, J. Phys. A: Math. Gen. 36, 631, doi:10.1088/0305-4470/36/3/303
Hoya Skeptic says
Re: Model error.
There are model errors and there are model errors. However, we’re not talking about one or a few mismatches between model predictions and observational data. Instead we are talking about the wholesale failure of the models to “predict” global temperatures, despite the undeniable increase in CO2, upon which the IPCC bases its reports and forecasts.
[Response: Not really. – gavin]
All of the models ca 2007 that the IPCC used to forecast climate change predicted a steady increase in temperature (based, as they were, on the assumption that CO2 is the primary driver of temperature) and yet global temperatures have remained essentially flat since then.
[Response: If you reason from false premises, you are very unlikely to conclude anything useful. Point 1. Models do not predict a ‘steady increase’ in temperature, they predict many ups and downs and single runs often show a decade with insignificant OLS trends. Point 2. Models are not built with the ‘assumption’ that CO2 is the primary driver of temperature change. The models are actually completely ecumenical about reasons for climate changes – they will change their climate as a function of volcanoes, the sun, aerosol pollution, deforestation, ozone depletion as well as the main greenhouse gases. – gavin]
In short, the models have been “falsified.” Unfortunately, science — Big Science — is not Popperian but Kuhnian, and all that matters is the defense of the prevailing Paradigm, and the data be damned.
[Response: Oh dear. You didn’t read any of the top post did you? Please try again. – gavin]
Radge Havers says
#33 shuddering on a high horse:
“I suppose he thought he was being clever, but the notion of reality as a database is absurd.”
Literal minded much? He was being ironic. It’s basic cartography about selectivity, expressed in short form; a variant of a pretty standard line in geography. I’m surprised that you would miss the thrust of it after twenty years in the business…
NickC says
Thanks Gavin for the discussion.
Your reference to the paleo is understood; however, as with models, there must be some inherent uncertainty in the different methodologies (particularly the transient constraints, as recent data should also be accounted for in them).
[Response: Of course. – gavin]
As you say in a previous post on sensitivity … “There are three main methodologies that have been used in the literature to constrain sensitivity: The first is to focus on a time in the past when the climate was different and in quasi-equilibrium, and ESTIMATE the relationship between the relevant forcings and temperature response (paleo constraints). The second is to find a metric in the present day climate that WE THINK Is coupled to the sensitivity and for which we have some empirical data (these could be called climatological constraints). Finally, there are constraints based on changes in forcing and response over the recent past (transient constraints). There have been new papers taking each of these approaches in recent months.” (My capitalisation)
My point about all this really is: how are any of these actually tested beyond theory? I fully appreciate that the shortish time frame over which we see a mismatch is a problem for asserting anything is wrong yet, but could it do just that if it persistently continues? i.e. are the paleo inferences ever testable over decadal timeframes? Or any timeframes?
[Response: Be clear here that ‘the theory of climate’ is encapsulated in the GCMs (as best it can be given current technology). There is substantial structural uncertainty about the details of what that implies. We look to the real world and the paleo record to constrain those aspects of the climate system that have non-negligible uncertainties (most often climate sensitivities in a general sense). However, we don’t calibrate the emergent properties of the GCMs to the emergent properties derived from observations – they stay (more or less) as evaluation targets. For optimum evaluation purposes you obviously want true out-of-sample evaluations – and many aspects of paleo provide that (since models are not tuned to ice age conditions e.g.), as do future projections (dependent on reasonable scenarios of relevant forcings). True predictions – for instance of the consequences of Pinatubo prior to the impacts happening, or of longer term trends (i.e. post 1980s) – have all proven skillful. – gavin]
Ray, who decides what the “insight” from model mismatch should be?
Astar says
What evidence is there for the assumption that climate sensitivity based on paleo record is applicable to present day? I would expect it to vary quite significantly based on factors such as ice cover, ocean currents, biosphere etc.
[Response: Good question. The answer is that the variation is apparently less than one might think – some discussion of this in the PALAEOSENS (2012) paper. – gavin]
wili says
I’m quite fond of Gaiman (who lives not far from me), and Quine and Feynman are worthy intellectuals to bring into the discussion. But you missed one bloke with a most fitting quote for your ruminations – the philosopher Alfred Korzybski, who is generally credited with first stating, “The map is not the territory.”
And of course there is Magritte’s “Ceci n’est pas une pipe.”
But as to your conclusion, you state: “The sea ice loss rate seems to be very sensitive to model resolution and has improved in CMIP5 – implicating aspects of the model structure as the main source of the problem.”
Any ideas what exact aspects of the “model structure” might have been “the main source of the problem.”
[Response: I haven’t looked into it myself, and I’m not aware of any papers really going into the details (other than those that remark on the improvements). In our own modelling, we have improved the calculations to reduce the amount of numerical diffusion (which helped a lot), and increased resolution (which also helped), but changes to the ocean model also have a big impact, as do Arctic cloud processes and surface albedo parameterisations, so it gets complicated fast. – gavin]
Ray Ladbury says
NickC,
Who decides on the insight?
The relevant community of experts, of course. Who is in a better position to appreciate the strengths and weaknesses of a model and where it is most likely to bear fruit if tweaked? And ultimately, if the mismatch is sufficiently severe, the same experts will develop a different model. That is how science works.
Lichanos says
@ #41 on irony:
I’ve seen too many people taking the map for the terrain. Many of them armed with computer models, often produced with ESRI software.
And while we’re on irony, I once had a client ask me why we couldn’t have a spatial database at a scale of 1:1. After all, it was a computer model… Borges would have laughed.
Bill Everett says
Relating to comment 20 and Gavin’s Point 1 in response to comment 40:
There may be reason to strongly suspect that in any sufficiently complicated dynamical system model (such as climate) with stochastic parameters (e.g., exactly when and where a lightning strike starts a major wildfire or a major submarine earthquake perturbs ocean circulation in a region or a major volcanic eruption introduces stratospheric aerosols), it is almost certain that any given run of the model will have periods of significant deviation from the mean of multiple runs. In other words, we should expect the “real” climate to significantly differ from ensemble means.
The paper V. I. Klyatskin, “Clustering of a positive random field as a law of Nature” Theoret. Math. Phys. 176(3):1252-1266 (Sep 2013) treats much simpler models, but it rigorously establishes the conditions under which such behavior occurs in the simpler models.
Abstract: In parametrically excited stochastic dynamical systems, spatial structures can form with probability one (clustering) in almost every realization because of rare events occurring with a probability that tends to zero. Such problems occur in hydrodynamics, magnetohydrodynamics, plasma physics, astrophysics, and radiophysics.
Keywords: intermittency, Lyapunov characteristic parameter, dynamical localization, statistical topography, clustering
Joe says
Radge, I’m not sure about your examples. How do poetic and figurative quotes about cartography tell us something about climate science? I shudder to think that climate scientists think of themselves as artists or as having broad interpretive license, and I’m a social scientist. (I was talking about Rand McNally road atlases and the like. We’re not going to discover that we were wrong about the location of Orlando.)
Gavin, is the issue of model mismatch related at all to confidence levels? I’m thinking of things like the last IPCC report saying that they were 90% confident that the warming of the 20th century was mostly caused by human activity. I assume a similar confidence level about future anthropogenic warming.
Is there a post on how such confidence levels are calculated? I’m familiar with the statistical methods that social scientists use, like regression, ANOVA, MLM/HLM, SEM, and the PCA stuff that came up with Mann. We never generate confidence levels around a prediction that spans a large body of work, except maybe some Bayesian stuff.
[Response: This particular attribution issue was discussed in depth in a post last year. That is somewhat separate to future projections though. – gavin]
Icarus62 says
We have many studies presenting the projections from GCMs under various forcing scenarios where unforced variability is simulated, and we have a few studies (not many I think) which have a model reproduce the *actual* forcings and unforced variability and see how well the output matches observations (a recent one by Yu Kosaka and Shang-Ping Xie being a case in point). I don’t know of any studies where the GCM runs are re-done with real-world forcings and unforced variability to pin down exactly where the original projections differed from reality. Presumably this is done by modellers to improve their models but are the conclusions published? They might say for example, “Ah yes, run number 12 in GCM model XYZ was a little too warm but that’s because real world forcings were a little lower than in the projections – the physics was correct, it was the scenario that wasn’t quite right”. In other words we want to know whether projections were off because the inputs haven’t matched reality or because the physics isn’t quite right. Hope that makes sense!
[Response: In broad terms this is correct. We are currently exploring the impacts that updates in the forcings have on the CMIP5 model runs and exploring the range of uncertainty where we don’t have solid information. This takes time though. – gavin]
MARodger says
A map as a model of physical space has a characteristic perhaps worth mentioning here.
The human map-user, on getting themselves lost within the physical space, will consult the map and often conclude that they are not lost but that the map is deficient in some way and thus continue ahead oblivious to their actual location.
This can become remarkably absurd before the logic of the situation becomes apparent. I hear sensible people say that they managed to walk miles in the wrong direction, reassured by minor features that they were on-route and ignoring the obvious discrepancies all around them. Indeed, I remember once deciding that I was on-route even though the stream I was following was flowing in the wrong direction!
Could there be a lesson in this for climatological understanding? If so, does it apply to you, to “us”? Or does it apply to the other lot? I know which I’d put my money on.