Rapid progress in the use of machine learning for weather and climate models is evident almost everywhere, but can we distinguish between real advances and vaporware?
First off, let’s define some terms to maximize clarity. Machine Learning (ML) is a broad term covering any kind of statistical fitting of complicated functions (various flavors of neural nets etc.) to large data sets, but it’s simplest to think of this as just a kind of large regression. The complexity of the functions being fitted has increased a lot in recent years, as has the dimensionality of the data that can be fitted. Artificial Intelligence (AI) encompasses this, but also includes concepts like expert systems, and for a while was considered distinct from statistical ML methods*. Generative AI (as demonstrated by ChatGPT or DALL-E) is something else again – both in the size of the training data and in the number of degrees of freedom in the fits (on the order of a trillion parameters). None of these things are ‘intelligent’ in the more standard sense – that remains an unrealized (unrealizable?) goal.
Recent success in weather forecasting
The most obvious examples of rapid improvements in ML applied to weather have come from attempts to forecast weather using ERA5 as a training dataset. Starting with FourCastNet (from NVIDIA in 2022), and followed by GraphCast (2023) and NeuralGCM (2024), these systems have shown a remarkable ability to predict weather out to 5 to 7 days with skill approaching, or even matching, that of the physics-based forecasts. Note that, as far as I know, claims that these systems exceed the skill of the physics-based forecasts are not (yet) supported across the wide range of metrics that ECMWF itself uses to assess improvements in its forecast systems.
Two improvements to these systems have recently been announced. The first, presented at AGU by Bill Collins, showed that techniques (‘bred vectors’) can be used to generate ensemble spreads with FourCastNet (which is not chaotic) that match the spread of the (chaotic) physics-based models (see also GenCast). The second advance, announced just this week, is GraphDOP, an impressive effort to learn the forecasts directly from the raw observations (as opposed to going through the existing data assimilation/reanalysis system).
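The bred-vector technique itself is easy to sketch: repeatedly advance a control state and a slightly perturbed state with the same (deterministic) forecast model, and rescale their difference back to a fixed amplitude each cycle, so that the perturbation aligns with the fastest-growing errors. A toy version in Python is below – `step_fn` is a stand-in for a single forecast step of whichever model you like, and the amplitudes, norms, and function names are illustrative, not anything the FourCastNet or GenCast teams actually used.

```python
import numpy as np

def breed_vector(step_fn, x0, n_cycles=10, amplitude=1e-3, seed=0):
    """Grow one bred vector: advance a control and a perturbed state with the
    same deterministic step, rescaling their difference after every cycle."""
    rng = np.random.default_rng(seed)
    diff = rng.standard_normal(np.shape(x0))
    diff *= amplitude / np.linalg.norm(diff)
    control = np.array(x0, dtype=float)
    for _ in range(n_cycles):
        perturbed = step_fn(control + diff)   # advance the perturbed state
        control = step_fn(control)            # advance the control state
        diff = perturbed - control
        diff *= amplitude / np.linalg.norm(diff)  # rescale the grown difference
    return diff

def ensemble_initial_conditions(step_fn, analysis, n_pairs=4):
    """Ensemble initial states: the analysis plus/minus independent bred vectors."""
    members = []
    for i in range(n_pairs):
        bv = breed_vector(step_fn, analysis, seed=i)
        members.extend([analysis + bv, analysis - bv])
    return members
```

Each member is then integrated forward with the deterministic ML model, and the spread across members provides the uncertainty estimate that a single forecast lacks.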
Climate is not weather
This is all very impressive, but it should be made clear that all of these efforts are tackling an initial value problem (IVP) – i.e. given the situation at a specific time, they track the evolution of that state over a number of days. This class of problem is appropriate for weather forecasts and subseasonal-to-seasonal (S2S) predictions, but isn’t a good fit for climate projections – which are mostly boundary value problems (BVPs). The ‘boundary values’ important for climate are just the levels of greenhouse gases, solar irradiance, the Earth’s orbit, aerosol and reactive gas emissions, etc. Model systems that don’t track any of these climate drivers are simply not going to be able to predict the effect of changes in those drivers. To be specific, none of the systems mentioned so far have a climate sensitivity (of any type).
But why can’t we learn climate predictions in the same way? The problem with this idea is that we simply don’t have the appropriate training data set. For weather, we have 45 years of skillful predictions and validations, and, for the most part, new weather predictions are fully within sample. For climate, by contrast, we have a much shorter record of skillful prediction over a very small range of forcings, and what we want to predict (climate in 2050, 2100, etc.) is totally out of sample. Even conceptually simple targets, like the attribution of the climate anomalies over the last two years, are not approachable via FourCastNet or similar systems, since they don’t have an energy balance, aerosol inputs, or stratospheric water vapor – even indirectly.
What can we do instead?
A successful ML project requires a good training dataset, one that encompasses (more or less) the full range of inputs and outputs so that the ML predictions are within sample (no extrapolation). One can envisage a number of possibilities:
- Whole Model Emulation: This would involve learning from existing climate model simulations as a whole (which could encompass various kinds of ensembles). For instance, one could learn from a perturbed physics ensemble to find optimal parameter sets for a climate model (e.g. Elsaesser et al.), learn from scenario-based simulations to produce results for new scenarios (Watson-Parris et al. (2022); a minimal sketch of this case is given after this list), or learn from attribution simulations for the historical period to calculate the attributions based on different combinations or breakdowns of the inputs.
- Process-based Learning: Specific processes can be learned from detailed (and more accurate) process models – such as radiative transfer, convection, large eddy simulations, etc. – and then used within existing climate models to increase the speed of computation and reduce biases (e.g. Behrens et al.). The key here is to ensure that the full range of inputs is included in the training data.
- Complexity-based Learning: ML parameterizations drawn from more complete models (for instance with carbon cycles or interactive composition) can be implemented within simpler versions of the same model.
- Error-based Learning: One could use a nudged or data-assimilated model for the historical period, save the increments (or errors), learn those, and then apply them as an online correction in the future scenarios (sketched below) [I saw a paper this month proposing this, but I can’t find the reference – I’ll update if I find it]. Downscaling to station-level climate statistics with bias corrections would be another application of this.
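To make the scenario-emulation idea concrete, here is a deliberately minimal sketch in the spirit of ClimateBench (Watson-Parris et al., 2022): the inputs are the global forcings for each scenario-year, the targets are the corresponding model temperature maps, and a new scenario is then predicted from its forcings alone. The file names, array shapes, and the choice of a simple ridge regression are hypothetical placeholders, not what any particular group actually uses.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training archive from existing scenario runs:
# each row pairs the global forcings for one (scenario, year) with the
# flattened annual-mean surface temperature map the model produced.
X_train = np.load("forcings_train.npy")   # shape (n_samples, n_forcings)
Y_train = np.load("tas_maps_train.npy")   # shape (n_samples, n_gridpoints)

emulator = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
emulator.fit(X_train, Y_train)

# Apply to a scenario the physics-based model has never been run for
X_new = np.load("forcings_new_scenario.npy")   # (n_years, n_forcings)
tas_predicted = emulator.predict(X_new)        # (n_years, n_gridpoints)
```

In practice the published emulators typically use Gaussian processes, random forests, or neural networks and include spatially resolved forcings, but the workflow – train on the scenarios we have, predict the ones we don’t – is the same.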
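Similarly, the error-based idea can be sketched in a few lines: train a regression from the model state to the nudging increment archived during a data-assimilated historical run, then add the predicted increment to each model step as an online correction. Again, the file names and the choice of regressor are placeholders for illustration only, not what the (as yet unlocated) paper actually did.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical archive from a nudged historical run: the model state before
# each nudging step, and the increment the nudging applied to it.
states = np.load("nudged_run_states.npy")        # (n_steps, n_state)
increments = np.load("nudging_increments.npy")   # (n_steps, n_state)

correction = Ridge(alpha=1.0).fit(states, increments)

def corrected_step(step_fn, x):
    """One free-running model step followed by the learned online correction."""
    x_next = step_fn(x)
    return x_next + correction.predict(x_next[None, :])[0]
```

Whether such a correction, trained on the historical period, remains valid under future forcings is exactly the generalization question raised in the next paragraph.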
Each of these approaches has advantages, but each also comes with potential issues. Emulation of a whole model implies the emulation of that model’s biases. ML-based parameterizations have to work well for thousands of years of simulation, and thus need to be very stable – no random glitches or periodic blow-ups (harder than you might think). Bias corrections based on historical observations might not generalize correctly in the future. Nonetheless, all of these approaches are already showing positive results or are being heavily worked on.
Predictions are hard
The speed at which this area is growing is frankly mind-boggling – ML featured in a significant percentage of abstracts at the recent AGU meeting. Given the diversity of approaches and the number of people working on this, predictions of what is going to work best and be widely adopted are foolhardy. But I will hazard a few guesses:
- ML for tuning and calibration of climate models via perturbed physics ensembles is a no-brainer and multiple groups are already using this for their CMIP7 contributions.
- Similarly, the emulation of scenarios – based perhaps on new single forcing projections – will be in place before the official CMIP7 scenarios are available (in 2026/7?), and thus might alleviate the bottleneck caused by having to run all the scenarios through the physics-based models.
- Historical emulators will make it much easier to do new kinds of attribution analysis – via sector, country, and, intriguingly, fossil fuel company…
- I expect there will be a move to predict changes in the statistical properties of the climate (particularly the Climate Impact Drivers) at specific global warming levels, rather than predicting time series.
- Some ML-enhanced models will be submitted to the CMIP7 archive, but they will have pretty much the same spread in climate sensitivity as the non-ML-enhanced models, though they may have smaller biases. That is, I don’t think we will be able to constrain the feedbacks in ML-based parameterizations using present-day observations alone. Having said that, getting stable coupled models with ML-based components is not yet a solved problem. Similarly, a climate model made up of purely ML-based components but with physics-based constraints is still very much a work in progress.
One further point worth making is that the computational cost of these efforts is tiny compared to the cost of generative AI, and so there is not going to be (an ironic) growth of fossil-fueled data centers built just for this.
What I don’t think will happen
Despite a few claims made in the relevant papers or some press releases, the ML models based on the weather forecasts or reanalyses mentioned above will not magically become climate models – they don’t have the relevant inputs, and even if they were given them, there isn’t sufficient training data to constrain the impact those inputs would have if they changed.
Neither will generative AI come to the rescue and magically tell us how climate change will happen and be prevented – well, it will tell us, but it will either be the regurgitation of knowledge already understood, or simply made up. And at enormous cost [Please do not ask ChatGPT for anything technical, and certainly don’t bother asking for references***]. There are potential uses for this technology – converting casual requests into specific information demands and building the code on the fly to extract relevant data, for instance. But the notion that these tools will write better proposals, do real science, and write the ensuing papers is the stuff of nightmares – and were this to become commonplace, it would lead to the collapse of both the grant funding apparatus and scientific publishing. I expect science agencies to start requiring ‘no AI was used to write this content’ certifications, perhaps as soon as this year.
I guess that one might imagine a single effort learning from an all-encompassing data set – all the CMIP models, the km-scale models, the reanalyses, the observations, the paleo-climate data, with internal constraints based on physics etc. – literally all the knowledge we have, and indeed maybe that could work. I won’t hold my breath.
To summarise, most of the near-term results using ML will be in areas where ML allows us to tackle big-data-type problems more efficiently than we could before. This will lead to more skillful models, and perhaps better predictions, and will allow us to increase resolution and detail faster than expected. Real progress will not be as fast as some of the more breathless commentaries have suggested, but progress will be real.
Vive la evolution!
*To get a sense of the history, it’s interesting to read the assessment of AI research in the early 1970s by Sir James Lighthill** – it was pretty damning, and pointed out the huge gap between promise and actuality at that time. Progress has been enormous since then (for instance in machine translation), mostly based on pattern recognition drawn from large datasets rather than explicitly coded rules – an approach that needed huge increases in computer power to realize.
**As an aside, I knew Sir James briefly when I was doing my PhD. He was notorious for sleeping through seminars, often snoring loudly, and then asking very astute questions at the end – a skill I still aspire to.
***I’ve had a number of people emailing me for input, advice, etc. who introduce themselves by saying that a paper I wrote (which simply doesn’t exist) was very influential. Please don’t do that.
References
- R. Lam, A. Sanchez-Gonzalez, M. Willson, P. Wirnsberger, M. Fortunato, F. Alet, S. Ravuri, T. Ewalds, Z. Eaton-Rosen, W. Hu, A. Merose, S. Hoyer, G. Holland, O. Vinyals, J. Stott, A. Pritzel, S. Mohamed, and P. Battaglia, "Learning skillful medium-range global weather forecasting", Science, vol. 382, pp. 1416-1421, 2023. http://dx.doi.org/10.1126/science.adi2336
- D. Kochkov, J. Yuval, I. Langmore, P. Norgaard, J. Smith, G. Mooers, M. Klöwer, J. Lottes, S. Rasp, P. Düben, S. Hatfield, P. Battaglia, A. Sanchez-Gonzalez, M. Willson, M.P. Brenner, and S. Hoyer, "Neural general circulation models for weather and climate", Nature, vol. 632, pp. 1060-1066, 2024. http://dx.doi.org/10.1038/s41586-024-07744-y
- I. Price, A. Sanchez-Gonzalez, F. Alet, T.R. Andersson, A. El-Kadi, D. Masters, T. Ewalds, J. Stott, S. Mohamed, P. Battaglia, R. Lam, and M. Willson, "Probabilistic weather forecasting with machine learning", Nature, vol. 637, pp. 84-90, 2024. http://dx.doi.org/10.1038/s41586-024-08252-9
- G. Elsaesser, M.V. Walqui, Q. Yang, M. Kelley, A.S. Ackerman, A. Fridlind, G. Cesana, G.A. Schmidt, J. Wu, A. Behrangi, S.J. Camargo, B. De, K. Inoue, N. Leitmann-Niimi, and J.D. Strong, "Using Machine Learning to Generate a GISS ModelE Calibrated Physics Ensemble (CPE)", 2024. http://dx.doi.org/10.22541/essoar.172745119.96698579/v1
- D. Watson‐Parris, Y. Rao, D. Olivié, Ø. Seland, P. Nowack, G. Camps‐Valls, P. Stier, S. Bouabid, M. Dewey, E. Fons, J. Gonzalez, P. Harder, K. Jeggle, J. Lenhardt, P. Manshausen, M. Novitasari, L. Ricard, and C. Roesch, "ClimateBench v1.0: A Benchmark for Data‐Driven Climate Projections", Journal of Advances in Modeling Earth Systems, vol. 14, 2022. http://dx.doi.org/10.1029/2021MS002954