Climate Science

Release of the International Surface Temperature Initiative’s (ISTI’s) Global Land Surface Databank, an expanded set of fundamental surface temperature records

6 Jul 2014 by rasmus

Guest post by Jared Rennie, Cooperative Institute for Climate and Satellites, North Carolina, on behalf of the databank working group of the International Surface Temperature Initiative

In the 21st Century, when multi-billion dollar decisions are being made to mitigate and adapt to climate change, society rightly expects openness and transparency in climate science to enable a greater understanding of how climate has changed and how it will continue to change. Arguably the very foundation of our understanding is the observational record. Today a new set of fundamental holdings of land surface air temperature records stretching back deep into the 19th Century has been released as a result of several years of effort by a multinational group of scientists.

The International Surface Temperature Initiative (ISTI) was launched by an international and multi-disciplinary group of scientists in 2010 to improve understanding of the Earth’s climate from the global to local scale. The Databank Working Group, under the leadership of NOAA’s National Climatic Data Center (NCDC), has produced an innovative data holding that builds largely on existing data sources, but also incorporates many previously unavailable sources of surface air temperature. This data holding gives users a way to better track the origin of the data from its collection through its integration. By providing the data in the various stages that lead to the integrated product, by including data origin tracking flags with information on each observation, and by providing the software used to process all observations, the processes involved in creating the observed fundamental climate record are made open and transparent to the extent humanly possible.

Databank Architecture

The databank includes six data Stages, running from the original observation to the final quality-controlled and bias-corrected product (Figure 1). The databank begins with Stage Zero holdings, which contain digital images of observations in their original form. These images are hosted on the databank server when third-party hosting is not possible. Stage One contains digitized data, in its native format, as provided by the contributor. Contributors are not required to convert the data into any other format, which reduces the possibility of errors being introduced during translation. We collated over 50 sources, ranging from single station records to holdings of several tens of thousands of stations.

Once data are submitted as Stage One, all data are converted into a common Stage Two format. In addition, data provenance flags are added to every observation to provide a history of that particular observation. Stage Two files are maintained in ASCII format, and the code to convert all the sources is provided. After collection and conversion to a common format, the data are merged into a single, comprehensive Stage Three dataset. The algorithm that performs the merging is described below. Development of the merged dataset is followed by quality control and homogeneity adjustments (Stages Four and Five, respectively). These last two stages are not the responsibility of the Databank Working Group; see the discussion of the broader context below.

Merge Algorithm Description

The following is an overview of the process in which individual Stage Two sources are combined to form a comprehensive Stage Three dataset. A more detailed description can be found in a manuscript accepted and published by Geoscience Data Journal (Rennie et al., 2014).

The algorithm attempts to mimic the decisions an expert analyst would make manually. Given the fractured nature of historical data stewardship, many sources will inevitably contain records for the same station. It is therefore necessary to create a process for identifying and removing duplicate stations, for merging some sources to produce a longer station record, and for determining when a station should instead be brought in as a new distinct record.

The merge process is accomplished in an iterative fashion, starting from the highest priority data source (target) and running progressively through the other sources (candidates). A source hierarchy has been established which prioritizes datasets that have better data provenance, extensive metadata, and long, consistent periods of record. In addition, it prioritizes holdings derived from daily data to allow consistency between daily holdings and monthly holdings. Every candidate station read in is compared to all target stations, and one of three possible decisions is made. First, when a station match is found, the candidate station is merged with the target station. Second, if the candidate station is determined to be unique, it is added to the target dataset as a new station. Third, if the available information is insufficient, conflicting, or ambiguous, the candidate station is withheld.
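The iterative structure described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the databank's actual code: the station layout (a dict with a name and a data table keyed by year and month), the toy_decide function, and the rule that a merge keeps the target's value wherever both records have one are all assumptions made for this example.

```python
MERGE, UNIQUE, WITHHOLD = "merge", "unique", "withhold"


def toy_decide(target, cand):
    """Hypothetical stand-in for the real metadata and data comparisons."""
    if target["name"] == cand["name"]:
        return MERGE
    if target["name"][:3] == cand["name"][:3]:
        return WITHHOLD  # similar but not identical names: treated as ambiguous
    return UNIQUE


def merge_sources(sources, decide=toy_decide):
    """Merge sources ordered from highest to lowest priority.

    Each source is a list of stations; each station is a dict with a
    "name" and a "data" dict mapping (year, month) to a temperature.
    """
    # The highest priority source becomes the initial target dataset.
    target = [{"name": s["name"], "data": dict(s["data"])} for s in sources[0]]
    withheld = []
    for candidates in sources[1:]:
        for cand in candidates:
            # Compare the candidate against every station already in the target.
            verdicts = [decide(t, cand) for t in target]
            if MERGE in verdicts:
                match = target[verdicts.index(MERGE)]
                for key, value in cand["data"].items():
                    # Assumption: the higher priority target value wins on overlap.
                    match["data"].setdefault(key, value)
            elif all(v == UNIQUE for v in verdicts):
                target.append({"name": cand["name"], "data": dict(cand["data"])})
            else:
                withheld.append(cand["name"])
    return target, withheld


sources = [
    [{"name": "OSLO", "data": {(1900, 1): -4.0}}],
    [
        {"name": "OSLO", "data": {(1900, 1): -3.5, (1900, 2): -3.0}},
        {"name": "OSL BLINDERN", "data": {(1950, 1): -2.0}},
        {"name": "BERGEN", "data": {(1900, 1): 1.0}},
    ],
]
merged, withheld = merge_sources(sources)
print([s["name"] for s in merged], withheld)
```

In this toy run the second OSLO record extends the first, BERGEN is added as a new station, and OSL BLINDERN is withheld because the stand-in comparison cannot decide.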

Stations are first compared through their metadata to identify matching stations. Four tests are applied: geographic distance, height distance, station name similarity, and when the data record began. Non-missing metrics are then combined to create a metadata metric and it is determined whether to move on to data comparisons, or to withhold the candidate station. If a data comparison is deemed necessary, overlapping data between the target and candidate station is tested for goodness-of-fit using the Index of Agreement (IA). At least five years of overlap are required for a comparison to be made. A lookup table is used to provide two data metrics, the probability of station match (H1) and the probability of station uniqueness (H2). These are then combined with the metadata metric to create posterior metrics of station match and uniqueness. These are used to determine if the station is merged, added as unique, or withheld.
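For readers unfamiliar with the Index of Agreement, the sketch below computes it for two monthly series over their common months. It assumes the classic Willmott form of the index, with the target station treated as the reference series, and it reads the five-year overlap requirement as at least 60 shared monthly values; the databank code may define either detail differently. The function name and the data layout are hypothetical.

```python
import math


def index_of_agreement(target, candidate, min_overlap=60):
    """Willmott-style Index of Agreement over the overlapping months.

    target and candidate map (year, month) to a temperature value.
    Returns None when fewer than min_overlap months overlap
    (assumption: five years is read as 60 monthly values).
    """
    keys = sorted(set(target) & set(candidate))
    if len(keys) < min_overlap:
        return None
    ref = [target[k] for k in keys]
    est = [candidate[k] for k in keys]
    ref_mean = sum(ref) / len(ref)
    # Sum of squared differences between the two series.
    num = sum((e - r) ** 2 for r, e in zip(ref, est))
    # "Potential error": how far both series can stray from the reference mean.
    den = sum((abs(e - ref_mean) + abs(r - ref_mean)) ** 2 for r, e in zip(ref, est))
    if den == 0.0:
        return 1.0  # both series are constant and identical
    return 1.0 - num / den


# Six years of synthetic monthly values, purely for illustration.
series = {(1900 + i // 12, i % 12 + 1): 10.0 * math.sin(i) for i in range(72)}
shifted = {k: v + 2.0 for k, v in series.items()}
short = {k: v for k, v in series.items() if k[0] == 1900}

ia_same = index_of_agreement(series, series)
ia_shifted = index_of_agreement(series, shifted)
ia_short = index_of_agreement(series, short)
print(ia_same, ia_shifted, ia_short)
```

An index of 1 means the overlapping values agree perfectly, and values closer to 0 mean poorer agreement. How the index is translated into the H1 and H2 probabilities depends on the lookup table, which is not reproduced here.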

Stage Three Dataset Description

The integrated data holding recommended and endorsed by ISTI contains over 32,000 global stations (Figure 2), over four times as many stations as GHCN-M version 3. Although station coverage varies spatially and temporally, there are adequate stations with decadal and century periods of record at local, regional, and global scales. Since 1850, there are consistently more stations in the recommended merge than in GHCN-M (Figure 3). In GHCN-M version 3, the number of stations drops significantly in 1990, reflecting its dependency on the decadal World Weather Records collection as a source. Many of the new sources ameliorate this drop because they can be updated much more rapidly, which will also enable better real-time monitoring.

Many thresholds are used in the merge and can be set by the user before running the merge program. Changing these thresholds can significantly alter the overall result of the program. Changes will also occur when the source priority hierarchy is altered. In order to characterize the uncertainty associated with the merge parameters, seven different variants of the Stage Three product were developed alongside the recommended merge. This uncertainty reflects the importance of data rescue. While a major effort has been undertaken through this initiative, more can be done to include areas that are lacking on both spatial and temporal scales, or lacking maximum and minimum temperature data.

Data Access

Version 1.0.0 of the Global Land Surface Databank has been released and data are provided from a primary ftp site hosted by the Global Observing Systems Information Center (GOSIC) and World Data Center A at NOAA NCDC. The Stage Three dataset has multiple formats, including a format approved by ISTI, a format similar to GHCN-M, and netCDF files adhering to the Climate and Forecast (CF) convention. The data holding is version controlled and will be updated frequently in response to newly discovered data sources and user comments.

All processing code is provided, for openness and transparency. Users are encouraged to experiment with the techniques used in these algorithms. The programs are designed to be modular, so that individuals have the option to develop and implement other methods that may be more robust than described here. We will remain open to releases of new versions should such techniques be constructed and verified.

ISTI’s online directory provides further details on the merging process and other aspects associated with the full development of the databank as well as all of the data and processing code.

We are always looking to increase the completeness and provenance of the holdings. Data submissions are always welcome and strongly encouraged. If you have a lead on a new data source, please contact data.submission@surfacetemperatures.org with any information which may be useful.

The broader context

It is important to stress that the databank is a release of fundamental data holdings – holdings which contain myriad non-climatic artefacts arising from instrument changes, siting changes, time of observation changes etc. To gain maximum value from these improved holdings it is imperative that as a global community we now analyze them in multiple distinct ways to ascertain better estimates of the true evolution of surface temperatures locally, regionally, and globally. Interested analysts are strongly encouraged to develop innovative approaches to the problem.

To help ascertain what works and what doesn’t, the benchmarking working group are developing and will soon release a set of analogs to the databank. These will share the space and time sampling of the holdings but contain a set of data issues, known to the originators, that need to be removed. When analysts apply their methods to the analogs, we can infer something meaningful about those methods. Further details are available in a discussion paper under peer review [Willett et al., submitted].

More Information

www.surfacetemperatures.org
ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank

References
Rennie, J.J. and coauthors, 2014, The International Surface Temperature Initiative Global Land Surface Databank: Monthly Temperature Data Version 1 Release Description and Methods. Accepted, Geoscience Data Journal.

Willett, K. M. et al., submitted, Concepts for benchmarking of homogenisation algorithm performance on the global scale. http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.html

Unforced variations: July 2014

2 Jul 2014 by group

This month’s open thread. Topics of potential interest: The successful OCO-2 launch, continuing likelihood of an El Niño event this fall, predictions of the September Arctic sea ice minimum, Antarctic sea ice excursions, stochastic elements in climate models etc. Just for a change, no discussion of mitigation efforts please!

Unforced variations: June 2014

1 Jun 2014 by group

June is the month when the Arctic Sea Ice outlook gets going, when the EPA releases its rules on power plant CO2 emissions, and when, hopefully, commenters can get back to actually having constructive and respectful conversations about climate science (and not nuclear energy, impending apocalypsi (pl) or how terrible everyone else is). Thanks.

El Niño or Bust

8 May 2014 by mike

Guest commentary from Michelle L’Heureux, NOAA Climate Prediction Center

Much media attention has been directed at the possibility of an El Niño brewing this year. Many outlets have drawn comparison with the 1997-98 super El Niño. So, what are the odds that El Niño will occur? And if it does, how strong will it be?

To track El Niño, meteorologists at the NOAA/NWS Climate Prediction Center (CPC) release weekly and monthly updates on the status of the El Niño-Southern Oscillation (ENSO). The International Research Institute (IRI) for Climate and Society partner with us on the monthly ENSO release and are also collaborators on a brand new “ENSO blog” which is part of www.climate.gov (co-sponsored by the NOAA Climate Programs Office).

Blogging ENSO is a first for operational ENSO forecasters, and we hope that it gives us another way to both inform and interact with our users on ENSO predictions and impacts. In addition, we will collaborate with other scientists to profile interesting ENSO research and delve into the societal dimensions of ENSO.

As far back as November 2013, the CPC and the IRI have predicted an elevated chance of El Niño (relative to historical chance or climatology) based on a combination of model predictions and general trends over the tropical Pacific Ocean. Once the chance of El Niño reached 50% in March 2014, an El Niño Watch was issued to alert the public that conditions are more favorable for the development of El Niño.

Current forecasts for the Nino-3.4 SST index (as of 5 May 2014) from the NCEP Climate Forecast System version 2 model

More recently, on May 8th, the CPC/IRI ENSO team increased the chance that El Niño will develop, with a peak probability of ~80% during the late fall/early winter of this year. El Niño onset is currently favored sometime in the early summer (May-June-July). At this point, the team remains non-committal on the possible strength of El Niño, preferring to watch the system for at least another month before trying to infer the intensity. But could we get a super strong event? The range of possibilities implied by some models alludes to such an outcome, but at this point the uncertainty is just too high. While subsurface heat content levels are well above average (March was the highest for that month since 1979 and April was the second highest), ENSO prediction relies on many other variables and factors. We also remain in the spring prediction barrier, which is a more uncertain time to be making ENSO predictions.

Could El Niño predictions fizzle? Yes, there is roughly a 2 in 10 chance at this point that this could happen. It happened in 2012, when an El Niño Watch was issued, chances became as high as 75%, and El Niño never formed. Such is the nature of seasonal climate forecasting: there is enough forecast uncertainty that “busts” can and do occur. In fact, more strictly, if the forecast probabilities are “reliable,” an event with an 80% chance of occurring should only occur 80% of the time over a long historical record. Therefore, 20% of the time the event must NOT occur (see a description of verification techniques for details).
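The idea of “reliable” probabilities can be made concrete with a small check: group past forecasts by the probability that was issued and see how often the event then occurred. The sketch below uses a made-up forecast history purely for illustration. It is not CPC/IRI data or their verification code, and the function name is hypothetical.

```python
from collections import defaultdict


def reliability_table(forecasts):
    """Map each issued probability to the observed frequency of the event.

    forecasts is a list of (issued_probability, event_occurred) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # probability -> [events, forecasts]
    for prob, occurred in forecasts:
        counts[prob][0] += int(occurred)
        counts[prob][1] += 1
    return {p: hits / total for p, (hits, total) in sorted(counts.items())}


# Invented history: ten 80% forecasts (eight verified) and six 50% forecasts
# (three verified). These numbers are illustrative only.
history = (
    [(0.8, True)] * 8 + [(0.8, False)] * 2
    + [(0.5, True)] * 3 + [(0.5, False)] * 3
)
table = reliability_table(history)
print(table)
```

For a reliable forecast system the observed frequency in each group sits close to the issued probability, so two “busts” out of ten 80% forecasts is exactly what should be expected.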

While folks might prefer total certainty in our forecasts, we live in an uncertain world. El Niño is most likely to occur this year, so please stay attentive to the various updates linked above and please visit our brand new ENSO blog.

Unforced variations: May 2014

2 May 2014 by group

This month’s open thread. In order to give everyone a break, no discussion of mitigation options this month – that has been done to death in previous threads. Anything related to climate science is totally fine: Carbon dioxide levels maybe, or TED talks perhaps…

Faking it

30 Apr 2014 by Gavin

Every so often contrarians post old newspaper quotes with the implication that nothing being talked about now is unprecedented or even unusual. And frankly, there are lots of old articles that get things wrong, are sensationalist or make predictions without a solid basis. And those are just the articles about the economy.

However, there are plenty of science articles that are just interesting, reporting events and explorations in the Arctic and elsewhere that give a fascinating view into how early scientists were coming to an understanding about climate change and processes. In particular, in the Atlantic sector of the Arctic the summer of 1922 was (for the time) quite warm, and there were a number of reports that discussed some unprecedented (again, for the time) observations of open water. The most detailed report was in the Monthly Weather Review:

Nenana Ice Classic: Update

25 Apr 2014 by Gavin

Somewhat randomly, my thoughts turned to the Nenana Ice Classic this evening, only to find that the ice break up had only just occurred (3:48 pm Alaska Standard Time, April 25). This is quite early (the 7th earliest date, regardless of details associated with the vernal equinox or leap year issues), though perhaps unsurprising after the warm Alaskan winter this year (8th warmest on record). This is in strong contrast to the very late break up last year.

Break up dates accounting for leap years and variations in the vernal equinox.

As mentioned in my recent post, the Nenana break up date is a good indicator of Alaskan regional temperatures and, despite last year’s late anomaly, the trends are very much towards an earlier spring. This is also true for trends in temperatures and ice break up almost everywhere else too, despite individual years (like 2013/2014) being anomalously cold (for instance in the Great Lakes region). As we’ve often stressed, it is the trends that are important for judging climate change, not the individual years. Nonetheless, the odds of dates as early as this year’s have more than doubled over the last century.

Labels for climate data

24 Apr 2014 by rasmus

“These results are quite strange”, my colleague told me. He had analysed some of the recent climate model results from an experiment known by the cryptic name ‘CMIP5’. It turned out that the results were ok, but we had made an error when reading and processing the model output. The particular climate model that initially gave the strange results had used a different calendar set-up to the previous models we had examined.

Mitigation of Climate Change – Part 3 of the new IPCC report

17 Apr 2014 by Stefan

Guest post by Brigitte Knopf

Global emissions continue to rise further and this is in the first place due to economic growth and to a lesser extent to population growth. To achieve climate protection, fossil power generation without CCS has to be phased out almost entirely by the end of the century. The mitigation of climate change constitutes a major technological and institutional challenge. But: It does not cost the world to save the planet.

This is how the new report was summarized by Ottmar Edenhofer, Co-Chair of Working Group III of the IPCC, whose report was adopted on 12 April 2014 in Berlin after intense debates with governments. The report consists of 16 chapters with more than 2000 pages. It was written by 235 authors from 58 countries and reviewed externally by 900 experts. Most prominent in public is the 33-page Summary for Policymakers (SPM) that was approved by all 193 countries. At first glance, the above summary does not sound spectacular but more like a truism that we’ve often heard over the years. But this report indeed has something new to offer.

The 2-degree limit

Shindell: On constraining the Transient Climate Response

8 Apr 2014 by group

Guest commentary from Drew Shindell

There has been a lot of discussion of my recent paper in Nature Climate Change (Shindell, 2014). That study addressed a puzzle, namely that recent studies using the observed changes in Earth’s surface temperature suggested climate sensitivity is likely towards the lower end of the estimated range. However, studies evaluating model performance on key observed processes and paleoclimate evidence suggest that the higher end of sensitivity is more likely, partially conflicting with the studies based on the recent transient observed warming. The new study shows that climate sensitivity to historical changes in the abundance of aerosol particles in the atmosphere is larger than the sensitivity to CO₂, primarily because the aerosols are largely located near industrialized areas in the Northern Hemisphere middle and high latitudes where they trigger more rapid land responses and strong snow & ice feedbacks. Therefore studies based on observed warming have underestimated climate sensitivity as they did not account for the greater response to aerosol forcing, and multiple lines of evidence are now consistent in showing that climate sensitivity is in fact very unlikely to be at the low end of the range in recent estimates.

References

D.T. Shindell, "Inhomogeneous forcing and transient climate sensitivity", Nature Climate Change, vol. 4, pp. 274-277, 2014. http://dx.doi.org/10.1038/nclimate2136
