Tuesday, August 20, 2013

Benchmarking and assessment workshop

Cross-posted from the benchmarking blog.

The workshop agenda and full report can be found here.

Below is the executive summary.

Benchmarking Working Group Workshop Report – Executive Summary
1st – 3rd July 2013
National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA), Asheville, NC, USA

Attended in person:
Kate Willett (UK), Matt Menne (USA), Claude Williams (USA), Robert Lund (USA), Enric Aguilar (Spain), Colin Gallagher (USA), Zeke Hausfather (USA), Peter Thorne (USA), Jared Rennie (USA)

Attended by phone:
Ian Jolliffe (UK), Lisa Alexander (Australia), Stefan Brönnimann (Switzerland), Lucie A. Vincent (Canada), Victor Venema (Germany), Renate Auchmann (Switzerland), Thordis Thorarinsdottir (Norway), Robert Dunn (UK), David Parker (UK)

A three-day workshop was held to bring together some members of the ISTI Benchmarking working group, with the aim of making significant progress towards the creation and dissemination of a homogenisation algorithm benchmark system. Specifically, we hoped to have the method for creating the analog-clean-worlds finalised; the error-model worlds defined, with a plan for developing them; and the concepts for assessment finalised, including a decision on what data and statistics to ask users to return. This was an ambitious plan for three days, with numerous issues and big decisions still to be tackled.

The complexity of much of the discussion throughout the three days really highlighted the value of this face-to-face meeting. It was important to take the time to ensure that everyone understood the issues and had come to the same conclusions. This was aided by whiteboard illustrations and software exploration, which would not have been possible over a teleconference.

In overview, we made significant progress in developing and converging on concepts and important decisions. We did not complete the work of Team Creation as hoped, but necessary exploration of the existing methods was undertaken, revealing significant weaknesses, and ideas for new avenues to explore were identified.

The blind and open error-worlds concepts are 95% complete, and progress was made on the specifics of the changepoint statistics for each world. Important decisions were also made regarding missing data, length of record and changepoint location frequency. Seasonal cycles were discussed at length and further research has been actioned. A first attempt was made at designing a build methodology for the error-models, with some coding examples worked through and different probability distributions explored.
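To make the build-methodology idea concrete, here is a minimal Python sketch of the kind of probability model discussed: changepoint waiting times drawn from a geometric distribution and step-change magnitudes drawn from a normal distribution. The specific distributions, parameter values and function names are illustrative assumptions, not the working group's final choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def allocate_changepoints(n_months, p_change=1.0 / 120.0, shift_sd=0.8):
    """Place changepoints with geometric waiting times (here averaging one per
    ten years of monthly data) and draw a step-change size in degrees C for
    each from a normal distribution. Both distributions are assumptions."""
    times, t = [], 0
    while True:
        t += rng.geometric(p_change)   # months until the next changepoint
        if t >= n_months:
            break
        times.append(t)
    shifts = rng.normal(loc=0.0, scale=shift_sd, size=len(times))
    return list(zip(times, shifts))

def corrupt_series(clean_series, changepoints):
    """Apply each step change from its changepoint onwards to a clean series."""
    corrupted = np.array(clean_series, dtype=float)
    for t, shift in changepoints:
        corrupted[t:] += shift
    return corrupted

# Example: corrupt 50 years of monthly anomalies from a clean world
clean = rng.normal(0.0, 0.5, size=600)
breaks = allocate_changepoints(len(clean))
corrupted = corrupt_series(clean, breaks)
```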

We converged on what we would like to receive from benchmark users for the assessment, and worked through some examples of aggregating station results over regions. We will assess the retrieval of trends and climate characteristics in addition to the ability to detect changepoints. Contingency tables of some form will also be used. We also hope to make online tools or assessment software available so that users can make their own assessment of the open worlds and past versions of the benchmarks. We plan to collaborate with the VALUE downscaling validation project where possible.
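As an illustration of the kind of contingency-table assessment mentioned above, the following Python sketch counts hits, misses and false alarms for detected changepoints against the known (open-world) truth, and derives a probability of detection and a false alarm ratio. The matching rule, the 12-month tolerance and the chosen scores are assumptions for illustration only; the exact form of the tables has not yet been fixed.

```python
def changepoint_contingency(true_times, detected_times, tolerance=12):
    """Count hits, misses and false alarms: a detection within `tolerance`
    months of a true changepoint is a hit; unmatched truths are misses and
    unmatched detections are false alarms (matching rule is an assumption)."""
    unmatched = list(detected_times)
    hits = misses = 0
    for t in true_times:
        match = next((d for d in unmatched if abs(d - t) <= tolerance), None)
        if match is None:
            misses += 1
        else:
            hits += 1
            unmatched.remove(match)
    false_alarms = len(unmatched)
    pod = hits / (hits + misses) if (hits + misses) else float("nan")  # probability of detection
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")  # false alarm ratio
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms,
            "POD": pod, "FAR": far}

# Example: detections at months 118, 250, 483 and 590 versus true breaks at 120, 310 and 480
print(changepoint_contingency([120, 310, 480], [118, 250, 483, 590]))
```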

From an intense three days, all in-person and teleconference participants gained a better understanding of what we are trying to achieve and how we are going to get there. This was a highly valuable three days, not least through its effect of focussing our attention prior to the meeting and motivating further collaborative work after it. Two new members have agreed to join the effort, and their expertise is a fantastic contribution to the project.

Specifically, Kate and Robert are to work on their respective methods for Team Creation, utilising GCM data and the vector autoregressive method. This will result in a publication describing the methodology. We aim to finalise this work in August.
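For readers unfamiliar with the vector autoregressive approach, here is a minimal Python sketch of a first-order VAR process generating temperature-like anomaly series for a small network of spatially correlated stations. In practice the coefficient matrix and noise covariance would be estimated from GCM output or homogeneous observations; the values below are hand-picked for illustration and do not represent the method Kate and Robert are developing.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(A, noise_cov, n_steps, burn_in=100):
    """Simulate x_t = A @ x_{t-1} + e_t with multivariate normal noise.
    Each column of the output is one station's anomaly series."""
    n_stations = A.shape[0]
    x = np.zeros(n_stations)
    out = np.empty((n_steps, n_stations))
    for t in range(burn_in + n_steps):
        x = A @ x + rng.multivariate_normal(np.zeros(n_stations), noise_cov)
        if t >= burn_in:
            out[t - burn_in] = x
    return out

# Three neighbouring stations with modest persistence and cross-correlation
# (illustrative values; a real application would fit these to GCM or station data)
A = np.array([[0.5, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.1, 0.5]])
noise_cov = 0.25 * np.eye(3) + 0.15 * (np.ones((3, 3)) - np.eye(3))
clean_world = simulate_var1(A, noise_cov, n_steps=600)  # 50 years of monthly anomalies
```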

In follow-on teleconferences, Team Corruption will focus on completing the distribution specifications and building the probability model to allocate station changepoints. This work is planned for completion by October 2013. Release of the benchmarks is scheduled for November 2013.


Team Validation will continue to develop the specific assessment tests and work these into a software package that can be easily implemented. This work is hoped to be completed by December 2013, but there is more time available, as assessment will take place at least one year after the benchmark release.