After reading about Steve Goddard’s Snowjob, I thought I’d take a much-needed break from my current research topic (time-of-observation bias) to take a look at North American Snow Cover Extent (NA-SCE). One of Goddard’s posts discussed results from Frei and Gong 2005 (FG hereafter). This paper compared observed NA-SCE to AR4 climate model simulation results. Because different models have different spatial resolutions, the total North American land area (km2) differs from model to model, so the authors normalized the snow cover relative to the total land area of North America. NA-SCE was available for only 11 models when the paper was published; I checked CMIP3 and there are now 16 models with 59 runs. FG defined NA-SCE as the fraction of land area bounded by 190°-340° E and 20°-90° N covered with snow. By this definition, Greenland is included. Figure 1 shows the exact spatial extent.
Figure 1. North American continental land mass plus Greenland. The legend is the fraction of the gridcell that contains land. The figure is based on a land mask created by Carl Mears at Remote Sensing Systems (RSS).
To process the original gridded data, the first step is to calculate the normalized area weights for a given model’s grid. These weights are then multiplied by the model’s land mask to zero out the grid points in the ocean. Unfortunately, MRI CGCM 2.3.2a didn’t have a land mask available, so I created one by interpolating the land mask shown in figure 1 to MRI’s spatial resolution. The next step is to take the subset of the masked area weights within the defined latitude/longitude bounds and sum all the values. This sum represents the total land area of North America, including Greenland, as a fraction of the Earth’s surface area. To calculate the fractional NA-SCE, I loaded the gridded data one time step at a time, zeroed out the missing values, multiplied by the land mask and the area weights, and summed over the defined latitude/longitude extent. The resulting sum represents the fraction of the Earth’s surface covered in snow. After processing all the time steps, I divided the resulting time series by the total fractional land area to convert the units into fractional land area coverage of North America + Greenland. To check how well I emulated the process, I consulted a CSV file posted by Zeke Hausfather, which he obtained from Frei, containing January NA-SCE for nine models running 20C3M/A1B. I took the difference between my series and Frei’s and found pretty good agreement, to an extent. Figure 2 shows a comparison with GISS AOM.
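In code, the per-time-step calculation looks something like the following. This is a minimal numpy sketch, not my actual program (the real processing was done in R and Fortran 90), and the array names are made up for illustration:

```python
import numpy as np

def na_sce_fraction(snow_frac, land_mask, area_wts, lats, lons):
    """Snow-covered fraction of NA + Greenland land for one time step.

    snow_frac : 2-D (nlat, nlon) snow-cover fraction, NaN where missing
    land_mask : 2-D fractional land cover (0-1)
    area_wts  : 2-D normalized area weights (summing to 1 over the globe)
    lats, lons: 1-D gridcell-center coordinates in degrees (lons on 0-360)
    """
    # FG's box: 190-340 E, 20-90 N (includes Greenland).
    in_box = ((lats[:, None] >= 20) & (lats[:, None] <= 90) &
              (lons[None, :] >= 190) & (lons[None, :] <= 340))
    # Zero out missing values, then weight by land fraction and cell area.
    snow = np.nan_to_num(snow_frac) * land_mask * area_wts
    land = land_mask * area_wts
    # Numerator: fraction of Earth's surface that is snow-covered NA land.
    # Denominator: fraction of Earth's surface that is NA land at all.
    return snow[in_box].sum() / land[in_box].sum()
```

The denominator is the “total fractional land area” computed once per model; dividing by it is what converts the series into fractional coverage of North America + Greenland.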
Figure 2. Comparison of GISS AOM over the 20C3M and A1B periods.
My numbers underestimate the snow extent relative to Frei/Gong’s by about 4%. Shortly into the A1B period, there is significant divergence. The year-to-year fluctuations in January snow extent are similar. I don’t see any obvious reason why both calculations would show similar variability, yet differ in trend during the A1B period. The fact that the divergent segment is declining means that my series is declining faster than Frei/Gong’s, which makes me think that their post-20C3M series isn’t running A1B. The next model is GISS EH (figure 3).
Figure 3. Comparison of GISS EH over the 20C3M and A1B periods.
The year-to-year variations are somewhat similar, but too different for comfort. Looking at FG 2005, I see that they used 5 20C3M runs and 4 A1B runs. The data I got from CMIP3 indicates that there are only 3 A1B runs; the folder marked run4 is empty. I checked the CMIP3 errata page and found no indication that GISS withdrew the data. I began to think that they didn’t combine the 20C3M-A1B runs the same way that I did. I combined them by only using 20C3M runs that have a corresponding A1B run. To see if I could improve the comparison, I took the average of the 20C3M runs and the average of the A1B runs and concatenated them. Figure 4 shows the resulting time series.
Figure 4. Comparison of the average of all (5) GISS EH 20C3M runs + the average of all (3) A1B runs.
Now the year-to-year variations are much more similar. My well-trained eye noticed that one series appeared to lag behind the other. Figure 5 shows the resulting differences when I lag my series by one year (one unit, since the data represent January values).
Figure 5. Comparison of the average of all (5) GISS EH 20C3M runs + the average of all (3) A1B runs lagged by one year.
Now the comparison has greatly improved. The difference is still about 2%, but the error has much less variance. I checked the CSV file and found FG’s GISS EH figures begin in 1881 when the series actually starts in 1880. This issue of FG’s data beginning one year later than it should will be further discussed when we get to the NCAR model. As an aside, FG’s way of combining ensemble means from two different scenarios with different numbers of runs is a bit problematic. If you have more runs for 1880-1999 than for 2000-2099, then you can expect a change in variance at the join: the more runs that go into an ensemble mean, the more noise gets smoothed out, and the fewer runs, the less noise gets smoothed out. We find the same issue of creating an ensemble mean with dissimilar numbers of runs with GISS ER (figure 6).
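The variance point is just the usual σ²/m scaling for the mean of m independent runs. A quick numerical check (Python sketch, with synthetic white noise standing in for each run’s internal variability):

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps = 100_000  # long synthetic record so the sample variances converge

# Treat each run's internal variability as independent unit-variance noise.
# The ensemble mean of m runs then has variance 1/m, so a 5-run mean
# (20C3M) is visibly smoother than a 3-run mean (A1B), and a spliced
# series changes character at the scenario boundary.
var_5run_mean = rng.normal(size=(5, n_steps)).mean(axis=0).var()
var_3run_mean = rng.normal(size=(3, n_steps)).mean(axis=0).var()
```

Here `var_5run_mean` comes out near 1/5 and `var_3run_mean` near 1/3, i.e. the A1B segment of the spliced mean retains roughly two-thirds more noise variance.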
Figure 6. Comparison of GISS ER over the 20C3M and A1B periods.
The difference looks similar to GISS EH. Now let’s look at the difference when we combine the ensemble mean of 20C3M with the ensemble mean of A1B.
Figure 7. Comparison of the average of all (5) GISS ER 20C3M runs + the average of all (3) A1B runs.
We’re not getting the same level of agreement over the whole period as we did for GISS EH. The A1B divergence makes me think that the FG data does not represent the A1B scenario. Figure 8 shows the comparison with IAP FGOALS.
Figure 8. Comparison of IAP FGOALS over the 20C3M and A1B periods.
The agreement is superb with an error of less than 1%! Figure 9 shows the comparison with INM CM 3.
Figure 9. Comparison of INM CM3 over the 20C3M and A1B periods.
My figures are about 8% too low on average. There’s no apparent divergence. Figures 10 and 11 show the MIROC 3.2 HIRES/MEDRES comparison.
Figure 10. Comparison of MIROC 3.2 HIRES over the 20C3M and A1B periods.
Figure 11. Comparison of MIROC 3.2 MEDRES over the 20C3M and A1B periods.
The agreement is very impressive. The median error is less than 1%. However, like previous models, there is divergence in the post-20C3M period.
Figure 12. Comparison of NCAR CCSM 3.0 over the 20C3M and A1B periods.
My well-trained eyeball told me that this model may also have a lag issue. Figure 13 shows the series with my data lagged by one unit.
Figure 13. Comparison of NCAR CCSM 3.0 over the 20C3M and A1B periods lagged by one year.
The agreement significantly improves. This model succumbed to the same error as GISS EH: the starting year was wrong. What explains the fact that lagging one series improves the 20C3M agreement? NCAR’s models represent the time variable as ‘days since 0000-1-1’. A common program used to convert this into year/month is UDUNITS. When I first began analysing climate data in R, I wrote a program to do this conversion. I then found out that UDUNITS had been ported to R. I compared results between the two programs and found that they give different answers when the reference time is year 0. UDUNITS tells me that NCAR’s 20C3M run begins in January 1871, while my original R program (and a newer Fortran 90 version) tells me that it begins in January 1870. The problem arises because UDUNITS assumes that the year 1 C.E. immediately follows the year 1 B.C.E.; in other words, there is no year zero. Year 0 is interpreted as 1 C.E., so every converted year comes out one too high. I think that Frei or Gong (whoever wrote the software to process the data) didn’t know this, so their data for NCAR is off by one year. Not a big deal, but it’s something to be mindful of if you’re going to use this package. This doesn’t explain why GISS EH was off by one year, though. Some of the error is due to the fact that FG used six 20C3M runs and four A1B runs while I used seven 20C3M runs and six A1B runs.
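The off-by-one is easy to illustrate with a toy calculation. This sketch assumes CCSM3’s 365-day “noleap” model calendar (real conversions should of course use a calendar-aware library; the function names are mine):

```python
DAYS_PER_YEAR = 365  # CCSM3 runs on a "noleap" model calendar

def year_if_year_zero_exists(days_since_ref):
    # Reference "0000-1-1" taken literally: a year 0 exists,
    # so integer division recovers the year directly.
    return days_since_ref // DAYS_PER_YEAR

def year_if_no_year_zero(days_since_ref):
    # UDUNITS-style reading: "year 0" is silently treated as 1 C.E.,
    # so every converted date lands one year late.
    return days_since_ref // DAYS_PER_YEAR + 1

days = 1870 * DAYS_PER_YEAR  # nominal start of the 20C3M run
# year_if_year_zero_exists(days) -> 1870
# year_if_no_year_zero(days)     -> 1871
```

Same day count, two different Januaries, which is exactly the 1870-vs-1871 disagreement between my converter and UDUNITS.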
To compare this data to observations, I’ve chosen Rutgers’ monthly area of snow extent data. To make this comparable to the model data, I used a land mask from Remote Sensing Systems to get the total area for North America + Greenland on the same latitude/longitude extents used for the model data. This turns out to be 23,192,196 km2. I had contacted someone at the Rutgers Global Snow Lab over the weekend about the latitude/longitude extent that one should use to make model data comparable to their data products. I was told (after I jumped the gun and processed the data) that the extent begins ‘north of roughly 18 degrees N’. I recalculated the area for this lower latitude bound and found it differed from the figure above by about 1.8%.
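For reference, the area calculation from a fractional land mask reduces to summing exact spherical gridcell areas inside the box. A sketch, assuming a regular lat/lon grid (the RSS mask’s actual resolution and my processing code may differ):

```python
import numpy as np

R_EARTH_KM = 6371.0  # mean Earth radius

def land_area_km2(land_frac, lats, lons,
                  lat_min=20.0, lat_max=90.0, lon_min=190.0, lon_max=340.0):
    """Land area (km^2) inside a lat/lon box, weighted by land fraction.

    land_frac : 2-D (nlat, nlon) fractional land cover (0-1)
    lats, lons: 1-D gridcell centers in degrees, regular spacing
    """
    dlat = np.deg2rad(abs(lats[1] - lats[0]))
    dlon = np.deg2rad(abs(lons[1] - lons[0]))
    phi = np.deg2rad(lats)
    # Exact cell area on a sphere: R^2 * dlon * (sin(phi_top) - sin(phi_bot))
    cell = R_EARTH_KM ** 2 * dlon * (np.sin(phi + dlat / 2) -
                                     np.sin(phi - dlat / 2))
    in_lat = (lats >= lat_min) & (lats <= lat_max)
    in_lon = (lons >= lon_min) & (lons <= lon_max)
    frac = land_frac[np.ix_(in_lat, in_lon)]
    return float((frac * cell[in_lat, None]).sum())
```

A useful sanity check on the cell-area formula: with a land fraction of 1 everywhere and the box widened to the whole globe, the sum should recover the Earth’s surface area, 4πR².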
Before I began the comparisons, I chose to eliminate IPSL CM 4 from the analysis. Its normalized snow cover is on the order of 10^-4, which is certainly unrealistic. I double-checked against the original gridded data and there’s something wrong: either the program that copied the data to netCDF improperly scaled it or the model is seriously deficient. When I scale the data by 1,000 it looks realistic. Since I don’t know for sure what the problem is, it is safer to just exclude it. I’ve emailed an inquiry to the modelling group and haven’t heard back yet. Figures 14 and 15 show the annual averages of the ensemble means for the 20C3M and 20C3M-A1B periods.
Figure 14. Annual average ensemble mean for 20C3M runs.
Figure 15. Annual average ensemble mean for 20C3M-A1B runs.
As you can see there is very large inter-model variability. MIROC 3.2 HIRES appears to drop to zero towards the end of the series. The last few years of the gridded data are all zeros. Since FG’s numbers for this period are non-zero, this adds some plausibility to their using something other than A1B. CCCMA CGCM 3.1 T63 and CSIRO MK 3.5 show the closest agreement to the Rutgers data. How do they fare in terms of trends? Figures 16 and 17 show the trends over the 20C3M and 20C3M-A1B periods.
Figure 16. Trends in annual snow area extent from 1967-1999.
Figure 17. Trends in annual snow area extent from 1967-2009.
The regression indicates that snow area extent declined by 0.94 ± 0.70 %/decade from 1967-1999. Over the 1967-2009 period, the decline was 0.48 ± 0.48 %/decade. Applying the same statistical tests as were used in Santer et al. 2008, the proportion of models rejecting at 5% significance was 1/3 over the 1967-1999 period and 0 over the 1967-2009 period. The test for the multi-model ensemble mean trend rejected (p = 0.014) at 5% significance for the 1967-1999 period but failed to reject (p = 0.889) for the 1967-2009 period.
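For readers who want to reproduce the trend uncertainties: the Santer et al. (2008) approach inflates the OLS standard error for lag-1 autocorrelation in the residuals via an effective sample size. A Python sketch of that adjustment (not the authors’ actual code, and my own function name):

```python
import numpy as np

def trend_and_adjusted_se(y):
    """OLS trend of an annual series, with its standard error inflated
    for AR(1) residual autocorrelation as in Santer et al. (2008)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]  # lag-1 autocorrelation
    n_eff = n * (1 - r1) / (1 + r1)                # effective sample size
    s2 = (resid ** 2).sum() / (n_eff - 2)          # adjusted residual variance
    se = np.sqrt(s2 / ((t - t.mean()) ** 2).sum())
    return slope, se
```

Each model-vs-observation test then compares the trend difference against the combined standard errors, sqrt(se_model² + se_obs²); the ensemble-mean test does the same with the multi-model mean trend and its spread.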
(Paragraph I never bothered pasting from my draft. Thanks Carrot Eater.)
Despite the hypothesis tests showing some model consistency with observations, shouldn’t the large spread in model-generated values be of some concern? Most model/observation comparisons we’re familiar with involve temperature. Lucia has shown that there’s significant spread in global mean surface temperature in model simulations. While that spread may generate some reasonable concern, having one model with less than 1/3 of NA covered in snow and another showing almost 3/4 covered should generate a lot of concern. If a model gets the NA-SCE wrong, there must be a whole host of physical processes affected by the fact that there’s either too little or too much snow cover (think: large albedo errors).
Frei, A., and G. Gong (2005), Decadal to century scale trends in North American snow extent in coupled atmosphere-ocean general circulation models, Geophys. Res. Lett., 32, L18502, doi:10.1029/2005GL023394.
Santer, B.D., P.W. Thorne, L. Haimberger, K.E. Taylor, T.M.L. Wigley, J.R. Lanzante, S. Solomon, M. Free, P.J. Gleckler, P.D. Jones, T.R. Karl, S.A. Klein, C. Mears, D. Nychka, G.A. Schmidt, S.C. Sherwood, and F.J. Wentz (2008), Consistency of modelled and observed temperature trends in the tropical troposphere, Intl. J. Climatol., 28, 1703-1722, doi:10.1002/joc.1756.