The Forest has Burned Down

My numerous threats to make a comeback saw no follow-through. I have done a lot of work since late last year, but I haven’t been able to focus on one thing long enough to create something worthy of a post. I’d find myself reading someone else’s blog, which would motivate me to start working on something new, until I read someone else’s blog … Considering all the things I’ve worked on, I should have at least a dozen interesting posts. I spent a lot of time writing Fortran/R code for a radiative transfer model to take a new look at the model-derived MSU data. I found some issues that those concerned with the model/data comparisons would be very interested in. I’m currently working on an R package to organize my climate data-related functions. Maybe it will see the light of day, maybe not.

My plan is to make a fresh start, someday soon. Thanks to everyone who provided useful, critical and inspiring comments that helped further my analysis. You’ll have a chance to do that again soon. But for now, the Forest has officially burned down.

Posted in Uncategorized | 3 Comments

Almost back!

I’m keeping my old external drive warm downloading fresh data from PCMDI. I hope I can fit everything I need into 500 GB with room to spare! I’m going to reprocess the atmospheric temperature and surface temperature data into a more accurately determined synthetic brightness temperature. Using the global-mean static weighting function was fine for emulating Santer et al., but I’d like to explore what differences arise from using other methods. This time around, I’ll save the synthetic MSU gridded data into a netCDF file instead of immediately computing spatial averages. I’m also downloading data for the pre-industrial control runs. This will allow me to diagnose the effect of unforced model variability on the perturbed simulations (20C3M, A1B, etc.) to aid in extracting the true forced temperature change. I’m hopeful I’ll have enough disk space left because any gridded products I create will be saved in netCDF4 format which uses HDF5 compression. For most of the day, I can get about 200 kb/s but sometimes I get as high as 2 Mb/s. I’ve been disconnecting/reconnecting to get it closer to 1-2 Mb/s when it dips too low. I’m shooting for the end of today.
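As a rough illustration of the static-weighting-function approach, here is a minimal R sketch of how a synthetic brightness temperature can be formed from a model temperature profile. The names temp and w are placeholders for the temperatures on the model’s pressure levels and the MSU weighting function evaluated on those same levels; this is not my actual code.

    ## Minimal sketch: synthetic MSU brightness temperature as a weighted
    ## vertical average of the temperature profile.
    synthetic_tb <- function(temp, w) {
      ok <- !is.na(temp) & !is.na(w)
      sum(temp[ok] * w[ok]) / sum(w[ok])   # normalize so the weights sum to one
    }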

I’m writing this post from my new system. It’s got an AMD Athlon II X4 635 processor with 6 GB of memory and a fast graphics card running Ubuntu Lucid Lynx. It’s a major step up from the 5+ year old, slug-paced piece of garbage I was forced to pass off as a computer. Webpages load lightning fast. Software compiles much faster. I’ve gotten all of my libraries built (LAPACK, netCDF, etc.). I’m currently trying to interface Fortran with udunits2. I’ll be spending much time learning C because there are a lot of data types and constructions that go way over my head. If this interface turns out to be a fool’s errand, I may break down and write most of my programs in R and load compiled Fortran code to handle the more numerically intensive tasks.
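In case the udunits2 interface doesn’t pan out, the R-plus-compiled-Fortran route is straightforward. Below is a minimal, hypothetical sketch: the file vecadd.f90 and its subroutine are made up for illustration. Compile the Fortran with R CMD SHLIB, then load and call it from R.

    ## Hypothetical Fortran file vecadd.f90, compiled with 'R CMD SHLIB vecadd.f90':
    ##   subroutine vecadd(n, a, b, c)
    ##     integer :: n
    ##     double precision :: a(n), b(n), c(n)
    ##     c = a + b
    ##   end subroutine vecadd
    dyn.load("vecadd.so")                      # load the compiled shared library
    a <- runif(10); b <- runif(10)
    out <- .Fortran("vecadd", n = as.integer(10), a = as.double(a),
                    b = as.double(b), c = double(10))
    out$c   # Fortran does the arithmetic; R just marshals the arguments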

Only 20 GB of data left to download!

Posted in Uncategorized | 8 Comments

Not Blogging Lately Part 2

The forest has been quiet for quite some time now. I’ve been very busy working on multiple climate-related projects. You’ll read about one pretty soon. My lack of blogging is also because I’ve lost some interest and inspiration for topics to cover. I’ve done a lot of reading on data homogenization, so my next few posts will hopefully be on that topic. Much of the work that I’ve done in the past few weeks is thanks to my half-hearted migration to Ubuntu Linux. I have to say that writing code on a Linux machine has been a lot easier in many ways for me than doing it on WinXP using MinGW. I say half-hearted because Ubuntu (specifically X) likes to crash because my video card apparently isn’t entirely supported. In between crashes, though, I’ve learned to focus and get as much done as I can before the screen suddenly goes black. Speaking of which, I better hit “Publish” now before the computer cra …

Posted in Uncategorized | 9 Comments

Better Late Than Never

I’m sure many of you have taken notice of my absence. I’ve been very busy working on other projects whose results you may see published sometime in the (hopefully) near future. For the last week or so, I’ve been writing programs to process the GHCN station data into a gridded product. Though this has already been done by many, I’ll show you my results anyhow. I’ll briefly go over the procedure and then discuss some gridding issues that came up while analyzing the data.

Data and Methods

  1. Load the station inventory (v2.temperature.inv) and data (v2.mean) into memory.
  2. Consolidate each station’s duplicate entries (if applicable) by computing a monthly offset for each duplicate such that the sum of the squared differences between the offset-adjusted series is minimized (see Tamino and Roman; a rough sketch of this idea appears just after this list). The offsets align the series, which are then averaged together.
  3. Determine which stations are in each gridbox. I initially used a 5° x 5° grid.
  4. Consolidate all the stations in each grid box using the same methodology as in step 2.
  5. Calculate the climatology at each grid point and remove it to get temperature anomalies. The base period is 1961-1990 and I require a minimum of 20 years of non-missing data to calculate a valid climatological mean.
  6. Calculate spatial averages for the global, NH, SH and the tropics.
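For concreteness, here is a minimal R sketch of the offset-matching idea in step 2. It is illustrative only: it uses a single offset per series rather than one per calendar month, and the matrix x (columns = duplicate series, NA = missing months) is a placeholder, not my actual data structure.

    ## Iteratively choose one offset per series so that the aligned series
    ## agree with their combined mean in a least-squares sense.
    combine_series <- function(x, tol = 1e-8, maxit = 100) {
      offsets <- rep(0, ncol(x))
      for (it in seq_len(maxit)) {
        aligned  <- sweep(x, 2, offsets)            # subtract current offsets
        combined <- rowMeans(aligned, na.rm = TRUE) # combined series
        new_off  <- colMeans(x - combined, na.rm = TRUE)
        new_off  <- new_off - mean(new_off)         # remove the arbitrary constant
        if (max(abs(new_off - offsets)) < tol) break
        offsets  <- new_off
      }
      list(combined = rowMeans(sweep(x, 2, offsets), na.rm = TRUE),
           offsets  = offsets)
    }

The same routine applies to step 4, with the columns being the stations in a grid box rather than duplicates of one station.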


Normally I would impose a constraint on how much area needs to be represented by non-missing values to calculate a valid spatial average. Unfortunately, when I was writing my program I was more worried about getting it to work properly and neglected to include a land mask to determine how much land is present in the four regions that I calculated averages for. When I calculated the spatial averages, I also made sure to hold on to the summed area represented by the non-missing data to see how it varies with the overall average. First let’s see how my global average compares to the results of Jeff ID/Roman M, Nick Stokes, Zeke and Residual Analysis.
My reconstruction is consistent with the others, so that gives me confidence that I didn’t seriously botch the calculations. Let’s look at the global average over the entire time period (1701-2010).
The first 150 years of the series shows much larger variability than the rest. Most likely this is because the station sampling was sparse enough to allow regional or small-scale variability to dominate the ‘global’ average. The area fraction data was of some concern. This is the sum of surface area accounted for by grid boxes with non-missing data, normalized to the total amount of global land area. Why is it greater than one? I think this is a non-obvious error that everyone who has created a gridded product of GHCN has made. If a grid box contains one or more stations and it is completely occupied by land, then weighting it according to the surface area of that grid box is correct. When there is some ocean present, the weight that is applied is too large. Let’s see how this issue affects the other spatial averages.
The haphazard weighting affects all the spatial averages but is strongest in the tropics. I re-ran the spatial averages using a land mask to properly adjust the area weights and compared the normalized area fraction both ways.
Now the area fraction doesn’t take on physically unrealistic values (a rough sketch of the corrected weighting follows below). Given that many bloggers are now combining the land and ocean data, this issue shouldn’t be as serious, but we still need to know how much land/ocean is in a grid box to properly combine land/ocean anomalies corresponding to the same grid box. This area fraction bias would certainly be reduced by using a finer grid. To test this, I re-ran my program with 2.5° x 2.5° resolution and compared the area fractions.
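For reference, here is a minimal R sketch of the mask-corrected weighting. The names anom (a lon × lat × time anomaly array), landfrac (a matching 0-1 land-fraction mask) and lat are placeholders, not the variables in my actual program.

    ## Weight each grid box by (land fraction) x (area ~ cos latitude) and
    ## average only over boxes with non-missing data.
    area_weighted_mean <- function(anom, landfrac, lat) {
      w_lat <- cos(lat * pi / 180)              # gridbox area scales with cos(latitude)
      w     <- sweep(landfrac, 2, w_lat, `*`)   # weight = land fraction x area
      apply(anom, 3, function(fld) {
        ok <- !is.na(fld)
        sum(fld[ok] * w[ok]) / sum(w[ok])
      })
    }

The normalized area fraction discussed above is then sum(w[ok]) / sum(w), which by construction cannot exceed one.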
As expected, using a finer grid reduced the area bias and brought it more in line with the correct figures. What difference does this bias make in the spatial averages and their trends?
Over the course of the 20th century up to the present, the bias in the global and NH/SH averages is statistically significant though practically negligible. In the tropics, however, the bias is fairly large. Here’s the same data and trends over the most recent 30-year period.
Over the recent period, the biases are statistically and practically significant, with the most extreme bias occurring in the tropics. The trends are positive, and this means that the improperly weighted procedure produces anomalies over this period whose trends are too small. Now see what happens when the spatial averages are calculated on a finer grid.
Over the 20th century up to the present, the bias is still of no practical significance.
Over the more recent period, the biases are still of some practical significance but not as large. Here are the spatial averages with the correct weighting and their trends.
UPDATE May 23, 2010: The trend uncertainties below are wrong. The uncertainty shown is one standard error, not two as I intended.
Here’s the same graph but from the 2.5° x 2.5° data.
In all regions except the tropics, the finer gridding brought down the trends. Why? That’ll have to be the topic of another post.
P.S. Does anyone like the new theme?
Update: May 19, 2010
I’ve uploaded the annual averages and the Fortran code. As per Carrick’s request, I’ve created plots of the weighted mean latitude of all non-missing grid cells. Below are three plots covering three different periods: 1701-2010, 1850-2010 and 1950-2010.
Posted in gridded data, Station Data, Surface temperature record | 71 Comments

Theme change and other things

I’ve changed the theme because the old one was getting a bit boring. Unfortunately, this means that I need to resize all of my images because as you can see below they are too big.  I haven’t been blogging lately because I’ve been busy doing some serious Santer-related number crunching.  Still looking for inspiration for my next post. Radiosondes sound good.

Posted in Uncategorized | 3 Comments

Second Thoughts on Methods to Combine Station Data

In my last three posts on station data (here, here and here) I compared several methods for combining station data. I came to some conclusions that, on second thought, may not be well founded. The products of each method were compared to the ‘true’ temperature series, and the way I calculated that true series may have inadvertently biased the results. I calculated an area-weighted average, which is essentially a simple unweighted average given that the locations of the ‘station’ data were so close together within a 5° x 5° grid box. It shouldn’t be surprising, then, that when SAM, FDM and CAM were applied to perfect data, all three methods performed very well. I might have gotten very different results if I had interpolated the MIROC HIRES grid onto a 5° x 5° grid and used that single grid point as a reference. I suspect that if I had done that, the RSM wouldn’t have come out looking so bad, since it is itself a form of interpolation. This got me thinking that there may not be a straightforward, unbiased way of determining the pros/cons of these methods. I need to do a lot more thinking about this. Ideas are welcome.
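For what it’s worth, here is a rough sketch of the alternative reference I have in mind: an area-weighted average of the high-resolution model field over the whole 5° x 5° box, rather than an average over only the sampled ‘station’ points. The names field, lon and lat are placeholders for the MIROC HIRES grid and its coordinates.

    ## Area-weighted box average of a high-resolution field over given bounds.
    box_reference <- function(field, lon, lat, lon_bnds, lat_bnds) {
      ii <- lon >= lon_bnds[1] & lon < lon_bnds[2]
      jj <- lat >= lat_bnds[1] & lat < lat_bnds[2]
      w  <- outer(rep(1, sum(ii)), cos(lat[jj] * pi / 180))  # weights depend on latitude only
      sum(field[ii, jj] * w) / sum(w)
    }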

Posted in Uncategorized | 12 Comments

North American Snow Cover

After reading about Steve Goddard’s Snowjob, I thought I’d take a much-needed break from my current research topic (time-of-observation bias) to take a look at North American Snow Cover Extent (NA-SCE). One of Goddard’s posts discussed results from Frei and Gong 2005 (FG hereafter). This paper compared observed NA-SCE to AR4 climate model simulation results. Because different models have different spatial resolutions, the total North American land area (km²) differs from model to model, so the authors normalized the snow cover relative to the total land area of North America. NA-SCE was only available for 11 models when this paper was published. I checked CMIP3 and there are now 16 models with 59 runs. FG defined NA-SCE as the fraction of land area bounded by 190°-340° E and 20°-90° N covered with snow. By this definition, Greenland is included. Figure 1 shows the exact spatial extent.

Figure 1. North American continental land mass plus Greenland. The legend shows the fraction of each grid cell that is land. The figure is based on a land mask created by Carl Mears at Remote Sensing Systems (RSS).

To process the original gridded data, the first step is to calculate the normalized area weights for a given model’s grid. These weights are then multiplied by the model’s land mask to zero out the grid points in the ocean. Unfortunately, MRI CGCM 2.3.2a didn’t have a land mask available, so I created one by interpolating the land mask shown in figure 1 to MRI’s spatial resolution. The next step is to take the subset of the masked area weights within the defined latitude/longitude bounds and sum up all the values. This represents the total land area of North America, including Greenland, as a fraction of the Earth’s surface area. To calculate the fractional NA-SCE, I loaded the gridded data one time step at a time, zeroed out the missing values, multiplied by the land mask and the area weights, and summed over the defined latitude/longitude extent. The resulting sum represents the fraction of the Earth’s surface covered in snow. After processing all the time steps, I divided the resulting time series by the total fractional land area to convert the units into fractional land area coverage of North America + Greenland. To check how well I emulated the process, I consulted a CSV file posted by Zeke Hausfather that he obtained from Frei, containing January NA-SCE for nine models running 20C3M/A1B. I took the difference between my and Frei’s series and found pretty good agreement, to an extent. Figure 2 shows a comparison with GISS AOM.
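(As an aside, here is a minimal R sketch of the masking-and-weighting procedure just described. snc is a lon × lat × time snow-cover-fraction array, landmask a matching 0-1 land mask, and lon/lat the grid coordinates on a 0-360° longitude convention; all of these names are placeholders, not my actual code.)

    ## Fractional NA-SCE: snow-covered land area within the window, normalized
    ## by the total land area of North America + Greenland.
    na_sce_fraction <- function(snc, landmask, lon, lat,
                                lon_bnds = c(190, 340), lat_bnds = c(20, 90)) {
      w <- outer(rep(1, length(lon)), cos(lat * pi / 180))
      w <- w / sum(w)                              # each cell's share of the Earth's surface
      w_land <- w * landmask                       # zero out the ocean
      ii <- lon >= lon_bnds[1] & lon <= lon_bnds[2]
      jj <- lat >= lat_bnds[1] & lat <= lat_bnds[2]
      total_land <- sum(w_land[ii, jj])            # NA + Greenland as a fraction of the Earth
      snow <- apply(snc, 3, function(fld) {
        fld[is.na(fld)] <- 0                       # zero out missing values
        sum(fld[ii, jj] * w_land[ii, jj])          # snow-covered fraction of the Earth
      })
      snow / total_land
    }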

Figure 2. Comparison of GISS AOM over the 20C3M and A1B periods.

My numbers underestimate the snow extent relative to Frei/Gong’s numbers by about 4%. Shortly into the A1B period, there is significant divergence. The year-to-year fluctuations in January snow extent are similar. I don’t see any obvious reason why both calculations would show similar variability, yet differ in trend during the A1B period. The fact that the difference series is declining means that my series is declining faster than Frei/Gong’s, which makes me think that their post-20C3M series isn’t running A1B. The next model is GISS EH (figure 3).

Figure 3. Comparison of GISS EH over the 20C3M and A1B periods.

The year-to-year variations are somewhat similar, but too different for comfort. Looking at FG 2005, I see that they used 5 20C3M runs and 4 A1B runs. The data I got from CMIP3 indicate that there are only 3 A1B runs; the folder marked run4 is empty. I checked the CMIP3 errata page and found no indication that GISS withdrew the data. I began to think that they didn’t combine the 20C3M-A1B runs the same way that I did. I combined them by only using 20C3M runs that have a corresponding A1B run. To see if I could improve the comparison, I took the average of the 20C3M runs and the average of the A1B runs and concatenated them. Figure 4 shows the resulting time series.

Figure 4. Comparison of the average of all (5) GISS EH 20C3M runs + the average of all (3) A1B runs.

Now the year-to-year variations are much more similar. My well-trained eye noticed that one series appeared to be lagging behind the other. Figure 5 shows the resulting differences when I lag my series by one year (one unit, since the data represent January values).

Figure 5. Comparison of the average of all (5) GISS EH 20C3M runs + the average of all (3) A1B runs  lagged by one year.

Now the comparison has greatly improved. The difference is still about 2%, but the error has much less variance. I checked the CSV file and found FG’s GISS EH figures begin in 1881 when the series actually starts in 1880. This issue of FG’s data beginning one year later than it should will be further discussed when we get to the NCAR model. As an aside, FG’s way of combining ensemble means from two different scenarios with different numbers of runs is a bit problematic. If you have more runs from 1880-1999 than you have for 2000-2099, then you can expect some changes in variance. The more runs that go into an ensemble mean, the more noise gets smoothed out; the fewer runs, the less noise gets smoothed out (the unforced noise in an n-run ensemble mean shrinks roughly like 1/√n). We find the same issue of creating an ensemble mean from dissimilar numbers of runs with GISS ER (figure 6).

Figure 6. Comparison of GISS ER over the 20C3M and A1B periods.

The difference looks similar to GISS EH. Now let’s look at the difference when we combine the ensemble mean of 20C3M with the ensemble mean of A1B.

Figure 7. Comparison of the average of all (5) GISS ER 20C3M runs + the average of all (3) A1B runs.

We’re not getting the same level of agreement over the whole period as we did for GISS EH. The A1B divergence makes me think that the FG data does not represent the A1B scenario. Figure 8 shows the comparison with IAP FGOALS.

Figure 8. Comparison of IAP FGOALS over the 20C3M and A1B periods.

The agreement is superb with an error of less than 1%! Figure 9 shows the comparison with INM CM 3.

Figure 9. Comparison of INM CM3 over the 20C3M and A1B periods.

My figures are about 8% too low on average. There’s no apparent divergence. Figures 10 and 11 show the MIROC 3.2 HIRES/MEDRES comparison.

Figure 10. Comparison of MIROC 3.2 HIRES over the 20C3M and A1B periods.

Figure 11. Comparison of MIROC 3.2 MEDRES over the 20C3M and A1B periods.

The agreement is very impressive. The median error is less than 1%. However, as with previous models, there is divergence in the post-20C3M period.

Figure 12. Comparison of NCAR CCSM 3.0 over the 20C3M and A1B periods.

My well-trained eyeball told me that this model may also have a lag issue. Figure 13 shows the series with my data lagged by one unit.

Figure 13. Comparison of NCAR CCSM 3.0 over the 20C3M and A1B periods lagged by one year.

The agreement significantly improves. This model succumbed to the same error as GISS EH: the starting year was wrong. What explains the fact that lagging one series improves the 20C3M agreement? NCAR’s models represent the time variable as ‘days since 0000-1-1’. A common program used to convert this into year/month is UDUNITS. When I first began analysing climate data in R, I wrote a program to do this conversion. I then found out that UDUNITS had been ported to R. I compared results between the two programs and found that they give different answers when the reference time is year 0. UDUNITS tells me that NCAR’s 20C3M run begins in January 1871 while my original R program (and a newer Fortran 90 version) tells me that it begins in January 1870. The problem arises because UDUNITS assumes the year 1 C.E. immediately follows the year 1 B.C.E.; in other words, there is no year zero. Year 0 is interpreted as 1 C.E., so the computed year comes out one too many. I think that Frei or Gong (whoever wrote the software to process the data) didn’t know this, so their data for NCAR is off by one year. Not a big deal, but it’s something to be mindful of if you’re going to use this package. But this doesn’t explain why GISS EH was off by one year. Some of the error is due to the fact that FG used six 20C3M runs and four A1B runs while I used seven 20C3M runs and six A1B runs.
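To illustrate the year-zero issue with a toy example (not a call to UDUNITS itself), suppose a 365-day no-leap model calendar and a time value in ‘days since 0000-1-1’. The only thing that changes between the two conversions is whether a year 0 exists in the calendar.

    ## Toy conversion from 'days since 0000-1-1' to a calendar year, assuming a
    ## 365-day (no-leap) calendar.  This is only meant to show the off-by-one.
    days_to_year <- function(days, year_zero_exists = TRUE) {
      yr <- days %/% 365
      if (year_zero_exists) yr else yr + 1   # no year 0: everything shifts forward by one
    }

    t0 <- 1870 * 365              # a value the model intends as January 1870
    days_to_year(t0, TRUE)        # 1870 -- what my Fortran/R conversion gives
    days_to_year(t0, FALSE)       # 1871 -- the UDUNITS-style interpretation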

To compare this data to observations, I’ve chosen Rutgers’ monthly snow extent area data. To make this comparable to the model data, I used a land mask from Remote Sensing Systems to get the total area for North America + Greenland on the same latitude/longitude extents used for the model data. This turns out to be 23,192,196 km². I had contacted someone at the Rutgers Global Snow Lab over the weekend about the latitude/longitude extent that one should use to make model data comparable to their data products. I was told (after I jumped the gun and processed the data) that the extent begins at ‘north of roughly 18 degrees N’. I recalculated the area for this lower latitude bound and found it differed from the figure above by about 1.8%.

Before I began the comparisons, I chose to eliminate IPSL CM 4 from the analysis. Its normalized snow cover is on the order of 10⁻⁴, which is certainly unrealistic. I double-checked against the original gridded data and there’s something wrong. Either the program that copied the data to netCDF improperly scaled it or the model is seriously deficient. When I scale the data by 1,000, it looks realistic. Since I don’t know for sure what the problem is, it is safer to just exclude it. I’ve emailed an inquiry to the modelling group and haven’t heard back yet. Figures 14 and 15 show the annual averages of the ensemble means for the 20C3M and 20C3M-A1B periods.

Figure 14. Annual average ensemble mean for 20C3M runs.

Figure 15. Annual average ensemble mean for 20C3M-A1B runs.

As you can see there is very large inter-model variability. MIROC 3.2 HIRES appears to drop to zero towards the end of the series. The last few years of the gridded data are all zeros. Since FG’s numbers for this period are non-zero, this adds some plausibility to their using something other than A1B. CCCMA CGCM 3.1 T63 and CSIRO MK 3.5 show the closest agreement to the Rutgers data. How do they fare in terms of trends? Figures 16 and 17 show the trends over the 20C3M and 20C3M-A1B periods.

Figure 16. Trends in annual snow area extent from 1967-1999.

Figure 17. Trends in annual snow area extent from 1967-2009.

The regression indicates a trend in snow area extent of -0.94 ± 0.70 %/decade from 1967-1999, i.e., a decline. Over the 1967-2009 period, the trend was -0.48 ± 0.48 %/decade. Applying the same statistical tests as were used in Santer et al. 2008, the proportion of models rejecting at 5% significance over the 1967-1999 period was 1/3, and 0 for the 1967-2009 period. The test for the multi-model ensemble mean trend rejected (p = 0.014) at 5% significance for the 1967-1999 period but failed to reject (p = 0.889) for the 1967-2009 period.
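For the curious, here is a minimal R sketch of the single-series building block of those tests: an OLS trend whose standard error is inflated for lag-1 autocorrelation of the residuals through an effective sample size, in the spirit of Santer et al. 2008. The series y is a placeholder; this is illustrative, not my actual test code.

    ## Trend and autocorrelation-adjusted standard error for one time series.
    trend_test <- function(y) {
      t   <- seq_along(y)
      fit <- lm(y ~ t)
      b   <- unname(coef(fit)[2])                 # OLS trend per time step
      res <- residuals(fit)
      r1  <- cor(res[-1], res[-length(res)])      # lag-1 autocorrelation
      n   <- length(y)
      n_eff <- n * (1 - r1) / (1 + r1)            # effective sample size
      se  <- sqrt(sum(res^2) / (n_eff - 2)) / sqrt(sum((t - mean(t))^2))
      c(trend = b, se = se, ratio = b / se)       # compare ratio against a t/normal distribution
    }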

(Paragraph I never bothered pasting from my draft. Thanks Carrot Eater.)

Despite the hypothesis tests showing some model consistency with observations, shouldn’t the large spread in model-generated values be of some concern? Most model/observation comparisons we’re familiar with involve temperature. Lucia has shown that there’s significant spread in global mean surface temperature in model simulations. While this spread may generate some reasonable concern, having one model with less than 1/3 of NA covered in snow and another model showing almost 3/4 covered in snow should generate a lot of concern. If a model gets the NA-SCE wrong, there must be a whole host of physical processes that are affected by the fact that there’s either too little or too much snow cover (think: large albedo errors).

Frei, A., and G. Gong (2005), Decadal to century scale trends in North American snow extent in coupled atmosphere-ocean general circulation models, Geophys. Res. Lett., 32, L18502, doi:10.1029/2005GL023394.

Santer, B.D., P.W. Thorne, L. Haimberger, K.E. Taylor, T.M.L. Wigley, J.R. Lanzante, S. Solomon, M. Free, P.J. Gleckler, P.D. Jones, T.R. Karl, S.A. Klein, C. Mears, D. Nychka, G.A. Schmidt, S.C. Sherwood, and F.J. Wentz (2008), Consistency of modelled and observed temperature trends in the tropical troposphere, Intl. J. Climatol., 28, 1703-1722, doi:10.1002/joc.1756.

Posted in Climate models, Data comparisons | 18 Comments