I’m sure many of you have taken notice of my absence. I’ve been very busy working on other projects whose results you may see published sometime in the (hopefully) near future. For the last week or so, I’ve been writing programs to process the GHCN station data into a gridded product. Though this has already been done by many, I’ll show you my results anyhow. I’ll briefly go over the procedure and then discuss some gridding issues that came up while analyzing the data.
Data and Methods
1. Load the station inventory (v2.temperature.inv) and data (v2.mean) into memory.
2. Consolidate each station’s duplicate entries (if applicable) by computing a monthly offset for each series such that the sum of the squared differences between the series, after the offsets are applied, is at a minimum (see Tamino and Roman). The offsets align the series, which are then averaged together.
3. Determine which stations are in each grid box. I initially used a 5° x 5° grid.
4. Consolidate all the stations in each grid box using the same methodology as in step 2.
5. Calculate the climatology at each grid point and remove it to get temperature anomalies. The base period is 1961-1990 and I require a minimum of 20 years of non-missing data to calculate a valid climatological mean.
6. Calculate spatial averages for the globe, the NH, the SH and the tropics.
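The offset, grid-box and climatology steps above can be sketched in Python. This is only an illustration, not the Fortran code used for this post: the function names are hypothetical, the grid is assumed to start at 90°S/180°W, and `pairwise_offset` handles just two series for one calendar month, whereas the methods of Tamino and Roman solve for all offsets jointly.

```python
def pairwise_offset(ref, other):
    """Offset to add to `other` so that the sum of squared differences
    from `ref` is smallest over the years both series have data.
    Both arguments are dicts year -> value (None when missing) for ONE
    calendar month. For two series the minimiser is the mean difference."""
    common = [y for y in ref
              if ref[y] is not None and other.get(y) is not None]
    if not common:
        return None
    return sum(ref[y] - other[y] for y in common) / len(common)


def grid_index(lat, lon, size=5.0):
    """(row, col) of the size-degree box containing a station.
    Rows count north from 90S, columns count east from 180W (assumed)."""
    n_rows = int(round(180.0 / size))
    n_cols = int(round(360.0 / size))
    row = min(int((lat + 90.0) // size), n_rows - 1)  # lat = +90 joins the top row
    col = int((lon + 180.0) // size) % n_cols         # lon = +180 wraps to column 0
    return row, col


def anomalies(series, base=(1961, 1990), min_years=20):
    """Remove the base-period mean from one calendar month of one grid box.
    `series` is a dict year -> value (None when missing). Returns a dict
    year -> anomaly, or None if fewer than `min_years` base-period values
    are available."""
    base_vals = [v for y, v in series.items()
                 if base[0] <= y <= base[1] and v is not None]
    if len(base_vals) < min_years:
        return None
    clim = sum(base_vals) / len(base_vals)
    return {y: (None if v is None else v - clim) for y, v in series.items()}
```

Calling `anomalies` once per calendar month and per grid box keeps the seasonal cycle out of the anomalies, and the `min_years` check mirrors the 20-year requirement stated above.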
Normally I would impose a constraint on how much area needs to be represented by non-missing values to calculate a valid spatial average. Unfortunately, when I was writing my program I was more worried about getting it to work properly and neglected to include a land mask to determine how much land is present in the four regions that I calculated averages for. When I calculated the spatial averages, I also made sure to hold on to the summed area represented by the non-missing data to see how it varies with the overall average. First let’s see how my global average compares to the results of Jeff ID/Roman M, Nick Stokes, Zeke and Residual Analysis.
My reconstruction is consistent with the others so that gives me confidence that I didn’t seriously botch the calculations. Let’s look at the global average over the entire time period (1701-2010).
The first 150 years of the series show much larger variability than the rest of the series. Most likely this is because the sampling was small enough to allow regional or small-scale variability to dominate the ‘global’ average. The area fraction data was of some concern. This is the sum of surface area accounted for by grid boxes with non-missing data, normalized to the total amount of global land area. Why is it greater than one? I think this is a non-obvious error that everyone who has created a gridded product of GHCN has made. If a grid box contains one or more stations and it is completely occupied by land, then weighting it according to the surface area of that grid box is correct. When there is some ocean present, the weight that is applied is too large. Let’s see how this issue affects the other spatial averages.
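To make the weighting issue concrete, here is a minimal Python sketch of an area-weighted average with and without a land mask. The function names and the tuple layout of `cells` are assumptions made for illustration, and the land fractions would have to come from a separate land mask data set.

```python
import math


def cell_area_fraction(lat_south, lat_north, dlon):
    """Fraction of the sphere's surface covered by a lat/lon box (degrees)."""
    band = (math.sin(math.radians(lat_north))
            - math.sin(math.radians(lat_south))) / 2.0
    return band * dlon / 360.0


def weighted_mean(cells, use_land_mask=True):
    """Area-weighted mean anomaly and the summed weight of non-missing boxes.
    `cells` is a list of (lat_south, lat_north, dlon, land_fraction, anomaly),
    with anomaly set to None when the box has no data."""
    num = 0.0
    area = 0.0
    for lat_s, lat_n, dlon, land_frac, anom in cells:
        if anom is None:
            continue
        w = cell_area_fraction(lat_s, lat_n, dlon)
        if use_land_mask:
            w *= land_frac  # count only the land portion of the box
        num += w * anom
        area += w
    return (num / area if area > 0 else None), area
```

Take two equal-sized boxes, one fully land with an anomaly of 1.0 and one only 10% land with an anomaly of 3.0. Weighting by full box area gives a mean of 2.0, while weighting by land area gives about 1.18. Without the mask, the summed area also includes the ocean part of every coastal box, which is how the ratio to total land area can exceed one.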
The haphazard weighting affects all the spatial averages but is the strongest in the tropics. I re-ran the spatial averages using a land mask to properly adjust the area weights and compared the normalized area fraction both ways.
Now the area fraction doesn’t take on physically unrealistic values. Given that many bloggers are now combining the land and ocean data, this issue shouldn’t be as serious, but we still need to know how much land/ocean is in a grid box to properly combine land/ocean anomalies corresponding to the same grid box. This area fraction bias would certainly be reduced by using a finer grid. To test this, I re-ran my program with 2.5° x 2.5° resolution and compared the area fractions.
As expected, using a finer grid reduced the area bias and brought it more in line with the correct figures. What difference does this bias incur in the spatial averages and their trends?
Over the course of the 20th century up to the present, the bias in the global and NH/SH averages is statistically significant though practically negligible. In the tropics, however, the bias is fairly large. Here’s the same data and trends over the most recent 30-year period.
Over the recent period, the biases are statistically and practically significant with the most extreme bias occurring in the tropics. The trends are positive and this means that the improperly weighted procedure produces anomalies over this period whose trends are too small. Now see what happens when the spatial averages are calculated on a finer grid.
Over the more recent period the biases are still of some practical significance but not as large. Here are the spatial averages with the correct weighting and their trends.
UPDATE May 23, 2010: The trend uncertainties shown in the graphs below are wrong. The uncertainty is one standard error, not two as I intended.
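For reference, the difference between one and two standard errors can be sketched with a plain least-squares trend in Python. This is an assumption about the general approach, not the calculation behind the graphs: `ols_trend` is a hypothetical helper, and it ignores autocorrelation in the residuals, which would widen the uncertainty for temperature series.

```python
import math


def ols_trend(years, values):
    """Least-squares slope of values against years and its standard error."""
    n = len(years)
    if n < 3:
        raise ValueError("need at least three points")
    xm = sum(years) / n
    ym = sum(values) / n
    sxx = sum((x - xm) ** 2 for x in years)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = ym - slope * xm
    # Residual sum of squares with n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(years, values))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se
```

With `slope, se = ols_trend(years, values)`, an interval of `slope ± se` covers roughly 68% under the usual assumptions, while `slope ± 2 * se` covers roughly 95%, so quoting one standard error makes the trends look more tightly constrained than intended.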
Here’s the same graph but from the 2.5° x 2.5° data.
In all regions except the tropics, the finer gridding brought down the trends. Why? That’ll have to be the topic of another post.
P.S. Does anyone like the new theme?
Update- May 19, 2010
I’ve uploaded the annual averages and the Fortran code (http://drop.io/treesfortheforest). As per Carrick’s request I’ve created plots of the weighted mean latitude of all non-missing grid cells. Below are three plots covering three different periods: 1701-2010, 1850-2010 and 1950-2010.
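A weighted mean latitude of this kind can be computed along the following lines in Python. This is a sketch only: the function name and the `(center_lat, weight, value)` layout are assumptions, and the weight is taken to be whatever area weight was used for the spatial averages.

```python
def weighted_mean_latitude(cells):
    """Weighted mean of box-centre latitudes over boxes with data.
    `cells` is a list of (center_lat_deg, weight, value), with value set
    to None when the box is missing for the time step in question."""
    num = 0.0
    den = 0.0
    for lat, weight, value in cells:
        if value is None:
            continue
        num += weight * lat
        den += weight
    return num / den if den > 0 else None
```

Evaluated once per time step, this shows how the centre of the sampled area drifts: two equally weighted boxes at 45°N and 45°S give 0°, and losing the southern box moves the mean to 45°N.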