Trees for the Forest

February 8, 2010

Atmospheric Temperature Trends

In this post I’ll revisit a topic I’ve already discussed multiple times. Part of my motivation is that I now have a synthetic MSU record spanning the late 19th century to the end of the 21st century, which allows me to explore more issues. Another motivation is that I’d like to evaluate model performance in some way other than the Santer d-test.

Data Sources

The two obvious choices for observational data sets are RSS and UAH. I had some trouble deciding whether I should include radiosonde data considering its less-than-ideal coverage. I’ve looked at four radiosonde/radiosonde-derived data sets: IUK, Raobcore (v1.4), HadAT2 and HadAT2-derived TLT. Let’s look at maps of all four data sets and see how they compare in terms of their spatial coverage. First up is IUK. The last time step in the data is December 2005, so I will use that date for comparison with the three other series.

At the lowest pressure level, 850 hPa, the continental U.S. and Europe are well represented. Russia has pretty decent coverage. The southern hemisphere is very sparse. As you’ll notice, the data is restricted to land. The few data points apparently in the ocean are from sondes launched from islands. Next you’ll see Raobcore has coarser resolution and apparently better coverage of the northern hemisphere.

The improved coverage is simply due to the fact that a much larger grid box is used so a lot of areas with missing data in IUK are joined in with boxes that do have data. Here’s the HadAT2 coverage which is noticeably better than IUK, but again, it’s only because a larger grid box is being used.

The HadAT2-derived synthetic TLT anomalies have atrocious coverage.

The coverage is sparse because Hadley’s calculation of the synthetic TLT brightness temperature requires that at least 80% of the data points through the atmospheric profile be non-missing. The data is sparse enough that this constraint causes a lot of grid points to end up as missing. I found it a bit funny to read this on the HadAT2 MSU page:

… these weighting functions provide investigators with just enough rope to go and hang themselves with.

Very true and nicely put.

If I were to include radiosonde data in my analysis, I’d want to interpolate to get total coverage over land. I would then reprocess the model-simulated temperature and calculate a land-only spatial average for the temperature at each pressure level and then the same for the brightness temperature. That’ll take a while (if I ever decide to do it) so in the interest of getting this post done, I’ll put that project on the back burner.

Observational and Model Data Eyeball Comparison

Tamino recently posted on model/observation comparison and showed that the spread of the AR4 2-m surface temperature agreed very well with the observational record (at least as far as GISS is concerned.) Below are similar graphs for TLT, TMT and TLS global averages. The model-simulated anomalies are relative to 1979-1998 to match the base period for UAH and RSS.

The multi-model mean for TLT and TMT shows no ENSO event in 1998 because, although these events do occur in the models, the way the simulations are initialized guarantees no synchronization with the actual ENSO events in the real world. The only common non-linear features in the TLS data that are reproduced in the models are the two warming events spurred on by the eruptions of El Chichon (1982) and Mt Pinatubo (1991). Now while it does look good that the observational anomalies fall comfortably within the model uncertainty, albeit on the lower end, how does the model uncertainty change with time? Here’s a plot of the model spread over the course of the 20th-21st centuries.

The uncertainties grow pretty quickly. Below I show the multi-model standard deviation to make the changing uncertainty more clear. The uncertainty minimizes during the base period as one would expect because the anomalies are constrained to average to zero. However, for the TLS data, the uncertainties grow significantly faster than TLT and TMT.
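As a rough illustration of why the spread pinches in during the base period (this is not the code used for the post), the sketch below re-bases a few hypothetical annual series to 1979-1998 and takes the standard deviation across runs at each time step. The function names, the annual time step and the toy runs are all assumptions made for the example.

```python
import statistics

def to_anomalies(series, years, base=(1979, 1998)):
    """Subtract the base-period mean so the series averages to zero over it."""
    base_vals = [v for v, y in zip(series, years) if base[0] <= y <= base[1]]
    base_mean = sum(base_vals) / len(base_vals)
    return [v - base_mean for v in series]

def model_spread(runs):
    """Sample standard deviation across runs at each time step."""
    return [statistics.stdev(vals) for vals in zip(*runs)]

# Toy "model runs": different offsets and different linear trends (hypothetical)
years = list(range(1970, 2010))
raw_runs = [[k + 0.01 * k * (y - 1970) for y in years] for k in (1, 2, 3)]

anom_runs = [to_anomalies(run, years) for run in raw_runs]
spread = model_spread(anom_runs)
tightest_year = years[spread.index(min(spread))]
print(tightest_year, min(spread), spread[-1])
```

Because every run is forced to average to zero over the base period, the runs can only disagree by how they drift away from it, so the spread is smallest near the middle of the base period and grows in both directions.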

If the observational anomalies fall within the model uncertainty it doesn’t necessarily mean that the models are on track. If the uncertainty grows with time, then the models are disagreeing as to what to expect and the fact that they encompass observations loses any real meaning.

Observational and Model Data Statistical Comparison

Instead of comparing model and observational trends to test for systematic differences, I’ll look at the trend in the difference between the two. The two hypotheses to be tested are:

H1: The trend in the difference between any given climate model realization and observational data is zero.

H2: The multi-model ensemble mean difference trend is zero.

For H1, the standard errors are inflated with sqrt( ( 1 + r1 )/( 1- r1 ) ) to account for lag-1 serial correlation in the regression residuals. H1 is the standard hypothesis test that accompanies regression to determine statistical significance in the coefficients.
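To make the adjustment concrete, here is a minimal sketch (not the code behind these results) of an ordinary least-squares trend whose standard error is inflated by sqrt((1 + r1)/(1 - r1)), with r1 estimated from the regression residuals. The function name and the synthetic difference series are assumptions for illustration; the trend is per time step.

```python
import math

def trend_with_ar1_se(y):
    """OLS trend of y against its time index, with a lag-1 adjusted standard error.

    Returns (slope, se, se_adj, r1).
    """
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

    # Unadjusted standard error of the slope
    sse = sum(e * e for e in resid)
    se = math.sqrt(sse / (n - 2) / sxx)

    # Lag-1 autocorrelation of the residuals (OLS residuals have zero mean)
    r1 = sum(a * b for a, b in zip(resid[:-1], resid[1:])) / sse

    # Inflate the standard error to account for serial correlation
    se_adj = se * math.sqrt((1 + r1) / (1 - r1))
    return slope, se, se_adj, r1

# Hypothetical model-minus-observation series with a slow oscillation
diff = [0.02 * i + 0.1 * math.sin(i / 3) for i in range(120)]
slope, se, se_adj, r1 = trend_with_ar1_se(diff)
print(slope, se, se_adj, r1)
```

With positively autocorrelated residuals the adjusted standard error is larger than the naive one, so a trend in the difference series has to be bigger before H1 is rejected.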

First let’s have a look at the trends with error bars compared with RSS and UAH. The black dashed line is the mean MSU trend. The grey and dark grey bands are the mean MSU trend ± 1 SE and ± 2 SE, respectively. The light grey dashed line is the zero point.

Now here’re the trends in the difference between the models and RSS and UAH. Again, the light grey dashed line is the zero point.

Hypothesis Test Results and Discussion

Obs    TLT     TMT     TLS
RSS    0.565   0.652   0.609
UAH    0.652   0.783   0.739

Table 1: Proportion of models rejecting relative to observational data and MSU channel.

Looking at models individually, the H1 rejection rates are alarmingly high. As one would expect, they’re higher relative to UAH than RSS given that UAH shows a smaller trend. These results are very similar to my last analysis where rejection rates for TLT relative to RSS/UAH were 0.579/0.649. However, they differ very much for the mid-troposphere where the rates relative to RSS/UAH were 0.246/0.368.

Obs    TLT      TMT       TLS
RSS    0.110    0.079*    0.264
UAH    0.058*   0.023**   0.159

Table 2: p-value for H2 test. (Significance labels: * significant at 10%, ** significant at 5%, *** significant at 1%)

The p-values for the H2 test show that the models aren’t as dead in the water as many believe. While the p-values for TLT and TMT aren’t terribly high like their TLS counterpart, there is only one rejection at 5% significance. In the last analysis I referred to, the p-values for the d-test for TLT relative to RSS and UAH were 0.001 and 0.000, respectively, which differs markedly from the current analysis. For the mid-troposphere, relative to RSS/UAH, the p-values were 0.083 and 0.017. These figures are relatively similar in the current analysis. That last analysis didn’t include the lower stratosphere so I have nothing to compare to the new numbers.

P.S. I hate WordPress because the page never looks as good as it does in the editor! :(

February 5, 2010

A look at what’s to come

I just finished up a big programming project. In previous posts, I’ve analyzed synthetic MSU data. That data was created using R. As I learned more and more about R, that program has gone through four rewrites. As much as I like R, it’s just not fast enough for me. I rewrote this program in Fortran 90. I was shocked at how fast it was able to process a single 2-m surface air temperature model run (10 seconds versus a few minutes in R). The calculation of the TLT, TMT and TLS temperatures took about a half-day which was merciful compared to what I would have had to endure with R. The data isn’t ready for prime time because I still need to compare it to older versions. That’ll be done very soon and you can expect to see something about stratospheric temperature trends.

In other news, Tamino just posted on the combining of duplicate station records, specifically for Skikda, Algeria found in the GHCN. I had been working on the same topic quite a while back, but I was experimenting with the first-difference method. For Skikda, there are three records. He combines them by taking the first series as a reference and subtracts from it the second series. The mean difference between the two is then removed from the second series and both are then averaged to create a new reference series. This continues until all the duplicates are averaged. This is similar to the GISS procedure except GISS sorts the duplicates according to record length and uses annual averages. Using the first-difference method is a bit overkill compared to the simplicity of this method.
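The combining procedure described above can be sketched roughly as follows. This is my own reading of the description, not Tamino’s or GISS’s code: the function name is hypothetical, missing months are represented by None, and each new duplicate is simply averaged with the current reference rather than weighted by how many records the reference already contains.

```python
def combine_duplicates(records):
    """Merge duplicate station records (equal-length lists, None = missing)."""
    reference = list(records[0])
    for series in records[1:]:
        # Months where both the reference and the duplicate have data
        overlap = [(r, s) for r, s in zip(reference, series)
                   if r is not None and s is not None]
        if not overlap:
            continue  # no common months, so the offset cannot be estimated

        # Mean of (reference - duplicate) over the overlap
        offset = sum(r - s for r, s in overlap) / len(overlap)

        # Shift the duplicate so its overlap mean matches the reference
        adjusted = [None if s is None else s + offset for s in series]

        # Average where both exist, otherwise keep whichever value is present
        merged = []
        for r, a in zip(reference, adjusted):
            if r is None:
                merged.append(a)
            elif a is None:
                merged.append(r)
            else:
                merged.append((r + a) / 2)
        reference = merged
    return reference

# Two hypothetical records that differ by a constant 1.0 where they overlap
a = [10.0, 11.0, 12.0, None]
b = [None, 12.0, 13.0, 14.0]
print(combine_duplicates([a, b]))
```

Sorting the records by length before calling the function would mimic the GISS ordering mentioned above.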

In the comments, carrot eater said

Perhaps its time to update the Hansen 1987 study on the correlation between station pairs and the distance between them. The linear weighting to 1200 km is somewhat arbitrary; one could use some model results to provide a perfect field to test methods for gridding and spatial averaging.

Hansen et al. 1987 had used a GCM to estimate the error in the global mean anomaly that results from incomplete spatial coverage. I responded that the model used in that study was of coarse resolution (8° x 10°) and that there are newer models that have much higher spatial resolution. It would be interesting to see how well “station” data from a high resolution climate model would correlate with nearby “stations”. Stay tuned.

The next three topics that you can expect me to expound on are:

  • Stratospheric temperature trends
  • Something on GHCN
  • “Station” data correlations in GCMs

It feels good to be focused again.

January 4, 2010

Another (brief) Look at Climate Model Solar Forcing

Filed under: Climate models, solar — Chad @ 4:14 pm

I had briefly taken up this issue previously, but it was a self-made distraction from more important work (replicating Santer). Now I’ll take another slightly less brief but more comprehensive look. I downloaded the monthly top-of-the-atmosphere (TOA) downward solar flux data for 20C3M and SRES A1B and calculated global averages. There is an annual cycle in the data that is due to the elliptical shape of the Earth’s orbit which I removed by calculating a running 12-month average. First, let’s look at some solar irradiance reconstructions with data taken from Leif Svalgaard’s webpage.
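For readers who want to see the smoothing step spelled out, here is a minimal sketch of a running 12-month mean (not the original code). The function name is mine, and the synthetic flux series (a constant plus a pure 12-month cosine) uses illustrative numbers rather than model output.

```python
import math

def running_mean(values, window=12):
    """Running mean over `window` consecutive values; returns len(values) - window + 1 points."""
    if len(values) < window:
        raise ValueError("series is shorter than the averaging window")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window forward by one step
        total += values[i] - values[i - window]
        out.append(total / window)
    return out

# Illustrative monthly TOA flux: constant level plus an annual cycle
flux = [341.5 + 11.0 * math.cos(2 * math.pi * m / 12) for m in range(48)]
smoothed = running_mean(flux)
print(len(smoothed), min(smoothed), max(smoothed))
```

Any signal that repeats exactly every 12 months averages out of each window, which leaves only the slower variations such as the solar cycle.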

Now let’s look at some model data.

BCM 2.0 shows a very strange annual cycle combined with a four-year cycle. The amplitude is extremely small at about 0.22 W/m². The CCCMA model uses a flat solar constant. The GFDL models at least use a solar history that’s realistic. For the post-20C3M period, the solar constant is kept, well, constant at the final 20C3M value.

FGOALS, INM and MIROC use realistic solar irradiance, but again, they maintain a flat solar constant post-20C3M. MRI CGCM stands out with a weird solar history. I was suspicious of this and consulted the Japanese Meteorological Research Institute’s website and found that they used the following solar irradiance.

I thought I may have made a mistake so I reran the numbers for this model and still found the same weird solar irradiance. I think they may have made an error.

The UKMO models have no variability whatsoever. The PCM model uses the Hoyt reconstruction. This makes me wonder why the IPCC didn’t require all the modeling groups to use the same forcing histories for all known factors. This complicates inter-model comparison and just looks bad.

P.S. Now that this post is up, I can see that I still haven’t perfected getting my images to look nice after WordPress processes them. Click on the images for a clearer view.

January 3, 2010

Not Blogging Lately

Filed under: Uncategorized — Chad @ 5:07 pm

Since my last post I’ve really managed to divide my time between many projects and lose the ability to focus on one thing at a time. Here are the projects I’m currently working on:

  • Solar forcing used in the AR4 simulations
  • GHCN data
  • How gridded data’s spatial average is affected by its coverage
  • How gridded data’s coverage affects the calculation of its anomaly

I’m hoping to crank out at least one post (probably on solar forcing) by the end of the week and be able to get back on track and stay focused. Keep your fingers crossed.

December 17, 2009

HadCRUT Data in Russia – What Difference Does it Make?

Filed under: CRU, Data comparisons, Surface temperature record — Chad @ 11:04 pm

UPDATED. SEE BELOW.

The latest chatter about the CRU scandal is that, according to the Moscow-based Institute of Economic Analysis, CRU “had probably tampered with Russian-climate data.” They claim that despite meteorological stations covering most of Russia, only 25% of the station data received at CRU were actually used and that the remaining stations do “not show any substantial warming in the late 20th century and the early 21st century.”

What Effect Does the Russian Data Have?

To answer this question, the global anomaly needs to be calculated two ways: 1) Use all of the gridded data and 2) Use all of the gridded data with the grid points in Russia masked out. HadCRUT’s global anomaly isn’t calculated in a straight-forward manner. The global anomaly is the average of the Northern and Southern Hemispheric anomalies. To see if I calculated it correctly, I compared my anomalies to the official numbers published here. Here’s the difference between the two series.
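A small sketch of the hemispheric averaging, under my own assumptions: cosine-of-latitude weights stand in for grid-box area, missing boxes are None, and the names and the tiny two-row grid are hypothetical. It is meant to show why the result differs from a single area-weighted mean over all boxes, not to reproduce the official calculation.

```python
import math

def weighted_mean(grid, lats, keep):
    """Cos(lat)-weighted mean of the non-missing boxes whose latitude passes `keep`."""
    num = den = 0.0
    for row, lat in zip(grid, lats):
        if not keep(lat):
            continue
        w = math.cos(math.radians(lat))
        for v in row:
            if v is not None:
                num += w * v
                den += w
    return num / den if den > 0 else None

def hemispheric_global_mean(grid, lats):
    """Global anomaly as the simple average of the NH and SH means."""
    nh = weighted_mean(grid, lats, lambda lat: lat > 0)
    sh = weighted_mean(grid, lats, lambda lat: lat < 0)
    return (nh + sh) / 2  # assumes both hemispheres have at least one box with data

# One latitude band per hemisphere; the SH band has a missing box
lats = [-45.0, 45.0]
grid = [[0.2, None],
        [1.0, 0.6]]
print(hemispheric_global_mean(grid, lats))
print(weighted_mean(grid, lats, lambda lat: True))
```

Averaging the hemispheres separately keeps the better-sampled Northern Hemisphere from dominating the global number. Masking out a region such as Russia amounts to setting its boxes to None before calling the function.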

The error is bounded by ±0.0005. The official numbers are reported to the third decimal place, so the error is entirely due to the rounding in the official data. Next, I used this map to determine the latitude/longitude bounds that need to be masked to remove Russia from the mix. Here’s what the data look like with and without Russia.

Now here’s the difference in global anomalies.

The red line is a running 5-year mean. The two yellow lines are linear trends. The first is calculated from 1850-1974 and the second 1975-present. The results of both regressions are below.

Statistic           Pre 1975    Post 1975
Trend (°C/decade)   0.000478    0.00403
SE (°C/decade)      0.000359    0.00240
Lag-1 Coef          0.291       0.317
p                   0.183       0.093

Both regressions yielded trends that are not significantly different from zero. Even if the p-values were low, this really wouldn’t mean anything. The contribution to the global trend from Russian data is very small compared to the overall trend of 0.172 ± 0.031 °C/decade.

UPDATE (DEC 18, 2009)

The way I calculated the effect that Russian data has on the global trend wasn’t well thought out in retrospect. Dash pointed out in the comments that

If so, since the Russian stations were (allegedly) cherry-picked to agree with the (alledged) warming, I would imagine that excluding that data would make little difference, as you find.

Lucia also chimed in with suggestions for an alternate analysis to better evaluate the Russian claims. The Russian report alleges that station data that was left out didn’t show any substantial warming. So I’ve redone the calculations to try to address this more carefully. Instead of masking out Russia, I took the climatology for the grid points in Russia from 1850-2008 and replaced the actual monthly data with its climatological mean for that month. This way Russia is neither warming nor cooling. The effect of Russian data using this method shows a stronger effect on the global trend. The figure below shows the difference in the global anomaly and the global anomaly with Russian data making no trend impact.
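The replacement step can be sketched for a single grid box like this. It is an illustration under assumptions, not the code used for the update: the series is monthly and starts in January, missing months are None and stay missing, and the function names are hypothetical.

```python
def monthly_climatology(series):
    """Mean of each calendar month over the whole record (series starts in January)."""
    clim = []
    for month in range(12):
        vals = [v for v in series[month::12] if v is not None]
        clim.append(sum(vals) / len(vals) if vals else None)
    return clim

def flatten_to_climatology(series):
    """Replace every non-missing value with the climatological mean for its month."""
    clim = monthly_climatology(series)
    return [None if v is None else clim[i % 12] for i, v in enumerate(series)]

# Two hypothetical years for one grid box; the second year is 2.0 warmer
year1 = [float(m) for m in range(12)]
year2 = [m + 2.0 for m in range(12)]
flat = flatten_to_climatology(year1 + year2)
print(flat[:12] == flat[12:])
```

After the replacement every year in the box is identical, so the box contributes no trend to the global average while still contributing to the coverage.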

The first thing that jumps out (relative to the original analysis) is that the trends are larger and the break point in the mid 70s is more dramatic.

Statistic           Pre 1975    Post 1975
Trend (°C/decade)   -0.000957   0.006456
SE (°C/decade)      0.000394    0.002432
Lag-1 Coef          0.294       0.314
p                   0.015       0.008

The trends now appreciably differ from zero at 5% significance. Assuming the noise components in both series are approximately similar, the Russian data contributes 0.00 – 0.01 °C/decade to the global trend.

December 14, 2009

False Precision – It Doesn’t Matter

Filed under: Climate models, Precision — Chad @ 8:12 am

I was recently looking at some USHCN data and thought it was odd that the max/min temperatures were reported to the nearest degree (F). It reminded me of an off-topic discussion at Lucia’s about false precision. The criticism is that if temperature stations are reporting minimum and maximum temperatures to the nearest degree, then the mean can have no more precision than the two numbers that went into calculating it. Thus  the mean ought to be rounded to the nearest degree as well. How can we say the planet is a fraction of a degree warmer now relative to some climatological norm if the data going into that calculation isn’t even that precise? I used this question as an excuse to take a break from  two projects I’m working on. I downloaded 3-hourly 2-m surface temperature data for GISS EH (run 5) which covered 1991-2000. To calculate the daily min/max, I extracted eight time steps (one day’s worth) at a time and calculated the min/max values at each grid point. All the calculations from this point are done two ways: 1) using the raw min/max and 2) using the rounded min/max. From the calculated min/max, I found the daily average. After processing all the data, I then calculated monthly averages and anomalies relative to the entire period. First, let’s look at the daily and monthly average.
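Here is a minimal sketch of the raw-versus-rounded daily mean for a single grid point (not the code used for the post). The function name and the sample temperatures are made up, and I assume round-half-up via floor(x + 0.5), since the rounding convention isn’t specified.

```python
import math

def daily_means(temps, round_extremes=False):
    """Daily mean as (min + max) / 2 from 3-hourly values, eight steps per day."""
    means = []
    for start in range(0, len(temps) - 7, 8):
        day = temps[start:start + 8]
        tmin, tmax = min(day), max(day)
        if round_extremes:
            # Round to the nearest whole degree (half rounds up; an assumption)
            tmin = math.floor(tmin + 0.5)
            tmax = math.floor(tmax + 0.5)
        means.append((tmin + tmax) / 2)
    return means

# One hypothetical day of 3-hourly temperatures at a single grid point
temps = [10.2, 9.6, 9.1, 11.4, 14.3, 15.7, 13.9, 11.8]
raw = daily_means(temps)
rounded = daily_means(temps, round_extremes=True)
print(raw[0], rounded[0], rounded[0] - raw[0])
```

At a single point and day the rounding error can approach half a degree; the question is how much of it survives averaging over a month and over the globe.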

Now let’s compare the mean daily temperature calculated from the raw and rounded values by taking their difference.

The error is fairly small. It has a mean of zero and a standard deviation of 0.0038 °C, so about 99% of the errors have a magnitude of at most 0.0115 °C (roughly three standard deviations). Now we can calculate anomalies and see how they compare in magnitude and trend. Here’s the monthly anomaly based on the unrounded data.

Now see the difference between the anomalies calculated with the rounded and unrounded data.

This trendless error also has a mean of zero and a standard deviation of 0.0007 °C, so about 99% of the errors have a magnitude of at most 0.0022 °C. It appears then that using rounded values makes no real difference in the magnitude and trend in anomalies. Normally, anomalies are not calculated from spatial averages. Instead, they are calculated directly from the gridded data. So I re-did my calculations to follow this norm. Here’s the difference between the gridded monthly mean surface temperature calculated from the raw and rounded data.

Here’s the difference between the anomalies calculated from both data sets.

As you can see, the differences in both surface temperature and anomaly calculations are virtually identical. This is probably because in the calculation of the monthly climatology, errors in the averaged monthly values almost entirely cancel. So by removing the climatology, essentially no additional error is removed. The rounding process should introduce grid-scale errors of magnitude 0.5 °C. The temporal averaging from daily to monthly values reduces these relatively large errors to a magnitude of 0.13 °C. The month-to-month error range is asymmetric, probably because the temperature changes significantly over the averaging period, so the round-ups and round-downs don’t balance and the error is biased low or high in any given month. This error bias is on average zero and symmetric. When spatial averaging is introduced, the error is reduced even further, by about a factor of ten. The reason is probably that the combined spatial and temporal averaging combines errors that are of a relatively similar magnitude and oppositely signed.

December 11, 2009

AR4 Model Hypothesis Tests Results: Now with TAS!

I thought I’d make my reappearance since the infamous broken-CPU-heatsink incident with some hypothesis tests. You’ve heard this story before but I’m redoing the analysis I did here (with newly written code) and including surface temperature (TAS) data. I’m using the same statistical analysis used in Santer et al. 2008. We’re testing two hypotheses:

H1: The trend in any given climate model realization is consistent with the observational trend.

H2: The multi-model ensemble mean trend is consistent with the observational trend.
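As I understand the Santer et al. 2008 statistics, H1 uses a paired statistic d that compares one model trend with the observed trend, and H2 uses d*, which compares the multi-model mean trend with the observed trend. The sketch below is a hedged illustration of those two ratios only; it is not the code used for this post, the function names and numbers are hypothetical, and it stops short of p-values because the degrees-of-freedom calculation isn’t described here.

```python
import math

def d_single(b_model, se_model, b_obs, se_obs):
    """Paired statistic for H1: trend difference over the combined standard error."""
    return (b_model - b_obs) / math.sqrt(se_model ** 2 + se_obs ** 2)

def d_star(model_trends, b_obs, se_obs):
    """Statistic for H2: multi-model mean trend versus the observed trend.

    model_trends holds one (ensemble-mean) trend per model.
    """
    n = len(model_trends)
    mean_b = sum(model_trends) / n
    # Inter-model variance of the trends (sample variance)
    var_b = sum((b - mean_b) ** 2 for b in model_trends) / (n - 1)
    return (mean_b - b_obs) / math.sqrt(var_b / n + se_obs ** 2)

# Hypothetical trends in °C/decade; standard errors assumed already AR(1)-adjusted
print(d_single(0.30, 0.03, 0.10, 0.04))
print(d_star([0.10, 0.20, 0.30], b_obs=0.10, se_obs=0.04))
```

The "proportion" rows in the tables below are then the fraction of individual runs whose d is large enough to reject H1.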

Two periods will be tested: January 1979 – October 2009 and January 2001 – October 2009.  First let’s look at the global surface temperature data on the first period. The observational surface temperature data set I’m using is GISS (1200 km) because it has the best spatial coverage compared to HadCRUT and NCDC. (Open the image in a new tab to get a clearer view.)

The CCCMA CGCM 3.1 T47 and NCAR CCSM 3.0 runs are consistently above the upper end of the 2-SE confidence interval of the GISS trend. The rest of the model trends’ confidence intervals overlap with GISS fairly to very strongly. Now let’s look at the data in the tropics.

Virtually all the models’ confidence intervals overlap with GISS’s trend. The table below summarizes the statistical test results.

Statistic    global     tropical
proportion   0.4        0.2167
d*           2.1159     1.5868
dof          130.2433   80.7613
p            0.0181     0.0582

The H1 hypothesis rejects 40% of the time globally and about 22% of the time in the tropics. H2 rejects globally at 5% significance (but not 1% significance) but fails to reject in the tropics at 5% significance. Now let’s turn to the lower troposphere data.

The distribution of trends is similar to the global surface data except they’re more skewed upwards.

TLT-RSS
Statistic    global     tropical
proportion   0.5789     0.2632
d*           3.0445     1.9341
dof          145.0886   106.25
p            0.0014     0.0279

TLT-UAH
Statistic    global     tropical
proportion   0.6491     0.6667
d*           3.669      3.3478
dof          146.898    109.2369
p            0.0002     0.0006

There is a big difference in the proportion of models rejecting H1 globally and in the tropics relative to RSS. About a 2:1 ratio in fact. In both regions, H2 rejects at 5% significance. The disparity in rejection rates for H1 disappears relative to UAH. With UAH, H2 rejects strongly both globally and in the tropics. My first interpretation of the disparity in rejection rates was that the models’ confidence intervals were wider in the tropics (thus less likely to reject) because for observations and model simulations, the data is simply more noisy in the tropics. To see if this was the likely culprit, I looked at the figure below which shows the LT trends relative to RSS in the tropics.

It is visually apparent that the trends more closely overlap and match in the tropics. The disparity is not simply that the confidence intervals are widening because the time series are noisier than their global counterparts; it’s because the trends actually more closely match the observational trend.

I also ran numbers for the mid troposphere.

TMT-RSS
Statistic    global     tropical
proportion   0.2456     0.2456
d*           1.3905     1.7614
dof          146.2916   107.3204
p            0.0833     0.0405

TMT-UAH
Statistic    global     tropical
proportion   0.3684     0.614
d*           2.1361     3.1651
dof          147.5836   110.3389
p            0.0172     0.001

Globally and in the tropics, the H1 rejection rates relative to RSS are the same. H2 fails to reject globally, but rejects in the tropics. We have another H1 rejection rate disparity, this time with UAH. So far, the models aren’t doing well relative to UAH, the MSU dataset that shows the least warming. Next we’ll look at the post-20C3M period.

Statistic    global     tropical
proportion   0.4333     0.2167
d*           2.8564     1.6502
dof          72.7562    28.5376
p            0.0028     0.0549

Again, as we saw in the first time period for GISS and RSS/LT, there’s a strong regional-dependent disparity in the H1 rejection rates. Also, the H2 result is similar as well: reject globally, fail to reject in the tropics. Moving to the troposphere, H2 rejects at 5% significance, but the H1 rejection rate disparity is present for all three data sets in the surface, lower and mid troposphere. Below are figures for the LT temperature trends during the post-20C3M and their H1/H2 test results.

TLT-RSS
Statistic    global     tropical
proportion   0.7193     0.3684
d*           4.73       2.3737
dof          73.2019    38.4789
p            0          0.0114

TLT-UAH
Statistic    global     tropical
proportion   0.6842     0.3509
d*           3.419      2.2768
dof          57.1426    40.8587
p            0.0006     0.0141

TMT-RSS
Statistic    global     tropical
proportion   0.6842     0.3333
d*           4.5361     2.3896
dof          72.0253    38.731
p            0          0.0109

TMT-UAH
Statistic    global     tropical
proportion   0.5439     0.2982
d*           3.2558     2.293
dof          56.2589    41.1311
p            0.001      0.0135

None of the tropical H2 tests reject at 1% significance, although every global test does. One possible explanation for the disparity in H1 rejection rates is that the models represent physical processes more accurately in the tropics than over the globe overall. I still haven’t been able to come up with any other plausible explanations for the disparity and how it changes between the two periods analysed. Anyone care to offer a guess?

November 19, 2009

I’m Back

Filed under: Uncategorized — Chad @ 2:58 am

The title says it all. It’s good to be back with the old computer (since Tuesday afternoon). Time to get back to work.

November 5, 2009

Problems and more problems

Filed under: Uncategorized — Chad @ 11:01 pm

I was sitting at my computer, minding my own business watching The Office, when I suddenly heard a relatively loud pop. I thought my external drive tipped over. It didn’t. I was testing a Fortran program and I closed a notepad document. The next thing I knew, the computer shut down immediately. I tried turning it on multiple times, but it wouldn’t make it more than 15 seconds or so into booting up. I came back to it about 10 minutes later. It was booting up and asked if it should restart in safe mode. I started Windows XP normally. Before it got to the login screen, the computer turned off again. I called someone at BestBuy and they speculated it could be a failed heatsink or power source. I think the fact the boot up process got farther along after leaving it alone for 10 minutes may support the idea that the heatsink failed somehow (because the CPU had time to cool off).

If anyone has any ideas what may be causing this from the little information I have, your help would be greatly appreciated. Until I get the old system up and running, there will be no more climate blogging for me.

P.S. I thought my computer was slow and outdated. I’m now using a much older computer with half the memory (256 MB) and three-fourths the computing power (750 MHz.)

UPDATE – That thing that made the pop sound was one of the brackets on the heat sink breaking off of the base. Since the heat sink lost contact with the CPU, the computer shut down to prevent the CPU from melting. Now all I need to do is get a new base for the heat sink.

UPDATE – I ordered a new bracket to hold the heat sink in place. Though it’s unlikely to come any time soon.

November 1, 2009

Comparing Spatial Averages

Filed under: Climate models, Data comparisons, Surface temperature record — Chad @ 10:39 pm

As I’ve shown, the different surface data sets are not necessarily comparable because they are anomalies relative to different climatologies and have different spatial coverage. Here, I’m going to take the AR4 model output and process it in a way that will make it an apples-to-apples comparison to see what difference spatial coverage makes. I took data from January 1979 – December 2008, calculated the climatology for this period and subtracted it to form anomalies. I then interpolated the data onto the same grids used for GISS, HadCRUT and NCDC. GISS anomalies are available in 250 km and 1200 km interpolated versions. The interpolated model data was then masked to match the coverage of the four surface temperature data sets. Global averages were calculated for all four plus an average for 100% coverage. Here’s BCCR BCM 2.0 for starters.
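The masking step can be illustrated with a toy field. This is only a sketch under assumptions (cosine-of-latitude weights as a stand-in for grid-box area, a hypothetical three-row grid, and a made-up coverage mask), not the processing code used for the tables in this post.

```python
import math

def area_weighted_mean(field, lats, mask=None):
    """Cos(lat)-weighted mean of a gridded field.

    field: list of rows (one per latitude) of anomalies.
    mask:  optional list of rows of booleans; False means "no observational coverage".
    """
    num = den = 0.0
    for i, (row, lat) in enumerate(zip(field, lats)):
        w = math.cos(math.radians(lat))
        for j, v in enumerate(row):
            if mask is not None and not mask[i][j]:
                continue  # drop boxes the observational data set does not cover
            num += w * v
            den += w
    return num / den

# Toy anomaly field with amplified warming in the northernmost row
lats = [-60.0, 0.0, 60.0]
field = [[0.1, 0.1],
         [0.2, 0.2],
         [0.9, 0.9]]

# Hypothetical coverage mask that is missing the high northern latitudes
coverage = [[True, True],
            [True, True],
            [False, False]]

full = area_weighted_mean(field, lats)
masked = area_weighted_mean(field, lats, coverage)
print(full, masked)
```

Dropping the fast-warming high-latitude row pulls the masked average below the full-coverage one, which is the same effect that shows up for the coverages lacking Arctic data.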

[Figure: BCCR BCM 2.0, run 1, coverage comparison]

The red line is the area-weighted average for the particular data set coverage. The black line is for 100% coverage. Not much of a difference. The relevant statistics for the model data relative to each observational data set are given below. The trends are given in °C/decade.

MODEL                100%       GISS.250   GISS.1200  HADCRUT    NCDC
BCCR BCM 2.0         0.141      0.141      0.142      0.140      0.140
CCCMA CGCM 3.1 T47   0.246      0.234      0.243      0.233      0.237
CCCMA CGCM 3.1 T63   0.299      0.275      0.297      0.268      0.271
CSIRO MK 3.0         0.106      0.108      0.106      0.109      0.107
CSIRO MK 3.5         0.204      0.192      0.203      0.191      0.200
GFDL CM 2.0          0.279      0.260      0.276      0.265      0.272
GFDL CM 2.1          0.298      0.279      0.295      0.281      0.287
GISS AOM             0.129      0.124      0.129      0.124      0.127
GISS EH              0.164      0.160      0.163      0.161      0.165
GISS ER              0.183      0.179      0.182      0.181      0.185
IAP FGOALS 1.0g      0.150      0.122      0.135      0.117      0.120
INGV ECHAM 4         0.175      0.164      0.173      0.166      0.168
INM CM 3.0           0.218      0.193      0.219      0.190      0.196
IPSL CM 4            0.296      0.283      0.294      0.280      0.284
MIROC 3.2 HIRES      0.330      0.306      0.329      0.304      0.310
MIROC 3.2 MEDRES     0.181      0.182      0.182      0.184      0.186
MRI CGCM 2.3.2a      0.137      0.132      0.137      0.131      0.132
NCAR CCSM 3          0.270      0.253      0.269      0.249      0.255
NCAR PCM 1           0.181      0.170      0.179      0.164      0.170
UKMO HADGEM 1        0.271      0.255      0.270      0.247      0.252

MEAN                 0.213      0.201      0.211      0.199      0.203
STD DEV              0.068      0.062      0.068      0.062      0.063
PERCENT (p < 0.05)   100.000    97.917     97.917     95.833     97.917

The first thing some may notice is that for the 1979 – 2008 period, essentially every model run showed a statistically significant trend. This is because a) looking at a longer period affords more degrees of freedom, which translates into smaller standard errors, and b) the noise is greatly reduced by looking over such a long time period. Notice that on average, the model runs using the GISS 1200 km coverage are the closest to the 100%. This isn’t just due to the fact that its spatial coverage is virtually 100%. GISS 1200 km includes the Arctic region where the warming trend is amplified. HadCRUT, the dataset with the worst spatial coverage, is on average lower than the 100% and GISS 1200 km trends, probably primarily due to its missing Arctic data. Trends corresponding to the GISS 250 km and NCDC coverage are roughly comparable. Here are the results for the Jan 2001 – Dec 2008 period.

MODEL                100%       GISS.250   GISS.1200  HADCRUT    NCDC
BCCR BCM 2.0         0.282      0.317      0.291      0.314      0.336
CCCMA CGCM 3.1 T47   0.316      0.263      0.308      0.247      0.245
CCCMA CGCM 3.1 T63   0.355      0.309      0.347      0.278      0.265
CSIRO MK 3.0         0.312      0.277      0.305      0.246      0.282
CSIRO MK 3.5         0.345      0.319      0.338      0.321      0.347
GFDL CM 2.0          0.138      0.108      0.124      0.071      0.123
GFDL CM 2.1          0.590      0.593      0.581      0.631      0.651
GISS AOM             0.137      0.137      0.143      0.150      0.170
GISS EH              0.094      0.094      0.090      0.078      0.102
GISS ER              0.170      0.156      0.169      0.156      0.171
IAP FGOALS 1.0g      0.224      0.179      0.201      0.173      0.184
INGV ECHAM 4         0.066      0.062      0.068      0.061      0.053
INM CM 3.0           0.358      0.292      0.366      0.257      0.293
IPSL CM 4            0.365      0.403      0.365      0.435      0.450
MIROC 3.2 HIRES      0.582      0.595      0.574      0.579      0.622
MIROC 3.2 MEDRES     0.253      0.237      0.248      0.224      0.233
MRI CGCM 2.3.2a      0.164      0.162      0.164      0.159      0.162
NCAR CCSM 3          0.298      0.279      0.301      0.281      0.294
NCAR PCM 1           0.237      0.194      0.238      0.186      0.186
UKMO HADGEM 1        0.514      0.489      0.511      0.470      0.484

MEAN                 0.290      0.273      0.287      0.266      0.283
STD DEV              0.148      0.152      0.147      0.159      0.162
PERCENT (p < 0.05)   56.250     60.417     60.417     62.500     58.333

The shorter Jan 2001 – Dec 2008 period shows a similar pattern across coverages, but only a little more than half of the runs (56 to 63%, depending on coverage) yield statistically significant warming trends. Looking at such a short period results in fewer degrees of freedom, which are reduced even further by autocorrelation effects.

You may notice that the model list is a little short. This is because when I reprocessed the AR4 model data, I placed a few constraints on what gets processed. I wanted to join the skin surface, 2-m, and LT, MT, LS brightness temperature data into one file so I can very easily access multiple variables. Not all the data was processed. Some of you may remember the CNRM CM 3 data that SteveM posted on because it showed a weird step change at about the first and second third of the 20th century. Well, the model uses one set of pressure levels during 20c3m and uses another set of pressure levels during a1b. This means two different weighting functions would have to be used. This would undoubtedly cause a step change. I was only going to use models that represented the atmospheric temperature at the IPCC mandated pressure levels. Any models not playing by the rules were not processed. HadCM 3 is one of them. MIUB ECHO-G didn’t even have any atmospheric temperature data!
