Anecdotally, I hear every year about how cold weather affects offense. My informal observation over the years has led me to infer that most people who follow baseball in some capacity believe this is an inarguable truth. While a lot of people (myself included) enjoy forming opinions and beliefs without gathering many facts, sometimes it’s useful to actually gather them. In this article, I will examine research that has already been done on this question and offer some of my own to reach a fact-based conclusion.
It is not difficult to find articles on the relationship in baseball between game-time temperature and overall offense. In 2006, Chris Constancio wrote an article at The Hardball Times that looked at the effect of game-time weather specifically on pitching statistics. His conclusion was that "pitchers generally have worse control but higher strikeout rates and better luck with balls in play in cold weather." A 2008 article at sabernomics.com looked specifically at the impact of weather on home runs in April, and it ends with links to other studies on similar topics.
This issue has also been examined for a particular park: in 2012, Dave Cameron looked at how the Pacific Northwest climate around Safeco Field in Seattle affects the Mariners’ offense. One of his conclusions was that warm and humid conditions rarely occur together there, because the local climate makes humidity rise as the weather cools and fall as it warms. The implication is that the combination of heat and humidity produces the best run-scoring environment, and that combination just does not occur in Seattle.
Royals Review’s own Jeff Zimmerman penned a piece at FanGraphs for us fantasy geeks in which he analyzed average runs scored across temperature ranges. It was the first place I saw wind speed and direction referenced as factors that work together with air temperature to affect overall offense. Using games from 2007 through 2013, the article showed runs per game increasing as temperature increases. He also referenced the well-documented research on how far a batted baseball travels as the temperature changes. The end of his article promised more, and left me pining for it.
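Zimmerman's exact method isn't reproduced here, but the basic idea of averaging runs within temperature ranges can be sketched in a few lines of Python. This is a minimal illustration, assuming each game has already been reduced to a (temperature, total runs) pair and using a hypothetical 10-degree bucket width; the function name and the numbers in the usage lines are made up for illustration, not real game data.

```python
from collections import defaultdict


def average_runs_by_temperature(games, bucket_size=10):
    """Average total runs per game within fixed-width temperature buckets.

    `games` is an iterable of (temperature_f, total_runs) pairs. The 10-degree
    default bucket width is an illustrative assumption.
    """
    run_totals = defaultdict(int)  # bucket lower bound -> runs summed over games
    game_counts = defaultdict(int)  # bucket lower bound -> number of games
    for temp_f, runs in games:
        bucket = (temp_f // bucket_size) * bucket_size  # e.g. 67 -> 60
        run_totals[bucket] += runs
        game_counts[bucket] += 1
    # Return buckets in ascending temperature order.
    return {b: run_totals[b] / game_counts[b] for b in sorted(run_totals)}


# Toy input, not real game data.
toy_games = [(45, 6), (48, 8), (72, 9), (75, 11), (91, 12)]
print(average_runs_by_temperature(toy_games))  # {40: 7.0, 70: 10.0, 90: 12.0}
```

Fixed-width buckets keep the output easy to read as a table, at the cost of thin samples in the coldest and hottest buckets.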
For my study, I wanted to look at the statistical correlation between offensive outcomes and game-time temperature in degrees Fahrenheit. To do this, I used the game data for every team from the 2013 season, which makes this an appropriate place for the following notice:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
I downloaded two separate .zip files of 2013 game data from Retrosheet. The first contained a .txt file with every box score from the 2013 season. I was pretty excited at first, as I thought all the data I needed would be neatly packaged in that file. But, alas, this was not the case. The import itself was straightforward, though I had to name all my column headers and make sure I imported the right columns as pure text. Then I discovered that temperature was nowhere to be found. Game-time temperature was, however, in the data files containing all of the 2013 play-by-play information (one file for each team, joy). After some creative filtering, sorting, copying and pasting, I compiled a master spreadsheet that combines both data sets and can be found here.
Using this spreadsheet, I ran two-tailed Pearson correlations between game-time temperature and a variety of offensive outcomes: total runs, total hits, total doubles, total triples, total home runs, total RBIzzz, total strikeouts, total walks, total stolen bases, total caught stealing, total number of pitchers used and total GIDPs.
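The merge above was done by hand with filtering, sorting, copying and pasting. A scripted version is one possible alternative. The Python sketch below is not the author's process; it assumes the Retrosheet layouts described in its comments (play-by-play game ids built from home team, date and game number, plus `info,temp` records) rather than quoting Retrosheet's specification, so verify the field positions against Retrosheet's own documentation before relying on it. All function names are hypothetical.

```python
import csv
import io


def temperatures_by_game_id(event_text):
    """Map each game id in a play-by-play (event) file to its `info,temp` value.

    Assumes each game starts with an `id,<game id>` record and later has an
    `info,temp,<degrees>` record.
    """
    temps = {}
    current_id = None
    for row in csv.reader(io.StringIO(event_text)):
        if not row:
            continue  # skip blank lines
        if row[0] == "id":
            current_id = row[1]
        elif row[0] == "info" and len(row) >= 3 and row[1] == "temp" and current_id:
            temps[current_id] = int(row[2])
    return temps


def game_id_from_log_row(row):
    """Build an event-file style game id: home team + yyyymmdd + game number.

    Assumes game-log field order: date in field 1, game number in field 2,
    home team in field 7 (indexes 0, 1 and 6 here).
    """
    return f"{row[6]}{row[0]}{row[1]}"


def merge_temperatures(game_log_text, event_text):
    """Return (game id, total runs, temperature) for every game with a temperature.

    Assumes visiting and home runs are game-log fields 10 and 11 (indexes 9, 10).
    """
    temps = temperatures_by_game_id(event_text)
    merged = []
    for row in csv.reader(io.StringIO(game_log_text)):
        if not row:
            continue
        game_id = game_id_from_log_row(row)
        if game_id in temps:
            total_runs = int(row[9]) + int(row[10])
            merged.append((game_id, total_runs, temps[game_id]))
    return merged


# Illustrative snippets, not real 2013 records.
events = "id,KCA201304050\ninfo,visteam,MIN\ninfo,temp,45\n"
log = '"20130405","0","Fri","MIN","AL",4,"KCA","AL",1,3,1\n'
print(merge_temperatures(log, events))  # [('KCA201304050', 4, 45)]
```

Before correlating anything, it is worth checking Retrosheet's documentation for how missing or unknown temperatures are coded, since any placeholder value would need to be filtered out.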
The table below shows all of the correlation coefficients between these variables and outcomes. To give some perspective, the relationship between total RBIzzz and total runs is nearly perfect (.987). This makes sense: almost every run comes with an RBI, with the exception of runs that score on double plays or errors. However, the important thing to remember about correlations is that they don’t prove causation. So, I cannot use this data to claim statistical proof that RBIzzz cause runs; only that the two are almost perfectly correlated. This is the highest positive correlation on the grid. The next highest were between total hits and total runs (.767) and between total hits and total RBIzzz (.763), again showing very strong and statistically significant relationships.
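For readers curious what a Pearson coefficient actually measures, here is a from-scratch Python sketch: it divides the co-variation of two series by the product of their spreads, giving a value between -1 and 1. The function name and the toy runs/RBI numbers are illustrative assumptions, not the author's spreadsheet.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance times n) and of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Toy per-game totals: RBIs trail runs slightly, as when a run scores on an error.
runs = [2, 5, 9, 3, 7]
rbis = [2, 4, 9, 3, 6]
print(round(pearson_r(runs, rbis), 3))  # 0.982
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson coefficient, so a hand-rolled version is mainly useful for seeing the arithmetic.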
Some other outcomes have smaller correlations that are still substantial and statistically significant, such as the correlations between:
· Total home runs and total runs scored (.530)
· Total walks issued and number of pitchers used in a game (.490)
· Total doubles and runs scored (.484)
Then there are outcomes with essentially no correlation at all, such as the correlations between:
· Total hits and total strikeouts (.001)
· Total home runs and total caught stealing (-.002)
· Total home runs and total GIDPs (-.002)
With that in mind, here are the positive correlations between game-time temperature and offensive outcomes in 2013 that were statistically significant (that is, unlikely to be the product of random chance alone at the stated level). Note that all of them are far weaker than the correlations above:
· Temperature and total hits (.077 with statistical significance at the .01 level)
· Temperature and total RBIzzz (.044 with statistical significance at the .05 level)
· Temperature and total runs (.043 with statistical significance at the .05 level)
· Temperature and total number of GIDPs (.044 with statistical significance at the .05 level)
· Temperature and total home runs (.043 with statistical significance at the .05 level)
Now, here are the statistically significant negative correlations I found in 2013 between game-time temperature and offensive outcomes:
· Temperature and total strikeouts (-.066 with statistical significance at the .01 level)
· Temperature and total walks (-.050 with statistical significance at the .05 level)
Then, we have a correlation that shows some semblance of statistical significance, though we’re starting to push it:
· Temperature and total doubles (.037 with statistical significance at the .10 level)
We then have the correlations that show no statistical significance at all:
· Temperature and total stolen bases (.010 – not statistically significant)
· Temperature and total caught stealing (.018 – not statistically significant)
· Temperature and total number of pitchers used (.017 – not statistically significant)
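The significance levels above can be sanity-checked from r and the number of games alone. The sketch below converts r to a t statistic with n - 2 degrees of freedom and, as a simplification, uses the normal distribution in place of the t distribution, which is very close with thousands of games. The game count n = 2,430 (30 teams times 162 games, divided by 2) is an assumption; the article doesn't state exactly how many games its spreadsheet contains. The function name is hypothetical.

```python
import math


def two_tailed_p(r, n):
    """Approximate two-tailed p-value for a Pearson r computed from n pairs.

    t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2
    degrees of freedom when there is no true correlation; for large n a
    standard normal is a close stand-in, so p = 2 * (1 - Phi(|t|)),
    which equals erfc(|t| / sqrt(2)).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


N_GAMES = 2430  # assumption: one full regular season, 30 teams x 162 games / 2

reported = {
    "hits": 0.077,
    "runs": 0.043,
    "strikeouts": -0.066,
    "walks": -0.050,
    "doubles": 0.037,
    "caught stealing": 0.018,
}
for outcome, r in reported.items():
    print(f"{outcome:>16}: r = {r:+.3f}, p ~ {two_tailed_p(r, N_GAMES):.4f}")
```

Under these assumptions, hits and strikeouts come out below .01, runs and walks between .01 and .05, doubles between .05 and .10, and caught stealing well above .10, in line with the levels reported above. Squaring r also shows how small these relationships are: .077 squared is about .006, so temperature would account for well under 1 percent of the game-to-game variation in hits.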
In conclusion, the data presented supports a reasonable inference that game-time temperature has a statistically significant, if small, relationship with most offensive outcomes. Hitters also appear to put more balls in play as the weather warms up, as evidenced by the decrease in both strikeouts and walks as the temperature increases. Combined with the work referenced at the beginning, it seems fairly safe to say that there is enough current data and research to support the idea that game-time temperature does affect offensive outcomes in major league baseball games.
But, you already knew that, didn't you?