I am undertaking the ambitious task of creating my own projection system and hope to write a series of articles that will break down how projections work in a digestible manner. In order to do this, I first have to compile a comprehensive database with all necessary player statistics for each player from every year. I am currently in the process of building the database for batters and offensive statistics and will then move to pitching and hopefully base running and defense. Sabr.org has a great article that lists various websites from which comprehensive databases can be downloaded for this purpose. I chose to use Sean Lahman’s Baseball Database and downloaded the entire set of tables from here.
Limited Use License - This database is copyright 1996-2016 by Sean Lahman.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
The only requirement for using this database is to provide appropriate credit, and I believe what I listed above takes care of that. I took the original database and manipulated it in various ways, adding a lot of tables and data downloaded from Fangraphs.
While compiling my database, I decided to calculate my own statistics for batting average, on-base average, slugging percentage, wOBA and wRAA and compare them to what is listed on Fangraphs to ensure my calculations were correct. Fangraphs has an incredible series that explains Sabermetrics and advanced statistics that can be found here. This helped me gain a better understanding of how these statistics work and the formulas used to calculate them.
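For anyone who wants to try the same replication, here is a minimal sketch of the rate statistics in Python, using the standard published formulas. The `BattingLine` class, its field names and the `EXAMPLE_WEIGHTS` values are my own placeholders for illustration. The real linear weights, league wOBA and wOBA scale change every season and should come from the published constants table.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one player-season (field names are illustrative)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    ibb: int
    hbp: int
    sf: int

    @property
    def singles(self) -> int:
        return self.h - self.doubles - self.triples - self.hr


# Placeholder linear weights, NOT the values for any real season.
EXAMPLE_WEIGHTS = {"bb": 0.69, "hbp": 0.72, "1b": 0.89,
                   "2b": 1.27, "3b": 1.62, "hr": 2.10}


def batting_average(s: BattingLine) -> float:
    return s.h / s.ab


def on_base_pct(s: BattingLine) -> float:
    return (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)


def slugging(s: BattingLine) -> float:
    total_bases = s.singles + 2 * s.doubles + 3 * s.triples + 4 * s.hr
    return total_bases / s.ab


def woba(s: BattingLine, w: dict) -> float:
    # Intentional walks are excluded from both numerator and denominator.
    unintentional_bb = s.bb - s.ibb
    numerator = (w["bb"] * unintentional_bb + w["hbp"] * s.hbp
                 + w["1b"] * s.singles + w["2b"] * s.doubles
                 + w["3b"] * s.triples + w["hr"] * s.hr)
    return numerator / (s.ab + s.bb - s.ibb + s.sf + s.hbp)


def wraa(woba_value: float, lg_woba: float, woba_scale: float, pa: int) -> float:
    # Runs above average: wOBA gap converted to runs, scaled by playing time.
    return (woba_value - lg_woba) / woba_scale * pa
```

Because wRAA divides by the wOBA scale and multiplies by plate appearances, small rounding differences in the constants get magnified for everyday players, which is consistent with the slight mismatches described next.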
It wasn’t too difficult to replicate the published statistics by applying the formulas to my own database, although my calculations for wRAA ended up being ever so slightly off because of rounding in the numbers for the constants found in this table. After getting wRAA down, I moved to the best all-encompassing statistic that exists for comparing players from different eras and leagues against each other, wRC+.
Replicating wRC+ was a more difficult task, mostly because of my own inability to properly write the order of operations into my formula. I was able to replicate the published Fangraphs wRC+ within +/- one point, with the difference in calculations being a result of the difference in wRAA from rounding. As I studied the formula, one thing struck me as strange: the league specific run environment (AL vs. NL) is controlled in the denominator of the formula because it uses league specific (AL or NL) wRC per plate appearance. However, the numerator uses the overall MLB average for runs scored per plate appearance instead of the league specific run environment, as is done with the denominator.
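To make the order of operations concrete, here is a minimal sketch of the current formula as I understand it from the Fangraphs library. The function and parameter names are my own, the park factor is assumed to be a ratio where 1.00 is neutral, and the numbers in the usage note are made up for illustration.

```python
def wrc_plus(wraa: float, pa: int, mlb_r_per_pa: float,
             park_factor: float, league_wrc_per_pa: float) -> float:
    """Current-style wRC+ (a sketch, not an official implementation).

    mlb_r_per_pa      -- overall MLB runs per plate appearance (numerator)
    park_factor       -- park factor as a ratio, 1.00 = neutral park
    league_wrc_per_pa -- AL or NL wRC per PA, pitchers excluded (denominator)
    """
    # Player's runs per PA: runs above average plus the MLB baseline.
    runs_per_pa = wraa / pa + mlb_r_per_pa
    # Park adjustment: negative in a hitter-friendly park, positive otherwise.
    park_adjustment = mlb_r_per_pa - park_factor * mlb_r_per_pa
    return (runs_per_pa + park_adjustment) / league_wrc_per_pa * 100
```

With these assumed inputs, a hitter with 0 wRAA in a neutral park comes out at 100 when MLB runs per PA equals his league's wRC per PA, and the same hitter in a park with a factor of 1.05 comes out at about 95. Note that only `league_wrc_per_pa` is league specific; both uses of `mlb_r_per_pa` are MLB-wide.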
Park factor is used in the numerator to control for league run environment, but that causes a problem. Either you use the half-weight for park factor or you have to use the full weight. You can’t use the full-weight, because only half of games are played in the home park. When you use the half-weight, you inherently have introduced an unbalanced control for league run environment by adding a half-weight to the numerator and a full-weight to the denominator.
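As a small worked example of the half-weight, here is a sketch that assumes a common simplification: exactly half of a team's games are at home and its road parks average out to neutral. The function name is hypothetical and real park factors are built with more care than this.

```python
def half_weight_park_factor(full_park_factor: float) -> float:
    """Blend a home-only park effect with a neutral (1.00) road schedule.

    Assumes half of all games are played at home and that road parks
    average out to neutral, which is a simplification.
    """
    return (full_park_factor + 1.0) / 2.0
```

Under that assumption, a park that inflates scoring by 10% in home games (1.10) carries a half-weight factor of 1.05, which is the kind of value that goes into the numerator of wRC+.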
The result of this, in combination with how the constants in both the numerator and denominator have changed over time, is that wRC+ is a statistic that could potentially be made more accurate. The way that league specific runs per plate appearance and wRC per plate appearance have changed over time in each league has led to more variance and an increase in error of measurement in the current calculation for wRC+.
If you are still reading and awake, I will show statistical proof of this below.
My theory is pretty straightforward: it is more statistically accurate to go back to the previous calculation of wRC+ by using league specific (AL or NL) runs per plate appearance in the numerator instead of overall MLB totals. A potential reason for this is that the gap between the AL and NL in wRC per plate appearance is narrowing while the difference in runs per plate appearance between the leagues has become more pronounced. The denominator of wRC+ is having a more difficult time controlling for league effects and it has become more important to control for league effects in the numerator. The current calculation inherently gives more weight to the denominator, and changes in run environment mean that more weight should go to the numerator.
Fangraphs had previously calculated wRC+ using league specific run environment, but made the change to the current formula when it was pointed out that the double counting of league specific run environment introduced additional error. Park effect was already in the numerator to control for league effects, so counting league specific run environment added another full weight that made the formula unbalanced by double counting in the numerator. The key thing here is that adding league specific runs per plate appearance is not double counting; it is 1.5 times counting. At the time, it was presumably statistically more accurate to use the ratio of 0.5 to 1.0 instead of 1.5 to 1.0. Changes in run environment over time, particularly since the introduction of inter-league play, now make it more statistically accurate to use the calculation that was used previously.
The first step in testing this theory was to recalculate wRC+ by using league specific runs in the calculation instead of the overall MLB average. My first attempt at doing this was unsuccessful, but I did not realize this until I had sent a draft of my findings to David Appelman at Fangraphs. David was incredibly gracious to take a look at my work and provide feedback, which included pointing out that my re-calculation of wRC+ was incorrect. In addition to pointing out an error I should have seen myself, he took the time to write a detailed explanation of how I would have to change the formula in order to accurately calculate wRC+ in the method I had proposed. In addition, I would need to use constants that were not readily available for download and would need to calculate those myself. I could not have done any of this work without the help of David and the data from Fangraphs.
It is possible that the reason for the observed differences I present below is a result of the slight variation in my replicated wRC+ and results would be significantly impacted if there was any error in my own calculations of the needed league specific constants. I am extremely open to scrutiny, criticism and the distinct possibility I may be wrong; I am at best an amateur statistician and bootleg Sabermetrician.
The database I used for these calculations included the individual player batting statistics for every player from every team from 1871-2015. Players with plate appearances for more than one team within the same season are listed with a separate record for each playing stint. The complete database with detailed description of each variable can be found on Google Docs for anyone that would like to manipulate the data and run their own statistics. These both originated from the database referenced above and any use requires compliance with the same limited use agreement.
After I added the replicated wRC+ and my recalculated wRC+ to the database, I was able to run a statistical test on the two numbers to verify they correlated almost perfectly and to identify whether the difference between the two numbers was statistically significant. In order to ensure a good sample size, I only used players from 1871 to 2015 that had at least 250 total plate appearances in a season. I used a paired samples t-test to test the difference between the two variables and the output is below.
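For readers who want to run the same kind of comparison on the shared data, here is a minimal sketch of a paired samples t-test and the accompanying correlation using only the Python standard library. The function names are my own. The sketch stops at the t statistic and degrees of freedom; getting the two-tailed significance value requires the t distribution, for which a library routine such as SciPy's `scipy.stats.ttest_rel` could be used instead.

```python
import math
import statistics


def paired_t_test(a: list, b: list) -> tuple:
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


def pearson_r(a: list, b: list) -> float:
    """Pearson correlation between two equal-length samples."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sxx = sum((x - mean_a) ** 2 for x in a)
    syy = sum((y - mean_b) ** 2 for y in b)
    return sxy / math.sqrt(sxx * syy)
```

Each player-season contributes one pair: the replicated wRC+ and the recalculated wRC+. The test works on the per-player differences, which is why two columns that correlate almost perfectly can still differ by a statistically significant amount.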
There are a few key details in these statistics that show an updated calculation that uses league specific runs per plate appearance in the numerator would make a statistically significant improvement to the accuracy of wRC+.
The first box with the paired samples statistics for both calculations shows that my proposed updated calculation (on bottom) has a mean of 101.51, a standard deviation of 26.10 and a standard error mean of .16212. My replicated Fangraphs wRC+ has a mean of 102.29, a standard deviation of 26.88 and a standard error mean of .16699. My proposed calculation has the effect of lowering the mean slightly, clustering the data more tightly around the mean and reducing the error of measurement. The next step is to show these differences are statistically significant.
The second box with paired samples correlations is important because it establishes that my proposed calculation has an almost perfect correlation with the replicated wRC+ (r=.999). The key number in the third box with the paired samples test is the box on the far right hand side for statistical significance (Sig. (2-tailed)). This number (.000) appears to be zero, but if you get to the 12th or 13th decimal place, that’s where you will start to see numbers. This number (.000) verifies that the difference between the two calculations is indeed statistically significant (note - I even used 99% for the confidence interval instead of 95%).
To ensure the data is not skewed because I only have players with at least 250 plate appearances, I dropped the cut point to 100 plate appearances. The results still show my proposed calculation is more accurate by a statistically significant amount.
If we include every single player from 1871-2015 that had at least one plate appearance, including pitchers, we once again see that it is statistically more accurate.
All three of the above observations lump all seasons together across the history of baseball. What do the statistics look like if you run the paired samples t-test within each individual season? The table is too big to fit and be readable, but you can access the full table here. This will show the paired samples t-test results for each group of players (at least 1 PA, at least 100 PA, at least 250 PA) for each year from 1871-2015 for a total of 434 possible points of comparison (note: no player had 250 PA in 1871). Among the 434 points of comparison, my proposed revised calculation reduced error in 429, and 364 of those were statistically significant. All 5 years in which the proposed calculation increased error were statistically significant.
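The count of 434 follows directly from the setup, and a short sketch makes the arithmetic explicit. The variable names are my own; the only inputs are the season range, the three plate appearance cut points and the one empty group noted above.

```python
seasons = range(1871, 2016)          # 1871 through 2015 inclusive
thresholds = (1, 100, 250)           # minimum plate appearances per group
empty_groups = {(1871, 250)}         # no player reached 250 PA in 1871

comparisons = [(year, pa) for year in seasons for pa in thresholds
               if (year, pa) not in empty_groups]

total = len(comparisons)             # 145 seasons * 3 groups - 1 = 434
share_reduced = 429 / total          # share of comparisons with lower error
share_significant = 364 / 429        # share of those that were significant
```

In other words, the revised calculation reduced error in roughly 99% of the comparisons, and about 85% of those reductions were statistically significant.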
The last step is to run the same paired samples t-test on the individual standard error means from each year. This is another way to verify that the differences that are seen in the error of measurement are indeed statistically significant. The resulting output below again confirms that the reduction in the error of measurement from the revised proposed calculation exists and is statistically significant.
Lastly, since statistical significance has been established, it is important to report effect size (using Cohen’s d) in order to determine if the difference should be considered small, medium or large. The effect size among the 434 individual differences in error of measurement is .081905, which would be considered small. This means that while the difference in the error may be statistically significant, it may not justify completely changing the calculation of wRC+ to achieve what in essence is a shift of less than one tenth of a standard deviation in error.
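Here is a minimal sketch of one way to compute an effect size for paired data. There are several variants of Cohen's d; this one divides the mean difference by the standard deviation of the differences, which is an assumption on my part for illustration and may not match the variant behind the figure reported above. The function names are my own, and the labels use Cohen's conventional benchmarks of 0.2, 0.5 and 0.8.

```python
import statistics


def cohens_d_paired(a: list, b: list) -> float:
    """Mean of the paired differences divided by their sample std deviation."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


def effect_size_label(d: float) -> str:
    """Rough label based on Cohen's benchmarks (0.2 small, 0.5 medium, 0.8 large).

    Anything below the medium benchmark is labeled "small" here, so values
    under 0.2 (below even the small benchmark) also get that label.
    """
    size = abs(d)
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"
```

Passing the reported value of .081905 to `effect_size_label` returns "small", and it sits well below the 0.2 benchmark, which is why the practical importance of the change is open to debate even though the difference is statistically significant.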
One possible explanation for the proposed change being more accurate can be seen by looking at how the constants within the wRC+ calculation have changed over time. Below is a scatter-plot that represents the difference in league wRC per plate appearance between the AL and NL from 1901-2015 and a scatter plot that represents the difference in league runs scored per plate appearance over the same time period.
The y-axis in the graph below represents the difference in league wRC per plate appearance between the AL and NL with each circle being a data point (note: pitchers are removed from this data). Between 1901 and 1972, the overall wRC per plate appearance was slightly higher in the AL on average, but there was a relatively even distribution when it came to which league was higher in which year.
This even distribution suddenly changes with the introduction of the designated hitter in 1973 and the data skews towards the AL having a higher wRC per PA for the majority of seasons (most of the circles are above the dotted line). There is another flip in the data that corresponds with the introduction of inter-league play in 1997, as the data clusters around 0 and the NL posts a higher rate in multiple years. Notice that as you increase the year towards 2015, all of the circles are getting closer to 0. This is a graphical representation of each league’s wRC per PA regressing to the league average over time.
Next, let’s look at what this relationship between the leagues looks like in terms of overall runs scored per plate appearance.
In the graph below, the black trend line shows that the difference in overall runs per plate appearance is increasing across time. Similar to the graph above showing the difference in wRC per plate appearance, there is a relatively even distribution between leagues in terms of which one had the higher offensive run environment, and that changes around 1973. There is a noticeable shift in runs per plate appearance in 1973, but unlike the graph above on wRC per plate appearance, once the AL established a higher run environment after the introduction of the DH, it has stayed higher than the NL in every year since. Interesting side note - the NL actually had a higher rate of runs per PA in the first year of the DH.
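For anyone rebuilding these charts from the shared data, here is a minimal sketch of the two steps involved: taking the AL minus NL difference for each season and fitting a straight trend line through it by ordinary least squares. The function names are my own and the inputs are assumed to be dictionaries keyed by season.

```python
def league_gap(al_rate: dict, nl_rate: dict) -> dict:
    """AL minus NL rate for every season present in both inputs."""
    return {year: al_rate[year] - nl_rate[year]
            for year in sorted(al_rate) if year in nl_rate}


def ols_trend(xs: list, ys: list) -> tuple:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

A positive slope on the runs per plate appearance gap means the AL advantage is growing over time, while a gap in wRC per plate appearance that shrinks towards 0 shows up as points converging on the zero line.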
These two graphs are important because when you translate the changes in the data that are used as constants in the wRC+ formula, you can see how they can unevenly weight the result. The total runs per plate appearance data is skewed towards a higher AL run environment and the difference between the two leagues is increasing over time. Overall wRC per plate appearance has become slightly skewed towards the NL over time and this difference between leagues is decreasing.
The result is that using MLB runs per plate appearance in wRC+ is causing problems in the numerator because the league specific averages are getting further away from the overall MLB average over time. In addition, the formula for wRC+ uses park factors to control for league run environment in the numerator and this exacerbates the issue, as the formula provides only a 0.5 weight to the numerator, while still providing a full 1.0 weight to the denominator.
Basically, wRC+ places more importance on controlling for league run environment in the denominator of the equation. Over time, the denominator has become much less important to control for league run environment and the numerator has become increasingly important to control. The mathematics behind this is introducing additional noise and the problem can be alleviated by going back to the original calculation of wRC+.
I believe that I have provided visual and statistical evidence that shows why my theory could be correct. I keep looking for ways that show that I am wrong, but am inclined to believe my theory is correct and proved by the data above. Which, of course, means that someone will immediately point out a simple flaw in the methodology. But, only four of you made it to the end of this article, so we’ll see.