I've spent some time recently reading up on some of the more complex statistics out there, and it sparked some questions and ideas that I thought might resonate with the readers of this site. I'll start with a couple of questions and then move on to my idea for a stats project.
1) The original makeup of wOBA does not include an allowance for stolen bases or SB%. However, I've seen work on this site by Devil Fingers, where he says he's accounted for SBs in his wOBA. When you see a wOBA quoted, can you assume that it takes into account SBs? Is that now the generally accepted practice?
2) There is a statistical advantage to hitters in certain defensive alignments, such as when the infield is in double play depth or playing against the bunt. Essentially, in certain situations, the adage about runners on base causing havoc for the pitcher/defense can hold true. Is this statistical advantage used in calculations, such as wOBA? If not, how can defensive alignment be incorporated into an analysis of a hitter?
3) We've talked a bit on this site about using regression analysis. I don't know what all is out there, but I'm wondering what could be accomplished with stats and regression analysis, particularly using powerful software such as SAS. Specifically, I'm thinking about predictive values for hitters and pitchers.
I'd like to see some advanced regressions run on the base stats that make up some of the more complex measures such as wOBA and WAR. I'd like to see which items or combinations of items have the most predictive power. For example, it might seem obvious that the previous season's HRs have the most predictive accuracy for a typical power hitter, but HRs alone won't give you an accurate wOBA or WAR forecast for a hitter in the coming year. However, what if you combined HRs, 2Bs, and BBs? How predictive do you think that might be? These are the sorts of things regressions can tell us.
I'd like to see how simple you could make an equation while still keeping a reasonable degree of accuracy. That way it would be easy for a fan sitting at home to do some simple forecasts for his favorite players. What I think would be interesting is having different equations for different types of players. So the HR, 2B, and BB example above would be for a power hitter, but maybe a speedster would have something like total hits, stolen bases, and CS%.
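To make the idea concrete, here is a rough sketch of the kind of comparison I have in mind: fit one regression on HRs alone and another on HRs, 2Bs, and BBs together, then see how much of next year's wOBA each one explains. Everything in it is an assumption for illustration. The player data is randomly generated, the coefficients used to generate it are made up, and the function names (`fit_ols`, `r_squared`, and so on) are my own. It uses plain Python so nobody needs SAS or SPSS to try it.

```python
import random

def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations touch both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Ordinary least squares with an intercept.

    Returns [intercept, coef_1, coef_2, ...] via the normal equations.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Apply a fitted equation to one player's counting stats."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def r_squared(coefs, rows, targets):
    """Share of the variance in the targets explained by the fitted equation."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - predict(coefs, r)) ** 2 for r, t in zip(rows, targets))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot

# ASSUMPTION: synthetic players, not real data. The relationship below
# between last year's counting stats and next year's wOBA is invented.
random.seed(0)
rows, next_woba = [], []
for _ in range(200):
    hr = random.randint(5, 45)
    doubles = random.randint(10, 50)
    bb = random.randint(20, 110)
    woba = 0.280 + 0.0015 * hr + 0.0008 * doubles + 0.0004 * bb
    rows.append([hr, doubles, bb])
    next_woba.append(woba + random.gauss(0, 0.015))  # season-to-season noise

hr_only_rows = [[r[0]] for r in rows]
hr_coefs = fit_ols(hr_only_rows, next_woba)
full_coefs = fit_ols(rows, next_woba)

hr_r2 = r_squared(hr_coefs, hr_only_rows, next_woba)
full_r2 = r_squared(full_coefs, rows, next_woba)
print(f"HR only:        R^2 = {hr_r2:.3f}")
print(f"HR + 2B + BB:   R^2 = {full_r2:.3f}")
print(f"Forecast for 30 HR, 35 2B, 70 BB: {predict(full_coefs, [30, 35, 70]):.3f}")
```

One caveat: R^2 measured on the same data used for the fit can never go down when you add a variable, so a real version of this would fit on one set of seasons and check the forecasts against seasons it hasn't seen. With real data you would also swap the synthetic loop for prior-year stats paired with each player's actual next-year wOBA.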
I think you'd need to use basic counting stats for these exercises and not stats like OBP and SLG, because those stats already bundle too many variables together. For example, in SLG and wOBA a triple is obviously worth more than a double. But triples will likely have less predictive value than doubles because they occur far less frequently, so a single season's total is mostly noise. I want to strip out the items that have minimal predictive value.
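A quick way to sanity-check the point about rare events is a small simulation. The sketch below is only an illustration under stated assumptions: every simulated hitter gets a fixed "true" per-PA rate, the base rates (about 5% of PA for doubles, 0.5% for triples) and the spread of talent around them are numbers I made up, and each player gets two 600-PA seasons. It then measures how well season one counts line up with season two counts.

```python
import random

def simulate_count(rate, plate_appearances, rng):
    """Number of times an event with the given per-PA rate occurs in a season."""
    return sum(1 for _ in range(plate_appearances) if rng.random() < rate)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def year_to_year_correlation(base_rate, players=500, plate_appearances=600, seed=1):
    """Correlate season-one and season-two counts for simulated hitters.

    ASSUMPTION: each player's true rate is the base rate scaled by a random
    factor between 0.6 and 1.4, and it does not change between seasons.
    """
    rng = random.Random(seed)
    season_one, season_two = [], []
    for _ in range(players):
        true_rate = base_rate * rng.uniform(0.6, 1.4)
        season_one.append(simulate_count(true_rate, plate_appearances, rng))
        season_two.append(simulate_count(true_rate, plate_appearances, rng))
    return pearson(season_one, season_two)

# ASSUMPTION: illustrative base rates, not measured league averages.
doubles_r = year_to_year_correlation(0.05)
triples_r = year_to_year_correlation(0.005)
print(f"doubles year-to-year r: {doubles_r:.2f}")
print(f"triples year-to-year r: {triples_r:.2f}")
```

Under these assumptions the rarer event shows the weaker year-to-year correlation, because random luck makes up a bigger share of a small count. Real triples depend heavily on speed and ballpark, so the true gap could be larger or smaller than the simulation suggests.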
Pitchers would have something similar, but you could probably use slightly more advanced stats such as K/9, BB/9, or GB-to-FB ratio. For relievers, you'd have to add holds to the equation to make it worthwhile. Just kidding! Just seeing if you were still paying attention. I'm not exactly sure how to tweak the equation for relievers. Of course, all of this is theoretical anyhow. I haven't run the regressions, and I don't have access to powerful stats software like SAS or SPSS, which would make this stuff much easier than grinding it out in Excel.
I don't know if this would be feasible for defense. I bet predictive regressions using defensive stats are something that isn't being done, though. Could be groundbreaking material!
Not sure if there is any value in this. Mostly it would simplify predictions without sacrificing too much in the way of accuracy. I thought it might be nice to have a few quick and dirty equations to forecast your favorite players for the coming year.