I'll start with a warning that we're (I guess I should say I am) going to get pretty gory with some statistical concepts in this post. I'm not a statistician, but my finance background has me running into some stat concepts on a daily basis, so I'll do my best to explain them, even though I don't have a perfect grasp of them myself.
Our focus today will be on Danny Duffy. I think it's safe to say that everyone here, and Royals fans in general, likes Duffy. Ever since "bury me a Royal" hit the surface web (did it reach as far as the Deep Web?), Danny has ingrained himself in Royals fandom and culture, for better or for worse. For better or for worse. Those five words define Duffy almost entirely.
For better when he was a top-100 prospect and part of the Royals' original "next wave"
For worse when he quit baseball in 2010 at age 21
For better when he struck out 9 batters in 3.2 innings against the Cardinals in 2011
For worse when he gave up 8 earned runs in 3 innings against the Yankees that same year
For better when he pitched maybe the best game of his career this past June against the White Sox
For worse when he had Tommy John Surgery
There seems to always be a Yin and a Yang with the Duffman. For pitchers with fringe to below average command, maybe even control in his case, inconsistency is ubiquitous.
For this article there are two statistical concepts we need to get a vague grasp of, and I'll start with the easier of the two: r-squared.
If you work in finance or do any personal investing, r-squared should be something you've seen before, especially if you invest in mutual funds or ETFs. Basically, r-squared measures how much of the movement in one thing is explained by movement in another (technically, it's the square of the correlation between the two). Take for example the ever-popular SPDR ("Spider") S&P 500 ETF (SPY) index fund. It has an r-squared of 1.0 (or 100%), a full correlation between the ETF and its index benchmark (the S&P 500). This means that every price change of SPY correlates with movement of the S&P 500.
Now take perhaps the most popular mutual fund on Earth, the Fidelity Contrafund, and its $109B in assets. Over the past 10 years it has had an r-squared of 0.8787 (87.87%). So, nearly 88% of the fund's movement can be explained by movement in its benchmark (in this case, the S&P 500).
So in essence, if you want perfect correlation, you want an r-squared of 1.
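To make r-squared a little more concrete, here is a minimal Python sketch that squares the ordinary (Pearson) correlation between two series. The monthly return numbers are made up for illustration and are not real SPY or Contrafund data; the names `benchmark`, `index_fund`, and `active_fund` are just placeholders.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def r_squared(xs, ys):
    """r-squared for a simple two-variable comparison: the correlation, squared."""
    return pearson_r(xs, ys) ** 2

# Made-up monthly returns in percent (NOT real fund data).
benchmark = [1.2, -0.8, 2.5, 0.3, -1.9, 1.1]
index_fund = list(benchmark)                    # tracks the benchmark exactly
active_fund = [0.9, -1.5, 2.0, 1.0, -1.0, 0.2]  # wanders off on its own a bit

print(round(r_squared(benchmark, index_fund), 4))   # perfect tracking gives 1.0
print(round(r_squared(benchmark, active_fund), 4))  # somewhere between 0 and 1
```

The fund that copies the benchmark lands at 1.0, while the one that drifts comes in lower, which is the same idea as the SPY versus Contrafund comparison.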
The harder statistic is p-value, which I'll do my best to describe with my admittedly basic knowledge of it. I'll start with an example from Wikipedia:
Suppose a researcher rolls a pair of dice once and assumes a null hypothesis that the dice are fair. The test statistic is "the sum of the rolled numbers" and is one-tailed. The researcher rolls the dice and observes that both dice show 6, yielding a test statistic of 12. The p-value of this outcome is 1/36 (because under the assumption of the null hypothesis, the test statistic is uniformly distributed), or about 0.028 (the highest test statistic out of 6×6 = 36 possible outcomes). If the researcher assumed a significance level of 0.05, he or she would deem this result significant and would reject the hypothesis that the dice are fair.
Man, that is not a fun paragraph to read. So basically, the "null hypothesis" here is that the dice are fair; that each die has an equal chance of landing on any side. The p-value is the chance of seeing a result at least this extreme if the null hypothesis were true, and in this case it's 0.028. There is a threshold for rejecting the null hypothesis (that the dice are "fair"), and that threshold is 0.05. Since the p-value was 0.028 (smaller than 0.05), we can reject the hypothesis that the dice are fair.
That's not an easy thing to grasp, but it will hopefully get a little easier below and we won't necessarily harp on it a lot. Just assume that anything above ~0.05 is not a good result for our exercise.
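The dice example can be checked by brute force. This is a small sketch that lists all 36 equally likely rolls of two fair dice and counts how many give a sum at least as high as the one observed; the function name `dice_sum_p_value` is my own invention for illustration.

```python
from fractions import Fraction
from itertools import product

def dice_sum_p_value(observed_sum):
    """One-tailed p-value: chance that two fair dice sum to at least observed_sum."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 possible rolls
    as_extreme = sum(1 for a, b in outcomes if a + b >= observed_sum)
    return Fraction(as_extreme, len(outcomes))

p = dice_sum_p_value(12)
print(p, round(float(p), 3))  # 1/36, about 0.028

# Compare against the usual 0.05 threshold.
print("reject fair dice" if p < Fraction(5, 100) else "cannot reject fair dice")
```

Only one roll out of 36 (double sixes) is that extreme, which is where the 0.028 comes from.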
Let's put it in practice real quick here.
I've compiled the 2014 fWAR and win totals for the 15 AL clubs and charted them. Our null hypothesis here: a team's fWAR has no bearing on its total wins (which means that what we're actually trying to show is the inverse, that a team's fWAR DOES have a bearing on its total wins).
So you can see both the r-squared and p-value are good for our hypothesis.
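For anyone who wants to try this kind of check themselves, here is a rough sketch of one way to get both numbers. It uses a permutation test for the p-value, which is a swapped-in technique that only needs the standard library and is not necessarily how the figures in this post were produced. The WAR and win values are placeholders I made up, NOT the real 2014 AL numbers, and the function names are hypothetical.

```python
import random

def r_squared(xs, ys):
    """Square of the Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Estimate how often shuffled (unrelated) data matches or beats the observed r-squared."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if r_squared(xs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # +1 keeps the estimate away from exactly 0

# Placeholder team totals (NOT the real 2014 data).
placeholder_war = [12.0, 18.5, 22.0, 25.5, 30.0, 33.5, 38.0, 44.0]
placeholder_wins = [66, 71, 70, 77, 80, 84, 83, 93]

print("r-squared:", round(r_squared(placeholder_war, placeholder_wins), 3))
print("p-value below 0.05:", permutation_p_value(placeholder_war, placeholder_wins) < 0.05)
```

The shuffling plays the role of the null hypothesis: if WAR truly had no bearing on wins, scrambling the win totals should produce an r-squared as high as the real one fairly often. When it almost never does, the p-value is small.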
Now, on to a collection of select metrics for Duffy.
Let me set an expectation first: I didn't expect high correlation figures on any of the individual metrics, given the very large number of factors in play, the smaller sample size, and how different each individual metric is. Still, I don't think the data is meaningless, and I think some correlations can be drawn. At the very least, when we look at the r-squared and p-value results relative to one another, we can judge them a little better.
With FIP, it's easy to see what correlates well: home runs, walks, and strikeouts, the cornerstones of FIP. Strikes thrown shows a little correlation there as well.
On the ERA side we have a few more to choose from. HR/9 appears again, and that makes sense given how home runs live in the batted ball space. Another obvious one is BABIP, something we know pitchers have very little control over. "Left on base percentage" is an obvious one as well: the higher the strand rate, the fewer baserunners come around to score, which obviously correlates well with ERA (fewer runs).
The more ground balls a pitcher can induce the better (they are worth something like 0.05 runs), and strikeouts are obviously another way a pitcher can keep hitters from stringing together multiple hits and turning them into runs. Duffy has a lower than average career ground-ball rate and a roughly league average K%. Maybe this is when good Duffy rears his head, when he's getting ground balls and striking out batters. That's good for any pitcher, of course, but in Duffy's case, given his well below average career GB%, whenever he produces ground balls at a higher rate than his average he gets better results.
Truthfully, I was expecting BB% to have a higher correlation for Duffy, since he is someone who struggles with command at times and walks = men on base = more chances for runs = higher ERA, but both the r-squared and p-value were pretty far off.
It's hard to find any non-obvious or hidden causes of his inconsistency in the data above. Maybe that further pushes the idea that Duffy is an enigma, or maybe the experiment was just inconclusive (or irrelevant)...
Again, I'm not a statistician, and maybe my results and analysis are way off. If so, please feel free to chime in and correct me on anything.