It’s been a while since we’ve thrown on our lab coats, slipped on the powdered latex gloves, and heated up the Erlenmeyer flask. So welcome back to the lab. Let’s get our hands dirty.
This will be a quick and visual episode. Today we are asking: which minor league stats are the most predictive? I’ve been on what I thought would be either a never-ending or an all-too-expensive quest to find career minor league data. Then this year our sabermetric lord and savior FanGraphs rolled out an updated minor league leaderboard that lets us view career numbers. Hallelujah, praise be to Mike Trout.
For this episode, we are just going to look at minor league pitching stats and 1) how each stat translates to its major league equivalent and 2) how each stat predicts overall success, as measured by WAR.
Let me state this first: we are going to be using r-squared, a quick and dirty measure of how well one stat explains another. In essence, r-squared is the share of variance explained by a regression. The total variance is made up of two parts, explained variance and unexplained variance, and r-squared is simply explained variance divided by total variance, or the percentage of the total variance that is explained. While it’s a decent metric when we are working with linear regression, it isn’t perfect. We can play with our inputs to get nearly any r-squared we want (by adding or removing variables - similar to p-hacking or data dredging). So be careful here: this isn’t an exhaustive study, just something we can eyeball to get a good sense of what typically leads to success at the major league level.
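For anyone who wants to see the arithmetic, here is a minimal sketch of r-squared for a one-variable linear regression, computed exactly as described: explained variance divided by total variance. The function name `r_squared` and the sample numbers are made up for illustration; they are not real player data and this is not necessarily how the charts in this post were produced.

```python
def r_squared(x, y):
    """R-squared of a one-variable least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept for the line y = intercept + slope * x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Explained variation: how far the fitted values sit from the mean of y
    predicted = [intercept + slope * xi for xi in x]
    explained = sum((p - mean_y) ** 2 for p in predicted)

    # Total variation: how far the actual values sit from the mean of y
    total = sum((yi - mean_y) ** 2 for yi in y)

    return explained / total


# Made-up numbers purely for illustration (NOT real player data)
milb_fip = [3.1, 3.6, 4.0, 4.4, 5.0]
mlb_fip = [3.9, 3.7, 4.6, 4.3, 5.1]
print(round(r_squared(milb_fip, mlb_fip), 3))
```

A value near 1 means the minor league number explains most of the spread in the major league number; a value near 0 means it explains almost none of it.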
For a player to qualify for our regression, he must have:
- Made the majors (obviously)
- Pitched 100 minor league innings (in part to remove guys who were just rehabbing)
- Pitched 30 major league innings
I toyed with making the major league threshold higher, but I wanted to be sure to include guys who were bad in the minors and also bad in the majors (which means they didn’t stick for long). We could nitpick this a bit and try to find an answer for survivability, but in the end the sample includes 1,308 individual players.
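The qualification rules above boil down to a simple filter. Here is a rough sketch, assuming each pitcher is a record with career innings totals. The `Pitcher` class, its field names, and the sample rows are hypothetical, and I am assuming both thresholds mean "at least" that many innings.

```python
from dataclasses import dataclass


@dataclass
class Pitcher:
    name: str
    milb_ip: float  # career minor league innings (hypothetical field)
    mlb_ip: float   # career major league innings, 0 if he never debuted


MIN_MILB_IP = 100  # screens out guys who were just rehabbing
MIN_MLB_IP = 30    # low bar, so short-lived major leaguers still count


def qualifies(p):
    """True if the pitcher meets both innings thresholds.

    Clearing the MLB innings bar already implies he made the majors.
    """
    return p.milb_ip >= MIN_MILB_IP and p.mlb_ip >= MIN_MLB_IP


# Invented rows for illustration only
sample = [
    Pitcher("Rehab Guy", milb_ip=12.0, mlb_ip=900.0),
    Pitcher("Cup of Coffee", milb_ip=450.0, mlb_ip=8.1),
    Pitcher("Short Stay", milb_ip=520.0, mlb_ip=41.0),
    Pitcher("Never Called Up", milb_ip=700.0, mlb_ip=0.0),
]

qualified = [p for p in sample if qualifies(p)]
print([p.name for p in qualified])
```

Raising `MIN_MLB_IP` would drop pitchers like the third one, which is exactly the bad-in-both-places group I wanted to keep.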
- Not surprised MiLB FIP has a higher correlation to MLB FIP than MiLB ERA has to MLB ERA, given the differences in defense throughout the minors and some larger variance in park factors
- A bit surprised xFIP has a somewhat stronger, though not strong overall, correlation to its MLB equivalent than ERA and FIP do. I don’t know the answer to this, but do minor league parks typically have shallower outfield walls than the majors? I would guess so.
- MiLB and MLB groundball% seem to be strongly linked, but I thought K% would be a little stronger
I also want to dedicate a special section to Swinging Strike%. I’ve been using that as a big red or green flag for both hitting and pitching prospects for a few years. Unfortunately, the data on it are finicky. Any SwStrk% at rookie ball levels is useless, as for some reason the numbers are wildly high (sometimes a 60% swinging strike rate). Also, for some reason, the data become much more reliable starting in 2012. For instance, the top 46 swinging strike rates for pitchers in AA and AAA from 2010-2019 are all from 2010 or 2011, and most of the top 100 rates are from pre-2012. This has led me to conclude that any swinging strike rates from before 2012 are also useless.
Having qualified that, here are the same charts as above but for SwStrk% since 2012, split into AA/AAA and A/A+ (FanGraphs only lets me split it out that way). We can also include K% for those levels since we have the data split out already.
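The cutoffs described above (nothing before 2012, no rookie ball, levels lumped into AA/AAA and A/A+) could be applied with something like the sketch below. The row layout, the level labels, and the `group_swstr_rows` name are all assumptions for illustration; they are not the format of a FanGraphs export.

```python
FIRST_RELIABLE_SEASON = 2012

# Map each level to the pair it is lumped into; rookie ball is
# deliberately left out, so those rows get dropped.
LEVEL_GROUPS = {
    "AA": "AA/AAA",
    "AAA": "AA/AAA",
    "A": "A/A+",
    "A+": "A/A+",
}


def group_swstr_rows(rows):
    """Keep only trustworthy swinging strike rows, bucketed by level pair."""
    grouped = {"AA/AAA": [], "A/A+": []}
    for row in rows:
        if row["season"] < FIRST_RELIABLE_SEASON:
            continue  # pre-2012 rates look inflated
        group = LEVEL_GROUPS.get(row["level"])
        if group is None:
            continue  # rookie ball or an unrecognized level
        grouped[group].append(row)
    return grouped


# Invented rows for illustration only
rows = [
    {"season": 2011, "level": "AAA", "swstr_pct": 0.31},
    {"season": 2014, "level": "AAA", "swstr_pct": 0.11},
    {"season": 2015, "level": "A+", "swstr_pct": 0.13},
    {"season": 2016, "level": "R", "swstr_pct": 0.60},
]

grouped = group_swstr_rows(rows)
print({name: len(kept) for name, kept in grouped.items()})
```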
As you would probably expect, the numbers get a little more reliable (from an explanatory standpoint) in AA/AAA.
Sometime in the future, we’ll explore the same type of information but for hitters.