How good is a #1, #2, #3, #4, #5 starting pitcher?

When talking about starting pitchers, we like to throw around labels like “ace” or “#2 SP” or “#3-quality pitcher.”  And when we do this, we all have some idea in our heads as to what that means.  The problem is that everyone has a somewhat different conception of these classifications, and I don’t know how accurately those ideas reflect reality.  I think that if those terms are to be meaningfully descriptive, they must reflect how a pitcher compares to the other starting pitchers in his league.  For instance, if you are in the top 1/5th of starting pitchers in your league, you’re an “ace” or “#1 SP”.  So I devised a simple study to describe statistically what a #1, #2, #3, #4 and #5 pitcher is, and what that says about the Royals starters. 

Methodology:  For each of the last three years, I used a population of 70 American League starting pitchers.  This population was made up of the five pitchers from each team who made the most starts in that year.  This population was then ranked by four different pitching metrics.  Each ranking was broken into quintiles (five equal groups which correspond to the five starting pitcher classifications).
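For anyone who wants to replicate the ranking step, here is a minimal sketch of it in Python. It assumes every stat is lower-is-better, which is true of all four metrics used here. The function name `assign_classes` and the made-up pitcher labels and ERA values are purely illustrative; they are not the actual data behind this study.

```python
def assign_classes(stat_by_pitcher, n_classes=5):
    """Rank pitchers by a lower-is-better rate stat and split the
    ranking into n_classes equal groups (quintiles when n_classes=5).

    Returns {pitcher: (rank, class)}, where rank 1 is the best value
    and class 1 corresponds to a "#1 SP".
    """
    # Best (lowest) value first; ties keep their input order.
    ranked = sorted(stat_by_pitcher, key=stat_by_pitcher.get)
    total = len(ranked)
    result = {}
    for position, name in enumerate(ranked):
        # Integer arithmetic: with 70 pitchers, positions 0-13 land in
        # class 1, positions 14-27 in class 2, and so on.
        pitcher_class = position * n_classes // total + 1
        result[name] = (position + 1, pitcher_class)
    return result


if __name__ == "__main__":
    # Hypothetical population: 70 invented ERA values, not real data.
    fake_era = {f"P{i:02d}": 3.0 + 0.05 * i for i in range(70)}
    classes = assign_classes(fake_era)
    for name in ("P00", "P13", "P14", "P69"):
        print(name, classes[name])
```

With 70 pitchers each class holds exactly 14, so ranks 1-14 are #1 starters, 15-28 are #2 starters, and so on down to 57-70 for the #5 class.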

One could argue that this methodology has some weaknesses.  First, by using the five starting pitchers from each team with the most starts, the population includes some pitchers who made 34 starts and some who only made 15.  While the usual sample size caveats apply, I needed to include some pitchers who didn’t start all season long to get an accurate picture of what a starting pitcher in the AL is.  The guys who get the most starts are not necessarily 200-inning, 34-start pitchers.  But at the same time, I didn’t want to pollute the population with every pitcher who made any number of starts.  Many scrubs made 1-5 starts.  Including them in the population would have artificially inflated the rankings of most other pitchers, making some poor starters look mediocre and mediocre starters look good.  I think looking at five starters from each team gives the most accurate representation of how good and bad AL starting pitchers are.

Second, the stats used are all rate stats.  I considered using a good, comprehensive, "what did this pitcher contribute to his team this year" stat like VORP or SNLVAR.  The problem is that these stats create a very uneven playing field for comparing pitchers who made different numbers of starts.  A mediocre pitcher who started all season long could have a better VORP or SNLVAR than a good pitcher who only started for half the season.  The rate stats below show how each pitcher performed when he was pitching.  I think that is the best way to compare apples to apples.

ERA (Earned Run Average) – We’re all familiar with this one.  It is Earned Runs per nine innings pitched.  The problem with this stat is that it is heavily affected by batted ball randomness, the fielders behind the pitchers, and to a lesser extent the bullpen.  That is why ERA’s tend to be erratic and not very predictive, and why it is a good idea to at least augment one’s analysis by looking at some defense and bullpen independent pitching stats, such as the next three.

FIP (Fielding Independent Pitching) – This stat takes the "three true outcomes" (strikeout, walk, home run) and crunches them into a single rate stat roughly modeled on the ERA scale.  Over time, a pitcher’s career ERA is usually very close to his FIP.  FIP is a better descriptor of past pitching performance than ERA and therefore a better predictor of future performance, including future ERA.
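To make the "crunching" concrete, here is a sketch of the most commonly published form of FIP next to plain ERA. The exact version behind this study's numbers isn't specified, so treat the weights (13, 3, 2) and the additive constant of 3.20 as assumptions; some versions also count hit batsmen with the walks, and many recompute the constant each season so that league FIP matches league ERA.

```python
def era(earned_runs, innings):
    """Earned runs allowed per nine innings pitched."""
    return 9.0 * earned_runs / innings


def fip(home_runs, walks, strikeouts, innings, constant=3.20):
    """Commonly published FIP form (assumed here, not confirmed as the
    version used in the study):

        (13*HR + 3*BB - 2*K) / IP + constant

    The constant shifts the result onto roughly the ERA scale.
    """
    return (13.0 * home_runs + 3.0 * walks - 2.0 * strikeouts) / innings + constant


if __name__ == "__main__":
    # Hypothetical season line, invented for illustration only.
    # Innings must be a true decimal: 200 1/3 innings is 200.333...,
    # not the box-score shorthand 200.1.
    print(round(era(earned_runs=90, innings=200.0), 2))
    print(round(fip(home_runs=20, walks=50, strikeouts=150, innings=200.0), 2))
```

Notice that nothing the fielders do enters the FIP calculation: hits on balls in play, errors and inherited runners are all absent, which is the whole point of the stat.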

xFIP (Expected Fielding Independent Pitching) – Research has shown that home runs are a function of flyballs allowed and park factor.  So xFIP normalizes for these factors, taking out home run luck and park factors, by simply counting HR’s as a percentage of the pitcher’s flyballs allowed.  Arguably, this makes xFIP a better descriptor of pitching performance by taking out another element of luck.
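A rough sketch of that substitution: swap the pitcher's actual home runs for the number a league-average HR-per-flyball rate would predict, then run the same FIP-style formula. The default rate of 0.105 and the 3.20 constant are placeholder assumptions, not figures taken from this study, and the helper name `xfip` is just for illustration.

```python
def xfip(fly_balls, walks, strikeouts, innings,
         league_hr_per_fb=0.105, constant=3.20):
    """FIP-style estimate with actual home runs replaced by expected ones.

    expected HR = fly balls allowed * league HR-per-flyball rate
    Both league_hr_per_fb and constant are assumed placeholder values.
    """
    expected_hr = fly_balls * league_hr_per_fb
    return (13.0 * expected_hr + 3.0 * walks - 2.0 * strikeouts) / innings + constant


if __name__ == "__main__":
    # Two hypothetical pitchers with identical walks, strikeouts, innings
    # and flyballs get the same xFIP no matter how many of those
    # flyballs actually left the park.
    print(round(xfip(fly_balls=200, walks=50, strikeouts=150, innings=200.0), 2))
```

Because the pitcher's own home run total never appears, a pitcher who got lucky or unlucky on flyballs is pulled back toward what his flyball total alone would suggest.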

tRA – This is another defense and bullpen independent pitching stat which is park neutral.  It utilizes more possible outcomes than FIP and xFIP and seeks to determine the expected number of runs allowed by the pitcher.

I do want to make it clear that I’m not saying that these four stats or a composite of them is the final word on pitcher performance.  They are not.  But I do think they capture a lot of important information and screen out a lot of crap (at least three of them do).  I think these stats go a long way towards isolating and describing the most important information with regard to pitching performance. 


Nothing here was a huge surprise.  In ERA, #1 starting pitchers are roughly from the high 3’s down, #2’s are from the high 3’s to the low 4’s, #3’s are from the low 4’s to the mid 4’s, #4’s are from the mid 4’s to the low 5’s and #5’s are from the low 5’s and up.  FIP is roughly similar, but as it is scaled to have a hard midpoint of 4.50, it stays a little higher than current ERA values are skewing.  xFIP and tRA clearly skew to somewhat higher than both ERA and FIP.

One thing that we can see with each of the metrics is that American League starting pitching has gotten better each of the last three years.  For each stat and each SP classification, the numbers are getting better.  It’s hard to say if this is merely a short-term mini-trend or part of a larger trend moving from the high-power 90’s and early 00’s into a lower-power, pitching-centric 10’s.

Royals Starting Pitchers: One of the reasons I did this study is that I wanted to see how good the Royals starting pitchers have been relative to their American League peers.  We all know their ERA’s, but that is a pretty unreliable stat.  You have to dig deeper to determine if, for instance, Kyle Davies’s 4.06 ERA means he actually pitched well this season.  So you’ll see below the statistical ranks of some of the Royals’ 2007 and 2008 starting pitchers.  In each cell, you’ll see where the pitcher ranked in that stat among the 70 AL starting pitchers and then which class of pitcher that put him in.  In the last column, labeled "COMPOSITE," I averaged each pitcher’s ranks across the four stats and then ranked those averages; the cell shows that composite rank along with the overall class it put him in.
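The composite column can be sketched the same way: average each pitcher's rank across the stats, then rank those averages and cut them into five classes. The function name, the five placeholder pitchers and their ranks are all invented for illustration, and breaking ties by input order is an assumption.

```python
def composite_ranks(ranks_by_stat, n_classes=5):
    """Average each pitcher's ranks across stats, then rank the averages.

    ranks_by_stat: {stat_name: {pitcher: rank_in_that_stat}}
    Returns {pitcher: (composite_rank, composite_class)}.
    Ties in the average are broken by input order (an assumption).
    """
    stat_tables = list(ranks_by_stat.values())
    pitchers = list(stat_tables[0])
    average = {
        p: sum(table[p] for table in stat_tables) / len(stat_tables)
        for p in pitchers
    }
    ordered = sorted(pitchers, key=average.get)  # best average first
    total = len(ordered)
    return {
        p: (position + 1, position * n_classes // total + 1)
        for position, p in enumerate(ordered)
    }


if __name__ == "__main__":
    # Hypothetical five-pitcher league; these ranks are made up.
    ranks = {
        "ERA":  {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5},
        "FIP":  {"A": 2, "B": 1, "C": 3, "D": 5, "E": 4},
        "xFIP": {"A": 1, "B": 3, "C": 2, "D": 4, "E": 5},
        "tRA":  {"A": 1, "B": 2, "C": 4, "D": 3, "E": 5},
    }
    for pitcher, (rank, cls) in composite_ranks(ranks).items():
        print(pitcher, rank, cls)
```

Averaging ranks rather than the raw stats keeps the four metrics on equal footing even though they sit on slightly different scales.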


I think this shows that Greinke isn’t a "future ace;" he’s an ace right now.  You can also see that Meche has been a very good #2 starting pitcher each of the two years he’s been a Royal.  In 2008, he just barely missed the cut for a #1.  Even with Bannister’s good BABIP luck and relatively low strikeout numbers, he was still a solid #3 in 2007.  But there’s no good news for him in the 2008 numbers.  No matter how deep you dig into his numbers, his 2008 season was crap.  While Davies had some batted ball and home run luck in 2008, he pitched pretty well overall.  It wasn’t just a mirage.  But he wasn’t great.  He was a #3, near the #3/#4 border.  While Hochevar’s ERA stunk, you can see that in the other metrics, he was a #3 or #4.  Overall, taking ERA into account, he was just barely a #4, and nearly in the #3 class.

Finally, I think something should be said about the predictive value of these stats.  I do think that past performance is a pretty good predictor of future performance.  But past performance does not determine future performance.  Players get better, players get worse and players have fluky good and bad seasons.  For instance, Bannister went from a middle-of-the-pack pitcher in 2007 to one of the worst in 2008.  Ervin Santana went from one of the worst pitchers in 2007 (composite ranking of 57) to one of the best in 2008 (composite ranking of 4).  Pitchers can and do improve their stuff, improve their control, change their pitch selection, tweak their mechanics and otherwise change with age.  But, of course, this doesn’t mean that past performance is irrelevant.  When you’ve got a significant sample of data to look at, I think past performance is the best indicator of what a player will do in the future.

This FanPost was written by a member of the Royals Review community. It does not necessarily reflect the views of the editors and writers of this site.