Working to understand FIP and the projections
* I started to post this as a comment in the pitching depth thread, but it doesn't directly have to do with the Ponson argument (about which I am indifferent), so I wanted to separate it out.
I've felt uneasy about the FIP projections and staff rankings, and the debates that followed, but it was just a gut feeling, and I couldn't (and still can't entirely) pinpoint exactly what it was. Marbotty touched on it in his post in NYRoyal's pitching depth thread when talking about Ponson's FIP relative to his ERA every year, but that wasn't the main reason for the unease.
At first, I thought it was due to the players that were being looked at -- we (on this site at least) rarely seem to look at good pitchers' FIPs. It doesn't seem to be used very often to justify why players are great. Instead, it's seemingly been used more to explain why certain performances weren't garbage, e.g. most of Ponson's career or 2008 Bannister.
Then I read more about FIP on FanGraphs, and thought it had a problem because it didn't take into account things like GB%/FB%/LD% (like xFIP does). I thought it would be giving an inherent advantage to pitchers who had a high LD% but an abnormally low HR/FB% relative to their LD%. I compared the Royals pitchers over the last 8 years, but didn't see any correlation to LD%.
After this analysis, I was still left wondering if FIP gives a somewhat inherent advantage to finesse pitchers -- namely the pitch-to-contact type pitchers who aren't going to walk many, but aren't going to strike many out either.
For example, look back at the Baird-era pitchers: in 2003 the Royals had 4 starters throw at least 80 innings. All pitched at the level of #3 or better starters (Affeldt, Snyder, May, and Hernandez). In 2001, it was nearly the same deal, as there were 4 at the #4 or better level (Byrd, Suppan, Durbin, and Reichert).
As another example, look at the pitch-to-contact king, Chris George.
You just knew how to win Chris
Sure, his 2003 was awful, but in 2004 he started 7 games for the Royals. He struck out only 3.21/9 and walked an awful 5.36/9. Yet due to only giving up 1 HR, he had the FIP of a #3 pitcher: 4.43. Despite the good FIP and sabermetrics taking a firmer footing in front offices, he has not pitched in the majors since.
So, am I off on this point? Am I missing something?
---------------------------------------------------------------------
All of which brings me to another related topic: Are the FIP projections overly optimistic -- not just for the Royals, but for all of the AL Central? In looking at D_F's Driveline post, the projections don't seem to match up well with NYRoyal's "How good is a #1,#2..." post. Granted, the post looked at the AL as a whole and not specifically the AL Central, but I can't imagine either the AL East or West divisions dragging the league down too much, if at all.
But based on that, there are only 4 pitchers total (out of over 40 starters for the AL Central) in the listings that are projected to have a FIP significantly worse than the 5.11 FIP at the low end of a #5 pitcher. Actually, all but 7 project as #4 or better pitchers.
With the majority of the innings in the AL Central likely to be logged by some combination of the 41 on that list, will the contributions of the Lance Broadways, Philip Humbers, and Rick Porcellos of the league really drag down the #5 average all the way to 5.11?
To the resident stats guys -- sorry if this seems ridiculous, or if I'm overlooking some aspects, but I'm still trying to learn this stuff.
Comments
Short answer before I go away for a while
Not criticizing it or anything, but I just wanted to note for both of our sakes that NYRoyal’s stuff about rotation slots and my compilations of projections and WAR calculations based on them are two separate things.
Second, most people are aware of the limitations of FIP. In figuring Team WAR, when we do defensive projections (if you buy those), we figure that using ERA would be redundant.
If you look back at my older Driveline Post on Pitcher Value, I discuss some of these things and ways of accounting for them. One way is to average FIP and ERA, although for relievers, I think FIP alone is still the best because of sample size.
I discuss my favorite pitching stat, tRA, from Stat Corner, which is like FIP, but with more park adjustments, and accounting for the run expectancies of LDs, GBs, FBs, etc.
Good thoughts.
Bringing you more-or-less replacement level analysis and commentary to Driveline Mechanics and elsewhere since sometime in 2008.
by devil_fingers on Mar 24, 2009 3:59 PM EDT
Short two-minute answer part II
The easiest solution (other than using tRA as suggested) is to look at FIP as the baseline and then check the batted ball rates and adjust accordingly. For line drives (perhaps the most important category, in that about 75% of line drives go for hits), 20% is about average, with most starters falling in the range of 17% (good) to 23% (bad). The average groundball rate is about 43% and the average flyball rate is about 37%. Groundballs are generally better than flyballs because, while groundballs go for hits slightly more often than flyballs (around 27% to 23%), flyball hits almost always go for extra bases and a good number of those will go for home runs. Groundball hits rarely go for extra bases. It is also key to check the pitcher’s home run per flyball rate (HR/FB). Over time, most pitchers’ rates regress to 10-11%. A pitcher with a HR/FB rate below 10 percent is likely to give up more home runs going forward, while a pitcher with a rate higher than 11 percent is likely to give up fewer HRs going forward.
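For anyone who wants to run that checklist on a stat line, here is a minimal Python sketch. The cutoffs are just the rough benchmarks quoted above, the function name and the wording of the flags are made up for illustration, and the sample line is not a real pitcher.

```python
# Rough benchmarks taken from the comment above; the cutoffs are judgment calls.
LD_GOOD, LD_BAD = 17.0, 23.0        # line-drive % range for most starters
HR_FB_LOW, HR_FB_HIGH = 10.0, 11.0  # HR/FB % that most pitchers regress toward
GB_AVG, FB_AVG = 43.0, 37.0         # approximate average GB% and FB%


def batted_ball_notes(ld_pct, gb_pct, fb_pct, hr_fb_pct):
    """Return plain-English flags to read alongside a pitcher's FIP."""
    notes = []
    if ld_pct > LD_BAD:
        notes.append("high LD%: more hits than FIP alone would suggest")
    elif ld_pct < LD_GOOD:
        notes.append("low LD%: fewer hits than FIP alone would suggest")
    if gb_pct > GB_AVG and fb_pct < FB_AVG:
        notes.append("groundball profile: fewer extra-base hits expected")
    elif fb_pct > FB_AVG and gb_pct < GB_AVG:
        notes.append("flyball profile: more extra-base hits expected")
    if hr_fb_pct < HR_FB_LOW:
        notes.append("HR/FB below 10%: expect more home runs going forward")
    elif hr_fb_pct > HR_FB_HIGH:
        notes.append("HR/FB above 11%: expect fewer home runs going forward")
    return notes


# Made-up stat line for illustration only.
for note in batted_ball_notes(ld_pct=24.0, gb_pct=40.0, fb_pct=36.0, hr_fb_pct=4.0):
    print(note)
```

This only flags which direction to shade a FIP. How much to shade it is a separate question that the benchmarks above don't answer.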
As for what counts as a number one starter, etc., keep in mind that those numbers are not for every starter, only the five pitchers who made the most starts for their teams. Many of the ones excluded would be the ones who made only a few starts because they pitched poorly.
by Gopherballs on Mar 24, 2009 4:27 PM EDT
Thanks for the links
I need to take some time and read your older articles over at Driveline.
by Top Ramen on Mar 25, 2009 10:41 AM EDT
A quick thought
I don’t have much time, and I’ll probably respond more fully later, but I did want to throw something out there. I don’t think FIP rewards finesse pitchers. But it does reward groundball pitchers, and rightly so. Inducing a high percentage of GB’s can be almost as valuable as getting a lot of K’s (because GB’s very rarely become extra base hits and never become HR’s). So, if a pitcher has a high GB% and a low FB%, he’s going to have fewer HR/9 (which is one of the three elements of FIP, as you know). And one of the reasons that HoRam and Ponson have decent FIP projections is that they have high GB% and low FB%. In recent years, Ponson has been in the 53-55% range for GB’s while HoRam has usually been in the low 50s.
Also, I want to second d_f’s endorsement of tRA. It is basically a bigger, better version of FIP which includes more batted ball data. If I had tRA projections for Royals and AL Central pitchers, I would have used that. I just had to use the best stat available.
The immoderate moderator
by NYRoyal on Mar 24, 2009 4:26 PM EDT
right, that's one thing I forgot to add
tRA can’t be deduced from current projection systems, or we’d use it.
by devil_fingers on Mar 24, 2009 4:32 PM EDT
What about using a three year weighted average of tRA?
I’m sure that’s a bunch of work, though.
by marbotty on Mar 25, 2009 8:56 AM EDT
I've done a bit of that
and that would be a basic way of doing it. You’d need to include regression to the mean, of course, adding in a certain amount of league average to each year (I don’t have my copy of The Book with me, so I don’t remember how many IP of average to add per year). I would also recommend using tRA*, since that is each player’s yearly average regressed to itself.
But that’s pretty crude. To do it right, you have to be more sophisticated. As I understand it, when properly projecting ERA, RA, or FIP, systems don’t just use those averages themselves, but project how many runs, home runs, strikeouts, hits, etc. the player is expected to give up based on a certain number of innings over the past years (regressed to average, then all the other adjustments for age, context, similarity scores, etc.), then calculate ERA, RA, and FIP from that.
That means for a more sophisticated projection of tRA, one needs to project each player’s HR, LD, GB, and FB rates, and also the league averages of each… I’m not saying it’s impossible, but it’s clearly a lot of work.
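The basic version (weighted three-year average plus regression to the mean, not the component-by-component approach) might look something like this in Python. Treat it as a sketch: the 5/4/3 weights and the 100 innings of league average added per season are placeholders I picked for illustration, not the figures from The Book, and the function name is made up.

```python
def crude_projection(seasons, league_avg, weights=(5, 4, 3), regress_ip=100.0):
    """Weighted, regressed projection of a rate stat such as tRA.

    seasons:    list of (rate, innings) tuples, most recent season first.
    league_avg: league-average value of the same rate stat.
    weights:    per-season weights, most recent first (placeholder values).
    regress_ip: innings of league-average pitching blended into each season
                (placeholder value).
    """
    num = 0.0
    den = 0.0
    for (rate, ip), w in zip(seasons, weights):
        # Each season counts as its real innings plus regress_ip innings
        # of league-average performance, then gets its recency weight.
        num += w * (ip * rate + regress_ip * league_avg)
        den += w * (ip + regress_ip)
    if den == 0:
        # No usable data at all: fall back to league average.
        return league_avg
    return num / den


# Made-up example: three seasons of (tRA, IP), most recent first.
history = [(4.20, 180.0), (5.10, 150.0), (4.60, 60.0)]
print(round(crude_projection(history, league_avg=4.80), 2))
```

The fewer innings a pitcher has, the more the league-average innings dominate, which is the whole point of regressing: a 20-inning sample gets pulled most of the way back to average, a 200-inning sample much less so.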
by devil_fingers on Mar 25, 2009 9:55 AM EDT
Yeah, I was really suggesting a three-year average as a sort of quick and dirty solution
by marbotty on Mar 25, 2009 10:24 AM EDT
Spurred by this thought
instead of working this morning, I did “real” work: I made a crude tRA Marcels. It doesn’t go by components, as I said, but it does weight the average and incorporate appropriate regression to the mean. I also “backwards engineered” the IP and GS projections and came up with surprisingly decent numbers. I didn’t adjust for age yet, and don’t know if it’s worth doing that unless I can get an Excel/SQL datafile with the last three seasons of data… I ended up not using tRA* because it already regresses, and when I add my overall regression, it “doubly” regresses to the mean.
Maybe I’ll share it here at some point. Like I said, it’s just for fun, and very crude, but for a beginner like myself, I was very pleased with how it turned out — very close to the WAR results I got from my Driveline Series using PECOTA, ZiPS, and CHONE, and that was using better PT projections and FIP…
Just another example of how powerful simply regressing to the mean is, I guess.
by devil_fingers on Mar 25, 2009 2:20 PM EDT
The main problem with tRA is how it is scored
There are huge differences from one scorer to the next at each stadium on what is considered a LD, PO, etc. I have heard of people calculating a scorer’s factor (like park factors).
by Jeff Zimmerman (TucsonRoyal) on Mar 25, 2009 9:56 AM EDT
True, batted ball data are important, but problematic
…due to human factors (those pesky humans). Hit f/x data will eventually improve this greatly.
by NYRoyal on Mar 25, 2009 1:45 PM EDT
They could solve most of the problem if they had $ to pay for the BIS data
instead of using the free data available from MLB. The BIS is not absolutely perfect, but it is consistent in how the different batted balls are classified.
But is there that much difference between the MLB data and the BIS data? Groundball rates seem to generally match up, and line drive rates generally seem within a percentage point or two.
by Gopherballs on Mar 25, 2009 3:51 PM EDT
So, to paraphrase someone...

“A team that has a porous infield defense having GB specialists is like the mule with a spinning wheel. No one knows how he got it, and danged if he knows how to use it”
by Top Ramen on Mar 24, 2009 4:48 PM EDT
Well, yes...kind of
First, even if the IF is somewhat porous, GB’s are better for the pitcher and his team than FB’s. That is, unless the IF defense is genuinely bad. Second, will the Royals have a porous IF defense? I think the defense is solid, perhaps even above average on the left side. On the right side, Jacobs is awful, but what percentage of games will he play at first base? I don’t know. At second, Bloomy’s defense is average. There’s a lot of disagreement about Callaspo’s defense, but I say below average. Teahen would likely be well below average. Taken as a whole, I think this means a below average IF defense, but not an awful one. Long story short, I still like GB pitchers for this team.
by NYRoyal on Mar 24, 2009 5:20 PM EDT
Here's hoping that Hillman sets lineups according to who's on the mound
With Hochevar pitching, as much as it pains me to say it, Bloomquist should probably be the 2B starter.
With some of the flyball-prone pitchers, though, it would be nice to see an outfield of DeJesus-Crisp-Teahen (or Maier).
by Top Ramen on Mar 25, 2009 10:44 AM EDT
Some more thoughts
I asked Tom Tango a while back about this subject and he devised a Simple tRA when I was looking into FA pitching:
I wanted a stat that is projectable from one season to the next, and this one is not too bad.
I know the Tango crowd just dropped a good dime on measuring the time it takes various batted balls to hit the ground, to get a good data set. We will have to see if it becomes public or gets sold/put in a book.
by Jeff Zimmerman (TucsonRoyal) on Mar 24, 2009 5:49 PM EDT
Some other thoughts
Then I read more about FIP on FanGraphs, and thought it had a problem because it didn’t take into account things like GB/FB/LD% (like xFIP does). I thought it would be giving an inherent advantage to pitchers who had a high LD% but an abnormally low HR/FB% relative to their LD%.
A flukey HR/FB percentage (anything significantly more or less than about 11) is going to skew a pitcher’s FIP. You can see this last year with HoRam (a 2.9% HR/FB really helped keep his FIP very low) and Peralta (his 18.8% killed him). But I don’t think this hurts FIP’s predictive ability much because projections use anywhere from 3-5 seasons of data, which should level out the deviations from the HR/FB mean caused by good or bad luck.
Then I read more about FIP on FanGraphs, and thought it had a problem because it didn’t take into account things like GB/FB/LD% (like xFIP does).
Does xFIP do this? It was my understanding that xFIP just normalizes HR/FB% in an attempt to account for the good/bad luck discussed above. I know THT’s stat glossary describes xFIP in this way. But there may be multiple versions of xFIP. Is someone doing an xFIP with more batted ball data?
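If xFIP works the way that glossary describes it, the only change from FIP is swapping actual home runs for flyballs times a league-average HR/FB rate. A quick Python sketch under that assumption: the 11% league rate is the rough figure from this thread, the 3.10 constant is a placeholder (the real one is set each year so league FIP matches league ERA), HBP is left out, and the pitcher is made up.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Basic FIP: (13*HR + 3*BB - 2*K) / IP + constant. HBP is ignored here."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def xfip(fb, bb, k, ip, league_hr_per_fb=0.11, constant=3.10):
    """FIP with actual HR replaced by flyballs times a league-average HR/FB."""
    expected_hr = fb * league_hr_per_fb
    return (13 * expected_hr + 3 * bb - 2 * k) / ip + constant


# Made-up pitcher who allowed only 6 HR on 200 flyballs (3% HR/FB).
line = dict(bb=60, k=150, ip=200.0)
print(f"FIP:  {fip(hr=6, **line):.2f}")
print(f"xFIP: {xfip(fb=200, **line):.2f}")
```

Nothing about groundballs or line drives enters the calculation, so on this definition xFIP corrects for HR/FB luck but still knows nothing about the rest of the batted ball profile.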
But I do agree that FIP is limited to the extent that it doesn’t take into account more batted ball data. That is why I think tRA is superior. But FIP remains clearly superior to stats like ERA in isolating and accurately describing pitching performance.
Sure [Chris George’s] 2003 was awful, but in 2004 he started 7 games for the Royals. He struck out only 3.21/9 and walked an awful 5.36/9. Yet due to only giving up 1 HR, he had the FIP of a #3 pitcher: 4.43. Despite the good FIP and sabermetrics taking a firmer footing in front offices, he has not pitched in the majors since. So, am I off on this point? Am I missing something?
First, there’s a significant sample size issue. His 2004 MLB stats cover 42 innings. That’s not enough to evaluate how good a player really is, both because it isn’t much data and because small samples can skew results. As you pointed out, his giving up only 1 HR really lowered his FIP. You really have to take small samples of data with several grains of salt. Second, Gopherballs was right on when he said that when you look at a pitcher’s FIP, you also have to look at the batted ball data to see what else was going on and whether it looks like the pitcher had some good or bad HR/FB luck. Augmenting FIP with LD, GB, FB% and HR/FB gives you a much more complete picture of a pitcher’s performance. While George’s FIP was ok, his LD% was very high, his GB% was low and his HR/FB% showed that he was the beneficiary of a lot of luck.
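To put a number on the sample size point, here is a rough Python sketch using George’s 2004 rates over 42 innings. The strikeout and walk totals are backed out from the per-9 rates above, the 3.10 constant is a placeholder rather than the actual 2004 value (so the output won’t necessarily match the 4.43 quoted), and HBP is ignored.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Basic FIP: (13*HR + 3*BB - 2*K) / IP + constant. HBP is ignored here."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


IP = 42.0
K = round(3.21 * IP / 9)   # about 15 strikeouts
BB = round(5.36 * IP / 9)  # about 25 walks

# Same walks and strikeouts, different home run totals.
for hr in range(1, 6):
    print(f"HR={hr}: FIP={fip(hr, BB, K, IP):.2f}")

# Each extra home run is worth 13 / IP points of FIP.
print(f"One HR over {IP:.0f} IP moves FIP by {13 / IP:.2f}")
print(f"One HR over 200 IP moves FIP by {13 / 200:.2f}")
```

In a sample this small, a couple of flyballs carrying over the wall instead of dying on the track is the difference between a mid-rotation FIP and a replacement-level one.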
by NYRoyal on Mar 25, 2009 6:06 AM EDT
I think part of my problem in looking at it
was looking at specific years, especially for pitchers who were borderline AAAA quality, and at smallish sample sizes for guys like George anyway.
by Top Ramen on Mar 25, 2009 10:37 AM EDT
nice post
Not to rehash the argument, but this post, along with the subsequent explanations of tRA by devil_fingers and NYRoyal, has helped to explain my confusion with the apparent suckage of Ponson and his seemingly generous FIPs.
When looking at Ponson’s career tRA’s, I am now convinced more than ever that Ponson is terrible — however, I’m no longer as convinced that he isn’t an improvement over the rest of the AL Central’s 8th starters. (I’m looking at you Dontrelle…)
Now it would be interesting to see a comparison of the AL Central’s rotations with regard to tRA. It’s too bad there aren’t the same sort of forecasted figures available for the pitchers. Guess we’ll have to wait until year’s end to get a sense of what happened.
by marbotty on Mar 25, 2009 9:15 AM EDT
Rec'd
Guess we’ll have to wait until year’s end to get a sense of what happened.
by devil_fingers on Mar 25, 2009 9:56 AM EDT
I tried to use the sarcasm font under the George picture
unfortunately it didn’t work
by Top Ramen on Mar 25, 2009 10:39 AM EDT