Prospects, what are they good for? A Quick and Dirty Historical Analysis of WAR Generated by the BA Top 100 Prospects
via img801.imageshack.us (data: fangraphs.com)
The scatterplot above represents the average seasonal fWAR generated by ballplayers appearing in Baseball America's Top 100 list for the years 1995-2009 (i.e. a fifteen-year sample). Six years' worth of data (beginning with the year of the BA ranking) is included for each ranked player (where available; obviously, the 2006-2009 prospects have yet to generate six years' worth of data), and all of that data is averaged together to create a single per-season fWAR value for each position in the rankings (i.e. 1 through 100). Because I am interested in assessing the value of each ranking position, rather than the individual players, I have not eliminated duplicates. As a result, Joe Mauer (who was the #1 prospect in both 2004 and 2005) is responsible for a disproportionate share of the value associated with the number-one spot. He also did his share to help out the #4 spot (2003) and the #7 spot (2002).
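For anyone who wants to try this at home, here is a minimal sketch of the per-slot averaging described above. The row layout (list year, rank, player, season, fWAR) and the sample numbers are assumptions made up for illustration; they are not the actual FanGraphs data or the spreadsheet behind the graph.

```python
from collections import defaultdict

# Hypothetical rows: (list_year, rank, player, season, fwar).
# The numbers are invented for illustration only.
rows = [
    (2004, 1, "Player A", 2004, 3.0),
    (2004, 1, "Player A", 2005, 5.0),
    (2005, 1, "Player A", 2005, 5.0),  # same player ranked again: kept on purpose
    (2005, 1, "Player A", 2006, 6.0),
    (2004, 2, "Player B", 2005, 1.0),
    (2004, 2, "Player B", 2006, 2.0),
]

def average_fwar_by_rank(rows, window=6):
    """Average seasonal fWAR for each ranking slot.

    Only seasons inside the `window` years that start with the list year
    are counted. Repeat appearances by the same player are not removed.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for list_year, rank, _player, season, fwar in rows:
        if 0 <= season - list_year < window:
            totals[rank] += fwar
            counts[rank] += 1
    return {rank: totals[rank] / counts[rank] for rank in totals}

averages = average_fwar_by_rank(rows)
for rank in sorted(averages):
    print(rank, round(averages[rank], 3))
```

Note that a repeat #1 prospect contributes his overlapping seasons twice, which is exactly how one player can prop up a single slot.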
The overall pattern seems pretty clear, and I took the liberty of adding a trend line to emphasize the point. The number-one spot has generated an average of 3.825 fWAR per season over the course of the last 15 years, and it tails off somewhat logarithmically after that, with the trend levelling out around 1 fWAR somewhere near the 60th ranked spot.
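The post does not say how the trend line was produced, so the following is only one plausible way to do it: an ordinary least-squares fit of fWAR against the natural log of the ranking position. The function name and the synthetic input values are assumptions for illustration, not the numbers behind the graph.

```python
import math

def fit_log_trend(ranks, values):
    """Least-squares fit of value = a + b * ln(rank). Returns (a, b)."""
    xs = [math.log(r) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic per-slot averages that follow an exact log curve (NOT real data).
ranks = list(range(1, 101))
made_up = [3.0 - 0.5 * math.log(r) for r in ranks]

a, b = fit_log_trend(ranks, made_up)
print(f"fWAR ~ {a:.3f} + {b:.3f} * ln(rank)")
```

A negative `b` corresponds to the "tails off" shape: the drop from one slot to the next shrinks as the rank number grows.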
Also, when looking at this data, you should keep in mind that WAR is a counting stat, so it is heavily influenced by playing time. Because BA is ranking prospects, rather than established players, there are a disproportionately large number of partial seasons (including, for instance, September call-ups and other "cup of coffee" situations) in the data. With this in mind, consider the following graph, which compares the average number of "Top 100" prospects who log time in any given year after being ranked, with the average seasonal fWAR value generated by the entire group in each year.

via img31.imageshack.us (data: fangraphs.com)
As you can see from the vertical bars on this graph, roughly half of the players appearing in the BA list will actually see some time in the Majors in the same year that the list is published (i.e. "Year 1" in this graph). That fact surprised me a little bit, since BA does rank prospects in the low minors. It seems clear that the BA lists have an overall bias toward guys who are pretty close to ML-ready. You may also note from this graph that something like 20 guys on any given list are never going to see the Show, and that roughly one third will not be in the league long enough to reach free agency (i.e. six years).
Turning to the WAR values at the top of the graph, you can see that prospects generate much less average per-season WAR in "Year 1" than in the succeeding years. Some of that is undoubtedly due to the inevitable "adjustment period" associated with the jump to the Majors, but it is probably fair to say that the lion's share of that difference is attributable to playing time (i.e. many of the "seasons" included in that first-year average are actually just a handful of September games, etc.). Because the number of ranked prospects in the League levels off so much between years 3 and 4, I suspect that the playing-time distortion falls out of the data around that point. After that, the headcount starts to drop, presumably as the marginal guys wash out of the League, and I suspect the continued rise in seasonal fWAR can be attributed, mostly, to the "washouts" no longer dragging down the overall average. As a result of this general line of reasoning, I suspect the year 4 average seasonal fWAR of 1.598 comes much closer to the "true value" of these ballplayers, on the whole, than the total value of 1.388 fWAR/season that I calculate from the complete dataset.
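The headcount-versus-average comparison can be sketched roughly as follows for a single year's list. The row layout and the sample values are hypothetical; the real graph averages the headcount across all fifteen lists.

```python
from collections import defaultdict

def summarize_by_year_since_ranking(rows, window=6):
    """rows: (list_year, player, season, fwar).

    Returns {year_n: (headcount, mean_fwar)}, where year_n = 1 is the
    season in which the list was published.
    """
    values = defaultdict(list)
    for list_year, _player, season, fwar in rows:
        year_n = season - list_year + 1
        if 1 <= year_n <= window:
            values[year_n].append(fwar)
    return {n: (len(v), sum(v) / len(v)) for n, v in sorted(values.items())}

# Invented sample: one September call-up in Year 1, two regulars in Year 2.
sample = [
    (2004, "Player A", 2004, 0.2),
    (2004, "Player A", 2005, 2.0),
    (2004, "Player B", 2005, 1.0),
    (2004, "Player B", 2011, 4.0),  # outside the six-year window, ignored
]

summary = summarize_by_year_since_ranking(sample)
for year_n, (headcount, mean_fwar) in summary.items():
    print(f"Year {year_n}: {headcount} players, {mean_fwar:.3f} fWAR/season")
```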
I am hoping to ultimately add to this dataset, so that it will eventually include all of the BA top-100 lists going back to 1990, but with this year's list already on the way, I thought someone might be interested in seeing some of the info I have gleaned so far. Once I see the actual BA 2011 list, I'll use the numbers to put a value on the Royals' farm system based upon the number and position of our newly ranked prospects. I'll probably compare that to some of the other well-regarded farm systems of the past decade or so.
In the meantime, I made a couple of pretty graphs. I hope you like them.
Comments
I also have had fun looking at the meaningless trivia in this data...
…for instance, over the last 15 years, it seems that MLB teams have benefitted more from having the 56th best prospect in the nation than having the #5 guy. Over that time, the number 56 spot has featured the likes of Matt Morris, Mags Ordonez, Adam Dunn and John Danks. Meanwhile, number-5 prospects have included flops like Joel Guzman and Jesse Foppert, along with a host of marginal-value guys like Aramis Ramirez, Travis Lee and Brian Hunter.
In the end, the fifth spot posts a seasonal average of 1.563 fWAR, and the 56th spot puts up 1.846 fWAR.
And the number 21 spot has outperformed its pedigree...
…with the likes of Garza, Markakis, David Wright, B.J. Upton, Morneau, Jay Payton and Andruw Jones. End result: 2.281 fWAR/season.
Thanks for putting all of this together; it looks really good.
Also, the comment I’m replying to is really interesting.
It suggests that certain spots in the prospectus have created more fWAR. Which seems very strange. Interesting to think about though.
Dr. Ausgiano schools me in the classroom and on the field of battle
by MarioVanPeebles Republic of China on Jan 27, 2011 6:07 PM EST
So far, I'm chalking the anomalies up to sample size.
While each datapoint theoretically represents about 80 individual MLB seasons, those seasons are still only generated by 15 individual players.
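The "about 80" figure can be checked with a little arithmetic. This sketch assumes the data runs through the 2010 season (the post is dated January 2011), so the 2006-2009 lists have fewer than six seasons available.

```python
LAST_SEASON = 2010  # assumption: the most recent completed season in the data
WINDOW = 6          # seasons counted per ranked player, starting with the list year

def max_seasons_per_slot(first_list=1995, last_list=2009):
    """Upper bound on player-seasons behind one ranking slot."""
    total = 0
    for list_year in range(first_list, last_list + 1):
        available = LAST_SEASON - list_year + 1
        total += min(WINDOW, available)
    return total

seasons = max_seasons_per_slot()
print(seasons)  # 11 full lists * 6 seasons, plus 5 + 4 + 3 + 2
```

That total is a ceiling: any season in which the ranked player did not appear in the Majors is simply missing from the average.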
I’m keeping my eye on it though, since it would be interesting to discover some odd bias in the rankings. At the moment, the cluster from 60-70 seems to underperform as a group, and the cluster from 70-80 overperforms.
Definitely agree with your assumption there.
It would be interesting to observe different clusters in a staggered set.
Like: 20-30, 25-35, 30-40, 35-45, etc. This might even out some noise from only 1 player, but then still get an idea of the trends.
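A rough sketch of that staggered-cluster idea, assuming you already have a per-slot average for each ranking position. The function name, the window width of ten slots and the step of five are illustrative choices (the ranges listed above are eleven slots wide if read inclusively).

```python
def staggered_window_means(avg_by_rank, width=10, step=5, start=1, stop=100):
    """Mean of the per-slot averages over overlapping windows of ranks.

    Returns a list of ((low, high), mean) pairs, e.g. (1, 10), (6, 15), ...
    """
    results = []
    low = start
    while low + width - 1 <= stop:
        high = low + width - 1
        vals = [avg_by_rank[r] for r in range(low, high + 1) if r in avg_by_rank]
        if vals:
            results.append(((low, high), sum(vals) / len(vals)))
        low += step
    return results

# Placeholder input: each slot's "average" is just its rank number (NOT real data).
placeholder = {r: float(r) for r in range(1, 101)}
windows = staggered_window_means(placeholder)
print(windows[:3])
```

Because each slot lands in two windows, a single outlier player moves two neighboring points a little instead of one point a lot.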
Also it would be really cool if it is possible to try and attribute the underperforming/overperforming clusters to either flaws (and repeated flaws) in how players are ranked, or a real human condition.
…follow me here. Real human condition: a minor leaguer knows he isn’t ranked really high, and that gives him more incentive to work harder than the higher-ranked guy, thus in the long run creating more fWAR at a certain lower cluster.
Dr. Ausgiano schools me in the classroom and on the field of battle
by MarioVanPeebles Republic of China on Jan 27, 2011 6:46 PM EST
Gregg Easterbrook (TMQ) would agree with you, at least about football players
the idea that being a rich, heralded rookie actually hampers a lot of people, compared to having to scratch and claw for a spot on the team.
The whole problem with the world is that fools & fanatics are always so certain of themselves, and wiser people so full of doubts. ~ Bertrand Russell
by SagehenMacGyver47 on Jan 29, 2011 3:59 PM EST
Could it possibly be that attempting to parse out a precise order
for the top 100 professional baseball players not yet in the Major Leagues is NOT an exact science??? GASP! Whatever will the baseball pundits do with themselves? What if this translates to other sports?? How will Mel Kiper pay for his hair gel?
Seriously, though, this is good work. I love stuff that tries to predict the future value of players, even if it might ultimately be impossible. It’s still fun to analyze and talk about.
by Sweep_the_Leg on Jan 27, 2011 6:18 PM EST
Actually,
I’m impressed by how well the data lines up. I was actually sort of expecting the scatterplot to be a blob, but wound up with a relatively clear trend, instead. It’s certainly far from an exact science, but the BA people do seem to have some idea what they’re doing.
(Also, I’m not sure they would even agree that “predicting future WAR” is what they’re trying to do. It’s just that, as a fan, I kind of wish that’s what they were doing.)
Welllll...
it DOES seem kind of “blobby” once you get past the top 20-30. Which is kind of the point I was getting at. It gets much harder to distinguish between the 57th and 58th best prospect, for example. I think you’d probably be better off just putting guys in tiers after the top 25 or so. I suppose a lot of prospect analysts do that, but we can’t help but want a hyper-detailed list for everything these days. Hopefully supplemented by lots of polls about the list!!
by Sweep_the_Leg on Jan 27, 2011 6:37 PM EST
Nail on head
I personally don’t see the obsession with rankings at all.
A guy is either a good player or he isn’t. Does slapping a 4 or a 17 or a 56 on him tell us anything at all on top of that?
Rankings imply a granular, high resolution certainty that simply doesn’t exist when it comes to prospect evaluation.
Does slapping a 4 or a 17 or a 56 on him tell us anything at all on top of that?
That’s basically the question I am interested in answering. Based on early results, I suspect that the difference between a top-five prospect and someone around #50 is significant and the difference between #50 and #100 really isn’t.
Actually, I have put guys in tiers..
…and after the first 50, it looks like there isn’t much differentiation. I just thought the scatterplot was pretty (and the right-side isn’t so much a “blob” as it is a levelling off).
Here’s how the average seasonal fWAR breaks out once you have combined the prospects into groups of (what I consider to be) meaningful sizes (a range of prospect rankings is on the left, and number on the right is the average seasonal fWAR generated by the guys in that “bucket”):
1-5: 2.399
6-10: 2.112
11-20: 1.869
21-30: 1.611
31-40: 1.376
41-50: 1.181
51-75: 1.042
76-100: 0.995
Are you counting guys that did not make the majors as
not having an fWAR? If so, consider adding those players into the average as 0 WAR. I think that might give you a more accurate picture of what the rankings do. You are cherry-picking the data if you don’t include them.
Go Royals!
Well, actually, I'm doing it both ways to see what I find...
…because neither approach is perfect. The graphs above do not “zero-out” unplayed seasons. If you don’t play in a given year, you simply aren’t counted that year in these graphs.
The alternative, which I’m also doing, but which I didn’t include in this post, is to count every year not played as a “zero-WAR” year. That approach has the unfortunate side-effect of pulling down the averages of players who simply aren’t ready for MLB yet. And, keep in mind there are single-A ballplayers in the data, so there’s really no reason to hold that against them. In the larger scheme of things, that seems like a distortion in the results if you do it that way.
Of course, I don’t think that distortion is any worse than the “cherry picking” problem you describe… just different. Like I said, neither approach is perfect. Ultimately, I’m just collecting the data, trying different things with it, and hoping I find something interesting along the way.
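To make the two approaches concrete, here is a small sketch of both averages for one hypothetical player's six-season window. `None` marks a season with no MLB time; the numbers are invented.

```python
def mean_skip_unplayed(seasons):
    """Average over played seasons only (the approach used in the graphs above)."""
    played = [w for w in seasons if w is not None]
    return sum(played) / len(played) if played else None

def mean_zero_fill(seasons):
    """Average over every season in the window, counting unplayed ones as 0 fWAR."""
    filled = [0.0 if w is None else w for w in seasons]
    return sum(filled) / len(filled)

# Hypothetical A-ball prospect: two years in the minors, then four in the Majors.
late_arrival = [None, None, 0.5, 2.0, 3.0, 2.5]

skip_avg = mean_skip_unplayed(late_arrival)
zero_avg = mean_zero_fill(late_arrival)
print(round(skip_avg, 3), round(zero_avg, 3))
```

The same player looks like a 2-win regular under one method and a noticeably lesser one under the other, purely because of when he arrived.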
BTW...here's the same scatterplot calculated the way you propose:
(i.e. w/missed seasons counted as 0 fWAR)

As you can see, the overall pattern remains, but the whole curve is shifted down by about half a win.
True.
…especially with the “Mauer effect” at work.
If you look at how the 2-5 spots perform, that #1 looks awful suspicious. But, it was more fun to take my trend line all the way up there, so I just did it.
More seriously, it also occurred to me that, given the way that prospects are ranked, it might make some sense to believe that it’s just easier to decide who’s number one than it is to decide who ranks just behind that guy. If so, I would expect the number one to be more consistent in terms of actual production… and the data does generally bear that out. I’m hopeful that adding five more years’ worth of lists will reduce the volatility…
a few more years of Hochevar posting
5 or 6 ERAs will help lower that #1 avg as well.
SNARK!
"We're gonna win with pitching and defense" General Manager Dayton Moore, circa winter 2009
"Where did all these Indians come from?" General George Armstrong Custer, circa summer 1876
With all due respect to the Snark...
…the good folks at Baseball America never ranked Hooch above #32 (2007). It was only the Royals who valued him higher than that.
On a serious note
Excellent post.
I do think, looking at the plots, it just reinforces the idea that quantity might be as important, or even more important, when it comes to prospects. As Royals fans, we are banking on that being true!
"We're gonna win with pitching and defense" General Manager Dayton Moore, circa winter 2009
"Where did all these Indians come from?" General George Armstrong Custer, circa summer 1876
..or even more important
“than quality, in a way” (that’s what I meant to type)
In other words, it might be more important to have 6 or 7 top 100 guys, even with none in the top 10, than it is to have only a couple of guys, even if one is in the top 5.
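One rough way to put numbers on that, using the bucket averages quoted earlier in this thread. The two farm systems below are hypothetical, and summing bucket averages is only a back-of-the-envelope assumption, since those averages leave out seasons in which a prospect never reached the Majors.

```python
# Bucket averages from the thread: (low_rank, high_rank, avg seasonal fWAR).
BUCKETS = [
    (1, 5, 2.399), (6, 10, 2.112), (11, 20, 1.869), (21, 30, 1.611),
    (31, 40, 1.376), (41, 50, 1.181), (51, 75, 1.042), (76, 100, 0.995),
]

def bucket_value(rank):
    """Look up the average seasonal fWAR for the bucket containing `rank`."""
    for low, high, value in BUCKETS:
        if low <= rank <= high:
            return value
    raise ValueError(f"rank {rank} is outside the top 100")

def system_value(ranks):
    """Sum of bucket averages for every ranked prospect in a system."""
    return sum(bucket_value(r) for r in ranks)

deep_system = [35, 48, 60, 72, 85, 90, 97]  # hypothetical: seven guys, none in the top 10
top_heavy = [4, 66]                         # hypothetical: two guys, one in the top 5

deep_total = system_value(deep_system)
top_total = system_value(top_heavy)
print(round(deep_total, 3), round(top_total, 3))
```

On these made-up lists the deeper system comes out well ahead per season, which is the quantity argument in a nutshell.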
"We're gonna win with pitching and defense" General Manager Dayton Moore, circa winter 2009
"Where did all these Indians come from?" General George Armstrong Custer, circa summer 1876
Great work
I hope I can get my work done on the likelihood of success of prospects completed sometime this century.
You may know me as NYRoyal.
Whew, I was getting stressed
You may know me as NYRoyal.
by Scott McKinney on Jan 27, 2011 5:51 PM EST
Uhhh...looks like somebody isn't up-to-date on their Mayan prophecies.....
LOL!
Killing time until time kills me
Here's a sobering thought...
The last time the Royals had a really strong showing in the BA 100 was 1999.
That year, we had six names on the list: Dos Carlos (#14 and #30), Jeff Austin (#55), Orber Moreno (#57), Jeremy Giambi (#64), and Dee “some people call me Dermal” Brown (#92).
Over the next six years (from 1999-2004), those guys posted a collective 28 fWAR. Also, between 1999 and 2004, Beltran single-handedly posted 28 fWAR. That’s right… the other five guys posted a collective “zero” (mostly attributable to Dee’s -3.8 fWAR).
Being a top 50 prospect and especially a top 25 prospect makes a huge difference
If a player is in the 51-100 group, he deserves no more than cautious optimism. Thankfully the Royals have multiple players in the right half of the top 100 and even in the top 25.
You may know me as NYRoyal.
by Scott McKinney on Jan 28, 2011 4:32 PM EST
Absolutely.
…and once I’ve got the 2011 list, I’ll try to quantify the difference between 1999 and today using the numbers I’ve collected so far.
But glancing through the data just now, that list of names from 1999 jumped out at me and I was reminded what a disappointing bunch that was… and that’s even with the fact that Beltran really did go above and beyond.
Wait a minute...
Being a top 50 prospect and especially a top 25 prospect makes a huge difference
You've been looking at my scatterplots, haven't you?
Of course!
I had better kick the pace on my little research project up a notch or many others will have covered everything before I get a chance to “publish” it.
You may know me as NYRoyal.
by Scott McKinney on Jan 28, 2011 4:53 PM EST
this....
and the fact that I think it’s pretty generally accepted that prospecting has gotten more accurate since then… it’s still far from perfect, but it’s better
Fire Everyone
by billybeingbilly on Jan 28, 2011 6:48 PM EST
I'm curious if this is true.
In fact, this is one of the hypotheses I’ve been wanting to test with this fWAR data. I did create a bunch of these graphs using data from different years within my sample earlier today, trying to see if, for instance, the 2005-2009 lists or the 2000-2004 lists more accurately predicted future success than the 1995-1999 lists. Unfortunately, doing that reduces the sample size to the point where you can’t tell by just “eyeballing” the data. I’ll have to give the issue some more thought though, as I finish collecting the fWAR numbers for the earlier years…
I wondered the same thing about a month ago.
It makes sense that it’d be better with all the data available but nobody seems to know for sure and I guess that’s because it’s still such a crapshoot with prospects.
Glad I came, just wish I hadn't stayed so long.
People ask me what I do in winter when there’s no baseball...Rock Chalk Talk
Another angle
I would love to see where each of the picks succeed and fail. Such as with the #50 position:
10% created >10 WAR,
20% 2 to 8 WAR,
50% less than 2 WAR in major leagues,
20% maxed at AAA,
10% maxed out at AA
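A breakdown like that could be tallied along these lines. The tier boundaries, the input layout (total WAR plus highest level reached) and the sample players are all assumptions for illustration; the middle tier is widened to "2 to 10 WAR" here so that every Major Leaguer falls into some tier.

```python
from collections import Counter

def outcome_tier(total_war, highest_level):
    """Classify one ranked prospect by how his career turned out."""
    if highest_level != "MLB":
        return f"maxed out at {highest_level}"
    if total_war > 10:
        return ">10 WAR"
    if total_war >= 2:
        return "2 to 10 WAR"
    return "<2 WAR"

def outcome_shares(players):
    """players: list of (total_war, highest_level). Returns {tier: share of players}."""
    counts = Counter(outcome_tier(war, level) for war, level in players)
    n = len(players)
    return {tier: count / n for tier, count in counts.items()}

# Invented players for a single ranking slot (NOT real results).
slot_50 = [
    (12.0, "MLB"),
    (5.0, "MLB"),
    (0.5, "MLB"),
    (1.0, "MLB"),
    (None, "AAA"),
]

shares = outcome_shares(slot_50)
for tier, share in shares.items():
    print(f"{tier}: {share:.0%}")
```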
- .-. ..- ... - / - .... . / .--. .-. - .. . ... ...
That's basically what I'm working on
For 1990-2003, over each player’s cost controlled years. The data collection process is a bear.
You may know me as NYRoyal.
by Scott McKinney on Jan 29, 2011 3:24 PM EST
Sky Kalkman was posting these randomly on twitter a couple of weeks ago
I think
Glad I came, just wish I hadn't stayed so long.
People ask me what I do in winter when there’s no baseball...Rock Chalk Talk
IIRC, he was just mentioning some highlights of former Xth draft pick selections
Hopefully I’ll get more depth than that.
You may know me as NYRoyal.
by Scott McKinney on Jan 29, 2011 11:35 PM EST
