The Royals worked out a trade a few days ago with the Angels that brought Ervin Santana to Kansas City. The move starts "The Process" of creating a respectable starting pitching staff after the 2012 staff was decimated by the loss of 2 starters to Tommy John surgery. The problem is that Santana looks to be just as much of an injury risk going into the 2013 season.
Over his career, Santana has been a relatively healthy pitcher. He has been on the DL twice, both in the 1st half of 2009. It was the only season in the last 5 when he didn't start 30 games. Since coming into the league in 2005, he is 13th in total game starts with 233. Few pitchers look to be as durable as Santana.
That view is changing. Some sources have already reported that Santana might be injured. Most of the rhetoric surrounding his possible injury focuses on his drop in velocity over the course of the 2012 season. He lost close to 2 MPH total.
A common assumption is that a pitcher who is losing velocity is hurt. While not studied directly, I would have a tough time arguing that a pitcher wants to lose velocity and therefore make his pitches easier to hit. Some exceptions exist, like the velocity drop a pitcher sees when he moves from reliever to starter, but a drop is not a good sign for a pitcher and can be an early sign of an injury.
Besides looking at fastball speed, I have been trying to recreate the injury projection work started by Josh Kalk back in 2009 at the Hardball Times and recreated by Kyle Boddy earlier this fall. I have not completed the work to the level I would eventually like, but I feel I have enough information to see if a pitcher is showing some signs of injury. The details of the work are in the Appendix at the end of the article. From it, I created a value, Injury Index, which measures the chances a pitcher is hurt (0.0 being a 0% chance of injury and 1.0 being a 100% chance of injury).
I have gone ahead and plotted Santana's Injury Index and average fastball speeds over the past 2 seasons.
Through much of the 2011 season, Santana was able to maintain a fastball speed around 93 mph, except for a drop in early July. That held at least until the 2nd to last game of the season, when his velocity dropped and his Injury Index spiked.
In 2012, his velocity stayed near 92 mph and his Injury Index steadily increased with a couple of large jumps along the way. On August 26, his Injury Index spiked to near .8. It is the highest value I have seen without the pitcher eventually needing Tommy John surgery.
Do the drop in velocity and the high Injury Index mean that Santana will be injured in 2013? No, but his chances of missing extensive time next year are higher than those of other pitchers. The Royals really need a couple of free agent pitchers to give the team 30-35 starts in 2013. Santana was brought in to be a consistent starter and be on the mound every 5 days. It looks like the Royals may have spent quite a bit of their free agent money on damaged goods. At least Santana will only be on the Royals one year.
Appendix - Description of Injury Index
I have been wanting to recreate the work done by Josh Kalk for a while. I made it my goal this off season to have some working variation of it running. The model I have so far is just in its infancy and hopefully will grow in scope and accuracy with time.
Kalk and Boddy used neural networks to find injured pitchers. I had no idea how to set up a neural network and could not find anyone who could help. Instead, I used logistic regression for my analysis. Logistic regression takes different variables as inputs and yes (1) or no (0) values as outputs. Then, it finds the percentage chance of the input variables leading to one of the two outputs.
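To make the idea concrete, here is a minimal sketch of a logistic regression fit in plain Python. It is not the model behind the Injury Index. The function names, the single made-up input variable, and the toy labels are all assumptions for illustration, with 1 standing for a hurt pitcher and 0 for a healthy one so the output lines up with the 0.0 to 1.0 scale.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_probability(weights, bias, x):
    """Chance that the inputs x lead to the 'yes' (1) outcome."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias by plain batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up toy data, NOT real pitcher measurements:
# one input variable per row, label 1 = hurt, 0 = healthy.
TOY_FEATURES = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
TOY_LABELS = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(TOY_FEATURES, TOY_LABELS)
print(predict_probability(weights, bias, [0.95]))
```

The printed number is the fitted chance of the "yes" outcome for a new input, which is the same kind of value the Injury Index reports.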
For this analysis, I looked at the variance of a pitcher's release points and breaks over the last 10 fastballs that the pitcher threw in a game. Also, I looked at the difference in average speed from the "first 4 fastballs of the last 10 pitches" minus the "average of the last 4 fastballs". I was basically looking to see if a pitcher could maintain his speed and mechanics at the end of the game.
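The inputs described above could be computed along these lines. This is a sketch under stated assumptions, not the exact calculation used here: the function name and the "speed" key are hypothetical, the keys x0, z0, pfx_x and pfx_z are assumed to hold the release point and break for each pitch, the sample variance is an assumption, and the speed windows are taken from the last 10 fastballs.

```python
from statistics import mean, variance


def late_game_features(fastballs):
    """Build model inputs from a game's fastballs, given in pitch order.

    Each fastball is a dict with the assumed keys:
    x0, z0 (release point), pfx_x, pfx_z (break), speed (mph).
    """
    last10 = fastballs[-10:]
    if len(last10) < 10:
        raise ValueError("need at least 10 fastballs")

    # Variance of release point and break over the last 10 fastballs.
    features = {
        "var_" + key: variance([pitch[key] for pitch in last10])
        for key in ("x0", "z0", "pfx_x", "pfx_z")
    }

    # Average of the first 4 minus average of the last 4 of those 10.
    first4 = mean(pitch["speed"] for pitch in last10[:4])
    last4 = mean(pitch["speed"] for pitch in last10[-4:])
    features["speed_drop"] = first4 - last4
    return features
```

A positive speed_drop means the pitcher was throwing slower at the very end of his outing, and larger variances mean a less consistent release point or break.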
Here is an example of the breakdown in Santana's mechanics on August 26th. The main issue contributing to the high Injury Index that day was an inconsistent release point. Here are his release points from the 6th and 7th innings on the 26th and the release points from the previous game he started.
As can be seen in the Pitchf/x data, his release point differs by over a foot during the game on the 26th. He did not have this problem on the 21st.
Here are the two pitches in which his release point differs the most.
The differences are tough to tell, especially seeing the pitch only once. To show the difference, I put just the release points from 3 pitches together. I drew a box from the release point to the point where the grass meets the dirt in front of the batter's box.
The difference is easier to spot, but it is still not an ideal method. Pitchf/x data makes the differences easier to spot.
To get a sample of pitchers, I looked for pitchers that were assumed to be either healthy or hurt. I tried to stay away from the middle of the road pitchers whose health was in question. For the healthy group, I took pitchers from the last 2 seasons, when the Pitchf/x data is better, who missed no time to injuries. For the hurt group, I took pitchers who ended up having Tommy John surgery. I was looking for quality of data over quantity of data to create the equation.
I am continuing to expand the data set, but it takes around an hour to collect the data for each pitcher and run an analysis. Right now, I am concentrating on automating the process so I will be able to get the data for all pitchers in seconds rather than days.
The Injury Index is not close to being 100% accurate and it never will be. It will just find the chances of injury. Here is a look at some pitchers and their Injury Index, starting with pitchers that fit into all aspects of the model.
Here are some pitchers that had their Injury Index go over .8 once during the season and each of them ended up having TJS.
Now here is a look at the charts for a couple of healthy pitchers.
Finally, here are a couple of exceptions. Danny Duffy showed no signs of injury and Gio Gonzalez looked to be DL bound in the middle of July.
To further expand out Gonzalez's data, here is his Injury Index plotted against his average fastball velocity.
There is definitely some trend of his velocity increasing when his Injury Index is low and vice versa. I have only plotted Santana's and Gonzalez's Injury Index and average fastball speeds on the same axis, so the sample size is super small, but there does look to be some correlation between the two.
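One way to put a number on that relationship, rather than judging it from a chart, is the Pearson correlation coefficient between the two series for a pitcher. The sketch below is a generic implementation with a hypothetical function name. It was not run on the data in this article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length series of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, 2 or more points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a series with no variation has no correlation")
    return cross / math.sqrt(spread_x * spread_y)
```

Feeding it a pitcher's game-by-game Injury Index and average fastball velocity would return a value between -1 and 1. A value near -1 would match the pattern of velocity falling as the Injury Index rises.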
So much more work needs to be done on this front and I will be releasing more and more information on it once it becomes available.
h/t to Bill Petti for the dual axis graphs