To follow up on the idea of Billy Butler hitting home runs in bunches, I will look at every hitter with 29 or more home runs in 2012. Yes, I cherry-picked the cutoff to get Billy in there, but I already had the data, and 29 is no more arbitrary than 30. The link below goes to the first post.
So, Billy seemed to cluster his home runs, meaning he hit a lot of home runs in the first few games after hitting one. After looking at the other players' data, this does not seem unusual in any way.
As you can see, the graph shows that the 29+ home run hitters, taken together, show the same heavy positive (right) skew. This graph includes all gaps between home runs for the top 28 home run hitters in 2012. There are 930 gaps in total, with an average of 4.2 games between home runs. About two-thirds (66.8%) of the gaps are shorter than the average, and 57.7% of the home runs were hit in the same game as the previous home run or within the next three games. Only one player, Alfonso Soriano, had more than 50% of his home run gaps above his own average gap. He only managed that feat by having no home runs in his first 31 games. Note that the only gaps included were those between consecutive home runs (one and two, two and three, etc.), so the long stretches before the first home run and after the last home run of the year were not counted.
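To make the gap definitions above concrete, here is a minimal Python sketch of how numbers like these could be computed. It assumes each player's home runs are recorded as game numbers, one entry per home run (so a two-homer game appears twice), and that a gap is the difference in game number, so 0 means "same game" and 3 means "three games later." The function names are hypothetical; this is not the code behind the figures quoted above.

```python
from statistics import mean, median


def home_run_gaps(hr_games):
    """Gaps between consecutive home runs for one player.

    hr_games: game numbers with one entry per home run (a two-homer game
    appears twice). Assumption: a gap is the difference in game number,
    so 0 means the same game. Stretches before the first and after the
    last home run are ignored, matching the post's definition.
    """
    games = sorted(hr_games)
    return [later - earlier for earlier, later in zip(games, games[1:])]


def summarize(all_gaps):
    """Pooled summary statistics for a list of gaps."""
    avg = mean(all_gaps)
    n = len(all_gaps)
    return {
        "n_gaps": n,
        "mean_gap": avg,
        # A median well below the mean is one sign of a right-skewed distribution.
        "median_gap": median(all_gaps),
        "share_below_mean": sum(g < avg for g in all_gaps) / n,
        # Same game (0) or within the next three games (1-3).
        "share_within_3": sum(g <= 3 for g in all_gaps) / n,
    }


def share_above_own_mean(gaps):
    """Fraction of one player's gaps that exceed that player's own average gap."""
    avg = mean(gaps)
    return sum(g > avg for g in gaps) / len(gaps)


# Toy example (made-up numbers, not real 2012 data):
# home runs in games 3, 3, 4, 10 and 25 give gaps of 0, 1, 6 and 15.
gaps = home_run_gaps([3, 3, 4, 10, 25])
print(gaps)
print(summarize(gaps))
```

Pooling across players would just mean concatenating each player's `home_run_gaps` output before calling `summarize`, while `share_above_own_mean` is applied player by player, which is the check that singled out Soriano.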
This rather cursory look at home run clustering suggests that the cumulative probability of hitting the next home run within a given number of games looks something like this.
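A cumulative curve like this is an empirical CDF of the gaps: for each number of games k, the share of gaps that are k or shorter. A minimal sketch, assuming the same gap definition as before (0 means the next home run came in the same game); the function name is hypothetical.

```python
from bisect import bisect_right


def cumulative_next_hr(gaps, max_games):
    """Empirical CDF: share of gaps <= k, for k = 0, 1, ..., max_games."""
    ordered = sorted(gaps)
    n = len(ordered)
    # bisect_right counts how many sorted gaps are <= k.
    return [bisect_right(ordered, k) / n for k in range(max_games + 1)]


# Toy gaps (made up, not real data):
print(cumulative_next_hr([0, 1, 6, 15], 6))
```

With real data, the value at k = 3 would correspond to the share of home runs hit in the same game or within three games of the previous one.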
This is just another way to show that most of the probability sits in short gaps: the cumulative curve climbs quickly. Several things could contribute to what these graphs show. One is that series played in home-run-friendly parks give players more home runs in those series, so their home runs end up clustered. Another is the same idea applied to opposing pitchers: series against weaker pitchers, or against pitchers whose handedness favors the hitter, could bunch home runs together. I would like to regress this data on those factors, which for the parks would not be terribly hard to do. The pitchers would be difficult as far as gathering the data is concerned, so if anyone knows how I can get this sort of match-up data easily, I would appreciate it.
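For the park half of that plan, the simplest starting point might be a one-predictor least-squares fit of home runs per game on a park home run factor. The sketch below is only an illustration under stated assumptions: `park_factor` and `hr_in_game` are hypothetical inputs, one value per game, and adding pitcher quality and handedness would need a multivariable fit from a proper statistics library. Because home runs per game are small counts, a Poisson regression is a common alternative to ordinary least squares.

```python
from statistics import mean


def ols_fit(x, y):
    """Simple least squares fit y ~ a + b * x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical inputs: a park factor for each game's venue and the
# player's home runs in that game (toy values, not real data).
park_factor = [1, 2, 3, 4]
hr_in_game = [2, 4, 6, 8]
print(ols_fit(park_factor, hr_in_game))
```

A positive slope would be consistent with park effects contributing to clustering, though on its own it would not rule out the pitcher explanation.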