This semester I’m getting re-acquainted with several statistical distributions, and I can’t help but translate them into baseball terms. Plenty of statistical models that go beyond raw stats (the likes of ERA or OBP) can be used to describe almost any situation or subgame in baseball. Here’s an example.
For starters, the Poisson distribution models the number of events (call them hits) that occur during some interval of time. The simplest version asks: on a line from zero to t, how many hits fall in that interval? The same idea extends from an interval to a region of the plane, like a circle or any unit of area, including a baseball field.
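To make the interval idea concrete, here is a minimal sketch of the Poisson probability of seeing exactly k hits. It assumes hits arrive at a constant average rate; the function name `poisson_pmf` and the rate of one hit per inning in the usage lines are placeholders of mine, not measured values.

```python
import math


def poisson_pmf(k, rate, length):
    """Probability of exactly k events in an interval of the given length.

    `rate` is the average number of events per unit of length, so the
    expected count over the whole interval is rate * length.
    """
    mean = rate * length
    return mean ** k * math.exp(-mean) / math.factorial(k)


# Assumed, illustrative rate: one hit per inning over a nine-inning game.
p_shutout_hits = poisson_pmf(0, rate=1.0, length=9.0)   # no hits at all
p_nine_hits = poisson_pmf(9, rate=1.0, length=9.0)      # exactly nine hits
```

For a region instead of a line, the same formula applies with `length` replaced by the area of the region and `rate` read as events per unit of area.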
What am I getting at? What if we could use this distribution to model where a baseball will land on the field? [lightbulb!] It’s still a half-formed idea, but if we condition the distribution on hitter and pitcher history, the mix of pitch types, and a weight for the ballpark environment, we could simulate where the ball is hit.
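As a rough sketch of what such a simulation might look like, the code below treats landing spots as a spatial Poisson process. Everything specific here is an assumption for illustration: fair territory is approximated by a 90-degree wedge with a 400-foot radius, the count of balls in play is Poisson, and conditioning on the hitter is reduced to a made-up `pull_weight` function applied by thinning (keeping each candidate point with a probability proportional to the desired intensity).

```python
import math
import random

FIELD_RADIUS_FT = 400.0  # assumed fence distance; real parks vary


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) count with Knuth's multiplication method.

    Suitable for small means (a few dozen); exp(-mean) underflows for
    very large means.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_landing_spots(mean_balls, rng, weight=None):
    """Simulate (x, y) landing spots in a wedge with home plate at the origin.

    `mean_balls` is the expected number of spots when no weight is given.
    `weight(x, y)` must return a value in [0, 1]; each candidate spot is
    kept with that probability, which thins the uniform process into one
    whose intensity is proportional to the weight.
    """
    spots = []
    for _ in range(sample_poisson(mean_balls, rng)):
        # sqrt keeps the points uniform over area rather than over radius.
        r = FIELD_RADIUS_FT * math.sqrt(rng.random())
        theta = rng.uniform(0.0, math.pi / 2)
        x, y = r * math.cos(theta), r * math.sin(theta)
        if weight is None or rng.random() < weight(x, y):
            spots.append((x, y))
    return spots


def pull_weight(x, y):
    """Hypothetical weight that favours one foul line over the other."""
    angle = math.atan2(y, x)  # 0 along the x-axis line, pi/2 along the y-axis line
    return 0.3 + 0.7 * angle / (math.pi / 2)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
spots = simulate_landing_spots(30.0, rng, weight=pull_weight)
```

A real version would replace `pull_weight` with an intensity estimated from batted-ball history, pitch type and park, which is the hard part this sketch skips.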
In the same way, we can formulate a defender’s zone of defense as a Poisson area. For every player you could describe how much ground they are able to defend, with a different rate for each additional unit of area. I think this is a more helpful way of evaluating defense, and it could help GMs understand just how good a defender is. For example, shouldn’t a ball hit 3 feet to the left of an outfielder be expected to become an out at a different rate than a ball hit 15 feet away from him? If that’s the case, these rates could tell us something about the zone or range a fielder can cover, and the reliability with which they make successful plays.
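One simple way to get at those rates, sketched below, is to bin a fielder's chances by how far the ball was from him and compute the share turned into outs in each bin. The function name, the 5-foot bin width and the input format are my own assumptions, and the plays listed are toy values that only show the shape of the input, not real data.

```python
from collections import defaultdict


def out_rate_by_distance(plays, bin_width_ft=5.0):
    """Return {bin_start_ft: share of chances converted into outs}.

    `plays` is an iterable of (distance_ft, was_out) pairs, where
    distance_ft is how far the ball was from the fielder's starting spot.
    """
    outs = defaultdict(int)
    chances = defaultdict(int)
    for distance_ft, was_out in plays:
        bin_start = int(distance_ft // bin_width_ft) * bin_width_ft
        chances[bin_start] += 1
        if was_out:
            outs[bin_start] += 1
    return {b: outs[b] / chances[b] for b in sorted(chances)}


# Toy, made-up plays purely for illustration.
toy_plays = [
    (3, True), (4, True), (2, True),
    (14, True),
    (16, False), (15, False), (18, True),
]
rates = out_rate_by_distance(toy_plays)
```

Comparing these per-bin rates across fielders at the same position would be one way to read off range: a rangier fielder keeps a high out rate in the farther bins.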
I haven’t written out any code or graphed any of these ideas, but hopefully I’ll have time soon to show some data on what I have in my head.