In my final semester, I’m obligated to complete two projects, so I thought, why not center one of them around baseball data? So far, my inclination is to use regression analysis to home in on some baseball questions that I’ve been pondering lately. Here are some of my ideas, along with the problems and conflicts that came up as I thought them through:
Assessing Starting Pitchers’ Risk of Injury
This is a clear issue in baseball. With guys like Stephen Strasburg undergoing Tommy John surgery (a serious procedure that takes 12-18 months to recover from), it’s in the best interest of team executives, players and fans alike to figure out why so many young pitchers are blowing out their arms. With this in mind, I see two ways to use regression to learn how to keep these players healthy.
- We could use past injury data as our explanatory variables. This seems intuitive, as past injuries would seem to be a good indicator of how likely you are to be injured in the future. But that also makes the analysis feel a bit redundant, and the only real conclusion I see coming out of it is ‘once you injure your throwing elbow, you’re screwed.’ There’s gotta be a more in-depth conclusion we can come up with.
- Pitching mechanics. There’s a ton of debate as to how much pitching mechanics really determine injury risk. Some argue good mechanics will help a pitcher last 20 years (the Greg Maddux or Jamie Moyer fans), but others say that if you change a young pitcher’s delivery (like King Felix), he may not have the same success against batters. The issue with this is, where’s the data? How can I quantify pitching mechanics? From my research so far, I could probably build a bunch of parameters for pitching style, like average speed by pitch type, pitching angle, right-handers vs. left-handers, and ball movement from the pitcher’s release to home plate. Things like that may tell us something useful about what differentiates Mark Prior from Justin Verlander (two highly regarded young pitchers, but with Prior known for his past injuries). But in general, this data doesn’t exist (to the best of my knowledge).
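To make the regression idea a bit more concrete, here is a minimal sketch of a logistic regression for injury risk. Everything in it is an assumption for illustration: the two features (a prior-injury flag and a standardized fastball velocity), the helper names `fit_logistic` and `predict_risk`, and especially the data, which is randomly generated and not real pitcher data. It only shows the shape such an analysis might take.

```python
import math
import random

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_risk(w, row):
    # w = [intercept, coef_1, ..., coef_k]; row = [x_1, ..., x_k]
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = predict_risk(w, row) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Synthetic, made-up data (NOT real pitchers): injury is simulated so that
# it depends on a prior-injury flag and on standardized velocity.
random.seed(0)
X, y = [], []
for _ in range(400):
    prior_injury = random.randint(0, 1)   # hypothetical feature
    velocity_z = random.gauss(0.0, 1.0)   # hypothetical feature
    true_logit = -1.5 + 1.2 * prior_injury + 0.6 * velocity_z
    X.append([prior_injury, velocity_z])
    y.append(1 if random.random() < sigmoid(true_logit) else 0)

weights = fit_logistic(X, y)
risk_with_history = predict_risk(weights, [1, 0.0])
risk_without_history = predict_risk(weights, [0, 0.0])
```

With real data, the mechanics parameters from the second bullet would simply become extra columns in `X`, and the fitted coefficients would hint at which ones move injury risk beyond what past injuries already explain.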
Game Theory: Batter vs. Pitcher over the course of one game, or a career
A starting pitcher will generally face a batter at least 3 times before being pulled from the game. Thus I would argue at-bats are correlated with one another, since past at-bats give a pitcher or batter a better understanding of what their foe will do in the next at-bat. Say a batter strikes out his first time up: will he know how to adjust his approach the next time? Or if a pitcher gives up a home run to a batter, will he know not to make the same mistake again? These questions scream game theory, and it’d be interesting to look at data for several pitcher-batter matchups to see which players learn the most from their mistakes or failures.
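A simple first look at this, before any game theory, would be to check whether outcomes shift with each successive meeting between the same pitcher and batter in a game. Here is a rough sketch, assuming plate appearances come as chronologically ordered tuples of `(game_id, pitcher, batter, reached_base)`. That record layout and the function name are hypothetical, and the sample rows are invented placeholders.

```python
from collections import defaultdict

def on_base_rate_by_encounter(plate_appearances):
    """Return {encounter_number: on-base rate}, where encounter_number is
    1 for the first time a batter faces a pitcher in a game, 2 for the
    second, and so on. Input must be in chronological order."""
    seen = defaultdict(int)                # (game, pitcher, batter) -> meetings so far
    totals = defaultdict(lambda: [0, 0])   # encounter -> [times reached, appearances]
    for game_id, pitcher, batter, reached in plate_appearances:
        key = (game_id, pitcher, batter)
        seen[key] += 1
        encounter = seen[key]
        totals[encounter][0] += int(reached)
        totals[encounter][1] += 1
    return {n: reached / pa for n, (reached, pa) in sorted(totals.items())}

# Invented placeholder rows, only to show the input shape.
sample = [
    ("g1", "P1", "B1", False),
    ("g1", "P1", "B2", True),
    ("g1", "P1", "B1", False),
    ("g1", "P1", "B2", False),
    ("g1", "P1", "B1", True),
    ("g2", "P1", "B1", True),
]
rates = on_base_rate_by_encounter(sample)
```

If the rate climbs from the first encounter to the third across a large sample, that would suggest batters are learning faster than pitchers; splitting the same table by player would show who adapts the most.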
Probability of a successful defensive play
It’d be cool to simulate defense and be able to compare players’ defensive skills. Assessing defensive skill is a huge problem in baseball, as there are a lot of variables and circumstances that make one ball put in play different from another. Defensive statistics are out there, but they are very limited and still don’t say a lot when comparing players. Explanatory variables for a regression on the outcome of a play would include the type of ball hit (ground ball, line drive, fly ball), the angle of the ball off the bat, ball speed/drop rate, the defensive ability of the player, the type of pitcher on the mound, etc.
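Once a model like that produces a probability of an out for each ball in play, one way to compare fielders is to add up how many outs each one made beyond what the model expected. The sketch below assumes those probabilities already exist; the `(fielder, out_probability, was_out)` layout, the function name, and the sample values are all hypothetical.

```python
from collections import defaultdict

def outs_above_expected(plays):
    """Sum (actual out - predicted out probability) per fielder.
    Positive totals mean more outs than the model expected."""
    totals = defaultdict(float)
    for fielder, out_probability, was_out in plays:
        totals[fielder] += (1.0 if was_out else 0.0) - out_probability
    return dict(totals)

# Invented placeholder plays: both fielders see the same three chances.
plays = [
    ("Fielder A", 0.90, True),
    ("Fielder A", 0.30, True),
    ("Fielder A", 0.60, False),
    ("Fielder B", 0.90, True),
    ("Fielder B", 0.30, False),
    ("Fielder B", 0.60, False),
]
scores = outs_above_expected(plays)
```

Because each play is judged against its own difficulty, a fielder who converts a tough chance gets more credit than one who only handles routine balls, which is exactly what the raw defensive stats struggle to capture.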