Defense, IceBat

Gold Glove Nonsense

So… I was having a pretty good day: woke up, had some coffee, went to class, relaxed before a midterm, the midterm went well, had an excellent meal with my one and only, watched HIMYM, and then took one of those naps where you don't wake up groggy or sluggish (i.e., the best kind). Then I turned to the daily baseball headlines and found that the AL Gold Glove Awards, given to the best defenders in the game (or so you would like to think), had been announced:

NEW YORK (AP)—Seattle right fielder Ichiro Suzuki has won his 10th straight Gold Glove and New York Yankees shortstop Derek Jeter has won his fifth overall…

I find it a little hard to believe that Derek Jeter won this award for the second year in a row. Every news source points to his efficiency: only 6 errors all season. But what about the balls he couldn't possibly reach, given his limited range (that is, he simply can't get to balls hit far away from him the way other star shortstops can)? Those balls get scored as hits, not as “balls Jeter couldn't get to,” so they never count against him as errors.
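To make the range argument concrete, here is a minimal Python sketch contrasting two classic (not advanced) fielding stats: fielding percentage, (PO + A) / (PO + A + E), which only looks at balls a fielder actually touched, and range factor per nine innings, 9 × (PO + A) / innings, which credits a fielder for turning more balls into outs. The two shortstops and all of their numbers are made up for illustration; they are not Jeter's or anyone else's real totals.

```python
from dataclasses import dataclass


@dataclass
class FieldingLine:
    """Hypothetical season fielding totals for one player."""
    putouts: int
    assists: int
    errors: int
    innings: float


def fielding_pct(line: FieldingLine) -> float:
    """(PO + A) / (PO + A + E): only judges chances the fielder actually reached."""
    chances = line.putouts + line.assists + line.errors
    return (line.putouts + line.assists) / chances


def range_factor_per9(line: FieldingLine) -> float:
    """9 * (PO + A) / innings: rewards converting more balls into outs."""
    return 9 * (line.putouts + line.assists) / line.innings


# Two made-up shortstops: the rangy one gets to more balls but boots a few more.
sure_handed = FieldingLine(putouts=200, assists=350, errors=6, innings=1200)
rangy = FieldingLine(putouts=240, assists=450, errors=12, innings=1200)

for name, line in [("sure_handed", sure_handed), ("rangy", rangy)]:
    print(name, round(fielding_pct(line), 3), round(range_factor_per9(line), 2))
```

With these invented lines, the rangier shortstop makes twice as many errors yet records more outs per nine innings, which is exactly what fielding percentage hides. Range factor has blind spots of its own (it depends on how many balls the pitching staff lets opponents put in play), which is part of why more advanced metrics were developed.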

Also, today's advanced defensive metrics answer many of the questions that let us compare fielders. How much ground can a shortstop cover? Is that due to great timing and instinct, or to great footwork? How accurate is his arm? How many runs can he save over the course of a year? On any of these metrics, you will find Derek Jeter at the bottom of the list. Guaranteed.


Defense, Pitching

Batting Average on Balls Put in Play

First, I'd like to give an obligatory hat tip to the San Francisco Giants for beating the Texas Rangers in the World Series, 4 games to 1. Despite my inner feelings against rooting for you (due to my allegiance to the A's), your staff turned in one of the best pitching performances in postseason history, probably the best since the 2001 Arizona Diamondbacks. Despite losing, Texas has a lot to be proud of: they kept playing their type of baseball day in and day out.

The subject of tonight's post is a metric not many casual baseball fans know about: batting average on balls in play (from here on, BABIP). It answers the question: of all the balls a player hits that the defense can field, what percentage fall for hits? Note that this differs from regular batting average, which counts strikeouts (as outs) and home runs (as hits); BABIP leaves both out.
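Here is a minimal Python sketch of the usual BABIP formula, (H - HR) / (AB - K - HR + SF), next to plain batting average (H / AB). The season line in the usage example is hypothetical, chosen only so the arithmetic is easy to follow.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """BABIP = (H - HR) / (AB - K - HR + SF).

    Home runs leave the numerator and denominator because no defender can field them;
    strikeouts leave the denominator because the ball was never put in play;
    sacrifice flies are added back because they are balls in play that became outs
    (and are not counted as at-bats).
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def batting_average(h: int, ab: int) -> float:
    """Plain batting average: hits per at-bat, strikeouts and homers included."""
    return h / ab


# Hypothetical season line: 150 H, 20 HR, 500 AB, 100 K, 5 SF
print(round(batting_average(150, 500), 3))    # 0.3
print(round(babip(150, 20, 500, 100, 5), 3))  # 0.338 (130 / 385)
```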

Baseball statisticians love this metric because pitchers are not always in control of how many hits they allow in a game. There are just too many factors that can affect whether a ball in play becomes a hit: hard line drives get caught by diving center fielders, bloop singles fall between defenders, ground balls sneak just past an infielder's glove. When these ‘are you serious?’ hits happen, we tend to say the pitcher had tough luck. And when we see excellent defensive plays, we say the pitcher was lucky to have player X in the outfield. How many times have you seen this happen in a game? Too often.
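One way to see how much of this can be luck: if we assume (a big simplification) that every ball in play is an independent coin flip that falls for a hit with the pitcher's "true" BABIP probability p, then observed BABIP over n balls in play follows a binomial model, and its chance-only spread is sqrt(p(1 - p) / n). The sketch below computes that spread and simulates a few seasons; the .300 true BABIP and 500 balls in play are assumed round numbers, not measurements.

```python
import math
import random


def babip_luck_sd(true_babip: float, balls_in_play: int) -> float:
    """Standard deviation of observed BABIP under a binomial model
    (each ball in play is an independent hit with probability true_babip)."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


def simulate_seasons(true_babip: float, balls_in_play: int, seasons: int, seed: int = 0):
    """Observed BABIP for several simulated seasons under the same binomial model."""
    rng = random.Random(seed)
    results = []
    for _ in range(seasons):
        hits = sum(rng.random() < true_babip for _ in range(balls_in_play))
        results.append(hits / balls_in_play)
    return results


# Hypothetical pitcher: "true" BABIP of .300 over 500 balls in play
print(round(babip_luck_sd(0.300, 500), 3))  # 0.02
seasons = simulate_seasons(0.300, 500, seasons=10)
print(min(seasons), max(seasons))
```

Under those assumptions, a season's BABIP can drift about 20 points (one standard deviation) from the pitcher's true rate through chance alone, before defense or anything else enters the picture.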


Defense, Pitching, Risk Aversion

Project/Research Ideas

In my final semester, I'm required to complete two projects, so I thought: why not center one of them on baseball data? So far, I'm inclined to use regression analysis to home in on some baseball questions I've been pondering lately. Here are some of my ideas, along with the problems and conflicts that came up as I thought them through:

Assessing Starting Pitchers' Risk of Injury

This is a clear issue in baseball. With pitchers like Stephen Strasburg undergoing Tommy John surgery (a serious operation that takes 12-18 months to recover from), it's in the best interest of team executives, players, and fans alike to find out why so many young pitchers are blowing out their arms. With this in mind, I see two ways to use regression to learn how to keep these pitchers healthy.

  • We could use past injury data as our explanatory variables. This seems intuitive, since past injuries would seem to be a good indicator of how likely a pitcher is to get hurt again. But that makes the analysis a bit redundant, and the only real conclusion I see it producing is ‘once you've hurt the elbow of your throwing arm, you're screwed.’ There has to be a more in-depth conclusion we can reach.
  • Pitching mechanics. There's a ton of debate over how much pitching mechanics really determine injury risk. Some argue good mechanics will help a pitcher last 20 years (the Greg Maddux or Jamie Moyer fans), while others say that if you change a young pitcher's delivery (like King Felix's), he may lose his effectiveness against hitters. The issue is: where's the data? How can I quantify pitching mechanics? From my research so far, I could probably define a set of parameters for pitching styles, like average speed by pitch type, pitching angle, right-handers vs. left-handers, and ball movement from the pitcher's release to home plate. Variables like these might tell us something useful about what separates Mark Prior from Justin Verlander (two highly regarded young pitchers, with Prior known for his injury history). But in general, this data doesn't exist (to the best of my knowledge).
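As a sketch of the first bullet (past injuries as the explanatory variable): with a single yes/no predictor, logistic regression has a closed-form maximum-likelihood fit. The intercept is the log-odds of injury for pitchers with no prior injury, and the slope is the log odds ratio between the two groups. The `fit_binary_logistic` helper and the eight toy records are hypothetical, just to show the mechanics; a real study would need actual injury histories and more predictors (such as the mechanics variables in the second bullet).

```python
import math
from typing import Iterable, Tuple


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def fit_binary_logistic(records: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Fit P(injured) = 1 / (1 + exp(-(b0 + b1 * prior_injury))).

    Each record is (had_prior_injury, injured_this_season). With one binary
    predictor, the fitted probabilities equal the two groups' observed rates,
    so b0 = logit(rate without prior injury) and b1 = the log odds ratio.
    """
    counts = {True: [0, 0], False: [0, 0]}  # prior_injury -> [injured, total]
    for prior, injured in records:
        counts[prior][0] += int(injured)
        counts[prior][1] += 1
    rates = {}
    for group, (injured, total) in counts.items():
        # The MLE is infinite/undefined if a group is empty or all one outcome.
        if total == 0 or injured in (0, total):
            raise ValueError("each group needs both injured and healthy pitchers")
        rates[group] = injured / total
    b0 = logit(rates[False])
    b1 = logit(rates[True]) - b0
    return b0, b1


# Toy records, invented for illustration: (had_prior_injury, injured_this_season)
records = ([(True, True)] * 2 + [(True, False)] * 2
           + [(False, True)] * 1 + [(False, False)] * 3)
b0, b1 = fit_binary_logistic(records)
print(round(b0, 3), round(b1, 3), round(math.exp(b1), 2))  # -1.099 1.099 3.0
```

An odds ratio of 3 would mean the odds of injury are three times higher after a prior injury, which is the ‘you're screwed’ conclusion in numbers. Adding more predictors removes the closed form, so a real analysis would switch to an iterative fit from a statistics package.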

Game Theory: Batter vs. Pitcher over the course of one game, or a career

A starting pitcher will often face the same batter three times before being pulled from a game. So I would argue that at-bats are correlated with one another, since past at-bats give both pitcher and batter a better understanding of what their opponent will do next time. If a batter strikes out his first time up, will he adjust his approach the next time? If a pitcher gives up a home run to a batter, will he avoid making the same mistake again? These questions scream game theory, and it'd be interesting to look at data for several pitcher-batter matchups to see which players learn more from their mistakes or failures.
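The simplest game-theory building block here is a single at-bat as a 2×2 zero-sum game: the batter sits on a fastball or an offspeed pitch, the pitcher throws one or the other, and each cell holds the batter's expected payoff. The sketch below solves for the mixed-strategy equilibrium with the standard 2×2 formulas. The payoff numbers are invented placeholders (think of them as some run-value score for the batter), and `solve_2x2_zero_sum` is a hypothetical helper, not an existing library call.

```python
from typing import Tuple

# Rows = batter sits on (fastball, offspeed); columns = pitcher throws (fastball, offspeed).
# Entries = expected payoff to the batter (hypothetical numbers).
Payoff = Tuple[Tuple[float, float], Tuple[float, float]]


def solve_2x2_zero_sum(m: Payoff) -> Tuple[float, float, float]:
    """Mixed-strategy equilibrium of a 2x2 zero-sum game.

    Returns (p, q, value): p = probability the batter sits fastball,
    q = probability the pitcher throws a fastball, value = batter's expected payoff.
    Raises ValueError if a pure strategy is optimal (a saddle point exists).
    """
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best payoff the batter can guarantee with a pure guess
    minimax = min(max(a, c), max(b, d))  # lowest cap the pitcher can enforce with a pure pitch
    if maximin == minimax:
        raise ValueError("saddle point: a pure strategy is optimal")
    denom = a - b - c + d
    p = (d - c) / denom  # makes the pitcher indifferent between his two pitches
    q = (d - b) / denom  # makes the batter indifferent between his two guesses
    value = (a * d - b * c) / denom
    return p, q, value


matrix = ((0.50, 0.20),   # batter sits fastball: gets fastball / gets offspeed
          (0.25, 0.40))   # batter sits offspeed: gets fastball / gets offspeed
print([round(x, 3) for x in solve_2x2_zero_sum(matrix)])  # [0.333, 0.444, 0.333]
```

The learning question in this section would then be about repeated play: whether a batter's and a pitcher's choices drift toward (or away from) mixes like these over their second and third meetings in a game.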

Probability of a successful defensive play

It'd be cool to simulate defense and be able to compare players' defensive skills. Assessing defense is a huge problem in baseball, because so many variables and circumstances make one ball in play different from another. Defensive statistics do exist, but they are very limited and still don't tell us much when comparing players. Variables to regress on the outcome of a play would include the type of ball hit (ground ball, line drive, fly ball), the angle of the ball off the bat, ball speed/drop rate, the fielder's defensive ability, the type of pitcher on the mound, etc.
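Since a play's outcome is binary (out or hit), logistic regression is one standard way to regress it on variables like those listed above. Here's a dependency-free sketch fit by plain batch gradient descent. The two features (a line-drive flag and a distance scaled to 0-1) and the eight plays are made up for illustration; a real model would use many more plays and the other variables in the list.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """P(out) for one play; w[0] is the intercept, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def log_loss(w: Sequence[float], xs: List[Sequence[float]], ys: List[int]) -> float:
    """Average negative log-likelihood of the observed outs (1) and hits (0)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(w, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit_logistic(xs: List[Sequence[float]], ys: List[int],
                 lr: float = 0.5, steps: int = 2000) -> List[float]:
    """Batch gradient descent on the average log-loss, starting from all-zero weights."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y  # derivative of this play's loss w.r.t. its log-odds
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


# Hypothetical plays: [is_line_drive, distance_from_fielder / 100 ft]; 1 = out, 0 = hit
xs = [[0, 0.05], [0, 0.10], [1, 0.05], [0, 0.20],
      [0, 0.60], [1, 0.40], [1, 0.70], [1, 0.15]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
w = fit_logistic(xs, ys)
print(round(log_loss([0.0, 0.0, 0.0], xs, ys), 3), round(log_loss(w, xs, ys), 3))
print(round(predict(w, [0, 0.05]), 2), round(predict(w, [1, 0.60]), 2))
```

Gradient descent is used here only to keep the sketch self-contained; with real data, a statistics package's logistic regression would be the more practical choice.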


Defense as a Poisson Distribution

This semester I'm being re-acquainted with several statistical distributions, and I can't help translating them into baseball terms. Many statistical models that go beyond raw stats (the likes of ERA or OBP) can describe almost every situation or subgame in baseball. Here's an example.

For starters, the Poisson distribution models the number of events (arrivals, or "hits" in the generic probability sense) that occur during some interval of time. A simple model: on a line from 0 to t, how many events fall in that interval? The 'interval' generalizes to two-dimensional regions, like a circle or any unit of area, including a baseball field, with the expected count proportional to the size of the region.
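For reference, the probability of seeing exactly k events when they occur at rate λ per unit of time (or length, or area) in a region of size A is P(N = k) = (λA)^k e^(-λA) / k!. A minimal Python version, with an arbitrary rate and interval chosen for the example:

```python
import math


def poisson_pmf(k: int, rate: float, size: float) -> float:
    """P(N = k) when events occur at `rate` per unit (of time, length, or area)
    and we observe a region of the given `size`, so N ~ Poisson(rate * size)."""
    mu = rate * size
    return math.exp(-mu) * mu ** k / math.factorial(k)


# 2 events per unit length, observed on the interval [0, t] with t = 1.5
print(round(sum(poisson_pmf(k, 2.0, 1.5) for k in range(50)), 6))  # 1.0
print(round(poisson_pmf(0, 2.0, 1.5), 4))  # 0.0498, i.e. exp(-3): no events at all
```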

Where am I going with this? What if we could use this distribution to model where a baseball will land on the field? [lightbulb!] It's still a premature idea, but if we condition the distribution on player/pitcher history, an assessment of pitch types, and a weighting for the ballpark environment, we could simulate where balls are hit.
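One way to simulate the "where does the ball land" idea is a spatial Poisson process whose intensity varies across the field. The sketch below uses thinning: generate uniform candidate points at a constant maximum rate, then keep each one with probability intensity(x, y) / max_intensity, which yields a Poisson process with the varying intensity. The rectangular field, the units (hundreds of feet), and the pull-side intensity function are all assumptions for illustration; conditioning on hitter, pitch type, and ballpark would mean building a better intensity function.

```python
import math
import random
from typing import Callable, List, Tuple


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small means used here."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_landing_spots(
    intensity: Callable[[float, float], float],
    max_intensity: float,
    width: float,
    depth: float,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Inhomogeneous spatial Poisson process on a width x depth rectangle, via thinning.

    intensity(x, y) is the expected number of balls landing per unit area near (x, y)
    and should never exceed max_intensity.
    """
    rng = random.Random(seed)
    n = sample_poisson(max_intensity * width * depth, rng)  # candidate points
    spots = []
    for _ in range(n):
        x, y = rng.uniform(0, width), rng.uniform(0, depth)
        # Keep each candidate with probability intensity / max_intensity.
        if rng.random() < intensity(x, y) / max_intensity:
            spots.append((x, y))
    return spots


# Hypothetical pull hitter: balls land more often toward the left side (small x).
def pull_side(x: float, y: float) -> float:
    return 5.0 * (1 - x / 3.0)


spots = simulate_landing_spots(pull_side, max_intensity=5.0, width=3.0, depth=3.0, seed=1)
left = sum(1 for x, _ in spots if x < 1.5)
print(len(spots), left)
```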

In the same way, we can treat a defender's area or zone of defense as a Poisson area. For every player, we could define how much ground he is able to defend, with a different rate for each additional unit of area. I think this would be a more helpful distribution for evaluating defense, and it could help GMs understand just how good a defender is. For example, shouldn't a ball hit 3 feet to the left of an outfielder be turned into an out at a different rate than a ball hit 15 feet away? If so, these rates could tell us about the zone, or range, a fielder can cover, and the accuracy with which he makes successful plays.
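A simple empirical version of the 3-feet-versus-15-feet question is to bucket balls in play by how far they landed from the fielder and compute the share turned into outs in each band (each band is a ring of area around the fielder). The `out_rate_by_distance` helper, the 5-foot band width, and the sample plays are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def out_rate_by_distance(
    plays: Iterable[Tuple[float, bool]], band_ft: float = 5.0
) -> Dict[float, float]:
    """Fraction of balls in play converted into outs, by distance band.

    Each play is (distance_ft from the fielder's starting spot, made_out).
    Keys are the lower edge of each band, in increasing order.
    """
    made = defaultdict(int)
    total = defaultdict(int)
    for distance, out in plays:
        band = (distance // band_ft) * band_ft  # e.g. 13 ft -> the 10-15 ft band
        total[band] += 1
        made[band] += int(out)
    return {band: made[band] / total[band] for band in sorted(total)}


# Invented plays: (distance in feet, was it an out?)
plays = [(3, True), (2, True), (4, False),
         (15, False), (16, True), (14, False), (13, False)]
print(out_rate_by_distance(plays))
```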

I haven't written out any code or graphed any of these ideas, but hopefully I'll have time soon to show some data on what I have in my head.