Crazy schedule for me these next few weeks. I’ll try to stay active, I promise. If not, IceBat will take over, but I’m guessing he won’t say much (he’s pretty shy and likes to chill in the corner of my room). Anyways, I thought I’d share a recent report I did for my time series class. It’s about the general shift in runs scored per game (by one team) over the years of MLB’s existence. If you have some time (and enjoy a few technical terms), I’ve uploaded a link below. Happy December holidays!
About two weeks ago, the Oakland Athletics won negotiating rights (through a $19 million bid) with Hisashi Iwakuma, who has played in Japan’s Pacific League his entire career. Afterwards, GM Billy Beane made a couple of moves to suggest the A’s were at least 75% sure they would sign Iwakuma. Unfortunately, talks have stalled between the two sides. There are numerous reports suggesting either that Iwakuma wants Barry Zito money (and we all know how well that went for the Giants) or that the A’s are unwilling to negotiate beyond a $3-4 million average salary base. Either way, one of the sides has been castigated by the media as the demon.
But who’s right here? Is there enough of a track record of Japanese pitchers coming to the American market to justify a $15+ million average salary? Or can the A’s justify giving Iwakuma the same salary he received in Japan because of the high cost of the negotiating bid? I’ve listed some recent Japanese pitchers who made the move to the big leagues, and some meaningful figures.
Ever wonder about the exact location, movement, speed, rotation, and spin angle of a pitch? With PITCHf/x, every ball thrown in the majors is measured down to a science. It’s pretty awesome, but even after spending hours looking at the data, it can be a bit confusing as to what the variables mean and why they are meaningful. I’ll try to explain most of the variables to the best of my abilities. I’ll be using PITCHf/x data from Dallas Braden’s perfect game on May 9, 2010 against the Tampa Bay Rays.
Batting averages, on-base percentages, and ERA are all standard metrics used to compare baseball players. But how do they help us determine a player’s day-to-day production? I wanted to look at two players’ overall game performance over the course of 100+ starts. Let’s begin with Albert Pujols.
I think it would be an understatement to say Pujols has been the most consistent hitter in baseball over the past 5+ years. He’s a sure bet to be in the MVP talks from the beginning of each season. I decided to look at his Win Probability Added per game over the 2010 season (a positive WPA means he added to his team’s chances of winning the game, while a negative WPA suggests he was detrimental to the winning cause).
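For anyone curious how a per-game WPA number comes together, here’s a minimal Python sketch. It assumes each of a player’s plate appearances comes with his team’s win expectancy before and after the play; the function name and the numbers in `game` are made up for illustration and are not Pujols’ actual data.

```python
def wpa_for_game(plays):
    """Sum the change in win expectancy over a player's plays in one game.

    `plays` is a list of (win_exp_before, win_exp_after) pairs, both from
    the point of view of the player's own team (hypothetical input format).
    """
    return sum(after - before for before, after in plays)


# Hypothetical game with four plate appearances (illustrative numbers only).
game = [
    (0.50, 0.58),  # single: team's chances go up
    (0.61, 0.57),  # strikeout: chances go down
    (0.45, 0.70),  # go-ahead home run
    (0.82, 0.80),  # groundout late in the game
]

total = wpa_for_game(game)
print(f"WPA for the game: {total:+.2f}")
```

A positive total means the player pushed his team toward a win on the day, and a negative total means the opposite, which is the reading used above.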
First, I’d like to give an obligatory hat tip to the San Francisco Giants for winning the World Series against the Texas Rangers, 4-1. Despite my inner feelings against rooting for you (due to my allegiance to the A’s), that was one of the best pitching performances in postseason history, probably since the 2001 Arizona Diamondbacks. Despite losing, Texas has a lot to be proud of. They continued to play their type of baseball day in and day out.
The subject for tonight’s post is a metric not many casual baseball fans know of: batting average on balls put in play (or from here on, BABIP). It essentially answers the question: out of all the balls a player hits that are field-able by the defense, what percentage will fall for a hit? Note that this is different from a regular batting average, which also counts strikeouts and home runs.
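To make the difference concrete, here’s a small Python sketch using the commonly cited BABIP formula, (H - HR) / (AB - K - HR + SF). The formula isn’t spelled out in this post, and the season line below is hypothetical, so treat this as an illustration rather than anyone’s real stats.

```python
def batting_average(hits, at_bats):
    """Regular batting average: every at-bat counts, including K's and HR's."""
    return hits / at_bats


def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play.

    Strikeouts and home runs are removed because the defense never gets a
    chance to field them; sacrifice flies are added back because they are
    balls in play that do not count as at-bats.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line (illustrative numbers only).
avg = batting_average(hits=180, at_bats=600)
bip = babip(hits=180, home_runs=30, at_bats=600, strikeouts=100, sac_flies=5)
print(f"AVG {avg:.3f}, BABIP {bip:.3f}")
```

The two numbers differ only because of which plate appearances are counted in the numerator and denominator.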
Baseball statisticians love this metric because pitchers are not always in control of the number of hits they allow in a game. There are just too many factors that can affect the outcome of a ball in play: hard line drives are caught by diving center fielders, bloop singles fall between defenders, ground balls barely get past the glove of an infielder. When these ‘are you serious?’ hits are allowed, we kind of assume tough luck has befallen the pitcher. And when we see excellent defensive plays, we think the pitcher is lucky to have player X in the outfield. How many times have you seen this happen in baseball games? Too often.
In my final semester, I’m obligated to complete two projects, so I thought, why not center one of them around baseball data? So far, my inclination is to use regression analysis to home in on some baseball questions that I’ve been pondering lately. Here are some of my ideas, along with the problems/conflicts that came up on further thought:
Assessing Starting Pitchers’ Risk of Injury
This is a clear issue in baseball. With guys like Stephen Strasburg undergoing Tommy John surgery (a serious surgery that takes 12-18 months to recover from), it’s in the best interest of team executives, players and fans alike to find the reasons why so many young pitchers are blowing out their arms. With this in mind, I believe there are two ways to use regression to learn how to keep these players healthy.
- We could use past injury data as our explanatory variables. This seems intuitive, as past injuries would seem to be a good indicator of how likely you are to be injured in the future. But for that same reason the analysis seems a bit redundant, and the only real conclusion I see coming out of it is ‘once you injure your throwing arm’s elbow, you’re screwed.’ There’s gotta be some more in-depth conclusion we can come up with.
- Pitching mechanics. There’s a ton of debate as to how much pitching mechanics really determine injury risk. Some argue good mechanics will help a pitcher last 20 years (the Greg Maddux or Jamie Moyer fans), but some also say that if you change a young pitcher’s delivery (like King Felix’s), he may not have the same success against batters. The issue with this is, where’s the data? How can I quantify pitching mechanics? From my research so far, I could probably build a bunch of parameters for pitching styles, like average speed by pitch type, pitching angle, right-handers vs left-handers, and ball movement from the pitcher’s release to home plate. Things like that may give us useful knowledge about what differentiates Mark Prior from Justin Verlander (two highly regarded young pitchers, but with Prior known for his past injuries). But in general, this data doesn’t exist (to the best of my knowledge).
Game Theory: Batter vs. Pitcher over the course of one game, or a career
A starting pitcher will generally face a batter at least 3 times before being pulled from the game. Thus I would argue at-bats are correlated with one another, since past at-bats give a pitcher or batter a better understanding of what their foe will do in the next at-bat. Say a batter strikes out his first time up: will he know how to adjust his approach next time? Or if a pitcher gives up a home run to a batter, will he understand not to make the same mistake again? These questions scream game theory, and it’d be interesting to look at data for several pitchers vs batters to see which players learn more from their mistakes or failures.
Probability of a successful defensive play
It’d be cool to simulate defense, and be able to compare players’ defensive skills. Assessing defensive skills is a huge problem in baseball, as there are a lot of variables and circumstances that make one ball put in play different from another. Currently, defensive statistics are out there, but they are very limited and still don’t say a lot when comparing players. Variables to use in a regression on the outcome of a play would include the type of ball hit (ground balls, line drives, fly balls), the angle of the ball hit, ball speed/drop rate, the defensive ability of the player, the type of pitcher on the mound, etc.
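Since the outcome of a play is a yes/no event (out made or not), one natural way to set this up would be a logistic regression. Here’s a rough Python sketch of what the fitted model would look like; the feature names, coefficients and intercept are all hypothetical placeholders I picked for illustration, not estimates from any real data.

```python
import math


def play_success_probability(features, coefficients, intercept):
    """Logistic model: P(out) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(
        coef * features.get(name, 0.0) for name, coef in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (NOT fitted to real data).
coefficients = {
    "is_line_drive": -1.2,    # line drives are harder to convert into outs
    "is_ground_ball": 0.3,
    "ball_speed_mph": -0.03,  # harder-hit balls are tougher plays
    "fielder_rating": 0.5,    # some made-up defensive ability score
}
intercept = 3.0

routine_grounder = {"is_ground_ball": 1, "ball_speed_mph": 80, "fielder_rating": 0.5}
hard_liner = {"is_line_drive": 1, "ball_speed_mph": 100, "fielder_rating": 0.5}

for label, play in [("grounder", routine_grounder), ("liner", hard_liner)]:
    p = play_success_probability(play, coefficients, intercept)
    print(f"{label}: P(out) = {p:.2f}")
```

In a real project the coefficients would be estimated from play-by-play data, and comparing two fielders would amount to comparing their predicted probabilities on the same kinds of plays.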
Tim Lincecum is currently in his first year of arbitration with the Giants (if you don’t know what arbitration is, look at this post). In short, it is speculated Lincecum will file for anywhere between $8 and $24 million for his one-year contract. How it works is that Lincecum has to file what he believes he should get as salary, and the Giants can either accept the offer or offer him something else (which is usually a lower salary). What’s important to note is that both numbers will reflect what each side believes Timmy is going to be worth for the 2010 season. If both sides can’t agree on a salary, they both go before an arbitration panel and argue their case for the correct amount.
What’s cool is that Lincecum has a lot of space to work with, as to what will actually be granted to him through this process. If he asks for way too much (like upwards of $24 million), then the Giants could get a huge discount (say $12 million) if the panel sides with the Giants. However, if he asks for what is probably his worth (say $18 million), he’s probably leaving a couple million dollars on the table. So it’s an interesting game theory model in which both sides must also think about what the other will do in certain situations: the interdependency of choice. What’s best for both sides is coordination, yet the uncertainty of what the other will do causes conflict and may result in giving Lincecum either more or less than his true worth.
Here’s a good summary of the figures, taken from an article in Baseball Prospectus yesterday:
I think Lincecum will win $18 million, hands down, but that’s a different question as to what he should offer. The arbitration panel picks the offer (the team’s or the player’s) that they believe to be most correct. If the Giants propose $12 million and Lincecum goes $18, Lincecum wins. But, he also could win if he goes as high as $24! To suggest $18 million would be to leave possibly $6 million on the table. Now, I don’t think the Giants will go that low—they will go under $18 though—so, I wouldn’t recommend $24 million, but I think Lincecum will ask for more than $18. How much more has to do with factors that I know nothing about. Have the parties discussed figures for a long-run contract? I suspect they both know something about what the other party might offer.
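The logic in that quote can be sketched in a few lines of Python. This assumes the panel has one figure in mind (the $18 million from the example) and simply picks whichever offer is closer to it; the function names and the rule that ties go to the team are my own assumptions for the sketch, not official arbitration rules.

```python
def arbitration_award(team_offer, player_offer, panel_value):
    """Final-offer arbitration: the panel picks the offer nearer its own figure.

    All amounts are in millions of dollars. Ties go to the team here, which
    is an arbitrary assumption for this sketch.
    """
    if abs(player_offer - panel_value) < abs(team_offer - panel_value):
        return player_offer
    return team_offer


def break_even_ask(team_offer, panel_value):
    """Ask at which the player is exactly as far from the panel as the team."""
    return 2 * panel_value - team_offer


# Figures from the quoted example: Giants at $12M, panel's figure at $18M.
print(arbitration_award(12, 18, 18))    # player asks for $18M
print(arbitration_award(12, 23.9, 18))  # player asks for just under $24M
print(arbitration_award(12, 24.5, 18))  # player overshoots
print(break_even_ask(12, 18))           # where the player stops winning
print(arbitration_award(15, 24, 18))    # Giants come in higher than $12M
```

The last line shows why asking for $24 million is risky: once the Giants file above $12 million, the break-even ask drops below $24 million and the panel takes the team’s number.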
It’s the exact same negotiation any business goes through in evaluating promotions and raises. What’s more important: the number of years of service to the firm, or how much the employee has increased revenue in the past, even if it has only been one or two years?