Why the Yankees dominate Minnesota

As a fan and a statistician, I find it upsetting to think about the past four times Minnesota has played the Yankees in the playoffs. They have not won a single game against the Bombers, leaving them nothing to show fans for their successful seasons. A big fat zero in the win column. As Fangraphs points out, it’s almost illogical that one team can dominate like this over a span of five-plus years. Roster turnover essentially makes each playoff matchup totally different from those of years past; the only things tying each year’s teams together are the city and the mascot.

So what’s the deal, Twin Cities? There’s one thing I noticed while watching the recent ALDS between the two teams.

It was obvious this team was scared from the seventh inning on. You could see it in how they played, and in how the commentators kept associating the ninth inning with Mariano Rivera. Honestly, in the postseason I can see why that kind of reputation is intimidating, especially since his manager has little tolerance for trouble and will turn to Mariano even in the eighth. So even if you muster a hit in the eighth, Joe Girardi will no doubt bring out his big gun. I don’t know how Minnesota could muster up confidence in those situations. We saw it in the first two games: Minnesota gave up the lead in the sixth and seventh innings and couldn’t find its way out of the deficit. The Yankees are in the Twinkies’ heads. Better yet, Mariano Rivera is in the Twins’ heads.

And maybe this team is just too young. At the beginning of the year, I thought they were going to be great, anchored by a pitching staff that included coming-of-age aces Scott Baker and Kevin Slowey. I still think fondly of these pitchers, but apparently they weren’t good enough to be slotted into the playoff rotation. Maybe we just haven’t seen the best of this Twins team yet. Better yet, we know each year’s team is drastically different, so why not hope that the 2011 Twins > 2011 Yankees? It can happen.

Defense, Pitching, Risk Aversion

Project/Research Ideas

In my final semester, I’m required to complete two projects, so I thought, why not center one of them on baseball data? So far, my inclination is to use regression analysis to home in on some baseball questions I’ve been pondering lately. Here are some of my ideas, along with the problems and conflicts that came up on further thought:

Assessing Starting Pitchers’ Injury Risk

This is a clear issue in baseball. With guys like Stephen Strasburg undergoing Tommy John surgery (a serious procedure that takes 12 to 18 months to recover from), it’s in the best interest of team executives, players and fans alike to find out why so many young pitchers are blowing out their arms. With this in mind, I believe there are two ways to use regression to learn how to keep these players healthy.

  • Past injuries. We could use past injury data as our explanatory variables. This seems intuitive, since past injuries would seem a good indicator of how likely a pitcher is to be injured in the future. But that makes the analysis a bit redundant, and the only real conclusion I see coming out of it is ‘once you injure the elbow of your throwing arm, you’re screwed.’ There’s gotta be a more in-depth conclusion we can come up with.
  • Pitching mechanics. There’s a ton of debate over how much pitching mechanics really determine injury risk. Some argue good mechanics will help a pitcher last 20 years (the Greg Maddux or Jamie Moyer fans), while others say that if you change a young pitcher’s delivery (like King Felix’s), he may lose his effectiveness against batters. The issue with this is: where’s the data? How can I quantify pitching mechanics? From my research so far, I could probably build a set of parameters describing pitching style, like average speed by pitch type, pitching angle, right-handers vs. left-handers, and ball movement from the pitcher’s release to home plate. Things like that may tell us something useful about what differentiates Mark Prior from Justin Verlander (two highly regarded young pitchers, with Prior known for his injury history). But in general, this data doesn’t exist, to the best of my knowledge.
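Since getting hurt is a yes/no outcome, a natural form for the regression in either bullet is logistic regression. Below is a minimal pure-Python sketch, fit by batch gradient descent, assuming each pitcher is described by two standardized mechanics features (a hypothetical velocity score and a hypothetical arm-angle score). The toy numbers are invented purely to exercise the code; they are not real pitcher data.

```python
import math

# Toy, made-up data (NOT real pitchers): each row is
# [velocity_z, arm_angle_z], two hypothetical standardized mechanics features.
TOY_X = [
    [-1.0, 0.2], [-0.5, -0.4], [0.0, 0.1], [1.5, 0.3], [-1.2, 1.1],  # stayed healthy
    [0.3, 1.0], [0.8, 0.9], [1.2, 1.5], [-0.8, -0.2],                # got hurt
]
TOY_Y = [0, 0, 0, 0, 0, 1, 1, 1, 1]  # 1 = injured within the study window


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(X, y):
            # Residual between predicted injury probability and the 0/1 label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def injury_probability(w, b, row):
    """Predicted probability of injury for one pitcher's feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


if __name__ == "__main__":
    w, b = fit_logistic(TOY_X, TOY_Y)
    print("weights:", w, "intercept:", b)
    print("risk for [1.0, 1.2]:", injury_probability(w, b, [1.0, 1.2]))
```

With real data, the fitted weights would show which mechanics features go along with higher risk. Keep in mind that with observational data that is an association, not proof that changing a delivery prevents injuries.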

Game Theory: Batter vs. Pitcher over the course of one game, or a career

A starting pitcher will generally face a batter at least three times before being pulled from the game. Thus I would argue at-bats are correlated with one another, since past at-bats give a pitcher or batter a better understanding of what their foe will do in the next at-bat. Say a batter strikes out his first time up: will he know how to adjust his approach the next time? Or if a pitcher gives up a home run to a batter, will he know not to make the same mistake again? These questions scream game theory, and it’d be interesting to look at data for several pitcher vs. batter matchups to see which players learn more from their mistakes or failures.
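One way to start on the game-theory angle is to treat a single pitch as a two-by-two zero-sum game: the batter sits on a fastball or an off-speed pitch, and the pitcher throws one or the other. The sketch below solves for the mixed-strategy equilibrium with the standard 2x2 formulas. The payoff numbers (the batter's chance of reaching base in each case) are hypothetical, and modeling how players learn from one at-bat to the next would need a repeated-game extension on top of this.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game.

    Payoffs go to the batter (row player); the pitcher wants them low:
                         pitcher: fastball   pitcher: off-speed
        sit fastball            a                    b
        sit off-speed           c                    d

    Returns (p, q, v): p = P(batter sits fastball),
    q = P(pitcher throws fastball), v = batter's expected payoff.
    """
    maximin = max(min(a, b), min(c, d))  # best the batter can guarantee with a pure choice
    minimax = min(max(a, c), max(b, d))  # best the pitcher can hold him to with a pure choice
    if maximin == minimax:
        raise ValueError("saddle point: a pure strategy is optimal, no mixing needed")
    denom = a - b - c + d  # nonzero whenever there is no saddle point
    p = (d - c) / denom    # makes the pitcher indifferent between his two pitches
    q = (d - b) / denom    # makes the batter indifferent between his two guesses
    v = (a * d - b * c) / denom
    return p, q, v


if __name__ == "__main__":
    # Hypothetical on-base probabilities, not real data
    p, q, v = solve_2x2_zero_sum(0.40, 0.20, 0.25, 0.35)
    print(f"batter sits fastball {p:.2f}, pitcher throws fastball {q:.2f}, value {v:.3f}")
```

With those made-up numbers, the batter should sit fastball a third of the time, the pitcher should throw fastballs half the time, and the batter reaches base 30% of the time either way.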

Probability of a successful defensive play

It’d be cool to simulate defense and be able to compare players’ defensive skills. Assessing defense is a huge problem in baseball, as there are a lot of variables and circumstances that make one ball in play different from another. Defensive statistics are out there, but they are very limited and still don’t say much when comparing players. Variables on which to regress the outcome of a play would include the type of ball hit (ground ball, line drive, fly ball), the angle of the ball off the bat, ball speed/drop rate, the fielder’s defensive ability, the type of pitcher on the mound, etc.
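Once some model gives a probability that each ball in play is converted, one way to compare fielders is to add up actual outcomes minus expected ones, so a fielder earns more credit for making hard plays and loses more for missing easy ones. The sketch below assumes a logistic model with just two inputs, hang time and the distance the fielder has to cover; those inputs and the coefficient values are hypothetical placeholders for whatever the regression on the variables above would actually produce.

```python
import math
from dataclasses import dataclass


@dataclass
class BallInPlay:
    fielder: str
    hang_time: float   # seconds the ball is in the air (hypothetical input)
    distance: float    # feet the fielder must cover (hypothetical input)
    converted: bool    # True if the fielder made the play


# Hypothetical coefficients standing in for a fitted logistic regression
INTERCEPT = 1.0
COEF_HANG_TIME = 1.5
COEF_DISTANCE = -0.08


def play_probability(hang_time: float, distance: float) -> float:
    """Modeled probability that an average fielder makes this play."""
    z = INTERCEPT + COEF_HANG_TIME * hang_time + COEF_DISTANCE * distance
    return 1.0 / (1.0 + math.exp(-z))


def plays_above_expected(plays):
    """Sum of (actual - expected) per fielder; positive means better than the model's average."""
    totals = {}
    for play in plays:
        expected = play_probability(play.hang_time, play.distance)
        totals[play.fielder] = totals.get(play.fielder, 0.0) + (float(play.converted) - expected)
    return totals


if __name__ == "__main__":
    sample = [  # made-up plays for illustration
        BallInPlay("Fielder A", 3.0, 90.0, True),   # tough chance, converted
        BallInPlay("Fielder B", 5.0, 20.0, False),  # routine chance, missed
    ]
    print(plays_above_expected(sample))
```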


Tulo’s Crazy September

Update: The graphics below have been updated with numbers from all of September.

(I’m just going to ignore the fact that you, the reader, have just realized I’ve removed six months of dust from this blog.)

Full version: Tulo Heatmap

Troy Tulowitzki is crazy! I was recently reading an article about his supposed September surge in numbers, so I decided to take a look and created a heatmap in R. The visual is pretty simple to read: light blue = not so good, dark blue = on fire. Remember that so far, 15 games have been played this month. In that span, Tulo has hit 14 home runs. September could end right now and he’d have matched or set new highs for a month’s worth of baseball. Although there isn’t much of a trend in his past numbers, Tulo did post better numbers in the second half of the season in his ’07 and ’09 campaigns. If you’re Colorado, you’re lovin’ Tulo’s contribution to the Rockies’ playoff push. Let’s hope the next two weeks are interesting.

Note: You can find a tutorial on heatmaps on the excellent data visualization blog FlowingData.
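The heatmap in this post was built in R. For anyone who wants to reproduce the idea, here is a Python sketch of the data-prep step, assuming a layout of seasons by months with home runs as the cell value: total each (year, month) cell, then bucket the totals into shades from lightest to darkest. It renders a crude text version instead of a real plot, and the sample games are made up, not Tulo’s actual game log.

```python
from collections import defaultdict


def monthly_totals(games):
    """games: iterable of (year, month, home_runs) tuples, one per game."""
    table = defaultdict(int)
    for year, month, hr in games:
        table[(year, month)] += hr
    return dict(table)


def shade_levels(table, levels=5):
    """Min-max scale each cell into an integer shade 0..levels-1 (0 = lightest)."""
    lo, hi = min(table.values()), max(table.values())
    span = hi - lo
    shades = {}
    for key, value in table.items():
        bucket = 0 if span == 0 else int((value - lo) * levels / span)
        shades[key] = min(bucket, levels - 1)  # the max value lands in the top bucket
    return shades


def render(shades, years, months, palette=" .:*#"):
    """One text row per season; palette needs one character per shade level.
    Darker characters mean hotter months; '?' marks a month with no data."""
    rows = []
    for year in years:
        cells = "".join(palette[shades[(year, m)]] if (year, m) in shades else "?" for m in months)
        rows.append(f"{year} {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    sample_games = [(2009, 8, 1), (2009, 8, 2), (2009, 9, 0), (2010, 8, 3), (2010, 9, 5)]
    print(render(shade_levels(monthly_totals(sample_games)), [2009, 2010], [8, 9]))
```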


Coolest Website

I discovered the most awesome baseball data website ever (besides Fangraphs), and I wanted to share it with you.

Hit Tracker is a website that tracks every single home run hit in the majors. For every home run, it gives an analysis of the ball’s path, from hitter to fan. The main stats to look for on a player’s home run profile are True Distance, the speed of the ball off the bat, the angle of elevation, and the wind/temperature conditions in the ballpark. Together, these stats give you a rough estimate of whether the player was lucky to have it land out of the park. Another bonus is that the site tags home runs with labels like “Lucky”, “Just Enough” and “No Doubts”. This lets you gauge a hitter’s power beyond his home run total or slugging percentage.
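Those labels boil down to simple threshold rules. The sketch below encodes them as I understand the site’s definitions (Just Enough: cleared the fence by under 10 vertical feet, or landed less than one fence height past it; No Doubt: cleared by at least 20 vertical feet and landed at least 50 feet past the fence; Lucky: would not have gone out on a calm, 70-degree day). Treat those thresholds as assumptions to check against Hit Tracker’s own glossary. The “Lucky” call also needs the site’s trajectory model, so here it is just an input flag.

```python
from dataclasses import dataclass


@dataclass
class HomeRun:
    clearance_ft: float          # vertical feet by which the ball cleared the fence
    landed_past_fence_ft: float  # horizontal feet past the fence where it landed
    fence_height_ft: float
    out_in_calm_70f: bool        # would it clear on a calm, 70-degree day? (from a trajectory model)


def labels(hr: HomeRun) -> list:
    """Return the labels that apply; a home run may get no label or several."""
    out = []
    # Assumed thresholds; check against Hit Tracker's glossary
    if hr.clearance_ft < 10 or hr.landed_past_fence_ft < hr.fence_height_ft:
        out.append("Just Enough")
    if hr.clearance_ft >= 20 and hr.landed_past_fence_ft >= 50:
        out.append("No Doubt")
    if not hr.out_in_calm_70f:
        out.append("Lucky")
    return out


if __name__ == "__main__":
    # Hypothetical home runs, not real Hit Tracker entries
    print(labels(HomeRun(5, 30, 8, False)))
    print(labels(HomeRun(40, 120, 8, True)))
```

Counting how many of a hitter’s homers fall into each bucket is one way to get at the “power beyond home run totals” question.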

The variation in true distance is drastic, yet the outcome is the same. Jimmy Rollins can get lucky with a 380-foot drive that just clears the fence in Yankee Stadium, while Mark Reynolds can hit one 480 feet on a ‘no doubt’ trajectory, yet each counts as one home run. Seems a little unfair, no? Wouldn’t it be cool for a player to scout wind/temperature stats and know which field to aim for, and at what vertical angle, to get the ball past the fence? In football, kickers scout the wind and ball trajectories all the time; wouldn’t it be cool for hitters to do the same? I guess it’s different, though, since the hitter doesn’t have full control over what the pitcher serves up.

Check it out sometime. They also have links to watch the home run clips.