Fancy Stats and the Problems With Misleading Statistics

Neil Greenberg had an interesting, for lack of a better word, piece in his Fancy Stats blog on Wednesday about Stephen Strasburg’s issues with line drives, and in particular with his change up. There were numerous problems with this article, so in homage to the great Fire Joe Morgan, we’ll break it down piece by piece.

We start off with this graph of various statistical results on line drives against Strasburg by year.

linedrives

To start with, there’s the hilariously useless y-axis, which provides absolutely no context for what the bars of this graph actually mean. Furthermore, even judging the bars only relative to each other, there’s already a small problem: the BABIP, batting average and on-base percentage are all lower this season than in 2010-2012, when Strasburg was at his best as a pitcher. Greenberg decides to focus on the power results, which isn’t the worst idea in the world, since the slugging and ISO numbers are higher than normal, and more extra base hits on fewer hits can still be a problem.

So now that we have part one of his thesis: Strasburg is giving up more extra base hits on line drives this season and it’s hurting him, let’s move on to part two: that his change up is the root cause of this issue. Greenberg starts this off with another graph.

changeups

Well, that trend certainly looks worrying: it started so low and is now so high! Ignoring the fact that his peak change up line drive number is pretty much in line with his peaks in 2013 and 2010, there’s a much bigger problem here: he used month-to-month splits. This year Strasburg has thrown 602 change ups; in 2013 he threw just 461. These full-year samples are just big enough to make a fairly accurate judgement. But when you split 600 pitches over the five months of this season, it averages out to just 120 pitches per month. That’s a tiny sample and is pretty much useless. This should be obvious when you notice how wildly the points jump up and down. Just take a look at those fastball numbers in 2014 to get an idea.
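To put a rough number on why monthly splits are so jumpy, here is a minimal sketch. It treats each change up as an independent trial, as the paragraph above does, and the `assumed_rate` value is purely an illustrative assumption, not a measured line drive rate. Only the 602 pitches and five months come from the post.

```python
import math


def binomial_standard_error(p, n):
    """Standard error of an observed proportion with true rate p over n trials."""
    return math.sqrt(p * (1 - p) / n)


season_changeups = 602  # from the post: change ups thrown in 2014
months = 5              # from the post: months in the season so far
per_month = season_changeups / months

# ASSUMPTION: an illustrative rate only, not a figure from the post.
assumed_rate = 0.20

se_season = binomial_standard_error(assumed_rate, season_changeups)
se_month = binomial_standard_error(assumed_rate, per_month)

print(f"pitches per month: {per_month:.0f}")
print(f"standard error, full season: {se_season:.3f}")
print(f"standard error, one month:   {se_month:.3f}")
print(f"monthly noise vs. season noise: {se_month / se_season:.2f}x")
```

Whatever rate is assumed, cutting the sample into five equal pieces multiplies the noise by the square root of five, a bit more than double. Line drive rates are normally computed per batted ball rather than per pitch, so the real monthly samples would be smaller still.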

His next graph has the exact same problem. Even worse, the sample is split up even further into right-handed and left-handed batters.

changeupliners

That looks like a worrying trend at first. But Strasburg had one start in March, hardly a useful sample. If you remove that March start, his line drive totals oscillate between 20 and 30, again right in line with what he has put up for his entire career. Let’s help Greenberg out by changing these graphs from month-to-month to our much more useful year-to-year samples and removing those non-MLB 2009 stats.

Brooksbaseball-Chart(1) Brooksbaseball-Chart(2)

Well, those are some boring flat graphs. I guess we can see now why he didn’t choose to do it this way. And then the unexpected: Greenberg actually gets a point right when he notes that lefties have been hitting for a higher ISO than righties against Strasburg’s change up. He does so using this graph.

isochangeup

This one again suffers from the same month-to-month issues as the previous graphs. Let’s help him out again by giving him a year-to-year graph.

Brooksbaseball-Chart(3)

And presto, we can see that Greenberg actually got this right. Lefties have been hitting for a much higher ISO against Strasburg’s change up in 2014, compared both to righties in 2014 and to lefties in 2011-2013. But here we see yet another problem: a higher ISO isn’t automatically a bad one. Lefties have a .123 ISO against Strasburg’s change up in 2014, which is below average. To give you an idea of how low that is, Wilson Ramos’ 2014 ISO is .125, which is so low it has led to articles about where his power has gone. So while the difference is noticeable, it’s a stretch to call it a problem.
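For readers who don’t have the formula handy, ISO is slugging percentage minus batting average, which works out to extra bases per at-bat. The sketch below uses a made-up batting line (the at-bat and hit counts are hypothetical, not Strasburg’s actual numbers) to show how the stat is built and how much one extra home run moves it in a small sample.

```python
def isolated_power(at_bats, doubles, triples, home_runs):
    """ISO = SLG - AVG, which reduces to extra bases per at-bat."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return (doubles + 2 * triples + 3 * home_runs) / at_bats


# HYPOTHETICAL batting line, invented for illustration only.
iso = isolated_power(at_bats=65, doubles=2, triples=0, home_runs=2)

# Same hypothetical line with a single additional home run.
iso_one_more_hr = isolated_power(at_bats=65, doubles=2, triples=0, home_runs=3)

print(f"ISO: {iso:.3f}")
print(f"ISO with one more home run: {iso_one_more_hr:.3f}")
```

In a sample this size, one swing is the difference between a below-average ISO and a solid one, which is another reason to be careful about reading much into the gap.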

Greenberg follows that up with his worst point in the article. The impetus behind this supposed problem, he proclaims, is where Strasburg is locating his change up. Namely, that lefties are mashing the ball low and away. Anyone with even a small knowledge of pitching should be able to tell you how patently ridiculous this claim is. There’s a reason lefty power hitters like Adam LaRoche and Ryan Howard get such extreme shifts: they’re dead pull hitters. In other words, they have a lot of issues with low and away pitches, which are very difficult to pull. Additionally, a change up is supposed to look like a fastball out of the pitcher’s hand and then dart down and away from a hitter, so that he gets on top of the ball and makes weak contact, if he makes contact at all. So the best place a pitcher could be locating his change up against lefties is low and away.

In his argument Greenberg uses these two graphs showing Strasburg’s zone location on change ups to lefties that were hit for line drives.

locations locations2

Ignoring the fact that there’s a total of 22 pitches across two years being displayed, a sample so small it shouldn’t have to be mentioned how useless it is, there’s an even bigger problem here. If Strasburg is throwing nearly all of his change ups down and away to lefties, then why is it surprising that the line drives hit against him are also down and away? A more useful graph would show lefties’ ISO against Strasburg’s change ups by location, with the caveat that even these samples are so small as to be useless.

plot_profile

Hey, look at that, lefties haven’t hit for any power at all against Strasburg’s change up low and away. Hitters are sporting a .083 ISO against 12 such pitches in the strike zone and a .000 ISO on 35 such pitches outside of the strike zone. The three worst spots are actually middle low, perhaps the best location for power hitting, and up and away, which is exactly where Greenberg claimed Strasburg should be locating the pitch more. Now, these sample sizes are much too small to make any real judgement against Strasburg, but they’re big enough to make you question Greenberg’s conclusions.
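The base-rate point can be sketched with a toy example. Every count below is hypothetical, invented only to show the mechanism, and the zone names are labels of convenience. If most change ups are thrown low and away, that zone can own the largest share of line drives while still having the lowest line drive rate per pitch.

```python
# HYPOTHETICAL counts by zone, invented for illustration only.
pitches = {"low_away": 400, "middle_low": 100, "up_away": 50, "other": 50}
line_drives = {"low_away": 12, "middle_low": 6, "up_away": 3, "other": 2}

total_line_drives = sum(line_drives.values())

# Share of all line drives that came from each zone (what a raw plot of
# line drive locations shows).
share_of_line_drives = {
    zone: line_drives[zone] / total_line_drives for zone in pitches
}

# Line drives per pitch thrown to each zone (what actually measures
# how hittable the location is).
rate_per_pitch = {zone: line_drives[zone] / pitches[zone] for zone in pitches}

for zone in pitches:
    print(
        f"{zone:>10}: {share_of_line_drives[zone]:.0%} of line drives, "
        f"{rate_per_pitch[zone]:.1%} line drive rate"
    )

most_line_drives = max(share_of_line_drives, key=share_of_line_drives.get)
lowest_rate = min(rate_per_pitch, key=rate_per_pitch.get)
print(f"most line drives: {most_line_drives}; lowest rate: {lowest_rate}")
```

In this made-up data the same zone wins both categories, which is why a plot of where line drives landed says little without knowing how often each location was thrown to.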

This isn’t the first time Greenberg has had issues with small sample sizes and cherry-picking stats to back up his arguments. Eric Fingerhut just recently wrote a great breakdown of an article Greenberg wrote on RGIII that suffered from the exact same problems. I love advanced statistics, and I am glad that the Washington Post has tried to support the advanced analytics community by establishing and promoting the Fancy Stats blog. But the issues presented in this post and in the post Mr. Fingerhut detailed are undermining the blog’s credibility. I want the Fancy Stats blog to succeed. Having a voice for advanced statistical analysis in a major publication, which the Post still is, is huge. But if it keeps making these basic mistakes, it will lose the support of the very people it should be giving a voice to.
