If you follow baseball at all, you are probably familiar with the concept of ERA as a summary statistic of a pitcher’s performance. You may also be familiar with work over the last couple of decades developing the concept of Defense Independent Pitching Statistics (DIPS) such as Fielding Independent Pitching (FIP). The basic idea is that ERA can be unreliable both in isolating a pitcher's performance from his defense and luck, and (as a result) in projecting future performance. FIP addresses these problems by focusing on the outcomes over which a pitcher has the most control (strikeouts, walks, and home runs) and ignoring everything else. The result is a metric with some nice properties:

1. FIP only includes outcomes which are independent of the defense
2. FIP has a stronger year-to-year (Y2Y) correlation than ERA
3. FIP predicts the following year's ERA better than ERA itself does
4. FIP is easy to compute with access to basic pitching stats

Put together, these four properties have made FIP one of the most-used “advanced stats” for evaluating pitchers, with (2) and (3) recommending it over ERA, and (4) recommending it over similarly-intentioned stats such as tERA, xFIP, and SIERA, which have more complicated formulations. There’s just one problem with all of this: (1) isn’t actually true.

The fielding-dependence of FIP isn’t obvious at first, because it doesn’t come from the home runs, walks, or strikeouts. It comes from the normalization factor used to convert those three stats into rates, innings pitched:
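
For reference, the commonly published form of FIP divides a weighted sum of the three outcomes by innings pitched. Some versions add hit batters to the walk term, and $C$ is a league-wide constant chosen each season so that league FIP matches league ERA; which exact variant is intended here is an assumption:

$$
\mathrm{FIP} = \frac{13\,\mathrm{HR} + 3\,\mathrm{BB} - 2\,\mathrm{K}}{\mathrm{IP}} + C
$$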

Innings pitched is outs recorded divided by three, and there are lots of different ways to record outs. Most of those ways depend on the defense. As a result, if you take two hypothetical pitchers who perform identically to each other but give one a good defense and the other a team full of Skip Schumakers*, the former will record more outs for the same number of strikeouts, walks, and home runs, accrue more innings, and thus have a lower FIP.
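
The effect can be illustrated with a small sketch. Everything here is hypothetical: the stat lines are invented, the `fip` helper uses the standard 13/3/-2 weights, and `FIP_CONSTANT` is a placeholder rather than any real season's value.

```python
FIP_CONSTANT = 3.10  # placeholder; the real league constant varies by season


def fip(hr, bb, k, outs, constant=FIP_CONSTANT):
    """Standard FIP form: (13*HR + 3*BB - 2*K) / IP + constant, with IP = outs / 3."""
    innings = outs / 3
    return (13 * hr + 3 * bb - 2 * k) / innings + constant


# Two hypothetical pitchers with identical pitcher-controlled results.
hr, bb, k = 25, 60, 150

# The only difference is how many balls in play each defense turns into outs
# (invented totals for illustration).
outs_in_play_good_defense = 396
outs_in_play_poor_defense = 374

fip_good = fip(hr, bb, k, k + outs_in_play_good_defense)
fip_poor = fip(hr, bb, k, k + outs_in_play_poor_defense)

print(f"good defense: {fip_good:.2f}  poor defense: {fip_poor:.2f}")
```

The numerator is the same for both pitchers, so the gap between the two values comes entirely from the innings total in the denominator.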

Given all that, I decided to try squeezing a little more of the defense out of the equation. Rather than using IP, I normalized each rate by the number of plate appearances that resulted in one of the three pitcher-controlled outcomes. In other words, if we consider only the PAs that ended in a walk, strikeout, or home run, what percentage of that subset yielded each of the three outcomes? The result is a metric with even less dependence on the defense and BABIP luck than FIP, which I’m referring to as FIPer (pending a better name):
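
Written out from the description above, the metric would take roughly the following form. This is a reconstruction under assumptions: it keeps FIP's 13/3/-2 weights, and $C'$ stands for whatever additive constant is used to put the result on a convenient scale.

$$
\mathrm{FIPer} = \frac{13\,\mathrm{HR} + 3\,\mathrm{BB} - 2\,\mathrm{K}}{\mathrm{HR} + \mathrm{BB} + \mathrm{K}} + C'
$$

Equivalently, it is $13\,p_{\mathrm{HR}} + 3\,p_{\mathrm{BB}} - 2\,p_{\mathrm{K}} + C'$, where each $p$ is that outcome's share of the PAs ending in a walk, strikeout, or home run.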

This is a small tweak of the original FIP formula, but given that it is a purer embodiment of the goal of FIP (eliminating things outside of the pitcher's control) and is no more difficult to compute (it actually requires one fewer component stat, as IP need not be known), even a small improvement with respect to the other desired properties would be relevant. I also applied the same tweak to xFIP, which replaces *HR* with the "expected" number of home runs given the pitcher's fly ball tendencies (computed by multiplying the number of fly balls the pitcher gave up by the league-average HR/FB ratio); I refer to the result as xFIPer.
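
As a rough sketch of how both modified metrics could be computed: the function names, the 13/3/-2 weights carried over from FIP, and the default `constant=0.0` are all assumptions made for illustration, and the sample inputs are invented.

```python
def fiper(hr, bb, k, constant=0.0):
    """Assumed FIPer form: FIP's weighted sum divided by HR + BB + K instead of IP."""
    events = hr + bb + k
    if events == 0:
        raise ValueError("no walks, strikeouts, or home runs recorded")
    return (13 * hr + 3 * bb - 2 * k) / events + constant


def expected_hr(fly_balls, league_hr_per_fb):
    """Expected home runs: fly balls allowed times the league-average HR/FB ratio."""
    return fly_balls * league_hr_per_fb


def xfiper(fly_balls, bb, k, league_hr_per_fb, constant=0.0):
    """Same calculation as fiper, with expected home runs in place of actual ones."""
    return fiper(expected_hr(fly_balls, league_hr_per_fb), bb, k, constant)


# Invented stat line: 25 HR, 60 BB, 150 K, 200 fly balls, 10% league HR/FB.
print(round(fiper(25, 60, 150), 3))
print(round(xfiper(200, 60, 150, 0.10), 3))
```

The additive constant only shifts every pitcher's value by the same amount, so it has no effect on correlations computed from the metric.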

With FIPer in hand I, like many others, turned to the internet for validation. In so doing, I happened across this post from Beyond the Boxscore looking at the Y2Y correlation of basically every pitching statistic you could think of. The analysis considered all pitchers from 2004-2011 who recorded at least 162 innings in two consecutive years (so, basically, healthy starting pitchers). I decided to piggyback on this great study and see how FIPer stacks up.

As referenced in the numbered points at the top of this post, there are two basic things to look at for the metric: Y2Y correlation with itself, and Y2Y correlation with ERA. On both counts, FIPer compares favorably with FIP.

As you can see, FIPer shows a considerably higher Y2Y correlation than FIP. Similarly, xFIPer shows improvement over xFIP. This means that a player’s FIPer and xFIPer are more stable over time than his FIP and xFIP, respectively, consistent with the metrics being a better representation of the true talent level. FIPer also correlates more strongly than FIP with the following year’s ERA, meaning that it predicts that ERA better. In fact, for this particular data set, FIPer outperformed all the other metrics in the study, with xFIPer placing second.
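
For readers who want to run this kind of comparison themselves, a Y2Y correlation can be sketched as below. This is not the code behind the study: the `seasons` layout, the helper names, and the numbers in the toy data are all invented for illustration, with only the 162-inning cutoff taken from the description above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def y2y_pairs(seasons, metric, target, min_ip=162):
    """Pair each season's `metric` with the same pitcher's `target` a year later.

    `seasons` maps (pitcher, year) to a dict of stats that includes "ip".
    A pair is kept only if both seasons reach the innings cutoff.
    """
    xs, ys = [], []
    for (pitcher, year), current in seasons.items():
        following = seasons.get((pitcher, year + 1))
        if following is None:
            continue
        if current["ip"] < min_ip or following["ip"] < min_ip:
            continue
        xs.append(current[metric])
        ys.append(following[target])
    return xs, ys


# Toy data with invented numbers, not real pitcher seasons.
seasons = {
    ("A", 2010): {"ip": 200, "fiper": 0.50, "era": 3.20},
    ("A", 2011): {"ip": 190, "fiper": 0.55, "era": 3.40},
    ("B", 2010): {"ip": 180, "fiper": 0.90, "era": 4.10},
    ("B", 2011): {"ip": 175, "fiper": 0.85, "era": 4.30},
    ("C", 2010): {"ip": 170, "fiper": 0.70, "era": 3.90},
    ("C", 2011): {"ip": 120, "fiper": 0.75, "era": 3.70},  # below the cutoff
    ("D", 2010): {"ip": 210, "fiper": 0.60, "era": 3.50},
    ("D", 2011): {"ip": 205, "fiper": 0.80, "era": 3.60},
}

self_r = pearson(*y2y_pairs(seasons, "fiper", "fiper"))  # Y2Y with itself
era_r = pearson(*y2y_pairs(seasons, "fiper", "era"))     # Y2Y with next-year ERA
print(f"self: {self_r:.3f}  next-year ERA: {era_r:.3f}")
```

Running the same two calls for each metric under comparison, on the same set of pitcher seasons, gives the two numbers discussed above.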

I'd like to stress that these results consider only a single data set for which someone else had already done the work, so the comparisons to the full slate of stats are far from definitive. I have, however, looked at a number of different date ranges and IP cutoffs in comparing FIP, FIPer, xFIP, and xFIPer, and the modified metrics consistently showed better Y2Y correlation with themselves and with ERA.

If we take a look back at the four key features of FIP mentioned above, I’d argue that FIPer shows advantages over FIP on all counts. By avoiding the use of IP, it should be even less influenced by a pitcher’s defense than FIP, and it is just as easy to calculate. Its Y2Y correlation with itself is stronger, as are its Y2Y correlations with ERA and even with FIP.

**While one could easily construct a more defensively-inept team than one full of Skip Schumakers, I doubt any of them are as likely to actually be constructed.*

Posted on May 4, 2012
