College Football by the Numbers: July 2010

Saturday, July 31, 2010

Stat-Wise Heisman Rankings

The winner of the Heisman Memorial Trophy is supposed to be the season's most outstanding player. But we all have a different understanding of outstanding. Because there are no specific guidelines about what makes one a Heisman candidate, the process is riddled with problems. A player has a better chance if he spends more time on television, plays for a traditional power, has good weapons around him, etc.

With the Stat-Wise Heisman Rankings I have tried to imitate the more scientific aspects of Heisman voting while avoiding the non-competitive biases. Players earn points for their opponent-adjusted stats (yards, yards per play) and for their performance in "important" games, and they lose points for poor performances (especially turnovers) in losses. At least for the near future, the Stat-Wise Heisman Rankings are limited to offensive players.

Thursday, July 22, 2010

CFB National Championship Props

Based on historical data, poll voting patterns, and thousands of computer simulations, we can estimate each team's chance of hoisting the crystal football trophy. It turns out that the most difficult part of predicting national champions is not simulating games, but ranking teams to match human polls. In fact, 99% of the processing time in each simulation is devoted to ranking teams rather than playing out the games, but the payoff should be worth it. The modified ranking algorithm has a strong correlation with past voting patterns, especially in identifying the BCS qualifiers.

The results above are based on 6,540 simulations using the data that would have been available in mid-October 2009. To jog your memory, Alabama was 6-0 with Auburn left as the toughest game on the regular season schedule (86% win probability). Florida was also heavily favored the rest of the way, but faced a slightly stickier road. Cincinnati was also undefeated, but simply wasn't as good a team. TCU still had BYU and Utah left, but Boise St was sitting pretty to finish the regular season 13-0. Texas was just about to play a two-loss Oklahoma.

Given this reality, Alabama was looking at a 63% chance of playing for a national championship and a 43% chance of winning it all. Florida and Texas each had an 18% chance of a national title. The most likely scenario was an SEC champ vs. Big 12 champ championship game, with nearly a quarter of simulations pointing to a Texas vs. Alabama/Florida title game. Boise St had a 1-in-4 chance of playing for a national championship: they were all but guaranteed an undefeated season, would have been given the nod over most one-loss teams, and were often the beneficiaries of a conference title game upset. An undefeated TCU was usually picked over a perfect Boise St team, but the Horned Frogs faced a tougher schedule. A few other teams were still in the running, but needed to win out and needed a lot of lucky breaks along the way.

Given all the undefeated challengers and conference championship games, championship game participants averaged 12.5 wins. Boise St and TCU never reached a championship game after a loss. In one simulation, a three-loss Alabama team slipped into the championship game against an undefeated TCU and won.
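As a rough illustration of the simulation half of this process (the ranking half, which eats most of the processing time, is left out), the sketch below estimates how often a team finishes its remaining schedule unbeaten. It is not the model behind the numbers in this post. The function names are hypothetical, and apart from the 86% figure for the Auburn game, the win probabilities are made-up placeholders.

```python
import random


def simulate_losses(win_probs, rng):
    """Play out the remaining games once and return the number of losses."""
    return sum(1 for p in win_probs if rng.random() >= p)


def undefeated_share(win_probs, n_sims=6540, seed=0):
    """Estimate the share of simulated seasons that end without a loss."""
    rng = random.Random(seed)
    perfect = sum(1 for _ in range(n_sims) if simulate_losses(win_probs, rng) == 0)
    return perfect / n_sims


# Hypothetical remaining schedule; only the 0.86 (Auburn) comes from the post.
remaining = [0.95, 0.92, 0.90, 0.86]
print(undefeated_share(remaining))
```

A full version would simulate every team's schedule in each pass, rank the field the way the polls would, and only then pick the two title game participants.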

Wednesday, July 21, 2010

Revised Historical Rankings

I've revised rankings for all college football teams since 1900 that have sufficient sample size (see "Past Rankings" on right panel). These rankings are based on a statistical equation that accounts for wins and losses, margin of victory, and strength of schedule. It is designed to mimic the ranking process, but without the biases and heuristics that limit human voters.
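The equation itself is not spelled out here, so the following is only a stand-in to show how the three ingredients can be combined. It uses a simple-rating-system style iteration: each team's rating is its average adjusted margin plus the average rating of its opponents. The win bonus, the margin cap, the iteration count, and the function name are all assumptions chosen for illustration.

```python
from collections import defaultdict


def rate_teams(games, margin_cap=21, win_bonus=7, n_iter=50):
    """games: list of (winner, loser, winner_pts, loser_pts) tuples."""
    results = defaultdict(list)  # team -> [(opponent, adjusted margin), ...]
    for winner, loser, w_pts, l_pts in games:
        # Reward the win itself, then add a capped margin of victory.
        adj = win_bonus + min(w_pts - l_pts, margin_cap)
        results[winner].append((loser, adj))
        results[loser].append((winner, -adj))

    ratings = {team: 0.0 for team in results}
    for _ in range(n_iter):
        # Strength of schedule enters through the opponents' current ratings.
        ratings = {
            team: sum(adj + ratings[opp] for opp, adj in played) / len(played)
            for team, played in results.items()
        }
    return ratings


# Hypothetical three-team round robin.
games = [("A", "B", 28, 14), ("B", "C", 21, 20), ("A", "C", 35, 10)]
ratings = rate_teams(games)
print(sorted(ratings, key=ratings.get, reverse=True))
```

Capping the margin keeps a single blowout from dominating a team's rating, which is one way to respect margin of victory without rewarding running up the score.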

Looking at #1s, the computer and I have disagreed only once since my birth: Kansas in 2007. Dartmouth in 1970 is also a curious pick (#14 AP that season), and Cornell over Texas A&M in 1939 is personally disagreeable.

Tuesday, July 20, 2010

The Breakdown

The Breakdown is an all-inclusive statistical tour of a college football game. In addition to computer-simulated results (scores, odds, and team and individual statistics), the Breakdown sheds some light on the historical data and analytic techniques used to derive those predictions.

The first panel is a pre-game/post-game summary, an "Expected Box Score" that includes the expected score, odds, and team and individual statistics. Red and green numbers next to the team statistics compare the expected performance to the team's average performance. In this example, Texas' predicted 227 passing yards is 46 fewer than their season average, largely because the predicted 59.4% completion percentage is 8 percentage points below their average. Individual predictions do not always account for injuries and suspensions, especially in-game injuries like the one to Colt McCoy in this particular game.

Beginning in 2011, I have added a new tool. This second panel plots possible outcomes for a game. The darker the square, the more likely the outcome. For this game (now from the 2011 matchup between Alabama and Ole Miss), Alabama is heavily favored, so most of the darkly colored squares fall above the diagonal line. The most likely outcome for this game is 38-7 Alabama (designated by the blue square). Along each axis is the scoring distribution for each team: Alabama will most likely score between 30 and 42, while Ole Miss will most likely score between 0 and 14.
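For readers curious how a grid like this can be assembled, here is a minimal sketch that assumes the input is simply a list of simulated final scores. The function names and the four sample scores are hypothetical; only the 38-7 result echoes the example above.

```python
from collections import Counter


def outcome_grid(simulated_scores):
    """Return each final score's share of the simulations and the most common score."""
    n = len(simulated_scores)
    counts = Counter(simulated_scores)
    grid = {score: count / n for score, count in counts.items()}
    most_likely = counts.most_common(1)[0][0]
    return grid, most_likely


def team_distribution(grid, team_index):
    """Collapse the grid onto one axis to get a single team's scoring distribution."""
    dist = Counter()
    for score, share in grid.items():
        dist[score[team_index]] += share
    return dict(dist)


# Hypothetical (favorite, underdog) scores from four simulations.
sims = [(38, 7), (38, 7), (31, 10), (24, 14)]
grid, most_likely = outcome_grid(sims)
print(most_likely, team_distribution(grid, 0))
```

Each value in `grid` corresponds to how dark a square would be drawn, and `team_distribution` gives the curve along one axis.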

The third panel is a summary of the two teams' trend-O-meter, Hybrid, and cRPI ratings (the cRPI* is multiplied by 100), with national rankings in parentheses (I have now replaced the cRPI with the BPR). The hybrid rating is consistently among the more reliable computer rankings, according to Massey's ranking comparison. You can see from the trend-O-meter that Alabama came into this game playing relatively well, but Texas did not.

Text boxes in this panel list more team statistics. Ratings (Unit, Rush and Pass) are adjusted for opponent strength. The unit rating is based on points scored/allowed, and the rush and pass ratings are based on yards/play gained/allowed. The bar graphs offer a summary of offensive and defensive match-ups. The portion below zero on each bar represents the opposing team's defensive strength in that area. The portion above the bar in the team's color is the predicted yards per run or pass for that team. The gray portion is what the team gains on average. The title of each graph gives the percentage of plays on which the team runs or passes. In this case, Texas' defense should be particularly effective against the Alabama pass offense, but because Alabama runs the ball 63% of the time, this advantage will not be as important. (Side-note: Alabama only allowed 46.8% completions that season, which is just disgusting.)

Panel 4 adds individual statistics and information on up to 6 previous meetings.

Next is a comparison of the two teams since 1980 (explanations of the Hybrid and cRPI). In this case, the hybrid ratings across seasons are standardized to range from zero to one.
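One common way to put ratings on a zero-to-one scale is min-max scaling, sketched below. Whether the chart uses exactly this method is an assumption, and the function name and sample ratings are hypothetical.

```python
def standardize(ratings_by_season):
    """Rescale ratings so the lowest maps to 0 and the highest maps to 1."""
    lo = min(ratings_by_season.values())
    hi = max(ratings_by_season.values())
    if hi == lo:
        # Every rating is identical, so place them all at the midpoint.
        return {season: 0.5 for season in ratings_by_season}
    return {season: (r - lo) / (hi - lo) for season, r in ratings_by_season.items()}


# Hypothetical raw ratings for three seasons.
print(standardize({1980: 10.0, 1981: 20.0, 1982: 15.0}))
```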

The next panel has even more statistics, with national rankings in parentheses. The most important numbers here are the sacks/pass, tackles for loss (TFL)/run, points/possession and TDs/possession. Here we see that deficiencies in the Texas offense were, in part, hidden because they averaged 14.3 possessions per game (giving them more opportunities to score). While they only averaged 5.2 plays/possession, they still averaged 2.7 points/possession, suggesting that many of the possessions started with good field position, a product of a very good defense (which only allowed 4.6 plays/possession).
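To see how the possession count props up the raw totals, multiply the per-possession rates quoted above by possessions per game. The helper name is hypothetical, and because the inputs are rounded, the per-game figures it implies are approximate.

```python
def per_game(possessions_per_game, rate_per_possession):
    """Convert a per-possession rate into a per-game figure."""
    return possessions_per_game * rate_per_possession


texas_points = per_game(14.3, 2.7)  # points per game implied by the panel
texas_plays = per_game(14.3, 5.2)   # offensive plays per game implied by the panel
print(round(texas_points, 1), round(texas_plays, 1))
```

That works out to roughly 38.6 points on about 74.4 plays per game, so a fairly ordinary number of plays per drive still adds up to a big scoring total when the drives are plentiful.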

Explanation of maps. In the maps, teams with similar styles are placed close to one another. The number in parentheses is the point differential between what that team was expected to do and what they actually did. For example, in the "defense map: vs. Texas", we see that Nebraska and Oklahoma did better defensively against Texas than expected, and that Alabama is similar to these defenses, so that is an advantage for Alabama. On the other hand, the Texas offense is similar to offenses that did relatively well against Alabama (see "offense map: vs. Alabama"), so that Alabama advantage is cancelled out. Moving down, we see that Texas is similar to defenses that performed relatively poorly against Alabama, namely Florida and Virginia Tech. This suggests that the Texas defense matches up poorly with Alabama's offense, so the net match-up advantage in this game goes to Alabama.

Tuesday, July 13, 2010

Individual Statistics-Receivers

Finally, we have the catchers of the passes.

Enjoy

Monday, July 12, 2010

Individual Statistics-Quarterbacks

I've now added quarterbacks to the mix. The most important statistic here is the Adjusted+ completion percentage. It adjusts a quarterback's completion percentage for both the quality of the pass defense against which the pass was thrown and the distance the pass was thrown.
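The exact adjustment is not given here, so the sketch below shows just one plausible way to build such a number: compare a quarterback's actual completion rate to the rate an average passer would be expected to post on the same throws, then add the difference to a baseline. The per-pass expected probabilities (which would come from the defense faced and the throw distance), the 60% baseline, and the function name are all assumptions.

```python
def adjusted_completion_pct(passes, baseline=0.60):
    """passes: list of (completed, expected_prob) pairs, one per attempt.

    expected_prob is a hypothetical estimate of how often an average
    quarterback completes that throw against that defense at that distance.
    """
    n = len(passes)
    actual = sum(1 for completed, _ in passes if completed) / n
    expected = sum(prob for _, prob in passes) / n
    # Credit (or debit) the quarterback for beating (or missing) expectation.
    return baseline + (actual - expected)


# Hypothetical four-pass sample.
sample = [(True, 0.5), (True, 0.7), (False, 0.6), (True, 0.6)]
print(adjusted_completion_pct(sample))
```

Under this scheme a quarterback who completes tough, deep throws against good defenses rises above his raw percentage, while one who pads his numbers with easy throws falls below it.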

Enjoy

Wednesday, July 7, 2010

Individual Statistics-Running Backs

This season I hope to start posting individual player statistics. I've started with running backs. I've defined running backs as players who carry the ball an average of at least 4 times per regular season game (48 carries over 12 games) and throw fewer than 2 passes per game. To be included in the leaderboard above, a player must have carried the ball at least 8 times per game (96 carries). In addition to carries and rushing average, I've added a few interesting, statistically derived numbers. First, FD is the probability that, if given the ball three times, the player would gain 10 yards (roughly the requirement for gaining a first down). The next two numbers are the probabilities that the player would lose a yard and gain 10 yards, respectively, on a single carry.
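Here is a minimal sketch of how numbers like these could be estimated from a back's carry log. It treats each carry as an independent draw from the player's own history, which is an assumption on my part, and the function names and the ten-carry sample are hypothetical.

```python
from itertools import product


def first_down_prob(carries, attempts=3, to_gain=10):
    """Share of all ordered draws of `attempts` carries that total at least `to_gain` yards."""
    total = len(carries) ** attempts
    hits = sum(1 for combo in product(carries, repeat=attempts) if sum(combo) >= to_gain)
    return hits / total


def share(carries, condition):
    """Share of single carries that satisfy `condition`."""
    return sum(1 for yards in carries if condition(yards)) / len(carries)


# Hypothetical carry log, in yards gained per carry.
sample = [-2, 0, 1, 3, 3, 4, 5, 7, 12, 25]
fd = first_down_prob(sample)
loss = share(sample, lambda yards: yards <= -1)  # lose at least one yard
big = share(sample, lambda yards: yards >= 10)   # gain at least 10 yards
print(fd, loss, big)
```

Enumerating every combination is exact but grows quickly, so for a full season of carries a random sample of combinations would do the same job faster.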

Enjoy
