Welcome to Project Left Turn, a fun look at using statistics to understand the Indiana University Little 500. I completed this project in 2011 as an MBA student at the Kelley School of Business in Bloomington, IN.
2011 Race Predictions
It probably comes as no surprise that the model is predicting the Cutters to be this year’s winner. The rest of the field is as follows:
Place | Team |
1 | Cutters |
2 | Delta Tau Delta |
3 | Sigma Chi |
4 | Phi Delta Theta |
5 | Beta Theta Pi |
6 | Black Key Bulls |
7 | Phi Gamma Delta |
8 | Sigma Nu |
9 | Phi Kappa Psi |
10 | Delta Chi |
11 | Acacia |
12 | Cru Cycling |
13 | Gray Goat |
14 | Hoosier Climber? |
15 | Kappa Sigma |
16 | Theta Chi |
17 | Air Force Cycling |
18 | Emanon |
19 | Delta Upsilon |
20 | #JungleExpress |
21 | Pi Kappa Alpha |
22 | Achtung |
23 | Sigma Phi Epsilon |
24 | Sigma Alpha Mu |
25 | Evans Scholars |
26 | Dodds House |
27 | LAMP |
28 | Sigma Pi |
29 | Wright Cycling |
30 | Phi Kappa Sigma |
31 | CSF Cycling |
32 | Delta Sigma Pi |
33 | Sigma Alpha Epsilon |
Five teams are within the standard error of the model and thus have a shot at the win. Winning probabilities for the top five teams are listed below.
Team | Chance of Winning |
Cutters | 35% |
Delta Tau Delta | 15% |
Sigma Chi | 13% |
Phi Delta Theta | 12% |
Beta Theta Pi | 8% |
How it works:
A regression of historical data reveals that 68% of the variation in race outcome can be explained by ITT times, team pursuit times, and qualification times. The ITT time of the team’s fastest rider and the team pursuit time were the most significant variables in explaining race outcome. The ITT time of the third rider was insignificant and was not included in the regression model. The model can predict a team’s race time to within 161 seconds, which is its standard error. The expected race times are assumed to be normally distributed. This assumption allows for a simulation of 1,000 iterations using the standard error of 161 seconds, which yields each team’s chances of winning as listed above.
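For anyone who wants to try the simulation step, here is a minimal sketch in Python. It assumes each team's race time is drawn independently from a normal distribution centered on its predicted time, with the 161-second standard error as the spread; the post does not say whether the draws were independent, so treat that as an assumption. The team names and predicted times are hypothetical placeholders, not the model's real output.

```python
import random

STANDARD_ERROR = 161.0  # seconds, the model's standard error quoted in the post


def simulate_win_probabilities(predicted_times, n_iter=1000, sd=STANDARD_ERROR, seed=None):
    """Estimate how often each team posts the lowest simulated race time."""
    rng = random.Random(seed)
    wins = {team: 0 for team in predicted_times}
    for _ in range(n_iter):
        # Assumption: one independent normal draw per team per iteration.
        draws = {team: rng.gauss(mean, sd) for team, mean in predicted_times.items()}
        winner = min(draws, key=draws.get)  # lowest time wins the race
        wins[winner] += 1
    return {team: count / n_iter for team, count in wins.items()}


# Hypothetical predicted race times in seconds, invented for illustration only.
predicted = {
    "Team A": 7800.0,
    "Team B": 7860.0,
    "Team C": 7870.0,
    "Team D": 7875.0,
    "Team E": 7900.0,
}

probabilities = simulate_win_probabilities(predicted, n_iter=1000, seed=2011)
for team, share in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{team}: {share:.1%}")
```

With only 1,000 iterations the shares move by a point or two from run to run, so a fixed seed (or more iterations) helps when comparing results.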
The model assumes that a team’s four fastest riders ride in the race. This assumption is, of course, imperfect but it provides the only simple way to regress ITT times to race outcome. All times are normalized to their respective yearly averages. This accounts for anomalies in weather, track conditions, and excessive yellow flags.
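The normalization step can be sketched as follows. The post does not say whether times were divided by the yearly average or had it subtracted, so this example assumes a ratio; the sample times are hypothetical.

```python
from collections import defaultdict


def normalize_to_yearly_average(records):
    """Divide each time by the mean time of its own year.

    records: iterable of (year, team, seconds) tuples.
    Returns a list of (year, team, ratio); 1.0 means exactly average for that year.
    """
    records = list(records)
    by_year = defaultdict(list)
    for year, _team, seconds in records:
        by_year[year].append(seconds)
    yearly_mean = {year: sum(times) / len(times) for year, times in by_year.items()}
    return [(year, team, seconds / yearly_mean[year]) for year, team, seconds in records]


# Hypothetical times in seconds, invented purely for illustration.
SAMPLE = [
    (2009, "Team A", 150.0), (2009, "Team B", 160.0), (2009, "Team C", 170.0),
    (2010, "Team A", 155.0), (2010, "Team B", 165.0), (2010, "Team C", 175.0),
]

for year, team, ratio in normalize_to_yearly_average(SAMPLE):
    print(year, team, round(ratio, 3))
```

Because every year is rescaled to its own average, a slow, rainy year and a fast, dry year can sit in the same regression without the weather showing up as a difference between teams.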
Can you beat the computer?
We all know the Little 500 is mostly unpredictable. And even I know that a purely statistical approach may not be the best way to make predictions. It is for this reason I would like to know: can you beat the computer?
If you think you can, email me your predictions by noon on Wednesday. Send me an Excel spreadsheet with a list of men’s teams and finishing positions. Email files to eandreol@indiana.edu with subject “Prediction Contest”. Predictions must be made for the entire field of 33 teams. After the race I will compare your predictions to my computer predictions by calculating the correlation coefficient.
As an added bonus the submitter with the highest correlation coefficient will receive a free Little 5 themed Kilroy’s poster courtesy of Kilroy’s Was Here. Winners will be announced after the race.
Best of luck. Look for my race predictions Wednesday at noon!
Why we run the race
Saturday’s team pursuit results prove that all Little 500 event outcomes can be unpredictable. My Excel model, which was based on ITTs and quals, had Cutters winning by 12 seconds. Instead, Cutters finished third and Delts secured the win, despite the computer giving Delts only a 9.3% chance of winning.
Here is a look at each team’s actual versus predicted results.
Team | Actual | Predicted |
Delta Tau Delta | 1 | 4 |
Sigma Chi | 2 | 7 |
Cutters | 3 | 1 |
Phi Delta Theta | 4 | 2 |
Beta Theta Pi | 5 | 3 |
Black Key Bulls | 6 | 5 |
Phi Gamma Delta | 7 | 9 |
Hoosier Climber? | 8 | 12 |
Gray Goat Cycling | 9 | 16 |
Cru Cycling | 10 | 11 |
Delta Chi | 11 | 10 |
#JungleExpress | 12 | 27 |
Phi Kappa Psi | 13 | 8 |
Kappa Sigma | 14 | 13 |
Wright Cycling | 15 | 29 |
Sigma Nu | 16 | 6 |
Theta Chi | 17 | 14 |
Delta Upsilon | 18 | 18 |
Sigma Pi | 19 | |
Emanon | 20 | 21 |
CSF Cycling | 21 | |
Acacia | 22 | 17 |
Sigma Phi Epsilon | 23 | 22 |
Achtung | 24 | 28 |
Cru Cycling B | 25 | |
Air Force Cycling | 26 | 15 |
Dodds House | 27 | 19 |
Sigma Alpha Mu | 28 | 20 |
Phi Kappa Sigma | 29 | 26 |
LAMP | 30 | 23 |
Sigma Alpha Epsilon | 31 | 31 |
Delta Sigma Pi | 32 | 25 |
The predictions were not as accurate as last year’s race predictions, but still performed better than expected with a 0.78 correlation coefficient. Twenty teams were predicted within 5 spots of their actual performance.
Look for race predictions to be posted by Wednesday.
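The two summary numbers for the team pursuit predictions can be checked directly from the actual-versus-predicted table. The sketch below assumes a plain Pearson correlation on the finishing places and leaves out the three teams with no prediction (Sigma Pi, CSF Cycling, and Cru Cycling B); the post does not state either choice, so both are assumptions.

```python
from math import sqrt

# (team, actual TP place, predicted TP place), copied from the table above.
# Teams without a prediction are omitted (an assumption about how they were handled).
RESULTS = [
    ("Delta Tau Delta", 1, 4), ("Sigma Chi", 2, 7), ("Cutters", 3, 1),
    ("Phi Delta Theta", 4, 2), ("Beta Theta Pi", 5, 3), ("Black Key Bulls", 6, 5),
    ("Phi Gamma Delta", 7, 9), ("Hoosier Climber?", 8, 12), ("Gray Goat Cycling", 9, 16),
    ("Cru Cycling", 10, 11), ("Delta Chi", 11, 10), ("#JungleExpress", 12, 27),
    ("Phi Kappa Psi", 13, 8), ("Kappa Sigma", 14, 13), ("Wright Cycling", 15, 29),
    ("Sigma Nu", 16, 6), ("Theta Chi", 17, 14), ("Delta Upsilon", 18, 18),
    ("Emanon", 20, 21), ("Acacia", 22, 17), ("Sigma Phi Epsilon", 23, 22),
    ("Achtung", 24, 28), ("Air Force Cycling", 26, 15), ("Dodds House", 27, 19),
    ("Sigma Alpha Mu", 28, 20), ("Phi Kappa Sigma", 29, 26), ("LAMP", 30, 23),
    ("Sigma Alpha Epsilon", 31, 31), ("Delta Sigma Pi", 32, 25),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


actual = [a for _team, a, _p in RESULTS]
predicted = [p for _team, _a, p in RESULTS]

correlation = pearson(actual, predicted)
within_five = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= 5)

print(f"correlation: {correlation:.2f}")
print(f"teams within 5 spots: {within_five} of {len(RESULTS)}")
```

Under those assumptions the script lands on the same 0.78 correlation and the same count of twenty teams within 5 spots quoted above.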
Team Pursuit Predictions
With team pursuit underway, I wanted to make some predictions. It should be noted that the prediction model assumes that all qualifying teams (except CSF Cycling and Sigma Pi) will be riding in TP. This is already not accurate, as it seems like a few teams have not shown up.
The predictions are based on multivariate regression of historic qualification and ITT times. Significant variables were found to be a team’s fastest rider, a team’s third fastest rider, and a team’s average adjusted qualification time. These variables combine to explain approximately 57% of TP results. The model is not as strong as the race-day model, and should not be taken too seriously.
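For readers curious about what a multivariate regression like this involves, here is a minimal ordinary least squares sketch in plain Python. It is not the Excel model itself. The three predictor columns mirror the variables named above, but every number is synthetic: the outcomes are generated from made-up coefficients plus small made-up residuals, purely so the fit has something to recover.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(predictors, outcomes):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, outcomes)) for i in range(k)]
    coefficients = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coefficients, row)) for row in design]
    mean_y = sum(outcomes) / len(outcomes)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcomes, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcomes)
    return coefficients, 1.0 - ss_res / ss_tot


# SYNTHETIC data: (fastest rider ITT, third-fastest rider ITT, average adjusted
# qual time), each expressed relative to a yearly average. Not real results.
SYNTHETIC_PREDICTORS = [
    (0.95, 1.00, 0.97), (0.98, 0.99, 1.01), (1.00, 1.03, 0.99),
    (1.02, 0.98, 1.04), (1.05, 1.04, 1.00), (0.97, 1.02, 1.03),
]
MADE_UP_RESIDUALS = [1.5, -2.0, 0.5, 2.5, -1.0, -1.5]
synthetic_tp_times = [
    100 + 60 * a + 30 * b + 40 * c + e
    for (a, b, c), e in zip(SYNTHETIC_PREDICTORS, MADE_UP_RESIDUALS)
]

coefficients, r_squared = fit_ols(SYNTHETIC_PREDICTORS, synthetic_tp_times)
print("coefficients:", [round(b, 2) for b in coefficients])
print("R squared:", round(r_squared, 3))
```

The R squared value printed at the end is the same kind of figure as the "approximately 57%" quoted above: the share of variation in the outcome that the fitted line accounts for.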
Here are the predicted results:
1 | Cutters |
2 | Phi Delta Theta |
3 | Beta Theta Pi |
4 | Delta Tau Delta |
5 | Black Key Bulls |
6 | Sigma Nu |
7 | Sigma Chi |
8 | Phi Kappa Psi |
9 | Phi Gamma Delta |
10 | Delta Chi |
11 | Cru Cycling |
12 | Hoosier Climber? |
13 | Kappa Sigma |
14 | Theta Chi |
15 | Air Force Cycling |
16 | Gray Goat |
17 | Acacia |
18 | Delta Upsilon |
19 | Dodds House |
20 | Sigma Alpha Mu |
21 | Emanon |
22 | Sigma Phi Epsilon |
23 | LAMP |
24 | Pi Kappa Alpha |
25 | Delta Sigma Pi |
26 | Phi Kappa Sigma |
27 | #JungleExpress |
28 | Achtung |
29 | Wright Cycling |
30 | Evans Scholars |
31 | Sigma Alpha Epsilon |
The model predicts the Cutters will beat Phi Delt by as much as 12 seconds, with the next four teams all being within 5 seconds of each other.
I will follow up tomorrow to compare the actual results.
How much does team pursuit matter?
Earlier this week, I posted a riveting analysis on the usefulness of quals as a predictor of race outcome. Scroll down or click here to read the full scoop.
With team pursuit coming up in a few days, I think it is worthwhile to perform the same analysis with TP results.
Using the same statistical techniques, it can be concluded that a team’s TP position can explain 68% of a team’s race day outcome. Thus TP performs much better than quals as a sole predictor of race day performance.
While a team only has an 8.5% chance of finishing the race in its exact TP position, it is important to note that 70% of teams finish within 5 spots of their TP position. 96% of teams finish within 10 spots and no team has ever finished the race more than 16 spots away from its TP position.
Here is a look at TP winners and their following race performance. (TP was cancelled in 2006 and 2002)
Year | TP Winner | Race Finish |
2010 | Phi Delta Theta | 2 |
2009 | Black Key Bulls | 5 |
2008 | Cutters | 1 |
2007 | Cutters | 1 |
2005 | Phi Gamma Delta | 2 |
2004 | Cutters | 1 |
2003 | Gafombi | 1 |
2001 | Cutters | 7 |
2000 | Sigma Phi Epsilon | 8 |
For the nerds: Here is the scatterplot. Notice how there is less scatter as compared to the quals plot.
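The TP winners table is small enough to summarize in a few lines of Python. This sketch only tallies the nine rows listed above; it is a quick tally, not part of the regression.

```python
# (year, TP winner, race finish), copied from the table above.
TP_WINNERS = [
    (2010, "Phi Delta Theta", 2),
    (2009, "Black Key Bulls", 5),
    (2008, "Cutters", 1),
    (2007, "Cutters", 1),
    (2005, "Phi Gamma Delta", 2),
    (2004, "Cutters", 1),
    (2003, "Gafombi", 1),
    (2001, "Cutters", 7),
    (2000, "Sigma Phi Epsilon", 8),
]

finishes = [finish for _year, _team, finish in TP_WINNERS]
race_wins = sum(1 for finish in finishes if finish == 1)
top_five = sum(1 for finish in finishes if finish <= 5)
mean_finish = sum(finishes) / len(finishes)

print(f"TP winners who also won the race: {race_wins} of {len(finishes)}")
print(f"TP winners who finished in the top five: {top_five} of {len(finishes)}")
print(f"average race finish of the TP winner: {mean_finish:.2f}")
```

Four of the nine TP winners went on to win the race, and seven of the nine finished in the top five.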
A note on ITTs
After looking through men’s ITT data, I was certainly impressed by the 4-second difference between first and second place. In fact, Young receives the honor of enjoying the largest margin of victory in this decade.
A more intriguing question, however, is how Young’s time compares to the overall field. By analyzing each year’s top 50 average, we can determine how the winning time compares to the 50 fastest riders.
Year | Winning Rider | Winning Time | Top 50 Average | Difference |
2011 | Eric Young | 142.00 | 151.94 | 9.94 |
2010 | Eric Young | 142.09 | 150.41 | 8.32 |
2009 | Eric Young | 138.25 | 147.20 | 8.95 |
2008 | Issac Neff | 139.75 | 147.67 | 7.92 |
2007 | Sasha Land | 138.94 | 146.54 | 7.60 |
2006 | Hans Arnesen | 137.68 | 146.80 | 9.12 |
2005 | Hans Arnesen | 135.70 | 146.81 | 11.11 |
2004 | Chris Vargo | 141.49 | 150.03 | 8.54 |
2003 | John Grant | 143.38 | 150.98 | 7.60 |
2002 | Luke Isenbarger | 145.00 | 151.60 | 6.60 |
2001 | Josh Beatty | 149.87 | 159.93 | 10.06 |
2000 | Chris Wojtowich | 144.28 | 154.38 | 10.10 |
Interestingly, Young does not hold the fastest average adjusted time. This honor goes to Hans Arnesen, whose blazing 135.70 was 11 seconds faster than the top 50 average.
This suggests that while Young was undoubtedly the fastest rider last Wednesday, the gap between the fastest rider and the field is not as large as we saw in 2005.
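The Difference column in the ITT table is simply the top 50 average minus the winning time, and the comparison can be reproduced with a short script using the figures from the table:

```python
# (year, winning rider, winning time, top 50 average), copied from the table above.
ITT_WINNERS = [
    (2011, "Eric Young", 142.00, 151.94),
    (2010, "Eric Young", 142.09, 150.41),
    (2009, "Eric Young", 138.25, 147.20),
    (2008, "Issac Neff", 139.75, 147.67),
    (2007, "Sasha Land", 138.94, 146.54),
    (2006, "Hans Arnesen", 137.68, 146.80),
    (2005, "Hans Arnesen", 135.70, 146.81),
    (2004, "Chris Vargo", 141.49, 150.03),
    (2003, "John Grant", 143.38, 150.98),
    (2002, "Luke Isenbarger", 145.00, 151.60),
    (2001, "Josh Beatty", 149.87, 159.93),
    (2000, "Chris Wojtowich", 144.28, 154.38),
]

# Margin over the field = top 50 average minus winning time (seconds).
margins = [
    (year, rider, top50_average - winning_time)
    for year, rider, winning_time, top50_average in ITT_WINNERS
]
ranked = sorted(margins, key=lambda row: row[2], reverse=True)

for place, (year, rider, margin) in enumerate(ranked, start=1):
    print(f"{place:2d}. {year} {rider}: {margin:.2f} s faster than the top 50 average")
```

Sorted this way, Arnesen's 2005 ride tops the list and Young's 2011 ride comes in fourth, behind the 2000 and 2001 winners.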
How much does quals matter?
Year | Pole Team | Race Finish |
2010 | Cutters | 1 |
2009 | Phi Delta Theta | 15 |
2008 | Sigma Alpha Mu | 14 |
2007 | Phi Kappa Psi | 2 |
2006 | Cutters | 5 |
2005 | Phi Kappa Psi | 6 |
2004 | Team Major Taylor | 4 |
2003 | Phi Gamma Delta | 9 |
2002 | Phi Delta Theta | 11 |
2001 | Phi Gamma Delta | 13 |
2000 | Delta Chi | 4 |
Last Year’s Results
Welcome to Project Left Turn, a statistical look at the Little 500. Over the next two weeks, I hope to draw excitement for race day in the nerdiest way possible: by analyzing data and making predictions. Before we get into this year’s data, let’s review what I predicted last year.
Below is a chart of my predictions versus the actual results. Teams in green performed better than prediction while teams in red performed worse.
So how well did I do? A high correlation suggests that there is a strong relationship between the predictions and the actual results. The model, however, did not do a good job of predicting the exact place a team would finish. In fact, Phi Sigma Kappa was the only team whose finishing position was predicted exactly.
However, 11 teams were predicted within one spot of their actual finishing position and 17 teams were within two spots. This suggests the model does a great job of predicting the general area a team will finish.
Below is a chart showing the number of teams who were within a given difference between predicted and actual finishing position.
Special props go to Dodds House for finishing 10 spots better than predicted. But what happened to Acacia? They finished 14 spots below the prediction.