Initial Evaluation of Eliteserien Prediction Models

There are a few prediction models available for the 2017 season of Eliteserien, and since five rounds have been played it is time to do an initial evaluation to see how these models are doing.

The models I’ll be comparing are our own ELO-model, Kroneball’s R.O.N.N.Y., and the betting odds of Betsson. Yes, betting odds are predictions, even though their main goal is to make money and not to guess the correct outcome. To go from odds to probability, I divide 1 by the odds for each of the outcomes of a match to get a raw probability. These usually add up to more than 1 (the house needs a cut), so to get the model probability I divide the raw probability of each outcome by the sum of the raw probabilities for the match. (For instance, in the match between Tromsø and Stabæk the odds were 1.88, 3.6 and 4.1. This translates into raw probabilities of 0.532, 0.278 and 0.244, which total 1.054. Dividing by that total gives model probabilities of 0.505 for a Tromsø win, 0.264 for a draw and 0.231 for a Stabæk win. For comparison, the ELO-model had 0.42, 0.267 and 0.313, while R.O.N.N.Y. predicted 0.4254, 0.3704 and 0.2043.)
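The odds conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used for the evaluation; the function name `odds_to_probabilities` is made up for this example.

```python
def odds_to_probabilities(odds):
    """Turn decimal odds for one match into probabilities that sum to 1.

    `odds` is a list like [home, draw, away]. The name and signature are
    illustrative assumptions, not taken from any particular library.
    """
    raw = [1.0 / o for o in odds]        # raw probabilities, sum > 1
    overround = sum(raw)                 # the bookmaker's margin is the excess over 1
    return [p / overround for p in raw]  # normalize so the three sum to 1


# Tromsø vs. Stabæk, odds quoted in the text
probs = odds_to_probabilities([1.88, 3.6, 4.1])
print([round(p, 3) for p in probs])  # [0.505, 0.264, 0.231]
```

Dividing by the sum spreads the bookmaker's margin proportionally over the three outcomes, which is the simplest of several possible ways to remove it.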

Before I dive into the evaluation, I’ll compare and contrast the models a bit and then do a summary of the season. As described, the betting odds aren’t mainly trying to predict the match outcomes, but in order to make money they have to predict fairly accurately. The Analytic Minds ELO-model (AMELO) uses past data about wins, draws and losses to predict the outcomes. As each game is played, the model updates with the new information and thus responds to the changes that have occurred so far in the season. R.O.N.N.Y. uses more advanced statistics (and frankly, better data), basing its predictions on expected goals (xG). However, it has used last season’s data to make the predictions so far, which means it has not been able to respond to Stabæk’s good season opener, for instance. That model also generally predicts a higher chance of draws than the other two models, and higher than history would suggest. (In the 40 matches so far, the average draw probability of R.O.N.N.Y. is 31.0%, compared to Betsson’s 25.2% and AMELO’s 24.8%. Over the past three seasons, 23.8% of the games have ended in draws, while 46.7% were won by the home team and 29.6% by the away team.)

So far this season, 26 games have been won by the home team, 9 by the away team, and only 5 games have ended in a draw (65%, 22.5% and 12.5%). That’s a major shift towards the home team, particularly from draws, and is likely to influence this evaluation. I suppose we can expect a reversion to the mean over the season as a whole, so things are likely to balance out a bit later.

Now for the evaluation itself. I’ll be using two approaches: the Ranked Probability Score (RPS, as described in “Solving the problem of inadequate scoring rules for assessing probabilistic football forecast models” by Constantinou and Fenton) and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. RPS basically measures how close your predictions were, and punishes the model more for predicting a home win than a draw if the game was won by the away team. Say one model predicted a 50% chance of a home win, 20% of a draw and 30% of an away win, while a second had 20%, 50% and 30%. If the away team won, the second model will score better because it put more weight on the draw, which is closer to an away win than a home win is. (The first model would have an RPS of 0.37, while the second would have 0.265. The closer to 0, the better the model.) The ROC basically ranks the predictions and measures how well the model identifies the correct outcome. This creates a curve, and the larger the area under that curve, the better the model is at distinguishing between high and low probability events. Basically, a 70% probability event should happen more often than a 50% event, which in turn should happen more often than a 20% probability event.
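To make the RPS example concrete, here is a minimal sketch of the score for a single match, following the cumulative form used by Constantinou and Fenton. The function name and the convention of passing the outcome as an index (0 = home, 1 = draw, 2 = away) are assumptions made for this illustration.

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match with ordered outcomes (home, draw, away).

    `probs` are the predicted probabilities in that order and
    `outcome_index` is the position of the actual result. Both the name
    and the argument convention are illustrative.
    """
    cum_pred = 0.0  # cumulative predicted probability
    cum_obs = 0.0   # cumulative observed outcome (0 until the result, then 1)
    total = 0.0
    # The last cumulative term is always 1 - 1 = 0, so it is skipped.
    for i in range(len(probs) - 1):
        cum_pred += probs[i]
        cum_obs += 1.0 if i == outcome_index else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (len(probs) - 1)


# The two models from the text, with an away win (index 2)
print(round(ranked_probability_score([0.5, 0.2, 0.3], 2), 3))  # 0.37
print(round(ranked_probability_score([0.2, 0.5, 0.3], 2), 3))  # 0.265
```

Because the score works on cumulative probabilities, moving weight from the home win to the draw lowers the first squared term while leaving the second unchanged, which is why the second model comes out ahead.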

First up is the RPS evaluation. In the table below I have listed the average RPS per round, as well as the average of all matches so far. In addition, I’ve listed the game outcomes of each round.

[Table: RPS evaluation after 5 rounds]

As can be seen, Betsson “won” the first two rounds and R.O.N.N.Y. the third, while AMELO has edged out the last two rounds and has the lowest average so far, just a tick ahead of Betsson. The difference between AMELO and R.O.N.N.Y. can mostly be explained by R.O.N.N.Y.’s higher draw probability combined with remarkably few upsets. (Leaving out the drawn games, AMELO’s favorite has won 28 of the 35 games (80%).) This gives a double hit in the evaluation: less probability is allocated to the favorite when the draw probability is higher, and since there are few upsets the increased draw probability doesn’t help cushion the score. The cushioning effect is evident in round 3, which contained 3 upsets according to AMELO (4 according to Betsson and R.O.N.N.Y.) as well as a draw. That makes for a high RPS all around, but a lower one for the model that assigns higher chances to draws. AMELO has also benefited from being dynamic and picking up Stabæk’s and Sarpsborg 08’s good form early in the season, which R.O.N.N.Y. so far hasn’t. It will be interesting to revisit this after round 10 and see how it stacks up then.

Now for the ROC. This is another place where we can see that there have been few upsets. The quicker the graph rises, the better the model’s most confident predictions have hit. Since AMELO has a more or less fixed draw rate, its draw predictions can be identified by the long diagonal lines.

[Figure: ROC after 5 rounds]

AMELO (AUC of 0.813) and Betsson (0.786) follow each other closely here as well, while R.O.N.N.Y. (0.658) lags a bit behind. This is again due to the relatively high draw prediction rate, which makes its curve level off sooner than the others. For comparison, the highest draw probability of AMELO is 26.7%. AMELO has made 49 predictions with a higher value, of which 32 have occurred while 17 have failed. Since there have been 40 correct outcomes and 80 misses, that means AMELO has made 80% of the correct predictions and only 21.25% of the misses by the time it starts looking at the draws. Again, as the outcomes (probably) revert to the mean, these unusually high AUCs will decrease and R.O.N.N.Y. will be closer to the rest. It’s also worth noting that all three models missed with their highest assigned probability of the season, which was Rosenborg winning at home against Aalesund. (AMELO assigned 65.5%, Betsson 79.8%, while R.O.N.N.Y. had 81.4%.)
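As a rough sketch of the ROC reasoning, the snippet below pools every prediction (three per match) and labels each one 1 if that outcome occurred and 0 otherwise. The AUC is then the share of (hit, miss) pairs in which the hit was given the higher probability, with ties counting half. The function name `pooled_auc` and the toy input (the two example models from the RPS section, treated as two matches won by the away team) are assumptions for illustration only, not the data behind the figures above.

```python
def pooled_auc(scores, labels):
    """AUC as the share of (hit, miss) pairs ranked correctly.

    `scores` are predicted probabilities and `labels` are 1 for outcomes
    that occurred, 0 otherwise. Ties count as half a correct pair.
    Hypothetical helper written for this example.
    """
    hits = [s for s, y in zip(scores, labels) if y == 1]
    misses = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for h in hits:
        for m in misses:
            if h > m:
                correct += 1.0
            elif h == m:
                correct += 0.5
    return correct / (len(hits) * len(misses))


# Toy input: (home, draw, away) for two matches, both won by the away team
scores = [0.5, 0.2, 0.3, 0.2, 0.5, 0.3]
labels = [0, 0, 1, 0, 0, 1]
print(pooled_auc(scores, labels))  # 0.5

# The single ROC point quoted in the text for AMELO, just above its
# highest draw probability: 32 of 40 hits and 17 of 80 misses.
true_positive_rate = 32 / 40   # 0.8
false_positive_rate = 17 / 80  # 0.2125
print(true_positive_rate, false_positive_rate)
```

With three predictions per match, exactly one of which can occur, 40 matches always yield 40 hits and 80 misses, which is where the denominators in the text come from.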

In conclusion, the season opener has been good for AMELO and its ability to react and adapt relatively quickly to how the season progresses. However, the season opener has featured few draws and few upsets, which is probably not going to continue. It will be interesting to see how R.O.N.N.Y. does in the next few rounds once this season’s data is incorporated, both compared to the other models and against its own performance in the first rounds.

Brann and Viking kick off round 6 on Saturday, and here are AMELO’s projections for that round:

[Table: AMELO projections for round 6]
