Driver Probability of Win 2009

Schummy · 27 Apr 2009, 18:01 (Ref:2450894)

A relatively crude estimate of the probability of a win in a given future GP. An a priori 5% is assumed for every driver (i.e. no knowledge of past seasons' performances for drivers or teams).

A measure of "success" is calculated as a sort of percentage of wins this season, with two tricks. First, a win counts as 0.8 "wins" and a 2nd place as 0.2 "wins". Second, the "newness" of each GP and the weather conditions are considered: each GP is discounted by a factor of 0.95 for every race that has passed since, and wet races count 50% relative to dry races (they give less info about future results). So the most weighted GP is Bahrain, then Australia, then China, and the last is Malaysia.

Finally, the a priori value and the "season success" are combined to get the "Probability of Win".

Code:

DRIVER INIT   AUS   MAL   CHI   BAH   SS   PoW
                                                           
BUT     5%    0.8   0.8         0.8   66%  41% 
VET     5%                0.8   0.2   21%  14% 
BAR     5%    0.2                      6%   6% 
WEB     5%                0.2          3%   4% 
HEI     5%          0.2                3%   4% 
Another 5%                             0%   2% 
                                                             
Newness       0.86  0.90  0.95  1                            
Weather       1     0.5   0.5   1                           
Weight        0.86  0.45  0.48  1.00

INIT = A priori PoW
SS = Season %Wins (adjusted)
PoW = Estimated Probability of Win

For teams:

Code:

Teams:
BRAWN  46%
RBR    18%
BMW     6%
Another 4%
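
A minimal Python sketch of the calculation described in this post, for anyone who wants to play with the parameters. The 0.95 discount, the 50% wet factor, the 0.8/0.2 credits and the flat 5% prior come from the text. How the prior is weighted against the season results is not stated, so `PRIOR_WEIGHT = 2.0` (the prior counting like two dry, current races) is a guess, and so is summing two drivers to get a team figure. All names are made up for illustration.

```python
DECAY = 0.95        # per-race "newness" discount (from the post)
WET_FACTOR = 0.5    # wet races count half (from the post)
PRIOR = 0.05        # flat a priori probability, INIT (from the post)
PRIOR_WEIGHT = 2.0  # ASSUMPTION: not stated in the post

# Races in calendar order: (name, was_wet)
RACES = [("AUS", False), ("MAL", True), ("CHI", True), ("BAH", False)]

# "Win credits" per driver: 0.8 for a win, 0.2 for a second place
CREDITS = {
    "BUT": {"AUS": 0.8, "MAL": 0.8, "BAH": 0.8},
    "VET": {"CHI": 0.8, "BAH": 0.2},
    "BAR": {"AUS": 0.2},
    "WEB": {"CHI": 0.2},
    "HEI": {"MAL": 0.2},
}


def race_weights(races):
    """Newness (0.95 per race of age) times the weather factor."""
    last = len(races) - 1
    return {
        name: DECAY ** (last - i) * (WET_FACTOR if wet else 1.0)
        for i, (name, wet) in enumerate(races)
    }


def season_success(credits, weights):
    """Weighted share of win credits (the SS column)."""
    earned = sum(credits.get(race, 0.0) * w for race, w in weights.items())
    return earned / sum(weights.values())


def prob_of_win(ss, weights):
    """Weighted average of the flat prior and SS (the PoW column)."""
    total = sum(weights.values())
    return (PRIOR_WEIGHT * PRIOR + total * ss) / (PRIOR_WEIGHT + total)


weights = race_weights(RACES)
pow_by_driver = {
    driver: prob_of_win(season_success(credits, weights), weights)
    for driver, credits in CREDITS.items()
}
# A driver with no top-two finish keeps only the shrunken prior.
pow_other = prob_of_win(0.0, weights)

# ASSUMPTION: a team's probability is the sum of its two drivers.
# KUB (Kubica, Heidfeld's team-mate) has no credits, so he gets pow_other.
TEAMS = {"BRAWN": ("BUT", "BAR"), "RBR": ("VET", "WEB"), "BMW": ("HEI", "KUB")}
pow_by_team = {
    team: sum(pow_by_driver.get(d, pow_other) for d in drivers)
    for team, drivers in TEAMS.items()
}

for name, p in {**pow_by_driver, **pow_by_team}.items():
    print(f"{name:6s} {p:.0%}")
```

With that guess for the prior weight the script prints the same rounded percentages as the two tables above, which is encouraging but does not prove the originals were computed this way.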

Schummy · 27 Apr 2009, 18:09 (Ref:2450901)

A bit more info.

Code:

          pap    pS    pW    nW    fW   ptW  fmin  fmax  tmin  tmax
BUT        5%   66%   41%     3   5.3   8.3     1     8     4    11
VET        5%   21%   14%     1   1.8   2.8     0     4     1     5
BAR        5%    6%    6%     0   0.7   0.7     0     2     0     2
WEB        5%    3%    4%     0   0.5   0.5     0     2     0     2
HEI        5%    3%    4%     0   0.5   0.5     0     2     0     2
Another    5%    0%    2%     0   0.3   0.3     0     1     0     1

Columns mean:
pap = Probability a priori
pS = %Wins in season (adjusted)
pW = Probability of Win
nW = num wins so far
fW = future expected wins in the remainder of season
ptW = total wins expected in the season (nW + fW)
fmin, fmax = Interval (95%) for num of future wins
tmin, tmax = Interval (95%) for num of total wins in the season

Obviously, it is based on the calculations explained in the previous post, so it depends on several arguable parameters.
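
A small Python sketch of how the fW and ptW columns could be obtained. It assumes fW is simply pW times the number of races left, and that 13 races remain (a 17-race calendar minus the 4 already run); neither is stated in the post. The pW and nW inputs are the rounded values from the table, so the output can differ from the table in the last digit. The interval columns are not attempted because the method behind them is not given.

```python
REMAINING_RACES = 13  # ASSUMPTION: 17-race calendar, 4 races already run

# (pW, nW) copied from the table above; pW is already rounded there
DRIVERS = {
    "BUT": (0.41, 3),
    "VET": (0.14, 1),
    "BAR": (0.06, 0),
    "WEB": (0.04, 0),
    "HEI": (0.04, 0),
}


def expected_wins(p_win, wins_so_far, remaining=REMAINING_RACES):
    """Return (future expected wins, total expected wins)."""
    future = p_win * remaining
    return future, wins_so_far + future


for name, (p_win, wins) in DRIVERS.items():
    future, total = expected_wins(p_win, wins)
    print(f"{name}  fW={future:.1f}  ptW={total:.1f}")
```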

Dutton · 28 Apr 2009, 07:17 (Ref:2451306)

Quote:

Originally Posted by Schummy

so it depends on several arguable parameters.

The 5% a priori aspect intrigues me.

I understand what it represents, and your reasons (I think), but I am not sure it is any more useful than just starting things from "nothing"? Clearly, I do not mean starting INIT or AUS as 0, but rather just not having INIT at all (or else in a different form). Everyone goes into AUS with an equal chance of attaining 1 in that instance, whereas creating the imaginary INIT=5% starts things off with a value distortion.

Another, quite possibly clearer, way of putting my issue would be to say that if the INIT=5% is meant to be a rudimentary account for unknowns (which I am assuming it must be?), then isn't setting a specific set value contradictory (thus redundant)? Shouldn't it be a margin of error which straddles (+/- 2.5, as opposed to 5%)? It just seems setting it as a straight INIT=5% makes it a uniform deduction which makes the "unknown unpredictability" into a "known fluctuation"?

Having said all the above, I have no doubt I am missing something your (undoubtedly vastly) superior knowledge-experience base encompasses. An explanation as to where my thinking in the above has gone wrong would be greatly appreciated.

(Win=0.8, 2nd=0.2: I need help... I am sure it all makes sense, but you will have to explain it.)

Marbot · 28 Apr 2009, 07:44 (Ref:2451325)

Interesting! I make all my calculations based on the premise that sh*t happens.

Dutton · 28 Apr 2009, 07:50 (Ref:2451331)

Just missed the 30-minute edit limit!

Well, Martyn, the point is that excrement-happens should have a case-relevant mathematical representation.

Quote:

Originally Posted by Schummy

INIT; SS; PoW; pap; pS; pW; nW; fW;ptW = (nW + fW); fmin, fmax; tmin, tmax

Oh, there have to be some funky equations possible from this lot!

Quote:

Originally Posted by Dutton

Everyone goes into AUS with an equal chance of attaining 1 in that instance, whereas creating the imaginary INIT=5% starts things off with a value distortion.

(Win=0.8, 2nd=0.2: I need help... I am sure it all makes sense, but you will have to explain it.)

Just realised I should have said that I was meaning 1 in the sense of the 0.8. Assuming that where Schummy is saying 0.8 the norm would be 1.

1=0.8 for the purposes of what I was getting at.

Quote:

Originally Posted by Dutton

Shouldn't it be a margin of error which straddles (+/- 2.5, as opposed to 5%)? It just seems setting it as a straight INIT=5% makes it a uniform deduction which makes the "unknown unpredictability" into a "known fluctuation"?

I have just spotted my fudge up in phrasing. I meant to say INIT=5% makes a uniform deduction of an "unknown predictability" into a "known static value", as opposed to a somewhat "known fluctuation".

I am sorry about my imprecise and inelegant manner, Schummy, but I am just an interested layman. It isn't a degree or profession for me, so I am very far from fluent on terminology and expression (and all that).

[Not to mention deficient in knowledge!]

BootsOntheSide · 28 Apr 2009, 09:02 (Ref:2451385)

The obvious problem is that Heidfeld fluked a second place in Malaysia but was hopelessly uncompetitive everywhere else - does he really have a 4% chance of winning the next round? Statistics aren't really much use, especially as teams are modifying their cars so heavily at the moment - McLaren and Ferrari especially have shown progress since the first round, and have more chance of improving their cars as the year goes on than Brawn or Red Bull.

Dutton · 28 Apr 2009, 09:20 (Ref:2451402)

Quote:

Originally Posted by BootsOntheSide

The obvious problem is that Heidfeld fluked a second place in Malaysia but was hopelessly uncompetitive everywhere else - does he really have a 4% chance of winning the next round?

My understanding of this, with respect to your issue, Boots, is that the randomness of a wet result will be reduced over the course of a season. If you combine this with the 50% reduction in relevance already applied to wet races, well, I think it is accounted for as far as reasonably possible for a bit of makeshift forum-fun.

Quote:

Originally Posted by Schummy

Each GP is discounted by a factor of 0.95 for every race that has passed since, and wet races count 50% relative to dry races (they give less info about future results). So the most weighted GP is Bahrain, then Australia, then China, and the last is Malaysia.

The more races that go by, the more that "random/unexpected" results are marginalised.

One must also remember that, as far as I can make out, this is just meant as "number-crunching fun" rather than some accurate predictor. If it were meant to be even in a tiny way "certainly-probably" accurate, then it would enter the realm of vast mathematical arrays, functions, matrices, logs, and so on, of obscene complication.

It is just relatively simplistic number play. Nothing more; nothing less.

Marbot · 28 Apr 2009, 11:11 (Ref:2451481)

Quote:

Originally Posted by Dutton

Well, Martyn, the point is that excrement-happens should have a case-relevant mathematical representation.

Indeed.

I had forgotten to factor in that some drivers that may have gone to the toilet before the race may have in fact been unable to do so because of what they may have previously eaten or not eaten as the case may be. This, of course, may have to a very small degree influenced lap times. We should also factor in the need for drivers to drink almost twice their own body weight in fluids before the start of some races, a phenomenon known as 'taking the ****'. Interesting.... carry on.

crmalcolm · 28 Apr 2009, 11:32 (Ref:2451494)

Quote:

Originally Posted by Marbot

I had forgotten to factor in that some drivers that may have gone to the toilet before the race may have in fact been unable to do so because of what they may have previously eaten or not eaten as the case may be.

So, much like when at the dog track, I should put my money on whoever visits the toilet just before the race!!!!

Seriously, in a number fun kind of way, the predictions through these stats are very intriguing.

Schummy, do you use these in any way in any of the F1 prediction comps? It would be interesting to see how a season pans out using predictions based on these stats. I'm assuming the prediction for the next race would be 1.Button, 2.Vettel, 3.Barrichello, 4.Webber, 5.Heidfeld....

chavez169 · 28 Apr 2009, 13:53 (Ref:2451604)

This just caused some nasty flashbacks to my statistics lectures last year! I will try to make an educated contribution once I have cleaned the foam from my mouth.

Schummy · 29 Apr 2009, 03:02 (Ref:2451937)

Quote:

Originally Posted by Dutton

The 5% a priori aspect intrigues me.

I understand what it represents, and your reasons (I think), but I am not sure it is any more useful than just starting things from "nothing"? Clearly, I do not mean starting INIT or AUS as 0, but rather just not having INIT at all (or else in a different form). Everyone goes into AUS with an equal chance of attaining 1 in that instance, whereas creating the imaginary INIT=5% starts things off with a value distortion.

Another, quite possibly clearer, way of putting my issue would be to say that if the INIT=5% is meant to be a rudimentary account for unknowns (which I am assuming it must be?), then isn't setting a specific set value contradictory (thus redundant)? Shouldn't it be a margin of error which straddles (+/- 2.5, as opposed to 5%)? It just seems setting it as a straight INIT=5% makes it a uniform deduction which makes the "unknown unpredictability" into a "known fluctuation"?

Having said all the above, I have no doubt I am missing something your (undoubtedly vastly) superior knowledge-experience base encompasses. An explanation as to where my thinking in the above has gone wrong would be greatly appreciated.

(Win=0.8, 2nd=0.2: I need help... I am sure it all makes sense, but you will have to explain it.)

Two very interesting points that need further explanation. I know you like numbers, and certainly I like them too.

I'll explain the INIT thing first. Suppose Button wins the first two GPs, so he has 100% wins. If we just use actual results, the probabilities would be Button 100% and everyone else 0%, which is wrong because we don't have 100% certainty of Button winning the 3rd GP.

The Bayesian approach to avoiding this "sampling collapse" (i.e. thinking that whatever has not happened in the sample is impossible) is to combine the sampling info (the actual results) with a priori info (the info one has *before* this sample, i.e. before this season).

A neutral way to assign a priori info is to set the a priori probs equal for everyone (be it Ferrari or Force India, be it Hamilton or Nakajima). That way one's subjective bias doesn't change the calculations. But, apart from being easy, this neutral way (which I used in the table) is hardly the best info available. We *know* Ferrari has a bigger chance of winning than Force India; what we have to do is measure it reasonably and put it in as INIT (which I didn't do, out of laziness).

Anyway, when one has a priori (INIT) info and a posteriori info (the sample, i.e. results), the Bayesian rule is to make a weighted average (with some scary details sometimes) between them. In our instance, Button's prob would be an average between 5% (INIT) and 100% (sample). Any other driver would have an average between 5% (INIT) and 0% (sample).

Last season I think I made a more complex INIT, taking into account former season results for drivers and cars. Nice, but as GPs go by the INIT loses weight in the calculations, so it turns out not to be significant (by the end of the season).
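
The weighted average described above can be illustrated with a short Python sketch. It deliberately ignores the newness and weather discounts and treats every race as weight 1. The prior weight of 2 "pseudo-races" is an assumption chosen for illustration, not a value given in this post, and the function names are made up.

```python
PRIOR = 0.05        # flat a priori probability (INIT)
PRIOR_WEIGHT = 2.0  # ASSUMPTION: prior counts like two races


def blend(sample_rate, n_races, prior=PRIOR, prior_weight=PRIOR_WEIGHT):
    """Weighted average of the prior and the observed win rate."""
    return (prior_weight * prior + n_races * sample_rate) / (prior_weight + n_races)


def prior_share(n_races, prior_weight=PRIOR_WEIGHT):
    """Fraction of the final estimate that still comes from the prior."""
    return prior_weight / (prior_weight + n_races)


# Button wins the first two GPs: 100% in the sample, but not 100% blended.
button = blend(sample_rate=1.0, n_races=2)
# Every other driver: 0% in the sample, but not 0% blended.
other = blend(sample_rate=0.0, n_races=2)
print(f"Button {button:.1%}, any other driver {other:.1%}")

# The prior matters less and less as races accumulate.
for n in (2, 4, 8, 17):
    print(f"after {n:2d} races the prior carries {prior_share(n):.0%} of the weight")
```

Under these assumptions Button comes out well below 100% and everyone else stays above 0%, and the influence of INIT shrinks as the season goes on.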

Sorry, I think I have been rather "lengthish" in my explanation.

Now the 0.8-0.2 rule. When a driver wins a race, it gives some info about his chance of winning again in the future. But if the driver gets a 2nd, it also signals a (lesser) chance of him winning a future race. That's the reason 2nd positions have to be counted as sample data for wins, not just as "zero". A third position maybe also points to a small prob of a future win, but it is small and not very significant (in the past I did some brief calculations about this). So I give 0.2 to a 2nd and thus I have to give 0.8 to a win (to keep each race = 1), i.e. a 2nd is considered four times weaker than a 1st as a sign of a future win.
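
As a tiny Python illustration of that rule: if a win is taken to be `ratio` times as informative as a 2nd place and the two credits must add up to one race, the split follows directly. The function names are hypothetical; only the 4:1 ratio and the zero credit for 3rd and below come from the explanation above.

```python
def win_credits(ratio):
    """Split one race between 1st and 2nd so that 1st is worth `ratio` times 2nd."""
    second = 1.0 / (ratio + 1.0)
    return {1: ratio * second, 2: second}


def credit_for(position, credits):
    """Credit for a finishing position; 3rd and below count as zero here."""
    return credits.get(position, 0.0)


credits = win_credits(4.0)
print(credits)                 # 0.8 for a win, 0.2 for a 2nd place
print(credit_for(3, credits))  # 0.0
```

Changing `ratio` is an easy way to see how sensitive the table is to this particular choice.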

(Why can't I write more concise(ly)?)

Schummy · 29 Apr 2009, 03:18 (Ref:2451938)

Quote:

Originally Posted by crmalcolm

Schummy, do you use these in any way in any of the F1 prediction comps? It would be interesting to see how a season pans out using predictions based on these stats. I'm assuming the prediction for the next race would be 1.Button, 2.Vettel, 3.Barrichello, 4.Webber, 5.Heidfeld....

Just as boxers are specifically forbidden to fight in the streets, I am banned from F1 games. No, it's a joke, obviously.

You can ask my fellow mates in my "resident" forum (bikes) about my (lack of) performance in the guesses. OK, I do relatively well, but others do better. In the long term calculations surely help, but over just a few rounds (we use 8 rounds) random events are too big to guarantee a win in the games. That's the funny thing!

To put it in stock market terms: if you have 100 investors, 99 of them without particular knowledge and 1 expert with massive knowledge, it is unlikely that the expert ends the year in exactly first position for profits. In the mentioned case of the bike forum, I can tell you there are some experts, with or without calculations!
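
The investor analogy can be tried out with a toy Monte Carlo simulation in Python. Everything numeric here is invented for illustration: yearly results are drawn from a normal distribution with standard deviation 1, the 99 ordinary investors have mean 0, and the expert's `edge` of 0.5 is an arbitrary stand-in for "massive knowledge".

```python
import random


def expert_first_rate(n_investors=100, edge=0.5, trials=2000, seed=1):
    """Fraction of simulated years in which the single expert finishes first."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    firsts = 0
    for _ in range(trials):
        expert = rng.gauss(edge, 1.0)
        best_other = max(rng.gauss(0.0, 1.0) for _ in range(n_investors - 1))
        if expert > best_other:
            firsts += 1
    return firsts / trials


rate = expert_first_rate()
print(f"expert finishes first in {rate:.1%} of simulated years")
```

With these made-up numbers the expert tops the table in only a small minority of years, even though his average result is better than everyone else's. Raising `edge` or shrinking the field changes that quickly.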
