# The Statistical Case for Global Rankings



## macky (Aug 14, 2010)

Read Ravi's post. Then read my post, reproduced below. This deserves a separate discussion, and, in my opinion, is a better solution than changing the competition format.



Performing when it counts is of course important. The problem with comparisons to other sports is that speedcubing inherently involves chance. I don't know how much of a top cuber's time variation is from pure chance, but it's considerable. Even at their very best, a top CFOPer's times will vary by more than 0.5 second just from the difficulty of last layer cases--and we aren't even considering F2L. With the continuing clustering of times near the top, this variation is starting to become enough to significantly affect the rankings in a competition.

I thought about all this back in WC07 and with my prayer before WC09. Because of complications such as those mentioned in this thread, I'm fine having the winner of the last round be the winner of the competition. Or if an organizer feels strongly, he can use the fastest average or the mean of averages. In any case, there's no rule change needed.

The winner of a competition is the cuber who performed "the best" (in a predefined sense) at that competition, not necessarily the best cuber. I therefore don't see too much need to change the competition format so that the best cuber has a better chance of winning the competition.* What I've suggested instead since 2007 is an official yearly overall ranking (see this thread for an example).

* But I'm sure people will soon be needing my prayer more often!


----------



## Faz (Aug 16, 2010)

So, you're suggesting having a yearly overall ranking for each event, WCA recognised, like what JustinJ did in his thread? So, you would take all of the cuber's averages, and determine the mean. Eg: "The 2010 champion for pyraminx is Yohei Oka - 4.00 average time"

Sounds like a cool idea. Maybe I just misread the thread totally, but it sounds like this is what you're proposing.


----------



## RCTACameron (Aug 16, 2010)

Judging by JustinJ's thread, Faz would probably be the first in everything. 

If someone only went to one competition in a year, and had a lucky average, then luck would still play a big part in the results.

However, I think this is a great idea, and it would be interesting to see who comes out on top. I hope that the WCA does this.


----------



## macky (Aug 16, 2010)

RCTACameron said:


> If someone only went to one competition in a year, and had a lucky average, then luck would still play a big part in the results.


We'd come up with a minimum number of competitions to avoid a situation like this.

For now, though, I want to hear what people think about having such a ranking without worrying too much about the implementation. Average of 5 is great for determining rankings within a competition, and so a world ranking by average of 5 is great for comparing best competition performances. But many people agree that average of 5 isn't a satisfactory measure of a cuber's true ability. This is where a ranking like this would come in. Some questions:
- (Would such a ranking be useful/interesting? Well yeah.)
- Is it worth being displayed on the WCA database as a separate official ranking?
- Is it just as important as the best average of 5 ranking?
- Is it important enough to warrant giving some title based on it? On an annual basis?
- (Would it be hard to agree on a consistent ranking method? Are there other practicality considerations?)


----------



## ExoCorsair (Aug 16, 2010)

Wouldn't a combined average of all past competition results discourage newer cubers from participating, though (at least until they got faster)?


----------



## macky (Aug 17, 2010)

ExoCorsair said:


> Wouldn't a combined average of all past competition results discourage newer cubers from participating, though (at least until they got faster)?


Only use results from the last X competitions or from the last Y months. As long as the competitor is improving, the global average decreases with more results using either scheme. Using last X competitions would discourage cubers who are out of practice and can't match their last official results from competing again unless they improve. So we'd need to use some combination of number of competitions (with a minimum) and time period.
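As a sketch of the two windowing schemes (a minimal Python illustration; the record format, and the particular X and Y values, are hypothetical placeholders, not proposals):

```python
from datetime import date, timedelta

# Hypothetical records: (competition_date, average_in_seconds).
# last_x_comps and last_y_months stand in for the X and Y in the post.
def global_average(results, today, last_x_comps=8, last_y_months=24):
    """Mean of averages from the competitor's last X competitions,
    keeping only results no older than Y months (30-day months)."""
    cutoff = today - timedelta(days=30 * last_y_months)
    recent = sorted((r for r in results if r[0] >= cutoff), reverse=True)
    window = recent[:last_x_comps]  # newest X results inside the window
    if not window:
        return None  # too few recent results to rank
    return sum(avg for _, avg in window) / len(window)

results = [(date(2009, 5, 1), 13.2), (date(2010, 2, 1), 12.4),
           (date(2010, 7, 1), 11.9)]
print(global_average(results, today=date(2010, 8, 17)))  # mean of all three
```

An improving cuber's newest results displace their slowest old ones under either cutoff, so the value only moves down as they compete more.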

But these things are mere details. I want more feedback on the main idea.


----------



## hawkmp4 (Aug 17, 2010)

How different would the highest ranks of this new system be from the all-time list? For example, 30 of the top 31 3x3 average results were set in 2010, as were 45 of the top 50 and 78 of the top 100. Only 3 of the top 100 were from 2008, and none before then. So it's likely that some of the 2009 results are still within a calendar year. I like the idea, though, especially for the people who aren't in the highest tier of cubing.

EDIT: I think I misunderstood...feel free to ignore this post. I'm going to reread things.


----------



## PhillipEspinoza (Aug 17, 2010)

This kinda info would be useful to know, but why can't we just put it under the Statistics Page? You know, kinda like how they have the BLD Success Streak with a minimum of 5 solves.


----------



## macky (Aug 17, 2010)

PhillipEspinoza said:


> This kinda info would be useful to know, but why can't we just put it under the Statistics Page? You know, kinda like how they have the BLD Success Streak with a minimum of 5 solves.



That's perfectly possible. It depends on how important you think this type of ranking is compared to the best competition result (i.e. avg of 5) ranking.


----------



## hawkmp4 (Aug 17, 2010)

I'd be interested in seeing something like the mean of the last x averages in competition, within the last y months. That'd give a more accurate picture of the current cubing community. The 'last y months' part would knock inactive cubers (e.g., Nakajima) out and the 'last x averages' would reward consistent cubers over ones who get lucky on an average. I guess x and y would just have to be played with to see what gives a good result.


----------



## Ravi (Aug 17, 2010)

I'm not sure that directly averaging results is the best way to do it, since that could potentially discourage people from competing. I find it fairly plausible that, for example, on December 27, 20xx, Speedcuber A may have a yearly average of 9.93 seconds compared to 10.04 for Speedcuber B. Speedcuber A would then have two compelling reasons not to go to the Geographical Region C Open on December 28, 20xx: staying ahead of Speedcuber B and staying under 10 seconds. A better scheme, IMO, would never penalize people for competing more. At the same time, though, we don't want to give a massive advantage to someone who goes to twenty competitions a year compared to someone who only goes to one or two but is as fast or faster. Here's one scheme (intended more as an example than anything else):

For any average of x seconds, assign a value of 10^-x points. Add up these values over a one-year period. Then, to make the numbers more life-sized, take the negative log (to base 10) of the total. Thus, ten 12s would count as equal to one 11. This may (or may not) work decently among the top few competitors, but I don't think this exact system would work particularly well as a whole, because I don't think a 37-second average is worth a hundred million times more than a 45-second one. Feel free to refine.
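A minimal sketch of this scoring rule, using the exact 10^-x exponential from the post:

```python
import math

def yearly_score(averages):
    """Ravi's example: an average of x seconds is worth 10^-x points.
    Sum the points over the year, then take -log10 to bring the total
    back to time-like ('life-sized') units."""
    total = sum(10.0 ** -a for a in averages)
    return -math.log10(total)

# Ten 12-second averages count the same as one 11-second average:
print(yearly_score([12.0] * 10))  # ~11.0
print(yearly_score([11.0]))       # ~11.0
```

It also makes the objection concrete: under this rule a 37-second average is worth 10^8 (a hundred million) times the points of a 45-second one.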

I do like the idea in principle though, and I think something of the sort should be implemented even if the competition format is changed (which looks unlikely at the moment). Point to consider: Is there any reason to do this ranking only on a yearly basis? I think it would make a lot of sense to have a continual year-to-date ranking (as many sports already do) with awards, if any, given at the end of each calendar year.

Another idea I really like: a formula to objectively compare historical speedcubing results, with extra value to results achieved earlier. For example, which is more impressive: Macky's 14.52 average at Caltech Fall '04 or faz's 8.52 average at New Zealand Champs '10? Furthermore, which is more impressive: Macky's 14.52, 15.28, 15.38, and 15.68 topping the "until 2004" average list, or faz's 8.52, 9.21, 9.37, 9.38, and 9.82 topping today's average list? [Based on the trajectory of my thread, I think I should point out that these last questions are rhetorical and NOT INTENDED TO START A FLAME WAR between Mackyites and fazz0rs.]


----------



## macky (Aug 18, 2010)

Ravi said:


> For any average of x seconds, assign a value of 10^-x points. Add up these values over a one-year period. Then, to make the numbers more life-sized, take the negative log (to base 10) of the total. Thus, ten 12s would count as equal to one 11. This may (or may not) work decently among the top few competitors, but I don't think this exact system would work particularly well as a whole, because I don't think a 37-second average is worth a hundred million times more than a 45-second one. Feel free to refine.


I do like the idea of a point system. Let me think more.



Ravi said:


> Point to consider: Is there any reason to do this ranking only on a yearly basis? I think it would make a lot of sense to have a continual year-to-date ranking (as many sports already do) with awards, if any, given at the end of each calendar year.


That sounds good.



Ravi said:


> Another idea I really like: a formula to objectively compare historical speedcubing results, with extra value to results achieved earlier. For example, which is more impressive: Macky's 14.52 average at Caltech Fall '04 or faz's 8.52 average at New Zealand Champs '10? Furthermore, which is more impressive: Macky's 14.52, 15.28, 15.38, and 15.68 topping the "until 2004" average list, or faz's 8.52, 9.21, 9.37, 9.38, and 9.82 topping today's average list?


How about the multiple of SD away from the mean of top 100 averages (best avg of 5 of the top 100 people, not results)? This gives a sense of impressiveness as perceived by the community at the time. Note that this accounts for both the increasing difficulty of improvement and the greater talent pool today, since both factors lead to a lower SD. I can't think of any other measure that would correctly account for these factors now and in the future. In any case, this type of data belongs on the Statistics page.

[edit] I went ahead and calculated this (I thank emacs's keyboard macros).

Not enough data in 2004.

*Top 100 up to 2005*
Mean = 19.6730
SD = 2.3949
mean - macky (14.52) = 5.1530
(mean - macky) / SD = 2.1517


*Top 100 up to 2006*
Mean = 17.0077
SD = 1.5912
mean - Anssi (13.22) = 3.7877
(mean - Anssi) / SD = 2.3804

(My 13.34 here gives 2.2421 SD, more than my 14.52.)


*Top 100 up to 2007*
Mean = 14.8121
SD = 1.2073
mean - Gungz (11.76) = 3.0521
(mean - Gungz) / SD = 2.5280


*Top 102 up to 2008 (because of ties)*
Mean = 13.4360
SD = 0.9570
mean - Nakaji (11.28) = 2.1560
(mean - Nakaji) / SD = 2.2529


*Top 100 up to 2009*
Mean = 12.2808
SD = 0.7329
mean - Tomasz (10.07) = 2.2108
(mean - Tomasz) / SD = 3.0165


*Current top 101*
Mean = 11.4573
SD = 0.7042
mean - faz (8.52) = 2.9373
(mean - faz) / SD = 4.1711



As I suspected, my 14.52 is farther from the top-100 mean of its time than faz's 8.52 is from today's mean, but the massive clustering in today's ranking means that his average is farther away in terms of SD, and so more impressive by this measure--as it should be, in view of today's larger talent pool and faster times!

This analysis can of course also be done on a global ranking and would produce different results (e.g. Gungz didn't have many more chances to compete because of his military service), but this alone is interesting because the fastest average of 5 is what top speedcubers have always aimed at. For me there's some discrepancy between the SD distance and the degree of "holy s.." reaction, especially with Tomasz's 10.07, but this is probably because I was desensitized to seeing these kinds of times by then.

The shock factor shouldn't necessarily correspond to a measure like this, but are there other SD-like values that would be appropriate here?
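The measure used in the tables above boils down to a z-score. A sketch (the post doesn't say whether the SD is the sample or the population flavor, so the choice below is an assumption):

```python
import statistics

def sd_distance(top_averages, best):
    """Number of standard deviations the best average sits below the
    mean of the top-100 list of its era."""
    mean = statistics.mean(top_averages)
    # population SD assumed; statistics.stdev (sample SD) is the other option
    sd = statistics.pstdev(top_averages)
    return (mean - best) / sd
```

Fed the actual top-100 list up to 2005, this should reproduce the ~2.15 figure for the 14.52 shown above; the tiny example in the test below just checks the arithmetic.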


----------



## macky (Aug 19, 2010)

This is based on something Tyson pointed out to me. Can someone who knows statistics well help me out here?

For SD to work, should we take a bigger sample size to account for the bigger talent pool? Maybe top 1% of the ranking instead of a fixed number? Do we need to analyze the distribution of times to find an appropriate sample size to take? Can we do this in some non-arbitrary way? Are there other measures like SD that are appropriate here?


----------



## qqwref (Aug 19, 2010)

Ravi said:


> A better scheme, IMO, would never penalize people for competing more.


Although this sounds on the surface like a good criterion, I don't think it necessarily is. If you don't want to penalize people for competing more, you simply cannot do any kind of *global* average of their times - it is always a possibility that in the next competition a cuber will do badly and make their average worse. Or, even worse, take this hypothetical: a cuber goes to 20 competitions in the year, each with one round of 3x3 (haha). So his average average will be the mean of those 20 averages, or something similar. But suppose this same cuber, just by luck, missed the 10 competitions in which he did the worst (and performed the same as before in the 10 he attended). So now his average average is better, assuming 10 competitions is enough. Of course, this cuber couldn't know that if he had attended those other 10 competitions as well his statistics would look worse.

I do have an idea of a statistic that would at least always make it worthwhile to attend another competition: for a given year, the ranking would be by the best (rolling) avg5 of consecutive averages, with the caveat that (a) a majority of those averages are in the correct year, and (b) all the averages are within one year of that year. (You could change it to make all averages within the year, but I think this could overly penalize people who don't attend a huge number of competitions. This way, going to one large competition a year would guarantee you at least a spot in the ranking.) Anyway, that way going to the next competition couldn't hurt your best avg-avg, because you have already completed it.
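A sketch of MIYAA under one reading of the constraints (the (date, average) records are hypothetical, and "within one year" is interpreted here as within one calendar year of the target year):

```python
from datetime import date

def miyaa(results, year):
    """Best mean of 5 consecutive official averages such that (a) at
    least 3 of the 5 fall in `year` and (b) all 5 fall within one
    calendar year of it. Returns None if the cuber never qualifies."""
    results = sorted(results)  # chronological order of the cuber's averages
    best = None
    for i in range(len(results) - 4):
        window = results[i:i + 5]
        majority = sum(d.year == year for d, _ in window) >= 3
        near = all(abs(d.year - year) <= 1 for d, _ in window)
        if majority and near:
            mean = sum(a for _, a in window) / 5.0
            if best is None or mean < best:
                best = mean
    return best
```

Because the rolling window only ever adds new candidate spans, completing another average can improve the best value but never worsen it, which is the property argued for above.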



macky said:


> This is based on something Tyson pointed out to me. For SD to work, should we take a bigger sample size to account for the bigger talent pool? Maybe top 1% of the ranking instead of a fixed number? Can someone who knows statistics help me out here?


You should take a bigger sample size. The top 100 in 2006 was much more spread out than it is now, simply because there were fewer people with world-class skill; the top 100 is much more tightly packed now. I think using a fixed percentage of the total cuber pool might help, but it's not perfect, since ideally you'd only want to include people you'd describe as reasonably close to world-class.


----------



## Ravi (Aug 19, 2010)

qqwref said:


> If you don't want to penalize people for competing more, you simply cannot do any kind of *global* average of their times - it is always a possibility that in the next competition a cuber will do badly and make their average worse.



What's your view on a point system? That would avoid the problem you mentioned, since it's a global sum rather than an average (compare: ATP race in tennis, or http://primes.utm.edu/bios/top20.php). One objection against it would be that it might place too much value on the number of competitions one attends rather than the quality of one's results; there may or may not be a good way to fix this. (Appropriately chosen polynomials _might_ work instead of the exponential I gave.) I think it would be good, though, to rank everyone who's completed at least one average within the year. This would be quite inclusive: all of the top 56* speedcubers by best average have competed at least once this year, although some (Gabriel Dechichi Barbar, Nakajima, Gungz, Shinichiro Sato, and Takumi Yoshida for example) have competed only once. Some kind of point system seemed like the simplest way to reward people for competing more without actually setting a quota (other than one average per year).

Of course, your idea, or something like it, could work too.



* ... and I'm 57th. Can I play the "competition-starved Midwesterner" card?


----------



## qqwref (Aug 20, 2010)

Ravi said:


> qqwref said:
> 
> 
> > If you don't want to penalize people for competing more, you simply cannot do any kind of *global* average of their times - it is always a possibility that in the next competition a cuber will do badly and make their average worse.
> ...


I think this is a pretty major objection and removes most of the value of a standard points system (i.e. any in which you accumulate points at each competition). If two cubers are exactly equal, wouldn't the winner essentially be the one who has been to the most comps in the year? The "most medals in 3x3" part of the statistics is like this and is clearly biased toward cubers who attend a ton of comps. For winning to be possible with my system (let's give it a name for ease of discussion, perhaps majority-in-year average-average or MIYAA) you only need to do three 3x3 rounds a year; doing more competitions helps, but only in the sense that you have more chances to get a good result. I don't think MIYAA would have much bias against very good cubers who only go to one or two comps in a year.


----------



## reThinking the Cube (Aug 20, 2010)

Mean of weighted results for the previous 100 weeks.

S = results, (can be either 5-avg or single solves)

N = #of weeks ago this result was entered.

For each result S, create (100-N) occurrences in the (for computational purposes only) data set of results[R].

Use the mean of R for your ranking value.

(qq) can probably explain it better than me.


----------



## keemy (Aug 20, 2010)

reThinking:
Not to say you are wrong, but one of the things I feel (and I think others do too) is that going to more competitions should not overly negatively affect your ranking. Also, going to only one competition, getting an amazing result, and then not competing for a long time should not give you a great advantage. Unless I have misunderstood your proposal, your system does both of these things, which makes it undesirable.

Simplified, what you want (if I read correctly) is

\( (\sum_{n=0}^{99} (100-n)(a_n))/(\sum_{n=0}^{99} (100-n)(R_n)) \)

where \( n= \) number of weeks ago, \( a_n= \) sum of results n weeks ago, \( R_n= \) number of results n weeks ago.
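keemy's condensed formula as a direct sketch, where `weekly_results[n]` holds the list of results entered n weeks ago (for n = 0..99):

```python
def weighted_recency_mean(weekly_results):
    """reThinking's scheme as keemy condensed it: each result entered n
    weeks ago is counted (100 - n) times, and the ranking value is the
    mean of that expanded data set, i.e.
    sum_n (100 - n) * a_n  /  sum_n (100 - n) * R_n."""
    num = sum((100 - n) * sum(week) for n, week in enumerate(weekly_results))
    den = sum((100 - n) * len(week) for n, week in enumerate(weekly_results))
    return num / den if den else None
```

Note how slowly old results decay: an average from last week carries weight 99 versus 100 for one from this week, which is why a single amazing early result can dominate for a long time.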


----------



## reThinking the Cube (Aug 20, 2010)

keemy said:


> reThinking:
> Not to say you are wrong, but one of the things I feel (and I think others do too) is that going to more competitions should not overly negatively affect your ranking. Also, going to only one competition, getting an amazing result, and then not competing for a long time should not give you a great advantage. Unless I have misunderstood your proposal, your system does both of these things, which makes it undesirable.
> 
> Simplified, what you want (if I read correctly) is
> ...



Hmmm. How to solve the problem of - "going to only 1 competition getting an amazing result and not competing for a long time should not give you a great advantage."?

Result based on X# of solves? Maybe this could be made to work, but I can see your point. 

Here is another idea:
http://en.wikipedia.org/wiki/Elo_rating_system


----------



## hawkmp4 (Aug 20, 2010)

I don't see how ELO rankings could be used in a situation where we don't have head to head matches.


----------



## keemy (Aug 20, 2010)

I thought about the ELO actually, but I don't think it would work very well for cubing. To put it simply, there is a very large location bias (e.g. faz, who lives in Australia, has basically no competitors remotely close to him in 3x3 and would not be able to make his rating very high).

and I don't think your method can be salvaged without drastic change such that it would be easier to think of something else entirely.

So far the thing I like the most is qq's. Ravi's is alright, but it doesn't scale very well, and someone could get a decently large advantage by going to enough competitions (e.g. if A averages 13 and B averages 12, but A just goes to enough competitions to get 10 times as many solves as B, then A will be ranked better).


----------



## Ravi (Aug 20, 2010)

qqwref said:


> I think this is a pretty major objection and removes most of the value of a standard points system (i.e. any one where you accumulate points in each competition).



Well, yes, if we're trying to make this totally independent of the number of comps attended. I guess a point system would be more a measure of "prolificness," so to speak. But perhaps it could be adapted by dividing the total score by the number of averages completed before converting back to units of time. That would give a sort of weighted average favoring the faster times. Do you think that's desirable at all? It would be rather slanted, but at least it wouldn't have as strong a tendency to discourage people from competing.
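A sketch of this normalized variant, reusing the hypothetical 10^-x scoring and dividing by the count before converting back:

```python
import math

def weighted_average_time(averages):
    """Point total divided by the number of averages completed, then
    converted back to seconds. Fast averages dominate (a softmin of
    sorts), but piling on more solves no longer raises the score by
    itself."""
    total = sum(10.0 ** -a for a in averages)
    return -math.log10(total / len(averages))
```

Ten 12-second averages now score 12.0 rather than 11.0, so attendance alone no longer helps; only genuinely faster averages move the number. The slant Ravi mentions is still there: a single fast solve pulls the value far more than an arithmetic mean would.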


----------



## qqwref (Aug 21, 2010)

reThinking the Cube said:


> Hmmm. How to solve the problem of - "going to only 1 competition getting an amazing result and not competing for a long time should not give you a great advantage."?


Either you have to create a bias towards people who attend more competitions, or you have to (implicitly or explicitly) require someone to have a certain number of results to ensure a place in the ranking.



reThinking the Cube said:


> Here is another idea:
> http://en.wikipedia.org/wiki/Elo_rating_system


A system like this is not applicable to cubing because times are unaffected by the skill of the other people in the round. You could only really apply it to in-round/in-competition placements, and as keemy correctly pointed out you would still have many problems associated with people not leaving their region. Besides, why even consider such a subjective and opaque system when we can already objectively measure each solve to (near-)centisecond accuracy?



Ravi said:


> I guess a point system would be more a measure of "prolificness," so to speak. But perhaps it could be adapted by dividing the total score by the number of averages completed before converting back to units of time. That would give a sort of weighted average favoring the faster times.


Could be good, but it is dangerous to try to compromise between favoring those who attend few comps and those who attend many, because finding a balance that doesn't favor either side would be difficult. You'd also have to be careful to keep the measure simple: using a statistic that is not computed in an obvious way leads to the community waiting for each new published ranking rather than actively trying to optimize their result (see: ELO), and that's boring.


----------



## Faz (Aug 21, 2010)

I highly doubt that someone would not attend any competitions because they did really well in one, just to get their name on the stats page >_>


----------



## reThinking the Cube (Aug 21, 2010)

hawkmp4 said:


> I don't see how ELO rankings could be used in a situation where we don't have head to head matches.



Here is the standard ELO rating formula: 

\( R_{new} = R_{old} + K(T_{actual} - T_{expected}) \)

Simply stated: new rating = old rating, changed by the difference between the actual results and the expected results, multiplied by a scaling factor, K.

ELO rating is actually an estimation of human performance.

We can therefore make CubELO = estimated time that this human will take to solve a cube.

This rating would be a *performance predictor*.

The Glicko rating system is a refinement of ELO that uses rating deviation and volatility factors to get better predictions, but it is more complicated.

Any and all official results by a competitor could be used to calculate their CubELO.
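To make the update concrete: if we take T_expected to be the current rating itself (an assumption; the post doesn't specify the relation), the formula collapses to exponential smoothing of the solve times:

```python
def cubelo_update(r_old, t_actual, k=0.2):
    """One CubELO step with T_expected = R_old (assumed):
    R_new = R_old + K * (T_actual - R_old) = (1 - K) * R_old + K * T_actual,
    i.e. an exponentially weighted moving average of official times.
    K is a hypothetical scaling factor."""
    return r_old + k * (t_actual - r_old)

r = 12.0  # hypothetical starting rating, in seconds
for t in [11.0, 10.5, 11.2]:  # successive official results
    r = cubelo_update(r, t)
print(r)  # rating drifts toward the recent times
```

Under this assumption the rating is a recency-weighted predictor of the next solve time, with K controlling how fast old results fade.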



keemy said:


> I thought about the ELO actually, but I don't think it would work very well for cubing. To put it simply, there is a very large location bias (e.g. faz, who lives in Australia, has basically no competitors remotely close to him in 3x3 and would not be able to make his rating very high).



There is no location bias (as there would be in chess), since competitors are competing against the clock, and clocks are located everywhere.



qqwref said:


> A system like this is not applicable to cubing because times are unaffected by the skill of the other people in the round. You could only really apply it to in-round/in-competition placements, and as keemy correctly pointed out you would still have many problems associated with people not leaving their region. Besides, why even consider such a subjective and opaque system when we can already objectively measure each solve to (near-)centisecond accuracy?



reAnswered™ above.


----------



## qqwref (Aug 21, 2010)

reThinking the Cube said:


> \( R_{new} = R_{old} + K(T_{actual} - T_{expected}) \)
> 
> Simply stated: new rating = old rating, changed by the difference between the actual results and the expected results, multiplied by a scaling factor, K.
> 
> ...


So you're not really proposing an ELO-type ranking system with points, but instead another statistical measure aimed at estimating a typical performance. Fair enough, but there are already several proposals for this (global-average of avg5, best avg5 of avg5, recent avg12 of avg5, etc.). Can you explain exactly how yours would be better than the other ones? How exactly would it work out mathematically, anyway? It seems like it might be equivalent to some other measure but I'm not sure since your formula is iterative.


----------



## keemy (Aug 21, 2010)

reThinking the Cube said:


> keemy said:
> 
> 
> > I thought about the ELO actually, but I don't think it would work very well for cubing. To put it simply, there is a very large location bias (e.g. faz, who lives in Australia, has basically no competitors remotely close to him in 3x3 and would not be able to make his rating very high).
> ...



This would defeat the point of using something like the ELO, which is best used when you want to compare expected performance relative to other competitors; that was really its only upside. (The scrambles wouldn't matter as much, because anyone at the same comp would get the same scrambles in the final, and so their rankings would change based on how they placed there.)

What you want to do would bring back a scramble bias (losing the only ELO advantage), give a heavy incentive not to compete if you were out of shape (for the purpose of these rankings), and give a large advantage to people who only go to a few competitions and get good times.


----------



## reThinking the Cube (Aug 21, 2010)

qqwref said:


> reThinking the Cube said:
> 
> 
> > \( R_{new} = R_{old} + K(T_{actual} - T_{expected}) \)
> ...



*CubELO would NOT be an average*. It is better because I reThink it to be better, and my reasoning is better than flawless. Seriously, this is a dynamic performance predictor. What could be better than that? It is not, as you have possibly suggested, equivalent to any other measure that has been proposed so far.


----------



## qqwref (Aug 21, 2010)

reThinking the Cube said:


> CubELO would NOT be an average.


Yes it is, if I'm interpreting it correctly. Your CubELO formula increases the "estimation" if you do worse and decreases it if you do better, does it not?

Here, from Wikipedia: "In mathematics, an _average_, or central tendency of a data set is _a measure of the "middle" value_ of the data set." Emphasis added by me. The whole idea of any type of average is to predict a typical element of the data set (in this case a cubing performance).


----------



## hawkmp4 (Aug 21, 2010)

Even if we say we'll use some form of ELO (which just seems wrong to me; the usefulness of the ELO system comes from its ability to compare and rank, say, chess players, where wins and losses aren't a useful way to directly evaluate a player's skill), that DOESN'T resolve ANY of the problems brought up in this thread: we'd still have to come up with a way to measure performance.


----------



## reThinking the Cube (Aug 21, 2010)

qqwref said:


> reThinking the Cube said:
> 
> 
> > CubELO would NOT be an average.
> ...



No it isn't, you are misinterpreting (again). An approximating Bayesian Updating Algorithm would be technically more correct. K factor scaling makes all the difference.



hawkmp4 said:


> Even if we say we'll use some form of ELO (which just seems wrong to me; the usefulness of the ELO system comes from its ability to compare and rank, say, chess players, where wins and losses aren't a useful way to directly evaluate a player's skill), that DOESN'T resolve ANY of the problems brought up in this thread: we'd still have to come up with a way to measure performance.



wt(DNF)?:confused:


----------



## macky (Aug 21, 2010)

qqwref said:


> reThinking the Cube said:
> 
> 
> > CubELO would NOT be an average.
> ...



Careful, I wouldn't argue with rTtC over technicalities; that's how he gets you.



reThinking the Cube said:


> qqwref said:
> 
> 
> > reThinking the Cube said:
> ...


I have a bad feeling that he actually knows Bayesian Updating (easy to understand anyway) and purposefully misapplied the term here to something not at all Bayesian but with a similar feel.


----------



## hawkmp4 (Aug 21, 2010)

reThinking the Cube said:


> hawkmp4 said:
> 
> 
> > Even if we say we'll use some form of ELO (which just seems wrong to me; the usefulness of the ELO system comes from its ability to compare and rank, say, chess players, where wins and losses aren't a useful way to directly evaluate a player's skill), that DOESN'T resolve ANY of the problems brought up in this thread: we'd still have to come up with a way to measure performance.
> ...



Okay, one more try, and then I will just assume you're trolling. It's not that difficult. 


> Simply stated: new rating = old rating, changed by the difference between the actual results and the expected results, multiplied by a scaling factor, K.


Using the ELO system wouldn't resolve any of the issues brought up in this thread, because we would still have to decide on what we would use to determine 'results.' For example, best average in a competition? That favors competitors who went to comps with more rounds of 3x3, is that what we want to do? Average of averages? That brings up many of the issues already discussed here.


----------



## qqwref (Aug 21, 2010)

reThinking the Cube said:


> qqwref said:
> 
> 
> > reThinking the Cube said:
> ...


OK, so CubELO is *not* trying to estimate a typical performance? If it IS, then it is a type of average, whether you like it or not (that's just what the definition is). You said yourself: "CubELO = estimated time that this human will take to solve a cube." If you think this direct quote from your post is wrong, feel free to correct it. A measure can't fit into the statistician's definition of average and also not be a type of average.

This has nothing to do with the above paragraph, but just to clarify, what is the relation between \( R_{old} \) and \( T_{expected} \)?

Also, if you legitimately want us to consider your idea, we'd need an answer to this:


qqwref said:


> Can you explain exactly how yours would be better than the other ones? How exactly would it work out mathematically, anyway?


Nobody has the luxury of just being able to suggest a measure and have everyone accept it; the measure itself has to be good, at least with respect to some criterion that people care about. You say your method is a "dynamic performance predictor." Well, yeah, almost all averages are, and especially so for averages which weight the most recent results more, which I think yours would do for some values of K. Why is yours better than others?


----------



## reThinking the Cube (Aug 25, 2010)

Spoiler






hawkmp4 said:


> I don't see how ELO rankings could be used in a situation where we don't have head to head matches.





keemy said:


> I actually thought about Elo, but I don't think it would work very well for cubing. To put it simply, there is a very large location bias (e.g. Faz, who lives in Australia, has basically no competitors remotely close to him in 3x3 and would not be able to make his rating very high).





keemy said:


> This would defeat the point of using something like Elo, which is best used when you want to compare expected performance relative to other competitors; that was really its only upside (the scrambles wouldn't matter as much, because everyone at the same comp would get the same scrambles in the final, and their ratings would change based on how they placed there).
> What you want to do would regain a scramble bias (removing the only Elo advantage), give a heavy incentive not to compete if you were out of shape (for the purposes of these rankings), and give a large advantage to people who only go to a few competitions and get good times.





hawkmp4 said:


> Even if we say we'll use some form of ELO (which just seems wrong to me; the usefulness of the ELO system comes from its ability to compare and rank, say, chess players, where wins and losses aren't a useful way to directly evaluate a player's skill), that DOESN'T resolve ANY of the problems brought up in this thread- we'd still have to come up with a way to measure performance.





hawkmp4 said:


> Using the ELO system wouldn't resolve any of the issues brought up in this thread, because we would still have to decide on what we would use to determine 'results.' For example, best average in a competition? That favors competitors who went to comps with more rounds of 3x3, is that what we want to do? Average of averages? That brings up many of the issues already discussed here.





Technically, it is more correct to view cubing as a competition between a cuber and a scrambled puzzle. Competitors aren't really competing directly against each other (as in chess); rather, they are competing directly against a scrambled puzzle (problem). The time required to solve the problem, however, can be used as a comparative measure.

A cuber rating system would be analogous to this chess tactics rater: http://chess.emrald.net/index.php
Here is the description of that site's rating system: http://chess.emrald.net/ratinginfo.php



qqwref said:


> reThinking the Cube said:
> 
> 
> > \( R_{new} = R_{old} + K(T_{actual} - T_{expected}) \)
> ...



\( T_{expected} = R_{old}(\text{ScrambleBias}/\text{ImportanceBias}) \). K would be based on the (non-standard) rating deviation of the competitor (which also reflects other factors, such as being a new cuber or the length of time since the last rating period) and the rating deviation of the scramble (which could be determined from all rankable results recorded for this scramble). This is not the specific mathematical expression of the algorithm, but just a general description of its *form* (a type of Kalman filter: http://en.wikipedia.org/wiki/Kalman_filter).
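In that general form, a single update step might look like the sketch below (illustrative only; the function name, the bias parameters, and the value of `k` stand in for the undefined terms in the post, and ratings are treated as estimated solve times in seconds, so a faster-than-expected solve pulls the rating down):

```python
def update_rating(r_old, t_actual, scramble_bias=1.0, importance_bias=1.0, k=0.2):
    """One update in the proposed form R_new = R_old + K*(T_actual - T_expected).

    `scramble_bias`, `importance_bias`, and `k` are placeholders from the
    post, not fixed values; in a real system K would be derived from the
    rating deviations of the cuber and the scramble.
    """
    t_expected = r_old * (scramble_bias / importance_bias)
    return r_old + k * (t_actual - t_expected)
```

With neutral biases this is just an exponential moving average of observed times: `update_rating(12.0, 10.0, k=0.5)` moves a 12-second rating halfway toward the observed 10-second solve, giving 11.0.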


----------



## macky (Sep 12, 2010)

Before this thread becomes too old, I want to at least bring the main ideas to Stefan (who can access the WCA database).

A continual ranking, possibly with recognition at the end of each calendar year. The question is the ranking system.

(1) Mean of the last X (say 5) averages. All averages must be within the last 12 months.
(2) Point system, e.g. Ravi's.
(3) qq's majority-in-year average-average: Best (rolling) avg5 of consecutive averages, with the caveat that (a) a majority of those averages are in the correct year, and (b) all the averages are within one year of that year.

The motivation for (2) over (1) was to make it always worthwhile to compete. Because point systems have an added layer of arbitrariness and favor frequent competitors, I think (3), under which competing can only improve the ranking, is the best solution we've seen so far.

Any more thoughts to add?


----------



## Lucas Garron (Sep 12, 2010)

I don't think I have, so let me voice my personal objection against mostly 1) but also a bit of 3).

Often, I don't have normal conditions for my averages. Because I help out at most competitions, I'm often forced to do an early-round average at an inconvenient time, and without warm-up. When I perform terribly, I perform really terribly, and it doesn't help when I'm not properly warmed up.
I don't really want to have to choose between being helpful at a critical time and caring about my personal rank. I can easily imagine having done well at a previous competition, but having Tyson ready to yell at me: "Lucas, we don't have time for this [organizers practicing]. Stop whining and do your first round solves right now."

That was a more particular example, but in general, the proposals ask competitors to perform well across several consecutive rounds. For various reasons, people like me rarely get the chance to focus during 5 consecutive rounds.


----------






## Tyson (Sep 12, 2010)

Yeah, I have to actually agree with Lucas here. And me yelling at him is somewhat accurate.

Right now with how things go, I know that Lucas will easily advance into the second round even if his first round average is terrible by his standards. That's why I ask him to go first so we have staff for the first round.

Organizers have been sacrificing their own times for the sake of the competition for as long as competitions have been run. The people who make a competition happen don't have the luxury of warming up for 15 minutes before a round, since they were dealing with the previous round and then have to leave early to get the next one started.

But yeah, Lucas is a good sport.


----------



## AvGalen (Sep 12, 2010)

WOW, high quality discussion in here.

I am a bit afraid to mix myself in here with trivial real-life scenarios, but Lucas's last example is really obvious to me. Right now, the best of the best don't go all-out in the first rounds (just like many of the best sprinters don't). The purpose of the first round (qualifying) simply isn't the same as the purpose of the last round (winning).

Another factor that seems to be completely overlooked in this discussion is improvement. Although I think it is almost impossible to enter the cube world and improve to world level within a year, some people still manage to do this. If someone were to start at a 35-second average, do 17 at the next competition, then 12, then 10, then 8.5, he still wouldn't be near the top of the list, which seems just plain wrong to me.

I am afraid reality is just too complicated to translate into math in this situation.


----------



## macky (Sep 12, 2010)

Lucas Garron said:


> That was a more particular example, but in general, the proposals are asking competitors to perform well throughout several consecutive conditions. For various reasons, people like me rarely get the chance to focus during 5 consecutive rounds.



qq should correct me if I'm wrong, but my interpretation of (3) is a rolling avg5 of averages, meaning as usual the average of the middle 3 among some 5 consecutive averages. That means you can have rounds 2 and 3 of one competition and 3 rounds of the next, and have the first-round average of the second competition thrown out. While I sympathize, I don't see how you could better account for the conditions you described without either throwing out first-round results altogether (unreasonable since you can set first-round average PBs) or discarding more than 20% of average results on either side (allows for too many bad rounds).
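That reading of (3) can be made concrete with a short sketch (a hypothetical helper, assuming lower averages are better and that each avg5 of averages drops the best and worst of 5 consecutive averages, as with a normal avg5):

```python
def best_rolling_avg5(averages):
    """Best rolling avg5 over consecutive competition averages.

    For every window of 5 consecutive averages, drop the fastest and
    slowest and take the mean of the middle 3; return the lowest such
    value, or None if fewer than 5 averages exist (illustrative sketch).
    """
    best = None
    for i in range(len(averages) - 4):
        window = sorted(averages[i:i + 5])
        avg = sum(window[1:4]) / 3  # middle three of the sorted window
        if best is None or avg < best:
            best = avg
    return best
```

This is what lets a single bad first-round average be thrown out: in each window of 5, one outlier on either side is discarded before averaging.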

In European competitions, first rounds are often split into several groups, with each group judging and scrambling for another. With two groups, you can end up judging and then having to immediately go without much warm-up. This certainly isn't as severe as the conditions that organizers face, but it should be pointed out that, even for ordinary competitors, there are sometimes inevitable less-than-ideal conditions.


----------



## qqwref (Sep 13, 2010)

Lucas Garron said:


> Often, I don't have normal conditions for my averages. By helping out at most competitions, I'm often in the position where I'm forced to do an early round average at an inconvenient time, and without warm-up. When I perform terrible, I perform really terrible, and it doesn't help when I'm not properly warmed up.


I agree that a system which took the mean of consecutive averages would disadvantage you, but there's no way to fix this without also providing a huge benefit to inconsistent but fast cubers. How can I distinguish someone who should be getting a 12 average every time (but sometimes does 15s due to bad competition conditions), compared to someone who is very inconsistent and almost always gets either a 12s or a 15s average? You might deserve a 12 on the rank, where he doesn't; but if you have the same results as him, how can we tell the difference? I think you just have to accept that helping out is something which may hurt your standing in a ranking like this.



AvGalen said:


> If someone would currently start at 35 seconds average, next competition do 17, next 12, then 10, then 8.5 he still wouldn't be among the top of the list which seems just plain wrong to me.


I suppose you're saying that if his real times are around 8.5-10 then he should be near the top of the list. But if you just look at the averages he's done, you can't know that for sure; he only has two "good" averages recorded at all, and statistically - if we only look at the average times - we can't distinguish this guy from someone who normally gets 10-12 but had two very bad rounds and one very good round. For someone like this, I'd say they simply have to do well in more competitions to place well in a ranking like this. The goal is not to reward someone who is fast once or twice in competition, but to reward someone who is consistently fast, so it would be wrong to allow someone to do well once or twice and be catapulted to near the top.


----------



## Bryan (Sep 13, 2010)

So what's the advantage of taking only the last 5 averages? If you have a really bad competition (poor lighting venue or something), then that's going to "haunt" you for a while. And if there's someone who's super fast, but only goes to one competition a year, they'll be unranked.

And like Lucas, I don't do any warmup.


----------



## macky (Sep 13, 2010)

qqwref said:


> I agree that a system which took the mean of consecutive averages would



Wait, so you do mean "mean" here?



Bryan said:


> So what's the advantage of taking only the last 5 averages? If you have a really bad competition (poor lighting venue or something), then that's going to "haunt" you for a while.



(3) is a rolling average of 5 averages. This has the advantage that competing can only improve the ranking. The same can't be said for, say, the average of all averages within the last year. The number can be changed, but 5 corresponds to 2 competitions (edit: 3 in some countries), which seems reasonable: more than one, but not so many that a bad competition haunts you for too long.



Bryan said:


> And if there's someone who's super fast, but only goes to one competition a year, they'll be unranked.


I don't think results from a single competition (2 or 3 averages) would do justice to what purports to be a global ranking. The year-to-date restriction needs to be kept to filter out inactive cubers, so it seems impossible to solve that problem.


----------



## Bryan (Sep 13, 2010)

macky said:


> The number can be changed, but 5 corresponds to 2 competitions (edit: 3 in some countries), which seems reasonable--more than one, but not so many that a bad competition haunts you for too long.



Sure, for 3x3, but in some places, you have to be extremely fast to get beyond the first round. Plus, this idea wouldn't work well for other events that just have 1 round. I would want something that would work for all events the same.


----------



## qqwref (Sep 13, 2010)

macky said:


> qqwref said:
> 
> 
> > I agree that a system which took the mean of consecutive averages would
> ...


Sorry, meant mean or average.


----------



## AvGalen (Sep 13, 2010)

qqwref said:


> AvGalen said:
> 
> 
> > If someone would currently start at 35 seconds average, next competition do 17, next 12, then 10, then 8.5 he still wouldn't be among the top of the list which seems just plain wrong to me.
> ...


Yes, I am saying that this guy should be among the top. Otherwise, people who improve quickly will always remain absent from this ranking until they are well established. Someone like Faz (not going to many competitions, then being top of the world) wouldn't have shown up in this ranking until after he had broken the world record 2 or 3 times. So basically this ranking wouldn't have acknowledged his (current) skill, because he didn't go to enough competitions while improving.
In a sport that is still seeing WRs broken by 15% a year, with a new World Champion every 2 years and a new WR holder every year, a system that doesn't account for improvement is significantly flawed.
Wouldn't a simple solution be to assign a weighting factor based on recency? The last competition counts for 2.5, the one before that for 2 (then 1.5, 1, 0.5), or something like that. I will leave it to the theorists and number crunchers to determine whether 2.5/2.0/1.5/1.0/0.5 are good values and whether they should be applied before or after removing the best/worst.

(I am still reading Ravi's topic where the "first rounds are different from finals" is discussed in enough detail)


----------



## qqwref (Sep 13, 2010)

It just seems that, to me, the idea behind a global ranking (separate from the overall WCA ranking; separate from the 2010-only WCA ranking) is to reward consistent speed - not just speed itself. You can't prove that you are consistently fast by only one or two good results. I don't think that waiting to have 4 decent averages recorded is too much to ask; Feliks had a sub-10 avg5 of averages after his 3rd competition.


----------



## macky (Sep 13, 2010)

Bryan said:


> macky said:
> 
> 
> > The number can be changed, but 5 corresponds to 2 competitions (edit: 3 in some countries), which seems reasonable--more than one, but not so many that a bad competition haunts you for too long.
> ...


These are also problems you can't solve as long as you have a global ranking system deserving of the name (though there must be another adjective that better describes what we're getting at...). Even if you consider all averages within the last year, there needs to be a minimum requirement for the number of averages.

I think you're right that 3x3 is currently the only category where any global ranking makes sense, because of practicality concerns (enough data for everyone fast). The problems you brought up are good reasons why any other global ranking should appear on the Statistics page, if at all. For me, this also means that we should design the ranking system for puzzles that meet the practicality requirements. Then it makes sense for 3x3 now, and maybe in the future it will make sense for other puzzles.


----------



## reThinking the Cube (Sep 13, 2010)

Please study this:

http://chess.emrald.net/ratinginfo.php

Replacing the word *tactician* with *cuber*, and *problem* with *puzzle*, yields an excellent description of a superior CUBER Rating System. All the other proposals are feeble by comparison.


Spoiler



This page is based on the FICS help page on Glicko Rating (http://glicko.com).

As you may have noticed, each cuber has a rating and an RD. RD stands for ratings deviation. 
*What RD represents*
The Ratings Deviation is used to measure how much a cuber's current rating should be trusted. A high RD indicates that the cuber may not be competing frequently or that the cuber has not solved very many cubes yet at the current rating level. A low RD indicates that the cuber's rating is fairly well established. This is described in more detail below under RD Interpretation. 
*How RD Affects Ratings Changes *
In general, if your RD is high, then your rating will change a lot each time you solve a cube. As it gets smaller, the ratings change per solve will go down. However, the scramble's RD will have the opposite effect, to a smaller extent: if its RD is high, then your ratings change will be somewhat smaller than it would be otherwise. 
*How RD is Updated*
In this system, the RD will decrease somewhat each time you solve a cube, because when you solve more cubes there is a stronger basis for concluding what your rating should be. However, if you go for a long time without solving any cubes, your RD will increase to reflect the increased uncertainty in your rating due to the passage of time. Also, your RD will decrease more if the scramble's rating is similar to yours, and decrease less if your scramble's rating is much different. 
*Mathematical Interpretation of RD *
Direct from Mark Glickman: 
Each cuber can be characterized as having a true (but unknown) rating that may be thought of as the cuber's average ability. We never get to know that value, partly because we only observe a finite number of solves, but also because that true rating changes over time as a cuber's ability changes. But we can estimate the unknown rating. Rather than restrict oneself to a single estimate of the true rating, we can describe our estimate as an interval of plausible values. The interval is wider if we are less sure about the cuber's unknown true rating, and the interval is narrower if we are more sure about the unknown rating. The RD quantifies the uncertainty in terms of probability: 
- The interval formed by current rating +/- RD contains your true rating with probability of about 0.67. 
- The interval formed by current rating +/- 2 * RD contains your true rating with probability of about 0.95. 
- The interval formed by current rating +/- 3 * RD contains your true rating with probability of about 0.997. 
For those of you who know something about statistics, these are not confidence intervals, but are called central posterior intervals because the derivation came from a Bayesian analysis of the problem. These numbers are found from the cumulative distribution function of the normal distribution with mean = current rating, and standard deviation = RD. For example, CDF[ N[1600,50], 1550 ] = .159 approximately (that's shorthand Mathematica notation.) 
*Credits *
The Glicko Ratings System was invented by Mark Glickman, Ph.D. who is currently at Boston University.
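The intervals above are simply quantiles of a normal distribution centered at the current rating with standard deviation RD; the page's Mathematica example can be checked with a few lines of standard-library Python (a sketch, not part of the original page):

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution N(mean, sd), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# The page's example: CDF[ N[1600,50], 1550 ] = .159 approximately,
# i.e. a cuber rated 1600 with RD 50 has about 15.9% posterior
# probability of a true rating below 1550 (one RD below the mean).
p = normal_cdf(1550, 1600, 50)
```

The same calculation reproduces the interval probabilities: one RD on each side covers about 0.683 of the distribution, two RDs about 0.954.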


----------



## Stefan (Sep 13, 2010)

macky said:


> Before this thread becomes too old, I want to at least bring the main ideas to Stefan (who can access the WCA database).



Well, the export should give you access to all relevant data, too 



macky said:


> (3) qq's majority-in-year average-average: Best (rolling) avg5 of consecutive averages
> ...
> under which *competing can only improve the ranking*



Not quite. Yes, competing can't hurt *past* scores, but it *can* hurt a *future* score. Imagine your last three averages were somehow awesome. You could go to another competition in a few days, but you're out of shape and have no time to prepare (important exam coming up). If you compete, you'd destroy your chance at an awesome 5-streak. You might better leave out this competition and go to another in a month, for which you can prepare a lot.

And let me throw out another idea: Average the better half of averages in the past rolling year, minimum three. So if you had 14, average the best 7. If you had 4, average the best 3.
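Stefan's "better half, minimum three" rule is easy to sketch (illustrative code; the function name and the unranked-below-three behavior are assumptions, since the post doesn't say what happens with fewer than three averages):

```python
def better_half_average(averages, minimum=3):
    """Average the better (faster) half of a cuber's averages from the
    rolling past year, but never fewer than `minimum` of them.

    With 14 averages this takes the best 7; with 4, the best 3.
    Returns None if there are too few averages to rank (an assumption).
    """
    n = len(averages)
    if n < minimum:
        return None
    count = max(minimum, n // 2)
    best = sorted(averages)[:count]  # lower times are better
    return sum(best) / count
```

Stefan's own example works out as stated: a cuber with seven 15-second and seven 11-second averages gets the best 7, i.e. an 11-second ranking value.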


----------



## qqwref (Sep 13, 2010)

Interesting proposal, Stefan. It would certainly do away with a single competition ruining your average (although a single competition could definitely hurt your average). I wonder if it would be fair to consistent people, though. By discarding only the bad averages, there is an incentive to have a high SD on your averages. It would mainly measure how well you do in the best half of your rounds while ignoring how you perform in the worst half, and I think this might skew the results against people who always perform almost equally well.

I wonder if the "law of large numbers" is going to become very important in ideas like this. Anything which averages a variable number of results could have this problem. If you go to a large number of competitions a year, say 15+ 3-round ones, then an average ranking which tracks some fixed percentage of your rounds will tend to give a result pretty close to what is expected, given the distribution of your averages. But if you only go to 2 or 3 competitions, you could get lucky and do very well compared to what would be expected of you. Then there is a disincentive to go to more competitions - you can expect that your ranking would even out (i.e. get worse) if you do. Maybe such disincentives are things we should just accept, rather than trying to fix.


----------



## Stefan (Sep 13, 2010)

qqwref said:


> I wonder if it would be fair to consistent people, though.



Well, it all depends on what we're after. What is the question that the new ranking is supposed to answer?

If someone always gets 13 seconds averages, I'd say _he can get 13 seconds averages_. If someone got seven 15 seconds averages and seven 11 seconds averages, I'd say _he can get 11 seconds averages_. Yes, it's not as consistent as the first guy, but I guess I care more about capability than consistency. One could argue that the single best average shows capability even better, but I do care about consistency as well, and would prefer "reasonably consistent" capability over one-time-exceptional capability.

So for me, the question this ought to answer is "What is someone (reasonably) capable of?" and I'm fine with ignoring bad outliers.



qqwref said:


> If you go to a large number of competitions a year, say 15+ 3-round ones, then an average ranking which tracks some fixed percentage of your rounds will tend to give a result pretty close to what is expected



We could also cap it from both sides: not just a minimum of 3 as in my previous suggestion, but maybe also a maximum of 10. So from those 45 averages, average the best 10.

The "half" and the 3 and 10 are of course arbitrary and up for discussion; it could also be the "square root" of the number of averages, with a minimum of 5, and I guess a maximum isn't needed when using the square root.

Or simply average the best five averages, I think that's been mentioned before and I quite like it for both its value and its simplicity.



qqwref said:


> Maybe such disincentives are things we should just accept, rather than trying to fix.



Yeah, I agree. If someone really misses out on a competition just because of some ranking, I don't really care much about him. Also, as far as I understand, it's only suggested as an _add-on_, not a _replacement_, of the current ranking which would remain the main ranking.


----------



## AvGalen (Sep 13, 2010)

(don't think I need to quote all of what The Pochman said)

This is starting to look good. Could a top 25 be calculated with some of the variables you mentioned?

And I agree about missing the competitions. If I just got really good results in the previous competition I would want to go to another competition as soon as possible to try to break them again. If I would fail, I wouldn't care about my "ranking in the statistics" at all.



Spoiler



If Stefan doesn't complain about the way I wrote his name, I will continue to call him The Pochman from now on


----------



## Stefan (Sep 13, 2010)

AvGalen said:


> Could a top 25 be calculated with some of the variables you mentioned?



I might do it, though I hope someone else will do it first...



Spoiler






AvGalen said:


> If Stefan doesn't complain about the way I wrote his name, I will continue to call him The Pochman from now on



I hereby complain. I don't like how it sets me apart; it already annoys me how Dene always calls me "Mr. Pochmann".


----------



## AvGalen (Sep 13, 2010)

StefanPochmann said:


> AvGalen said:
> 
> 
> > Spoiler
> ...





Spoiler



while (DidHeComplain("The Pochman"))
{
    OutputRandomWord(WordTypes.FourLetter);
}


----------



## qqwref (Sep 13, 2010)

Spoiler



No! You can't make a while loop using a history-lookup function! It'll run forever!


----------



## AvGalen (Sep 13, 2010)

qqwref said:


> Spoiler
> 
> 
> 
> No! You can't make a while loop using a history-lookup function! It'll run forever!



Exactly my point. StefanPochmann won't change his mind, so I will be cursing forever


----------



## Stefan (Sep 13, 2010)

AvGalen said:


> StefanPochmann


Gah, how about just Stefan?

back to topic?


----------



## AvGalen (Sep 14, 2010)

StefanPochmann said:


> AvGalen said:
> 
> 
> > StefanPochmann
> ...



Stefan is the real person. He might change his mind with future insights.
StefanPochmann is a user (although some people claim he is a knowledgebase with a verbal spanking mode) that would only change his mind if proven wrong (some people claim these are minor updates or bugfixes), which would most likely be done by .... StefanPochmann.
(if you add a ? I feel obligated to answer)
but okay, Stefan it will be.

Back to topic indeed, although I will spend my time on the blindfolded topic I opened and on travelling again


----------



## macky (Sep 17, 2010)

AvGalen said:


> And I agree about missing the competitions. If I just got really good results in the previous competition I would want to go to another competition as soon as possible to try to break them again. If I would fail, I wouldn't care about my "ranking in the statistics" at all.


Arnaud, I think you meant here that you don't care much about rankings in general, but let me answer the people who have suggested that no one would skip a competition because of a ranking, specifically one that is just on the statistics page. I also finally formulated my answer to Stefan's question


StefanPochmann said:


> What is the question that the new ranking is supposed to answer?


which I really should have done from the start.

I want this ranking to give a reasonable prediction of a competitor's average if ey were to do one competition round tomorrow, given that ey is in eir typical competition condition. The average-of-5 ranking doesn't do this well for two reasons: a number of top-100 cubers have one or two exceptionally fast averages, and some competitors are inactive or have recently been performing noticeably worse than their best average. Essentially, the average-of-5 ranking is a ranking by _all-time best_ competition performance, whereas what I want is a ranking by _recent average_ competition performance. I would use this new ranking, for example, to predict placements if the World Championship were to be held in a week. In my opinion, such a ranking would be both more interesting and more important than the average-of-5 ranking.



StefanPochmann said:


> qqwref said:
> 
> 
> > If you go to a large number of competitions a year, say 15+ 3-round ones, then an average ranking which tracks some fixed percentage of your rounds will tend to give a result pretty close to what is expected
> ...



I like all these ideas. Simply taking some number of best averages would avoid the problem of possibly slower early-round averages, though I would still discard at least the fastest average for the reason mentioned above.



StefanPochmann said:


> If someone always gets 13 seconds averages, I'd say _he can get 13 seconds averages_. If someone got seven 15 seconds averages and seven 11 seconds averages, I'd say _he can get 11 seconds averages_. Yes, it's not as consistent as the first guy, but I guess I care more about capability than consistency. One could argue that the single best average shows capability even better, but I do care about consistency as well, and would prefer "reasonably consistent" capability over one-time-exceptional capability.



A single number can't describe both capability and consistency. We've seen from our discussion so far that, once you start adding several desirable criteria, it's not easy to come up with a ranking system that's not arbitrary. If we abandon the idea of a single ranking, there's a much simpler approach. For each competitor's page, provide a link to a "detailed statistics" page that uses all averages within the past year to compute and display any number of useful charts and values: frequency plot of averages (say in 0.25-second bins), mean, SD, median and quartiles, number of rounds, etc. A separate customizable ranking page could then allow any ranking method we've already talked about or someone may later suggest. For now, provide several options, and let the community decide which ranking methods are useful. The discussion of whether to choose one method for the official global ranking may be more appropriate after the community has understood the various methods available.


----------



## AvGalen (Sep 17, 2010)

Thanks Macky, answering that question made the whole thread a lot clearer.

What this thread needs now is the following:
- Randomly pick 10 competitions from this year and calculate the "average average" in several different ways based on the results of the competitors in the 5 competitions before that
- Compare the "average average" with the actual results from these competitions to see which way(s) of calculating provide meaningful results

For this, database access, statistical knowledge, "Mathlab", and time/motivation will be needed. I personally have only 1.5 of these 4 requirements.


----------



## macky (Sep 19, 2010)

I bet Stefan has all 4, even if StefanPochmann denies it.


----------



## qqwref (Sep 19, 2010)

macky said:


> ey [...] eir


oh god my eyes ;_;



macky said:


> I want this ranking to give a reasonable prediction of a competitor's average if [th]ey were to do one competition round tomorrow


Aha. Then how about this:
- Take all averages in the past, say, 6 months. If we don't have at least, say, 5 rounds, choose the most recent 5 rounds instead.
- Calculate a weighted average, with a total weighting of 1, as follows: the first (i.e. most recent) is worth k; the second, k(1-k); the third, k(1-(k + k(1-k))); and so on, giving each average a weight of k times the remaining weight, except that the last two rounds should be weighted equally. k should be a number between 1/2 and 1, chosen based on exactly how much you want past rounds to be counted.
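The weighting scheme above can be sketched as follows (illustrative code; the helper name and the default `k` are assumptions, and `averages` is ordered most recent first):

```python
def recency_weighted_average(averages, k=0.6):
    """qqwref's recency weighting: each average takes a fraction k of the
    weight not yet assigned, so weights decay geometrically
    (k, k(1-k), k(1-k)^2, ...); the last average gets the leftover, and
    the last two weights are then equalized, per the proposal. All
    weights sum to 1."""
    n = len(averages)
    weights = []
    remaining = 1.0
    for _ in range(n - 1):
        weights.append(k * remaining)
        remaining -= k * remaining
    weights.append(remaining)  # leftover weight goes to the oldest average
    if n >= 2:
        tail = (weights[-1] + weights[-2]) / 2
        weights[-1] = weights[-2] = tail
    return sum(w * a for w, a in zip(weights, averages))
```

For example, with k = 0.5 and three averages the weights come out as 0.5, 0.25, 0.25, so a history of 10, 12, 14 seconds yields 11.5; larger k makes the ranking track recent form more tightly.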


----------



## macky (Sep 19, 2010)

macky said:


> I want this ranking to give a reasonable prediction of a competitor's average if [th]ey were to do one competition round tomorrow





AvGalen said:


> Thanks Macky, answering that question made the whole thread a lot more clear.


Of course, that was just my formulation, decided after figuring that "best overall" performance within the past 12 months is harder to define. We can also include ranking methods that attempt to provide some definition for this.



qqwref said:


> oh god my eyes ;_;


Spivak pronouns amuse me.



qqwref said:


> Aha. Then how about this:
> - Take all averages in the past, say, 6 months. If we don't have at least, say, 5 rounds, choose the most recent 5 rounds instead.
> - Calculate a weighted average, with a total weighting of 1, as follows: the first (i.e. most recent) is worth k; the second, k(1-k); the third, k(1-(k + k(1-k))); and so on, giving each average a weight of k times the remaining weight, except that the last two rounds should be weighted equally. k should be a number between 1/2 and 1, chosen based on exactly how much you want past rounds to be counted.


Yeah, some weighted average should definitely be one of the options.



StefanPochmann said:


> I might do it, though I hope someone else will do it first...


OK, I'm going to hack together some php/MySQL to try to generate rankings under several of the methods mentioned. Could you email me the source code for statistics.php to use as a reference?


----------

