# Some 3x3 BLD stats



## PatrickJameson (Mar 14, 2011)

My original reason for doing this was to see if there was a significant spike in success rates around the time the paper regulation was put in place. Unfortunately there's insufficient data to prove or disprove my hypothesis. But I thought meh, why not publish the stats.

Although there _is_ a significant drop in average success rate after February 2007, there's also a significant rise in the number of attempts (which could be caused by the increase in talk about blindfolded solving during that time).

One interesting stat is that from 2007 to 2011 the success rate was pretty consistent at ~29.7±1%.

Here are the stats I gathered in case anyone wants them for whatever reason.

Per year:

```
Year	Total	Percent Solved
2003	3	66.6666666666667
2004	61	29.5081967213115
2005	254	37.4015748031496
2006	678	34.070796460177
2007	1412	30.7365439093484
2008	3647	29.9972580202907
2009	5155	28.8069835111542
2010	7397	28.7143436528322
2011	1725	27.4782608695652
```
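For anyone who wants a single figure for a range of years, the yearly rows can be pooled by attempts rather than averaging the percentages. A minimal sketch, assuming the rows are typed in by hand from the table above; `pooled_rate` is a made-up helper name, not part of the original script:

```perl
use strict;
use warnings;

# Rows copied from the per-year table above: [year, attempts, percent solved].
my @yearly = (
	[2007, 1412, 30.7365439093484],
	[2008, 3647, 29.9972580202907],
	[2009, 5155, 28.8069835111542],
	[2010, 7397, 28.7143436528322],
	[2011, 1725, 27.4782608695652],
);

# Hypothetical helper: pooled success rate (in percent) over the given rows.
sub pooled_rate {
	my @rows = @_;
	my ($solved, $attempts) = (0, 0);
	for my $row (@rows) {
		my (undef, $total, $percent) = @$row;
		# Recover the whole number of successes from the printed percentage.
		$solved   += int($total * $percent / 100 + 0.5);
		$attempts += $total;
	}
	return $attempts ? 100 * $solved / $attempts : 0;
}

printf "Pooled 2007-2011 success rate: %.2f%%\n", pooled_rate(@yearly);
```

Pooling weights each year by its number of attempts, so the busy years (2009, 2010) count for more than they would in a plain average of the five percentages.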

Per Month:

```
Year	Month	Total	Percent Solved
2003	8	3	66.6666666666667
2004	4	2	50
2004	7	9	66.6666666666667
2004	8	20	20
2004	10	30	23.3333333333333
2005	1	6	50
2005	4	26	46.1538461538462
2005	5	6	66.6666666666667
2005	6	2	50
2005	8	27	29.6296296296296
2005	9	20	50
2005	10	71	28.169014084507
2005	11	72	38.8888888888889
2005	12	24	37.5
2006	1	33	30.3030303030303
2006	3	32	40.625
2006	4	123	27.6422764227642
2006	5	47	36.1702127659575
2006	7	41	29.2682926829268
2006	8	95	38.9473684210526
2006	9	90	36.6666666666667
2006	10	75	38.6666666666667
2006	11	142	32.3943661971831
2007	1	54	27.7777777777778
2007	2	42	33.3333333333333
2007	3	66	36.3636363636364
2007	4	161	27.9503105590062
2007	5	174	28.1609195402299
2007	6	51	41.1764705882353
2007	7	72	22.2222222222222
2007	8	136	27.2058823529412
2007	9	122	27.8688524590164
2007	10	211	39.8104265402844
2007	11	212	28.3018867924528
2007	12	111	31.5315315315315
2008	1	126	23.8095238095238
2008	2	244	35.655737704918
2008	3	278	28.7769784172662
2008	4	451	29.9334811529933
2008	5	387	31.0077519379845
2008	6	138	24.6376811594203
2008	7	470	28.936170212766
2008	8	288	28.125
2008	9	244	27.4590163934426
2008	10	408	36.0294117647059
2008	11	391	31.2020460358056
2008	12	222	24.7747747747748
2009	1	255	30.5882352941176
2009	2	558	27.9569892473118
2009	3	309	25.2427184466019
2009	4	462	27.0562770562771
2009	5	452	28.7610619469027
2009	6	272	31.25
2009	7	384	22.1354166666667
2009	8	692	30.0578034682081
2009	9	383	27.9373368146214
2009	10	603	34.8258706467662
2009	11	435	29.1954022988506
2009	12	350	27.4285714285714
2010	1	515	32.2330097087379
2010	2	685	26.1313868613139
2010	3	334	30.2395209580838
2010	4	578	27.3356401384083
2010	5	453	23.3995584988962
2010	6	337	32.0474777448071
2010	7	795	30.062893081761
2010	8	899	28.0311457174639
2010	9	679	28.4241531664212
2010	10	972	32.9218106995885
2010	11	609	26.2725779967159
2010	12	541	26.2476894639556
2011	1	632	28.9556962025316
2011	2	835	25.8682634730539
2011	3	258	29.0697674418605
```

Script:

```
#!/usr/bin/perl

use strict;
use warnings;

# Load the competitions table, keyed by competition id (first column).
open my $compFh, '<', "data/WCA_export_Competitions.tsv"
	or die "Cannot open competitions file: $!";
chomp(my @rawComps = <$compFh>);
close $compFh;

# The first line holds the column names.
my @fieldNames = split /\t/, shift @rawComps;

my %compInfo;
foreach my $line (@rawComps) {
	my @curLine = split /\t/, $line;
	# Use <= so that the last column is not skipped.
	for (my $i = 0; $i <= $#curLine; $i++) {
		$compInfo{$curLine[0]}{$fieldNames[$i]} = $curLine[$i];
	}
}

open my $resFh, '<', "data/WCA_export_Results.tsv"
	or die "Cannot open results file: $!";
chomp(my @rawEvents = <$resFh>);
close $resFh;

shift(@rawEvents);	# drop the header line

my (%solved, %attempts);
foreach my $line (@rawEvents) {
	my @eventInfo = split /\t/, $line;

	if ($eventInfo[1] eq '333bf') {
		my $year = $compInfo{$eventInfo[0]}{'year'};
		my $month = $compInfo{$eventInfo[0]}{'month'}-1;

		# Columns 10..14 hold the five attempt values (value1..value5).
		for my $solveNum (10 .. 14) {
			if ($eventInfo[$solveNum] > 0) {
				$solved{$year}[$month]++;
				$attempts{$year}[$month]++;
			} elsif ($eventInfo[$solveNum] < 0) {
				# Both DNF (-1) and DNS (-2) count as failed attempts here.
				$attempts{$year}[$month]++;
			}
		}
	}
}

print "Year\tMonth\tTotal\tPercent Solved\n";
for my $key (sort keys %attempts) {
	for my $month (0 .. 11) {
		if (defined $attempts{$key}[$month]) {
			# A month with attempts but no successes has no entry in %solved.
			my $solvedCount = $solved{$key}[$month] || 0;
			print $key."\t".($month+1)."\t".$attempts{$key}[$month]."\t".(100*$solvedCount/$attempts{$key}[$month])."\n";
		}
	}
}
```

From http://www.worldcubeassociation.org/results/misc/export.html:

```
Export of the World Cube Association results database

Date:     March 11, 2011
Remarks:  none
Contact:  Ron van Bruchem Netherlands [email protected]
Website:  http://www.worldcubeassociation.org/results

Description:
  This file contains public information on all official WCA competitions,
  WCA members and WCA competition results.

Goal:
  Goal of this file is for members of our community to do analysis on the
  information for statistical and personal purposes.

Allowed use: Information and parts of it may be published online,
but only under the following conditions:
  - A clearly visible link to World Cube Association website is added
    (http://www.worldcubeassociation.org) with the notification that
    World Cube Association is the source and owner of the information.
  - A clearly visible notification is added that the published information
    is not actual information.
  - A clearly visible link to http://www.worldcubeassociation.org/results is
    added with the notification that the actual information can be found via
    that link.
  - A clearly visible notification which date is taken for the source of the data.
  - The style and format of the information must be clearly distinguishable
    from the official WCA website

Software created by:
  Clément Gallet    France
  Stefan Pochmann   Germany
  Josef Jelinek     Czech Republic
  Ron van Bruchem   Netherlands


The export consists of these tables:
  Persons        WCA competitors
  Competitions   WCA competitions
  Events         WCA events (Rubik's Cube, Megaminx, etc)
  Results        WCA results per competition+event+round+person
  Rounds         The round types (first, final, etc)
  Formats        The round formats (best of 3, average of 5, etc)
  Countries      Countries according to http://en.wikipedia.org/wiki/List_of_countries
  Continents     Continents

Most of the tables should be self-explanatory, but the result values of
the Results table need some explanation:

- The result values are in fields value1-value5, best and average.
- Value -1 means DNF
- Value -2 means DNS
- Value 0 means "nothing", for example a best-of-3 has value4=value5=average=0
- Positive values depend on the event, see column "format" in Events.
  - Most events have format "time", where the value represents centiseconds.
    For example, 8653 means 1 minute and 26.53 seconds.
  - Format "number" means the value is a raw number, currently only used
    by "fewest moves" for number of moves.
  - Format "multi" is for old and new multi-blind, encoding not only the time
    but also the number of attempted and solved cubes. Writing the value in
    decimal it is interpreted like this:
      old: 1SSAATTTTT
             solved        = 99 - SS
             attempted     = AA
             timeInSeconds = TTTTT (99999 means unknown)
      new: 0DDTTTTTMM
             difference    = 99 - DD
             timeInSeconds = TTTTT (99999 means unknown)
             missed        = MM
             solved        = difference + missed
             attempted     = solved + missed
    Note that this is designed so that a smaller value means a better result.
```
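The "multi" encoding quoted above is easier to follow with a decoder next to it. This is only a sketch written from that README text; `decode_multi` is a hypothetical helper, and the sample value at the end is made up rather than taken from the database:

```perl
use strict;
use warnings;

# Hypothetical helper: decode a "multi" result value following the layout
# described in the export README. Returns a hash reference, or nothing for
# non-positive values (DNF, DNS, "nothing") and unrecognised input.
sub decode_multi {
	my ($value) = @_;
	return if $value <= 0;

	# Pad to ten digits so the leading format digit (0 or 1) is explicit.
	my $digits = sprintf("%010d", $value);

	if ($digits =~ /^1(\d\d)(\d\d)(\d{5})$/) {
		# old: 1SSAATTTTT
		my ($ss, $aa, $time) = ($1, $2, $3);
		return {
			solved    => 99 - $ss,
			attempted => $aa + 0,
			seconds   => ($time == 99999 ? undef : $time + 0),
		};
	}
	if ($digits =~ /^0(\d\d)(\d{5})(\d\d)$/) {
		# new: 0DDTTTTTMM
		my ($dd, $time, $mm) = ($1, $2, $3);
		my $solved = (99 - $dd) + $mm;	# solved = difference + missed
		return {
			solved    => $solved,
			attempted => $solved + $mm,
			seconds   => ($time == 99999 ? undef : $time + 0),
		};
	}
	return;
}

# Made-up example value: 3 of 4 cubes solved in 1234 seconds (new format).
my $result = decode_multi(970123401);
print "$result->{solved}/$result->{attempted} in $result->{seconds}s\n";
```

An unknown time (99999) comes back as `undef` in `seconds`, so callers should check for that before printing or comparing times.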


----------



## Stefan (Mar 14, 2011)

I got the same numbers using SQL, though I'd say let's not count DNS as attempts. Then the numbers differ slightly:


```
Month     Total   Percent Solved
2003-08      3    66.67
2004-04      2    50.00
2004-07      9    66.67
2004-08     18    22.22
2004-10     29    24.14
2005-01      6    50.00
2005-04     24    50.00
2005-05      6    66.67
2005-06      2    50.00
2005-08     27    29.63
2005-09     19    52.63
2005-10     67    29.85
2005-11     70    40.00
2005-12     24    37.50
2006-01     32    31.25
2006-03     27    48.15
2006-04    113    30.09
2006-05     43    39.53
2006-07     41    29.27
2006-08     95    38.95
2006-09     88    37.50
2006-10     73    39.73
2006-11    139    33.09
2007-01     53    28.30
2007-02     40    35.00
2007-03     66    36.36
2007-04    156    28.85
2007-05    168    29.17
2007-06     51    41.18
2007-07     72    22.22
2007-08    134    27.61
2007-09    119    28.57
2007-10    209    40.19
2007-11    196    30.61
2007-12    108    32.41
2008-01    126    23.81
2008-02    234    37.18
2008-03    272    29.41
2008-04    445    30.34
2008-05    359    33.43
2008-06    136    25.00
2008-07    456    29.82
2008-08    268    30.22
2008-09    244    27.46
2008-10    404    36.39
2008-11    390    31.28
2008-12    213    25.82
2009-01    248    31.45
2009-02    542    28.78
2009-03    303    25.74
2009-04    451    27.72
2009-05    438    29.68
2009-06    261    32.57
2009-07    368    23.10
2009-08    675    30.81
2009-09    367    29.16
2009-10    591    35.53
2009-11    421    30.17
2009-12    343    27.99
2010-01    503    33.00
2010-02    666    26.88
2010-03    328    30.79
2010-04    562    28.11
2010-05    440    24.09
2010-06    333    32.43
2010-07    747    31.99
2010-08    843    29.89
2010-09    644    29.97
2010-10    952    33.61
2010-11    583    27.44
2010-12    529    26.84
2011-01    619    29.56
2011-02    797    27.10
2011-03    256    29.30
```

Query:

```
select year, month, solved+dnf, 100*solved/(solved+dnf)
from
  (select year, month,
          sum((value1> 0)+(value2> 0)+(value3> 0)+(value4> 0)+(value5> 0)) solved,
          sum((value1=-1)+(value2=-1)+(value3=-1)+(value4=-1)+(value5=-1)) dnf
   from Results, Competitions
   where Competitions.id = competitionId
   and eventId = '333bf'
   group by year, month) tmp
order by year, month
```
Formatting:

```
perl -ne "printf qq(%d-%02d%7d%9.2f\n), split /[, ]+/" raw.txt > nice.txt
```


----------



## PatrickJameson (Mar 15, 2011)

I thought about DNSs and decided to keep them in, on the reasoning that people typically make a personal decision to take a DNS. As for my original hypothesis, those who cheated may have decided not to continue after realizing they cannot cheat with the regulation in place, which would make that DNS count as an attempt and a failure. Either way would have been biased towards one side of the hypothesis, however. Meh.

Of course, for the sake of unrelated statistics, I'd agree with you.


----------



## Zane_C (Mar 21, 2011)

There are now officially 1000 people who have solved the 3x3 blindfolded.


----------



## Dene (Mar 21, 2011)

Dene is contributing 100% success rate to the DNS-excluded list :tu

But yeah, that is a cool statistic. It shows without a doubt that the present state of blindfold cubing has a success rate of just under a third.

It's very consistent too, which is interesting. But why is that? Two obvious options might be that either there is a group of about two thirds of cubers that have higher success rates, and a group of about one third of cubers that have lower success rates; or everyone has roughly a 29% success rate. I'd be interested to see what happens to those numbers when the fastest solvers (say, top 200) are separated from the rest; or maybe when those with more attempts (say, >20) are separated from those with fewer attempts.

I dunno, just some ideas that probably wouldn't come to anything.
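The split by number of attempts could be sketched roughly like this. It assumes each input row has already been reduced to a person id followed by the five attempt values, and it leaves DNS out of the attempt count; `rates_by_experience` and the sample rows are invented purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: pooled success rates (in percent) for competitors with
# more than $threshold attempts versus everyone else. Each row is
# [personId, value1 .. value5]; DNS (-2) and "nothing" (0) are not counted.
sub rates_by_experience {
	my ($rows, $threshold) = @_;

	my (%solved, %attempts);
	for my $row (@$rows) {
		my ($person, @values) = @$row;
		for my $value (@values) {
			next unless $value > 0 || $value == -1;
			$attempts{$person}++;
			$solved{$person}++ if $value > 0;
		}
	}

	my %group = (
		many => { solved => 0, attempts => 0 },
		few  => { solved => 0, attempts => 0 },
	);
	for my $person (keys %attempts) {
		my $name = $attempts{$person} > $threshold ? 'many' : 'few';
		$group{$name}{solved}   += $solved{$person} || 0;
		$group{$name}{attempts} += $attempts{$person};
	}

	my %rate;
	for my $name (keys %group) {
		my $n = $group{$name}{attempts};
		$rate{$name} = $n ? 100 * $group{$name}{solved} / $n : undef;
	}
	return \%rate;
}

# Made-up rows, only to show the call; real input would be the 333bf rows
# of the Results table.
my @rows = (
	['A', 9500, -1, 8700, 0, 0],
	['A', -1, 9100, -1, 0, 0],
	['B', -1, -1, -2, 0, 0],
	['C', 20000, -1, -1, 0, 0],
);
my $rates = rates_by_experience(\@rows, 3);
printf "%s: %.1f%%\n", $_, $rates->{$_} for grep { defined $rates->{$_} } sort keys %$rates;
```

A group with no attempts at all gets `undef` rather than 0, so an empty group is not mistaken for one that failed every solve.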


----------



## Mike Hughey (Mar 21, 2011)

I wonder a little if the reason for a success rate just under a third is the fact that most competitions do best of 3. Most people are just after one good solve (different people define "good" differently, but it's generally true; people like me excepted, since I want 3 good solves!), so when they get a decent success, they go for broke on their other solves to try to get a PB time, probably dramatically decreasing their accuracy.
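As a rough illustration of that idea, here is a toy model: a competitor solves carefully until the first success in a round, then rushes the rest. The two accuracies are arbitrary inputs chosen for the example, not measured values, and `expected_success_rate` is a made-up name:

```perl
use strict;
use warnings;

# Toy model of the "go for broke" idea: a competitor solves carefully (with
# accuracy $careful) until the first success in a round, then rushes (with
# accuracy $rushed) on the remaining attempts. Both accuracies are
# hypothetical inputs, not measured values.
sub expected_success_rate {
	my ($careful, $rushed, $attempts) = @_;
	my $no_success_yet = 1;	# probability that every earlier attempt failed
	my $expected = 0;	# expected number of successes in the round
	for (1 .. $attempts) {
		$expected += $no_success_yet * $careful + (1 - $no_success_yet) * $rushed;
		$no_success_yet *= 1 - $careful;
	}
	return $expected / $attempts;
}

printf "best of 3: %.3f\n", expected_success_rate(0.5, 0.1, 3);
printf "best of 2: %.3f\n", expected_success_rate(0.5, 0.1, 2);
```

With these made-up inputs the per-attempt rate falls as the round gets longer, because more of the attempts are rushed. Whether real competitors behave this way is exactly what the stats would have to show.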

I also would love to see some of those other numbers.


----------



## Dene (Mar 21, 2011)

Oh that is a much better explanation. Teach me for trying to understand something I never do.


----------



## AvGalen (Mar 21, 2011)

Format of blind: best of 3.
Success rate of blind: 1 of 3.
Correlation found (agreeing with Mike here)

Seriously, I try to get the first cube correct and try to go as fast as I can on the next cubes. This approach doesn't actually work for me because my "as fast as possible" solves aren't faster, but ARE often DNFs. However, IF I get the first one correct I use this approach anyway. Others will probably do the opposite: try to go as fast as possible on the first 2 attempts and, when they fail both, just try to get the last one correct.

Before the paper was added, blind was "for the elite cubers". After that time it got more popular, more people tried it and that resulted in a lower success rate. I also think that best of 2 was more common in the past, leading to a higher success rate for the reason I stated above.

I don't see any reason to think that people used to cheat before the paper and don't cheat anymore now. That doesn't mean that the paper is a bad idea though

(interpreting statistics is fun)


----------



## miniGOINGS (Mar 21, 2011)

AvGalen said:


> I don't see any reason to think that people used to cheat before the paper and don't cheat anymore now. That doesn't mean that the paper is a bad idea though
> 
> (interpreting statistics is fun)


 
How do you think that people currently cheat/try to cheat?

I think that if nothing else, using the paper deters would-be cheaters as well as adding validity to WR times, etc.


----------



## qqwref (Mar 21, 2011)

PatrickJameson said:


> I thought about DNSs and decided to keep them in, on the reasoning that people typically make a personal decision to take a DNS. As for my original hypothesis, those who cheated may have decided not to continue after realizing they cannot cheat with the regulation in place, which would make that DNS count as an attempt and a failure. Either way would have been biased towards one side of the hypothesis, however. Meh.


I see your point, but I don't think it's a useful/correct approach. If someone cheated, the paper regulation would only make them take DNS results in one competition (the first they attend that uses paper), whereas most people with multiple DNS results have done them over more than one competition. So, even if there are a substantial number of cheaters out there, I'd expect that the vast majority of DNS results are due to someone just not wanting to do any more BLD attempts. It might not hurt your hypothesis test to count all DNS results as DNFs, but the accuracy rates you get will be artificially low.


----------



## d4m4s74 (Mar 21, 2011)

Being a beginner to blindcubing I get conflicting memories when solving multiple cubes blindfolded in a row. So I'll probably add to the DNS stats next week (once I do one successful solve I'll call it an event (play on calling it a day))

Also, do solves you're not allowed to do count as DNS (as in best of 3 if you're under 5 minutes in one of the first 2 solves, otherwise best of 2)?


----------



## AvGalen (Mar 21, 2011)

miniGOINGS said:


> How do you think that people currently cheat/try to cheat?
> 
> I think that if nothing else, using the paper deters would-be cheaters as well as adding validity to WR times, etc.


 
I think that people don't try to cheat now at all. The most probable way they could still cheat would be by knowing the scrambles before they attempt solving them, or by having a "friendly" judge.


----------

