# WCA website crawling



## Ron (Jul 1, 2009)

Hi guys,

As you may have noticed, the WCA website has become slower recently.
This is because our provider moved the WCA website to another (busier?) server. The reason given was that some people are periodically crawling the WCA results.

So if you are using some tool to update your database of WCA results, then please stop doing this. If you need data, then send me an e-mail and I will gladly send you the data in a standardised format.
If there are generic requirements for data delivery, then let me know too. The discussion on the WCA Forum about this subject did not come to a conclusion.

Thanks,

Ron


----------



## StachuK1992 (Jul 1, 2009)

Hey Ron,

Approximately how large is the database?
I have a rather large server here, soon with a FiOS connection, and there may be a chance I could host at least a portion of the database, if that would be helpful.

StachuK


----------



## MichaelErskine (Jul 1, 2009)

I'm guilty of running automated searches of the WCA site, but always with consideration and not very regularly.

It would be extremely useful if the data normally searchable from the web interface were made available, perhaps on a mirror server somewhere, and kept up to date (e.g. once a week) in a reasonable relational format (perhaps a SQLite DB or MySQL dbdump).


----------



## JBCM627 (Jul 2, 2009)

Stachuk1992 said:


> I have a rather large server here, soon with FIOS connection, and there may be a chance of me being able to host at least a portion of the database, if that would be helpful.





msemtd said:


> It would be extremely useful if the data normally searchable from the web interface were made available, perhaps on a mirror server somewhere, and kept up to date (e.g. once a week) in a reasonable relational format (perhaps a SQLite DB or MySQL dbdump).



These two sound like they could go together well. I can see privacy being an issue, though. Perhaps don't include birthdates or other sensitive info on the mirror?

If this comes through, while it's being set up, the database could certainly use some restructuring... I talked with Dave briefly about this, and the "everything in one big table" format can be improved quite significantly to reduce memory usage and server load.


----------



## MichaelErskine (Jul 2, 2009)

JBCM627 said:


> I can see privacy being an issue though. Perhaps don't include birthdates or other sensitive info on the mirror?



Certainly - any organisation is obliged to do this for Data Protection. The data that is freely available via the web interface is all that need be exported - that is what is being web-crawled.


----------



## Ron (Jul 31, 2009)

Hi all,

The WCA database is now downloadable at http://www.worldcubeassociation.org/results/wcadumpyyyymm.xlsx
We will post a new file at the end of every month, so the first file is wcadump200907.xlsx (size 6.5MB).
If we forget to post, then send an e-mail to [email protected].
Now that we have the database online please stop crawling the WCA website to update your own databases.

Looking forward to your feedback.

Thanks,

Ron


----------



## Dave Campbell (Jul 31, 2009)

I cannot open the file, as it was created by a newer version of Excel than the one I am running (Excel 2002). I'd personally rather see an SQL dump file than an Excel file.

Secondly, I see you have a naming convention in place, which is fine if you want versioning available. But I would suggest also making one "current" version that always has the same name, so I can download the same file any time I wish and get the latest version. I could then use the specific file names with the YYYYMM for any old version.

In the end, I don't crawl the WCA site, nor do I wish to have my own copy. I just want to be able to query the current version for selected data, which of course brings me back to my web service request documented on the WCA board.


----------



## Ron (Jul 31, 2009)

Hi Dave,

Thanks for your feedback.

I specifically chose not to use a "current" file, just to make sure that no one is downloading 6.5MB from the server merely to check whether there is a new version.

The problem with a web service is:
- if we have one service per competitor or so, then I am pretty sure we will get extremely heavy loads on the server, just from people who make mistakes without realizing the impact, e.g. calling the web service every time another web page is loaded
- if we have one web service for all competitors/results, then I am afraid caching will be a problem, which could again overload our server

I agree with you that the solution is not perfect, but at least it is easy for me and should still be workable for people who need the data.

About the Excel version: yes, I used the Excel 2007 format, because that format includes compression. Otherwise the file size would be 18MB instead of 6.5MB.
The reason I chose Excel is that exporting from MySQL to SQL statements gives problems with special characters like éöà, and I think I tried everything. Any ideas on that are welcome.

Have fun,

Ron


----------



## masterofthebass (Jul 31, 2009)

David, I can create an SQL dump for you if you would like (as I'm sure many people could).


----------



## MichaelErskine (Jul 31, 2009)

Ron said:


> The WCA database is now downloadable at http://www.worldcubeassociation.org/results/wcadumpyyyymm.xlsx


Thanks Ron.

Processing...


----------



## brunson (Jul 31, 2009)

I noticed in the queries used to create the Excel file that he didn't select * from the Persons table. I assume that is to protect sensitive information like birthdates. A mysqldump would expose that information and might be a legal issue, given the number of minors in the database.


----------



## AvGalen (Jul 31, 2009)

masterofthebass said:


> David, I can create a sql dump for you if you would like. (as i'm sure that many people could )


As far as I know there is no such thing as a generic SQL dump. You could make a MySQL dump only if the backup was done in MySQL, and that would only be useful for people who run a MySQL database.

I believe that Excel is a good option, and that http://support.microsoft.com/kb/953336 (free Excel 2007 Viewer) might be helpful for some.


----------



## Dave Campbell (Jul 31, 2009)

Ron said:


> Hi Dave,



Ron

That is fair about the current-file approach. Perhaps it would be a wise project to get one of your developers to put together a simple page that lists the current files and when they were updated, so people are not checking blindly.

I'm not sure I completely understand your concern with the web service. It would be a basic REST architecture: the service just acts as a gateway to the data, and lets you control which data you make available and even who you make it available to.

The request to the web service says what you are looking for and who is looking for it. This approach allows you to permit or deny people as you see fit, even making certain data available only to certain people. You could charge a fee to people who wish to use the service. You could also log their activity to ensure they are not sharing the access key around.

To summarize, I would send a request like so:

http://www.worldcubeassociation.com/webservice.php?Key=1234&Query=Person&Value=Dave+Campbell


The data returned, instead of the HTML page we are accustomed to receiving via a browser, is an XML file with the response. I attached a very simple mock-up of the type of XML file that could be returned.

And lastly, I am sure there is a way to handle the character encoding for an SQL dump file. You should be able to just handle everything as UTF-8 or something. I will do some searching when I have more time, in case you wish to go that route. Of course, if any of the other guys already know how to handle it, feel free to beat me to it.


----------



## MichaelErskine (Jul 31, 2009)

There appears to be a hashed password column in the competitions table.

The MySQL table definitions would be most useful for those of us importing back into MySQL!

EDIT: I've inferred some column types - perhaps they're not too far out - e.g.:

```
CREATE TABLE /*!32312 IF NOT EXISTS*/ "formats" (
  "id" varchar(10) NOT NULL DEFAULT '',
  "description" varchar(50) DEFAULT NULL,
  PRIMARY KEY ("id")
);

CREATE TABLE /*!32312 IF NOT EXISTS*/ "competitions" (
  "id" varchar(50) NOT NULL DEFAULT '',
  "name" varchar(50) DEFAULT NULL,
  "cityName" varchar(50) DEFAULT NULL,
  "countryId" varchar(50) DEFAULT NULL,
  "information" varchar(2000) DEFAULT NULL,
  "year" varchar(50) DEFAULT NULL,
  "month" varchar(10) DEFAULT NULL,
  "day" varchar(10) DEFAULT NULL,
  "endMonth" varchar(10) DEFAULT NULL,
  "endDay" varchar(10) DEFAULT NULL,
  "eventSpecs" varchar(500) DEFAULT NULL,
  "wcaDelegate" varchar(500) DEFAULT NULL,
  "organiser" varchar(500) DEFAULT NULL,
  "venue" varchar(800) DEFAULT NULL,
  "venueAddress" varchar(800) DEFAULT NULL,
  "venueDetails" varchar(800) DEFAULT NULL,
  "website" varchar(500) DEFAULT NULL,
  "cellName" varchar(100) DEFAULT NULL,
  "showAtAll" varchar(10) DEFAULT NULL,
  "showResults" varchar(10) DEFAULT NULL,
  "password" varchar(100) DEFAULT NULL,
  "showPreregForm" varchar(10) DEFAULT NULL,
  "showPreregList" varchar(10) DEFAULT NULL,
  "latitude" varchar(50) DEFAULT NULL,
  "longitude" varchar(50) DEFAULT NULL,
  PRIMARY KEY ("id")
);
```

EDIT2: I've imported the data into a MySQL database (rough and ready, no foreign keys, but it accommodates the available data) and I have a mysqldump file available for anyone interested (bz2-compressed SQL text, 1.3MB).


----------



## Lucas Garron (Jul 31, 2009)

Ron said:


> About the Excel version: yes, I used Excel 2007 format, because that format includes compression. Otherwise the file size would be 18MB instead of 6.5MB.
> 
> Have fun,
> 
> Ron


How about using http://us2.php.net/manual/en/book.zip.php or http://www.weberdev.com/get_example-4066.html on the XLS?
The .xlsx is 6.3 MB, and when I convert it to .xls using OpenOffice and zip it, it becomes 4.1 MB. And unzipping is much easier to do (say, automatically) than xlsx -> xls.

However, even if I need to convert, I'm happy with this. Mathematica imports it easily.


----------



## MichaelErskine (Jul 31, 2009)

Lucas Garron said:


> The .xlsx is 6.3 MB, and when I convert it to .xls using OpenOffice and zip it, it becomes 4.1 MB. And unzipping is much easier to do (say, automatically) than xlsx -> xls.



Or, to be more inclusive (OpenOffice is free and cross-platform), import into OpenOffice and keep it in OASIS OpenDocument format (already compressed, 3MB; copy available for anyone interested).


----------



## Mike Hughey (Jul 31, 2009)

Ron said:


> The WCA database is now downloadable at http://www.worldcubeassociation.org/results/wcadumpyyyymm.xlsx



I get "Page not found" when I try to access this. Am I doing something wrong, or has it been brought down because of problems?


----------



## masterofthebass (Jul 31, 2009)

The link has the yyyymm placeholder in it. You need to manually replace it with the year and month.
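For anyone scripting the download, filling in that placeholder can be sketched like this (a minimal Python sketch; the function name is my own, not part of the WCA site):

```python
# Build the monthly dump URL by substituting the yyyymm placeholder
# from Ron's post (e.g. wcadump200907.xlsx for July 2009).
def dump_url(year, month):
    return ("http://www.worldcubeassociation.org/results/"
            f"wcadump{year:04d}{month:02d}.xlsx")
```

For example, dump_url(2009, 7) yields the URL of the first dump file.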


----------



## Mike Hughey (Jul 31, 2009)

masterofthebass said:


> The link has the yyyymm placeholder in it. You need to manually replace it with the year and month.



Wow I'm stupid. Sorry for being an idiot.


----------



## MichaelErskine (Jul 31, 2009)

mysqldump version here: http://www.tecspy.com/mick/wca-db/wca-dump200907.sql.bz2 (1.4Mb)
OpenDocument spreadsheet version here: http://www.tecspy.com/mick/wca-db/wca-dump200907.ods (3.0Mb)


----------



## Ron (Aug 3, 2009)

I added three new things to the Excel file (starting with the July 2009 version):
- Countries table
- Continents table
- SQL statements for creation of the tables

There are some additional tables that we use to generate the rankings on the pages without recalculating them all the time.
These tables are regenerated after adding new competition results.


----------



## brunson (Aug 12, 2009)

Ron,

How would you feel about providing a "mysqldump --no-data" of the original schema? I imported several sheets of the workbook last night, and it's not too hard to infer the types, but since it's only a single command, I thought you might not mind.

Thanks


----------



## Ron (Aug 14, 2009)

> How would you feel about providing a "mysqldump --no-data" of the original schema?


Please clarify your question.


----------



## brunson (Aug 14, 2009)

Running the command "mysqldump --no-data schemaname" will generate a dump of all the SQL to create the schema without the data in it. That would be helpful to those of us who want to create a schema to import the data into.

I've also posted a more detailed request and an offer to assist over on the WCA site.

Thanks for all the work so far.


----------



## Ron (Sep 3, 2009)

The new file for August 2009 is online now.
You need to refresh the whole database, because we needed to change some past results as well.

Brunson, I tried what you said but it does not give me more information than the table definitions I added to the first page of the file. What more information do you need?


----------



## brunson (Sep 3, 2009)

Oh, I'm sorry, Ron. I didn't see that you had added that to the first page. My bad. 

Thanks!


----------



## brunson (Sep 3, 2009)

Dave Campbell said:


> I cannot open the file as it was created by a newer version of Excel than the one i am running (and i am running 2002). I'd personally rather see an sql dump file than an Excel file.


I just noticed that the results tab has exceeded the 65535 row limit of OpenOffice Calc and gnumeric, so I can't open it and retrieve all the result data. Bummer.


----------



## MichaelErskine (Sep 3, 2009)

brunson said:


> I just noticed that the results tab has exceeded the 65535 row limit of OpenOffice Calc and gnumeric, so I can't open it and retrieve all the result data. Bummer.


I shall create a sqldump version and an ODS with two results tabs when I get into the office tomorrow. There's no good reason to get locked into non-free software


----------



## brunson (Sep 4, 2009)

That would be so awesome, Michael. I was thinking of writing a Python macro for OOCalc to automatically save the tabs as CSV; that's the only part that's still manual for me. I was disappointed by the 65535-row limitation of OO and ended up finding this: http://www.openoffice.org/issues/show_bug.cgi?id=30215

Normally I'd be all over complaining about proprietary formats, but Ron is doing us a huge service so I kept my mouth shut. 

If you can send it to me I can put it on my web server for anyone else that wants it.

Thanks again. And no end of thanks to Ron for making the data available in the first place.


----------



## MichaelErskine (Sep 4, 2009)

brunson said:


> That would be so awesome, Michael.


PM sent -- "_can has ODS?_" => "_kthxbye_" 

I still need to create an sqldump (changed now we have the table definitions -- but not MyISAM!!!) -- I'll do that later.


----------



## brunson (Sep 5, 2009)

Lucas Garron said:


> Ron said:
> 
> 
> > About the Excel version: yes, I used Excel 2007 format, because that format includes compression. Otherwise the file size would be 18MB instead of 6.5MB.
> ...




```
-rw-rw-r--. 1 brunson brunson  7144609 2009-09-03 08:41 wcadump200908.xlsx
-rw-rw-r--. 1 brunson brunson  3141192 2009-09-04 00:51 wcadump200908.ods
-rw-rw-r--. 1 brunson brunson  2438745 2009-09-05 13:19 wca200902-mysqldump.sql.gz
-rw-rw-r--. 1 brunson brunson  1694590 2009-09-05 13:15 wca200902-mysqldump.sql.bz2
-rw-rw-r--. 1 brunson brunson  1350371 2009-09-05 13:20 wca200902-csv.tar.bz2
```
In decreasing order of size: Excel, OOCalc, gzip -9'd mysqldump, bzip2 -9'd mysqldump, and a tar -cjf of the CSV files along with the SQL scripts I made to create the schema and load it.


----------



## Bryan (Oct 2, 2009)

Do we have the new file for September? 

msemtd or brunson, do either of you have scripts that automatically import this into MySQL? I'm looking at implementing State Records.


----------



## Ron (Oct 2, 2009)

New file is online now.
Sorry for the delay. It is a busy time with WC 2009 coming up.


----------



## Pedro (Oct 2, 2009)

No results from the Polish Open yet?


----------



## JBCM627 (Oct 2, 2009)

Bryan said:


> Do we have the new file for September?
> 
> msemtd or brunson, do either of you have scripts that automatically import this into MySQL? I'm looking at implementing State Records.



If you can export the spreadsheet to a csv or tsv, MySQL can import it: http://dev.mysql.com/doc/refman/5.1/en/load-data.html


----------



## brunson (Oct 2, 2009)

After I save each sheet to an individual CSV file, I use this script to load the data into the database.


```
use wca;
source schema.sql;

load data infile '/home/brunson/wca/results.csv' 
replace into table Results 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, pos, personId, personName, countryId, competitionId, 
  eventId, roundId, formatId, value1, value2, value3, value4, 
  value5, best, average, regionalSingleRecord, 
  regionalAverageRecord );

load data infile '/home/brunson/wca/competitions.csv' 
replace into table Competitions 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, name, cityName, countryId, information, year, month, 
  day, endMonth, endDay, eventSpecs, wcaDelegate, organiser, 
  venue, venueAddress, venueDetails, website, cellName, showAtAll, 
  showResults, password, showPreregForm, showPreregList, 
  latitude, longitude );

load data infile '/home/brunson/wca/persons.csv' 
replace into table Persons 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, subid, name, countryId, gender );

load data infile '/home/brunson/wca/rounds.csv' 
replace into table Rounds 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, rank, name, cellName );

load data infile '/home/brunson/wca/events.csv' 
replace into table Events 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, name, rank, format, cellName );

load data infile '/home/brunson/wca/formats.csv' 
replace into table Formats 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines ( id, name );

load data infile '/home/brunson/wca/countries.csv' 
replace into table Countries 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, name, continentId, latitude, longitude, zoom );

load data infile '/home/brunson/wca/continents.csv' 
replace into table Continents 
columns terminated by ',' optionally enclosed by '"' 
ignore 1 lines 
( id, name, recordName, latitude, longitude, zoom );
```
The script references another script called schema.sql. It's just the creation statements from Ron's spreadsheet put into a single file:

```
CREATE TABLE `Competitions` (
  `id` varchar(32) NOT NULL default '',
  `name` varchar(50) NOT NULL default '',
  `cityName` varchar(50) NOT NULL default '',
  `countryId` varchar(50) NOT NULL default '',
  `information` mediumtext,
  `year` smallint(5) unsigned NOT NULL default '0',
  `month` smallint(5) unsigned NOT NULL default '0',
  `day` smallint(5) unsigned NOT NULL default '0',
  `endMonth` smallint(5) unsigned NOT NULL default '0',
  `endDay` smallint(5) unsigned NOT NULL default '0',
  `eventSpecs` text NOT NULL,
  `wcaDelegate` varchar(240) NOT NULL default '',
  `organiser` varchar(200) NOT NULL default '',
  `venue` varchar(240) NOT NULL default '',
  `venueAddress` varchar(120) default NULL,
  `venueDetails` varchar(120) default NULL,
  `website` varchar(200) default NULL,
  `cellName` varchar(45) NOT NULL default '',
  `showAtAll` tinyint(1) NOT NULL default '1',
  `showResults` tinyint(1) NOT NULL default '1',
  `password` varchar(45) NOT NULL default '',
  `showPreregForm` tinyint(1) NOT NULL default '0',
  `showPreregList` tinyint(1) NOT NULL default '0',
  `latitude` int(11) NOT NULL default '0',
  `longitude` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Continents` (
  `id` varchar(50) NOT NULL default '',
  `name` varchar(50) NOT NULL default '',
  `recordName` char(3) NOT NULL default '',
  `latitude` int(11) NOT NULL default '0',
  `longitude` int(11) NOT NULL default '0',
  `zoom` tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Countries` (
  `id` varchar(50) NOT NULL default '',
  `name` varchar(50) NOT NULL default '',
  `continentId` varchar(50) NOT NULL default '',
  `latitude` int(11) NOT NULL default '0',
  `longitude` int(11) NOT NULL default '0',
  `zoom` tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `fk_continents` (`continentId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Events` (
  `id` varchar(6) NOT NULL default '',
  `name` varchar(54) NOT NULL default '',
  `rank` int(11) NOT NULL default '0',
  `format` varchar(10) NOT NULL default '',
  `cellName` varchar(45) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=0;
CREATE TABLE `Formats` (
  `id` char(1) NOT NULL default '',
  `name` varchar(50) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Persons` (
  `id` varchar(10) NOT NULL default '',
  `subId` tinyint(6) NOT NULL default '1',
  `name` varchar(80) NOT NULL default '',
  `countryId` varchar(50) NOT NULL default '',
  `gender` char(1) NOT NULL default '',
  `year` smallint(6) NOT NULL default '0',
  `month` tinyint(4) NOT NULL default '0',
  `day` tinyint(4) NOT NULL default '0',
  `comments` varchar(40) NOT NULL default '',
  KEY `fk_country` (`countryId`),
  KEY `id` (`id`),
  KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Results` (
  `id` int(11) NOT NULL auto_increment,
  `pos` smallint(6) NOT NULL default '0',
  `personId` varchar(10) NOT NULL default '',
  `personName` varchar(80) NOT NULL default '',
  `countryId` varchar(50) default NULL,
  `competitionId` varchar(32) NOT NULL default '',
  `eventId` varchar(6) NOT NULL default '',
  `roundId` char(1) NOT NULL default '',
  `formatId` char(1) NOT NULL default '',
  `value1` int(11) NOT NULL default '0',
  `value2` int(11) NOT NULL default '0',
  `value3` int(11) NOT NULL default '0',
  `value4` int(11) NOT NULL default '0',
  `value5` int(11) NOT NULL default '0',
  `best` int(11) NOT NULL default '0',
  `average` int(11) NOT NULL default '0',
  `regionalSingleRecord` char(3) default NULL,
  `regionalAverageRecord` char(3) default NULL,
  PRIMARY KEY  (`id`),
  KEY `fk_tournament` (`competitionId`),
  KEY `fk_event` (`eventId`),
  KEY `fk_round` (`roundId`),
  KEY `fk_format` (`formatId`),
  KEY `fk_competitor` USING BTREE (`personId`)
) ENGINE=MyISAM AUTO_INCREMENT=66847 DEFAULT CHARSET=latin1 PACK_KEYS=0;
CREATE TABLE `Rounds` (
  `id` char(1) NOT NULL default '',
  `rank` int(11) NOT NULL default '0',
  `name` varchar(50) NOT NULL default '',
  `cellName` varchar(45) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
```


----------



## Bryan (Oct 2, 2009)

So is there really not a table with everyone's best results? I know we can calculate it on our own, but it seems like that could cut down on a lot of bandwidth, since many people don't need all the results.

It would also handle situations where the recorded average differs from the calculated average. There was a case where the Square-1 scramble was incorrect, so the solve was replaced with a DNF, but the mean was taken from the remaining two. I believe this has only happened once, though.


----------



## hr.mohr (Oct 15, 2009)

The WCA results page contains details on how many cubes were attempted in a multi-BLD attempt, but the Excel file does not contain this info. It only contains the "score".


----------



## masterofthebass (Oct 16, 2009)

the "score" has all that info built into it. From the wca sheet:

IF(x-y>y,-1,(99-y+x-y)*10000000+t*100+x-y)

where 
x=tried
y=solved
t=time

You can work backwards to find the rest.
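That working-backwards step can be sketched in Python (a hedged sketch; the function and variable names are mine, not part of the WCA export):

```python
# Decode the encoded multi-BLD "score" back into (tried, solved, seconds),
# inverting IF(x-y>y,-1,(99-y+x-y)*10000000+t*100+x-y).
def decode_multi(value):
    if value <= 0:
        return None  # -1 encodes an unsuccessful attempt (more missed than solved)
    missed = value % 100                # last two digits: x - y
    seconds = (value // 100) % 100000   # middle digits: the time t
    points = 99 - value // 10_000_000   # leading digits store 99 - (y - (x - y))
    solved = points + missed            # y
    tried = solved + missed             # x
    return tried, solved, seconds
```

For example, 3 solved out of 3 tried in 300 seconds encodes to 96*10000000 + 300*100 + 0 = 960030000, and decode_multi(960030000) recovers (3, 3, 300).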


----------



## hr.mohr (Oct 16, 2009)

Thanks Dan.


----------



## Ron (Nov 2, 2009)

The October 2009 file is now online.
http://www.worldcubeassociation.org/results/wcadump200910.xlsx


----------



## Ron (Nov 2, 2009)

> IF(x-y>y,-1,(99-y+x-y)*10000000+t*100+x-y)


We used this specific format for the results because we wanted to be able to sort them by value.


----------



## Bryan (Nov 15, 2009)

Ron said:


> The October 2009 file is now online.
> http://www.worldcubeassociation.org/results/wcadump200910.xlsx



This doesn't work.

Does anyone happen to have the latest file that could export the results tables into CSV or MySQL format?


----------



## Ron (Nov 16, 2009)

Bryan,

The WCA server was migrated to a new server recently.
Not all files are on the new server yet. I will correct this later today.

Sorry,

Ron


----------



## Ranzha (Nov 16, 2009)

The server is down -sadfaec-.
Oh well, it'll be back up soon, amirite?

--R


----------



## Ron (Nov 16, 2009)

> The server is down


The main page is not working yet, but the deep links are: http://www.worldcubeassociation.org/results

Another problem is that accidentally most of the competition results have been deleted. We will correct that on Monday.


----------



## Bryan (Nov 17, 2009)

Bryan said:


> Ron said:
> 
> 
> > The October 2009 file is now online.
> ...



OK, this link is working again. I have no access to a system that can convert the results table*, so if someone could get me a CSV or MySQL version of the results table, that would be great.

*- I use OpenOffice on my Linux system, OpenOffice on my home laptop, and my work laptop has MS Office, but the older versions can't read past 65,536 rows.


----------



## JBCM627 (Nov 19, 2009)

Bryan said:


> Bryan said:
> 
> 
> > Ron said:
> ...


 
Just got access to a computer with Excel 2007. I've uploaded a (compressed) .csv version of the results sheet here:
http://koii.cubingusa.com/cm/assets/files/wcadump200910.zip (1.65MB)
http://koii.cubingusa.com/cm/assets/files/wcadump200910.7z (1.32MB, whee)


I also had no idea that .xlsx files are structured just by zipping up the XML files: if you treat the .xlsx as a .zip file, you can unzip it and get the XML files that way. Sort of cool.
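Since the .xlsx is just a ZIP archive, the standard library can look inside it directly (a minimal Python sketch; the function name is mine):

```python
# List the XML parts inside an .xlsx archive; sheet data typically lives
# under xl/worksheets/ and shared cell text in xl/sharedStrings.xml.
import zipfile

def list_xlsx_parts(path):
    with zipfile.ZipFile(path) as z:
        return z.namelist()
```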


----------



## Bryan (Nov 19, 2009)

JBCM627 said:


> Bryan said:
> 
> 
> > Bryan said:
> ...



Yeah, too bad the XML sucks. I did a lot of searching last night and found a Python script that can convert an .xlsx to a TSV file, which I can import into MySQL. I'll probably modify the script so I can feed it the .xlsx and it'll go right into MySQL.


----------



## JBCM627 (Nov 19, 2009)

Bryan said:


> I did a lot of searching last night and found a python script that can convert an xlsx to a tsv file, which I can import into MySQL. I'll probably modify the python script so I can feed it the xlsx and it'll go right into the MySQL.


Well... Excel can also just save it as a TSV or CSV, which is how I got that CSV file. I thought that would be easier than a script, but if you are automating it, a script would indeed be better.


----------



## hr.mohr (Nov 19, 2009)

Bryan said:


> *- I use OpenOffice on my Linux system, OpenOffice on my home laptop, and my work laptop has MS Office, but the older versions can't read past 65,536 rows.



The latest OpenOffice only displays 65,536 rows; if you save as CSV, then all rows are exported.


----------



## Stefan (Nov 19, 2009)

JBCM627 said:


> http://koii.cubingusa.com/cm/assets/files/wcadump200910.zip (1.65MB)
> http://koii.cubingusa.com/cm/assets/files/wcadump200910.7z (1.32MB, whee)


Got that down to 952KB with NanoZip and 732KB with paq8px. The latter admittedly took 16 minutes. But wheeeeeee anyway.

Though yeah, putting out CSV or SQL shouldn't be that hard, and I doubt anything speaks against it. I'm currently getting to work on the system again and will look into this, too.


----------



## Ron (Nov 20, 2009)

I am also pro SQL. Now if someone would help me solve the special characters problem...


----------



## hr.mohr (Nov 20, 2009)

Ron said:


> I am also pro SQL. Now if someone would help me solve the special characters problem...



What is the problem with special characters?


----------



## LewisJ (Nov 20, 2009)

Unicode exporting? Is doing that a problem?


----------



## Ron (Nov 22, 2009)

> What is the problem with special characters?


If I use the export function in phpMyAdmin, then all special characters become garbage. So 'Frédérick Badie' becomes 'Fr~#8sd~#8srick Badie'.
I think I tried everything.


----------



## fw (Nov 22, 2009)

Ron said:


> > What is the problem with special characters?
> 
> 
> If I use the export function in phpMyAdmin, then all special characters become garbage. So 'Frédérick Badie' becomes 'Fr~#8sd~#8srick Badie'.
> I think I tried everything.



That's not too bad, right? How many special characters are there? We could write a small script that corrects them (I admit that's not the best solution, but it would work. Just replace ~#8s with é again, etc.)
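Such a repair script could look like this (a minimal Python sketch; the mapping table is illustrative, containing only the one garbled pair from Ron's post, and the rest would have to be collected from the actual dump):

```python
# Map each known garbled byte sequence back to the character it replaced.
REPAIRS = {
    "~#8s": "é",  # from Ron's example: 'Fr~#8sd~#8srick' -> 'Frédérick'
}

def repair(text):
    for bad, good in REPAIRS.items():
        text = text.replace(bad, good)
    return text
```

The cleaner fix would still be exporting with a consistent UTF-8 connection character set in the first place, so no repair pass is needed.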


----------



## Bryan (Dec 6, 2009)

November's file isn't there yet. I would like to update State Rankings for CubingUSA.


----------



## Stefan (Dec 6, 2009)

Bryan said:


> November's file isn't there yet. I would like to update State Rankings for CubingUSA.


Working on the new export right now, almost finished, sorry for the wait.


----------



## Stefan (Dec 14, 2009)

Got Ron's OK to publish the new WCA export. Everything's built automatically with one click now, allowing more frequent updates. Let me know if something's problematic or unclear, or if you have suggestions for improvement.


----------

