# CoLPI - collective letter-pair images database for BLD. In all languages.



## Roman (Jul 4, 2019)

Inspired by Tom's Letter Pair List, this tool is intended to collect letter-pair images used by different people in all languages.
Huge thanks to *Tom Nelson* for the initial set of words and *Enoch Gray* for exporting this collection (as well as his own!) into CoLPI.

>>> *bestsiteever.ru/colpi* <<<

*Features*:

- Quick search for words for a given letter-pair
- Multi-language support
- PAO support
- Export as table (.csv)
- Export as Anki deck (.txt)




Spoiler: TODO



Before exporting to Anki or csv, make sure no words appear twice for different letter-pairs. How?

Deal with offensive words. Maybe the user settings panel could have a "show offensive words" option that is disabled by default.

Deal with PAO type. So far users can specify/edit it but it isn't used anywhere.

Add a two-letters-per-sticker option. Most likely I will never do that, as it requires a complete rebuild of the DB.

Passive learning page. Same as http://bestsiteever.ru/stare but for letter-pairs, so that beginners can "passively" learn words by staring at the screen while doing dishes or pushups.
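For the first TODO item above (making sure no word appears under two different letter-pairs before export), one possible check is sketched below. It is only an illustration: the `(letter_pair, word)` row layout, the function name `find_repeated_words`, and the choice to ignore letter case are assumptions, not how CoLPI actually stores or compares its data.

```python
from collections import defaultdict


def find_repeated_words(rows):
    """Return {word: [letter-pairs]} for words listed under more than one pair.

    `rows` is an iterable of (letter_pair, word) tuples (assumed layout).
    Words are compared case-insensitively (an assumption); accents are kept.
    """
    pairs_by_word = defaultdict(set)
    for letter_pair, word in rows:
        pairs_by_word[word.casefold()].add(letter_pair)
    # Keep only the words that occur under two or more distinct letter-pairs.
    return {
        word: sorted(pairs)
        for word, pairs in pairs_by_word.items()
        if len(pairs) > 1
    }


rows = [("AB", "Abbey"), ("AC", "Ace"), ("BA", "abbey"), ("AB", "Abacus")]
print(find_repeated_words(rows))  # {'abbey': ['AB', 'BA']}
```

The export could then either refuse to run while this dictionary is non-empty, or ask the user which letter-pair should keep each repeated word.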





Spoiler: Same tool for 3cycles?



edit: thanks to xyzzy, I have started this https://docs.google.com/document/d/1Eexb0EI5473gcbc81gMj8i4ZHb1Z7uqn_tzG7Tfzw10/edit?usp=sharing

I intend to make a similar tool where users can submit their 3cycle/flip/twist/parity algs and vote for them.


----------



## xyzzy (Jul 4, 2019)

Roman said:


> I intend to make a similar tool where users can submit their 3cycle/flip/twist/parity algs and vote for them. For that, I would need advice from someone good at SQL on resolving this issue:


(I'm not good at SQL so maybe I'm talking nonsense.) I think what you're looking for is some way of canonically naming the 3-cycle cases. Rather than restricting to a fixed set of buffers, what if you consider the general problem of referring to _any_ 3-cycle (with _any_ buffer)? Then for edges you have six choices of a letter triplet, and for corners you have nine choices of a letter triplet; you could always pick the alphabetically earliest one as the canonical name for that 3-cycle. So for example, UF-FD-BL would have CKR, KRC, RCK, UHI, HIU, IUH as the six choices (using Speffz), and CKR is the alphabetically earliest one, so that's the canonical name. It wouldn't matter whether the user is actually using UF or DF (or FU or FD) as their buffer because the case's name doesn't depend on that.
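A minimal sketch of this naming rule for edges, assuming Speffz lettering. The names `EDGE_FLIP_PAIRS`, `edge_cycle_names` and `canonical_edge_cycle` are made up for illustration. Each edge 3-cycle has three rotations of its letter triplet, and the same again with every sticker swapped for the other sticker on its piece, which gives the six names; the canonical one is simply the smallest.

```python
# Speffz edge stickers, paired with the other sticker on the same edge piece
# (e.g. C = UF and I = FU). Lettering scheme assumed: standard Speffz.
EDGE_FLIP_PAIRS = ["AQ", "BM", "CI", "DE", "FL", "GX",
                   "HR", "JP", "KU", "NT", "OV", "SW"]

FLIP = {}
for a, b in EDGE_FLIP_PAIRS:
    FLIP[a] = b
    FLIP[b] = a


def edge_cycle_names(triplet):
    """All six letter triplets that describe the same edge 3-cycle."""
    flipped = "".join(FLIP[letter] for letter in triplet)
    names = []
    for t in (triplet, flipped):
        for i in range(3):
            names.append(t[i:] + t[:i])  # rotate the triplet by i positions
    return names


def canonical_edge_cycle(triplet):
    """Alphabetically earliest of the six equivalent names."""
    return min(edge_cycle_names(triplet))


print(sorted(edge_cycle_names("CKR")))
print(canonical_edge_cycle("HIU"))  # CKR
```

Corners would work the same way with three stickers per piece (nine names instead of six), which needs a table of corner sticker triples in place of `EDGE_FLIP_PAIRS`.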


----------



## CarterK (Jul 4, 2019)

xyzzy said:


> Rather than restricting to a fixed set of buffers, what if you consider the general problem of referring to _any_ 3-cycle (with _any_ buffer)?


This makes it a bit hard to learn. I think a good way to do it would be to choose the buffer, and then it gives you options for the next pieces after that. It would be a lot more user friendly to be able to see all the cycles for a specific buffer.


----------



## Hazel (Jul 4, 2019)

This is great! I added a few.


----------



## Lucas Garron (Jul 5, 2019)

Ooh, this is fantastic!

I will definitely be looking to this if I ever try to dust off my letter pair drafts.


----------



## Skittleskp (Jul 6, 2019)

This is absolutely amazing! I will have to add my own things to it.


----------



## abunickabhi (Jul 6, 2019)

Yo, I will be doing heavy contributions to it.


----------



## abunickabhi (Jul 6, 2019)

Also, what about a training tool for 5-Style and letter quads?


----------



## Roman (Jul 9, 2019)

Updates!

- Once logged in with WCA, you stay logged in until you click "logout".
- Quick quiz for the least-filled letter-pair: you will see the question "Which word would you use for ...?" when you open CoLPI.
- Fixed: accent mark collation. "Sabiá" and "Sábia" are now considered different words, and "AÑ" and "AN" are different letter-pairs.
- Added a website footer with some stats.
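To illustrate the accent fix in the list above: an accent-insensitive comparison merges "Sabiá" with "Sábia" and "AÑ" with "AN", while an accent-sensitive one keeps them apart. The sketch below only demonstrates that difference in Python with the standard `unicodedata` module; the function names are made up, and it says nothing about how the site's database is actually configured.

```python
import unicodedata


def accent_insensitive_key(text):
    """Comparison key that drops accent marks (the old, unwanted behaviour)."""
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD, accents are separate combining characters; drop them.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_sensitive_key(text):
    """Comparison key that keeps accent marks (the fixed behaviour)."""
    # NFC makes "a" + combining acute and the precomposed "á" compare equal.
    return unicodedata.normalize("NFC", text).casefold()


for a, b in [("Sabiá", "Sábia"), ("AÑ", "AN")]:
    print(a, b,
          accent_insensitive_key(a) == accent_insensitive_key(b),
          accent_sensitive_key(a) == accent_sensitive_key(b))
```

Both pairs compare equal under the first key and different under the second.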

A lot is yet to be done. Web programming turned out to be fascinating.


----------



## Roman (Jul 10, 2019)

xyzzy said:


> (I'm not good at SQL so maybe I'm talking nonsense.) I think what you're looking for is some way of canonically naming the 3-cycle cases. Rather than restricting to a fixed set of buffers, what if you consider the general problem of referring to _any_ 3-cycle (with _any_ buffer)? Then for edges you have six choices of a letter triplet, and for corners you have nine choices of a letter triplet; you could always pick the alphabetically earliest one as the canonical name for that 3-cycle. So for example, UF-FD-BL would have CKR, KRC, RCK, UHI, HIU, IUH as the six choices (using Speffz), and CKR is the alphabetically earliest one, so that's the canonical name. It wouldn't matter whether the user is actually using UF or DF (or FU or FD) as their buffer because the case's name doesn't depend on that.



That is exactly what I need!
I also need to refer to multiple cycles at once (like 2e2e algs), as well as flips and twists. And I think involving Speffz is redundant. Here is my approach: Cube cycles canonical representation.

Do you see any flaws in it? Am I reinventing the wheel?


----------



## qwr (Feb 24, 2021)

nice tool! why is ʧ included? It's probably clearer as ch or if you prefer one letter, č from Czech
also what does green mean? also can you show how many votes each phrase has?


----------



## Roman (Feb 24, 2021)

Thanks!
- "ʧ" vs. "ch" vs. "č" is just a matter of preference. I don't think "ch" would be clearer than the IPA symbol that explicitly denotes this sound.
- Green means that the letter-pair already has an image with sufficient votes.
- There is no practical benefit from showing how many upvotes/downvotes each word has (or is there?)


----------



## qwr (Feb 24, 2021)

Roman said:


> - There is no practical benefit from showing how many upvotes/downvotes each word has (or is there?)


maybe more popular words are better. I don't know
why only have ch tho. why not sh or th.
actually in chinese, q has a ch sound like in qiyi. so maybe you can use q for ch


----------



## tx789 (Feb 24, 2021)

qwr said:


> nice tool! why is ʧ included? It's probably clearer as ch or if you prefer one letter, č from Czech
> also what does green mean? also can you show how many votes each phrase has?


č is worse than ch: how many people who aren't Czech know about č? Ch is used in English, so most people would be familiar with it.

IPA allows for complete clarity, assuming you know it, and /ʧ/ is a unique character. In IPA, the sound "th" is either a theta (θ) or an eth (ð).


----------



## qwr (Feb 24, 2021)

tx789 said:


> č is worse than ch: how many people who aren't Czech know about č? Ch is used in English, so most people would be familiar with it.
> 
> IPA allows for complete clarity, assuming you know it, and /ʧ/ is a unique character. In IPA, the sound "th" is either a theta (θ) or an eth (ð).


yeah but ch isn't one letter. idk if that matters
also I think č was the former IPA symbol and easier to think about than ʧ


----------



## abunickabhi (Mar 14, 2021)

There are not many contributors for Danish and Gujarati; I have to get interested speedcubers to contribute to these languages.

Also, the new toggle between kids' mode and a mode where all types of words are allowed is a good feature update.

Also, the leaderboard system has changed: it used to count all-time contributions, and now it counts contributions made in the last 90 days.


----------



## abunickabhi (Nov 8, 2022)

What should be the next language added to Colpi? (https://bestsiteever.ru/colpi/)

Languages already there:

Czech
Danish
German
English
Spanish
French
Gujarati
Hindi
Hungarian
Indonesian
Italian
Lithuanian
Macedonian
Malay
Dutch
Norwegian
Polish
Portuguese
Russian
Swedish
Slovene
Thai
Turkish
Vietnamese
Chinese


----------



## abunickabhi (Nov 8, 2022)

https://github.com/nbwzx/blddb/blob/main/assets/json/imageAlgToInfo.json

For the Chinese category, can we "add" this list, which was made by BLDers from China? @Roman
It contains all the words they use to memo each letter-pair.


----------

