A deno script to extract JSON versions of Wikipedia Swadesh lists
You can install deno here:
Installing takes hardly any time and creates a single binary on your computer, plus a ~/.deno directory that holds all the deno things, so if you don’t like it, just remove that folder.
Wikipedia Swadesh lists
I wanted to compare Ukrainian and Russian (Слава Україні), so I looked for a Swadesh list on Wikipedia. Of course there are many, though they actually live on Wiktionary:
The Ukrainian and Russian lists happen to have exactly the same HTML structure, so a parser that handles one will handle the other.
download-swadesh-list.js takes a URL on the command line, runs parse-swadesh-list.js, and saves the result to a file named <language>_swadesh-lexicon.json. I have included the sample output ukrainian_swadesh-lexicon.json in this repo so you can see what the output looks like.
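To make the <language>_swadesh-lexicon.json naming concrete, here is a minimal sketch of how the output file name could be derived from the list URL. The helper name outputFileName and the regular expression are my own assumptions for illustration; the script in this repo may well do it differently.

```javascript
// Hypothetical helper, not taken from the repo: derive the output file name
// from a Wiktionary Swadesh list URL.
function outputFileName(listUrl) {
  // Last path segment, e.g. "Appendix:Ukrainian_Swadesh_list"
  const lastSegment = decodeURIComponent(
    new URL(listUrl).pathname.split("/").pop()
  );
  // Strip the "Appendix:" prefix and "_Swadesh_list" suffix to get the language.
  const match = lastSegment.match(/^Appendix:(.+)_Swadesh_list$/);
  if (!match) {
    throw new Error(`Not a Swadesh list URL: ${listUrl}`);
  }
  return `${match[1].toLowerCase()}_swadesh-lexicon.json`;
}

console.log(
  outputFileName("https://en.wiktionary.org/wiki/Appendix:Ukrainian_Swadesh_list")
); // prints "ukrainian_swadesh-lexicon.json"
```

Throwing on an unrecognised URL is a deliberate choice in this sketch, so a typo on the command line fails loudly instead of writing a strangely named file.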
Note: I think I might redesign this to take URLs from a file rather than command-line arguments, but here’s how it works for now.
You do this on the console:
$ deno run --allow-write --allow-net download-wikipedia-swadesh.js https://en.wiktionary.org/wiki/Appendix:Ukrainian_Swadesh_list
You run a deno program with deno run <flags> <program-file>. Deno, unlike node, is “secure by default”: you have to give your program permission to use the internet, or to read from or write to your filesystem, when you run it. That’s what the --allow-write and --allow-net flags are doing. Then there’s the program itself, download-wikipedia-swadesh.js, and a URL, in this case https://en.wiktionary.org/wiki/Appendix:Ukrainian_Swadesh_list.
Might as well see how many Swadesh lists we can extract with this thing. In its current form it’s guaranteed to run into problems on some of them, since some languages have different columns in their Swadesh tables (Japanese, for instance). Perhaps it’s worth modifying this to handle more languages. For now it might be easier to just find the ones that already work. I dunno. 🤔
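One possible way to cope with differing columns is to key each row by the table’s own header labels instead of by fixed column positions. This is only a sketch of that idea, not what the script currently does: the function name rowsToEntries, the header labels, and the two sample rows are all made up for illustration.

```javascript
// Hypothetical sketch: build one object per table row, keyed by header label,
// so tables with extra or reordered columns still produce sensible JSON.
function rowsToEntries(headers, rows) {
  // Normalise labels so "English " and "english" map to the same key.
  const keys = headers.map((h) => h.trim().toLowerCase());
  return rows.map((cells) => {
    const entry = {};
    keys.forEach((key, i) => {
      // Missing cells become empty strings rather than undefined.
      entry[key] = (cells[i] ?? "").trim();
    });
    return entry;
  });
}

// Illustrative input only; real tables may use different headers.
const entries = rowsToEntries(
  ["No.", "English", "Ukrainian"],
  [
    ["1", "I", "я"],
    ["2", "you", "ти"],
  ]
);
console.log(JSON.stringify(entries, null, 2));
```

With this shape, a table that has an extra column just yields an extra key per entry, and the rest of the pipeline can pick out the keys it cares about.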