I’ve been putting together a few wordlists and am making them available here for anyone who is interested.
Each wordlist has been processed so that it contains only unique values. Each archive also includes a few variants of each wordlist (no spaces, no punctuation, etc.), so you can pick the one that best fits your requirements. There is also a combined version, which contains the base data and all the variants in one (usually very large) file.
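The exact processing behind these archives isn't described here, so the following is only a minimal sketch of how a deduplicated base list, "no spaces" and "no punctuation" variants, and a combined list could be produced. The function names, the variant labels, and the choice of Python's ASCII-only `string.punctuation` as the definition of "punctuation" are all assumptions for illustration.

```python
import string

# Assumption: "punctuation" means the ASCII characters in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def unique_preserving_order(words):
    """Drop duplicates (and empty strings), keeping first-seen order."""
    seen = set()
    out = []
    for w in words:
        if w and w not in seen:
            seen.add(w)
            out.append(w)
    return out


def make_variants(words):
    """Build hypothetical base/variant/combined lists from raw entries."""
    base = unique_preserving_order(words)
    # Variant 1: remove every space character.
    no_spaces = unique_preserving_order(w.replace(" ", "") for w in base)
    # Variant 2: remove ASCII punctuation.
    no_punct = unique_preserving_order(w.translate(_PUNCT_TABLE) for w in base)
    # Combined: base data plus all variants, deduplicated once more.
    combined = unique_preserving_order(base + no_spaces + no_punct)
    return {
        "base": base,
        "no_spaces": no_spaces,
        "no_punctuation": no_punct,
        "combined": combined,
    }


variants = make_variants(["New York", "New York", "O'Brien"])
print(len(variants["base"]), variants["combined"])
```

In this sketch the "No. of Base Words" figure would correspond to `len(variants["base"])`, which counts only the deduplicated base entries and none of the variants.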
The word counts below are for the base wordlist only and do not include the variants.
| Set | Title | No. of Base Words | Date Created |
|---|---|---:|---|
| GeoNames (727MB) | Countries/Regions | 614,860 | June 2014 |
| OpenLibrary (1.6GB) | Authors | 6,403,934 | June 2014 |
| Books (1.2MB) | Authors | 11,988 | June 2014 |
| IMDb (183.8MB) | Actors/Characters | 2,417,740 | June 2014 |
| MusicBrainz (299.9MB) | Artists | 918,778 | June 2014 |
| Proverbs (0.04MB) | Proverbs | 2,048 | June 2014 |
| Wikipedia (744.6MB) | Article Titles | 23,039,493 | June 2014 |
| Wikibooks (1.7MB) | Article Titles | 109,658 | June 2014 |
| Wiktionary (68.2MB) | Article Titles | 3,959,195 | June 2014 |