JW Spelling Bee 1: Unicorns and Other Oddities
The word JUKEBOX is a unicorn in the New York Times puzzle Spelling Bee. It's one of 5 unicorns (my label) in Spelling Bee, and it has the honor of being the most unicorn (unicorniest?) of the bunch, for reasons I'll explain later.
I got hooked on Spelling Bee earlier this year. It's a game in which you try to make as many words as possible out of 7 given letters. I'm also a Mathematica nerd, and I started dreaming up ways that Mathematica could help me analyze Spelling Bee puzzles.
My first foray into Spelling Bee programming was to write code that gives all possible solutions to a given puzzle (https://www.wolframcloud.com/obj/jeffw/SpellingBee). I wrote it so that it could run on my phone, which was handy for me because I usually do Spelling Bee while eating breakfast. For those of you who immediately thought, "That's just cheating," I will note that I use the program as a tool of last resort -- to find remaining words when I am really stuck. I have also found it handy to learn those special words that come up again and again in Spelling Bee -- words like BONOBO and RATATAT that are made from only a few letters.
Lately I've been doing some deeper dives into solutions with Mathematica and have had a blast discovering things about the puzzle that I could never do without computational technology.
Rules of Spelling Bee
For those of you new to Spelling Bee, I'll do a quick review of the rules. The game is to create words using letters from the "hive." I've shown an example here.
- Words must contain at least 4 letters.
- Words must include the center letter.
- Letters can be used more than once.
- Each puzzle contains at least one pangram. A pangram is a word that uses all 7 letters (at least once).
- Words are from the NYT's dictionary. They do not include words they consider offensive.
(A brief digression into dictionaries . . . a key starting point for any analysis. I don't have access to the NYT dictionary, so I used a 92,518-word dictionary in Mathematica (available via "DictionaryLookup[]"). I have found that it pretty faithfully represents what's in the NYT dictionary. My dictionary tends to be missing a few newer words such as ROMCOM, but otherwise is solid.)
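For readers who like to see rules as code, here is a minimal Python sketch of them. It is not the Wolfram Language code behind the solver linked earlier (that code isn't reproduced here); the names `solve` and `is_pangram` and the tiny word list are made up for illustration, and a real run would use a full dictionary.

```python
def solve(letters, center, words):
    """Return the words that are valid answers for one puzzle."""
    allowed = set(letters)
    return sorted(
        w for w in words
        if len(w) >= 4           # words must contain at least 4 letters
        and center in w          # words must include the center letter
        and set(w) <= allowed    # only hive letters; repeats are fine
    )

def is_pangram(word, letters):
    """A pangram uses every one of the 7 hive letters at least once."""
    return set(word) == set(letters)

# ASSUMPTION: a toy word list, standing in for a real dictionary.
words = ["jukebox", "juke", "box", "bonobo", "ratatat", "feedback"]

print(solve("bejkoux", "x", words))
print(is_pangram("jukebox", "bejkoux"))
```

The set comparison `set(w) <= allowed` is what makes "letters can be used more than once" work: only the distinct letters of a word matter, not how often each appears.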
Spelling Bee has a point system based on characteristics of the words (e.g., length), but I didn't make that point system a focus of my analysis, so I won't go into it here.
An unwritten rule of Spelling Bee is that none of the puzzles contain the letter "S." Why do they do this? Because including "S" would greatly increase the number of words, making the puzzles "too long" for users. If other people are anything like me, they're willing to give only a certain amount of time to solving a puzzle. So a puzzle has to hit that sweet spot of "takes a while to figure out, but not too long," and removing the "S" accomplishes that.
Some Helpful Vocabulary and Notation
As I dug into the solutions of Spelling Bee, I found myself tripping over how to describe some aspects of it, so I started using a couple of terms that I'm going to introduce here to make the discussion easier to understand.
Puzzle = one solvable instance of a NYT Spelling Bee puzzle, like the one pictured above. Note that by my definition, a puzzle cannot consist of letters for which there is no valid solution. My notation for a puzzle is the 7 letters that make up the puzzle, in lowercase alphabetical order, with the center letter underlined. So adelqru is one puzzle.
Puzzlecombo = a 7-letter string from which valid puzzles can be made. Each puzzlecombo has 7 different puzzles that could be made from it. (The "must have a pangram" requirement assures us that all 7 choices for a center letter yield a valid puzzle.) In my notation, adelqru is one puzzlecombo. (Note the absence of an underlined letter.)
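One way to picture the two terms as data is sketched below in Python. The `Puzzle` type, `puzzles_from`, and the bracket notation are illustrative inventions for this sketch (brackets stand in for the underline, which plain text can't show).

```python
from typing import NamedTuple

class Puzzle(NamedTuple):
    letters: str   # the puzzlecombo: 7 distinct lowercase letters, alphabetical
    center: str    # the one letter every word must contain

    def label(self):
        # ASSUMPTION: brackets mark the center letter instead of an underline.
        return self.letters.replace(self.center, f"[{self.center}]")

def puzzles_from(combo):
    """Each puzzlecombo yields 7 puzzles, one per choice of center letter."""
    return [Puzzle(combo, center) for center in combo]

for p in puzzles_from("adelqru"):
    print(p.label())
```

The loop prints seven labels, from `[a]delqru` through `adelqr[u]`, one for each puzzle that shares the puzzlecombo adelqru.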
I Start Crunching
My first foray into using Mathematica to analyze Spelling Bee was to compute all possible puzzlecombos that exist. Your initial impression might be that there are a whole lot of them, as there are 480,700 ways to choose 7 letters from 25 (remember, no S). But as it turns out, there are only 4802 puzzlecombos. The first puzzlecombo on the list is abcdefk (which yields the pangram FEEDBACK), and the last one is mnoprty (PROMONTORY).
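Both numbers in that paragraph come from simple operations. The Python sketch below checks the 480,700 figure and shows one plausible way to collect puzzlecombos: take every dictionary word with exactly 7 distinct letters and no S, and sort its letters. The function name and the toy word list are assumptions for illustration; this is not the original Mathematica code.

```python
from math import comb

# Ways to choose 7 letters from the 25 letters other than S.
print(comb(25, 7))

def find_puzzlecombos(words):
    """Collect the sorted letter strings of all pangram-capable words."""
    combos = set()
    for w in words:
        letters = set(w)
        if len(letters) == 7 and "s" not in letters:
            combos.add("".join(sorted(letters)))
    return sorted(combos)

# ASSUMPTION: a toy word list; the real run used a full dictionary.
toy = ["feedback", "promontory", "jukebox", "extracted", "execrated", "bonobo"]
print(find_puzzlecombos(toy))
```

Note that EXTRACTED and EXECRATED collapse into the same puzzlecombo (acdertx), which is part of why the count of puzzlecombos is so much smaller than the count of 7-letter choices.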
The fact that there are 4802 puzzlecombos leads immediately to the conclusion that there are 7 x 4802, or 33,614 possible puzzles. My next step was to calculate the number of possible words that can be made for each of those puzzles. (Yes, Mathematica solved all 33,614 of them. It involved just a single line of code and a couple hours of computation. Gotta love Wolfram Language.)
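As a rough idea of what "solving every puzzle" involves, here is a Python sketch that counts words for all 7 center letters of each puzzlecombo. It is an illustration under my own assumptions (function name, toy words, and the trick of grouping words by their distinct letters), not the single line of Wolfram Language mentioned above.

```python
from collections import Counter

def count_words(combos, words):
    """Map each puzzlecombo to its 'all' count plus one count per center letter."""
    # Words with the same distinct letters behave identically, so group them once.
    by_letters = Counter(frozenset(w) for w in words if len(w) >= 4)
    results = {}
    for combo in combos:
        hive = frozenset(combo)
        usable = {ls: n for ls, n in by_letters.items() if ls <= hive}
        counts = {"all": sum(usable.values())}
        for center in combo:
            counts[center] = sum(n for ls, n in usable.items() if center in ls)
        results[combo] = counts
    return results

# ASSUMPTION: toy words only, so these counts are not the real ones.
toy_words = ["jukebox", "juke", "joke", "oboe", "box", "bonobo"]
print(count_words(["bejkoux"], toy_words))
print(7 * 4802)  # number of possible puzzles
```

Grouping by letter set means each puzzlecombo is checked against the distinct letter sets in the dictionary rather than against every word, which keeps the work manageable even for tens of thousands of puzzles.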
The results are displayed in the histogram below. As you would expect from data that looks like this, the mean (64.9) is significantly higher than the median (55). The quartiles are 34, 55, and 86. (Quartiles split the data into quarters, so 25% of the values are below 34, 25% are between 34 and 55, and so on.) It was a surprise to me that there are puzzles with 200+ (even 300+) solutions.
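To make the mean-versus-median point concrete, here is a small Python sketch using made-up counts with a long right tail. The numbers are invented for illustration and are not the real puzzle data; also note that quartile definitions vary slightly between tools, so Python's `statistics.quantiles` may not match Mathematica's results exactly on real data.

```python
from statistics import mean, median, quantiles

# ASSUMPTION: invented word counts with one very large value.
toy_counts = [10, 20, 30, 40, 50, 60, 200]

q1, q2, q3 = quantiles(toy_counts, n=4)  # the three quartile cut points
print("mean:", round(mean(toy_counts), 1))
print("median:", median(toy_counts))
print("quartiles:", q1, q2, q3)
```

A single puzzle with 200 solutions pulls the mean well above the median, which is the same effect the 200+ and 300+ puzzles have on the real distribution.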
Two Bounds on Puzzles
In a blog post by Christopher Wolfram (https://christopherwolfram.com/projects/spelling-bee/), I discovered an interesting fact about the puzzles that the NYT chooses to publish. He was able to obtain data on all puzzles published to date, and he found that none has had fewer than 21 solutions and none has had more than 81.
Why do they do this? Again, it probably goes back to the "sweet spot" idea I brought up earlier. Puzzles with too few solutions may not feel very satisfying, and puzzles with too many solutions start to make the task of solving too tedious. So let's define NYT lower bound as 21 words and NYT upper bound as 81 words and plot the results on another histogram:
Puzzlecombos vs. Puzzles
The output from my experiments so far was one line of data for each puzzlecombo. The data was in the form shown below, where the puzzlecombo is listed first, followed by statistics about the number of words possible for that puzzlecombo.
abcdeit, {all, 117}, {a, 74}, {b, 46}, {c, 52}, {d, 92}, {e, 99}, {i, 66}, {t, 72}, {abdicate, abdicated, diabetic}
In the example above, 117 words can be made in all using the 7 letters in the puzzlecombo abcdeit. The ensuing pairs represent the number of words that exist for each puzzle, given a center letter. For instance, the puzzle abcdeit with A as the center letter has 74 possible words. The final set of data is, of course, the set of pangrams for that puzzlecombo.
It was surprising to me that only 25% of puzzlecombos yield all 7 puzzles within the NYT lower and upper bounds. It was also surprising to me that the probabilities of a puzzlecombo yielding 0, 1, 2, 3, 4, or 5 in-bounds puzzles were almost the same (each hovering around 0.1).
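The "how many of the 7 puzzles are in bounds" figure can be computed per puzzlecombo and then tallied. Below is a hedged Python sketch using the abcdeit line shown earlier; the function names are illustrative, and the tally only becomes meaningful when fed all 4802 records.

```python
from collections import Counter

NYT_LOWER, NYT_UPPER = 21, 81  # bounds defined in the previous section

def in_bounds_count(per_center):
    """How many of the 7 puzzles have a word count within the NYT bounds."""
    return sum(NYT_LOWER <= n <= NYT_UPPER for n in per_center.values())

def tally(records):
    """Distribution of in-bounds counts (0 through 7) over many puzzlecombos."""
    return Counter(in_bounds_count(r) for r in records)

# Per-center counts for abcdeit, copied from the data line above.
abcdeit = {"a": 74, "b": 46, "c": 52, "d": 92, "e": 99, "i": 66, "t": 72}
print(in_bounds_count(abcdeit))
print(tally([abcdeit]))
```

For abcdeit the answer is 5: the puzzles centered on D (92 words) and E (99 words) exceed the upper bound, and the other five fall inside it.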
Beyond Both Bounds?
Are there any puzzlecombos with puzzles that are eliminated by both the NYT lower bound and the NYT upper bound? As it turns out, there are quite a few of them -- 123 in all. Below is an example of one. Almost all of the 123 have one "rare letter" such as x, z, j, or q.
efilrvx, {all, 93}, {e, 88}, {f, 47}, {i, 64}, {l, 58}, {r, 73}, {v, 41}, {x, 6}, {reflexive}
In looking through these 123 puzzlecombos, I spotted one in which all of the puzzles were out of bounds. Wow! These are rare birds . . . only 9 puzzlecombos (0.19%) fall into this category. As you might expect, the letters aside from the "rare letter" are all very commonly used letters. You'll never see any of these puzzles in the NYT. They are exiled to my island of misfit puzzles.
- acdertx, {all, 184}, {a, 149}, {c, 89}, {d, 119}, {e, 169}, {r, 151}, {t, 117}, {x, 20}, {execrated, extracted}
- adegrtz, {all, 182}, {a, 151}, {d, 118}, {e, 166}, {g, 87}, {r, 147}, {t, 90}, {z, 18}, {gazetteered}
- adeirtx, {all, 201}, {a, 131}, {d, 137}, {e, 181}, {i, 106}, {r, 168}, {t, 133}, {x, 15}, {extradite, extradited}
- adeprtz, {all, 205}, {a, 172}, {d, 123}, {e, 185}, {p, 110}, {r, 167}, {t, 113}, {z, 12}, {trapezed}
- deginrz, {all, 177}, {d, 127}, {e, 149}, {g, 119}, {i, 138}, {n, 124}, {r, 123}, {z, 14}, {energized}
- deinrtz, {all, 172}, {d, 119}, {e, 164}, {i, 120}, {n, 102}, {r, 113}, {t, 114}, {z, 10}, {tenderize, tenderized, tenderizer}
- deloprx, {all, 158}, {d, 108}, {e, 133}, {l, 85}, {o, 119}, {p, 96}, {r, 93}, {x, 15}, {exploder, explored}
- deoprtx, {all, 166}, {d, 98}, {e, 147}, {o, 127}, {p, 86}, {r, 125}, {t, 93}, {x, 19}, {exported, reexported}
- eginrtx, {all, 142}, {e, 116}, {g, 87}, {i, 110}, {n, 109}, {r, 97}, {t, 100}, {x, 9}, {exerting}
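The two conditions in this section are easy to state as code. The Python sketch below copies the per-center counts from the lines above (in letter order, leaving out the "all" totals and pangrams) and checks them; the function names are my own labels for illustration.

```python
NYT_LOWER, NYT_UPPER = 21, 81

def hits_both_bounds(counts):
    """At least one puzzle is below the lower bound and one is above the upper."""
    return min(counts) < NYT_LOWER and max(counts) > NYT_UPPER

def all_out_of_bounds(counts):
    """Every one of the 7 puzzles falls outside the NYT bounds."""
    return all(n < NYT_LOWER or n > NYT_UPPER for n in counts)

# Counts listed in the same order as the letters of each puzzlecombo.
efilrvx = [88, 47, 64, 58, 73, 41, 6]
misfits = {
    "acdertx": [149, 89, 119, 169, 151, 117, 20],
    "adegrtz": [151, 118, 166, 87, 147, 90, 18],
    "adeirtx": [131, 137, 181, 106, 168, 133, 15],
    "adeprtz": [172, 123, 185, 110, 167, 113, 12],
    "deginrz": [127, 149, 119, 138, 124, 123, 14],
    "deinrtz": [119, 164, 120, 102, 113, 114, 10],
    "deloprx": [108, 133, 85, 119, 96, 93, 15],
    "deoprtx": [98, 147, 127, 86, 125, 93, 19],
    "eginrtx": [116, 87, 110, 109, 97, 100, 9],
}

print(hits_both_bounds(efilrvx), all_out_of_bounds(efilrvx))
for combo, counts in misfits.items():
    low_letters = [c for c, n in zip(combo, counts) if n < NYT_LOWER]
    print(combo, all_out_of_bounds(counts), low_letters)
```

In each of the nine misfits, the only puzzle below the lower bound is the one centered on X or Z, and the other six puzzles all exceed the upper bound. By contrast, efilrvx hits both bounds but still has in-bounds puzzles.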
What About the Unicorn?
At the beginning of this post, I mentioned JUKEBOX as a pangram with special properties. JUKEBOX is one of only 5 words in the English language associated with a puzzle (in this case, bejkoux with X as the center letter) that has only one word as its solution (i.e., the pangram word itself). The other unicorns are abhimnp, dfhnoux, ahnprxy, and bimnruv. (I'll leave it to you to come up with the one-word solutions for these puzzles.) Personally, I find abhimnp a surprising unicorn, as the center letter is H, a fairly common letter.
What distinguishes JUKEBOX from the other 4 unicorns? It has the honor of having the lowest sum when you add all the words from the 7 puzzles associated with it -- a measly 28 (6 + 4 + 3 + 4 + 7 + 3 + 1) words in all!
bejkoux, {all, 8}, {b, 6}, {e, 4}, {j, 3}, {k, 4}, {o, 7}, {u, 3}, {x, 1}, {jukebox}
What puzzlecombo lies on the other extreme of JUKEBOX? That would be adeginr, which yields a whopping 2004 words across all 7 puzzles (details below). And look at all those pangrams! You'll never see any of these puzzles in the NYT. (An online riot would ensue among Spelling Bee players!)
adeginr, {all, 419}, {a, 256}, {d, 287}, {e, 330}, {g, 280}, {i, 242}, {n, 286}, {r, 323}, {arraigned, dandering, dangering, degrading, deranging, drainage, dreading, endangering, endearing, gardenia, gardening, grained, grenadier, grenadine, ingrained, niggarded, reading, regained, regarding, regrading, renegading, rereading}
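The two extremes can be checked directly from the data lines above. This Python sketch sums the per-center counts and flags any center letter whose puzzle has exactly one solution; the function names are illustrative only.

```python
# Per-center word counts copied from the two data lines above.
records = {
    "bejkoux": {"b": 6, "e": 4, "j": 3, "k": 4, "o": 7, "u": 3, "x": 1},
    "adeginr": {"a": 256, "d": 287, "e": 330, "g": 280,
                "i": 242, "n": 286, "r": 323},
}

def total_across_puzzles(per_center):
    """Sum of the word counts of all 7 puzzles in a puzzlecombo."""
    return sum(per_center.values())

def unicorn_centers(per_center):
    """Center letters whose puzzle has a single solution (the pangram itself)."""
    return [c for c, n in per_center.items() if n == 1]

for combo, per_center in records.items():
    print(combo, total_across_puzzles(per_center), unicorn_centers(per_center))
```

Because every puzzle is guaranteed at least one pangram, a count of 1 means the pangram is the only solution. These sums count a word once for each distinct letter it contains, which is why they are larger than the "all" figures (8 and 419).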
I hope you had fun reading about my explorations. I welcome your feedback, comments, and suggestions for further exploration.