Concept: Semantic Number [2019 or 2020]
- Flattening a vocabulary onto a single number line: each word gets an integer ID such that the distance between two IDs best approximates the semantic distance between the corresponding words.
Initialize:
- start with a random word, assign it ID 0
- for the next ID, assign the not-yet-assigned word most related to the previous word
- repeat until all words have been mapped to an ID
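
The initialization steps above can be sketched as a greedy nearest-neighbor chain. This is only an illustration under assumptions the note does not state: each word is assumed to come with an embedding vector, "most related" is taken to mean highest cosine similarity, and the names `cosine`, `greedy_init`, and `toy` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed relatedness measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_init(vectors, rng=None):
    """Return a list `order` in which order[i] is the word assigned ID i."""
    rng = rng or random.Random()
    remaining = set(vectors)
    # Start with a random word; it receives ID 0.
    current = rng.choice(sorted(remaining))
    order = [current]
    remaining.remove(current)
    while remaining:
        # Next ID goes to the unassigned word most related to the previous word.
        # Sorting first makes tie-breaking deterministic.
        current = max(sorted(remaining),
                      key=lambda w: cosine(vectors[current], vectors[w]))
        order.append(current)
        remaining.remove(current)
    return order


# Toy 2-D "embeddings", made up purely for illustration.
toy = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.2),
    "car": (0.1, 1.0),
    "truck": (0.2, 0.9),
}
order = greedy_init(toy, random.Random(0))
word_to_id = {word: i for i, word in enumerate(order)}
```

On this toy data, related words ("cat"/"dog", "car"/"truck") end up with adjacent IDs whichever word is picked first. A greedy chain like this can still strand late words far from their neighbors, which is what the refinement pass is meant to repair.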
Refine:
- Propose a random swap between a pair of word->ID mappings
- If the random swap results in a reduced divergence between distances in ID space and in semantic space, make the swap
- Repeat until convergence (e.g. until proposed swaps stop being accepted)
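
The refinement loop can be sketched as random-swap hill climbing. The note does not define "divergence", so the sketch assumes one possible choice: the sum of squared differences between normalized ID distance and cosine distance over all word pairs. Convergence is likewise assumed to mean "no accepted swap in the last `max_stale` proposals". The names `cosine_distance`, `divergence`, `refine`, and `max_stale` are hypothetical.

```python
import itertools
import math
import random


def cosine_distance(u, v):
    """1 - cosine similarity (assumed semantic distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)


def divergence(order, vectors):
    """Assumed objective: squared mismatch between ID distance and semantic distance."""
    n = len(order)
    scale = max(n - 1, 1)  # normalize ID distances into [0, 1]
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        id_dist = (j - i) / scale
        sem_dist = cosine_distance(vectors[order[i]], vectors[order[j]])
        total += (id_dist - sem_dist) ** 2
    return total


def refine(order, vectors, max_stale=1000, rng=None):
    """Propose random swaps; keep a swap only if it lowers the divergence."""
    rng = rng or random.Random()
    order = list(order)
    if len(order) < 2:
        return order
    best = divergence(order, vectors)
    stale = 0  # consecutive rejected proposals
    while stale < max_stale:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        score = divergence(order, vectors)
        if score < best:
            best = score
            stale = 0
        else:
            # Undo the swap: it did not reduce the divergence.
            order[i], order[j] = order[j], order[i]
            stale += 1
    return order


# Toy 2-D "embeddings" and a deliberately poor starting order, for illustration only.
toy = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.2),
    "car": (0.1, 1.0),
    "truck": (0.2, 0.9),
}
start = ["cat", "car", "dog", "truck"]
refined = refine(start, toy, max_stale=200, rng=random.Random(0))
```

Recomputing the full divergence for every proposal costs O(n^2) per swap; for a real vocabulary one would likely update only the terms involving the two swapped words. Because only improving swaps are accepted, the loop settles in a local optimum, not necessarily the best possible ordering.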