Word embeddings are a family of algorithms that can be remarkably effective at representing the meanings of words and the relationships between them. We’ll cover the basics of word embeddings: what they do, how to train a model using word2vec, and how to use the resulting vectors to search for synonyms and analogies. We’ll also look at issues more specific to the humanities and social sciences, including how to compare models trained on different sets of texts, when to use word2vec versus topic models, and strategies for visualizing models. Finally, we’ll talk about the social biases embodied in the vector spaces of language models, both as a technical problem with solutions and as an opportunity for algorithmic criticism.
Hands-on analysis and visualization will be done by editing pre-written scripts in the R statistical environment; no prior programming experience is necessary. We’ll distribute several pre-trained models at the workshop, but you can also try training one on your own texts ahead of time.
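As a preview of the hands-on portion, here is a minimal sketch of the kind of workflow the scripts support. It assumes the wordVectors R package (one common R interface to word2vec, not necessarily the exact package the workshop scripts use) and a hypothetical plain-text corpus file, corpus.txt.

    library(wordVectors)
    library(magrittr)

    # Clean and tokenize a raw text file into a single training file.
    prep_word2vec(origin = "corpus.txt", destination = "corpus_clean.txt",
                  lowercase = TRUE)

    # Train a word2vec model: each word becomes a 100-dimensional vector.
    # (A distributed pre-trained model could instead be loaded with
    # read.vectors("model.bin").)
    model <- train_word2vec("corpus_clean.txt", "corpus_vectors.bin",
                            vectors = 100, window = 12, threads = 2)

    # Synonym search: the ten words whose vectors lie closest to "sea".
    model %>% closest_to("sea", n = 10)

    # Analogy search via vector arithmetic: king - man + woman is
    # close to queen.
    model %>% closest_to(~ "king" - "man" + "woman", n = 5)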
LEVEL: Beginner
NOTES: A laptop with R and RStudio installed is required; installation instructions are available.