Word embeddings are now widely used in natural language processing and across computational linguistics, from machine translation to information extraction and computational literary analysis. This workshop covers advanced topics in word embeddings, including document similarity analysis, nearest-neighbor analysis, training vector spaces, and visualization. We will use literary texts as examples, but the methods apply across disciplines, and participants are encouraged to bring their own corpora to analyze. Python will be our workshop language, and we will use the libraries SpaCy, Word2Vec, and Sense2Vec.
Requirements: Please bring a laptop on which you’ve installed the Python libraries SpaCy, scikit-learn, pandas, matplotlib, word2vec, and sense2vec, as well as the `en_core_web_lg` language model. Check that the model loads successfully with `spacy.load('en_core_web_lg')`. Refer to the SpaCy documentation for instructions on installing the language model. A working knowledge of Python is also required.
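One way to check your setup before the workshop is sketched below. It is only a minimal illustration, not an official installer check: the helper names `missing_modules` and `model_is_installed` are hypothetical, and the import names are assumptions based on the usual packaging (for example, scikit-learn is imported as `sklearn`, and the language model is assumed to be installed as an importable package named `en_core_web_lg`).

```python
from importlib.util import find_spec

# Assumed import names for the libraries listed in the requirements.
REQUIRED_MODULES = ["spacy", "sklearn", "pandas", "matplotlib", "word2vec", "sense2vec"]


def missing_modules(names):
    """Return the module names that cannot be located in this Python environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


def model_is_installed(model_name="en_core_web_lg"):
    """Return True if the language model package can be located (assumes it is pip-installed)."""
    return not missing_modules([model_name])


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing libraries:", ", ".join(missing) if missing else "none")

    if "spacy" not in missing and model_is_installed():
        import spacy

        # The same check described in the requirements.
        nlp = spacy.load("en_core_web_lg")
        print("Loaded model with pipeline components:", nlp.pipe_names)
    else:
        print("en_core_web_lg could not be loaded; see the SpaCy documentation.")
```

If the script reports missing libraries, install them and run it again; if it reports that the model could not be loaded, follow the model installation instructions in the SpaCy documentation.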