Note to all attendees: Session leaders will contact you with additional information, including a meeting link, for each individual workshop, event, or demonstration.

Text Analysis

February 2018
Building a Text Analysis Pipeline with Python

February 6, 2018 @ 2:00 pm - 4:00 pm EST

Pace University, 1 Pace Plaza, E101 1 Pace Plaza, New York

This workshop will show participants how to use Python and the Natural Language Toolkit to load a plaintext document, split it into paragraphs/sentences/words, and retrieve dictionary headwords and part-of-speech information for the words in the document. We will then create charts and visualizations for the feature counts. LEVEL: Beginner/Intermediate NOTES: Bring personal laptop; required [...]
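For readers who want a feel for the load, split, and count steps described above, here is a minimal sketch that uses only the Python standard library. It is not the workshop's material: the regular-expression splitters stand in for the Natural Language Toolkit's tokenizers, the sample text and function names are hypothetical, and headword and part-of-speech lookup are left out because they need NLTK and its downloadable data.

```python
import re
from collections import Counter


def split_paragraphs(text):
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def split_words(sentence):
    """Lowercased alphabetic tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def word_counts(text):
    """Count words by walking paragraphs, then sentences, then words."""
    counts = Counter()
    for paragraph in split_paragraphs(text):
        for sentence in split_sentences(paragraph):
            counts.update(split_words(sentence))
    return counts


# Hypothetical sample document, used only for illustration.
SAMPLE = "The cat sat. The cat slept!\n\nA dog barked at the cat."

if __name__ == "__main__":
    # A crude text "chart": one '#' per occurrence of the most common words.
    for word, n in word_counts(SAMPLE).most_common(3):
        print(f"{word:<8}{'#' * n} {n}")
```

A real pipeline would likely swap the regex helpers for a trained tokenizer, since abbreviations such as "Dr." defeat the naive sentence rule used here.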

Free
Thinking Through Word Embeddings

February 7, 2018 @ 10:00 am - 12:00 pm EST

Babble Lab @ Pace University, Room 1105 163 William St., New York, NY, United States

Word embeddings are a family of algorithms that can be remarkably effective at representing the meanings of words, and their relationships to each other. We'll cover the basics of word embeddings: what they do, how to train a model using word2vec, and how to use them to search for synonyms and analogies. And we'll look [...]
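As a rough illustration of the synonym and analogy searches mentioned above, the sketch below uses tiny hand-made vectors rather than a trained word2vec model. The vectors, the vocabulary, and the function names are all hypothetical and chosen only so that the arithmetic is easy to follow; a real model would have hundreds of dimensions learned from a corpus.

```python
import math

# Hypothetical 3-dimensional vectors, hand-made for illustration only.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to target."""
    candidates = [
        (cosine(target, vec), word)
        for word, vec in VECTORS.items()
        if word not in exclude
    ]
    return max(candidates)[1]


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    return nearest(target, exclude={a, b, c})


if __name__ == "__main__":
    print(analogy("man", "king", "woman"))
```

The query words are excluded from the candidates because the offset vector usually lands closest to one of its own inputs.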

Free
Analyzing Twitter Data for Beginners

February 7, 2018 @ 3:00 pm - 5:00 pm EST

Fordham Lincoln Center, Quinn Library Room 234 113 W 60th Street, New York, NY, United States

Interested in analyzing conversations on Twitter but don’t know where to start? This workshop will demonstrate how to use TAGS <https://tags.hawksey.info/get-tags/>, an open source tool developed by Martin Hawksey to collect and visualize Twitter data as it happens. Aimed at novice users, this session will experiment with small datasets generated from Twitter conversations under specific [...]

Free
Social Media Scraping for Qualitative Research

February 7, 2018 @ 4:00 pm - 6:00 pm EST

Bobst Library, NYU, Room 617 70 Washington Square South, New York, NY, United States

Interested in incorporating social media content into your qualitative research project? This workshop will introduce the basics of using small-scale web scraping of social media for qualitative analysis. Using NCapture, a web browser extension, and NVivo, a qualitative analysis software package, this session will focus on methods to incorporate the context from web pages, online [...]

Free
R for Text Analysis

February 8, 2018 @ 10:00 am - 12:00 pm EST

Studio@Butler 535 W. 114th St., New York, NY, United States

In this workshop, we will use R for text analysis, with a focus on the Tidy Text approach within the Tidytext framework. Your insights will be visualized and can also be turned into an interactive application without any web coding skills, using Shiny R. The workshop is open to anyone with an interest in this topic. [...]

Free
Word Embeddings: Can Vectors Encode Meaning?

February 9, 2018 @ 2:00 pm - 4:00 pm EST

Columbia University, CEPSR, Room 620 530 West 120th Street, New York, NY, United States

Word embeddings, or vector representations of words, are commonly used in computer science to work with and analyze text. They are particularly useful as a powerful off-the-shelf tool when using open-source word embeddings previously generated by Google, Facebook, or other technology companies based on web crawls. We present the background and justifications for using vectors [...]

Free
February 2019
Text as Data in the Humanities

February 5, 2019 @ 12:00 pm - 2:00 pm EST

Bobst Library, NYU, Room 617 70 Washington Square South, New York, NY, United States

An introduction to text analysis for literature with a foundational overview of considerations for approaching computational text analysis in the humanities. This workshop will cover a) gathering a text corpus, b) copyright considerations, c) data cleaning, d) an introduction to the computational software tools, e) reading the output and analysis that may include word frequencies, cluster [...]

Free
Advanced Topics in Word Embeddings

February 6, 2019 @ 10:00 am - 12:00 pm EST

Studio@Butler 535 W. 114th St., New York, NY, United States

Word embeddings are the hottest new technology in natural language processing, and are used across linguistic computer science, from machine translation to information extraction and computational literary analysis. We will cover advanced topics in word embeddings, including: document similarity analysis, nearest-neighbor analysis, training vector spaces, and visualization. We will use literary texts as examples, but [...]

Free
Intro to the Command Line

February 6, 2019 @ 10:30 am - 12:00 pm EST

Bobst Library, NYU, Room 619 70 Washington Square S, New York, NY, United States

Learn how to use the command line to perform basic tasks. We’ll begin by discussing why humanists would want to learn something so technical, then jump into learning how to create and edit files and directories. Knowledge of the command line can be applied in many contexts, including several of the other workshops offered this [...]

Free
What matters to your Congressperson?

February 6, 2019 @ 12:00 pm - 2:00 pm EST

Bobst Library, NYU, Room 619 70 Washington Square S, New York, NY, United States

What topics most preoccupy your member of Congress? Are those the sorts of things you prioritize? In this workshop users will learn how to navigate a database of Congress-to-constituent e-newsletters and how to perform text analyses in R to get a top-level view of what members of Congress most focus on in [...]

Free
February 2020
NLP for non-data scientists – Event Extraction

February 4, 2020 @ 6:00 pm - 8:00 pm EST

Columbia (Butler Library room 208B) 535 West 114th St, New York, NY, United States

The amount of text data available is mind-boggling. We will explore programmatic approaches to identify information about what happened and when it happened by gathering knowledge from text. Equipment: Python, Anaconda, Laptop Prerequisites: Working familiarity with Python

Free
Text as Data in the Humanities

February 5, 2020 @ 10:00 am - 12:00 pm EST

Bobst Library, NYU, Room 617 70 Washington Square South, New York, NY, United States

An introduction to computational text analysis for literature with a basic introduction to software packages. This workshop is a primer for working with text as data in the humanities. This workshop will cover: gathering text corpora, data cleaning, an introduction to some computational software tools, reading the output and analysis of topic modeling and cluster analysis, [...]
Introduction to WebAnno

February 5, 2020 @ 6:00 pm - 8:00 pm EST

Studio@Butler 535 W. 114th St., New York, NY, United States

WebAnno is a web-based tool for linguistic annotation (marking up) of text, with layers for morphological, syntactic, and semantic annotation. We will work through tagging named entities and relationships in a text, exporting as a tab-delimited file, and using the annotated text as input into a (Python) machine-learning algorithm for named entity recognition. Equipment Requirements: [...]

Free
The Making and Knowing Project’s Digital Critical Edition and English Translation of a 16th-c. Manuscript of Artisanal Recipes

February 6, 2020 @ 2:00 pm - 4:00 pm EST

Columbia University, Fayerweather Hall, Room 513 1180 Amsterdam Avenue, New York, NY, United States

The Making and Knowing Project (Center for Science and Society, Columbia University) is excited to present Secrets of Craft and Nature in Renaissance France, a digital critical edition and English translation of a sixteenth-century French manuscript of artisanal recipes. The publication of this edition marks the culmination of over five years of iterative, collaborative, and interdisciplinary work by [...]

Free
Starting to Text Mine the Digitized Library with HathiTrust Features.

February 7, 2020 @ 10:00 am - 12:00 pm EST

Pace University, Babble Lab, Rm. 202 41 Park Row, New York, NY, United States

Millions of books have been digitized in the past two decades. Thanks to a 2014 court ruling, about 15 million books are available for computational analysis in the HathiTrust, including data about word counts on each individual page. In the next year or two, similar data will become available for JSTOR and Portico books. This [...]
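To give a sense of what working with page-level word counts can look like, here is a small sketch that rolls per-page counts up to the volume level and finds the pages that mention a word. The record layout, the sample data, and the function names are hypothetical simplifications made for this example; they are not the actual HathiTrust file format.

```python
from collections import Counter

# Hypothetical, simplified page records; NOT the actual HathiTrust schema.
PAGES = [
    {"page": 1, "counts": {"whale": 2, "sea": 1}},
    {"page": 2, "counts": {"whale": 1, "ship": 3}},
    {"page": 3, "counts": {"sea": 4}},
]


def volume_counts(pages):
    """Sum the per-page word counts into one Counter for the whole volume."""
    total = Counter()
    for page in pages:
        total.update(page["counts"])
    return total


def pages_mentioning(pages, word):
    """List the page numbers on which the word occurs at least once."""
    return [p["page"] for p in pages if p["counts"].get(word, 0) > 0]


if __name__ == "__main__":
    print(volume_counts(PAGES).most_common(2))
    print(pages_mentioning(PAGES, "whale"))
```

Because only counts are stored, not the running text, this kind of data supports frequency and distribution questions but not phrase searches or close reading.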

Free
February 2021
FairCopy: A word processor for the digital humanities.

February 9, 2021 @ 8:00 am - 10:00 am EST

Virtual NY, United States

FairCopy is a simple and powerful tool for transcribing, editing, and studying manuscripts and historical texts. FairCopy gives humanists an editor to create TEI encoded texts without writing a single line of XML, so this rich format becomes accessible for everyone. Nick Laiacona will demonstrate the use of this new tool and its functionality. The [...]

Free
Build Your Own Text-as-Data Corpus: A Print-to-Bytes Primer

February 11, 2021 @ 6:00 pm - 8:00 pm EST

Virtual NY, United States

This hands-on workshop will teach participants how to construct their own digital text corpus for conducting humanities data analysis. We'll cover simple tools for turning printed texts in a variety of languages into computer-readable files, the use of Optical Character Recognition (OCR) software, and consider helpful tools for post-process correction of digitized texts. We’ll also [...]
Brooklyn College Covid-19 Archive @ A Journal of the Plague Year

February 12, 2021 @ 10:00 am - 12:00 pm EST

Virtual NY, United States

This digital archive has collected stories and experiences from the Brooklyn College community related to the Covid-19 pandemic. The archive resides within the larger, omnibus archive, A Journal of the Plague Year. This demonstration will review the principles that guided the project and the submission process, and explore possible digital humanities projects based upon the archive [...]

Free
February 2022
Textual Corpus Creation with Corpus-DB

February 10, 2022 @ 1:00 pm - 3:00 pm EST

Online New York, NY, United States

In this workshop, participants will learn how to set up a text analysis project, by automatically assembling a large collection of text, using the Corpus-DB API. Corpus-DB allows digital humanities researchers to quickly assemble a textual corpus, according to publication date, literary genre, author, and more. We will generate corpora which may be of interest [...]

Free
Text Analysis with a Zine Corpus

February 10, 2022 @ 3:00 pm - 5:00 pm EST

Online New York, NY, United States

Working with transcribed zines from the Barnard Zine Library, we will engage participants in the ethics and steps of creating a corpus and how to explore it using Voyant-Tools and a pre-written Python script. Corpus metadata highlight zine creators holding one or more minoritized identities. All are welcome, and no coding experience is necessary. This [...]

Free
