Note to all attendees: Session leaders will contact you with additional information, including a meeting link, for each individual workshop, event, or demonstration. 

Using Optical Character Recognition (OCR) to Build a DH Corpus

Bobst Library, NYU, Room 617, 70 Washington Square South, New York, NY, United States

Students will learn how to use common OCR software, including Tesseract and ABBYY FineReader, to build the text corpora they need for common DH methods such as text mining, topic modeling, bibliographic visualizations, and text-as-data analyses.

Skill Level: Beginner. Prerequisites: None. Equipment Requirements: None.

Free

Build Your Own Text-as-Data Corpus: A Print-to-Bytes Primer

Virtual NY, United States

This hands-on workshop will teach participants how to construct their own digital text corpus for humanities data analysis. We'll cover simple tools for turning printed texts in a variety of languages into computer-readable files, explain how to use Optical Character Recognition (OCR) software, and consider helpful tools for post-processing correction of digitized texts. We’ll also [...]
