Lesson Index
Our lessons are organized by typical phases of the research process, as well as general topics. Use the buttons to filter lessons by category. If you can’t find a skill, technology, or tool you’re looking for, please let us know!
- acquire (13)
- transform (36)
- analyze (35)
- present (27)
- sustain (2)
- APIs (9)
- Python (35)
- Data Management (10)
- Data Manipulation (31)
- Distant Reading (16)
- Set up (7)
- Linked Open Data (2)
- Mapping (15)
- Network Analysis (7)
- Web Scraping (5)
- Digital Publishing (14)
- R (10)
- Machine Learning (6)
- Creative Coding (2)
- Data Visualization (21)
- Modeling (1)
-
Luling Huang
Creating a Dashboard for Interactive Data Visualization with Dash in Python
This lesson shows how to create interactive web-based dashboards using Python’s Dash library. Using two news media case studies, this lesson provides a practical guide for making digital humanities research outputs more accessible and engaging.
presenting data-visualization website 2025-03-28 3 -
Igor Sosa Mayor and Nabeel Siddiqui
Visualizing Urban and Demographic Data in R with ggplot2
This lesson demonstrates how to use R’s ggplot2 package to create sophisticated data visualizations through a ‘grammar of graphics’ framework. Using historical data about European sister-city relationships in the post-Second World War period, including partnerships, population sizes, and geographic distances, the lesson guides readers through the process of creating various plots while exploring urban and demographic patterns.
presenting r data-visualization 2025-03-27 2 -
Jascha Schmitz, Malte Vogl, Aleksandra Kaye, and Raphael Schlattmann
Simulating Historical Communication Networks in Python
This lesson will introduce the core concepts, methodologies and discussions surrounding simulation methods for historical inquiry. You will learn the basics of programming a simulation model by building an Agent-Based Model of historical letter exchanges using the Python library mesa.
analyzing modeling network-analysis 2025-01-22 3 -
Ian Goodale
Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza
This lesson covers tokenization, part-of-speech tagging, and lemmatization, as well as automatic language detection, for non-English and multilingual text. You’ll learn how to use the Python packages NLTK, spaCy, and Stanza to analyze a multilingual Russian and French text.
analyzing python data-manipulation distant-reading 2024-11-13 2 -
Alex Wermer-Colan, Nicole 'Nikki' Lemire-Garlic, and Jeff Antsen
Text Mining YouTube Comment Data with Wordfish in R
In this lesson, you will learn how to download YouTube video comments and use the R programming language to analyze the dataset with Wordfish, an algorithm designed to identify opposing ideological perspectives within a corpus.
analyzing r 2024-08-07 3 -
Charles Goldberg and Zach Haala
Facial Recognition in Historical Photographs with Artificial Intelligence in Python
In this lesson, you’ll learn computer vision and machine learning principles for object recognition, and how to apply these principles using Python to recognize and classify smiling faces in historical photographs.
analyzing python machine-learning 2024-06-25 1 -
Susan Grunewald and Ruth Mostern
Working with Named Places: How and Why to Build a Gazetteer
A digital gazetteer records information associated with specific places. This lesson teaches you how to create a gazetteer from a historical text, using the Linked Places Delimited (LP-TSV) format.
acquiring data-management lod mapping 2024-03-27 1 -
Mita Williams
Designing a Deck of Timeline Cards for Tabletops and Tabletop Simulator
This lesson demonstrates how to use nanDECK to design and publish your own deck of printed or digital playing cards, and use them to test a group’s knowledge of historical events through a Timeline-like game mechanic. This lesson will also highlight best practices for handling digitized historical objects.
transforming website creative-coding 2024-03-18 1 -
Avery Blankenship, Sarah Connell, and Quinn Dombrowski
Understanding and Creating Word Embeddings
Word embeddings allow you to analyze the usage of different terms in a corpus of texts by capturing information about their contextual usage. Through a primarily theoretical lens, this lesson will teach you how to prepare a corpus and train a word embedding model. You will explore how word vectors work, how to interpret them, and how to answer humanities research questions using them.
analyzing python distant-reading machine-learning 2024-01-31 2 -
Grace Di Méo
Creating Interactive Visualizations with Plotly
This lesson demonstrates how to create interactive data visualizations in Python with Plotly’s open-source graphing libraries using materials from the Historical Violence Database.
presenting python data-visualization 2023-12-13 2 -
Jeff Blackadar
Transcribing Handwritten Text with Python and Microsoft Azure Computer Vision
Tools for machine transcription of handwriting are practical and labour-saving if you need to analyse or present text in digital form. This lesson will explain how to write a Python program to transcribe handwritten documents using Microsoft’s Azure Cognitive Services, a commercially available service that has a cost-free option for low volumes of use.
transforming python api data-manipulation 2023-12-06 2 -
Megan S. Kane
Corpus Analysis with spaCy
This lesson demonstrates how to use the Python library spaCy for analysis of large collections of texts. This lesson details the process of using spaCy to enrich a corpus via lemmatization, part-of-speech tagging, dependency parsing, and named entity recognition. Readers will learn how the linguistic annotations produced by spaCy can be analyzed to help researchers explore meaningful trends in language patterns across a set of texts.
analyzing data-manipulation distant-reading python 2023-11-02 2 -
Jonathan Reades and Jennie Williams
Clustering and Visualising Documents using Word Embeddings
This lesson uses word embeddings and clustering algorithms in Python to identify groups of similar documents in a corpus of approximately 9,000 academic abstracts. It will teach you the basics of dimensionality reduction for extracting structure from a large corpus and how to evaluate your results.
analyzing machine-learning network-analysis python data-visualization 2023-08-09 3 -
Jennifer Isasi
Sentiment Analysis with 'syuzhet' using R
This lesson teaches you how to obtain and analyse narrative texts for patterns of sentiment and emotion.
analyzing distant-reading r data-visualization 2023-04-01 2 -
Isabelle Gribomont
OCR with Google Vision API and Tesseract
Google Vision and Tesseract are both popular and powerful OCR tools, but they each have their weaknesses. In this lesson, you will learn how to combine the two to make the most of their individual strengths and achieve even more accurate OCR results.
transforming api python data-manipulation 2023-03-31 2 -
Nabeel Siddiqui
Creating Deep Convolutional Neural Networks for Image Classification
This lesson provides a beginner-friendly introduction to convolutional neural networks (CNNs) for image classification. The tutorial provides a conceptual understanding of how neural networks work by using Google’s Teachable Machine to train a model on paintings from the ArtUK database. This lesson also demonstrates how to use JavaScript to embed the model in a live website.
analyzing machine-learning 2023-03-23 2 -
Christopher Goodwin
Creating GUIs in Python for Digital Humanities Projects
In this lesson, you will use Qt Designer and Python to design and implement a simple graphical user interface and application to merge PDF files. This lesson also demonstrates how to package the application for distribution to other personal computers.
presenting python data-management 2023-03-22 2 -
Anthony Picón Rodríguez and Miguel Cuadros
Introduction to Map Warper
This lesson introduces basic use of Map Warper for historical maps. It guides you from upload to export, demonstrating methods for georeferencing and producing visualizations.
transforming mapping data-visualization 2022-10-24 2 -
Yann Ryan
Making an Interactive Web Application with R and Shiny
This lesson demonstrates how to build an interactive web map using R and the Shiny library. In the lesson, you will design and implement a simple application consisting of a slider, which allows a user to select a date range, and an interactive map that displays the corresponding points.
presenting mapping website r data-visualization 2022-10-19 2 -
Max Odsbjerg Pedersen, Josephine Møller Jensen, Victor Harbo Johnston, Alexander Ulrich Thygesen, and Helle Strandgaard Jensen
Scalable Reading of Structured Data
In this lesson, you will be introduced to ‘scalable reading’ and how to apply this workflow to your analysis of structured data.
analyzing api 2022-10-04 2 -
Chantal Brousseau
Interrogating a National Narrative with GPT-2
In this lesson, you will learn how to apply a Generative Pre-trained Transformer language model to a large-scale corpus so that you can locate broad themes and trends within written text.
analyzing python data-manipulation 2022-10-03 2 -
Daniel van Strien, Kaspar Beelen, Melvin Wevers, Thomas Smits, and Katherine McDonough
Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 1)
This is the first of a two-part lesson introducing deep learning-based computer vision methods for humanities research. Using a dataset of historical newspaper advertisements and the fastai Python library, the lesson walks through the pipeline of training a computer vision model to perform image classification.
analyzing python machine-learning 2022-08-17 3 -
Daniel van Strien, Kaspar Beelen, Melvin Wevers, Thomas Smits, and Katherine McDonough
Computer Vision for the Humanities: An Introduction to Deep Learning for Image Classification (Part 2)
This is the second of a two-part lesson introducing deep learning-based computer vision methods for humanities research. This lesson digs deeper into the details of training a deep learning-based computer vision model. It covers some challenges one may face due to the training data used and the importance of choosing an appropriate metric for your model. It presents some methods for evaluating the performance of a model.
analyzing python machine-learning 2022-08-17 3 -
Matthew J. Lavin
Regression Analysis with Scikit-Learn (part 1 - Linear)
This lesson is the first of a two-part lesson focusing on an indispensable set of data analysis methods, logistic and linear regression. It provides an overview of linear regression and walks through running both algorithms in Python (using scikit-learn). The lesson also discusses interpreting the results of a regression model and some common pitfalls to avoid.
analyzing python 2022-07-13 3 -
Matthew J. Lavin
Regression Analysis with Scikit-learn (part 2 - Logistic)
This lesson is the second in a two-part lesson focusing on regression analysis. It provides an overview of logistic regression, how to use Python (scikit-learn) to make a logistic regression model, and a discussion of interpreting the results of such analysis.
analyzing python 2022-07-13 3 -
Erica Y. Hayes and Mia Partlow
Displaying a Georeferenced Map in KnightLab’s StoryMap JS
In this lesson, you will learn how to display a georeferenced map from Map Warper in KnightLab’s StoryMap JS, an interactive web-based map and storytelling platform.
presenting mapping 2022-05-16 2 -
Susan Grunewald and Andrew Janco
Finding Places in Text with the World Historical Gazetteer
Researchers often need to search a corpus of texts for a defined list of terms, and historians are often interested in the places named in a text or texts. This lesson details how to programmatically search documents for a list of terms, including place names, and then how to obtain coordinates and map historical place names with the World Historical Gazetteer.
presenting data-manipulation 2022-02-11 2 -
Gabi Kirilloff
Interactive Fiction in the Humanities Classroom: How to Create Interactive Text Games Using Twine
This lesson provides strategies for incorporating game creation into the classroom. The first half of the lesson discusses the challenges and benefits of teaching game creation while the second half includes a technical tutorial for Twine, an open source game creation tool.
analyzing website creative-coding 2021-12-04 1 -
Thomas Jurczyk
Clustering with Scikit-Learn in Python
This tutorial demonstrates how to apply clustering algorithms with Python to a dataset with two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to use clustering in Python with Scikit-learn applied to your own data, adding an invaluable method to your toolbox for exploratory data analysis.
analyzing python data-manipulation 2021-09-29 3 -
Halle Burns
Crowdsourced-Data Normalization with Python and Pandas
Pandas is a popular and powerful package used in Python communities for data handling and analysis. This lesson describes crowdsourcing as a form of data creation as well as how pandas can be used to prepare a crowdsourced dataset for analysis. This lesson covers managing duplicate and missing data and explains the difficulties of dealing with dates.
transforming data-manipulation 2021-05-17 2 -
Matteo Romanello and Simon Hengchen
Detecting Text Reuse with Passim
In this lesson you will learn about text reuse detection – the automatic identification of reused passages in texts – and why you might want to use it in your research. Through a detailed installation guide and two case studies, this lesson will teach you the ropes of Passim, an open source and scalable tool for text reuse detection.
transforming data-manipulation 2021-05-16 3 -
Amanda Visconti, Brandon Walsh, and Scholars' Lab Community
Running a Collaborative Research Website and Blog with Jekyll and GitHub
In this lesson you will be introduced to the challenges and opportunities that Jekyll, a popular, static site generator, offers for publishing collaborative, ongoing research online.
presenting website data-management 2020-11-23 2 -
John R. Ladd
Understanding and Using Common Similarity Measures for Text Analysis
This lesson introduces three common measures for determining how similar texts are to one another: city block distance, Euclidean distance, and cosine distance. You will learn the general principles behind similarity, the different advantages of these measures, and how to calculate each of them using the SciPy Python library.
analyzing distant-reading 2020-05-05 2 -
Moritz Mähr
Working with batches of PDF files
Learn how to perform OCR and text extraction with free command line tools like Tesseract and Poppler and how to get an overview of large numbers of PDF documents using topic modeling.
transforming data-manipulation data-management 2020-01-30 2 -
Quinn Dombrowski, Tassie Gniady, and David Kloster
Introduction to Jupyter Notebooks
Jupyter notebooks provide an environment where you can freely combine human-readable narrative with computer-readable code. This lesson describes how to install the Jupyter Notebook software, how to run and create Jupyter notebook files, and contexts where Jupyter notebooks can be particularly helpful.
presenting python website 2019-12-08 1 -
Brad Rittenhouse, Ximin Mi, and Courtney Allen
Beginner's Guide to Twitter Data
Learn how to acquire Twitter data and process them to make them usable for further analysis.
acquiring data-manipulation api 2019-10-16 1 -
Go Sugimoto
Introduction to Populating a Website with API Data
This lesson introduces a way to populate a website with data obtained from another website via an Application Programming Interface (API). Using some simple programming, it provides strategies for customizing the presentation of that data, providing flexible and generalizable skills.
acquiring api 2019-05-22 2 -
Matthew J. Lavin
Analyzing Documents with TF-IDF
This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
analyzing distant-reading 2019-05-13 2 -
Adam Crymble
Introduction to Gravity Models of Migration & Trade
This lesson introduces gravity models as a means for determining the probable distribution of entities across space in historical datasets. It does so through a case study of historical migration patterns.
analyzing data-manipulation 2019-03-18 3 -
Stephen Krewson
Extracting Illustrated Pages from Digital Libraries with Python
Machine learning and API extensions by HathiTrust and Internet Archive are making it easier to extract page regions of visual interest from digitized volumes. This lesson shows how to efficiently extract those regions and, in doing so, prompt new, visual research questions.
acquiring api 2019-01-14 2 -
Dave Rodriguez
Introduction to Audiovisual Transcoding, Editing, and Color Analysis with FFmpeg
This lesson introduces the basic functions of FFmpeg, a free command-line tool used for manipulating and analyzing audiovisual materials.
analyzing data-manipulation data-visualization 2018-12-20 2 -
Alex Brey
Temporal Network Analysis with R
Learn how to use R to analyze networks that change over time.
analyzing network-analysis r data-visualization 2018-11-04 3 -
Eric Weinberg
Using Geospatial Data to Inform Historical Research in R
In this lesson, you will use the R language to analyze and map geospatial data.
analyzing mapping 2018-08-20 2 -
Jacob W. Greene
Creating Mobile Augmented Reality Experiences in Unity
This lesson serves as an introduction to creating mobile augmented reality applications. Augmented reality (AR) can be defined as the overlaying of digital content (images, video, text, sound, etc.) onto physical objects or locations, and it is typically experienced by looking through the camera lens of an electronic device such as a smartphone, tablet, or optical head-mounted display.
presenting website mapping 2018-08-10 2 -
Charlie Harper
Visualizing Data with Bokeh and Pandas
In this lesson you will learn how to visually explore and present data in Python by using the Bokeh and Pandas libraries.
analyzing python data-manipulation mapping data-visualization 2018-07-27 2 -
Jeff Blackadar
Introduction to MySQL with R
This lesson will help you store large amounts of historical data in a structured manner, search and filter that data, and visualize some of the data as a graph.
transforming data-manipulation distant-reading r data-visualization 2018-05-03 2 -
François Dominic Laramée
Introduction to stylometry with Python
In this lesson you will learn to conduct ‘stylometric analysis’ on texts and determine authorship of disputed texts. The lesson covers three methods: Mendenhall’s Characteristic Curves of Composition, Kilgarriff’s Chi-Squared Method, and John Burrows’ Delta Method.
analyzing distant-reading 2018-04-21 2 -
Patrick Smyth
Creating Web APIs with Python and Flask
Learn how to set up a basic Application Programming Interface (API) to make your data more accessible to users. This lesson also discusses principles of API design and the benefits of APIs for digital projects.
presenting api data-management 2018-04-02 2 -
Jon MacKay
Dealing with Big Data and Network Analysis Using Neo4j
In this lesson we will learn how to use a graph database to store and analyze complex networked information. This tutorial will focus on the Neo4j graph database, and the Cypher query language that comes with it.
analyzing network-analysis data-visualization 2018-02-20 3 -
Zoë Wilkinson Saldaña
Sentiment Analysis for Exploratory Data Analysis
In this lesson you will learn to conduct ‘sentiment analysis’ on texts and to interpret the results. This is a form of exploratory data analysis based on natural language processing. You will learn to install all appropriate software and to build a reusable program that can be applied to your own texts.
analyzing distant-reading 2018-01-15 2 -
Beatrice Alex
Geoparsing English-Language Text with the Edinburgh Geoparser
This tutorial teaches users how to use the Edinburgh Geoparser to process a piece of English-language text, extract and resolve the locations contained within it, and plot them as a web map.
presenting mapping 2017-10-31 3 -
Ryan Deschamps
Correspondence Analysis for Historical Research with R
This tutorial explains how to carry out and interpret a correspondence analysis, which can be used to identify relationships within categorical data.
analyzing data-manipulation network-analysis r data-visualization 2017-09-13 3 -
Shawn Graham
An Introduction to Twitterbots with Tracery
This lesson explains how to create simple twitterbots using Tracery and the Cheap Bots Done Quick service. Tracery exists in multiple languages and can be integrated into websites, games, and bots.
presenting api 2017-08-29 2 -
Kim Pham
Web Mapping with Python and Leaflet
This tutorial teaches users how to create a web map based on tabular data.
presenting mapping 2017-08-29 2 -
John R. Ladd, Jessica Otis, Christopher N. Warren, and Scott Weingart
Exploring and Analyzing Network Data with Python
This lesson introduces network metrics and how to draw conclusions from them when working with humanities data. You will learn how to use the NetworkX Python package to produce and work with these network statistics.
analyzing network-analysis data-visualization 2017-08-23 2 -
Evan Peter Williamson
Fetching and Parsing Data from the Web with OpenRefine
OpenRefine is a powerful tool for exploring, cleaning, and transforming data. In this lesson you will learn how to use OpenRefine to fetch URLs and parse web content.
acquiring data-manipulation web-scraping api 2017-08-12 2 -
Nabeel Siddiqui
Data Wrangling and Management in R
This tutorial explores how scholars can organize ‘tidy’ data, understand R packages to manipulate data, and conduct basic data analysis.
transforming data-manipulation data-management distant-reading r data-visualization 2017-07-31 2 -
Jonathan Blaney
Introduction to the Principles of Linked Open Data
Introduces core concepts of Linked Open Data, including URIs, ontologies, RDF formats, and a gentle intro to the graph query language SPARQL.
acquiring lod 2017-05-07 1 -
Stephanie J. Richmond and Tommy Tavenner
Using JavaScript to Create Maps of Correspondence
Demonstrates how to use the JavaScript library “Leaflet” to produce an interactive map that can be hosted online or viewed locally, and demonstrates how to customize many of its features.
presenting mapping 2017-04-24 2 -
Taylor Arnold and Lauren Tilton
Basic Text Processing in R
Learn how to use R to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus.
analyzing distant-reading r data-visualization 2017-03-27 2 -
Justin Colson
Geocoding Historical Data using QGIS
Learn how to use QGIS to convert lists of place names into geographic coordinates, allowing you to map them.
transforming mapping 2017-01-27 2 -
Peter Organisciak and Boris Capitanu
Text Mining in Python through the HTRC Feature Reader
Explains how to use Python to summarize and visualize data on millions of texts from the HathiTrust Research Center’s Extracted Features dataset.
analyzing distant-reading data-visualization 2016-11-22 3 -
Taryn Dewar
R Basics with Tabular Data
This lesson teaches a way to quickly analyze large volumes of tabular data, making research faster and more effective.
transforming data-manipulation r 2016-09-05 1 -
Brandon Walsh
Editing Audio with Audacity
In this lesson you will learn how to use Audacity to load, record, edit, mix, and export audio files.
transforming data-manipulation 2016-08-05 1 -
Jonathan Reeve
Installing Omeka
This lesson will teach you how to install your own copy of Omeka.
presenting website 2016-07-24 2 -
Ted Dawson
Introduction to the Windows Command Line with PowerShell
This tutorial will introduce you to the basics of Windows PowerShell, the standard command-line interface for Windows computers.
transforming data-manipulation get-ready 2016-07-21 1 -
M. H. Beals
Transforming Data for Reuse and Re-publication with XML and XSL
This tutorial will provide you with the ability to convert or transform historical data from an XML database (whether a single file or several linked documents) into a variety of different presentations—condensed tables, exhaustive lists or paragraphed narratives—and file formats.
transforming data-manipulation data-visualization 2016-07-07 1 -
Shawn Graham
The Sound of Data (a gentle introduction to sonification for historians)
There are any number of guides that will help you visualize the past, but this lesson will help you hear the past.
transforming distant-reading 2016-06-07 2 -
Matthew Lincoln
Reshaping JSON with jq
Working with data from an art museum API and from the Twitter API, this lesson teaches how to use the command-line utility jq to filter and parse complex JSON files into flat CSV files.
transforming data-manipulation 2016-05-24 2 -
Amanda Visconti
Building a static website with Jekyll and GitHub Pages
This lesson will help you create an entirely free, easy-to-maintain, preservation-friendly, secure website over which you have full control, such as a scholarly blog, project website, or online portfolio.
presenting website data-management 2016-04-18 1 -
Miriam Posner and Megan R. Brett
Creating an Omeka Exhibit
Now that you’ve added items to your Omeka site and grouped them into collections, you’re ready for the next step: taking your users on a guided tour through the items you’ve collected.
presenting website 2016-02-24 1 -
Miriam Posner
Up and Running with Omeka.net
Omeka.net makes it easy to create websites that show off collections of items.
presenting website 2016-02-17 1 -
Adam Crymble
Using Gazetteers to Extract Sets of Keywords from Free-Flowing Texts
This lesson will teach you how to use Python to extract a set of keywords very quickly and systematically from a set of texts.
acquiring data-manipulation 2015-12-01 2 -
Sarah Simpkin
Getting Started with Markdown
In this lesson, you will be introduced to Markdown, a plain text-based syntax for formatting documents. You will find out why it is used, how to format Markdown files, and how to preview Markdown-formatted documents on the web.
presenting data-management 2015-11-13 1 -
Heather Froehlich
Corpus Analysis with Antconc
Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called ‘distant reading’).
analyzing distant-reading 2015-06-19 1 -
Marten Düring
From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources
Network visualizations can help humanities scholars reveal hidden and complex patterns and structures in textual sources. This tutorial explains how to extract network data (people, institutions, places, etc.) from historical sources through the use of non-technical methods developed in Qualitative Data Analysis (QDA) and Social Network Analysis (SNA), and how to visualize this data with the platform-independent and particularly easy-to-use Palladio.
transforming network-analysis data-visualization 2015-02-18 2 -
Vilja Hulden
Supervised Classification: The Naive Bayesian Returns to the Old Bailey
This lesson shows how to use machine learning to extract interesting documents out of a digital archive.
analyzing distant-reading 2014-12-17 3 -
Jon Crump
Generating an Ordered Data Set from an OCR Text File
This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.
transforming data-manipulation 2014-11-25 3 -
Ian Milligan and James Baker
Introduction to the Bash Command Line
This lesson will teach you how to enter commands using a command-line interface, rather than through a graphical interface. Command-line interfaces have advantages for computer users who need more precision in their work, such as digital historians. They allow for more detail when running some programs, as you can add modifiers to specify exactly how you want your program to run. Furthermore, they can be easily automated through scripts, which are essentially recipes of text-based commands.
transforming data-manipulation get-ready 2014-09-20 1 -
James Baker and Ian Milligan
Counting and mining research data with Unix
This lesson will look at how research data, when organised in a clear and predictable manner, can be counted and mined using the Unix shell.
transforming data-manipulation 2014-09-20 2 -
James Baker
Preserving Your Research Data
This lesson will suggest ways in which historians can document and structure their research data so as to ensure it remains useful in the future.
sustaining data-management 2014-04-30 1 -
Dennis Tenen and Grant Wythoff
Sustainable Authorship in Plain Text using Pandoc and Markdown
In this tutorial, you will first learn the basics of Markdown—an easy to read and write markup syntax for plain text—as well as Pandoc, a command line tool that converts plain text into a number of beautifully formatted file types: PDF, .docx, HTML, LaTeX, slide decks, and more.
sustaining website data-management 2014-03-19 2 -
Caleb McDaniel
Data Mining the Internet Archive Collection
The collections of the Internet Archive include many digitized historical sources. Many contain rich bibliographic data in a format called MARC. In this lesson, you’ll learn how to use Python to automate the downloading of large numbers of MARC files from the Internet Archive and the parsing of MARC records for specific information such as authors, places of publication, and dates. The lesson can be applied more generally to other Internet Archive files and to MARC records found elsewhere.
acquiring web-scraping 2014-03-03 2 -
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
Georeferencing in QGIS 2.0
In this lesson, you will learn how to georeference historical maps so that they may be added to a GIS as a raster layer.
transforming mapping data-visualization 2013-12-13 2 -
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
Intro to Google Maps and Google Earth
Google My Maps and Google Earth provide an easy way to start creating digital maps. With a Google Account you can create and edit personal maps by clicking on My Places.
presenting mapping 2013-12-13 1 -
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
Installing QGIS 2.0 and Adding Layers
In this lesson you will install QGIS software, download geospatial files like shapefiles and GeoTIFFs, and create a map out of a number of vector and raster layers.
presenting mapping 2013-12-13 1 -
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
Creating New Vector Layers in QGIS 2.0
In this lesson you will learn how to create vector layers based on scanned historical maps.
presenting mapping data-visualization 2013-12-13 2 -
Seth Bernstein
Transliterating non-ASCII characters with Python
This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet into a standardized format using American Standard Code for Information Interchange (ASCII) characters.
transforming data-manipulation 2013-10-04 2 -
Kellen Kurschinski
Applied Archival Downloading with Wget
Now that you have learned how Wget can be used to mirror or download specific files from websites via the command line, it’s time to expand your web-scraping skills through a few more lessons that focus on other uses for Wget’s recursive retrieval function.
acquiring web-scraping 2013-09-13 2 -
Seth van Hooland, Ruben Verborgh, and Max De Wilde
Cleaning Data with OpenRefine
This tutorial focuses on how scholars can diagnose and act upon the accuracy of data.
transforming data-manipulation 2013-08-05 2 -
Doug Knox
Understanding Regular Expressions
In this lesson, we will use advanced find-and-replace capabilities in a word processing application in order to make use of structure in a brief historical document that is essentially a table in the form of prose.
transforming data-manipulation 2013-06-22 2 -
Laura Turner O'Hara
Cleaning OCR’d text with Regular Expressions
Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This lesson will help you clean up OCR’d text to make it more usable.
transforming data-manipulation 2013-05-22 2 -
Fred Gibbs
Installing Python Modules with pip
There are many ways to install external Python libraries; this tutorial explains one of the most common methods, using pip.
acquiring get-ready python 2013-05-06 1 -
Adam Crymble
Downloading Multiple Records Using Query Strings
Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer.
acquiring web-scraping 2012-11-11 2 -
Shawn Graham, Scott Weingart, and Ian Milligan
Getting Started with Topic Modeling and MALLET
In this lesson you will first learn what topic modeling is and why you might want to employ it in your research. You will then learn how to install and work with the MALLET natural language processing toolkit to do so.
analyzing distant-reading 2012-09-02 2 -
William J. Turkel and Adam Crymble
Code Reuse and Modularity in Python
Computer programs can become long, unwieldy and confusing without special mechanisms for managing complexity. This lesson will show you how to reuse parts of your code by writing functions and break your programs into modules, in order to keep everything concise and easier to debug.
transforming python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Counting Word Frequencies with Python
Counting the frequency of specific words in a list can provide illustrative data. This lesson will teach you an easy way to count such frequencies in Python.
analyzing python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Creating and Viewing HTML Files with Python
Here you will learn how to create HTML files with Python scripts, and how to use Python to automatically open an HTML file in Firefox.
presenting python website 2012-07-17 2 -
William J. Turkel and Adam Crymble
From HTML to List of Words (part 1)
In this two-part lesson, we will build on what you’ve learned about Downloading Web Pages with Python, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods, and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert content from a long string to a list of words that can later be sorted, indexed, and counted.
transforming python 2012-07-17 2 -
William J. Turkel and Adam Crymble
From HTML to List of Words (part 2)
In this lesson, you will learn the Python commands needed to implement the second part of the algorithm begun in the lesson ‘From HTML to List of Words (part 1)’.
transforming python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Python Introduction and Installation
This first lesson in our section on dealing with Online Sources is designed to get you and your computer set up to start programming. We will focus on installing the relevant software – all free and reputable – and finally we will help you to get your toes wet with some simple programming that provides immediate results.
transforming python get-ready 2012-07-17 1 -
William J. Turkel and Adam Crymble
Keywords in Context (Using n-grams) with Python
This lesson takes the frequency pairs collected in ‘Counting Frequencies’ and outputs them in HTML.
presenting python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Setting up an Integrated Development Environment for Python (Linux)
This lesson will help you set up an integrated development environment for Python on a computer running the Linux operating system.
transforming get-ready python 2012-07-17 1 -
William J. Turkel and Adam Crymble
Setting Up an Integrated Development Environment for Python (Mac)
This lesson will help you set up an integrated development environment for Python on a computer running the Mac operating system.
transforming get-ready python 2012-07-17 1 -
William J. Turkel and Adam Crymble
Manipulating Strings in Python
This lesson is a brief introduction to string manipulation techniques in Python.
transforming python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Normalizing Textual Data with Python
In this lesson, we will make the list we created in the ‘From HTML to List of Words’ lesson easier to analyze by normalizing its data.
transforming python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Output Data as an HTML File with Python
This lesson takes the frequency pairs created in the ‘Counting Frequencies’ lesson and outputs them to an HTML file.
transforming python website 2012-07-17 2 -
William J. Turkel and Adam Crymble
Output Keywords in Context in an HTML File with Python
This lesson builds on ‘Keywords in Context (Using N-grams)’, where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.
presenting python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Understanding Web Pages and HTML
This lesson introduces you to HTML and the web pages it structures.
presenting python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Setting Up an Integrated Development Environment for Python (Windows)
This lesson will help you set up an integrated development environment for Python on a computer running the Windows operating system.
transforming get-ready python 2012-07-17 1 -
William J. Turkel and Adam Crymble
Working with Text Files in Python
In this lesson you will learn how to manipulate text files using Python.
transforming python 2012-07-17 2 -
William J. Turkel and Adam Crymble
Downloading Web Pages with Python
This lesson introduces Uniform Resource Locators (URLs) and explains how to use Python to download and save the contents of a web page to your local hard drive.
acquiring python 2012-07-17 2 -
Ian Milligan
Automated Downloading with Wget
Wget is a useful program, run through your computer’s command line, for retrieving online material.
acquiring web-scraping 2012-06-27 1