LibGuides: * Linguistics: Linguistic Analysis Tools

TDM Studio

ProQuest TDM Studio
A text and data mining tool that provides access to rights-cleared publications to enable timely interrogation using text analysis and data visualization.
Click the "Create account" button. Use your institution/university email address to create your account, then create a password. To activate your account, click the workbench and data visualization icon once you are logged in. Access ends 11/30/2025.

ProQuest TDM Studio offers users an online workbench where they can utilize visualization tools to perform text and data analysis of ProQuest sources (e.g., newspapers, journals). Options include geographic analysis, topic modeling, and sentiment analysis. More advanced users with coding expertise can write their own code and utilize Jupyter Notebooks.

TDM Studio User Guide

Other Tools

General

DHbox - An environment for digital humanities computational work that can be deployed quickly and easily from the cloud. Ready-to-go configurations of Omeka, NLTK, IPython, R Studio, and Mallet are included. (free)

Digital Research Tools (DiRT) Directory - Aggregates information about digital research tools for scholarly use. DiRT makes it easy to find and compare resources for text mining, data visualization, and similar tasks.

Seeing Speech - Provides ultrasound tongue imaging (UTI) video of speech, magnetic resonance imaging (MRI) video of speech and 2D midsagittal head animations based on MRI and UTI data.

TokenX - A text visualization, analysis, and play tool. Created by University of Nebraska-Lincoln. (free)

Language Documentation

FLEx (Fieldwork Language Explorer) by SIL (Summer Institute of Linguistics) - helps in compiling dictionaries and links dictionary entries to text documents in order to facilitate annotation. Its predecessor "Toolbox" is discontinued, but still widely used in the language documentation community.

FileMaker - A relational database that is very useful in collecting different types of information (phonological, inflectional, semantic) for lexemes. (It is a proprietary program; it is available on our department student computers and students are not expected to buy it.)

CMDI Maker - A tool for compiling metadata for recordings.

HandBrake, Avidemux, and Audacity for video and/or audio conversion and editing.

Phonetic Analysis & Annotation

Elan - Analyze and annotate audio & video files. Allows for multiple tiers of transcription. (open source, free) YouTube tutorials available.

Praat - Scientific analysis of speech in phonetics; record and visualize speech and view spectrograms. (open source, free) YouTube tutorials available.

Modeling

MALLET (MAchine Learning for LanguagE Toolkit) - A collection of tools for document classification, sequence tagging, and topic modeling. There is also an add-on package (Graphical Models in MALLET, GRMM) for inference in general graphical models. (open source, free)

Text Mining/Analysis

AntConc - A freeware corpus analysis toolkit for concordancing and text analysis. (works with Mac OS & Windows)

CasualConc - A text concordancing tool for Mac OS that allows you to analyze your own collection of text files.

Crossref Text and Data Mining for Researchers - Designed to allow researchers to easily harvest full-text documents from all participating publishers regardless of their business model (e.g., open access, subscription). Provides step-by-step instructions.

GloVe: Global Vectors for Word Representation - An unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Juxta - An open-source tool for comparing and collating multiple witnesses to a single textual work; add or remove witnesses to a comparison set, switch the base text at will. Once you’ve collated a comparison, Juxta also offers several kinds of analytic visualizations.

TDM Studio - see info box above.

WordHoard - An application for the close reading and scholarly analysis of deeply tagged texts. WordHoard contains the entire canon of Early Greek epic in the original and in translation, as well as all of Chaucer, Shakespeare, and Spenser.

WordSeer - A Text Analysis Environment for Humanities Scholars. A collection of text analysis tools targeted at humanities scholars that includes side-by-side comparison, grammatical search, and document/sentence/word-set features.
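
To give a sense of what the concordancing tools listed above (e.g., AntConc, CasualConc) do, here is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only and not how any of these tools is implemented; the function name `kwic`, the sample sentence, and the naive tokenizer are all assumptions made for the example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper for illustration; real concordancers offer
    far richer tokenization, sorting, and search options.
    """
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Made-up sample text, used only to exercise the function.
sample = "The cat sat on the mat. The dog chased the cat around the garden."
for left, kw, right in kwic(sample, "cat"):
    print(f"{left:>20} | {kw} | {right}")
```

Each printed line shows one occurrence of the search word with up to `width` words of context on either side, which is the basic view a concordancer provides.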

Text Visualization

Bookworm - Created by Harvard. A tool for visualizing trends in repositories of digitized texts. Uses metadata and books collected by the Open Library, and describes the contents of the library as a whole in a useful and intuitive way.

Voyant - An easy-to-use and free text analysis tool. Upload text and Voyant will automatically determine word frequencies and collocates and display them graphically.
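
The word-frequency and collocate counts that tools such as Voyant report can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration, not Voyant's actual method: the helper names (`tokenize`, `word_frequencies`, `collocates`), the window size, and the sample text are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

def collocates(tokens, node, window=2):
    """Count words appearing within `window` tokens of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            # Neighbors on both sides, excluding the node word itself.
            neighbors = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts

# Made-up sample text, used only to exercise the functions.
sample = "Strong tea and strong coffee. She drinks strong tea every morning."
tokens = tokenize(sample)
print(word_frequencies(tokens).most_common(3))
print(collocates(tokens, "strong").most_common(3))
```

Raw counts like these are only a starting point; dedicated tools typically add stop-word filtering and statistical association measures on top.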

Working With Webpages

import.io - Instantly turns web pages into data.

Tapor - A collection of text analysis tools, hosted by the University of Alberta, that provides XML, HTML, and plain text analysis. Upload documents to extract common words, determine collocates, separate HTML tags, and extract XML-tagged information.
