* Linguistics

A research guide for Linguistics, including library and other resources, where to find corpora and datasets, and more.

TDM Studio

ProQuest TDM Studio offers users an online workbench where they can utilize visualization tools to perform text and data analysis of ProQuest sources (e.g., newspapers, journals). Options include geographic analysis, topic modeling, and sentiment analysis. More advanced users with coding expertise can write their own code and utilize Jupyter Notebooks.

Other Tools

General

  • DHbox - An environment for digital humanities computational work that can be deployed quickly and easily from the cloud. Ready-to-go configurations of Omeka, NLTK, IPython, R Studio, and Mallet are included. (free)
  • Digital Research Tools (DiRT) Directory - Aggregates information about digital research tools for scholarly use. DiRT makes it easy to find and compare resources for text mining, data visualization, and similar tasks.
  • Seeing Speech - Provides ultrasound tongue imaging (UTI) video of speech, magnetic resonance imaging (MRI) video of speech and 2D midsagittal head animations based on MRI and UTI data.
  • TokenX - A text visualization, analysis, and play tool. Created by the University of Nebraska-Lincoln. (free)

Language Documentation

  • FLEx (Fieldwork Language Explorer) by SIL (Summer Institute of Linguistics) - helps in compiling dictionaries and links dictionary entries to text documents in order to facilitate annotation. Its predecessor "Toolbox" is discontinued, but still widely used in the language documentation community.
  • FileMaker - A relational database that is very useful in collecting different types of information (phonological, inflectional, semantic) for lexemes. (It is a proprietary program; it is available on our department student computers and students are not expected to buy it.)
  • CMDI Maker - a tool for compiling metadata for recordings.
  • HandBrake, Avidemux, and Audacity for video and/or audio conversion and editing.

Phonetic Analysis & Annotation

Modeling

  • MALLET (MAchine Learning for LanguagE Toolkit) - A collection of tools for document classification, sequence tagging, and topic modeling. There is also an add-on toolkit, GRMM (Graphical Models in MALLET), for inference in general graphical models. (open-source, free)

Text Mining/Analysis

Text Visualization
  • Bookworm - Created by Harvard. A tool for visualizing trends in repositories of digitized texts. It uses metadata and books collected by the Open Library, and gives a useful and intuitive overview of the contents of the library as a whole.
  • Voyant - An easy-to-use and free text analysis tool. Upload text and Voyant will automatically determine word frequencies and collocates and display them graphically.
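
For readers who want to see what word frequencies and collocates mean in practice, here is a minimal Python sketch that uses only the standard library. It is not how Voyant or Bookworm work internally; the function names, the sample sentence, and the window size are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def collocates(tokens, keyword, window=2):
    """Count the words found within `window` tokens of each `keyword` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the keyword itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text, used only for illustration.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)
print(word_frequencies(tokens).most_common(3))
print(collocates(tokens, "sat", window=1).most_common())
```

A larger window catches looser associations at the cost of more noise; dedicated tools usually also filter out very common function words such as "the".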
Working With Webpages
  • import.io - Instantly turn web pages into data
  • Tapor - A collection of text analysis tools, hosted by the University of Alberta, that provides XML, HTML, and plain text analysis. Upload documents to extract common words, determine collocates, separate HTML tags, and extract XML-tagged information.
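
Separating HTML tags from text and then extracting common words can also be tried by hand. The following is a rough Python sketch using only the standard library; it does not reproduce any of the tools listed above, and the class name, helper functions, and sample page are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a page and discard the tags.

    Note: text inside <script> or <style> elements would be kept too;
    this sketch assumes the page contains none.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def common_words(text, n=3):
    """Return the n most frequent words, ignoring case and edge punctuation."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(w for w in words if w).most_common(n)


# Hypothetical sample page, used only for illustration.
page = "<html><body><h1>Corpus notes</h1><p>A corpus is a <b>corpus</b>.</p></body></html>"
text = strip_tags(page)
print(text)
print(common_words(text))
```

For real web pages, malformed markup and embedded scripts are common, so a purpose-built tool is usually the more reliable choice.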