DHbox - An environment for digital humanities computational work that can be deployed quickly and easily from the cloud. Ready-to-go configurations of Omeka, NLTK, IPython, R Studio, and Mallet are included. (free)
Digital Research Tools (DiRT) Directory - Aggregates information about digital research tools for scholarly use. DiRT makes it easy to find and compare resources for tasks such as text mining and data visualization.
Seeing Speech - Provides ultrasound tongue imaging (UTI) video of speech, magnetic resonance imaging (MRI) video of speech and 2D midsagittal head animations based on MRI and UTI data.
TokenX - A text visualization, analysis, and play tool. Created by the University of Nebraska-Lincoln. (free)
Language Documentation
FLEx (Fieldwork Language Explorer) by SIL (Summer Institute of Linguistics) - helps in compiling dictionaries and links dictionary entries to text documents in order to facilitate annotation. Its predecessor "Toolbox" is discontinued, but still widely used in the language documentation community.
FileMaker - A relational database that is very useful in collecting different types of information (phonological, inflectional, semantic) for lexemes. (It is a proprietary program; it is available on our department student computers and students are not expected to buy it.)
CMDI Maker - A tool for compiling metadata for recordings.
ELAN - Analyze and annotate audio and video files. Allows for multiple tiers of transcription. (open source, free) YouTube tutorials available.
Praat - Scientific analysis of speech in phonetics; record and visualize speech and view spectrograms. (open source, free) YouTube tutorials available.
Modeling
MALLET (MAchine Learning for LanguagE Toolkit) - A collection of tools for document classification, sequence tagging, and topic modeling. There is also an add-on toolkit, GRMM (Graphical Models in MALLET), for inference in general graphical models. (open-source, free)
Text Mining/Analysis
AntConc - A freeware corpus analysis toolkit for concordancing and text analysis (works with Mac OS & Windows).
CasualConc - A text concordancing tool for Mac OS that allows you to analyze your own collection of text files.
Crossref Text and Data Mining for Researchers - Designed to allow researchers to easily harvest full-text documents from all participating publishers regardless of their business model (e.g. open access, subscription). Provides step-by-step instructions.
GloVe: Global Vectors for Word Representation - An unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
Juxta - An open-source tool for comparing and collating multiple witnesses to a single textual work. You can add or remove witnesses in a comparison set and switch the base text at will. Once you’ve collated a comparison, Juxta also offers several kinds of analytic visualizations.
WordHoard - An application for the close reading and scholarly analysis of deeply tagged texts. WordHoard contains the entire canon of Early Greek epic in the original and in translation, as well as all of Chaucer, Shakespeare, and Spenser.
WordSeer - A text analysis environment for humanities scholars. A collection of text analysis tools that includes side-by-side comparison, grammatical search, and document/sentence/word-set features.
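To give a rough idea of what concordancing means in tools such as AntConc and CasualConc, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not how any of the tools above is implemented; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines for each hit of `keyword`.

    Hypothetical helper: shows up to `width` tokens on either side
    of the keyword, which is marked with square brackets.
    """
    # Very simple tokenizer: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

sample = "The cat sat on the mat. The dog chased the cat."
for line in kwic(sample, "cat", width=2):
    print(line)
```

Dedicated concordancers add much more on top of this, such as sorting by left or right context and handling large collections of files, so the sketch is only meant to show the basic idea.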
Text Visualization
Bookworm - Created by Harvard. A tool for visualizing trends in repositories of digitized texts. Uses metadata and books collected by the Open Library to describe the contents of the library as a whole in a useful and intuitive way.
Voyant - An easy-to-use and free text analysis tool. Upload text and Voyant will automatically determine word frequencies and collocates and display them graphically.
TAPoR - A collection of text analysis tools hosted by the University of Alberta that provides XML, HTML, and plain text analysis. Upload documents to extract common words, determine collocates, separate HTML tags, and extract XML-tagged information.
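For readers curious about what "word frequencies" and "collocates" look like in practice, the following Python sketch counts both for a short sample. It is an illustration only, not the method used by Voyant or TAPoR; the function names, the `window` parameter, and the sample text are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"\w+", text.lower())

def word_frequencies(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))

def collocates(text, keyword, window=2):
    """Count the words found within `window` tokens of `keyword`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            # Gather neighbours on both sides, excluding the keyword itself.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

sample = "the quick fox and the lazy fox met the quick dog"
print(word_frequencies(sample).most_common(3))
print(collocates(sample, "fox").most_common(3))
```

Real tools typically also filter out very common words (stop words) and apply statistical measures of association rather than raw counts, which this sketch leaves out.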