Linguistics

A research guide for Linguistics, including library and other resources, where to find corpora and datasets, and more.

Corpora

If you are interested in having the library purchase a dataset or getting access to a corpus that is not listed here, please email librarian Andrea Kingston with the relevant information.

General Corpora

Medical/Scientific Corpora

Phonetic & Phonological Data

Social Media

Click on Tools to access:

  • Hydrator - "Rehydrate" your Tweet ID sets into full tweets with metadata.
  • Tweet Catalog - A catalog of publicly shared Tweet ID sets. Add yours here!
  • Twarc - Archive Twitter JSON using this command line tool.
  • Diff Engine - Track changes in news articles through their RSS feeds.