

A research guide for Linguistics, including library and other resources, where to find corpora and datasets, and more.


If you are interested in having the library purchase a dataset or getting access to a corpus that is not listed here, please email librarian Andrea Kingston with the relevant information.

General Corpora

Medical/Scientific Corpora

Phonetic & Phonological Data

Social Media

Tools include:

  • Hydrator - "Rehydrate" your Tweet ID sets into full tweets with metadata.
  • Tweet Catalog - A catalog of publicly shared Tweet ID sets, to which you can also add your own.
  • Twarc - A command-line tool for archiving Twitter JSON data.
  • Diff Engine - Track changes in news articles through their RSS feeds.