TalkBank Downloading and Browsing

The TalkBank database contains transcript and media data collected from conversations with adults and older children. All of the data is transcribed in CHAT and CA/CHAT formats. The use of TalkBank data is governed by the Creative Commons License. Please remember to read and follow the Ground Rules for data-sharing.

Accessing TalkBank Data

There are two ways to access TalkBank data:
  1. You can use the link labelled "Browsable Database" to play back media directly linked to transcripts in your browser.
  2. You can click on the link labelled "Index to Corpora" to reach the page for each corpus, which has links for downloading the transcripts and media for work on your local machine.

Working with transcripts and media locally

Downloading Media using Chrome

We have packaged transcripts together into .zip files for easy downloading, but this doesn't work well for media. If you want to download all of the media for a given corpus, you can use a Chrome browser extension called Multi-File Downloader, which is available from the Chrome Web Store. To install it, open the Extensions window in Chrome and drag the extension onto it. This adds a green downward-pointing arrow icon to the extensions list at the top of Chrome. When you navigate to a page from which you wish to download multiple files, click on that icon and it will explain how to proceed. The files will go to your Chrome downloads folder. You can change the location of that folder in your Chrome preferences.

Downloading Media using wget

If you find it tedious to download media files one by one, you can use wget. For example, to retrieve all the media files in the childes/Clinical-MOR/TBI folder, you can run this one-line wget command:

$ wget -e robots=off -R "index.html*" -N -nH -l inf -r --no-parent https://media.talkbank.org/childes/Clinical-MOR/TBI/

The files are downloaded into a folder called "childes/Clinical-MOR/TBI" in the directory from which you ran the command. The files within that folder keep the original hierarchical structure. wget does not report overall progress for the whole transfer, but you can monitor it by watching files arrive in the folder on your computer.
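The folder naming and the monitoring step described above can be sketched in a few lines of shell. This is only an illustration: count_files is a hypothetical helper name, not part of wget or TalkBank, and the sketch assumes it is run from the directory where wget was started.

```shell
# Hypothetical helper: count the files downloaded so far under a folder.
count_files() {
    find "$1" -type f | wc -l
}

# With -nH, the part of the URL after the host becomes the local folder name.
url="https://media.talkbank.org/childes/Clinical-MOR/TBI/"
dir="${url#https://media.talkbank.org/}"   # strip the scheme and host
dir="${dir%/}"                             # strip the trailing slash
echo "$dir"                                # prints childes/Clinical-MOR/TBI

# Report the current file count once the folder exists.
if [ -d "$dir" ]; then
    count_files "$dir"
fi
```

Running the last three lines again from time to time in a second terminal gives a rough sense of how far the transfer has progressed.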

If the folder you want to access requires a password, the command will be a bit longer. For example, to download from https://media.talkbank.org/dementia/English/Kempler/, you would use this form (where xxx would be replaced with the required username):

$ wget --user=xxx --ask-password -e robots=off -R "index.html*" -N -nH -l inf -r --no-parent https://media.talkbank.org/dementia/English/Kempler/
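If you download from several folders, the two command forms above can be wrapped in one small shell function. The following is a minimal sketch under stated assumptions: talkbank_mirror and the PREVIEW variable are hypothetical names introduced here for illustration, and xxx remains a placeholder for the real username.

```shell
# Hypothetical wrapper (not a TalkBank tool): mirror one media folder,
# adding the login options only when a username is given.
# Set PREVIEW=1 to print the command for inspection instead of running it.
talkbank_mirror() {
    url="$1"
    user="${2:-}"
    # Same switches as the one-line commands shown above.
    set -- -e robots=off -R "index.html*" -N -nH -l inf -r --no-parent "$url"
    if [ -n "$user" ]; then
        set -- --user="$user" --ask-password "$@"
    fi
    if [ "${PREVIEW:-0}" = "1" ]; then
        # The printed form omits the quotes around index.html*.
        printf 'wget %s\n' "$*"
    else
        wget "$@"
    fi
}

# Preview the command for the password-protected example.
PREVIEW=1 talkbank_mirror https://media.talkbank.org/dementia/English/Kempler/ xxx
```

Calling the function with only a URL gives the form for folders that do not require a password.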

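The meanings of the switches in these commands are:
  -e robots=off: ignore the server's robots.txt file
  -R "index.html*": reject (do not keep) the directory index pages
  -N: use time-stamping, so a file is downloaded again only if the server copy is newer
  -nH: do not create a top-level folder named after the host (media.talkbank.org)
  -l inf: follow links to unlimited depth
  -r: retrieve recursively
  --no-parent: never ascend to the parent folder
  --user=xxx: the username for a password-protected folder
  --ask-password: prompt for the password instead of putting it on the command line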

Installing wget

Installation of wget depends on your system:
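Before installing anything, it can help to check whether wget is already available. A minimal sketch, in which have_cmd is a hypothetical helper name introduced here:

```shell
# Hypothetical helper: succeed if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
    echo "wget found at: $(command -v wget)"
else
    echo "wget not found; install it using the method for your system"
fi
```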