TalkBank Downloading and Browsing

The TalkBank database contains transcript and media data collected from conversations with adults and older children. Conversations with children are available from CHILDES. All of the data is transcribed in CHAT and CA/CHAT formats. The use of all CHILDES and TalkBank data is governed by the Creative Commons License. Please remember to read and follow the Ground Rules for data-sharing.

Accessing TalkBank Data

There are two ways to access TalkBank data:
  1. Use the link labelled "Browsable Database" to play back media linked directly to transcripts in your browser.
  2. Use the link labelled "Index to Corpora" to open the page for each corpus, which has links for downloading the transcripts and media to work with on your local machine.

Working with transcripts and media locally

Downloading Media

If you find it tedious to download media files one by one, you can use wget. For example, to retrieve all the *.mp3 and *.wav audio in the CallFriend Taiwan Mandarin folder, you can run this one-line wget command:

$ wget -e robots=off -R "index.html*" -N -nH -l inf -r --no-parent

The files then download into a folder called "childes/Clinical-MOR/TBI" in the calling directory, and the files within that folder keep the original hierarchical structure. The program will not inform you about the progress of the transfer, but you can monitor it by watching files arrive in the folder on your computer.
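Since progress is judged from the files arriving on disk, a small helper can count them for you. This is only a minimal sketch: the function name count_media is hypothetical, and the folder you pass in is whatever directory your own download created.

```shell
# count_media DIR: print how many .mp3 and .wav files exist under DIR.
count_media() {
  find "$1" -type f \( -name '*.mp3' -o -name '*.wav' \) | wc -l | tr -d ' '
}

# Example: run this repeatedly in a second terminal while wget is working.
# count_media childes/Clinical-MOR/TBI
```

If the number keeps growing between calls, the transfer is still running.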

If the folder you want to access requires a password, the command will be a bit longer. For example, to download from a password-protected folder, you would use this form (where xxx is replaced with the required username):

$ wget --user=xxx --ask-password -e robots=off -R "index.html*" -N -nH -l inf -r --no-parent

The meanings of the switches in these commands are:
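As a hedged summary based on the GNU wget manual, the switches can be annotated one at a time. In the sketch below, the function name build_wget_cmd and the example.org address in the usage line are hypothetical placeholders, not TalkBank names. The function only prints a command line; it does not download anything.

```shell
# build_wget_cmd URL [USER]: print the wget command line without running it.
# The URL is supplied by the caller; nothing here is a real TalkBank address.
build_wget_cmd() {
  url=$1
  user=${2:-}
  set --                          # start from an empty argument list
  if [ -n "$user" ]; then
    set -- "$@" "--user=$user"    # username for a password-protected folder
    set -- "$@" --ask-password    # prompt for the password instead of typing it in the command
  fi
  set -- "$@" -e robots=off       # execute the wgetrc command "robots=off": ignore robots.txt
  set -- "$@" -R 'index.html*'    # reject files matching this pattern (directory index pages)
  set -- "$@" -N                  # timestamping: skip files no newer than the local copy
  set -- "$@" -nH                 # do not create a top-level folder named after the host
  set -- "$@" -l inf              # no limit on recursion depth
  set -- "$@" -r                  # recursive download: follow links into subfolders
  set -- "$@" --no-parent         # never ascend above the starting folder
  printf 'wget %s %s\n' "$*" "$url"
}

# Usage with a placeholder address:
# build_wget_cmd "https://example.org/corpus/" xxx
```

The printed line is meant for reading. When you type the command yourself, keep the quotes around index.html* so that your shell does not expand the pattern.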

Installing wget

Installation of wget depends on your system:
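Before installing anything, it may be worth checking whether wget is already present. The following is a minimal sketch; the helper name have_cmd is hypothetical, and the right installer depends on your system (for example, a package manager such as apt on Debian or Ubuntu, or Homebrew on macOS).

```shell
# have_cmd NAME: succeed if NAME can be found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd wget; then
  wget --version | head -n 1      # show the first line of the version banner
else
  echo "wget not found; install it with your system's package manager"
fi
```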