CABank English CANDOR Corpus


Andrew Reece
--
BetterUp

Gus Cooney
--
Dartmouth College

Participants: 1450
Type of Study: Zoom conversations
Location: USA
Media type: video
DOI: xxx

Browsable transcripts

Download transcripts

Media folder

Citation information

Reece, A., Cooney, G., Bull, P., Chung, C., Dawson, B., Fitzpatrick, C., Glazer, T., Knox, D., Liebscher, A., & Marin, S. (2023). The CANDOR corpus: Insights from a large multimodal dataset of naturalistic conversation. Science Advances, 9(13), eadf3197. pdf here

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The article cited above describes the CANDOR project, data, and analyses. At TalkBank, we have divided the full database into 16 segments based on the first number or letter of the video file name. We have also created new ASR versions for compatibility with the TalkBank CHAT format, but these will need further checking. Each of the 16 folders also contains the features and surveys from the CANDOR database, and the full corpus download also includes the 0speakers.csv file, which lists the conversations in which each speaker participated.
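As an illustration of the segmenting rule described above, the following minimal Python sketch groups file names by their first character. It is not the script used at TalkBank: the function names, the example file names, and the choice to lowercase the leading character are all assumptions made for this example.

```python
from collections import defaultdict
from pathlib import Path


def segment_key(file_name: str) -> str:
    """Return a segment label: the first character of the file name.

    Lowercasing is an assumption of this sketch, not a documented rule.
    """
    stem = Path(file_name).stem
    if not stem:
        raise ValueError("empty file name")
    return stem[0].lower()


def group_by_segment(file_names):
    """Group file names into segments keyed by their first character."""
    segments = defaultdict(list)
    for name in file_names:
        segments[segment_key(name)].append(name)
    return dict(segments)


# Hypothetical file names, for illustration only.
example = ["0a1b.mp4", "0c2d.mp4", "f39e.mp4", "A7c1.mp4"]
groups = group_by_segment(example)
print(sorted(groups))
```

With the hypothetical names above, the files fall into three segments, labelled "0", "a", and "f".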

Researchers wishing to access the TalkBank version of CANDOR should first apply for access from the BetterUp site. Once approved, they also need to register at TalkBank by entering their email address and setting a password. After that, they need to send an email to macw@cmu.edu requesting TalkBank CANDOR access.

Acknowledgements