CABank Cantonese WCT Corpus


Chu-Ren Huang
CBS
The Hong Kong Polytechnic University

Sarah Cen
CBS
The Hong Kong Polytechnic University

Tracy Luo
CBS
The Hong Kong Polytechnic University

Participants: 8
Type of Study: naturalistic
Location: Hong Kong
Media type: audio
DOI: doi:10.21415/T5DK6X

Browsable transcripts

Download transcripts

Media folder

Citation information

Some citation here.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The We Can Talk collection is a multi-modal, multi-lingual speech corpus intended to support the development of speaker recognition technologies. The collection contains data from 223 speakers; the original goal was to collect the amounts and types of data described below for each of 200 speakers.

Speakers were recruited in Hong Kong, which was also the location of the telephone collection platform. Speakers were required to be native or fluent speakers of Cantonese and at least one other language, and were asked to produce both video and phone-call recordings in Cantonese and in their other languages.

Each recruited speaker, or "claque", was instructed to make 11 calls to their own friends or family, with the aim of obtaining at least 10 calls per claque for this collection. They were also instructed to hold each phone conversation for at least 8 minutes.

Additionally, each speaker was instructed to submit 1 selfie image and 4 videos in which their face is visible and they talk for 3-10 minutes. Requesting 4 videos was intended to ensure that the collection requirement of 3 videos per speaker was met. Participants could submit videos through LDC's custom We Can Talk video submission web page, either by uploading a video file directly or by submitting a URL pointing to their own video on a publicly accessible media hosting site. Because many participants had problems uploading large video files, many videos are closer to 3 minutes than to 10 minutes in duration.