Linguistic Data Consortium
University of Pennsylvania
Type of Study: naturalistic
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
These corpora were contributed to TalkBank by the Linguistic Data Consortium. Thanks to Mark Liberman, Steven Bird, and Chris Cieri for sharing these audio data. Transcriptions for Spanish in CA-CHAT were produced by Tania Granadillo.
The CallFriend Spanish corpus of telephone speech was collected by the Linguistic Data Consortium primarily in support of the project on Language Identification (LID), sponsored by the U.S. Department of Defense.
This release of the CallFriend Spanish corpus consists of 60 unscripted telephone conversations for each dialect group, all between native speakers of Spanish. The recorded conversations last up to 30 minutes. All speakers were aware that they were being recorded. They were given no guidelines about what to talk about. Once a caller was recruited to participate, he or she was free to choose whom to call. Most participants called family members or close friends. All calls originated in the United States.
The LDC solicited speakers for this telephone speech collection effort via the internet, publications (advertisements), and personal contacts. A total of 100 call originators were recruited per dialect, each of whom placed a telephone call through a toll-free robot operator maintained by the LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the LDC recruiting staff when the caller enrolled in the project. Both the callers and the call recipients were told that the call would be recorded, and a call was allowed to proceed only if both parties agreed to being recorded. Each caller could talk for up to 30 minutes and was allowed to place only one call. Upon successful completion of the call, the caller was paid $20 (in addition to making a free long-distance telephone call). After each successful call, a human audit was conducted to verify that the proper language was spoken and to check the quality of the recording.
A second audit was conducted by a native speaker familiar with dialect variation in Spanish. Conversations were labeled as either "Caribbean" or "non-Caribbean" based on particular features of the participants' speech. Callers were assigned to the "Caribbean" and "non-Caribbean" collections of CallFriend Spanish primarily on the basis of consonant quality patterns, specifically the pronunciation of word-final "s".
Caribbean Speakers' Calls
Non-Caribbean Speakers' Calls