UWI French L2 Corpus

Hugues Peters
School of Humanities and Languages
University of NSW, Sydney


Participants: 9
Type of Study: longitudinal
Location: UWI, Mona, Jamaica
Media type: audio
DOI: doi:10.21415/T5G975

Citation information

Peters, Hugues. (2017) Comportements d'autocorrection et d'hésitation manifestés par les apprenants de FLE au cours de conversations orales spontanées. Bulletin Association Suisse de Linguistique Appliquée (VALS-ASLA), Numéro Spécial, Tome 2 : 133-145.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above references.

Project Description

The UWI French corpus project (Constitution of an oral corpus of Jamaican learners of French at UWI, Mona) was initiated in 2003 by the investigator, Dr. Hugues PETERS, then lecturer at the University of the West Indies (UWI), Mona, Jamaica.

The corpus available on TalkBank documents the acquisition of French as a foreign language by nine adult Jamaican learners studying French at the University of the West Indies (UWI), Mona. It consists of transcriptions of 25 one-on-one interviews conducted in French between the students and the investigator. These informal interviews (two or three per student) took place between November 2003 and October 2004 at UWI, Mona. They comprised various tasks designed to elicit grammatical structures (see the description below). The orthographic transcriptions of these interviews contain 15,068 token-words (9,532 learners' and 5,536 investigator's token-words) excluding all retraced material (16,818 words with retraced material included).
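As a quick consistency check on the token counts above, the following minimal Python sketch recomputes the totals from the reported figures. The variable names are illustrative only and are not part of the corpus or of any TalkBank tool.

```python
# Token counts as reported in the corpus description above.
LEARNER_TOKENS = 9_532
INVESTIGATOR_TOKENS = 5_536
TOTAL_WITH_RETRACING = 16_818

# Learner and investigator tokens together give the total without retracing.
total_without_retracing = LEARNER_TOKENS + INVESTIGATOR_TOKENS

# The difference between the two reported totals is the retraced material.
retraced_tokens = TOTAL_WITH_RETRACING - total_without_retracing

# Share of the (non-retraced) tokens produced by the learners, in percent.
learner_share = 100 * LEARNER_TOKENS / total_without_retracing

print(total_without_retracing)   # 15068
print(retraced_tokens)           # 1750
print(round(learner_share, 1))   # 63.3
```

The learners thus produce a little under two thirds of the transcribed tokens, and retraced material accounts for 1,750 words.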

Apart from these 25 informal interviews included in the Talkbank database, there are 42 other interviews that occurred during end-of-semester formal language exams (from April 2003 to November 2004) and with a tenth participant who did not complete the Talkbank consent form. These are not included in the Talkbank repository. An initial statement of the objectives of the project appeared in PETERS (2005) (Development of a longitudinal oral interlanguage corpus of Jamaican learners of French. Caribbean Journal of Education 27(2): 75-96). The overall aim of the project was to establish a longitudinal learner corpus of spoken French by adult Jamaican learners of French in an instructed setting at the University level. The specific objectives of this project are to:

All recordings were transcribed and annotated by the investigator, Hugues PETERS, in CHAT format (MacWhinney 2000) between 2005 and 2017. The investigator had been trained to use the CHILDES protocols during his graduate studies at the Pennsylvania State University, in a course on first language acquisition taught by Prof. Justine Cassell. The accuracy of the orthographic transcription was checked multiple times by the investigator himself.

In 2009, a separate %mor tier containing a word-by-word morphological decomposition of the learners' speech, pruned of fillers and retracing, was added. The task was performed using the French grammar contributed by Christophe Parisse, available from the CHILDES site at the time, and the output subsequently went through extensive disambiguation and verification by the investigator. The %mor tier has been added only for the learners' utterances, not for the investigator's. A %mor tier for the utterances produced by the investigator is in preparation.
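To illustrate how the %mor tier relates to the main tier, here is a minimal Python sketch that pairs each utterance with its %mor line, if any. It relies only on general CHAT layout conventions (headers start with @, main tiers with *, dependent tiers with %, continuation lines with a tab). The speaker codes LOC and INV, the sample utterances and the %mor content are invented for illustration; they are not taken from the corpus and do not reproduce the output of the French grammar.

```python
def read_tiers(chat_text):
    """Return a list of (speaker, utterance, mor) tuples; mor is None when absent."""
    records = []
    last = None  # which field a tab-indented continuation line extends
    for line in chat_text.splitlines():
        if line.startswith("*"):
            # Main tier: "*SPK:<tab>utterance"
            speaker, _, text = line[1:].partition(":")
            records.append([speaker, text.strip(), None])
            last = "main"
        elif line.startswith("%mor:") and records:
            # Dependent morphology tier attached to the preceding utterance.
            records[-1][2] = line[len("%mor:"):].strip()
            last = "mor"
        elif line.startswith("\t") and records and last:
            # Continuation of the previous main or %mor line.
            index = 1 if last == "main" else 2
            records[-1][index] += " " + line.strip()
        else:
            # Header line (@...) or a dependent tier this sketch ignores.
            last = None
    return [tuple(record) for record in records]


# Invented sample, NOT from the corpus; speaker codes are hypothetical.
SAMPLE = (
    "@Begin\n"
    "*INV:\tqu'est-ce que tu aimes ?\n"
    "*LOC:\tj'aime la musique .\n"
    "%mor:\tpro|je v|aimer det|la n|musique .\n"
    "@End\n"
)

tiers = read_tiers(SAMPLE)
for speaker, utterance, mor in tiers:
    print(speaker, "|", utterance, "|", mor)
```

In this sketch the investigator's utterance comes back with no %mor value, which mirrors the current state of the corpus, where only the learners' utterances carry a %mor tier.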

In 2010, the accuracy of transcriptions was independently verified by two research assistants at UNSW, Sydney, thanks to a research grant from the faculty of Arts and Social Sciences.

In 2016, each utterance from the orthographic transcriptions was linked to the corresponding segment in the audio files, and the disfluency encoding was revised.

In 2017, the transcription files of the interviews that took place in October 2004 were separated by task (discussion, story retelling, role playing, interrogative elicitation task) (see below).


After each interview, participants were asked to sign a participation consent form allowing the use of the data for research purposes and guaranteeing anonymity. After the recordings were completed in 2005, participants were asked to complete the TalkBank informed consent form allowing for the dissemination of the corpus via TalkBank. Nine participants gave their authorisation. A tenth student who participated for the whole length of the project is not included in the TalkBank repository, as that participant was not available to sign the TalkBank consent form.

To preserve the anonymity of the participants, pseudonyms were used throughout the transcription of the conversations: ‘Loc’ (for Locuteur, 'speaker' in French) followed by two digits: Loc08, Loc12, Loc14, Loc16, Loc17, Loc18, Loc20, Loc33, and Loc38. The names were also erased from the audio recordings. Additionally, all sensitive information that might be used to identify the participants (name, school attended, place of work, etc.) has been replaced by a generic label in the transcriptions (e.g., Name_of_School, Place_of_Work, etc.) and erased from the audio files. Conversations, or whole segments of them, may have been removed when topics were too sensitive or too personal (especially religious or political opinions).


The participants, eight women and one man, were all early bilingual native speakers of Jamaican English and Jamaican Creole. The latter is the vernacular language spoken at home and in informal situations in Jamaica, and constitutes a strong marker of Jamaican identity. The former is the language acquired at school (and/or at home) and used in more formal situations. All participants in the study were proficient users of standard Jamaican English, acquired throughout their studies leading up to the university level.

All these students learned French as a foreign language in an instructed context. There was little access to the French language outside of the classroom, apart from occasional events organised by the French embassy or the Alliance Française (such as the French film festival). All were University level students enrolled at UWI, Mona, taking French language courses as part of their studies (either as a Major or a Minor). At UWI, the French language programme consisted of 156 hours of language instruction per academic year (78 hours per semester) for the three-year duration of the B.A. The students would concurrently enrol in French culture and literature courses taught in English (with texts in French).

To better evaluate the longitudinal acquisition of French in an instructional setting, all participants retained for the project were students who contributed data during the whole length of the project, did not study French in immersion in a French-speaking country and did not spend an extended period in a French-speaking country. Only one of these students, Loc17, spent a week holidaying in a French-speaking country at the time of the interviews.

The sample of participants is slightly biased towards high-ability, highly motivated students, as they are the ones who continued successfully without interruption for the whole length of the three-year language programme. The earliest data available in TalkBank were collected at the end of the first semester of the second year of the language programme at UWI (coded as 21), that is, after 220 hours of instruction at the B.A. level. The latest data available in TalkBank were collected at the end of the first semester of the third year of the language programme (coded as 31), that is, after 350 hours of instruction at the university level.

Furthermore, the participants had varied backgrounds in language learning before joining the French language programme at the B.A. level. To determine their previous language background, all were asked to complete a socio-biographical questionnaire. Three types of participants can be distinguished:

• Three learners, Loc17, Loc18 and Loc38, completed the secondary certification of the Caribbean Examination Council (C-SEC): normally a total of 150-200 hours of instruction over four years at the secondary level;
• Two learners, Loc14 and Loc20, further completed the advanced certification of the Caribbean Advanced Proficiency Examination (CAPE), or the equivalent GCE A level of Cambridge: normally 300 additional hours of instruction over two years;
• Four learners, Loc08, Loc12, Loc16 and Loc33, completed intensive beginner's programmes in French either at UWI (250 hours in a year), or in private language institutes in Kingston (such as the Alliance Française de Kingston).

Additionally, Loc08 and Loc12 completed the first year of the language programme during an intensive 6-week summer programme at UWI (in July 2003).

Even though they had varied levels of proficiency and different educational backgrounds in French as a foreign language, the students had been placed at the same level in the French language programme at the B.A. level because of University regulations. The nine learners can also be distinguished by their exposure to Spanish. For Loc08, Loc12, Loc16, Loc33, and Loc38, Spanish was the first foreign language learned at school, and for Loc20, a language learned alongside French. Loc17 and Loc18 had received no exposure to Spanish, and Loc14 only one semester of exposure at school. Finally, Loc38 also followed an introductory level course in Japanese at UWI for one semester.

Situational Description

The 25 interviews took place from the first semester of the second year (coded as level 21) in November 2003 to the first semester of the third year (coded as level 31) in October 2004 of the French language program.

All interviews took place in the office of the investigator (room 47B, New Arts Building), on the UWI, Mona Campus. The investigator and the participant were sitting in front of each other with the recorder in between. There was often loud noise in the background (conversation, traffic, thunder, etc.).

Interviews were recorded using a minidisk recorder (Sharp Portable Mini disk recorder MD-MT290H) with a stereo condenser microphone, and subsequently transferred to electronic formats (WMA, WAV, MP3).

The activities completed during the interviews were designed to elicit various morpho-syntactic features. The investigator tried to maintain an informal relaxed atmosphere that contrasted with the more formal interviews that happened during oral exams. Included in the Talkbank repository are:

For the TalkBank repository, the transcription files of the activities that took place at level 31 in October 2004, during a 20-25-minute interview per participant, have been separated into different files according to task: informal discussion, comic strip story retelling, interrogative elicitation task and role play.

Here is the list of activities available for each student:
Date      Activity  L08  L12  L14  L16  L17  L18  L20  L33  L38
Nov 2003  Conv      +    +    +    +    +    +    +    +    +
Apr 2004  Story     +    +    -    +    +    -    +    +    +
Oct 2004  Disc      +    +    +    +    +    +    +    +    +
Oct 2004  Inter     +    +    +    +    +    +    +    +    +
Oct 2004  Role      +    +    +    +    +    +    +    +    -
Oct 2004  Story     +    +    +    +    +    +    +    +    +

Some activities are missing either because of technical difficulties during the recording session, or because the student did not complete the activity. The explanations of tasks by the investigator and transitional conversations in between activities are not included in the TalkBank database. An additional activity intended to elicit negative sentences is not included, as it did not produce the expected result for several students.
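The availability table above can also be read programmatically. The following minimal Python sketch encodes the table and derives the number of interview sessions and the missing activities. It assumes that one interview corresponds to one participant on one date, which is an interpretation rather than something stated explicitly; the function and variable names are illustrative only.

```python
PARTICIPANTS = ["Loc08", "Loc12", "Loc14", "Loc16", "Loc17",
                "Loc18", "Loc20", "Loc33", "Loc38"]

# (date, activity, availability per participant: "+" available, "-" missing)
ROWS = [
    ("Nov 2003", "Conv",  "+++++++++"),
    ("Apr 2004", "Story", "++-++-+++"),
    ("Oct 2004", "Disc",  "+++++++++"),
    ("Oct 2004", "Inter", "+++++++++"),
    ("Oct 2004", "Role",  "++++++++-"),
    ("Oct 2004", "Story", "+++++++++"),
]


def interview_sessions(rows):
    """Assumption: one session per participant and date with at least one activity."""
    sessions = set()
    for date, _activity, marks in rows:
        for who, mark in zip(PARTICIPANTS, marks):
            if mark == "+":
                sessions.add((who, date))
    return sessions


def missing_activities(rows):
    """List (date, activity, participant) for every '-' cell in the table."""
    return [(date, activity, who)
            for date, activity, marks in rows
            for who, mark in zip(PARTICIPANTS, marks)
            if mark == "-"]


sessions = interview_sessions(ROWS)
gaps = missing_activities(ROWS)
print(len(sessions))  # 25
print(gaps)
```

Under this assumption the table yields 25 sessions, which agrees with the 25 interviews reported for the corpus, and three missing activities: the April 2004 story for Loc14 and Loc18, and the October 2004 role play for Loc38.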

The names of the “cha” transcription files (as well as of the corresponding audio files) are coded in the following way: the first two digits indicate the individual code of each participant (08, 12, etc.); the next two digits indicate the year of study (2, 3) followed by the semester of study (1, 2) during which the recording took place; and finally, after an underscore, a label indicates the situation of the interview (see labels above). The file 0831_Role.cha, for instance, contains the transcription of participant Loc08 completing the role playing activity during the first semester of the third year of the B.A. Each file is linked with an .mp3 audio file of the same name.
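The naming scheme described above can be decoded with a short Python sketch such as the following. The regular expression is inferred from the description and from the single example 0831_Role.cha; the names ChaName and parse_name are hypothetical helpers, not part of any TalkBank or CLAN tool.

```python
import re
from typing import NamedTuple


class ChaName(NamedTuple):
    participant: str  # pseudonym, e.g. "Loc08"
    year: int         # year of study: 2 or 3
    semester: int     # semester of study: 1 or 2
    activity: str     # situation label, e.g. "Role"


# Pattern inferred from the description; an assumption, not an official spec.
_NAME = re.compile(r"^(\d{2})([23])([12])_([A-Za-z]+)\.(?:cha|mp3)$")


def parse_name(filename: str) -> ChaName:
    """Split a transcript or audio file name into its coded components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    code, year, semester, activity = match.groups()
    return ChaName(f"Loc{code}", int(year), int(semester), activity)


print(parse_name("0831_Role.cha"))
# ChaName(participant='Loc08', year=3, semester=1, activity='Role')
```

The sketch accepts both the .cha and the .mp3 extension, since each transcript is linked with an audio file of the same name, and raises a ValueError for names that do not follow the scheme.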

Transcription conventions

The transcriptions use standard CHAT conventions (MacWhinney 2000). Please note:


I particularly acknowledge the help of the UWI, Mona students who contributed their time and enthusiasm to the project. I gratefully acknowledge the help and support of my UWI, Mona colleagues: Françoise Cévaër, and Marie-José N'Zengou-Tayo from the Department of Modern Languages, UWI, Mona, as well as foreign language T.A.s. from 2003 to 2005, Gilles Lubeth, Virginie Busetto, and Karen Drapeau, and of Michele Kennedy from the Department of Linguistics & Philosophy, UWI, Mona, who showed me how to link the audio with the transcripts and discussed many issues with me. The research has been supported by a grant obtained in 2010 from the Faculty of Arts and Social Sciences, UNSW, Australia, to hire research assistants to verify the accuracy of the orthographic transcriptions.