FFLOC Corpus

Florence Myles
Department of Language and Linguistics
University of Essex


Participants: 60
Type of Study: tasks
Location: UK
Media type: audio
DOI: doi:10.21415/T5NS31

Browsable transcripts

Download transcripts

Media folder

Citation information

Publications using these data should cite:
Myles 2002: Full Report of Research Activities and Results. Linguistic Development in Classroom Learners of French.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

LINGDEV, or Linguistic Development in Classroom Learners of French: a Cross-sectional Study. This directory contains sound files and corresponding transcripts from an ESRC-funded one-year project which ran from October 2001 to September 2002 (ESRC grant R000234754). One of its aims was to provide a database of learner language for years 9, 10 and 11 of secondary education in the UK context. The Project Director was Florence Myles and the other team members were Emma Marsden, Rosamond Mitchell and Sarah Rule.

Three groups of twenty learners, one from each of Years 9, 10 and 11 (i.e. in their 3rd, 4th and 5th year respectively of learning French in the UK educational context; aged 13-14, 14-15 and 15-16 respectively), were tested in a local secondary school. In the LingDev files, children's ages are given as "13;" for Year 9, "14;" for Year 10, and "15;" for Year 11. The Progression data are from Years 7, 8 and 9, and ages are given as 12, 13 and 14.

The study used a gender-balanced sample from the three year groups, covering the full ability range as judged by the teachers and the pupils' school grades. The sample is, however, slightly biased towards the top-ability pupils, as they are more likely to show signs of further development. The participants were numbered 1 to 20 within each year group. Because this was a short-term cross-sectional study, if a cohort pupil was absent, a replacement pupil carried out the task; replacement pupils were given random numbers between 60 and 90. This ensured that the number of pupils in each year who carried out a particular task was always 20. In selecting and involving informants in the research, the project followed the Recommendations on Good Practice in Applied Linguistics of the British Association of Applied Linguistics (1994) on the responsibility of researchers in respecting the privacy of participants, ensuring confidentiality of personal details and maintaining openness about the goals of the research.


Four oral tasks were administered to all 60 subjects, on a one-to-one basis with a researcher. The tasks were the same for all years, so that results could be compared across year groups. Some of the tasks were also the same as those used in the 'Progression Project', so that comparisons could be drawn with that project. The tasks were as follows:

All tasks were recorded digitally and took around 15 minutes each, making a total of around one hour of spoken language per pupil.

Additional Conventions

In this section, we describe some of the general decisions we took in transcribing French interlanguage oral data, as well as some of the adaptations we made to the CHILDES system for L2 data. As will become obvious, many of the decisions were dictated by our research agenda in both the Linguistic Development and the Progression projects, and by our choice to use the automatic morphosyntactic parser. This means that the transcription sometimes deviates from the actual phonological shape of the words produced by learners. We felt this is not too much of a problem, as other researchers interested in e.g. phonology can listen to the sound files as they read the transcripts and add their own level of coding.

The data have been transcribed orthographically. This is necessary in order to use the French morphosyntactic parser on the completed transcripts, as it will not recognise non-words. There is no extensive coding of errors, and overlaps are not marked, since they can be heard in the sound files. Learner speech has been carefully segmented into distinct utterances, but this has not been done for the researcher.

If a participant exactly repeats the researcher (or another participant, in the case of pair tasks), the repetition is coded as follows:
*32N: [- eng] how do you say he goes?
*ADR: il va
*32N: il@g va@g au cinema
@g is added after every repeated word, and has been added to the special form marker file sf.cut in the French MOR program. It ensures that the imitation is not included in the analysis by the French morphosyntactic parser, as this could give misleading information about the current grammar of the learner.
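
For researchers working with the transcripts outside CLAN, the effect of the @g marker can be approximated with a few lines of Python. This is only a minimal sketch: the function name strip_imitations is hypothetical, it is not part of CLAN or MOR, and it assumes a single main-tier line in which words are separated by spaces.

```python
def strip_imitations(utterance: str) -> list[str]:
    """Return the words of one main-tier line, dropping @g-marked imitations.

    Hypothetical helper for illustration; not part of CLAN or MOR.
    """
    # Remove the speaker prefix (e.g. "*32N:") if the line has one.
    if utterance.startswith("*") and ":" in utterance:
        utterance = utterance.split(":", 1)[1]
    # Keep only the words that do not carry the @g imitation marker.
    return [word for word in utterance.split() if not word.endswith("@g")]


print(strip_imitations("*32N: il@g va@g au cinema"))  # ['au', 'cinema']
```

Applied to the example above, only "au cinema" is kept as the learner's own production.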

In order for the French MOR program to ignore English, we coded whole English utterances as follows:
*SAR: [- eng] yes you begin by asking questions
*43P: [- eng] how do you say dog?
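
The same [- eng] pre-code can be used to separate English utterances from French ones when processing the transcripts with a script. The sketch below is illustrative only: the names ENG_PRECODE and is_english are hypothetical, and it assumes that each utterance occupies a single main-tier line beginning with "*" and a speaker code.

```python
ENG_PRECODE = "[- eng]"  # pre-code marking a whole utterance as English


def is_english(line: str) -> bool:
    """Return True if a main-tier line is pre-coded as an English utterance.

    Hypothetical helper for illustration; not part of CLAN or MOR.
    """
    # Only main-tier lines ("*SPK: ...") carry utterances.
    if not line.startswith("*") or ":" not in line:
        return False
    content = line.split(":", 1)[1].strip()
    return content.startswith(ENG_PRECODE)


lines = [
    "*SAR: [- eng] yes you begin by asking questions",
    "*43P: [- eng] how do you say dog?",
    "*ADR: il va",
]
french_lines = [line for line in lines if not is_english(line)]
print(french_lines)  # ['*ADR: il va']
```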

Indeterminate Forms

In beginner datasets, it is often difficult to determine which form a learner intended, as learners often produce something very approximate. Four types of indeterminate form occur consistently in our data, and we coded them as follows:

The files are labelled in the following way:
Sound files: 01L9SAR.wav
Transcriptions: 01L9SAR.cha
(01 is the number of the student, L is the task code, 9 is the student's year, and SAR is the abbreviation for the researcher.)
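
The labelling scheme can be unpacked programmatically, for instance to select all files for one year group or task. The following Python sketch rests on assumptions that go beyond the single example given here: a two-digit student number, a one-letter task code, a one- or two-digit year (so that Years 10 and 11 would fit), and a three-letter researcher code. The names FILENAME_RE, CorpusFile and parse_filename are hypothetical. The is_replacement flag follows the numbering described in the Project Description, where replacement pupils were given numbers between 60 and 90.

```python
import re
from dataclasses import dataclass

# Assumed pattern: 2-digit student number, 1-letter task code,
# 1-2 digit school year, 3-letter researcher code, then .wav or .cha.
FILENAME_RE = re.compile(r"^(\d{2})([A-Z])(\d{1,2})([A-Z]{3})\.(wav|cha)$")


@dataclass
class CorpusFile:
    student: int          # student number within the year group
    task: str             # task code
    year: int             # school year of the student
    researcher: str       # abbreviation for the researcher
    kind: str             # "sound" for .wav, "transcript" for .cha
    is_replacement: bool  # True for replacement pupils (numbers 60 to 90)


def parse_filename(name: str) -> CorpusFile:
    """Split a file name such as 01L9SAR.cha into its labelled parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    student = int(match.group(1))
    return CorpusFile(
        student=student,
        task=match.group(2),
        year=int(match.group(3)),
        researcher=match.group(4),
        kind="sound" if match.group(5) == "wav" else "transcript",
        is_replacement=60 <= student <= 90,
    )


print(parse_filename("01L9SAR.cha"))
```

File names that do not match the assumed pattern raise a ValueError rather than being silently misread, so any deviation from these assumptions shows up immediately.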