BELC (Barcelona English Language Corpus)

Carme Muñoz
English Linguistics and Applied Linguistics
University of Barcelona


Participants: 55
Type of Study: interview / storytelling
Location: Spain
Media type: audio
DOI: doi:10.21415/T5S89C

Project Description

The Barcelona English Language Corpus (BELC) originates in the Barcelona Age Factor (BAF) project, which examines the effects of age on the acquisition of English as a foreign language.

The BAF Project began while a new Education Law was being progressively implemented in primary and secondary schools across Spain. The law's changes to the timing of foreign language instruction moved the introduction of the foreign language in primary education forward, from grade 6 (age 11) to grade 3 (age 8). The transition from the previous curriculum to the new one took eight years, during which schools had both pupils who had begun English instruction at age 11, under the previous curriculum, and pupils who had begun at age 8, under the new curriculum. In addition to these two central groups, the study design included two further age groups: adolescents who began learning English at 14 and adults who began instruction at 18 or older.

The research on age effects on the learning of English as a foreign language was conducted with students from state schools in Catalonia (Spain). Catalonia is a bilingual community with a majority language, Spanish, known by practically the entire population, and a minority language, Catalan, which is the community language and the language of instruction in Catalan state schools. English is the first foreign language in most schools and is therefore the pupils' third language. Note also that the earlier introduction of the foreign language came with a decrease in intensity. Whereas English had been taught for three hours per week under the former curriculum (beginning in grade 6), at the time of data collection under the new curriculum it was taught for an average of two and a half hours per week from grade 3 to grade 10, and for two hours per week in grades 11 and 12. The total amount of English instruction was approximately 750 hours, distributed over seven years, under the former curriculum, and approximately 800 hours, distributed over ten years, under the new one.
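The drop in intensity can be made concrete by dividing each approximate total by the number of years it is spread over. The Python sketch below does this. It assumes that instruction runs through grade 12 under both curricula, which is consistent with the seven- and ten-year spans given above; the per-year averages it prints are derived figures, not values reported by the project.

```python
from dataclasses import dataclass

# Assumption: English instruction continues through grade 12 under both
# curricula (grades 11 and 12 are mentioned for the new curriculum).
LAST_GRADE = 12


@dataclass(frozen=True)
class Curriculum:
    name: str
    start_grade: int
    total_hours: float  # approximate total stated in the text

    @property
    def years(self) -> int:
        # Inclusive count of school years from the starting grade to grade 12.
        return LAST_GRADE - self.start_grade + 1

    def hours_per_year(self) -> float:
        # Average instructional hours per school year (a derived figure).
        return self.total_hours / self.years


former = Curriculum("former", start_grade=6, total_hours=750)
new = Curriculum("new", start_grade=3, total_hours=800)

if __name__ == "__main__":
    for c in (former, new):
        print(f"{c.name}: grades {c.start_grade}-{LAST_GRADE}, "
              f"{c.years} years, ~{c.hours_per_year():.1f} h/year")
```

Running it prints roughly 107.1 hours per year for the former curriculum and 80.0 for the new one: more total hours overall, but spread more thinly.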

Introduction to the Data

Data were collected at four points: after 200, 416, 726 and 826 hours of instruction (Times 1, 2, 3 and 4, respectively), although only one group (Group A) was available at all four times (see Table 1 below). There were 2063 subjects in total across groups and collection times, but it should be noted that a number of them had received more hours of instruction than the figure for their collection time, either through extracurricular exposure or because they had repeated a school year. Pupils with only school exposure (OSE) met the conditions for comparison. Table 1 below gives the number of subjects in each group, the age at which they began instruction in English and each group's mean chronological age at testing.

Table 1. Characteristics of subjects in the study
Time    Hours  Group  AO   AT    N    OSE
Time 1  200    A1     8    10;9  284  164
Time 1  200    B1     11   12;9  286  107
Time 1  200    C1     14   15;9  40   21
Time 1  200    D1     18+  28;9  91   67
Time 2  416    A2     8    12;9  278  140
Time 2  416    B2     11   14;9  240  96
Time 2  416    C2     14   19;1  11   4
Time 2  416    D2     18+  39;4  44   21
Time 3  726    A3     8    16;9  338  71
Time 3  726    B3     11   17;9  296  51
Time 4  826    A4     8    17;9  155  71

(AO = age of onset; AT = mean age at testing, in years;months; N = number of subjects; OSE = only school exposure)
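As a quick consistency check, the cells of Table 1 can be tallied. In the Python sketch below (the tuple layout is an illustrative choice, not a corpus format), the N values sum to 2063, the total given in the text, which suggests that figure counts each pupil once per collection time. The tally also confirms that Group A is the only group present at all four times.

```python
from collections import defaultdict

# Table 1 cells as (label, hours of instruction, N, OSE).
# Label = group letter + collection time, e.g. "B2" = Group B at Time 2.
TABLE_1 = [
    ("A1", 200, 284, 164), ("B1", 200, 286, 107), ("C1", 200, 40, 21), ("D1", 200, 91, 67),
    ("A2", 416, 278, 140), ("B2", 416, 240, 96), ("C2", 416, 11, 4), ("D2", 416, 44, 21),
    ("A3", 726, 338, 71), ("B3", 726, 296, 51),
    ("A4", 826, 155, 71),
]

n_by_hours = defaultdict(int)     # total N at each collection point
times_by_group = defaultdict(set)  # collection times at which each group appears
for label, hours, n, ose in TABLE_1:
    assert ose <= n  # OSE pupils are a subset of each cell
    n_by_hours[hours] += n
    times_by_group[label[0]].add(int(label[1]))

total_n = sum(n for _, _, n, _ in TABLE_1)
total_ose = sum(ose for _, _, _, ose in TABLE_1)
full_groups = sorted(g for g, times in times_by_group.items() if times == {1, 2, 3, 4})

if __name__ == "__main__":
    print(f"Total N across all cells: {total_n}")
    print(f"Total OSE: {total_ose} ({total_ose / total_n:.1%})")
    for hours in sorted(n_by_hours):
        print(f"{hours} h: N = {n_by_hours[hours]}")
    print("Groups tested at all four times:", full_groups)
```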

The data included in BELC correspond to those subjects who could be followed longitudinally, for whom there are two, three or four collection times over a period of seven years, although not all subjects completed all the tasks (see Table 2).


The files in the TalkBank database cover the four collection times and four tasks, and are grouped into folders by task. Each file name gives first the time (1, 2, 3, 4), then the group (A, B, C), then the task (c, i, n, r), and finally the subject number (L06, etc.).
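A minimal Python sketch of how this naming scheme could be parsed, for example to list which collection times are available for each longitudinal subject. It assumes that the four parts are concatenated with no separator, that the task letters stand for composition, interview, narrative and role-play, and that a subject is identified by group plus subject number. The function names, the example file names and the `.cha` extension are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed expansions of the task letters, based on the four task names.
TASKS = {"c": "written composition", "i": "oral interview",
         "n": "oral narrative", "r": "role-play"}

# time (1-4) + group (A-C) + task (c/i/n/r) + subject number (e.g. "L06").
# Assumed subject-number shape: optional letters followed by digits.
NAME_RE = re.compile(
    r"(?P<time>[1-4])(?P<group>[ABC])(?P<task>[cinr])(?P<subject>[A-Za-z]*\d+)")


def parse_belc_name(filename: str) -> dict:
    """Split a (hypothetical) BELC-style file name into its four parts."""
    stem = Path(filename).stem  # drop any extension, e.g. ".cha"
    m = NAME_RE.fullmatch(stem)
    if m is None:
        raise ValueError(f"not a BELC-style file name: {filename!r}")
    return {"time": int(m["time"]), "group": m["group"],
            "task": TASKS[m["task"]], "subject": m["subject"]}


def times_per_subject(filenames) -> dict:
    """Map (group, subject) to the set of collection times found in the files."""
    seen = defaultdict(set)
    for name in filenames:
        info = parse_belc_name(name)
        seen[(info["group"], info["subject"])].add(info["time"])
    return dict(seen)


if __name__ == "__main__":
    files = ["1AnL06.cha", "2AnL06.cha", "3AiL06.cha", "1BcL12.cha"]  # hypothetical
    for key, times in sorted(times_per_subject(files).items()):
        print(key, sorted(times))
```

Keying on (group, subject) rather than subject number alone is a cautious choice, since the text does not say whether subject numbers are unique across groups.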

Written composition. The written composition dealt with a familiar topic: “Me: my past, present and future”. All students were given the same fixed time (15 minutes). Younger and less proficient learners did not use all of the time available because of their language limitations.

Oral narrative. The narrative was elicited with a series of six pictures, which the subjects could look at freely before and while telling the story in the presence of the researcher. The story has two main protagonists, a boy and a girl, who are getting ready for a picnic; a secondary character, their mother; and a character that disappears and later reappears, a dog that gets into the food basket and eats the children's sandwiches.

Oral interview. The semi-guided interview began with a series of questions about the subject’s family, daily life and hobbies. This warm-up phase helped students feel more at ease. In general, interviewers tried to elicit as many responses as possible from the learners and accepted learner-initiated topics, in order to make the situation as natural and interactive as possible.

Role-play. The role-play task was performed in randomly chosen pairs. One student was given the role of the mother/father and the other the role of the son/daughter. The son/daughter had to ask permission to have a party at home, and both students were asked to negotiate the setting, time, activities (music, eating, drinking), etc. The researcher gave the initial instructions and, when needed, elicited talk by reminding learners of topics for discussion, or brought the task to completion by asking about the outcome of the negotiation.

Table 2. Spoken tasks performed by BELC longitudinal learners

The main results of the BAF Project so far can be found in the volume Age and the Rate of Foreign Language Learning (see below).

2014 Update: Description of the Subjects

The subjects (N=21) are a subsample of a larger ongoing project (N=232 participants; L1 Spanish, or L1s Spanish and Catalan) that explores the influence of independent variables such as starting age, cumulative L2 input and frequency of current contact with the L2, as well as the influence of cognitive abilities (working memory, attention-switching capacity and language aptitude), on L2 proficiency and on L2 oral and written performance.

The subsample presented here (N=21; 6 male, 15 female) consisted of undergraduate students, many of them majoring in English, with an intermediate to advanced level of English.

Their mean age at first testing was 23.6 years (SD 8.3; range 18-52).

All of them had at least 6 years of English learning experience: the mean length was 14.2 years (SD 8.2; range 6-38).

The mean starting age, defined as the beginning of exposure to English as a foreign language (in preschool, primary school or secondary school), was 9.84 (SD 3.33; range 4-15).

Most of the participants were multilingual and had been learning an L3 for at least 1 year (mean 2.6, SD 1.2, range 1-5).

2014 Update: Descriptions of the Data

The data presented here consist of transcriptions of the EFL oral production task, with the matching sound files, and the EFL written compositions.
N=6 participants performed the oral production task and the written composition twice, one year apart (Time 1 and Time 2).
N=15 participants performed the oral production task and the written composition twice, two years apart (Time 1 and Time 3).

L2 oral production task:
L2 oral production was elicited with a video-retelling task, using as prompt the “Alone and Hungry” episode (7 minutes long) from the Charlie Chaplin movie. The subjects first watched the whole episode once, then watched the first part of the episode (approximately 3.5 minutes) and were asked to retell it. They then watched the second part of the episode and retold it. The transcriptions correspond to the retelling of the first part only.

L2 written composition:
The written composition dealt with a familiar topic: “My past, present and future expectations”. Students were given 15 minutes to complete the task.
The data are organized into 3 main files, which contain 2 sub-files each. The file name gives the type of the data:

