Aurora Bel Gaya
Department of Translation and Language Sciences
Pompeu Fabra University
|Type of Study:||longitudinal|
Bel, A. & García-Alcaraz, E. (2013) Subjects in the L2 Spanish of Moroccan Arabic speakers: evidence from bilingual and second language learners. T. Judy & S. Perpiñán (eds.) The Acquisition of Spanish as a Second Language: Data from Understudied Languages Pairings. Amsterdam: John Benjamins.
Bel, A., García-Alcaraz, E. & Rosado, E. (forthcoming) Reference comprehension and production in L2 Spanish: the view from null-subject languages. Issues in Hispanic and Lusophone Linguistics. Amsterdam: John Benjamins.
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
The BCN-L2 Spanish Corpus was collected within a research project supported by two grants to Aurora Bel from the Ministry of Science and Innovation of the Spanish Government (FFI2009-09349 & FFI2012-35058). The project investigates different phenomena at the syntax-pragmatics and syntax-morphology interfaces in the acquisition of new languages (mainly L2 Catalan and L2 Spanish) in educational contexts.
The corpus consists of 228 spoken and written narrative texts gathered following the procedure designed within the international project Developing Literacy in Different Contexts and Different Languages, P.I.: R. Berman (Berman, 2008). Participants were shown a three-minute silent video displaying scenes of interpersonal conflicts at school, and were then asked to tell and write in Spanish a similar story that happened to a friend. The fact that the participants were asked to tell somebody else’s story necessarily implies the production of third-person referents, as opposed to what happens with personal narratives.
Three research assistants (Júlia Perera, Mònica Tarrés and Estela García-Alcaraz, who also supervised the process) collected the data and transcribed the spoken and written texts. Transcription and assessment of language level were coordinated by Dr. Elisa Rosado.
Data were collected during the spring of 2011 and 2012 in different secondary schools in the metropolitan area of Barcelona. Participants are 88 native speakers of Moroccan Arabic (Darija) and 26 speakers of Berber (Amazigh) living in Catalonia. For all participants, Moroccan Arabic or Berber is the family language. In most cases their first contact with Spanish and Catalan (the two environmental languages) coincides with their entry into the Spanish school system (usually at preschool level). In general, they use the family language on a daily basis with family and the environmental languages with friends (for a detailed description of language use patterns and language proficiency, see Bel & García-Alcaraz 2013).
Participants were grouped into four age ranges (as established by the Spanish secondary education system, Enseñanza secundaria obligatoria, ESO). The correspondences between the Spanish and US grade systems are shown in Table 1 below.
Table 1. Age ranges and grades
|Age range||Spanish grade||US equivalent|
|12-13||1º ESO||7th grade|
|13-14||2º ESO||8th grade|
|14-15||3º ESO||9th grade|
|15-16||4º ESO||10th grade|
Participants were also classified into different levels of proficiency in Spanish. We followed the criteria established by the CEFR (Common European Framework of Reference for Languages, 2001), which divides learners into three broad levels, each subdivided into two, for a total of six sublevels:
Table 2. Levels of proficiency in Spanish
|CEFR level||CEFR sublevel||Level of proficiency|
|A Basic User||A1 Breakthrough or beginner||1|
|A Basic User||A2 Waystage or elementary||2|
|B Independent User||B1 Threshold or intermediate||3|
|B Independent User||B2 Vantage or upper intermediate||4|
|C Proficient User||C1 Effective Operational Proficiency or advanced||5|
|C Proficient User||C2 Mastery or proficiency||6|
All participants were assigned a code number to ensure
confidentiality, and this number was used to identify the two files with
the transcription of their oral and written narratives. The filenames
use the following syntax:
Subject number: from 01 to 156
L1: dar stands for Darija (Moroccan Arabic); ber stands for bereber (Berber)
Age ranges: 1E, 2E, 3E, 4E where E stands for ESO
Text modality: o stands for spoken (oral), e stands for written (escrito)
For example, a file that is named ‘10ber1Eo.cha’ is an oral text produced by participant number 10, who is a native speaker of Berber from the 1st grade of ESO.
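The filename syntax above can be decoded mechanically. The following Python sketch is only an illustration and is not part of the corpus tools: the names `FILENAME_RE`, `CorpusFile` and `parse_filename` are hypothetical, and the pattern assumes that the subject number has two or three digits and that every file carries the `.cha` extension.

```python
import re
from typing import NamedTuple

# Assumed pattern: subject number (2-3 digits), L1 code, age range, modality, ".cha".
FILENAME_RE = re.compile(
    r"^(?P<subject>\d{2,3})(?P<l1>dar|ber)(?P<grade>[1-4]E)(?P<modality>[oe])\.cha$"
)

L1_NAMES = {"dar": "Moroccan Arabic (Darija)", "ber": "Berber (Amazigh)"}
MODALITIES = {"o": "spoken", "e": "written"}


class CorpusFile(NamedTuple):
    """Hypothetical container for the information encoded in a filename."""
    subject: int
    l1: str
    grade: str
    modality: str


def parse_filename(name: str) -> CorpusFile:
    """Split a corpus filename into subject number, L1, ESO grade and modality."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"filename does not follow the corpus syntax: {name!r}")
    return CorpusFile(
        subject=int(match["subject"]),
        l1=L1_NAMES[match["l1"]],
        grade=match["grade"],
        modality=MODALITIES[match["modality"]],
    )


print(parse_filename("10ber1Eo.cha"))
```

Applied to the example above, the sketch reports participant 10, Berber, 1E, spoken.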
ID headers are arranged as follows:
@Participants: STU Target_Student, INV Investigator
The participants are introduced in the compulsory Participants header with the codes STU (for Student) and INV (for Investigator), and their corresponding roles. The information in the ID header for the target student is structured as follows: target language (spa=Spanish), project name (periferias_L2), participant code (STU), age, sex (male or female), participant’s L1 (ber=Berber or ary=Moroccan Arabic), subject number code (as explained above), participant’s role, grade in the Spanish school system (1E, 2E, 3E, 4E, as specified in Table 1) and level of proficiency in Spanish according to the CEFR (from 1 to 6, as specified in Table 2).
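As an illustration of the field order just described, the following Python sketch reads one ID header into named fields. It assumes that the fields are separated by vertical bars and that the line ends with a trailing bar, as is usual for CHAT @ID lines. The function name, the field labels and the sample line are hypothetical; the sample values (age, subject code, level) are invented and do not come from an actual corpus file.

```python
from typing import Dict, List

# Field labels follow the order given in the description above (labels are our own).
ID_FIELDS: List[str] = [
    "language", "project", "code", "age", "sex",
    "l1", "subject_code", "role", "grade", "proficiency",
]


def parse_id_header(line: str) -> Dict[str, str]:
    """Map the bar-separated values of an @ID line onto the field labels."""
    if not line.startswith("@ID:"):
        raise ValueError("not an @ID header line")
    values = line[len("@ID:"):].strip().split("|")
    # A trailing "|" leaves one empty item at the end; drop it.
    if values and values[-1] == "":
        values.pop()
    if len(values) != len(ID_FIELDS):
        raise ValueError(f"expected {len(ID_FIELDS)} fields, found {len(values)}")
    return dict(zip(ID_FIELDS, values))


# Hypothetical sample line, invented for illustration only.
sample = "@ID:\tspa|periferias_L2|STU|12;|male|ber|10ber1Eo|Target_Student|1E|3|"
header = parse_id_header(sample)
print(header["l1"], header["grade"], header["proficiency"])
```

Keeping the labels in a single list makes it easy to adjust the sketch if the actual headers order or name the fields differently.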
All the collected texts (spoken and written) are orthographically transcribed following CHAT conventions and segmented into clauses, so that each tier contains a clause (Berman & Slobin 1994). All the transcriptions were checked by a second transcriber to ensure reliability. Other important remarks concerning transcription are listed below: