CHILDES English-German MPI-EVA-Leipzig Corpus


Antje Quick
English Department
Leipzig University

Nikolas Koch
German Department
Ludwig-Maximilians-Universität Munich

Stefan Hartmann
German Department
Heinrich-Heine-Universität Düsseldorf

Nina Julich-Warpakowski
English Department
University of Erfurt

Participants: 3
Type of Study: longitudinal case studies
Location: Germany
Media type: --

Citation information

Quick, Antje Endesfelder, Elena Lieven, Ad Backus & Michael Tomasello. 2018. Constructively combining languages: The use of code-mixing in German-English bilingual child language acquisition. Linguistic Approaches to Bilingualism 8(3). 393–409. https://doi.org/10.1075/lab.17008.qui.

Additional references and publications based on the Fion Corpus:

Quick, Antje Endesfelder, Ad Backus & Elena Lieven. 2021. Entrenchment effects in code-mixing: Individual differences in German-English bilingual children. Cognitive Linguistics 32(2). 1–30. https://doi.org/10.1515/cog-2020-0036.

Koch, Nikolas, Antje Endesfelder Quick & Stefan Hartmann. 2025. Recycling constructional patterns: The role of chunks in early bilingual acquisition. International Journal of Bilingualism. https://doi.org/10.1177/13670069251346103.

Ibbotson, Paul, Stefan Hartmann, Nikolas Koch & Antje Endesfelder Quick. 2024. Frequency, redundancy, and context in bilingual acquisition. Journal of Child Language. Published online, 1–15. https://doi.org/10.1017/S0305000924000473.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

Longitudinal dense corpus of three German-English bilingual children, Fion, Silvie and Lily (pseudonyms). Recordings were made by the parents with no researcher present. At present, only the Fion corpus is available; the Silvie and Lily corpora are in preparation.

FION

The first corpus covers Fion’s development from age 2;3 to 3;11 and comprises 211 hours of recordings. Excluding utterances that contain unintelligible parts (e.g., xxx), it contains 53,372 child utterances and 120,511 input utterances. Counting all utterances, including incomplete ones, the totals are 108,474 utterances for the child and 184,923 for the input.

Fion is the child of a German-speaking mother and an English-speaking father. Both parents are relatively proficient in their respective non-native language. The family therefore did not settle on a single family language; instead, each parent speaks his or her own language during family conversations. Fion also has an older sibling, and occasional code-mixed utterances (<1%) occurred within the family. During his early months, German dominated Fion’s input, as his mother was the primary caregiver and the family resides in Germany. At 19 months, he began attending a German-speaking daycare for about four hours daily, and from age two, he attended a German-speaking kindergarten for six to eight hours per day. Thus, until age three, Fion’s input was predominantly German. His exposure shifted after his third birthday, when the family spent an extended period in his father’s home country and received frequent visits from his monolingual English-speaking grandparents. This led to a marked increase in English input and a corresponding change in Fion’s language production: while early recordings show predominantly German speech, the later data feature a substantial proportion of English utterances. Fion was recorded in his home environment during natural activities such as play and mealtime.

SILVIE (the corpus is currently being prepared)

LILY (the corpus is currently being prepared)

Transcription

The data were initially transcribed at the time of collection in the 2000s by a group of researchers and student assistants in Germany. The transcribers were monolingual native speakers of German with high proficiency in English (C1 level). As part of the present project, the transcripts were revised and edited in line with current CHAT guidelines so that they could be uploaded to TalkBank in 2023–2026. This was carried out by a group of researchers and student assistants based in Germany (Nina Julich-Warpakowski, Verena Dederer, Annika Klotz, Asude Kölün, Luca Müller, Maximilian Adolphi, Philippe Sander).

Each transcript usually corresponds to one coherent recording session. A few transcripts contain several sessions recorded on the same day. In these cases, we marked the beginning of each new session within the file with the changeable header @New Episode.
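For readers who want to process such multi-session files, the following is a minimal sketch of how a transcript could be split at the session header. It assumes the header starts its own line and is matched case-insensitively; the function name split_sessions and the miniature transcript are hypothetical and not part of the corpus or of any TalkBank tool.

```python
def split_sessions(lines):
    """Group the lines of one transcript into sessions.

    A new group starts at every line that begins with "@New Episode"
    (compared case-insensitively); the header line itself is dropped.
    """
    sessions = [[]]
    for line in lines:
        if line.strip().lower().startswith("@new episode"):
            sessions.append([])
        else:
            sessions[-1].append(line)
    # Drop empty groups, e.g. when the header is the very first line.
    return [session for session in sessions if session]


# Hypothetical miniature transcript, invented for illustration only.
transcript = [
    "*CHI:\thello .",
    "@New Episode",
    "*MOT:\t[- deu] hallo .",
    "*CHI:\t[- deu] ja .",
]
print([len(session) for session in split_sessions(transcript)])
```

In a real file, the file-level headers at the top would end up in the first group, so they may need to be handled separately.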

Because the corpus is bilingual, any deviation from a transcript's default language (English in most transcripts) is marked in the transcription. If an entire utterance is in the other language, the marker is placed at the beginning of the utterance, e.g.

*FAT: you don't like vikings do you ?

*CHI: [- deu] nee [: nein] .
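As an illustration of how this marking can be read programmatically, here is a minimal sketch that assigns a language to each main-tier line. It assumes the marker has exactly the shape shown in the example above ("[- " plus a three-letter language code plus "]") and that unmarked utterances are in the default language; the function name utterance_language and the default value are hypothetical.

```python
import re

# Assumption: the utterance-level language marker directly follows the
# speaker code, as in the examples above.
PRECODE = re.compile(
    r"^\*(?P<speaker>[A-Z0-9]+):\s+(?:\[- (?P<lang>[a-z]{3})\]\s+)?(?P<text>.*)$"
)


def utterance_language(line, default="eng"):
    """Return (speaker, language, text) for one main-tier line."""
    match = PRECODE.match(line)
    if match is None:
        raise ValueError(f"not a main-tier line: {line!r}")
    # No marker means the utterance is in the transcript's default language.
    return match["speaker"], match["lang"] or default, match["text"]


examples = [
    "*FAT:\tyou don't like vikings do you ?",
    "*CHI:\t[- deu] nee [: nein] .",
]
for line in examples:
    print(utterance_language(line))
```

The default language should be set per transcript, since not every transcript uses English as its default.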

A considerable number of words are similar in English and German (e.g., here / hier). We transcribed these according to context: if the utterance was in English, we used “here”; if it was in German, we used “hier”. If we were not sure, we used a special form marker to indicate that the word may belong to either language: here@s:eng&deu. Where the child (or another family member) combined English and German within a single word, we used the @s:eng+deu or @s:deu+eng marker, as in Polizeisheep@s:eng+deu or tickeln@s:eng+deu.
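The two word-level markers can be told apart by the character between the language codes. The sketch below shows one way to do this; it handles only the marker shapes named above, and the function name classify_token and the labels "ambiguous" and "blend" are hypothetical choices made for this illustration.

```python
import re

# Assumption: only the marker shapes described above are handled:
#   word@s:eng&deu                  (word may belong to either language)
#   word@s:eng+deu, word@s:deu+eng  (both languages combined within one word)
MARKER = re.compile(
    r"^(?P<word>[^@\s]+)@s:(?P<a>[a-z]{3})(?P<op>[&+])(?P<b>[a-z]{3})$"
)


def classify_token(token):
    """Return (word, kind, languages), or None if the token has no such marker."""
    match = MARKER.match(token)
    if match is None:
        return None
    kind = "ambiguous" if match["op"] == "&" else "blend"
    return match["word"], kind, (match["a"], match["b"])


for token in ["here@s:eng&deu", "Polizeisheep@s:eng+deu", "tickeln@s:eng+deu", "hier"]:
    print(token, "->", classify_token(token))
```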

We kept error marking to a minimum. We only marked cases where we were sure that an agreement error had occurred, e.g.

*CHI: [- deu] ein Rittern [*]

Here, ein indicates singular, but Rittern has a plural inflectional ending.

All names of family members, including nicknames, were replaced by pseudonyms. The same pseudonym replaces both the full form and the nickname forms of a given person's name. Sometimes there was wordplay on the child’s original name. We replaced such instances with the respective pseudonym as well, which means that this wordplay is lost in the transcripts. Although such cases are linguistically interesting because they reflect the child's emerging linguistic awareness, we decided that maintaining the anonymity of the child and his or her family was more important. We also replaced all names of friends, visitors, and members of the wider family. In these cases, we did not use fixed pseudonyms but the generic terms “Personname” or “Lastname”. Whenever the family mentioned real place names, we replaced them with generic terms such as Cityname, Addressname, or Streetname. We did not do this when the family spoke more generally about where a certain place was, or when they were talking about sports teams.

The data also include situations in which caregivers read to the child. We marked longer stretches of reading with @Bg: book (beginning of reading passage) and @Eg: book (end of reading passage).
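Because read-aloud speech may need to be analysed separately from spontaneous speech, a minimal sketch for extracting the marked reading passages is given below. It assumes the markers appear in the files as the CHAT headers @Bg: and @Eg: followed by the label book, each on its own line; the function name reading_passages and the miniature transcript are hypothetical.

```python
def reading_passages(lines, label="book"):
    """Collect the lines between '@Bg: <label>' and the next '@Eg: <label>'."""
    passages, current = [], None
    for line in lines:
        parts = line.split(":", 1)
        header = parts[0].strip()
        value = parts[1].strip() if len(parts) > 1 else ""
        if header == "@Bg" and value == label:
            current = []  # start a new passage
        elif header == "@Eg" and value == label:
            if current is not None:
                passages.append(current)
            current = None  # back to spontaneous speech
        elif current is not None:
            current.append(line)
    # A passage that is opened but never closed is not returned.
    return passages


# Hypothetical miniature transcript, invented for illustration only.
transcript = [
    "*CHI:\tread .",
    "@Bg:\tbook",
    "*FAT:\tonce upon a time .",
    "@Eg:\tbook",
    "*CHI:\tmore .",
]
print(reading_passages(transcript))
```

The same loop can be inverted to keep only the lines outside reading passages.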

Acknowledgements

The corpus was prepared for publication as part of the project “Constructional Patterns in bilingual children’s code-mixed utterances. A usage-based corpus study” funded by the German Research Foundation (DFG), project number 504095269.

Thank you, Michael Tomasello and Elena Lieven, for initiating the bilingual corpora. We would also like to thank Brian MacWhinney for his support in the data preparation process, as well as Fion, Silvie, Lily and their families.