CHILDES German Szagun Corpus

Gisela Szagun
Institut für Psychologie
University of Oldenburg
gisela.szagun@gmail.com

Participants:	22
Type of Study:	naturalistic
Location:	Germany
Media type:	audio
DOI:	doi:10.21415/T5KG7T

Citation information

Szagun, G. (2001). Learning different regularities: The acquisition of noun plurals by German-speaking children. First Language, 21, 109-141.

Szagun, G. & Stumper, B. (2012). Age or experience? The influence of age at implantation, social and linguistic environment on language development in children with cochlear implants. Journal of Speech, Language, and Hearing Research, 55, 1640-1654.

Szagun, G. & Schramm, S. A. (2016). Sources of variability in language development of children with cochlear implants: age at implantation, parental language, and early features of children’s language construction. Journal of Child Language, 43, 505-536.

Szagun, G., Stumper, B., Sondag, N. & Franik, M. (2007). The acquisition of gender marking by young German-speaking children: Evidence for learning guided by phonological regularities. Journal of Child Language, 34, 445-471.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

This data set comprises two large corpora of German child language: (1) a corpus of 22 typically developing (TD) children with 212 data files, each containing 2 hours of spontaneous child and adult speech, for a total of 424 hours; and (2) a corpus of 22 deaf children with cochlear implants (CI) with 210 data files, each containing 1½ hours, for a total of 315 hours. Both corpora provide longitudinal data on early language development in these two groups. Besides child speech, they offer a comprehensive sample of child-directed adult speech.
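As a quick consistency check, the totals quoted above can be reproduced from the file counts and session lengths. The following is only a minimal sketch in Python; the dictionary layout and the function name total_hours are illustrative assumptions and are not part of the corpus or of any TalkBank tool.

```python
# File counts and session lengths as stated in the project description.
# The structure and names below are illustrative, not part of the corpus.
corpora = {
    "TD": {"files": 212, "hours_per_file": 2.0},
    "CI": {"files": 210, "hours_per_file": 1.5},
}


def total_hours(files, hours_per_file):
    """Return the total recording time in hours for one corpus."""
    return files * hours_per_file


totals = {name: total_hours(**spec) for name, spec in corpora.items()}

for name, hours in totals.items():
    print(f"{name}: {hours:g} hours")       # TD: 424 hours / CI: 315 hours
print(f"Combined: {sum(totals.values()):g} hours")  # Combined: 739 hours
```

The combined figure of 739 hours is simply the sum of the two stated totals; it is not reported separately in the description.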

Naturalistic Setting

For the TD children, recordings took place during free play sessions in a large playroom at the Department of Psychology, Carl-von-Ossietzky University of Oldenburg. For the CI children, they took place in a smaller playroom at the Cochlear Implant Centrum (CIC) Wilhelm Hirte, Hannover. Both locations offered varied sets of toys, e.g. cars with a garage and a car park, zoo animals, farm animals, forest animals, a school with children and teachers, a doll’s house, picture books, puzzles, a medical kit, a fire station, a shop and other sets. A parent or investigator played with the child.

Typically developing children (TD)

All children were recorded between the ages of 1;4 and 2;10, and a subgroup was recorded from 1;4 to 3;8. All child and adult speech has been transcribed, a total of 424 hours. The TD children also served as a control group for the CI children.

Children with cochlear implants (CI)

The 22 children with CI were deaf before the onset of language. All of them were implanted before 4 years of age. They were matched with the 22 TD children for initial language level, using number of words and MLU. Data points for the CI children are given as hearing ages. All children were recorded between hearing ages 0;5 and 1;11. After this period, data collection continued for all 22 children, but for varying lengths of time, between hearing ages 2;4 and 3;6. Altogether, there are 210 data files, a total of 315 hours.
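Ages and hearing ages in this description use the CHILDES years;months notation (e.g. 1;4 is 1 year and 4 months). For users who want to compare the recording windows of the two groups, the following Python sketch converts such values to months. The function name age_to_months is a hypothetical helper written for illustration; it is not part of CLAN or any TalkBank library, and it deliberately ignores an optional day component.

```python
def age_to_months(age: str) -> int:
    """Convert a 'years;months' age string (e.g. '1;4') to whole months.

    Hypothetical helper for illustration. An optional '.days' suffix,
    as in '2;06.15', is dropped rather than rounded.
    """
    years, _, rest = age.partition(";")
    months = rest.split(".")[0] if rest else "0"
    return int(years) * 12 + int(months or 0)


# Recording windows taken from the text above.
td_start, td_end = age_to_months("1;4"), age_to_months("2;10")
ci_start, ci_end = age_to_months("0;5"), age_to_months("1;11")

print(td_end - td_start)  # 18 months (chronological age, TD)
print(ci_end - ci_start)  # 18 months (hearing age, CI)
```

Under this reading, the period during which all children were recorded spans 18 months in both groups, measured in chronological age for the TD children and in hearing age for the CI children.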

Complete transcriptions

Due to the (unexpected) wealth of data, it took several years to produce a complete transcription of all child and adult speech. When these corpora were first added to TalkBank, all child speech but only part of the adult speech had been transcribed. A previous update in 2023 presented complete transcriptions for the TD children. With this final update in 2026, complete transcriptions are presented for both groups, TD and CI. Thus, all speech has been transcribed, with the exception of a few data points in the CI corpus. These exceptions are marked at the beginning of the respective transcripts.

Audio files are available for the majority of transcripts and have been linked.

Technical problems and missing recordings

Some of the early audio files are not available. This is mainly because digital recording equipment was not available to us at the start of data collection for the research project. In addition, technical problems occurred and led to the loss of some audio recordings.

Acknowledgements

The research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), grants Sz 41/5-1 and Sz 41/5-2. The University of Oldenburg invested considerably in making the building child-safe and suitable for the recordings.