CHILDES Japanese-English CSLAD Corpus


Yuki Hirose
Language and Information Sciences
University of Tokyo

Theres Grüter
Second Language Studies
University of Hawaii

Participants: 1
Type of Study: longitudinal, naturalistic
Location: Hawaii and Japan
Media type: video
DOI: doi:10.21415/N146-P733


Citation information

The UH-UT CSLAD corpus is available either through the current CHILDES page or through OSF (https://doi.org/10.17605/OSF.IO/3RZ47). The source data (video files and annotated texts) are the same on both platforms, but the CHAT transcripts differ. If you use the annotated data, please specify which version you are using and cite it accordingly. If you use only the video data, you may cite either platform.

The CHAT transcripts in the CHILDES database include %mor and %gra lines generated automatically by Batchalign using a Universal Dependencies (https://universaldependencies.org/) analysis. Batchalign also automatically generates utterance-level bullets and the corresponding %wor lines, which define the segments that can be clicked for media playback. None of these automatically generated tiers have been hand-corrected.

When using the version in CHILDES, please cite: Hirose, Y., & Grüter, T. (2025). CHILDES Japanese-English CSLAD Corpus. https://doi.org/10.21415/N146-P733

The CHAT transcripts on OSF include %mor lines originally generated by the MOR function of the CLAN program, which were then hand-corrected for major part-of-speech errors. The bullets were created manually and are not necessarily aligned by utterance. Further details of the annotation and correction methods are available at https://osf.io/3rz47/wiki/home/.

When using the version on OSF, please cite: Hirose, Y., & Grüter, T. (2024). UH-UT Child Second Language Acquisition Database: A collection of Longitudinal case studies (UH-UT CSLAD). https://www.doi.org/10.17605/OSF.IO/3RZ47

Project Description

The UH-UT Child Second Language Acquisition Database (UH-UT CSLAD) was created by the Yuki Hirose Lab (University of Tokyo) in collaboration with Theres Grüter (Department of Second Language Studies, University of Hawaii at Manoa). The aim of this database is to document one child's second language development longitudinally. The corpus contains spontaneous speech by a Japanese-speaking boy who temporarily relocated from Japan to the U.S. at age 7;10 (7 years 10 months) with minimal prior exposure to English.

Please refer to this link for details: https://osf.io/3rz47/wiki/home

Terms of use

By using the data offered in UH-UT CSLAD, you agree to comply with the following Terms of Use. The UH-UT CSLAD Building Team reserves the right to modify these Terms of Use. Such modifications will become effective immediately.

Precautions and Disclaimers

The transcriptions are based on the recordings, but their accuracy may be affected by multiple factors, such as overlapping speech, background noise, or unclear utterances by a child who is still developing his second language. If in doubt about the accuracy of a transcription, please refer to the videos.

Regarding the transcriptions and CHAT files themselves, all team members followed the same instruction manual; however, annotation practices may vary slightly across contributors, resulting in some inconsistency between files. For example, when an utterance contains a grammatical error, some files include annotations that correct it to its grammatical form, but how often such annotations are included differs by contributor.

The UH-UT CSLAD Development Team shall not be liable for any inaccuracy, incompleteness, inadequacy, or unfairness of the data and information presented in the UH-UT CSLAD, including the issues described above.

The data and information in the UH-UT CSLAD may be changed or modified without prior notice, and the UH-UT CSLAD site may be discontinued or closed without prior notice.