This page provides guidelines for researchers and parents interested in creating a
new child language corpus. In many cases, this will be a longitudinal case study of a single
child at home. However, the same guidelines can be applied, with minimal changes, to
a corpus collected from several children.
Recordings can use audio, video, or a mix of the two. Suggestions for
audio recording equipment and methods can be found here, and suggestions for video recording
equipment and methods can be found here.
With modern audio equipment and cheap media storage, it is possible to record up to 24 hours a day. However,
you would be hard pressed to transcribe all that material. Given this limitation, it is usually best to record
regularly during periods when the child is most active and talkative. At the same time, it is good to sample a range of
activities, such as dinner time, bath time, peer play time, book reading time, and game time. The more frequently you record during these
high-activity times, the better.
For example, the dense corpora collected by the MPI in Leipzig and
Manchester recorded children for two hours each day for one week, then stopped recording for three weeks before starting
the next dense recording week. This gives a clear snapshot of the child's language during each dense week, although it
leaves open what happens during the weeks in between. If you are recording video, storing so much material may become
impractical. You can reduce the storage load by compressing each recording with the H.264 codec as you go, or by mixing video
with audio-only recording. You may also want to begin each recording by stating the date and the place where it is made.
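The dense-sampling design described above (one week of daily two-hour sessions, then three weeks without recording) is easy to lay out in advance as a calendar. The following is a minimal Python sketch, assuming a fixed seven-day dense week followed by a 21-day gap; the function name `dense_recording_days` and its parameters are hypothetical and not part of CLAN or any other existing tool.

```python
from datetime import date, timedelta


def dense_recording_days(start, cycles, dense_days=7, gap_days=21):
    """Return the recording dates for a hypothetical dense-sampling schedule.

    Each cycle is `dense_days` consecutive recording days followed by
    `gap_days` days with no recording (7 + 21 days = a four-week cycle).
    """
    cycle_length = dense_days + gap_days
    days = []
    for cycle in range(cycles):
        # First day of this cycle's dense recording week.
        cycle_start = start + timedelta(days=cycle * cycle_length)
        for offset in range(dense_days):
            days.append(cycle_start + timedelta(days=offset))
    return days


if __name__ == "__main__":
    schedule = dense_recording_days(date(2024, 1, 1), cycles=3)
    hours_per_session = 2  # two hours per day, as in the example above
    print(len(schedule), "sessions,", len(schedule) * hours_per_session, "hours")
    for day in schedule:
        print(day.isoformat())
```

Changing `dense_days` and `gap_days` makes it easy to compare how many hours, and therefore how much transcription work, a given schedule will produce before committing to it.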
It is best to keep the names of your transcript and media files simple. If you are studying a single child,
the best scheme uses the child's age as the identifier for each transcript, written as the years followed by two-digit months
and two-digit days. For example, 20112.cha is a session during which the child was 2;1.12 (2 years, 1 month, 12 days).
The corresponding media file should be called 20112.wav or 20112.mp4.
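When many sessions are recorded, it can help to generate and check these names with a script rather than by hand. The sketch below assumes exactly the pattern in the example above (years as-is, then months and days zero-padded to two digits each); the helper names `session_name` and `parse_session_name` are hypothetical.

```python
import re


def session_name(years, months, days):
    """Build the age-based file stem, e.g. age 2;1.12 -> '20112'."""
    if not (0 <= months < 12 and 0 <= days < 32):
        raise ValueError("months must be 0-11 and days 0-31")
    # Months and days are zero-padded to two digits each.
    return f"{years}{months:02d}{days:02d}"


def parse_session_name(name):
    """Recover (years, months, days) from a name such as '20112.cha'."""
    # Any leading digits are years; the last four digits are months and days;
    # an optional extension (.cha, .wav, .mp4, ...) is ignored.
    match = re.fullmatch(r"(\d+)(\d{2})(\d{2})(?:\.\w+)?", name)
    if match is None:
        raise ValueError(f"not an age-based file name: {name!r}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    stem = session_name(2, 1, 12)
    print(stem + ".cha", stem + ".wav", stem + ".mp4")
    print(parse_session_name("20112.cha"))
```

Because months and days are zero-padded, sorting the file names alphabetically keeps sessions in age order as long as the years part has the same number of digits.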
To create transcriptions, you should use the CLAN editor, as described in the CLAN manual downloadable from the
You can then rely on the CLAN programs for analysis. In some cases, these programs can also send their output to
Excel or R for further analysis.