CHILDES Derived Corpora

This page provides an index to derived corpora and frequency counts that researchers have constructed from segments of the CHILDES database.
Corpus: Description
BabySRL: Cynthia Fisher, Dan Roth, and Christos Christodoulopoulos contributed this version of the Brown corpus, which has been parsed and labelled for semantic roles.
Brent_Ratner: Michael Brent at Washington University contributed this corpus, derived from the child-directed speech (CDS) in the CHILDES Bernstein Ratner corpus and designed for training an automatic segmenter. The current version was contributed by Sharon Goldwater.
Gaskins Metaphor: This corpus lists and codes the metaphors found in the Lara, Thomas, and MPI-EVA-Manchester English corpora and in the Szuman and Weist-Jarosz Polish corpora.
Determiners: Counts tracking the emergence of the determiner category in several CHILDES corpora, as analyzed in a forthcoming Psychological Science paper by Meylan, Frank, Roy, and Levy.
Johnson Sesotho: Mark Johnson contributed this corpus of CDS from the CHILDES Sesotho corpus, built for training an automatic segmenter. The materials include a Python script that can be run on the Sesotho corpus, along with its output: the CDS sentences.
Pearl_Sprouse: Lisa Pearl and Jon Sprouse contributed this corpus of Penn Treebank-style parses for selected corpora from the American English segment of the CHILDES database.
UCI_Brent_Syl: Lisa Pearl and Lawrence Phillips at UC Irvine contributed this corpus, derived from the CDS of the CHILDES Brent corpus and designed for training an automatic segmenter.
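Several of the corpora above (Brent_Ratner, Johnson Sesotho, UCI_Brent_Syl) start from the CDS in a CHILDES corpus and prepare it for training a segmenter. The Python sketch below illustrates that general pipeline under stated assumptions. It treats every CHAT main tier not spoken by the speaker code CHI as CDS. It applies a much-simplified cleanup of CHAT annotation, not CLAN's full rules. It also assumes the segmenter's task is word segmentation, so each utterance becomes an unsegmented string plus gold word-boundary offsets. The function names (`clean_utterance`, `extract_cds`, `to_segmentation_pair`) are hypothetical. This is not the contributors' actual preprocessing, including the Python script distributed with Johnson Sesotho.

```python
import re

# Assumption: CHAT main tiers look like "*MOT:\tlook at the doggy ."
MAIN_TIER = re.compile(r"^\*([A-Za-z0-9]+):\s*(.*)$")


def clean_utterance(text):
    """Roughly normalize a CHAT main-tier line to lowercase words.

    Hypothetical simplification; real CHAT cleanup rules are far richer.
    """
    text = re.sub(r"\x15[^\x15]*\x15", " ", text)    # drop media time bullets
    text = re.sub(r"\[[^\]]*\]", " ", text)          # drop bracketed codes such as [/] or [?]
    text = text.replace("<", " ").replace(">", " ")  # drop scope markers
    words = []
    for tok in text.split():
        if tok.startswith("&") or tok in {"xxx", "yyy", "www"}:
            continue  # skip fillers, fragments, and unintelligible material
        # Drop "@" special-form suffixes, then any punctuation except apostrophes.
        tok = re.sub(r"[^\w']", "", tok.split("@")[0])
        if tok:
            words.append(tok.lower())
    return " ".join(words)


def extract_cds(chat_lines, child_code="CHI"):
    """Return cleaned utterances from every main tier not spoken by child_code."""
    utterances = []
    speaker, text = None, ""
    for raw in chat_lines:
        line = raw.rstrip("\n")
        if line.startswith("\t") and speaker is not None:
            text += " " + line.strip()  # continuation line of the current main tier
            continue
        # Any other line ends the current main tier, so keep it if it is CDS.
        if speaker is not None and speaker != child_code:
            utterances.append(clean_utterance(text))
        speaker, text = None, ""
        match = MAIN_TIER.match(line)
        if match:
            speaker, text = match.group(1), match.group(2)
    if speaker is not None and speaker != child_code:
        utterances.append(clean_utterance(text))
    return [u for u in utterances if u]


def to_segmentation_pair(utterance):
    """Return (unsegmented string, character offsets of gold word boundaries)."""
    words = utterance.split()
    boundaries, offset = [], 0
    for word in words[:-1]:
        offset += len(word)
        boundaries.append(offset)
    return "".join(words), boundaries


if __name__ == "__main__":
    sample = [
        "@Begin",
        "*MOT:\tlook at the doggy .",
        "%mor:\tv|look prep|at det|the n|doggy .",
        "*CHI:\tdoggy !",
        "*MOT:\twhat's that [?] ?",
        "@End",
    ]
    cds = extract_cds(sample)
    print(cds)                           # ['look at the doggy', "what's that"]
    print(to_segmentation_pair(cds[0]))  # ('lookatthedoggy', [4, 6, 9])
```

A segmenter trained on such data sees only the unsegmented strings and is scored against the boundary offsets. Keeping boundaries as character offsets makes precision and recall over boundaries a simple set comparison.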