|
Sudha Arunachalam -- New York University sudha@nyu.edu |
| Participants: | 30 |
| Type of Study: | clinical |
| Location: | USA |
| Media type: | audio |
| DOI: | doi:10.21415/VT7P-YF83 |
Steen, K., Buonocore, T., Luyster, R., Sancimino, C., & Arunachalam, S. (2026, March). Introducing the NYU-Emerson Corpus: Transcripts of parent–child interaction with young autistic children [Poster presentation]. Meeting on Language in Autism, Atlanta, GA, United States.
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by the above reference.
Zoom audio: The play sessions were recorded over Zoom in family homes without external microphones, so sound quality was not always ideal. Unintelligible utterances (xxx) and untranscribable utterances (www) were marked as such, with comment and explanation tiers added where needed. Additionally, if the participants moved out of the room or camera frame for an extended period, their speech during that period was not transcribed and an explanation tier was added.
Privacy: In some instances, sections of the video were clipped and/or blurred to protect participant privacy. These instances were noted in the transcripts in comment or explanation tiers, with timestamps as needed. If only the audio was clipped because an identifying name was used, the name was de-identified in the transcript. Further information regarding de-identification and privacy can be found in the Transcripts Conventions section below.
Speech sound errors: Given the audio quality and the aims of this project, specific speech sound errors were not transcribed; words were transcribed at the lexical level. If sound errors were consistent and evident throughout a transcript, the pattern or substitution was noted in the transcript header as a comment.
Second language use: Speech in a second language was transcribed and marked as such to the best of our ability.
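For researchers who want to gauge how much of a transcript was affected by audio quality, the xxx and www markers described above can be tallied per speaker. The following is a minimal sketch, not part of the corpus tooling: the sample lines are invented for illustration, the speaker codes CHI and MOT are assumed rather than taken from the corpus, and continuation lines of multi-line utterances are ignored.

```python
import re
from collections import Counter

# Match xxx / www only as whole whitespace-delimited tokens.
MARKER = re.compile(r"(?<!\S)(xxx|www)(?!\S)")


def count_markers(chat_lines):
    """Count xxx / www tokens on main speaker tiers (lines starting with '*')."""
    counts = Counter()
    for line in chat_lines:
        if not line.startswith("*"):
            continue  # skip @ headers and % dependent tiers
        speaker, _, utterance = line.partition(":")
        for marker in MARKER.findall(utterance):
            counts[(speaker.lstrip("*"), marker)] += 1
    return counts


# Invented sample lines (hypothetical, not drawn from the corpus).
sample = [
    "@Begin",
    "*CHI:\txxx ball .",
    "%com:\taudio cut out briefly",
    "*MOT:\tyou want the ball ?",
    "*MOT:\twww .",
    "*CHI:\txxx xxx .",
    "@End",
]
counts = count_markers(sample)
for (speaker, marker), n in sorted(counts.items()):
    print(speaker, marker, n)
```

Counting only main tiers keeps explanatory text in comment and explanation tiers from inflating the totals.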
This is a growing corpus, created to share transcripts and videos of English-speaking parent-child interaction with other researchers. The first 30 transcript-video sets in the corpus feature autistic children.
Participants contributing data to this corpus are a subset of those from a larger study. A sample of families (US, nationwide) was recruited for remote studies focusing on language learning in children on and off the autism spectrum. Families of autistic children were recruited through online advertisements, a specialized clinical recruitment service, and the SPARK national autism research registry (Feliciano et al., 2018). Families of non-autistic children were recruited through online advertisements, parent organization emails, and our own research participant databases. Inclusion criteria for autistic children: 36.0 to 71.9 months old, a previous medical or educational diagnosis of autism spectrum disorder (ASD), and a score of 12 or higher on the Social Communication Questionnaire (Rutter, Bailey, & Lord, 2003). Inclusion criteria for non-autistic children: 24.0 to 71.9 months old, no previous diagnoses that would affect language and/or cognition, no immediate family members diagnosed with autism, and a score of less than 12 on the Social Communication Questionnaire. Exclusion criteria for both groups: children were excluded if they (a) were born before 37 weeks of pregnancy, (b) had uncorrected vision or hearing impairments, (c) were colorblind, or (d) heard English less than 70% of the time.
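The inclusion and exclusion criteria above can be read as a simple decision rule. The sketch below restates them in Python for readers who want to apply the same thresholds to their own screening data. The record structure and every field name are hypothetical, and the sketch covers only the screening criteria, not the subsequent confirmation of diagnostic status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningRecord:
    # All field names are hypothetical; the corpus does not define this schema.
    age_months: float
    autism_diagnosis: bool            # prior medical or educational ASD diagnosis
    other_relevant_diagnosis: bool    # diagnosis affecting language and/or cognition
    family_autism_diagnosis: bool     # immediate family member diagnosed with autism
    scq_score: int                    # Social Communication Questionnaire score
    gestation_weeks: float
    uncorrected_vision_or_hearing: bool
    colorblind: bool
    english_exposure_pct: float       # share of time the child hears English


def is_excluded(r: ScreeningRecord) -> bool:
    """Exclusion criteria (a) through (d), applied to both groups."""
    return (
        r.gestation_weeks < 37
        or r.uncorrected_vision_or_hearing
        or r.colorblind
        or r.english_exposure_pct < 70
    )


def eligible_group(r: ScreeningRecord) -> Optional[str]:
    """Return 'autistic', 'non-autistic', or None if the child is not eligible."""
    if is_excluded(r):
        return None
    if r.autism_diagnosis and 36.0 <= r.age_months <= 71.9 and r.scq_score >= 12:
        return "autistic"
    if (
        not r.autism_diagnosis
        and not r.other_relevant_diagnosis
        and not r.family_autism_diagnosis
        and 24.0 <= r.age_months <= 71.9
        and r.scq_score < 12
    ):
        return "non-autistic"
    return None


# Invented example record, for illustration only.
example = ScreeningRecord(
    age_months=48.0, autism_diagnosis=True, other_relevant_diagnosis=False,
    family_autism_diagnosis=False, scq_score=15, gestation_weeks=39,
    uncorrected_vision_or_hearing=False, colorblind=False,
    english_exposure_pct=90.0,
)
print(eligible_group(example))  # autistic
```

Note that the two groups have different lower age bounds (36.0 versus 24.0 months), so a 30-month-old child can qualify only for the non-autistic group.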
Diagnostic status was confirmed by the study’s licensed psychologist using a multi-step process including review of diagnostic history, parent report measures, and a 15-minute guided parent-child interaction adapted from the CARS-2 (Schopler et al., 2010), which included the following components: free play/conversation, independent pretend play (without parent involvement), pretend play with parent, and sensory/cause-and-effect play. Parents were guided through appropriate toy selection before this session. These parent-child interaction sessions were recorded over Zoom between November 2020 and April 2023 and were transcribed both for various projects and for inclusion in this corpus. The full sample includes 461 families (214 autistic, 247 non-autistic). To date, 142 families have consented to share transcripts and the accompanying video. The current published corpus contains 30 transcript-video sets from the autistic sample.
Among the 142 families that have consented to share transcripts and the accompanying video, there are 61 autistic and 81 non-autistic children. The current published corpus (February 2026) includes 30 autistic children. Specific demographic information about each child, including scores from study measures, is listed below and can be found in the linked NYU-Emerson Corpus Demographic & Measures Table. As data is added to the corpus, the table will be updated.
We use the term “parent-child interaction” because all of the play sessions currently in this corpus include a parent as the primary adult. However, in the larger anticipated corpus there are a few instances where a grandparent is the primary adult participant. This is noted in the speaker tier. The vast majority of parent-child interactions were recorded via Zoom at the family’s home. If the session did not take place at home, this was noted in the transcript header as a comment. The play location within the building or residence varied based on where the child was most comfortable playing (e.g., floor, table, living room, bedroom, kitchen) and sometimes changed between sections of the play session based on individual family preference.
Details of the transcription process and CHAT coding are given here.
Feliciano, P., Zhou, X., Astrovskaya, I., Chen, J. L., Daniels, A. M., Goin-Kochel, R. P., ... & Chung, W. K. (2018). SPARK: A US cohort of 50,000 families to accelerate autism research. Neuron, 97(3), 488–493. https://doi.org/10.1016/j.neuron.2018.01.015
Fenson, L., Marchman, V. A., Thal, D. J., Dale, P. S., Reznick, J. S., & Bates, E. (2007). MacArthur–Bates Communicative Development Inventories: User’s guide and technical manual (2nd ed.). Brookes Publishing.
Fenson, L., Marchman, V. A., Thal, D. J., Dale, P. S., Reznick, J. S., & Bates, E. (2007). MacArthur–Bates Communicative Development Inventories (2nd ed.). Brookes Publishing.
Fenson, L., Marchman, V. A., Thal, D. J., Dale, P. S., & Reznick, J. S. (2017). MacArthur–Bates Communicative Development Inventories, Third Edition (CDI-III). Brookes Publishing.
Gilliam, J. E. (2014). Gilliam Autism Rating Scale (3rd ed.). Pro-Ed.
Rutter, M., Bailey, A., & Lord, C. (2003). Social Communication Questionnaire (SCQ). Western Psychological Services.
Schopler, E., Van Bourgondien, M. E., Wellman, G. J., & Love, S. R. (2010). Childhood Autism Rating Scale (2nd ed.). Western Psychological Services.
Sparrow, S. S., Cicchetti, D. V., & Saulnier, C. A. (2016). Vineland Adaptive Behavior Scales (3rd ed.). Pearson.
Steen, K., Buonocore, T., Luyster, R., Sancimino, C., & Arunachalam, S. (2026, March). Introducing the NYU-Emerson Corpus: Transcripts of parent–child interaction with young autistic children [Poster presentation]. Meeting on Language in Autism, Atlanta, GA, United States.