CHILDES Mandarin BJCMC Corpus

Ziyin Mai
Department of Linguistics and Modern Languages
Chinese University of Hong Kong

Jingyao Liu

City University of Hong Kong

Shanshan Yan
School of Chinese as a Second Language
Peking University

Virginia Yip
Linguistics and Modern Languages
Chinese University of Hong Kong

Participants: 48
Type of Study: cross-sectional, naturalistic
Location: Beijing
Media type: audio
DOI:

Browsable transcripts

Download transcripts

Link to media folder

Citation information

Mai, Z., Shang, M., Liu, J., Yan, S., Matthews, S., & Yip, V. (2024). Acquiring Chinese in US, Hong Kong and Beijing: three new corpora and three verbal structures. Paper presented at the XVIth International Congress for the Study of Child Language (IASCL-2024), July 15-19, Charles University, Prague, Czech Republic.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

The Beijing Child Mandarin Corpus (BJCMC) was constructed to address the absence of systematic documentation of child Mandarin speech in naturalistic contexts at preschool age (3-6 years), and to serve as a monolingual baseline for the Child Heritage Chinese Corpus (CHCC) and the Hong Kong Mandarin-English Child Corpus (HKMECC). Participants (n = 48) were recruited in Beijing, China, in 2023, mainly from the Haidian (n = 31) and Chaoyang (n = 9) Districts. An effort was made to recruit children from families of mid-to-high socioeconomic status (SES), indexed by the mother’s education level. The children included in the current corpus were raised in Mandarin-dominant households and educated in mainstream Mandarin-medium schools in Beijing; at the time of participation, they had not received direct and substantial exposure to other languages (for example, English) or to Chinese varieties other than Mandarin. All participants were healthy, with no suspected or diagnosed language disorders.

For each participant, two research assistants (RAs; native speakers of Mandarin and postgraduate students majoring in Chinese language education at Peking University) made a one-off visit to the participant’s home to i) record naturalistic RA-child interaction in Mandarin (30 minutes per session) and ii) administer the Mandarin receptive vocabulary subtest of the Wechsler Preschool and Primary Scale of Intelligence—Fourth Edition (WPPSI-IV; Wechsler, 2013) and, if the child had some English input, the Peabody Picture Vocabulary Test-Fifth Edition (PPVT-5; Dunn, 2019). Two participants were excluded from the current corpus, one due to a higher-than-acceptable score on the English vocabulary test (12th percentile) and the other due to a low score on the Mandarin vocabulary test. During the recording session, one assistant, after a brief warm-up to establish rapport, interacted with the child in spontaneous play or book-reading activities and was told to encourage the child to lead the conversation as much as possible; the other assistant recorded the interaction and remained silent throughout the recording. A Sony video camera was used to record the interactions. The vocabulary tests were conducted either before or after the interaction recording.

The current corpus contains transcripts from 48 children (25 girls), distributed across 16 age points (3 children per point) from 3;0 to 6;9, with 3-month intervals between adjacent points. The recordings were manually transcribed and checked by native speakers of Mandarin trained by the Childhood Bilingualism Research Centre at the Chinese University of Hong Kong, following the conventions established for CHCC and HKMECC. More information on individual participants is presented in this table.
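The sampling design described above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `age_points` and its parameters are hypothetical and not part of the corpus or any TalkBank tool. It enumerates the age points in years;months notation and confirms that 16 points with 3 children each give 48 participants.

```python
def age_points(start_months=36, end_months=81, step=3):
    """List age points in 'years;months' notation (hypothetical helper).

    Defaults correspond to 3;0 (36 months) through 6;9 (81 months)
    at 3-month intervals, as stated in the project description.
    """
    return [f"{m // 12};{m % 12}" for m in range(start_months, end_months + 1, step)]


points = age_points()
children_per_point = 3  # from the description: 3 children per data point
total_children = len(points) * children_per_point

print(points)
print(len(points), "age points,", total_children, "children")
```

A listing like this may be handy when grouping transcripts by age point, for example to check that each point is represented by three files.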

Acknowledgements

We would like to express our gratitude to Brian MacWhinney, Director of CHILDES, for his expertise, advice and technical support. We thank the participating families for opening their homes to us and allowing our students and research assistants to interact with the children and record the interactions.

Special thanks go to the students and research assistants who participated in the recording sessions and/or transcribed the speech data: Mengyao Shang, Jiaqi Nie, Yue Cao, Yue Chen, Ranee Cheng, Yingyu Su, Letu Li, Yishan Guan, Chengxi Li, Zihan Wang, Xuan Wang, Yue Li, and Yihan Zhao. We gratefully acknowledge the support and help of our lab members and collaborators: Yuqing Liang, Xuening Zhang, Stephen Matthews, and Yang Zhao.

The research was supported by the General Research Fund of the Hong Kong Research Grants Council (“Input and experience in early trilingual development”, awarded to Ziyin Mai as PI, Project no. 14615820) and by a National Social Sciences Project (“Acquisition of Modal Expressions by Preschool Mandarin English Bilingual Children”, awarded to Shanshan Yan as PI and Ziyin Mai as Co-I, Project no. 24CYY081).